[Python3] Reading a binary file and writing the bytes verbatim in a UTF-8 file

 
 
 
      04-23-2010
Hello.

I have to read the contents of a binary file (a PNG file exactly), and
dump it into an RTF file.

The RTF-file has been opened with codecs.open in utf-8 mode.

As I expected, the utf-8 decoder chokes on some combinations of bits;
how can I tell python to dump the bytes as they are, without
interpreting them?

Thanks.

--
Fabrice DELENTE
 
 
 
 
 
Chris Rebert
 
      04-23-2010
On Fri, Apr 23, 2010 at 9:22 AM, <(E-Mail Removed)-one.org> wrote:
> I have to read the contents of a binary file (a PNG file exactly), and
> dump it into an RTF file.
>
> The RTF-file has been opened with codecs.open in utf-8 mode.
>
> As I expected, the utf-8 decoder


You mean encoder.

> chokes on some combinations of bits;


Well yeah, it's supposed to be getting *characters*, not bytes.

> how can I tell python to dump the bytes as they are, without
> interpreting them?


Go around the encoder and write bytes directly to the file:

# Disclaimer: Completely untested

import codecs

raw_rtf = open("path/to/rtf.rtf", 'wb') # binary mode, so the file accepts bytes
png = open("path/to/png.png", 'rb') # binary mode, so read() returns bytes
writer_factory = codecs.getwriter('utf-8')

encoded_rtf = writer_factory(raw_rtf)
encoded_rtf.write(u"whatever text we want") # use unicode
# ...write more text...

# flush buffers
encoded_rtf.reset()
raw_rtf.flush()

raw_rtf.write(png.read()) # write from bytes to bytes

raw_rtf.close()
#END code

I have no idea how you'd go about reading the contents of such a file
in a sensible way.

Cheers,
Chris
--
http://blog.rebertia.com
 
 
 
 
 
Chris Rebert
 
      04-23-2010
On Fri, Apr 23, 2010 at 9:48 AM, Chris Rebert <(E-Mail Removed)> wrote:
> On Fri, Apr 23, 2010 at 9:22 AM, <(E-Mail Removed)-one.org> wrote:
>> I have to read the contents of a binary file (a PNG file exactly), and
>> dump it into an RTF file.

<snip>
>> how can I tell python to dump the bytes as they are, without
>> interpreting them?

>
> Go around the encoder and write bytes directly to the file:
>
> # Disclaimer: Completely untested

<snip>
> encoded_rtf.write(u"whatever text we want") # use unicode


Erm, sorry, since you're apparently using Python 3.x, that line should
have been just:

encoded_rtf.write("whatever text we want") # use unicode

Cheers,
Chris
--
http://blog.rebertia.com
 
 
      04-23-2010
Thanks, I'll try this.

> I have no idea how you'd go about reading the contents of such a file
> in a sensible way.


The purpose is to embed PNG pictures in an RTF file that will be read
by OpenOffice. It seems that OpenOffice reads RTF in 8-bit, so it
should be ok.

The RTF is produced from a TeX source file encoded in UTF-8, that's
why I mix unicode and 8-bit.
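
For the embedding itself, a possible alternative to raw bytes: as far as I know, RTF picture groups take hex-encoded data by default, which keeps the whole file plain ASCII. The sketch below assumes the \pict and \pngblip control words; the helper name png_to_rtf_picture is made up for illustration, and I have not checked whether OpenOffice also wants size control words before it will display the picture.

```python
import binascii

def png_to_rtf_picture(png_bytes):
    """Return an ASCII-only RTF picture group holding png_bytes as hex.

    Hypothetical helper: it emits only \\pict and \\pngblip, with no size
    or scaling control words.
    """
    hex_data = binascii.hexlify(png_bytes).decode("ascii")
    return "{\\pict\\pngblip " + hex_data + "}"

# Demo input: just the 8-byte PNG signature, not a complete image.
sample = b"\x89PNG\r\n\x1a\n"
fragment = png_to_rtf_picture(sample)
```

Since the result is a str containing only ASCII, it can be written through the same text interface as the rest of the RTF.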

--
Fabrice DELENTE
 
 
Antoine Pitrou
 
      04-25-2010

Hello,

> I have to read the contents of a binary file (a PNG file exactly), and
> dump it into an RTF file.
>
> The RTF-file has been opened with codecs.open in utf-8 mode.


You should use the built-in open() function. codecs.open() is outdated in
Python 3.

> As I expected, the utf-8 decoder chokes on some combinations of bits;
> how can I tell python to dump the bytes as they are, without
> interpreting them?


Well, the one thing you have to be careful about is to flush text buffers
before writing binary data. But, for example:

>>> f = open("TEST", "w", encoding='utf8')
>>> f.write("héhé")
4
>>> f.flush()
>>> f.buffer.write(b"\xff\x00")
2
>>> f.close()


gives you:

$ hexdump -C TEST
00000000 68 c3 a9 68 c3 a9 ff 00 |h..h....|

(utf-8 encoded text and then two raw bytes which are invalid utf-8)

Another possibility is to open the file in binary mode and do the
encoding yourself when writing text. This might actually be a better
solution, since I'm not sure RTF uses utf-8 by default.
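
A minimal sketch of that binary-mode approach, in case it helps. The function name write_text_then_bytes and the temporary path are only for illustration; the text and the two raw bytes are the same ones as in the session above.

```python
import os
import tempfile

def write_text_then_bytes(path, text, payload, encoding="utf-8"):
    """Write encoded text followed by raw bytes through one binary handle."""
    with open(path, "wb") as out:
        out.write(text.encode(encoding))  # str -> bytes, done explicitly
        out.write(payload)                # bytes go to the file untouched

# Throwaway file for the demo; "h\u00e9h\u00e9" is the string "héhé".
demo_path = os.path.join(tempfile.mkdtemp(), "TEST")
write_text_then_bytes(demo_path, "h\u00e9h\u00e9", b"\xff\x00")

with open(demo_path, "rb") as f:
    data = f.read()
```

With a single binary handle there is no text buffer to flush before the raw bytes go out, and the encoding can be switched per call if the target format turns out not to be utf-8.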

Regards

Antoine.


 
 
Stefan Behnel
 
      04-25-2010
Antoine Pitrou, 25.04.2010 02:16:
> Another possibility is to open the file in binary mode and do the
> encoding yourself when writing text. This might actually be a better
> solution, since I'm not sure RTF uses utf-8 by default.


That's a lot cleaner as it doesn't use two interfaces to write to the same
file, and doesn't rely on any specific coordination between those two
interfaces.

Stefan

 
 
      04-25-2010
> Another possibility is to open the file in binary mode and do the
> encoding yourself when writing text. This might actually be a better
> solution, since I'm not sure RTF uses utf-8 by default.


Yes, thanks for this suggestion, it seems the best to me. Actually RTF
is not UTF-8 encoded, it's 8-bit and maybe even ASCII only. Every
unicode char has to be encoded as an escape sequence (\u2022 for
example).
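
A rough sketch of such an escaping step, under my reading of the RTF spec: the number after \u is a signed 16-bit decimal value (so U+2022 would come out as \u8226), and it is followed by one fallback character for readers that do not understand \u. The helper name rtf_escape is hypothetical, and the code assumes the default of one fallback character per escape.

```python
def rtf_escape(text):
    """Return an ASCII-only RTF fragment for text (hypothetical helper)."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)  # RTF syntax characters need a backslash
        elif ord(ch) < 128:
            out.append(ch)         # plain ASCII passes through unchanged
        else:
            # Work on UTF-16 code units so that characters outside the
            # BMP become a surrogate pair, one \uN escape per unit.
            units = ch.encode("utf-16-be")
            for i in range(0, len(units), 2):
                unit = int.from_bytes(units[i:i + 2], "big")
                if unit > 32767:
                    unit -= 65536  # \uN expects a signed 16-bit decimal
                out.append("\\u%d?" % unit)  # '?' is the fallback character
    return "".join(out)

escaped = rtf_escape("caf\u00e9 \u2022 {x}")
```

The output can then be encoded with "ascii" and written to a file opened in binary mode.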

Thanks again.

--
Fabrice DELENTE
 