read text file byte by byte

 
 
sjdevnull@yahoo.com
 
      12-14-2009
On Dec 13, 5:56 pm, "Rhodri James" <(E-Mail Removed)> wrote:
> On Sun, 13 Dec 2009 06:44:54 -0000, Steven D'Aprano
> <(E-Mail Removed)> wrote:
> > On Sat, 12 Dec 2009 22:15:50 -0800, daved170 wrote:

>
> >> Thank you all.
> >> Dennis I really liked your solution for the issue but I have two questions
> >> about it:
> >> 1) My origin file is Text file and not binary

>
> > That's a statement, not a question.

>
> >> 2) I need to read each time 1 byte.

>
> > f = open(filename, 'r')  # open in text mode
> > f.read(1)  # read one byte

>
> The OP hasn't told us what version of Python he's using on what OS. On
> Windows, text mode will compress the end-of-line sequence into a single
> "\n". In Python 3.x, f.read(1) will read one character, which may be more
> than one byte depending on the encoding.


The 3.1 documentation specifies that file.read returns bytes:

file.read([size])
Read at most size bytes from the file (less if the read hits EOF
before obtaining size bytes). If the size argument is negative or
omitted, read all data until EOF is reached. The bytes are returned as
a string object. An empty string is returned when EOF is encountered
immediately. (For certain files, like ttys, it makes sense to continue
reading after an EOF is hit.) Note that this method may call the
underlying C function fread() more than once in an effort to acquire
as close to size bytes as possible. Also note that when in non-
blocking mode, less data than was requested may be returned, even if
no size parameter was given.

Does it need fixing?
 
 
 
 
 
Dennis Lee Bieber
 
      12-14-2009
On Sun, 13 Dec 2009 22:56:55 -0800 (PST), "(E-Mail Removed)"
<(E-Mail Removed)> declaimed the following in
gmane.comp.python.general:


>
> The 3.1 documentation specifies that file.read returns bytes:
>
> file.read([size])
> Read at most size bytes from the file (less if the read hits EOF
> before obtaining size bytes). If the size argument is negative or
> omitted, read all data until EOF is reached. The bytes are returned as
> a string object. An empty string is returned when EOF is encountered
> immediately. (For certain files, like ttys, it makes sense to continue
> reading after an EOF is hit.) Note that this method may call the
> underlying C function fread() more than once in an effort to acquire
> as close to size bytes as possible. Also note that when in non-
> blocking mode, less data than was requested may be returned, even if
> no size parameter was given.
>
> Does it need fixing?


I'm still running 2.5 (Maybe next spring I'll see if all the third
party libraries I have exist in 2.6 versions)... BUT...

"... are returned as a string object..." Aren't "strings" in 3.x now
unicode? Which would imply, to me, that the interpretation of the
contents will not be plain bytes.
--
Wulfraed Dennis Lee Bieber KD6MOG
(E-Mail Removed) HTTP://wlfraed.home.netcom.com/

 
 
 
 
 
sjdevnull@yahoo.com
 
      12-14-2009
On Dec 14, 1:57 pm, Dennis Lee Bieber <(E-Mail Removed)> wrote:
> On Sun, 13 Dec 2009 22:56:55 -0800 (PST), "(E-Mail Removed)"
> <(E-Mail Removed)> declaimed the following in
> gmane.comp.python.general:
>
> > The 3.1 documentation specifies that file.read returns bytes:

>
> > file.read([size])
> > Read at most size bytes from the file (less if the read hits EOF
> > before obtaining size bytes). If the size argument is negative or
> > omitted, read all data until EOF is reached. The bytes are returned as
> > a string object. An empty string is returned when EOF is encountered
> > immediately. (For certain files, like ttys, it makes sense to continue
> > reading after an EOF is hit.) Note that this method may call the
> > underlying C function fread() more than once in an effort to acquire
> > as close to size bytes as possible. Also note that when in non-
> > blocking mode, less data than was requested may be returned, even if
> > no size parameter was given.

>
> > Does it need fixing?

>
> I'm still running 2.5 (Maybe next spring I'll see if all the third
> party libraries I have exist in 2.6 versions)... BUT...
>
> "... are returned as a string object..." Aren't "strings" in 3.x now
> unicode? Which would imply, to me, that the interpretation of the
> contents will not be plain bytes.


I'm not even concerned (yet) about how the data is interpreted after
it's read. First I'm trying to clarify what exactly gets read.

The post I was replying to said "In Python 3.x, f.read(1) will read
one character, which may be more than one byte depending on the
encoding."

That seems at odds with the documentation saying "Read at most size
bytes from the file"--the fact that it's documented to read "size"
bytes rather than "size" (possibly multibyte) characters is emphasized
by the later language saying that the underlying C fread() call may be
called enough times to read as close to size bytes as possible.

If the poster I was replying to is correct, it seems like a
documentation update is in order. As a long-time programmer, I would
be very surprised to make a call to f.read(X) and have it return more
than X bytes if I hadn't read this here.
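
A small sketch of the behaviour under discussion, assuming Python 3 and a UTF-8 encoded temporary file (the file and variable names are illustrative, not from the thread). In text mode read(1) returns one character, which here occupies two bytes on disk; in binary mode read(1) returns exactly one byte.

```python
import os
import tempfile

# Two characters that each take two bytes in UTF-8 (e-acute, e-grave).
text = "\u00e9\u00e8"

# Scratch file used only for this demonstration.
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Text mode: read(1) returns one character as a str.
with open(path, "r", encoding="utf-8") as f:
    one_char = f.read(1)

# Binary mode: read(1) returns one byte as a bytes object.
with open(path, "rb") as f:
    one_byte = f.read(1)

os.remove(path)

print(repr(one_char), "encodes to", len(one_char.encode("utf-8")), "bytes")
print(repr(one_byte), "is", len(one_byte), "byte")
```

So the count passed to read() is in characters for text files and in bytes for binary files, which is where the quoted documentation and the observed behaviour part ways.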
 
 
Nobody
 
      12-14-2009
On Sun, 13 Dec 2009 22:56:55 -0800, (E-Mail Removed) wrote:

> The 3.1 documentation specifies that file.read returns bytes:


> Does it need fixing?


There are no file objects in 3.x. The file() function no longer
exists. The return value from open() will be an instance of
_io.<something> depending upon the mode, e.g. _io.TextIOWrapper for 'r',
_io.BufferedReader for 'rb', _io.BufferedRandom for 'w+b', etc.

http://docs.python.org/3.1/library/io.html

io.IOBase.read() doesn't exist, io.RawIOBase.read(n) reads n bytes,
io.TextIOBase.read(n) reads n characters.
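
To illustrate the point above, here is a minimal sketch that records what open() returns for each mode. It assumes CPython 3, where the _io classes are also exposed under the io module; the temporary file is only scaffolding and the variable names are made up for this example.

```python
import io
import os
import tempfile

# Scratch file used only for this demonstration.
fd, path = tempfile.mkstemp()
os.close(fd)

# Record the class of the object open() returns for each mode.
kinds = {}
for mode in ("r", "rb", "w+b"):
    with open(path, mode) as f:
        kinds[mode] = type(f)

# With buffering disabled, binary mode hands back the raw layer itself.
with open(path, "rb", buffering=0) as f:
    raw_kind = type(f)

os.remove(path)

for mode, kind in kinds.items():
    print(mode, "->", kind.__name__)
print("rb, buffering=0 ->", raw_kind.__name__)
```

The text-mode object derives from io.TextIOBase, so its read(n) counts characters; the binary ones count bytes.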


 
 
Nobody
 
      12-14-2009
On Mon, 14 Dec 2009 03:14:11 +0000, MRAB wrote:

>>> You originally stated that you want to "scramble" the bytes -- if
>>> you mean to implement some sort of encryption algorithm you should know
>>> that most of them work in blocks as the "key" is longer than one byte.

>>
>> Block ciphers work in blocks. Stream ciphers work on bytes, regardless of
>> the length of the key.
>>

> It's still more efficient to read in blocks, even if you're going to
> process the bytes one at a time.


That's fine for a file. If you're reading from a pipe, socket, etc, you
typically want to take what you can get when you can get it (although this
is easier said than done in Python), rather than waiting for a complete
"block". This is often a primary reason for choosing a stream cipher over
a block cipher, as it eliminates the need to add and remove padding for
intermittent data flows.
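
As a rough illustration of "take what you can get", assuming a local os.pipe() stands in for a real pipe or socket: os.read() returns as soon as some data is available, up to the requested maximum, instead of waiting for a full block.

```python
import os

# A local pipe stands in for a socket or an external process (assumption).
read_fd, write_fd = os.pipe()

# Only three bytes have arrived so far.
os.write(write_fd, b"abc")

# Ask for up to 1024 bytes; the call returns what is available right now.
data = os.read(read_fd, 1024)

os.close(write_fd)
os.close(read_fd)

print(len(data), "bytes received:", data)
```

A stream cipher can process those bytes immediately, whereas a block cipher would have to wait for more input or pad.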

 
 
Gabriel Genellina
 
      12-15-2009
On Mon, 14 Dec 2009 18:09:52 -0300, Nobody <(E-Mail Removed)> wrote:
> On Sun, 13 Dec 2009 22:56:55 -0800, (E-Mail Removed) wrote:
>
>> The 3.1 documentation specifies that file.read returns bytes:

>
>> Does it need fixing?

>
> There are no file objects in 3.x. The file() function no longer
> exists. The return value from open(), will be an instance of
> _io.<something> depending upon the mode, e.g. _io.TextIOWrapper for 'r',
> _io.BufferedReader for 'rb', _io.BufferedRandom for 'w+b', etc.
>
> http://docs.python.org/3.1/library/io.html
>
> io.IOBase.read() doesn't exist, io.RawIOBase.read(n) reads n bytes,
> io.TextIOBase.read(n) reads n characters.


So basically this section [1] should not exist, or be completely rewritten?
At least the references to C stdio library seem wrong to me.

[1] http://docs.python.org/3.1/library/s...l#file-objects

--
Gabriel Genellina

 
 
sjdevnull@yahoo.com
 
      12-15-2009
On Dec 14, 4:09 pm, Nobody <(E-Mail Removed)> wrote:
> On Sun, 13 Dec 2009 22:56:55 -0800, (E-Mail Removed) wrote:
> > The 3.1 documentation specifies that file.read returns bytes:
> > Does it need fixing?

>
> There are no file objects in 3.x.


Then the documentation definitely needs fixing; the excerpt I posted
earlier is from the 3.1 documentation's section about file objects:
http://docs.python.org/3.1/library/s...l#file-objects

Which begins:

"5.9 File Objects

File objects are implemented using C's stdio package and can be
created with the built-in open() function. File objects are also
returned by some other built-in functions and methods, such as
os.popen() and os.fdopen() and the makefile() method of socket objects."

(It goes on to describe the read method's operation on bytes that I
quoted upthread.)

Sadly I'm not familiar enough with 3.x to suggest an appropriate edit.
 
 
Terry Reedy
 
      12-15-2009
On 12/14/2009 7:37 PM, Gabriel Genellina wrote:
> On Mon, 14 Dec 2009 18:09:52 -0300, Nobody <(E-Mail Removed)> wrote:
>> On Sun, 13 Dec 2009 22:56:55 -0800, (E-Mail Removed) wrote:
>>
>>> The 3.1 documentation specifies that file.read returns bytes:

>>
>>> Does it need fixing?

>>
>> There are no file objects in 3.x. The file() function no longer
>> exists. The return value from open(), will be an instance of
>> _io.<something> depending upon the mode, e.g. _io.TextIOWrapper for 'r',
>> _io.BufferedReader for 'rb', _io.BufferedRandom for 'w+b', etc.
>>
>> http://docs.python.org/3.1/library/io.html
>>
>> io.IOBase.read() doesn't exist, io.RawIOBase.read(n) reads n bytes,
>> io.TextIOBase.read(n) reads n characters.

>
> So basically this section [1] should not exist, or be completely rewritten?
> At least the references to C stdio library seem wrong to me.
>
> [1] http://docs.python.org/3.1/library/s...l#file-objects


I agree.
http://bugs.python.org/issue7508

Terry Jan Reedy




 
 
daved170
 
      12-15-2009
On 13 December, 22:39, Dennis Lee Bieber <(E-Mail Removed)> wrote:
> On Sat, 12 Dec 2009 22:15:50 -0800 (PST), daved170 <(E-Mail Removed)>
> declaimed the following in gmane.comp.python.general:
>
> > Thank you all.
> > Dennis I really liked your solution for the issue but I have two
> > questions about it:
> > 1) My origin file is Text file and not binary

>
> Do you need to process the bytes in the file as they are? Or do you
> accept changes in line-endings (M$ Windows "text" files use <cr><lf> as
> line ending, but if you read it in Python as "text" <cr><lf> is
> converted to a single <lf>)?
>
> > 2) I need to read each time 1 byte. I didn't see that on your example
> > code.

>
> You've never explained why you need to READ 1 byte at a time, vs
> reading a block (I chose 1KB) and processing each byte IN THE BLOCK.
> After all, if you do use 1 byte I/O, your program is going to be very
> slow, as each read is blocking (suspends) while asking the O/S for the
> next character in the file (this depends upon the underlying I/O library
> implementation -- I suspect any modern I/O system is still reading some
> block size [256 to 4K] and then returning parts of that block as
> needed). OTOH, reading a block at a time makes for one suspension and
> then a lot of data to be processed however you want.
>
> You originally stated that you want to "scramble" the bytes -- if
> you mean to implement some sort of encryption algorithm you should know
> that most of them work in blocks as the "key" is longer than one byte.
>
> My sample reads in chunks, then the scramble function XORs each byte
> with the corresponding byte in the supplied key string, finally
> rejoining all the now individual bytes into a single chunk for
> subsequent output.
> --
> Wulfraed Dennis Lee Bieber KD6MOG
> (E-Mail Removed) HTTP://wlfraed.home.netcom.com/


Hi All,
Having reread your comments and the code you posted, I realize that
I was mistaken.
I don't need to read the file byte by byte; you are all right. I do need
to scramble each byte, so I'll do as you said: I'll read blocks and
scramble each byte in the block.
And now for my last question on this subject.
Let's say that my file contains the following line: "Hello World".
I read it using read(1024), as you suggested in your sample.
Now, how can I XOR it with 0xFF, for example?
Thanks again
Dave
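
One possible way to do this, sketched under the assumption of Python 3 with the data read in binary mode (so that iterating over a block yields integers). The names xor_bytes and scramble_stream are hypothetical, not taken from Dennis's sample, and in-memory io.BytesIO objects stand in for real files opened with 'rb' and 'wb'.

```python
import io


def xor_bytes(chunk, mask=0xFF):
    """Return a new bytes object with every byte of chunk XORed with mask."""
    return bytes(b ^ mask for b in chunk)


def scramble_stream(src, dst, mask=0xFF, block_size=1024):
    """Copy src to dst block by block, XORing each byte with mask."""
    while True:
        chunk = src.read(block_size)
        if not chunk:  # an empty bytes object signals end of file
            break
        dst.write(xor_bytes(chunk, mask))


# In-memory streams stand in for files opened in binary mode.
src = io.BytesIO(b"Hello World")
dst = io.BytesIO()
scramble_stream(src, dst)
scrambled = dst.getvalue()

# XOR with the same mask is its own inverse, so applying it twice
# recovers the original data.
restored = xor_bytes(scrambled)
print(scrambled)
print(restored)
```

In Python 2.x, reading returns a str, so each character would need ord() before the XOR and chr() afterwards; the block-by-block structure stays the same.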
 
 
sjdevnull@yahoo.com
 
      12-15-2009
On Dec 14, 11:44 pm, Terry Reedy <(E-Mail Removed)> wrote:
> On 12/14/2009 7:37 PM, Gabriel Genellina wrote:
>
> > On Mon, 14 Dec 2009 18:09:52 -0300, Nobody <(E-Mail Removed)> wrote:
> >> On Sun, 13 Dec 2009 22:56:55 -0800, (E-Mail Removed) wrote:

>
> >>> The 3.1 documentation specifies that file.read returns bytes:

>
> >>> Does it need fixing?

>
> >> There are no file objects in 3.x. The file() function no longer
> >> exists. The return value from open(), will be an instance of
> >> _io.<something> depending upon the mode, e.g. _io.TextIOWrapper for 'r',
> >> _io.BufferedReader for 'rb', _io.BufferedRandom for 'w+b', etc.

>
> >> http://docs.python.org/3.1/library/io.html

>
> >> io.IOBase.read() doesn't exist, io.RawIOBase.read(n) reads n bytes,
> >> io.TextIOBase.read(n) reads n characters.

>
> > So basically this section [1] should not exist, or be completely rewritten?
> > At least the references to C stdio library seem wrong to me.

>
> > [1] http://docs.python.org/3.1/library/stdtypes.html#file-objects

>
> I agree.
> http://bugs.python.org/issue7508
>
> Terry Jan Reedy


Thanks, Terry.
 