Velocity Reviews - Computer Hardware Reviews

String Format Conversion

 
 
mcg
01-27-2005
Investigating Python, day 1:

Data in file:
x y
1 2
3 4
5 6


Want to read file into an array of pairs.

In C: scanf("%d %d",&x,&y)---store x y in array, loop.

How do I do this in Python?
In the actual application, the pairs are floating point, e.g. -1.003

 
Stephen Thorne
01-27-2005
On 26 Jan 2005 20:53:02 -0800, mcg <(E-Mail Removed)> wrote:
> Investigating Python, day 1:
>
> Data in file:
> x y
> 1 2
> 3 4
> 5 6
>
> Want to read file into an array of pairs.
>
> In C: scanf("%d %d",&x,&y)---store x y in array, loop.
>
> How do I do this in Python?
> In the actual application, the pairs are floating point, e.g. -1.003


f = file('input', 'r')
labels = f.readline()  # consume the first line of the file.

Easy Option:
for line in f.readlines():
    x, y = line.split()
    x = float(x)
    y = float(y)

Or, more concisely:
for line in f.readlines():
    x, y = map(float, line.split())
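The loops above convert x and y but don't yet keep them, while the question asked for an array of pairs. A minimal sketch, assuming Python 3 (where open() replaces the file() builtin used in this thread); read_pairs is a hypothetical helper name, and io.StringIO stands in for the real 'input' file:

```python
import io


def read_pairs(f):
    """Hypothetical helper: skip the 'x y' header, return a list of (x, y) floats."""
    f.readline()  # consume the header line, as in the loops above
    pairs = []
    for line in f:
        if not line.strip():
            continue  # tolerate blank lines, e.g. at the end of the file
        x, y = line.split()
        pairs.append((float(x), float(y)))
    return pairs


# With a real file this would be: with open('input') as f: pairs = read_pairs(f)
sample = io.StringIO("x y\n1 2\n3 4\n-1.003 6\n")
print(read_pairs(sample))
```

A list of tuples is the closest everyday Python equivalent of the C "array of pairs"; each element unpacks directly with x, y = pair.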

Regards,
Stephen Thorne
 
Steven Bethard
01-27-2005
Stephen Thorne wrote:
> f = file('input', 'r')
> labels = f.readline() # consume the first line of the file.
>
> Easy Option:
> for line in f.readlines():
>     x, y = line.split()
>     x = float(x)
>     y = float(y)
>
> Or, more concisely:
> for line in f.readlines():
>     x, y = map(float, line.split())


Somewhat more memory efficient:

lines_iter = iter(file('input'))
labels = lines_iter.next()
for line in lines_iter:
    x, y = [float(f) for f in line.split()]

By using the iterator instead of readlines, I read only one line from
the file into memory at once, instead of all of them. This may or may
not matter depending on the size of your files, but using iterators is
generally more scalable, though of course it's not always possible.

I also opted to use a list comprehension instead of map, but this is
totally a matter of personal preference -- the performance differences
are probably negligible.
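In Python 3 the iterator method is spelled __next__ and is normally called through the builtin next(), so lines_iter.next() becomes next(lines_iter). A sketch, assuming Python 3, that keeps the one-line-at-a-time behaviour by yielding pairs from a generator; iter_pairs is a hypothetical name:

```python
import io


def iter_pairs(f):
    """Hypothetical helper: lazily yield (x, y) floats, skipping the header line."""
    lines_iter = iter(f)
    next(lines_iter, None)  # consume the header; the default avoids StopIteration on an empty file
    for line in lines_iter:
        fields = line.split()
        if fields:  # skip blank lines
            x, y = [float(v) for v in fields]
            yield x, y


sample = io.StringIO("x y\n1 2\n3 4\n5 6\n")
for x, y in iter_pairs(sample):
    print(x, y)
```

Because it is a generator, each line is read and converted only when the caller asks for the next pair, so apart from the file object's own buffer only one line is held at a time.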

Steve
 
Dennis Benzinger
01-27-2005
mcg wrote:
> Investigating Python, day 1:
>
> Data in file:
> x y
> 1 2
> 3 4
> 5 6
>
>
> Want to read file into an array of pairs.
>
> In C: scanf("%d %d",&x,&y)---store x y in array, loop.
>
> How do I do this in Python?
> In the actual application, the pairs are floating point, e.g. -1.003
>


Either do what the other posters wrote, or if you really like scanf
try the following Python module:

Scanf --- a pure Python scanf-like module
http://hkn.eecs.berkeley.edu/~dyoo/python/scanf/
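If the goal is only to mirror scanf("%f %f", ...) without an extra module, the standard re module can do the matching. This is a hand-rolled stand-in, not the API of the scanf module linked above; FLOAT, PAIR and scan_pair are hypothetical names, and the float pattern only approximates what C's %f accepts:

```python
import re

# Approximation of %f: optional sign, digits with optional fraction, optional exponent.
FLOAT = r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?"
PAIR = re.compile(r"\s*(" + FLOAT + r")\s+(" + FLOAT + r")\s*$")


def scan_pair(line):
    """Return (x, y) as floats, or None if the line doesn't look like '%f %f'."""
    m = PAIR.match(line)
    if m is None:
        return None
    return float(m.group(1)), float(m.group(2))


print(scan_pair("-1.003 4.5"))
print(scan_pair("x y"))
```

Unlike C's scanf, which stops after the last conversion, this pattern rejects lines with trailing text; drop the final \s*$ to get scanf's more permissive behaviour.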

Bye,
Dennis
 
Stephen Thorne
01-27-2005
On Thu, 27 Jan 2005 00:02:45 -0700, Steven Bethard
<(E-Mail Removed)> wrote:
> Stephen Thorne wrote:
> > f = file('input', 'r')
> > labels = f.readline() # consume the first line of the file.
> >
> > Easy Option:
> > for line in f.readlines():
> >     x, y = line.split()
> >     x = float(x)
> >     y = float(y)
> >
> > Or, more concisely:
> > for line in f.readlines():
> >     x, y = map(float, line.split())

>
> Somewhat more memory efficient:
>
> lines_iter = iter(file('input'))
> labels = lines_iter.next()
> for line in lines_iter:
>     x, y = [float(f) for f in line.split()]
>
> By using the iterator instead of readlines, I read only one line from
> the file into memory at once, instead of all of them. This may or may
> not matter depending on the size of your files, but using iterators is
> generally more scalable, though of course it's not always possible.


I just did a teensy test. All three options used exactly the same
amount of total memory.

I wrote it all that way for clarity, considering the OP was on his
first day with Python. How I would actually write it:

inputfile = file('input','r')
inputfile.readline()
data = [map(float, line.split()) for line in inputfile]

Notice how you don't have to call iter() on it; you can treat it as an
iterable to begin with.
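One caveat for anyone trying this on Python 3: there map() returns a lazy iterator rather than a list, so the comprehension above would build a list of map objects instead of pairs. Wrapping each row in tuple() (or list()) restores the intended result. A sketch, assuming Python 3, with io.StringIO standing in for the 'input' file:

```python
import io

# io.StringIO stands in for open('input') so the sketch runs anywhere.
inputfile = io.StringIO("x y\n1 2\n3 4\n5 6\n")
inputfile.readline()  # skip the header line, as above

# In Python 3, map() is lazy, so tuple() is needed to get real pairs of floats.
data = [tuple(map(float, line.split())) for line in inputfile]
print(data)
```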

Stephen.
 
Steven Bethard
01-27-2005
Stephen Thorne wrote:
> I wrote it all that way for clarity, considering the OP was on his
> first day with Python. How I would actually write it:
>
> inputfile = file('input','r')
> inputfile.readline()
> data = [map(float, line.split()) for line in inputfile]
>
> Notice how you don't have to call iter() on it; you can treat it as an
> iterable to begin with.


Beware of mixing iterator methods and readline:

http://docs.python.org/lib/bltin-file-objects.html

next( )
...In order to make a for loop the most efficient way of looping
over the lines of a file (a very common operation), the next() method
uses a hidden read-ahead buffer. As a consequence of using a read-ahead
buffer, combining next() with other file methods (like readline()) does
not work right.

I haven't tested your code in particular, but this warning was enough to
make me generally avoid mixing iter methods and other methods.
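That read-ahead warning describes the Python 2 file object. In Python 3, text files returned by open() are io.TextIOWrapper objects, and in CPython's io module their iteration is built on readline() rather than on a separate hidden buffer, so interleaving next() and readline() returns consecutive lines. A small sketch, assuming Python 3 and using a temporary file, to check this on your own interpreter:

```python
import os
import tempfile

# Write the sample data from the question to a temporary file.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as out:
    out.write("x y\n1 2\n3 4\n5 6\n")

with open(path) as f:
    first = next(f)        # iterator protocol
    second = f.readline()  # ordinary method call
    third = next(f)        # back to the iterator
    rest = list(f)         # for-loop style iteration over the remainder

os.remove(path)
print(repr(first), repr(second), repr(third), rest)
```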

Steve
 
Alex Martelli
01-27-2005
Steven Bethard <(E-Mail Removed)> wrote:
...
> Beware of mixing iterator methods and readline:


_mixing_, yes. But -- starting the iteration after some other kind of
reading (readline, or read(N), etc) -- is OK...


> http://docs.python.org/lib/bltin-file-objects.html
>
> next( )
> ...In order to make a for loop the most efficient way of looping
> over the lines of a file (a very common operation), the next() method
> uses a hidden read-ahead buffer. As a consequence of using a read-ahead
> buffer, combining next() with other file methods (like readline()) does
> not work right.
>
> I haven't tested your code in particular, but this warning was enough to
> make me generally avoid mixing iter methods and other methods.


Yeah, I know... it's hard to explain exactly what IS a problem and what
isn't -- not to mention that this IS to some extent a matter of the file
object's implementation and the docs can't/don't want to constrain the
implementer's future freedom, should it turn out to matter. Sigh.

In the Nutshell (Python in a Nutshell, 2nd ed.), which is not normative and thus gives me a tad
more freedom, I have tried to be a tiny bit more specific, taking
advantage, also, of the fact that I'm now addressing the 2.3 and 2.4
implementations, only. Quoting from my current draft (pardon the XML
markup...):

"""
interrupting such a loop prematurely (e.g., with <c>break</c>), or
calling <r>f</r><c>.next()</c> instead of <r>f</r><c>.readline()</c>,
leaves the file's current position at an arbitrary value. If you want
to switch from using <r>f</r> as an iterator to calling other reading
methods on <r>f</r>, be sure to set the file's current position to a
known value by appropriately calling <r>f</r><c>.seek</c>.
"""

I hope this concisely indicates that the problem (in today's current
implementations) is only with switching FROM iteration TO other
approaches to reading, and (if the file is seekable) there's nothing so
problematic here that a good old 'seek' won't cure...
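The seek remedy still applies in Python 3. There the position after leaving a loop early is not arbitrary, but tell() is deliberately disabled on a text file while it is being iterated (it raises OSError), so you cannot simply record where you are. Seeking to a known position, here the start of the file, is the reliable way back to other reading methods. A sketch, assuming Python 3 and a temporary file:

```python
import os
import tempfile

fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as out:
    out.write("x y\n1 2\n3 4\n5 6\n")

with open(path) as f:
    for line in f:
        if line.startswith("3"):
            break  # interrupt the loop prematurely, as in the excerpt above

    try:
        f.tell()  # refused on a text file while iteration is in progress
        tell_ok = True
    except OSError:
        tell_ok = False

    f.seek(0)              # set the position to a known value
    header = f.readline()  # ordinary reading methods work again

os.remove(path)
print(tell_ok, repr(header))
```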


Alex
 
Steven Bethard
01-27-2005
Alex Martelli wrote:
> Steven Bethard <(E-Mail Removed)> wrote:
> ...
>
>>Beware of mixing iterator methods and readline:

>

[snip]
>
> I hope this concisely indicates that the problem (in today's current
> implementations) is only with switching FROM iteration TO other
> approaches to reading, and (if the file is seekable) there's nothing so
> problematic here that a good old 'seek' won't cure...


Thanks for the clarification!

Steve
 
Jeff Shannon
01-27-2005
Stephen Thorne wrote:

> On Thu, 27 Jan 2005 00:02:45 -0700, Steven Bethard
> <(E-Mail Removed)> wrote:
>
>>By using the iterator instead of readlines, I read only one line from
>>the file into memory at once, instead of all of them. This may or may
>>not matter depending on the size of your files, but using iterators is
>>generally more scalable, though of course it's not always possible.

>
> I just did a teensy test. All three options used exactly the same
> amount of total memory.


I would presume that, for a small file, the entire contents of the
file will be sucked into the read buffer implemented by the underlying
C file library. An iterator will only really reduce memory consumption
when the file size is greater than that buffer's size.

Actually, now that I think of it, there's probably another copy of the
data at Python level. For readlines(), that copy is the list object
itself. For iter and iter.next(), it's in the iterator's read-ahead
buffer. So perhaps memory savings will occur when *that* buffer size
is exceeded. It's also quite possible that both buffers are the same
size...

Anyhow, I'm sure that the fact that all three use the same amount of memory
in your test is a reflection of buffering. The next question is, which
provides the most *conceptual* simplicity? (The answer to that one, I
think, depends on how your brain happens to see things...)
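One way to see the buffer-size effect is to measure peak allocations for both styles on a file much larger than the I/O buffer. A sketch, assuming Python 3's tracemalloc module; the line count is arbitrary, absolute numbers will vary by interpreter and platform, and tracemalloc only sees memory allocated through Python's own allocators (Python 3's io module does its own buffering rather than using C stdio):

```python
import os
import tempfile
import tracemalloc


def peak_bytes(func, path):
    """Run func(path) and return the peak traced allocation in bytes."""
    tracemalloc.start()
    try:
        func(path)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def with_readlines(path):
    total = 0.0
    with open(path) as f:
        f.readline()
        for line in f.readlines():  # builds a list holding every line at once
            total += sum(map(float, line.split()))
    return total


def with_iteration(path):
    total = 0.0
    with open(path) as f:
        f.readline()
        for line in f:  # holds roughly one line (plus the buffer) at a time
            total += sum(map(float, line.split()))
    return total


fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as out:
    out.write("x y\n")
    for i in range(200_000):
        out.write(f"{i} {-i / 1000}\n")

readlines_peak = peak_bytes(with_readlines, path)
iteration_peak = peak_bytes(with_iteration, path)
os.remove(path)
print(readlines_peak, iteration_peak)
```

On a file of a few lines both peaks are dominated by buffering and come out similar, which is consistent with the "teensy test" result above.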

Jeff Shannon
Technician/Programmer
Credit International

 
enigma
01-27-2005
Do you really need to use the iter function here? As far as I can
tell, a file object is already an iterator. The file object
documentation says that, "[a] file object is its own iterator, for
example iter(f) returns f (unless f is closed)." It doesn't look like
it makes a difference one way or the other, I'm just curious.
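The documented behaviour is easy to confirm, and it still holds in Python 3 for objects returned by open(); iterating a closed file raises ValueError instead. A quick check, assuming Python 3 and a temporary file:

```python
import os
import tempfile

fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as out:
    out.write("x y\n1 2\n")

f = open(path)
same_object = iter(f) is f  # a file object is its own iterator
f.close()

try:
    iter(f)
    closed_error = None
except ValueError as exc:  # "I/O operation on closed file."
    closed_error = exc

os.remove(path)
print(same_object, closed_error)
```

So calling iter() on an open file is harmless but redundant; a for loop does the same thing implicitly.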

 

Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
Whatis the best file fomat | tgcans1a2007 | Digital Photography | 10 | 09-23-2007 08:02 AM
DVD fomat | brenda | Computer Support | 9 | 09-19-2007 09:47 PM
Re-fomat Hdd & xp Home | jb@home.co.uk | Computer Support | 6 | 01-09-2006 06:39 PM
Fomat C partitiion | Vikrushn@gmail.com | Computer Support | 2 | 01-01-2006 10:05 AM
New exam fomat question with 70-292 and 296 | L C | MCSE | 3 | 07-28-2004 05:20 PM


