Velocity Reviews - Computer Hardware Reviews


Question about file objects...

 
 
J
      12-02-2009
Something that came up in class...

when you are pulling data from a file using f.next(), the file is read
one line at a time.

What was explained to us is that Python iterates the file based on a
carriage return as the delimiter.
But what if you have a file that has one line of text, but that one
line has 16,000 items that are comma delimited?

Is there a way to read the file, one item at a time, delimited by
commas WITHOUT having to read all 16,000 items from that one line,
then split them out into a list or dictionary??

Cheers
Jeff

--

Ogden Nash - "The trouble with a kitten is that when it grows up,
it's always a cat." -
http://www.brainyquote.com/quotes/au...gden_nash.html
 
 
nn
      12-02-2009
On Dec 2, 9:14 am, J <dreadpiratej...@gmail.com> wrote:
> Something that came up in class...
>
> when you are pulling data from a file using f.next(), the file is read
> one line at a time.
>
> What was explained to us is that Python iterates the file based on a
> carriage return as the delimiter.
> But what if you have a file that has one line of text, but that one
> line has 16,000 items that are comma delimited?
>
> Is there a way to read the file, one item at a time, delimited by
> commas WITHOUT having to read all 16,000 items from that one line,
> then split them out into a list or dictionary??
>
> Cheers
> Jeff


File iteration is a convenience since it is the most common case. If
everything is on one line, you will have to handle record separators
manually by using the .read(<number_of_bytes>) method on the file
object and searching for the comma. If everything fits in memory the
straightforward way would be to read the whole file with .read() and
use .split(",") on the returned string. That should give you a nice
list of everything.
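A minimal sketch of the in-memory route described above, assuming the whole file fits in memory and may end with a trailing newline. The function name `split_whole_file` is hypothetical, and `io.StringIO` stands in for a real file:

```python
import io


def split_whole_file(file_obj, sep=","):
    """Read the entire file into one string and split it on sep.

    Simple, but the whole file (and the resulting list) must fit in memory.
    """
    text = file_obj.read()
    # A one-line file usually ends with "\n"; drop it so the last item is clean.
    return text.rstrip("\n").split(sep)


# Usage with an in-memory stand-in for a real file:
print(split_whole_file(io.StringIO("10,20,30\n")))  # ['10', '20', '30']
```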
 
 
J
      12-02-2009
On Wed, Dec 2, 2009 at 09:27, nn <> wrote:
>> Is there a way to read the file, one item at a time, delimited by
>> commas WITHOUT having to read all 16,000 items from that one line,
>> then split them out into a list or dictionary??


> File iteration is a convenience since it is the most common case. If
> everything is on one line, you will have to handle record separators
> manually by using the .read(<number_of_bytes>) method on the file
> object and searching for the comma. If everything fits in memory the
> straightforward way would be to read the whole file with .read() and
> use .split(",") on the returned string. That should give you a nice
> list of everything.


Agreed. The confusion came because the guy teaching said that
iterating the file is delimited by a carriage return character...
which to me sounds like it's an arbitrary thing that can be changed...

I was already thinking that I'd have to read it in small chunks and
search for the delimiter I want... and reading the whole file into a
string and then splitting that would be nice, until the file is
so large that it starts taking up significant amounts of memory.

Anyway, thanks both of you for the explanations... I appreciate the help!

Cheers
Jeff



--

Charles de Gaulle - "The better I get to know men, the more I find
myself loving dogs." -
http://www.brainyquote.com/quotes/au...de_gaulle.html
 
Terry Reedy
      12-02-2009
J wrote:
> On Wed, Dec 2, 2009 at 09:27, nn <> wrote:
>>> Is there a way to read the file, one item at a time, delimited by
>>> commas WITHOUT having to read all 16,000 items from that one line,
>>> then split them out into a list or dictionary??

>
>> File iteration is a convenience since it is the most common case. If
>> everything is on one line, you will have to handle record separators
>> manually by using the .read(<number_of_bytes>) method on the file
>> object and searching for the comma. If everything fits in memory the
>> straightforward way would be to read the whole file with .read() and
>> use .split(",") on the returned string. That should give you a nice
>> list of everything.

>
> Agreed. The confusion came because the guy teaching said that
> iterating the file is delimited by a carriage return character...


If he said exactly that, he is not exactly correct. File iteration looks
for line ending character(s), which depends on the system or universal
newline setting.
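For illustration, a small sketch in Python 3, where the built-in `open()` takes a `newline` argument that controls which line endings iteration recognizes. It writes made-up sample bytes mixing `\r`, `\r\n` and `\n` endings to a temporary file and shows the lines produced under three settings; the helper `lines_with` is hypothetical. Note that `newline` only accepts None, '', '\n', '\r' or '\r\n', so it cannot be pointed at an arbitrary delimiter such as a comma:

```python
import os
import tempfile


def lines_with(newline):
    """Return the lines of a small sample file opened with the given newline mode."""
    # Raw bytes mixing old-Mac (\r), Windows (\r\n) and Unix (\n) endings.
    data = b"a\rb\r\nc\n"
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        # Iterating a text file yields one line at a time.
        with open(path, newline=newline) as f:
            return list(f)
    finally:
        os.remove(path)


print(lines_with(None))  # ['a\n', 'b\n', 'c\n']      universal newlines, translated
print(lines_with(""))    # ['a\r', 'b\r\n', 'c\n']    universal newlines, untranslated
print(lines_with("\n"))  # ['a\rb\r\n', 'c\n']        only \n ends a line
```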

> which to me sounds like it's an arbitrary thing that can be changed...
>
> I was already thinking that I'd have to read it in small chunks and
> search for the delimiter I want... and reading the whole file into a
> string and then splitting that would be nice, until the file is
> so large that it starts taking up significant amounts of memory.
>
> Anyway, thanks both of you for the explanations... I appreciate the help!


I would not be surprised if a generic file chunk generator were posted
somewhere. It would be a good entry for the Python Cookbook, if not
there already.
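One common idiom for such a generic chunk reader (a sketch, not necessarily what the Cookbook contains) uses the two-argument form of the built-in `iter()`, which keeps calling a function until it returns a sentinel value. The name `read_chunks` and its default `size` are hypothetical:

```python
import io
from functools import partial


def read_chunks(file_obj, size=8192):
    """Return an iterator over successive blocks of at most `size` characters or bytes."""
    # read(0) returns '' for text files and b'' for binary files; that empty
    # value is also what read(size) returns at EOF, so use it as the sentinel.
    sentinel = file_obj.read(0)
    # iter(callable, sentinel) calls file_obj.read(size) until it returns the sentinel.
    return iter(partial(file_obj.read, size), sentinel)


for block in read_chunks(io.StringIO("abcdefg"), size=3):
    print(block)  # abc, then def, then g
```

Splitting records out of these blocks still requires carrying any partial item over to the next block.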

tjr

 
 
r0g
      12-03-2009
J wrote:
> Something that came up in class...
>
> when you are pulling data from a file using f.next(), the file is read
> one line at a time.
>
> What was explained to us is that Python iterates the file based on a
> carriage return as the delimiter.
> But what if you have a file that has one line of text, but that one
> line has 16,000 items that are comma delimited?
>
> Is there a way to read the file, one item at a time, delimited by
> commas WITHOUT having to read all 16,000 items from that one line,
> then split them out into a list or dictionary??
>
> Cheers
> Jeff
>



Generators are a good way of dealing with that sort of thing...

http://dalkescientific.com/writings/NBN/generators.html

Have the generator read in large chunks from the file in binary mode, then
use string searching/splitting to dole out records one at a time,
topping up the cache when needed.
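A minimal sketch of that approach, assuming a file opened in binary mode and a bytes separator. The name `records` and the 64 KiB default chunk size are arbitrary choices for illustration; it uses `bytes.find` to locate each separator and keeps the unfinished tail in a buffer until the next read tops it up:

```python
import io


def records(file_obj, sep=b",", chunk_size=65536):
    """Yield sep-delimited records from a binary file, one at a time."""
    buf = b""
    while True:
        chunk = file_obj.read(chunk_size)
        if not chunk:
            break
        buf += chunk
        start = 0
        # Hand out every complete record currently in the buffer.
        while True:
            end = buf.find(sep, start)
            if end == -1:
                break
            yield buf[start:end]
            start = end + len(sep)
        # Keep only the unfinished tail for the next round.
        buf = buf[start:]
    # Whatever is left after EOF is the final record.
    yield buf


sample = io.BytesIO(b"1,2,3")
for item in records(sample, chunk_size=2):
    print(item)  # b'1', then b'2', then b'3'
```

With a real file this would be `records(open(path, "rb"))`; each item is bytes, so call `.decode()` on it if text is needed.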

Roger.
 
nn
      12-03-2009
On Dec 2, 6:56 pm, Terry Reedy <tjre...@udel.edu> wrote:
> J wrote:
> > On Wed, Dec 2, 2009 at 09:27, nn <prueba...@latinmail.com> wrote:
> >>> Is there a way to read the file, one item at a time, delimited by
> >>> commas WITHOUT having to read all 16,000 items from that one line,
> >>> then split them out into a list or dictionary??

>
> >> File iteration is a convenience since it is the most common case. If
> >> everything is on one line, you will have to handle record separators
> >> manually by using the .read(<number_of_bytes>) method on the file
> >> object and searching for the comma. If everything fits in memory the
> >> straightforward way would be to read the whole file with .read() and
> >> use .split(",") on the returned string. That should give you a nice
> >> list of everything.

>
> > Agreed. The confusion came because the guy teaching said that
> > iterating the file is delimited by a carriage return character...

>
> If he said exactly that, he is not exactly correct. File iteration looks
> for line ending character(s), which depends on the system or universal
> newline setting.
>
> > which to me sounds like it's an arbitrary thing that can be changed...

>
> > I was already thinking that I'd have to read it in small chunks and
> > search for the delimiter I want... and reading the whole file into a
> > string and then splitting that would be nice, until the file is
> > so large that it starts taking up significant amounts of memory.

>
> > Anyway, thanks both of you for the explanations... I appreciate the help!

>
> I would not be surprised if a generic file chunk generator were posted
> somewhere. It would be a good entry for the Python Cookbook, if not
> there already.
>
> tjr


There should be, but writing one isn't too difficult:

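def chunker(file_obj):
    # parts[-1] carries an incomplete item over to the next read
    parts = ['']
    while True:
        fdata = file_obj.read(8192)
        if not fdata:
            break
        # prepend the leftover fragment, then split on commas
        parts = (parts[-1] + fdata).split(',')
        # every piece except the last is a complete item
        for col in parts[:-1]:
            yield col
    # after EOF the leftover fragment is the final item
    yield parts[-1]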

 