Velocity Reviews - Computer Hardware Reviews

best(fastest) way to send and get lists from files

 
 
Abrahams, Max
      01-31-2008

I've looked into pickle, dump, load, save, readlines(), etc.

Which is the best method? Fastest? My lists tend to be around a thousand to a million items.

Binary and text files are both okay, text would be preferred in general unless there's a significant speed boost from something binary.

thanks
 
Yu-Xi Lim
      01-31-2008
Abrahams, Max wrote:
> I've looked into pickle, dump, load, save, readlines(), etc.
>
> Which is the best method? Fastest? My lists tend to be around a thousand to a million items.
>
> Binary and text files are both okay, text would be preferred in general unless there's a significant speed boost from something binary.
>
> thanks


1) Why don't you time them with the timeit module?
http://docs.python.org/lib/module-timeit.html

Results will vary with your specific data and your hardware speed, but
if it's a lot of data, the hardware is most likely going to be the
bottleneck. A compact binary format will help alleviate this by
reducing the amount of data read and written.

If you're reading a lot of data into memory, you might have to deal with
your OS swap/virtual memory.
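
As a rough illustration of the timeit suggestion, here is a minimal Python 3 sketch that times a text format against pickle for a list of integers. The helper names, the one-integer-per-line text layout and the list size are assumptions made for the example, not anything from the original question, so treat the numbers it prints as a starting point for your own data.

```python
import os
import pickle
import tempfile
import timeit

def write_text(items, path):
    # One integer per line; human-readable.
    with open(path, "w") as f:
        f.write("\n".join(map(str, items)))

def read_text(path):
    with open(path) as f:
        return [int(line) for line in f]

def write_pickle(items, path):
    with open(path, "wb") as f:
        pickle.dump(items, f, protocol=pickle.HIGHEST_PROTOCOL)

def read_pickle(path):
    with open(path, "rb") as f:
        return pickle.load(f)

def time_round_trip(writer, reader, items, path, number=5):
    """Return (write_seconds, read_seconds), each averaged over `number` runs."""
    w = timeit.timeit(lambda: writer(items, path), number=number) / number
    r = timeit.timeit(lambda: reader(path), number=number) / number
    return w, r

if __name__ == "__main__":
    data = list(range(100000))  # assumed test data: plain integers
    with tempfile.TemporaryDirectory() as tmp:
        for name, writer, reader in [
            ("text", write_text, read_text),
            ("pickle", write_pickle, read_pickle),
        ]:
            path = os.path.join(tmp, name + ".dat")
            w, r = time_round_trip(writer, reader, data, path)
            print("%-6s write %.4fs  read %.4fs  size %d bytes"
                  % (name, w, r, os.path.getsize(path)))
```

Swap in a sample of your real data before drawing conclusions; lists of strings or nested objects can rank the formats differently from lists of small integers.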

2) "Best" depends on what your data is and what you're doing with it.

Are you reinventing a flat-file database? There are better solutions for
databases.

If you're just reformatting data to pass to another program, say, for
scientific computation, portability may be more of an issue. Number
crunching the resulting data may take so much longer that the time
spent writing/reading it becomes insignificant.
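
If the data does turn out to be purely numeric, one compact layout that other programs can read easily is a flat run of fixed-width values. The following is only a sketch using the standard array module; it assumes every item is an integer that fits in a signed 64-bit value (the thread does not say what the lists contain), and the helper names are made up for illustration.

```python
import os
import tempfile
from array import array

def save_ints(items, path):
    # "q" = signed 64-bit integers, written in the machine's native byte order.
    with open(path, "wb") as f:
        array("q", items).tofile(f)

def load_ints(path):
    arr = array("q")
    with open(path, "rb") as f:
        # The file holds nothing but items, so its size gives the item count.
        count = os.path.getsize(path) // arr.itemsize
        arr.fromfile(f, count)
    return arr.tolist()

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "ints.bin")
        save_ints(range(1000), path)
        print("file size:", os.path.getsize(path), "bytes")
        assert load_ints(path) == list(range(1000))
```

The file carries no header and uses native byte order, so the reading program has to know the item type, and a machine with the opposite byte order would need to swap bytes (array.byteswap()) after loading.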
 
Paddy
      02-01-2008
On Jan 31, 7:34 pm, "Abrahams, Max" wrote:
> I've looked into pickle, dump, load, save, readlines(), etc

I've used the following sometimes:

from pprint import pprint as pp
print "data = \\"
pp(data)

That creates a Python file that can be read back as a module, but it
only works for data whose __repr__ is valid Python source.

- Paddy.
P.S. I never timed it - it was fast enough, and the data was readable.
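
For reference, a Python 3 variation on the same idea might look like the sketch below. It is not Paddy's exact method: instead of importing the generated file as a module, it writes the pprint output to a file and reads it back with ast.literal_eval, which only accepts literals. The function names are hypothetical.

```python
import ast
import os
import tempfile
from pprint import pformat

def save_literal(data, path):
    # Write the data as a readable Python literal, e.g. "[1, 2, 3]".
    with open(path, "w") as f:
        f.write(pformat(data))
        f.write("\n")

def load_literal(path):
    # literal_eval only accepts literals (numbers, strings, lists, tuples,
    # dicts, ...), so unlike importing a module it does not run arbitrary code.
    with open(path) as f:
        return ast.literal_eval(f.read())

if __name__ == "__main__":
    data = [1, "two", 3.0, {"k": [4, 5]}, (6, 7)]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.txt")
        save_literal(data, path)
        assert load_literal(path) == data
```

The same limitation applies as with the module approach: objects whose repr is not a plain literal (custom class instances, for example) will not round-trip.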

 
Nick Craig-Wood
      02-05-2008
Abrahams, Max wrote:
>
> I've looked into pickle, dump, load, save, readlines(), etc.
>
> Which is the best method? Fastest? My lists tend to be around a thousand to a million items.
>
> Binary and text files are both okay, text would be preferred in
> general unless there's a significant speed boost from something
> binary.


You could try the marshal module, which is very fast, lightweight and
built in.

http://www.python.org/doc/current/li...e-marshal.html

It makes a binary format though, and it will only dump "simple"
objects - see the page above. I believe it is what Python uses
internally to make .pyc files from .py files.

------------------------------------------------------------
#!/usr/bin/python
# Python 2 script (print statement; range() returns a list).

import os
from marshal import dump, load
from timeit import Timer

def write(N, file_name = "z.marshal"):
    L = range(N)
    out = open(file_name, "wb")
    dump(L, out)
    out.close()
    print "Written %d bytes for list size %d" % (os.path.getsize(file_name), N)

def read(N):
    inp = open("z.marshal", "rb")
    L = load(inp)
    inp.close()
    assert len(L) == N

for log_N in range(7):
    N = 10**log_N
    loops = 10
    write(N)
    print "Read back %d items in" % N, Timer("read(%d)" % N, "from __main__ import read").repeat(1, loops)[0]/loops, "s"
------------------------------------------------------------

Produces

$ ./test-marshal.py
Written 10 bytes for list size 1
Read back 1 items in 4.14133071899e-05 s
Written 55 bytes for list size 10
Read back 10 items in 4.31060791016e-05 s
Written 505 bytes for list size 100
Read back 100 items in 8.23020935059e-05 s
Written 5005 bytes for list size 1000
Read back 1000 items in 0.000352478027344 s
Written 50005 bytes for list size 10000
Read back 10000 items in 0.00165479183197 s
Written 500005 bytes for list size 100000
Read back 100000 items in 0.0175776958466 s
Written 5000005 bytes for list size 1000000
Read back 1000000 items in 0.175704598427 s

--
Nick Craig-Wood -- http://www.craig-wood.com/nick
 