Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Python > How to write fast into a file in python?

Reply
Thread Tools

How to write fast into a file in python?

 
 
Steven D'Aprano
Guest
Posts: n/a
 
      05-18-2013
On Fri, 17 May 2013 21:18:15 +0300, Carlos Nepomuceno wrote:

> I thought there would be a call to format method by "'%d\n' % i". It
> seems the % operator is a lot faster than format. I just stopped using
> it because I read it was going to be deprecated. Why replace such a
> great and fast operator by a slow method? I mean, why format is been
> preferred over %?


That is one of the most annoying, pernicious myths about Python, probably
second only to "the GIL makes Python slow" (it actually makes it fast).

String formatting with % is not deprecated. It will not be deprecated, at
least not until Python 4000.

The string format() method has a few advantages: it is more powerful,
consistent and flexible, but it is significantly slower.

Probably the biggest disadvantage to % formatting, and probably the main
reason why it is "discouraged", is that it treats tuples specially.
Consider if x is an arbitrary object, and you call "%s" % x:

py> "%s" % 23 # works
'23'
py> "%s" % [23, 42] # works
'[23, 42]'

and so on for *almost* any object. But if x is a tuple, strange things
happen:

py> "%s" % (23,) # tuple with one item looks like an int
'23'
py> "%s" % (23, 42) # tuple with two items fails
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: not all arguments converted during string formatting


So when dealing with arbitrary objects that you cannot predict what they
are, it is better to use format.



--
Steven
 
Reply With Quote
 
 
 
 
Chris Angelico
Guest
Posts: n/a
 
      05-18-2013
On Sat, May 18, 2013 at 2:01 PM, Steven D'Aprano
<(E-Mail Removed)> wrote:
> Consider if x is an arbitrary object, and you call "%s" % x:
>
> py> "%s" % 23 # works
> '23'
> py> "%s" % [23, 42] # works
> '[23, 42]'
>
> and so on for *almost* any object. But if x is a tuple, strange things
> happen


Which can be guarded against by wrapping it up in a tuple. All you're
seeing is that the shortcut notation for a single parameter can't
handle tuples.

>>> def show(x):

return "%s" % (x,)

>>> show(23)

'23'
>>> show((23,))

'(23,)'
>>> show([23,42])

'[23, 42]'

One of the biggest differences between %-formatting and str.format is
that one is an operator and the other a method. The operator is always
going to be faster, but the method can give more flexibility (not that
I've ever needed or wanted to override anything).

>>> def show_format(x):

return "{}".format(x) # Same thing using str.format
>>> dis.dis(show)

2 0 LOAD_CONST 1 ('%s')
3 LOAD_FAST 0 (x)
6 BUILD_TUPLE 1
9 BINARY_MODULO
10 RETURN_VALUE
>>> dis.dis(show_format)

2 0 LOAD_CONST 1 ('{}')
3 LOAD_ATTR 0 (format)
6 LOAD_FAST 0 (x)
9 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
12 RETURN_VALUE

Attribute lookup and function call versus binary operator. Potentially
a lot of flexibility, versus basically hard-coded functionality. But
has anyone ever actually made use of it?

str.format does have some cleaner features, like naming of parameters:

>>> "{foo} vs {bar}".format(foo=1,bar=2)

'1 vs 2'
>>> "%(foo)s vs %(bar)s"%{'foo':1,'bar':2}

'1 vs 2'

Extremely handy when you're working with hugely complex format
strings, and the syntax feels a bit clunky in % (also, it's not
portable to other languages, which is one of %-formatting's biggest
features). Not a huge deal, but if you're doing a lot with that, it
might be a deciding vote.

ChrisA
 
Reply With Quote
 
 
 
 
Fbio Santos
Guest
Posts: n/a
 
      05-18-2013
On 17 May 2013 19:38, "Carlos Nepomuceno" <(E-Mail Removed)>
wrote:
>
> Think the following update will make the code more portable:
>
> x += len(line)+len(os.linesep)-1
>
> Not sure if it's the fastest way to achieve that. :/
>


Putting len(os.linesep)'s value into a local variable will make accessing
it quite a bit faster. But why would you want to do that?

You mentioned "\n" translating to two lines, but this won't happen. Windows
will not mess with what you write to your file. It's just that
traditionally windows and windows programs use \r\n instead of just \n. I
think it was for compatibility with os/2 or macintosh (I don't remember
which), which used \r for newlines.

You don't have to follow this convention. If you open a \n-separated file
with *any* text editor other than notepad, your newlines will be okay.

 
Reply With Quote
 
88888 Dihedral
Guest
Posts: n/a
 
      05-18-2013
Steven D'Aprano於 2013年5月18日星期*UTC+8下午12時01分13 寫道:
> On Fri, 17 May 2013 21:18:15 +0300, Carlos Nepomuceno wrote:
>
>
>
> > I thought there would be a call to format method by "'%d\n' % i". It

>
> > seems the % operator is a lot faster than format. I just stopped using

>
> > it because I read it was going to be deprecated. Why replace such a

>
> > great and fast operator by a slow method? I mean, why format is been

>
> > preferred over %?

>
>
>
> That is one of the most annoying, pernicious myths about Python, probably
>
> second only to "the GIL makes Python slow" (it actually makes it fast).
>
>


The print function in python is designed to print
any printable object with a valid string representation.

The format part of the print function has to construct
a printable string according to the format string
and the variables passed in on the fly.

If the acctual string to be printed in the format processing
is obtained in the high level OOP way , then it is definitely
slow due to the high level overheads and generality requirements.





 
Reply With Quote
 
Chris Angelico
Guest
Posts: n/a
 
      05-18-2013
On Sat, May 18, 2013 at 5:49 PM, Fbio Santos <(E-Mail Removed)> wrote:
> Putting len(os.linesep)'s value into a local variable will make accessingit
> quite a bit faster. But why would you want to do that?
>
> You mentioned "\n" translating to two lines, but this won't happen. Windows
> will not mess with what you write to your file. It's just that traditionally
> windows and windows programs use \r\n instead of just \n. I think it was for
> compatibility with os/2 or macintosh (I don't remember which), which used\r
> for newlines.
>
> You don't have to follow this convention. If you open a \n-separated file
> with *any* text editor other than notepad, your newlines will be okay.



Into two characters, not two lines, but yes. A file opened in text
mode on Windows will have its lines terminated with two characters.
(And it's old Macs that used to use \r. OS/2 follows the DOS
convention of \r\n, but again, many apps these days are happy with
Unix newlines there too.)

ChrisA
 
Reply With Quote
 
Carlos Nepomuceno
Guest
Posts: n/a
 
      05-18-2013
Python really writes '\n\r' on Windows. Just check the files.

Internal representations only keep '\n' for simplicity, but if you wanna keep track of the file length you have to take that into account.

________________________________
> Date: Sat, 18 May 2013 08:49:55 +0100
> Subject: RE: How to write fast into a file in python?
> From: http://www.velocityreviews.com/forums/(E-Mail Removed)
> To: (E-Mail Removed)
> CC: (E-Mail Removed)
>
>
> On 17 May 2013 19:38, "Carlos Nepomuceno"
> <(E-Mail Removed)<mailto:carlosnepomuc (E-Mail Removed)>>
> wrote:
> >
> > Think the following update will make the code more portable:
> >
> > x += len(line)+len(os.linesep)-1
> >
> > Not sure if it's the fastest way to achieve that. :/
> >

>
> Putting len(os.linesep)'s value into a local variable will make
> accessing it quite a bit faster. But why would you want to do that?
>
> You mentioned "\n" translating to two lines, but this won't happen.
> Windows will not mess with what you write to your file. It's just that
> traditionally windows and windows programs use \r\n instead of just \n.
> I think it was for compatibility with os/2 or macintosh (I don't
> remember which), which used \r for newlines.
>
> You don't have to follow this convention. If you open a \n-separated
> file with *any* text editor other than notepad, your newlines will be
> okay.
 
Reply With Quote
 
Dennis Lee Bieber
Guest
Posts: n/a
 
      05-18-2013
tOn Sat, 18 May 2013 08:49:55 +0100, Fbio Santos
<(E-Mail Removed)> declaimed the following in
gmane.comp.python.general:


> You mentioned "\n" translating to two lines, but this won't happen. Windows
> will not mess with what you write to your file. It's just that
> traditionally windows and windows programs use \r\n instead of just \n. I
> think it was for compatibility with os/2 or macintosh (I don't remember
> which), which used \r for newlines.
>

Neither... It goes back to Teletype machines where one sent a
carriage return to move the printhead back to the left, then sent a line
feed to advance the paper (while the head was still moving left), and in
some cases also provided a rub-out character (a do-nothing) to add an
additional character time delay.

TRS-80 Mod 1-4 used <cr> for "new line", I believe Apple used <lf>
for "new line"... And both lost the ability to move down the page
without also resetting the carriage to the left. In a world where both
<cr><lf> is used, one could draw a vertical line of | by just spacing
across the first line, printing |, then repeat <lf><bkspc>| until done.
To do the same with conventional <lf> is "new line/return" one has to
transmit all those spaces for each line...

At 300baud, that took time....


--
Wulfraed Dennis Lee Bieber AF6VN
(E-Mail Removed) HTTP://wlfraed.home.netcom.com/

 
Reply With Quote
 
Roy Smith
Guest
Posts: n/a
 
      05-18-2013
In article <(E-Mail Removed)>,
Dennis Lee Bieber <(E-Mail Removed)> wrote:

> tOn Sat, 18 May 2013 08:49:55 +0100, Fbio Santos
> <(E-Mail Removed)> declaimed the following in
> gmane.comp.python.general:
>
>
> > You mentioned "\n" translating to two lines, but this won't happen. Windows
> > will not mess with what you write to your file. It's just that
> > traditionally windows and windows programs use \r\n instead of just \n. I
> > think it was for compatibility with os/2 or macintosh (I don't remember
> > which), which used \r for newlines.
> >

> Neither... It goes back to Teletype machines where one sent a
> carriage return to move the printhead back to the left, then sent a line
> feed to advance the paper (while the head was still moving left), and in
> some cases also provided a rub-out character (a do-nothing) to add an
> additional character time delay.


The delay was important. It took more than one character time for the
print head to get back to the left margin. If you kept sending
printable characters while the print head was still flying back, they
would get printed in the middle of the line (perhaps blurred a little).

There was also a dashpot which cushioned the head assembly when it
reached the left margin. Depending on how well adjusted things were
this might take another character time or two to fully settle down.

You can still see the remnants of this in modern Unix systems:

$ stty -a
speed 9600 baud; rows 40; columns 136; line = 0;
intr = ^C; quit = ^\; erase = ^?; kill = ^U; eof = ^D; eol = M-^?; eol2
= M-^?; swtch = <undef>; start = ^Q; stop = ^S; susp = ^Z;
rprnt = ^R; werase = ^W; lnext = ^V; flush = ^O; min = 1; time = 0;
-parenb -parodd cs8 -hupcl -cstopb cread -clocal -crtscts
-ignbrk -brkint -ignpar -parmrk -inpck -istrip -inlcr -igncr icrnl ixon
-ixoff -iuclc ixany imaxbel -iutf8
opost -olcuc -ocrnl onlcr -onocr -onlret -ofill -ofdel nl0 cr0 tab0 bs0
vt0 ff0
isig icanon iexten echo echoe -echok -echonl -noflsh -xcase -tostop
-echoprt echoctl echoke

The "nl0" and "cr0" mean it's configured to insert 0 delay after
newlines and carriage returns. Whether setting a non-zero delay
actually does anything useful anymore is an open question, but the
drivers still accept the settings.
 
Reply With Quote
 
Dan Stromberg
Guest
Posts: n/a
 
      05-18-2013
With CPython 2.7.3:
../t
time taken to write a file of size 52428800 is 15.86 seconds

time taken to write a file of size 52428800 is 7.91 seconds

time taken to write a file of size 52428800 is 9.64 seconds


With pypy-1.9:
../t
time taken to write a file of size 52428800 is 3.708232 seconds

time taken to write a file of size 52428800 is 4.868304 seconds

time taken to write a file of size 52428800 is 1.93612 seconds


Here's the code:
#!/usr/local/pypy-1.9/bin/pypy
#!/usr/bin/python

import sys
import time
import StringIO

sys.path.insert(0, '/usr/local/lib')
import bufsock

def create_file_numbers_old(filename, size):
start = time.clock()

value = 0
with open(filename, "w") as f:
while f.tell() < size:
f.write("{0}\n".format(value))
value += 1

end = time.clock()

print "time taken to write a file of size", size, " is ", (end -start),
"seconds \n"

def create_file_numbers_bufsock(filename, intended_size):
start = time.clock()

value = 0
with open(filename, "w") as f:
bs = bufsock.bufsock(f)
actual_size = 0
while actual_size < intended_size:
string = "{0}\n".format(value)
actual_size += len(string) + 1
bs.write(string)
value += 1
bs.flush()

end = time.clock()

print "time taken to write a file of size", intended_size, " is ", (end
-start), "seconds \n"


def create_file_numbers_file_like(filename, intended_size):
start = time.clock()

value = 0
with open(filename, "w") as f:
file_like = StringIO.StringIO()
actual_size = 0
while actual_size < intended_size:
string = "{0}\n".format(value)
actual_size += len(string) + 1
file_like.write(string)
value += 1
file_like.seek(0)
f.write(file_like.read())

end = time.clock()

print "time taken to write a file of size", intended_size, " is ", (end
-start), "seconds \n"

create_file_numbers_old('output.txt', 50 * 2**20)
create_file_numbers_bufsock('output2.txt', 50 * 2**20)
create_file_numbers_file_like('output3.txt', 50 * 2**20)




On Thu, May 16, 2013 at 9:35 PM, <(E-Mail Removed)> wrote:

> On Friday, May 17, 2013 8:50:26 AM UTC+5:30, (E-Mail Removed) wrote:
> > I need to write numbers into a file upto 50mb and it should be fast
> >
> > can any one help me how to do that?
> >
> > i had written the following code..
> >
> >

> -----------------------------------------------------------------------------------------------------------
> >
> > def create_file_numbers_old(filename, size):
> >
> > start = time.clock()
> >
> >
> >
> > value = 0
> >
> > with open(filename, "w") as f:
> >
> > while f.tell()< size:
> >
> > f.write("{0}\n".format(value))
> >
> > value += 1
> >
> >
> >
> > end = time.clock()
> >
> >
> >
> > print "time taken to write a file of size", size, " is ", (end -start),

> "seconds \n"
> >
> >

> ------------------------------------------------------------------------------------------------------------------
> >
> > it takes about 20sec i need 5 to 10 times less than that.

> size = 50mb
> --
> http://mail.python.org/mailman/listinfo/python-list
>


 
Reply With Quote
 
Fbio Santos
Guest
Posts: n/a
 
      05-18-2013
On 18 May 2013 20:19, "Dennis Lee Bieber" <(E-Mail Removed)> wrote:
>
> tOn Sat, 18 May 2013 08:49:55 +0100, Fbio Santos
> <(E-Mail Removed)> declaimed the following in
> gmane.comp.python.general:
>
>
> > You mentioned "\n" translating to two lines, but this won't happen. Windows
> > will not mess with what you write to your file. It's just that
> > traditionally windows and windows programs use \r\n instead of just \n.I
> > think it was for compatibility with os/2 or macintosh (I don't remember
> > which), which used \r for newlines.
> >

> Neither... It goes back to Teletype machines where one sent a
> carriage return to move the printhead back to the left, then sent a line
> feed to advance the paper (while the head was still moving left), and in
> some cases also provided a rub-out character (a do-nothing) to add an
> additional character time delay.
>
> TRS-80 Mod 1-4 used <cr> for "new line", I believe Apple used <lf>
> for "new line"... And both lost the ability to move down the page
> without also resetting the carriage to the left. In a world where both
> <cr><lf> is used, one could draw a vertical line of | by just spacing
> across the first line, printing |, then repeat <lf><bkspc>| until done.
> To do the same with conventional <lf> is "new line/return" one has to
> transmit all those spaces for each line...
>
> At 300baud, that took time....




On Sat, May 18, 2013 at 6:00 PM, Carlos Nepomuceno
<(E-Mail Removed)> wrote:
> Python really writes '\n\r' on Windows. Just check the files.
>
> Internal representations only keep '\n' for simplicity, but if you wanna keep track of the file length you have to take that into account.



On Sat, May 18, 2013 at 3:29 PM, Chris Angelico <(E-Mail Removed)> wrote:
> Into two characters, not two lines, but yes. A file opened in text
> mode on Windows will have its lines terminated with two characters.
> (And it's old Macs that used to use \r. OS/2 follows the DOS
> convention of \r\n, but again, many apps these days are happy with
> Unix newlines there too.)
>
> ChrisA


Thanks for your corrections and explanations. I stand corrected and
have learned something.
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
How to convert a JPG picture into a vector drawing forexperimentation Danny D Digital Photography 107 05-21-2013 03:13 PM
How to convert image into Hex 2 Dimensional Array? sout saret Java 35 05-15-2013 10:50 AM
How to break a bash command into an array consisting of the argumentsin the command? Peng Yu Perl Misc 3 05-13-2013 10:27 AM
How do I encode and decode this data to write to a file? cl@isbd.net Python 11 05-01-2013 11:36 PM
How to get JSON values and how to trace sessions?? webmaster@terradon.nl Python 2 04-25-2013 02:12 PM



Advertisments