Velocity Reviews > Simple Python Speed Benchmark

Simple Python Speed Benchmark

engsolnom@ipns.com
Guest
Posts: n/a

 01-10-2004
Knowing absolutely zero about benchmarks, I was interested in the posting referencing the paper by
Dr. Cowell-Shah. I was curious how long Python takes just to read and display time.clock(), soI made
the simple test below.

I was amazed at the difference in speed depending on when/how the clock times were converted to
milliseconds....almost 4 to 1...as indicated in the RESULTS below. I understand the variation in the
times is probably due to other running processes.

Any explanation?

The machine used is: P3, 1gHz, 256 meg of ram, Windows 2000, Python 2.3

Also, I did duplicate the intArithmetic (intMaz), using 1,000,000,000, as per the paper, and it
*did* take a long time. I tried a variety of ways to implement the process (while, range, xrange,
etc), and made some improvement, but not much. And of course I did see the "out of memory" message.

Norm

# THE CODE

import time

module_time_start = time.clock()

start_time_1 = time.clock()
stop_time_1 = time.clock()
print 'Test 1: ', (stop_time_1 - start_time_1) * 1000

start_time_2 = time.clock() * 1000
stop_time_2 = time.clock() * 1000
print 'Test 2: ', stop_time_2 - start_time_2

def get_time_1():

start_time_3 = time.clock()
stop_time_3 = time.clock()
duration = (stop_time_3 - start_time_3) * 1000
print 'Test 3: ', duration

def get_time_2():
start_time_4 = time.clock() * 1000
stop_time_4 = time.clock() * 1000
print 'Test 4: ', stop_time_4 - start_time_4

print '\nOK, now loop the methods...\n'

for ix in range(10):
print
get_time_1()
get_time_2()

module_time_stop = time.clock()

print '\nTotal module time: \n', (module_time_stop - module_time_start) * 1000

# THE RESULTS:

Test 1: 0.00363174649292
Test 2: 0.0167619068904

OK, now loop the methods...

Test 3: 0.00810158833036
Test 4: 0.0145269859717

Test 3: 0.006984127871
Test 4: 0.0145269859717

Test 3: 0.00586666741164
Test 4: 0.0145269859717

Test 3: 0.00614603252648
Test 4: 0.0145269859717

Test 3: 0.00614603252648
Test 4: 0.013968255742

Test 3: 0.00586666741164
Test 4: 0.0128507952826

Test 3: 0.00586666741164
Test 4: 0.0136888906272

Test 3: 0.0055873022968
Test 4: 0.0136888906272

Test 3: 0.00642539764132
Test 4: 0.0145269859717

Test 3: 0.0108952394788
Test 4: 0.0145269859717

Total module time:
2.89142893859

Andrew Dalke
Guest
Posts: n/a

 01-12-2004
http://www.velocityreviews.com/forums/(E-Mail Removed):
> I was curious how long Python takes just to read and display time.clock()

...
> I was amazed at the difference in speed depending on when/how the clock

times were converted to
> milliseconds....almost 4 to 1...as indicated in the RESULTS below. I

understand the variation in the
> times is probably due to other running processes.
>
> Any explanation?

There are quite a few things here.

First, if you look at the "Test 3:" and "Test 4:" numbers you'll see it's
only
2 times slower, not 4. Let's start with that case, where the tests you make
are in a function. You can see the PVM op codes generated by using
the dis module

>>> def spam():

.... t1 = time.clock()
....
>>> import dis
>>> dis.dis(spam)

2 0 LOAD_GLOBAL 0 (time)
3 LOAD_ATTR 1 (clock)
6 CALL_FUNCTION 0
9 STORE_FAST 0 (t1)
12 LOAD_CONST 0 (None)
15 RETURN_VALUE
>>> def spam2():

.... t1 = time.clock() * 1000
....
>>> dis.dis(spam2)

2 0 LOAD_GLOBAL 0 (time)
3 LOAD_ATTR 1 (clock)
6 CALL_FUNCTION 0
9 LOAD_CONST 1 (1000)
12 BINARY_MULTIPLY
13 STORE_FAST 0 (t1)
16 LOAD_CONST 0 (None)
19 RETURN_VALUE
>>>

This shows that the "* 1000" is doing two more things; load the
constant "1000" and then do a binary multiply. The obvious
conclusion is that the factor of 2 slowdown comes from this step.
(That actually surprises me. Function call overhead should be
much more than a simple multiply.)

The next is the precision of the clock. The numbers generated
are pretty well quantized. For example, '0.014527' comes up
quite frequently as does '0.005867'. It appears that your clock
has a resolution of about 0.0003, which is 5% of the smaller
numbers or 2% of the bigger ones. If the time slices are just
at the wrong point then you may get a systematic error of up
to about 6% (ie, if the time.clock is done in just under an
integral time quantum while the time.clock()*1000 is just over
a quantum).

To get around that, you should do more tests, and not have
the results of one test strongly correlated with the other.
(You do because get_time_1() is always followed with a
get_time_2().) The easiest way is to use the timeit module,
either on the command-line or interactively, as

>>> import timeit
>>> timeit.Timer("time.clock", "import time").repeat(3, 10000)

[0.0063276180834463958, 0.0066243037524600368, 0.0078663607060889262]
>>> timeit.Timer("time.clock()", "import time").repeat(3, 10000)

[0.054394886949353349, 0.053860182268920198, 0.05341180138486834]
>>> timeit.Timer("time.clock()*1000", "import time").repeat(3, 10000)

[0.057691115018485561, 0.059368981429486212, 0.058075800674146194]
>>>

As you can see, on my box it takes 0.06 microseconds to do
'time.clock', 0.48 microseconds to make the call, and 0.03
microseconds to do the *1000.

Try those on your machine.

Andrew
(E-Mail Removed)