
Velocity Reviews > Newsgroups > Programming > Python > performance degradation when looping through lists


performance degradation when looping through lists

 
 
Joachim Worringen
 
      04-07-2006
I need to process large lists (in my real application, this is to parse the
content of a file). I noticed that the performance of accessing individual list
elements degrades over runtime.

This can be reproduced easily using this code:

import time

N = 100000
p = 10000

A = []
for i in range(N):
    A.append(str(i))

j = 0
t = time.clock()
for i in range(len(A)):
    j += int(A[i])
    if i % p == 0:
        t = time.clock() - t
        print t

(the string conversion only serves to increase the duration of each iteration;
you can observe the same effect with ints, too).

When running this, I get output like this:
0.0
0.37
0.03
0.4
0.06
0.43
0.09
0.46
0.13
0.49

I use Python 2.3.4 (#1, Sep 3 2004, 12:08:45)
[GCC 2.96 20000731 (Red Hat Linux 7.3 2.96-110)] on linux2

I wonder why:
1. the execution times alternate between "fast" and "slow" (I observe the same
effect in my much more complex application);
2. the execution times increase steadily, from 0.0 to 0.13 for the "fast"
phases, and from 0.37 to 0.49 for the "slow" phases.

Within my application, the effect is more drastic, as the following numbers
show (the processing time per line is roughly the same for each line!):
at line 10000 (32258 lines/s)
at line 20000 (478 lines/s)
at line 30000 (21276 lines/s)
at line 40000 (475 lines/s)
at line 50000 (15873 lines/s)
at line 60000 (471 lines/s)
at line 70000 (12658 lines/s)
at line 80000 (468 lines/s)
at line 90000 (10638 lines/s)
at line 100000 (464 lines/s)
at line 110000 (9090 lines/s)
at line 120000 (461 lines/s)
at line 130000 (7936 lines/s)
at line 140000 (457 lines/s)
at line 150000 (7042 lines/s)
at line 160000 (454 lines/s)
at line 170000 (6369 lines/s)
at line 180000 (451 lines/s)
at line 190000 (5780 lines/s)
at line 200000 (448 lines/s)
at line 210000 (4854 lines/s)
at line 220000 (444 lines/s)
at line 230000 (4504 lines/s)
at line 240000 (441 lines/s)
at line 250000 (4201 lines/s)
at line 260000 (438 lines/s)
at line 270000 (3952 lines/s)
at line 280000 (435 lines/s)
at line 290000 (3717 lines/s)
at line 300000 (432 lines/s)
at line 310000 (3508 lines/s)
at line 320000 (429 lines/s)
at line 330000 (3322 lines/s)
at line 340000 (426 lines/s)
at line 350000 (3154 lines/s)
at line 360000 (423 lines/s)
at line 370000 (3003 lines/s)
at line 380000 (421 lines/s)
at line 390000 (2873 lines/s)

Any ideas why this happens, and what I could do about it? It really makes
my application non-scalable, as the lines/s keep dropping.

--
Joachim - reply to joachim at domain ccrl-nece dot de

Opinion expressed is personal and does not constitute
an opinion or statement of NEC Laboratories.
 
 
 
 
 
bruno at modulix
 
      04-07-2006
Joachim Worringen wrote:
> I need to process large lists (in my real application, this is to parse
> the content of a file).


Then you probably want to use generators instead of lists. The problem
with large lists is that they eat a lot of memory, which can result in
swapping.
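A minimal sketch of what that would look like (modern Python syntax; the
throwaway file of numbers and the helper names `read_lines`/`process` are
made up for illustration, not from the thread):

```python
import os
import tempfile

def read_lines(path):
    # A generator: yields one line at a time instead of building the whole
    # list in memory, so memory use stays flat no matter how big the file is.
    with open(path) as f:
        for line in f:
            yield line.rstrip("\n")

def process(path):
    # Consume the generator lazily; only one line is alive at a time.
    return sum(int(line) for line in read_lines(path))

# Tiny demo with a throwaway file of numbers.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "w") as f:
    f.write("\n".join(str(i) for i in range(100000)))
print(process(path))  # sum of 0..99999 = 4999950000
os.remove(path)
```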

> I noticed that the performance to access the
> individual list elements degrades over runtime.


I leave this point to the gurus, but it may have to do with swapping. Also,
this is not real-time, so variations may come from your OS task
scheduler.

My 2 cents
--
bruno desthuilliers
python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for
p in '(E-Mail Removed)'.split('@')])"
 
 
 
 
 
Joachim Worringen
 
      04-07-2006
bruno at modulix wrote:
> Joachim Worringen wrote:
>> I need to process large lists (in my real application, this is to parse
>> the content of a file).

>
> Then you probably want to use generators instead of lists. The problem
> with large lists is that they eat a lot of memory - which can result in
> swapping .


The effect also shows up in tiny examples (such as the one posted) which surely
don't swap on a 512 MB machine.

Also, I only read parts of the file into memory at a time, to avoid exhausting
it.

Of course, using less memory is always a good idea. Do you have a pointer on
how to use generators for this application (basically, buffering file content
in memory for faster access)? BTW, the effect also shows up with the linecache
module.

>> I noticed that the performance to access the
>> individual list elements degrades over runtime.

>
> I leave this point to gurus, but it may have to do with swapping. Also,
> this is not real-time, so variations may have to do with your OS tasks
> scheduler.


See above regarding swapping. And the OS scheduler may create variations in
runtime, but not monotonic degradation. I don't think either of these effects
comes into play here.

--
Joachim - reply to joachim at domain ccrl-nece dot de

Opinion expressed is personal and does not constitute
an opinion or statement of NEC Laboratories.
 
 
Peter Otten
 
      04-07-2006
Joachim Worringen wrote:

> I need to process large lists (in my real application, this is to parse
> the content of a file). I noticed that the performance to access the
> individual list elements degrades over runtime.
>
> This can be reproduced easily using this code:
>
> import time
>
> N = 100000
> p = 10000
>
> A = []
> for i in range(N):
>     A.append(str(i))
>
> j = 0
> t = time.clock()
> for i in range(len(A)):
>     j += int(A[i])
>     if i % p == 0:
>         t = time.clock() - t
>         print t
>
> (the string conversion only serves to increase the duration of each
> iteration; you can observe the same effect with ints, too).
>
> When running this, I get output like this:
> 0.0
> 0.37
> 0.03
> 0.4
> 0.06
> 0.43
> 0.09
> 0.46
> 0.13
> 0.49
>
> I use Python 2.3.4 (#1, Sep 3 2004, 12:08:45)
> [GCC 2.96 20000731 (Red Hat Linux 7.3 2.96-110)] on linux2
>
> I wonder why
> 1. The execution times alternate between "fast" and "slow" (I observe the
> same effect in my much more complex application)


Your timing code is buggy. Change it to

import time

N = 100000
p = 10000

A = []
for i in range(N):
    A.append(str(i))

j = 0
start = time.clock()
for i in range(len(A)):
    j += int(A[i])
    if i % p == 0:
        end = time.clock()
        print end - start
        start = end

Does the problem persist? I hope not.
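The alternation falls out of the arithmetic: after `t = time.clock() - t`, the
name `t` holds an elapsed interval rather than a timestamp, so every other
measurement subtracts a small delta from a large absolute clock reading. A
sketch with a deterministic fake clock (modern Python; the tick values
0.34/0.03 are made up to mimic the posted numbers) reproduces the small/large
pattern:

```python
def make_clock(start=0.34, step=0.03):
    # Deterministic stand-in for time.clock(): the first call returns `start`,
    # and each later call is `step` seconds further on (values are invented).
    state = [start - step]
    def clock():
        state[0] += step
        return round(state[0], 2)
    return clock

clock = make_clock()
t = clock()                       # t holds a timestamp here...
readings = []
for _ in range(6):
    t = clock() - t               # ...but an *interval* after this line, so
    readings.append(round(t, 2))  # the next pass mixes timestamp and delta

print(readings)  # [0.03, 0.37, 0.06, 0.4, 0.09, 0.43] -- small/large, drifting up
```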

Peter
 
 
Joachim Worringen
 
      04-07-2006
Peter Otten wrote:
> Your timing code is buggy. Change it to


Ooops, you're right. Everything is fine now... Thanks.

Joachim

--
Joachim - reply to joachim at domain ccrl-nece dot de

Opinion expressed is personal and does not constitute
an opinion or statement of NEC Laboratories.
 
 
Alan Franzoni
 
      04-07-2006
Joachim Worringen on comp.lang.python said:

> I use Python 2.3.4 (#1, Sep 3 2004, 12:08:45)
> [GCC 2.96 20000731 (Red Hat Linux 7.3 2.96-110)] on linux2


Check Peter Otten's answer, and remember as well that GCC 2.96 is known to
cause highly strange issues whenever it is used.

--
Alan Franzoni <(E-Mail Removed)>
-
Remove .xyz from my address in order to contact me.
-
GPG Key Fingerprint:
5C77 9DC3 BD5B 3A28 E7BC 921A 0255 42AA FE06 8F3E
 
 
diffuser78@gmail.com
 
      04-07-2006
Hi,

I wrote a program some days back that used lists heavily for operations such
as pop, remove, and append. My lists were around 1024x3 in size, and there
were around 20 different objects like this.

My program also degraded over a period of time. I cannot post the code, as
there is a lot of it.

But I want to ask: why do lists degrade, and what faster alternative to lists
is there?

Any help is greatly appreciated.
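For what it's worth (a sketch in modern Python, not from the original thread):
if those pop/remove calls hit the front of the list, each one is O(n) because
every later element shifts left, and that cost compounds as the list grows.
`collections.deque` pops from either end in O(1):

```python
from collections import deque

# list.pop(0) / list.remove() are O(n): every later element shifts left.
# A deque gives O(1) appends and pops at *both* ends.
items = deque(range(5))   # deque([0, 1, 2, 3, 4])
items.append(5)           # O(1), like list.append
first = items.popleft()   # O(1); list.pop(0) would shift everything
last = items.pop()        # O(1) from the right end
print(first, last, list(items))  # 0 5 [1, 2, 3, 4]
```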

 
 
 
 