Velocity Reviews - Computer Hardware Reviews

optimization pointers?

 
 
r.e.s.
12-11-2003
Would anyone care to offer pointers on how the following code
might be optimized? Or alternatives? ...

---
def lz(string):
    """ Return the Lempel-Ziv complexity (LZ78) of string. """
    vocabulary = []
    word = ''
    for char in string:
        word += char
        if word not in vocabulary:
            vocabulary.append(word)
            word = ''
    return len(vocabulary)
---

On my 2 GHz system running PythonWin, the above code averages
about 1.3 hrs to process a 1 MB string. I'd hoped to compute
LZ78 for multi-GB files, but it seems infeasible. (?)

Thanks for any feedback.
--
r.e.s.
 
Anton Vredegoor
12-11-2003
"r.e.s." <(E-Mail Removed)> wrote:

>Would anyone care to offer pointers on how the following code
>might be optimized? Or alternatives? ...
>
>---
>def lz(string):
>    """ Return the Lempel-Ziv complexity (LZ78) of string. """
>    vocabulary = []
>    word = ''
>    for char in string:
>        word += char
>        if word not in vocabulary:
>            vocabulary.append(word)
>            word = ''
>    return len(vocabulary)
>---
>
>On my 2 GHz system running PythonWin, the above code averages
>about 1.3 hrs to process a 1 MB string. I'd hoped to compute
>LZ78 for multi-GB files, but it seems infeasible. (?)


On a 300 MHz Celeron the script below takes about ten or twenty
seconds; the original lz is commented out, as it never seems to finish
for longer strings.

Anton

p.s. the file I used is here:
http://sailor.gutenberg.org/etext97/1donq10.zip

import sets

def lz(string):
    """ Return the Lempel-Ziv complexity (LZ78) of string. """
    vocabulary = []
    word = ''
    for char in string:
        word += char
        if word not in vocabulary:
            vocabulary.append(word)
            word = ''
    return len(vocabulary)

def lzx(string):
    """ Return the Lempel-Ziv complexity (LZ78) of string. """
    vocabulary = sets.Set()
    word = ''
    for char in string:
        word += char
        if word not in vocabulary:
            vocabulary.add(word)
            word = ''
    return len(vocabulary)

def test():
    fn = '1donq10.txt'
    s = file(fn, 'rb').read(1000000)
    print lzx(s)
    #print lz(s)

if __name__ == '__main__':
    test()
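The `sets` module was the way to do this in Python 2.3; in modern Python the built-in `set` type makes the same approach a one-word change. A sketch of the equivalent Python 3 version:

```python
def lz_complexity(s):
    """Greedy phrase count used in this thread: extend the current word
    one character at a time; when the word is unseen, record it and
    start a new word. The answer is the number of recorded phrases."""
    vocabulary = set()
    word = ''
    for char in s:
        word += char
        if word not in vocabulary:
            vocabulary.add(word)
            word = ''
    return len(vocabulary)
```

For example, `lz_complexity('0001101001000101')` parses the string into the phrases 0|00|1|10|100|1000|101 and returns 7.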





 
r.e.s.
12-12-2003
"Mel Wilson" <(E-Mail Removed)> wrote ...

> One simple improvement is to get rid of
>
> if an_item not in a_list
>
> which does a linear search over the list.
> A dictionary is faster.
>
> Without a full benchmark, it seems that

[...]
> will show about a x6 speedup.


I really appreciate the suggestions -- both yours
and Anton's.

After making the changes you indicated, the code
runs in about 1/500th the time of the original
(i.e. about 2 sec per MB for strings in RAM). The
sets idea also speeds things up tremendously, but
not as much -- it takes about 70% longer than the
dictionary approach.

Thanks again.
--
r.e.s.



 
Anthony McDonald
12-12-2003
"Mel Wilson" <(E-Mail Removed)> wrote in message
news:rzO2/ks/(E-Mail Removed)...
>
> One simple improvement is to get rid of
>
> if an_item not in a_list
>
> which does a linear search over the list. A dictionary is faster.
>
> Without a full benchmark, it seems that appending
>
>
> def lz_complexity (s):
>     voc = {}
>     word = ''
>     for c in s:
>         word += c
>         if not voc.get (word, False):
>             voc[word] = True
>             word = ''
>     return len (voc.keys())
>
> if __name__ == '__main__':
>     import time
>     text = file ('lzc.py', 'rt').read()
>     t0 = time.clock()
>     t0 = time.clock() - t0
>
>     c = lz (text)
>     t1 = time.clock()
>     c = lz (text)
>     t1 = time.clock() - t1
>
>     c = lz_complexity (text)
>     t2 = time.clock()
>     c = lz_complexity (text)
>     t2 = time.clock() - t2
>
>     print 'T1:', t1 - t0
>     print 'T2:', t2 - t0
>
>
> will show about a x6 speedup.
>
> Avoiding
>
> a_string += another_string
>
> is a traditional speedup, but it seems you'll be needing the
> string form for every character processed, so it may not buy
> much in this case.
>
> Regards. Mel.


Here's my contribution, removing the string append in favour of slices. It buys
a little speed, thanks to xrange.

def lz_comp_mine(text):
    voc = {}
    st = 0
    for cur in xrange(1, len(text)+1):
        if not voc.get(text[st:cur], False):
            voc[text[st:cur]] = True
            st = cur
    return len(voc)
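Whichever variant is used, the original poster's real target was multi-GB files, and reading the whole string into memory is the first obstacle. A streaming sketch in modern Python (the chunk size is an arbitrary choice for illustration) that carries the partial word across chunk boundaries, so only the vocabulary, not the file, must fit in memory:

```python
def lz_complexity_stream(fileobj, chunk_size=1 << 20):
    """Streaming variant: read the input in chunk_size blocks.

    The partial `word` survives chunk boundaries, so the result is
    identical to scanning the whole string in memory at once."""
    vocabulary = set()
    word = ''
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        for char in chunk:
            word += char
            if word not in vocabulary:
                vocabulary.add(word)
                word = ''
    return len(vocabulary)
```

Note the remaining limit: the vocabulary itself still grows with the input, so truly huge files eventually need a more compact phrase representation as well.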

Anthony McDonald


 
Anthony McDonald
12-13-2003
"Mel Wilson" <(E-Mail Removed)> wrote in message
news:If2/ks/(E-Mail Removed)...
>
> Nice touch! I tried slices and took a huge performance
> hit (almost 3x the list version) but I didn't use `for ... in
> xrange ...`. It must have all been in the while-loop test
> and index incrementing.
>
> Regards. Mel.


Thanks.

My original attempt, which incidentally used range, was about half a second
slower than yours with 700K of test data. Just as I was about to close the
editor window I noticed your return statement.

return len (voc.keys())

This creates a list of keys just to apply len to it, but that's an unneeded
step, as len applied directly to a dictionary returns the number of keys. I
made the change and gained seven hundredths of a second. Not much, still behind
yours, but it suggested that changing range to xrange and avoiding the list
creation might help. Voila!

The interesting thing that benchmarking with test data shows is that the
difference in speed between our two routines is about 0.05 secs per 700K
processed. That difference remains constant at least up to 80 MB of input
data, implying that slices are no more efficient than string appending, and
may in this case actually be less efficient.

Anthony McDonald
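The slice-versus-append tradeoff can also be sidestepped entirely, though this goes beyond what the thread benchmarks: key the vocabulary on (prefix-phrase id, character) pairs, trie-style, so each character costs one hash lookup of a small tuple and no phrase string is ever built or sliced. A sketch in modern Python:

```python
def lz_trie(s):
    """Trie-style phrase count, equivalent to the string versions.

    Each known phrase gets an integer id, and a phrase is represented as
    (id of its one-shorter prefix, last char), so membership tests never
    construct the phrase string itself."""
    phrases = {}          # (prefix_id, char) -> phrase_id
    current = 0           # id of the phrase matched so far (0 = empty)
    next_id = 1
    for char in s:
        key = (current, char)
        if key in phrases:
            current = phrases[key]   # keep extending the current phrase
        else:
            phrases[key] = next_id   # record a new phrase and reset
            next_id += 1
            current = 0
    return len(phrases)
```

This is step-for-step isomorphic to the `word += char` versions, so it returns the same count, e.g. 7 for '0001101001000101'.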


 
Anton Vredegoor
12-14-2003
"Anthony McDonald" <tonym1972(at)club-internet(in)fr> wrote:

>return len (voc.keys())
>
>Which creates a list of keys to then apply len to, but thats an unneeded
>step as len applied directly to a dictionary returns the number of keys.


Below are two more small optimization possibilities. Originally I
wasn't going to post them because they are way into micro-optimization
country, and because the setdefault solution is beautiful but blond. At
least on some of my machines, however, they are both faster than the
solutions offered so far.

Anton

def lzx(s):
    word, D = '', {}
    Dget, Dset = D.get, D.__setitem__
    for c in s:
        word += c
        if not Dget(word):
            Dset(word, True)
            word = ''
    return len(D)

def lzy(s):
    j, D = 0, {}
    func = D.setdefault
    for i in xrange(1, len(s)+1):
        if func(s[j:i], j) is j: j = i
    return len(D)
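Timings like the ones traded in this thread are easy to reproduce today with the stdlib timeit module. A minimal harness sketch in modern Python, comparing the thread's list-based original against the set-based version (the 10 KB synthetic input is an assumption chosen for illustration):

```python
import timeit

def lz_list(s):
    # Original list-based vocabulary: 'word not in vocabulary' is a
    # linear scan, so each character costs O(len(vocabulary)) compares.
    vocabulary = []
    word = ''
    for char in s:
        word += char
        if word not in vocabulary:
            vocabulary.append(word)
            word = ''
    return len(vocabulary)

def lz_set(s):
    # Set-based vocabulary: membership is a hash lookup, O(1) on average.
    vocabulary = set()
    word = ''
    for char in s:
        word += char
        if word not in vocabulary:
            vocabulary.add(word)
            word = ''
    return len(vocabulary)

data = '01' * 5000  # 10 KB of highly repetitive input
for fn in (lz_list, lz_set):
    t = timeit.timeit(lambda: fn(data), number=3)
    print('%-8s %d phrases, %.4fs for 3 runs' % (fn.__name__, fn(data), t))
```

Both functions return the same phrase count; only the membership-test cost differs, which is where the 500x speedup reported above comes from.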
 