Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Python > Fast Data Comparison (dict v. list. v string)

Reply
Thread Tools

Fast Data Comparison (dict v. list. v string)

 
 
Scott Brady Drummonds
Guest
Posts: n/a
 
      04-14-2004
Hi, everyone,

I'm a relative novice at Python and am working on some optimizations in my
first Python project. At its core, this tool I'm writing needs to perform
many comparisons to fixed-size lists of fixed-length strings.

Currently, my implementation uses dictionaries to store each string. I
maintain a separate "key-mapping" dictionary that maps keys from one
dictionary to the other. Then, all I have to do to compare the two
dictionaries is as follows:
for metaKey in keyMap.keys():
if dict1[metaKey] != dict2[keyMap[metaKey]]:
# Do some processing

Since the sizes of the dictionaries never change, I tried implementing this
using lists. The solution looks something like this (and assumes that a
pre-processing phase has sorted the contents of each list so their indexes
are the same):
for i in len(list1):
if list1[i] != list2[i]:
# Do some processing

As it turns out, this implementation appears to be about 33% faster than the
dictionary-based one. Now, assuming that the datum being stored at each
index can fit into one character, I could do a string-based implementation
like this:
for i in len(string1):
if string1[i] != string[i]:
# Do some processing

This last solution actually runs about the same as the dictionary, which
takes 50% longer than the list implementation.

Now, my questions are:
1) Does anyone have another suggestion as to how I can organize these data
so that I can compare many elements many times?
2) Is there a string comparison operator that will return which indexes
have different values? Maybe it would be faster than the iterative
comparison approach for the third implementation.
3) Since my data are changing between the successive executions of the code
snippets above, I need a way of having the other parts of the program update
it. But, it appears that strings are constant as I can't assign individual
characters with "string1[i] = '0'". Is there any way around this?

Thanks!
Scott

--
Remove ".nospam" from the user ID in my e-mail to reply via e-mail.


 
Reply With Quote
 
 
 
 
Larry Bates
Guest
Posts: n/a
 
      04-15-2004
Scott,

You should take a close look at list comprehensions
and possibly sets (new in V2.3). I'm not all that
familiar with sets, but your problem sounds like
they might come in handy.

Example list comprehension:

list1notinlist2=[x for x in list1 if x not in list2]
for item in list1notinlist:
# Do some processing

Note: In Python if you find yourself using [i]
indexes into lists, there is "almost" always a
better way (at least that is my experience).

Larry Bates
Syscon, Inc.

"Scott Brady Drummonds" <(E-Mail Removed)> wrote in
message news:c5k8rp$ql6$(E-Mail Removed)...
> Hi, everyone,
>
> I'm a relative novice at Python and am working on some optimizations in my
> first Python project. At its core, this tool I'm writing needs to perform
> many comparisons to fixed-size lists of fixed-length strings.
>
> Currently, my implementation uses dictionaries to store each string. I
> maintain a separate "key-mapping" dictionary that maps keys from one
> dictionary to the other. Then, all I have to do to compare the two
> dictionaries is as follows:
> for metaKey in keyMap.keys():
> if dict1[metaKey] != dict2[keyMap[metaKey]]:
> # Do some processing
>
> Since the sizes of the dictionaries never change, I tried implementing

this
> using lists. The solution looks something like this (and assumes that a
> pre-processing phase has sorted the contents of each list so their indexes
> are the same):
> for i in len(list1):
> if list1[i] != list2[i]:
> # Do some processing
>
> As it turns out, this implementation appears to be about 33% faster than

the
> dictionary-based one. Now, assuming that the datum being stored at each
> index can fit into one character, I could do a string-based implementation
> like this:
> for i in len(string1):
> if string1[i] != string[i]:
> # Do some processing
>
> This last solution actually runs about the same as the dictionary, which
> takes 50% longer than the list implementation.
>
> Now, my questions are:
> 1) Does anyone have another suggestion as to how I can organize these

data
> so that I can compare many elements many times?
> 2) Is there a string comparison operator that will return which indexes
> have different values? Maybe it would be faster than the iterative
> comparison approach for the third implementation.
> 3) Since my data are changing between the successive executions of the

code
> snippets above, I need a way of having the other parts of the program

update
> it. But, it appears that strings are constant as I can't assign

individual
> characters with "string1[i] = '0'". Is there any way around this?
>
> Thanks!
> Scott
>
> --
> Remove ".nospam" from the user ID in my e-mail to reply via e-mail.
>
>



 
Reply With Quote
 
 
 
 
Scott Brady Drummonds
Guest
Posts: n/a
 
      04-15-2004

"Larry Bates" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> Scott,
>
> You should take a close look at list comprehensions
> and possibly sets (new in V2.3). I'm not all that
> familiar with sets, but your problem sounds like
> they might come in handy.


Excellent suggestion! Thanks so much for the idea.

Scott


 
Reply With Quote
 
george young
Guest
Posts: n/a
 
      04-16-2004
"Scott Brady Drummonds" <(E-Mail Removed)> wrote in message news:<c5k8rp$ql6$(E-Mail Removed)>...
> Hi, everyone,
>
> I'm a relative novice at Python and am working on some optimizations in my
> first Python project. At its core, this tool I'm writing needs to perform
> many comparisons to fixed-size lists of fixed-length strings.
>
> Currently, my implementation uses dictionaries to store each string. I
> maintain a separate "key-mapping" dictionary that maps keys from one
> dictionary to the other. Then, all I have to do to compare the two
> dictionaries is as follows:
> for metaKey in keyMap.keys():
> if dict1[metaKey] != dict2[keyMap[metaKey]]:
> # Do some processing
>
> Since the sizes of the dictionaries never change, I tried implementing this
> using lists. The solution looks something like this (and assumes that a
> pre-processing phase has sorted the contents of each list so their indexes
> are the same):
> for i in len(list1):
> if list1[i] != list2[i]:
> # Do some processing
>
> As it turns out, this implementation appears to be about 33% faster than the
> dictionary-based one. Now, assuming that the datum being stored at each
> index can fit into one character, I could do a string-based implementation
> like this:
> for i in len(string1):
> if string1[i] != string[i]:
> # Do some processing
>
> This last solution actually runs about the same as the dictionary, which
> takes 50% longer than the list implementation.
>
> Now, my questions are:
> 1) Does anyone have another suggestion as to how I can organize these data
> so that I can compare many elements many times?
> 2) Is there a string comparison operator that will return which indexes
> have different values? Maybe it would be faster than the iterative
> comparison approach for the third implementation.
> 3) Since my data are changing between the successive executions of the code
> snippets above, I need a way of having the other parts of the program update
> it. But, it appears that strings are constant as I can't assign individual
> characters with "string1[i] = '0'". Is there any way around this?


You might also take a look at psyco: http://psyco.sourceforge.net. It is
very easy to use, non-intrusive, and claims large performance improvements for
code like this. I have hopes that things like psyco can let us keep code
in the simplest and *clearest* form, and let external optimization mechanisms
make it fast enough when needed.

-- George Young
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Comparison of 2 files and generating the output based on comparison Deepu Perl Misc 1 02-07-2011 03:09 PM
Price Comparison Service. Best Deal. Special Coupon atBest-Price-Comparison.com rapee Digital Photography 0 03-14-2008 06:46 AM
More memory: How fast is fast rfdjr1@optonline.net Computer Support 5 05-19-2004 05:45 PM
I NEED HELP FAST!!!!! REAL FAST!!!!! R. Jizzle MCSE 3 09-29-2003 08:51 PM
Super-fast AA Chargers: Anything as fast as the 15 minute Rayovac? David Chien Digital Photography 4 08-30-2003 07:49 PM



Advertisments