Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Python > External Hashing [was Re: matching strings in a large set of strings]

Reply
Thread Tools

External Hashing [was Re: matching strings in a large set of strings]

 
 
Helmut Jarausch
Guest
Posts: n/a
 
      04-30-2010
I think one could apply an external hashing technique which would require only
very few disk accesses per lookup.
Unfortunately, I'm now aware of an implementation in Python.
Does anybody know about a Python implementation of external hashing?

Thanks,
Helmut.

--
Helmut Jarausch

Lehrstuhl fuer Numerische Mathematik
RWTH - Aachen University
D 52056 Aachen, Germany
 
Reply With Quote
 
 
 
 
John Bokma
Guest
Posts: n/a
 
      04-30-2010
Helmut Jarausch <(E-Mail Removed)-aachen.de> writes:

> I think one could apply an external hashing technique which would require only
> very few disk accesses per lookup.
> Unfortunately, I'm now aware of an implementation in Python.
> Does anybody know about a Python implementation of external hashing?


SQLite?

--
John Bokma j3b

Hacking & Hiking in Mexico - http://johnbokma.com/
http://castleamber.com/ - Perl & Python Development
 
Reply With Quote
 
 
 
 
Tim Chase
Guest
Posts: n/a
 
      04-30-2010
On 04/30/2010 12:51 PM, Helmut Jarausch wrote:
> I think one could apply an external hashing technique which would require only
> very few disk accesses per lookup.
> Unfortunately, I'm now aware of an implementation in Python.
> Does anybody know about a Python implementation of external hashing?


While you don't detail what you're hashing, Stephan Behnel
already suggested (in the parent thread) using one of Python's
native dbm modules (I just use anydbm and let it choose). The
underlying implementations should be fairly efficient assuming
you don't use the dumbdbm last-resort fallback). With the anydbm
interface, you can implement dict/set semantics as long as you
take care that everything is marshalled into and out of strings
for keys/values.

-tkc



 
Reply With Quote
 
Dave Angel
Guest
Posts: n/a
 
      04-30-2010
Helmut Jarausch wrote:
> I think one could apply an external hashing technique which would require only
> very few disk accesses per lookup.
> Unfortunately, I'm now aware of an implementation in Python.
> Does anybody know about a Python implementation of external hashing?
>
> Thanks,
> Helmut.
>
>

That's what databases are for. But if you're trying to streamline it
further than a database, you need to give some constraints. In an
earlier thread, you mentioned strings with a constant size. I don't
think you ever actually said that the match string was also of the same
size, but I'll assume it is. Final question is how often you'll be
updating the list, versus searching it. If you do both frequently,
stick to a database.

On the other hand, if the list on disk is fixed, you could thoroughly
optimize its makeup. Perhaps the easiest efficient form would be to
have a file in two parts. The first part is the hash table, where each
entry has a seek-offset and a size. And the rest of the file is just a
"list" of items, sorted by hash value.

If you make a hash large enough that the maximum collision list is a
couple of k, then two seeks would get any record. First seek to the
hash entry, then use that to decide where to seek for the data, and how
much to read. Then you simply search that block in memory.

You could also play games with a two-level hash, where each block of
records has its own mini hash table. Doesn't seem worth it for a mere
83million records.

DaveA
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Hashing long strings... WTH Java 8 08-22-2010 10:05 AM
matching strings in a large set of strings Karin Lagesen Python 13 05-03-2010 03:53 PM
How to replace all strings matching a pattern with correspondinglower case strings ? anonym Java 1 01-15-2009 07:29 PM
Speeding up matching Strings in a set softwarepearls_com Java 34 10-26-2008 06:35 PM
Strings, Strings and Damned Strings Ben C Programming 14 06-24-2006 05:09 AM



Advertisments