On Jun 8, 9:35 am, Abu Yahya <abu_ya...@invalid.com> wrote:
> A small application that I'm making requires me to store very long
> strings (>1000 characters) in a database.
>
> I will need to use these strings later to compare for equality to
> incoming strings from another application. I will also want to add some
> of the incoming strings to the storage, if they meet certain criteria.
>
> For my application, I get a feeling that storing these strings in my
> table will be a waste of space, and will impact performance due to
> retrieval and storage times, as well as comparison times.
>
> I considered using an SHA-512 hash of these strings and storing them in
> the database. However, while these will save on storage space, it will
> take time to do the hashing before comparing an incoming string. So I'm
> still wasting time. (Collisions due to hashing will not be a problem,
> since an occasional false positive will not be fatal for my application).
>
> What would be the best approach?
If it matters enough that you're asking, measure first to see whether it's
actually a problem. If you're concerned that it will be, code a few
reasonable alternatives and measure them.
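A rough timing harness for that kind of comparison might look like the following. It's only a sketch: the class name, the 2,000-string count and the 1,200-character length are made-up stand-ins for the real data. Hand-rolled System.nanoTime() loops like this are easily skewed by JIT warm-up, so a proper harness such as JMH would give more trustworthy numbers. It compares a linear scan (List.contains) against a HashSet lookup.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.function.Predicate;

public class LookupTiming {

    // Builds a random lowercase string; a stand-in for the real long inputs.
    public static String randomString(Random rnd, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append((char) ('a' + rnd.nextInt(26)));
        }
        return sb.toString();
    }

    // Runs 'lookup' over every probe, prints the elapsed time and returns the
    // number of hits (printing and returning it keeps the loop's result observable).
    public static int timeLookups(String label, Predicate<String> lookup, List<String> probes) {
        long start = System.nanoTime();
        int hits = 0;
        for (String p : probes) {
            if (lookup.test(p)) {
                hits++;
            }
        }
        long elapsedMicros = (System.nanoTime() - start) / 1_000;
        System.out.println(label + ": " + hits + " hits in " + elapsedMicros + " us");
        return hits;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        List<String> stored = new ArrayList<>();
        for (int i = 0; i < 2_000; i++) {
            stored.add(randomString(rnd, 1_200));
        }
        Set<String> storedSet = new HashSet<>(stored);

        // Even-numbered probes are equal-but-distinct copies of stored strings,
        // odd-numbered ones are fresh strings that (almost certainly) aren't stored.
        List<String> probes = new ArrayList<>();
        for (int i = 0; i < 2_000; i++) {
            probes.add(i % 2 == 0
                    ? new String(stored.get(i).toCharArray())
                    : randomString(rnd, 1_200));
        }

        // Repeat a few rounds so later rounds run JIT-compiled code.
        for (int round = 0; round < 3; round++) {
            timeLookups("linear scan", stored::contains, probes);
            timeLookups("HashSet", storedSet::contains, probes);
        }
    }
}
```

Swapping in other candidates (a digest set, a trie, a database query) is just a matter of passing a different Predicate<String>.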
Presumably you need to do a Map lookup on the incoming strings. I
thought about some intern scheme, but that won't work if you're
receiving a lot of new incoming strings. Storing hashes could work. Do
you need to store the strings in a database? If you can store them
locally, maybe a trie?
http://en.wikipedia.org/wiki/Trie
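For the stored-hashes option, here's a sketch along the lines of the SHA-512 idea from the question. DigestStore is a hypothetical name, and the in-memory HashSet stands in for whatever indexed column the database would actually use. Only the 88-character Base64 form of each 64-byte digest is kept, so equality checks compare short keys instead of 1,000+ character strings. Computing the digest still reads the whole incoming string, but so does String.hashCode(), so any hash-based lookup pays one pass over each incoming string anyway.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.HashSet;
import java.util.Set;

public class DigestStore {
    // Base64 of each stored string's SHA-512 digest (88 chars instead of >1000).
    private final Set<String> digests = new HashSet<>();

    // Hashes the UTF-8 bytes of s with SHA-512 and returns the digest as Base64.
    public static String sha512(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-512");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-512 not available", e);
        }
    }

    // Remembers an incoming string (e.g. one that met the storage criteria).
    // Returns true if it was not already present.
    public boolean add(String s) {
        return digests.add(sha512(s));
    }

    // True if an equal string was stored earlier; a SHA-512 collision could in
    // principle give a false positive, which the question says is acceptable.
    public boolean contains(String s) {
        return digests.contains(sha512(s));
    }

    public static void main(String[] args) {
        DigestStore store = new DigestStore();
        store.add("x".repeat(1500)); // String.repeat needs Java 11+
        System.out.println(store.contains("x".repeat(1500))); // true
        System.out.println(store.contains("y".repeat(1500))); // false
    }
}
```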
I doubt (though I'm not sure) that you'll get much better lookup
performance than a trie, but of course I would measure that too.
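A bare-bones trie in Java might look like this. It's a sketch, not a tuned implementation: StringTrie and its add/contains methods are hypothetical names, and each node keeps a HashMap of children, which is simple but memory-hungry for strings of 1,000+ characters. A lookup walks at most one node per character of the incoming string and stops at the first character with no matching child, so its cost doesn't grow with the number of stored strings.

```java
import java.util.HashMap;
import java.util.Map;

public class StringTrie {
    // One node per character position; children keyed by the next char.
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean terminal; // true if a stored string ends at this node
    }

    private final Node root = new Node();

    // Inserts s; returns true if it was not already stored.
    public boolean add(String s) {
        Node node = root;
        for (int i = 0; i < s.length(); i++) {
            node = node.children.computeIfAbsent(s.charAt(i), c -> new Node());
        }
        boolean added = !node.terminal;
        node.terminal = true;
        return added;
    }

    // Follows one child per character; bails out at the first missing child.
    public boolean contains(String s) {
        Node node = root;
        for (int i = 0; i < s.length(); i++) {
            node = node.children.get(s.charAt(i));
            if (node == null) {
                return false;
            }
        }
        return node.terminal;
    }

    public static void main(String[] args) {
        StringTrie trie = new StringTrie();
        trie.add("abcdef");
        trie.add("abcxyz");
        System.out.println(trie.contains("abcdef")); // true
        System.out.println(trie.contains("abc"));    // false (prefix only)
        System.out.println(trie.contains("abcq"));   // false
    }
}
```

For strings this long with few shared prefixes, a compressed (radix/Patricia) trie, which collapses chains of single-child nodes into one edge, would likely use far less memory; whether either beats a plain HashSet in practice is exactly the kind of thing to measure.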