Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Python > Re: hash()

Reply
Thread Tools

Re: hash()

 
 
Tim Peters
Guest
Posts: n/a
 
      12-06-2005
[John Marshall]
> For strings of > 1 character, what are the chances
> that hash(st) and hash(st[::-1]) would return the
> same value?


First, if `st` is a string, `st[::-1]` is a list. Do you really mean
to compare string hashes with list hashes here? I'm going to assume
not.

Second, what are your assumptions about (a) the universe of strings;
and, (b) the hash function?

Assuming a finite universe of strings (also finite ), and a hash
function that returns each of its H possible results "at random"
(meaning that there's no algorithmic way to predict any bit of the
hash output short of running the hash function), then the probability
that two distinct strings have the same hash is 1/H. It doesn't
matter to this outcome whether one input is the reversal of the other.

> My goal is to uniquely identify multicharacter strings,


Unclear what that means. Obviously, if your string universe contains
more than H strings, it's impossible for any hash function with H
possible values to return a different hash value each input.

> all of which begin with "/" and never end with "/".
> Therefore, st != st[::-1].


As at the start, I think you mean to say st != "".join(st[::-1]). I
don't know why you might think that matters, though. Is it simply
because this condition eliminates palindromes from your input
universe?

Anyway, to be concrete, using CPython's hash function on a 32-bit box,
H = 2**32-1. Call a string `s` bad iff:

s[0] == "/" and s[-1] != "/" and hash(s) == hash("".join(reversed(s)))

Then there are no bad strings of length 1, 2, 3, or 4. There are 4
bad strings of length 5:

'/\xde&\xf6C'
'/\xca\x0e\xfaC'
'/\xc4\x06\xfcC'
'/\xad\xd6\x01\xd6'

I didn't think about this -- I just wrote a little program to try all
~= 4 billion such strings. So if your universe is the set of all
5-character 8-bit strings that start with '/' but don't end with '/',
and you pick inputs uniformly at random from that universe, the chance
of a hash match between a string and its reversal is

4 / (256**3 * 255)

or a little less than 1 in a billion. For a truly random hash
function with a 32-bit output, the chance would be a little less than
1 in 4 billion.

It would be mildly surprising if those odds got worse as string length
increased. The md5 and sha hashes have much larger H, and were
designed for (among other things) good collision resistance.
 
Reply With Quote
 
 
 
 
Steven Bethard
Guest
Posts: n/a
 
      12-06-2005
Tim Peters wrote:
> First, if `st` is a string, `st[::-1]` is a list.


I hate to question the great timbot, but am I missing something?

>>> 'abcde'[::-1]

'edcba'

STeVe
 
Reply With Quote
 
 
 
 
Aahz
Guest
Posts: n/a
 
      12-06-2005
In article <(E-Mail Removed)>,
Tim Peters <(E-Mail Removed)> wrote:
>
>First, if `st` is a string, `st[::-1]` is a list. Do you really mean
>to compare string hashes with list hashes here? I'm going to assume
>not.


QOTW: Timbot makes an error
--
Aahz ((E-Mail Removed)) <*> http://www.pythoncraft.com/

"Don't listen to schmucks on USENET when making legal decisions. Hire
yourself a competent schmuck." --USENET schmuck (aka Robert Kern)
 
Reply With Quote
 
Scott David Daniels
Guest
Posts: n/a
 
      12-06-2005
Aahz wrote:
> QOTW: Timbot makes an error


Now exactly who was in charge of the replacement capacitors?

--Scott David Daniels
http://www.velocityreviews.com/forums/(E-Mail Removed)
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off




Advertisments