Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > C++ > Generating random strings without duplicates

Reply
Thread Tools

Generating random strings without duplicates

 
 
eric.gagnon@loto-quebec.com
Guest
Posts: n/a
 
      07-07-2005
In a program randomly generating 10 000 000 alphanumeric codes of 16
characters in length (Ex.: "ZAZAZAZAZAZAZ156"), what would be an
efficient way to ensure that I do not generate duplicates?

STL set, map?
Could you give me a little code example?

Thank you.

 
Reply With Quote
 
 
 
 
Bob Hairgrove
Guest
Posts: n/a
 
      07-07-2005
On 7 Jul 2005 06:49:29 -0700, http://www.velocityreviews.com/forums/(E-Mail Removed) wrote:

>In a program randomly generating 10 000 000 alphanumeric codes of 16
>characters in length (Ex.: "ZAZAZAZAZAZAZ156"), what would be an
>efficient way to ensure that I do not generate duplicates?
>
>STL set, map?
>Could you give me a little code example?


160 MB with 1-byte characters, or at least twice that for wchar_t ...
hmmm?

Store them in a database, and put a unique index on that column. True
randomness does not exclude duplicate values.

--
Bob Hairgrove
(E-Mail Removed)
 
Reply With Quote
 
 
 
 
niklasb@microsoft.com
Guest
Posts: n/a
 
      07-08-2005
Bob Hairgrove wrote:
> On 7 Jul 2005 06:49:29 -0700, (E-Mail Removed) wrote:
>
> >In a program randomly generating 10 000 000 alphanumeric codes of 16
> >characters in length (Ex.: "ZAZAZAZAZAZAZ156"), what would be an
> >efficient way to ensure that I do not generate duplicates?
> >
> >STL set, map?
> >Could you give me a little code example?

>
> 160 MB with 1-byte characters, or at least twice that for wchar_t ...
> hmmm?
>
> Store them in a database, and put a unique index on that column. True
> randomness does not exclude duplicate values.


Depending on what sort of "randomness" you require, you might be
able to avoid any kind of storage/lookup at all. Certain pseudo-
random number generators have the property that the same number
never occurs more than once for the duration of the randomizer's
"period" (after which the entire sequence of numbers repeats).
The linear-congruential algorith used by many implementations of
rand() has this property -- but unfortunately, from what I've
read most tend to have periods of only 65,000 or so.

If you could find a version with a period of 10,000,000 or more
then your problem is 99% solved. All you then have to do is use
each random number to generate a 16-character string in such a
way that no two numbers yield the same string. (This should be
easy enough to guarantee. If you limit yourselves to characters
A-Z and 0-9 this gives you up to 36^16 possible strings which
is more than enough.)

This will give you an algorithm for generating up to N unique
strings (N being the period of the randomizer), where uniqueness
is guaranteed by the algorithm with no lookup required.

 
Reply With Quote
 
bart.kowalski@gmail.com
Guest
Posts: n/a
 
      07-08-2005
(E-Mail Removed) wrote:
> Depending on what sort of "randomness" you require, you might be
> able to avoid any kind of storage/lookup at all. Certain pseudo-
> random number generators have the property that the same number
> never occurs more than once for the duration of the randomizer's
> "period" (after which the entire sequence of numbers repeats).
> The linear-congruential algorith used by many implementations of
> rand() has this property -- but unfortunately, from what I've
> read most tend to have periods of only 65,000 or so.
>
> If you could find a version with a period of 10,000,000 or more
> then your problem is 99% solved.


Google: mersenne twister


Bart.

 
Reply With Quote
 
rossum
Guest
Posts: n/a
 
      07-08-2005
On 7 Jul 2005 06:49:29 -0700, (E-Mail Removed) wrote:

>In a program randomly generating 10 000 000 alphanumeric codes of 16
>characters in length (Ex.: "ZAZAZAZAZAZAZ156"), what would be an
>efficient way to ensure that I do not generate duplicates?
>
>STL set, map?
>Could you give me a little code example?
>
>Thank you.


An alternative:

1 Generate a random string.
2 Hash the string to a value in a range > 0 - 9 999 999.
3 If this hash value has been used before then goto 1.
4 If the hash value is a new one then the string is also new.

Keep track of which hash values have appeared by using a bit array,
one bit per value in the hash range.

If you pick the hash range to be 25% larger than the number of strings
then on average you will only have to generate 4 strings to find a new
one at the end of the run.


Another alternative:

1 Generate a file of N > 10 000 000 unique strings in advance.
2 Pick a random number R in the range 1 .. N, both inclusive..
3 If R is already picked then pick another.
4 If R is not already picked then output the Rth string.

Again a bit array will keep track of the numbers that you have already
picked.

By generating the strings in advance you can take as long as you want
to get them right. You can load the file from disk for the actual run.


Yet another alternative:

1 Generate a file of N >= 10 000 000 unique strings in advance.
2 For IX = N downto 1
3 Pick a random number R in the range 1 .. IX, both inclusive.
4 Output string R.
5 Copy string IX over string R
6 Endfor



rossum


The ultimate truth is that there is no ultimate truth
 
Reply With Quote
 
Greg
Guest
Posts: n/a
 
      07-09-2005


Bob Hairgrove wrote:
> On 7 Jul 2005 06:49:29 -0700, (E-Mail Removed) wrote:
>
> >In a program randomly generating 10 000 000 alphanumeric codes of 16
> >characters in length (Ex.: "ZAZAZAZAZAZAZ156"), what would be an
> >efficient way to ensure that I do not generate duplicates?
> >
> >STL set, map?
> >Could you give me a little code example?

>
> 160 MB with 1-byte characters, or at least twice that for wchar_t ...
> hmmm?
>
> Store them in a database, and put a unique index on that column. True
> randomness does not exclude duplicate values.
>
> --
> Bob Hairgrove
> (E-Mail Removed)


Try "uuidgen"

Greg

 
Reply With Quote
 
E. Mark Ping
Guest
Posts: n/a
 
      07-09-2005
In article <(E-Mail Removed) .com>,
<(E-Mail Removed)> wrote:
>In a program randomly generating 10 000 000 alphanumeric codes of 16
>characters in length (Ex.: "ZAZAZAZAZAZAZ156"), what would be an
>efficient way to ensure that I do not generate duplicates?


More information would make it possible to answer in something other
than guesses.

Why are you generating all these strings (how will they be used)? Why
is duplication an issue? etc.
--
Mark Ping
(E-Mail Removed)
 
Reply With Quote
 
Jim Langston
Guest
Posts: n/a
 
      07-13-2005

"E. Mark Ping" <(E-Mail Removed)> wrote in message
news:dap02i$jta$(E-Mail Removed)...
> In article <(E-Mail Removed) .com>,
> <(E-Mail Removed)> wrote:
>>In a program randomly generating 10 000 000 alphanumeric codes of 16
>>characters in length (Ex.: "ZAZAZAZAZAZAZ156"), what would be an
>>efficient way to ensure that I do not generate duplicates?

>
> More information would make it possible to answer in something other
> than guesses.
>
> Why are you generating all these strings (how will they be used)? Why
> is duplication an issue? etc.
> --
> Mark Ping
> (E-Mail Removed)


And why can't you just use a UUID generator which is guaranteed to be
unique?


 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Math.random() and Math.round(Math.random()) and Math.floor(Math.random()*2) VK Javascript 15 05-02-2010 03:43 PM
random.random(), random not defined!? globalrev Python 4 04-20-2008 08:12 AM
Generating random strings Mike P ASP .Net 6 05-31-2007 12:52 PM
Generating random password strings in JavaScript avanti Javascript 14 01-04-2007 04:40 AM
Strings, Strings and Damned Strings Ben C Programming 14 06-24-2006 05:09 AM



Advertisments