Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > C++ > Expected Token Density in Random Stream

Reply
Thread Tools

Expected Token Density in Random Stream

 
 
Andrew Tomazos
Guest
Posts: n/a
 
      12-07-2011
Summary: We want to find out how often a given token appears in a
random stream formed by concatenating randomly chosen strings from a
given set of strings.

Details:

Given S, an array of n (non-empty) strings; and T, a string of length
m;

We create a random stream of characters by the following process:

1. assign i = a (uniformly) random integer between 1 and n
inclusive
2. write the characters of the ith element of S to the stream
3. goto 1

(The elements of S are not necessarily the same length)

For example:

if S = {"a", "bug", "ug"}

then the stream may start as follows:

augbugugugaabug...

Each time T appears in the stream we call that a hit. More precisely:

1. Initialize a queue Q with m null elements

2. If Q equals T record a hit
3. Pop front of Q
4. Push to back of Q next char from stream
5. Goto 2

For example:

If T = "ugu"...

then:

augbugugugaabug...
1 2

We get two hits so far.

(Note hits can overlap each other)

What is the expected frequency of hits in terms of S and T?

More precisely:

let y be the number of chars read from the stream so far
let x be the number of hits

What is the expected limit of x/y as y approaches infinity?

We can approximate this empirically as follows in C++:

$ cat > HitFinder.cpp
<paste below code>
$ g++ -o HitFinder HitFinder.cpp
$ ./HitFinder ugu a bug ug
T = ugu
A = {a, bug, ug}

(x, y, x/y) = (0, 1, 0)
(x, y, x/y) = (0, 2, 0)
(x, y, x/y) = (0, 4, 0)
(x, y, x/y) = (0, 8, 0)
(x, y, x/y) = (1, 16, 1/16)
(x, y, x/y) = (3, 32, 1/10.6667)
(x, y, x/y) = (8, 64, 1/
(x, y, x/y) = (16, 128, 1/
...
(x, y, x/y) = (29821708, 268435456, 1/9.00134)
(x, y, x/y) = (59648899, 536870912, 1/9.00052)
(x, y, x/y) = (119304725, 1073741824, 1/8.99999)
(x, y, x/y) = (238593314, 2147483648, 1/9.0006)
(x, y, x/y) = (477198099, 4294967296, 1/9.00039)

As you can see the answer for this converges on 1/9

How can we calculate this figure by direct computation?

ie How would you implement a function with signature...

double ExpectedHitLimit(string T, vector<string> A)

....that quickly calculates this limit for any given T and A?

Thanks and have fun,
Andrew.

--
Andrew Tomazos <(E-Mail Removed)> <http://www.tomazos.com>


// ======================= HitFinder.cpp Cut Here
========================
// (C) 2011, Andrew Tomazos <(E-Mail Removed)>. All Rights
Reserved.

#include <queue>
#include <string>
#include <iostream>
#include <limits>
#include <cstdlib>

using namespace std;

typedef long long int64;
bool IsPow2(int64 i)
{
return i != 0 && !(i & (i - 1));
}

int64 x = 0;

void Hit()
{
x++;
}

int Rand(int n)
{
return rand() % n;
}

class RandCharStream
{
public:
vector<string> S;
size_t n, istr, ichar;

RandCharStream(vector<string> S_) :
S(S_),
n(S.size()),
istr(Rand(n)),
ichar(0)
{
}

char nextChar()
{
if (ichar == S[istr].size())
{
istr = Rand(n);
ichar = 0;
}

return S[istr][ichar++];
}
};

class HitFinder
{
public:
RandCharStream& stream;
queue<char> T;
int m;
queue<char> Q;

HitFinder(RandCharStream& stream_, string T_) :
stream(stream_),
m(T_.size())
{
for (int i = 0; i < m; i++)
{
T.push(T_[i]);
Q.push('\0');
}
}

void nextChar()
{
Q.pop();
Q.push(stream.nextChar());

if (Q == T)
Hit();
}
};

int main(int argc, char** argv)
{
if (argc < 3)
{
cerr << "Usage: " << argv[0] << " <T> <S1> <S2> ... <Sn>" << endl;
return -1;
}

string T = argv[1];
cout << "T = " << T << endl;

cout << "A = {";
vector<string> A;
for (int i = 2; i < argc; i++)
{
string s = argv[i];
cout << s << (i < argc - 1 ? ", " : "");
A.push_back(s);
if (s.empty())
{
cerr << "Element S" << i - 1 << " is empty" << endl;
return -1;
}
}
cout << "}" << endl;
cout << endl;

RandCharStream stream(A);
HitFinder finder(stream, T);

for (int64 y = 1; y < __LONG_LONG_MAX__; y++)
{
finder.nextChar();

if (IsPow2(y))
{
cout << "(x, y, x/y) = (" << x << ", " << y << ", ";
if (x == 0)
cout << "0";
else
cout << "1/" << double(y) / double(x);
cout << ")" << endl;
}
}

return 0;
}
//======================= HitFinder.cpp Cut Here
========================
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
random.random(), random not defined!? globalrev Python 4 04-20-2008 08:12 AM
This is an unexpected token. The expected token is 'NAME' =?Utf-8?B?Y2FzaGRlc2ttYWM=?= ASP .Net 2 07-13-2007 11:38 AM
Token pasting (## operator) - Add whitespace to a token Wessi C Programming 3 08-11-2005 01:02 PM
"token" "token sequence" "scalar variable" "vector" ?? G Fernandes C Programming 1 02-18-2005 05:32 AM
preprocessor, token concatenation, no valid preprocessor token Cronus C++ 1 07-14-2004 11:10 PM



Advertisments