Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > C++ > a question on google's algorithm

Reply
Thread Tools

a question on google's algorithm

 
 
%NAME%
Guest
Posts: n/a
 
      02-21-2007
The title is not exactly a disnomer, so pleaes let me explain...

Say I have a large number of 0-1 vectors containing only
bits corresponding to 0 or 1. Now given an arbitrary 0-1 vector,
I wish to find all the 0-1 vectors that are subsumed by that query bit
vector.

For example, given bit vector query as 01100, i wish to find all bit
vectors that whose 1st, 4th, 5th bits are also zero, leaving
the 2nd, 3rd bits as either 0 or 1.

What I wish to do is to build an index on the bit vectors, so that I
could avoid checking each 0-1 vector in sequence. In another word, I
hope this index could help me quickly filter away those vectors that
guarunteed not to be subsumed by the given 0-1 vector.

I am not sure this is the right place to ask, but the index could be
used to solve a search engine problem: the 0-1 vector
corresponds to my keywords search (a 0 means that keyword NOT in my
query, while a 1 means the keyword is in), and all the webpages
corresponds to 0-1 vectors. I wish to find all the webpages that
subsumes my query.

If anyone happen to know how Google solves this problem, and it is not
confidential, pls let me know, thanks a bunch!!!!

 
Reply With Quote
 
 
 
 
Alf P. Steinbach
Guest
Posts: n/a
 
      02-21-2007
* %NAME%:
> [interesting question, cross-posted to numerous groups]


No C++ content means OFF-TOPIC in [comp.lang.c++].

Thank you.

FUT: [comp.programming]

--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
 
Reply With Quote
 
 
 
 
Bill Reid
Guest
Posts: n/a
 
      02-21-2007

%NAME% <(E-Mail Removed)> wrote in message
news:(E-Mail Removed) oups.com...

> The title is not exactly a disnomer, so pleaes let me explain...
>
> Say I have a large number of 0-1 vectors containing only
> bits corresponding to 0 or 1. Now given an arbitrary 0-1 vector,
> I wish to find all the 0-1 vectors that are subsumed by that query bit
> vector.
>
> For example, given bit vector query as 01100, i wish to find all bit
> vectors that whose 1st, 4th, 5th bits are also zero, leaving
> the 2nd, 3rd bits as either 0 or 1.
>
> What I wish to do is to build an index on the bit vectors, so that I
> could avoid checking each 0-1 vector in sequence. In another word, I
> hope this index could help me quickly filter away those vectors that
> guarunteed not to be subsumed by the given 0-1 vector.
>
> I am not sure this is the right place to ask, but the index could be
> used to solve a search engine problem: the 0-1 vector
> corresponds to my keywords search (a 0 means that keyword NOT in my
> query, while a 1 means the keyword is in), and all the webpages
> corresponds to 0-1 vectors. I wish to find all the webpages that
> subsumes my query.
>
> If anyone happen to know how Google solves this problem, and it is not
> confidential, pls let me know, thanks a bunch!!!!
>

Have you tried Googling(TM) your question?

---
William Ernest Reid



 
Reply With Quote
 
mensanator@aol.com
Guest
Posts: n/a
 
      02-21-2007
On Feb 20, 6:56 pm, "%NAME%" <(E-Mail Removed)> wrote:
> The title is not exactly a disnomer, so pleaes let me explain...
>
> Say I have a large number of 0-1 vectors containing only
> bits corresponding to 0 or 1. Now given an arbitrary 0-1 vector,
> I wish to find all the 0-1 vectors that are subsumed by that query bit
> vector.
>
> For example, given bit vector query as 01100, i wish to find all bit
> vectors that whose 1st, 4th, 5th bits are also zero, leaving
> the 2nd, 3rd bits as either 0 or 1.
>
> What I wish to do is to build an index on the bit vectors, so that I
> could avoid checking each 0-1 vector in sequence. In another word, I
> hope this index could help me quickly filter away those vectors that
> guarunteed not to be subsumed by the given 0-1 vector.
>
> I am not sure this is the right place to ask, but the index could be
> used to solve a search engine problem: the 0-1 vector
> corresponds to my keywords search (a 0 means that keyword NOT in my
> query, while a 1 means the keyword is in), and all the webpages
> corresponds to 0-1 vectors. I wish to find all the webpages that
> subsumes my query.
>
> If anyone happen to know how Google solves this problem, and it is not
> confidential, pls let me know, thanks a bunch!!!!


Suppose I have a database of all the words from the
Official Scrabble Players Dictionary (OSPD). I can
create a consonant/vowel vector for all 5-letter
words using the SQL statement:

SELECT OSPD_words.OSPD,
IIf(Mid$([OSPD],1,1) Like "[a,e,i,o,u]",1,0) AS bit1,
IIf(Mid$([OSPD],2,1) Like "[a,e,i,o,u]",1,0) AS bit2,
IIf(Mid$([OSPD],3,1) Like "[a,e,i,o,u]",1,0) AS bit3,
IIf(Mid$([OSPD],4,1) Like "[a,e,i,o,u]",1,0) AS bit4,
IIf(Mid$([OSPD],5,1) Like "[a,e,i,o,u]",1,0) AS bit5,
[bit1] & [bit2] & [bit3] & [bit4] & [bit5] AS vector
FROM OSPD_words
WHERE (((Len([OSPD]))=5));

which produces:

OSPD bit1 bit2 bit3 bit4 bit5 vector
added 1 0 0 1 0 10010
adder 1 0 0 1 0 10010
addle 1 0 0 0 1 10001
adeem 1 0 1 1 0 10110
adept 1 0 1 0 0 10100
adieu 1 0 1 1 1 10111
adios 1 0 1 1 0 10110
adits 1 0 1 0 0 10100
adman 1 0 0 1 0 10010
admen 1 0 0 1 0 10010
admit 1 0 0 1 0 10010
admix 1 0 0 1 0 10010
adobe 1 0 1 0 1 10101
adobo 1 0 1 0 1 10101
..
..
..

Once I have those vectors, I can then query them to
find anything that has the 1st, 4th & 5th letters
as consonants and the 2nd & 3rd letters can be either
consonants or vowels. This is done using the
like "0??00" pattern matching criteria. The question
marks are single character wild cards:

SELECT vector_test1.OSPD, vector_test1.vector
FROM vector_test1
WHERE (((vector_test1.vector) Like "0??00"));

which filters the above output to show only the
requested vectors:

OSPD vector
baals 01100
backs 01000
baddy 01000
badly 01000
baffs 01000
baffy 01000
baggy 01000
bahts 01000
bails 01100
bairn 01100
baith 01100
baits 01100
..
..
..
brach 00100
bract 00100
brads 00100
brags 00100
braky 00100
brand 00100
..
..
..

And because we didn'y define "y" as a vowel, there
are even some 5-letter words with no vowels:

OSPD vector
ghyll 00000
lymph 00000
hymns 00000
gypsy 00000
crypt 00000
dryly 00000
lynch 00000
synth 00000
glyph 00000
synch 00000

 
Reply With Quote
 
Jerry Coffin
Guest
Posts: n/a
 
      02-21-2007
In article <(E-Mail Removed) .com>,
http://www.velocityreviews.com/forums/(E-Mail Removed) says...
> The title is not exactly a disnomer, so pleaes let me explain...
>
> Say I have a large number of 0-1 vectors containing only
> bits corresponding to 0 or 1. Now given an arbitrary 0-1 vector,
> I wish to find all the 0-1 vectors that are subsumed by that query bit
> vector.
>
> For example, given bit vector query as 01100, i wish to find all bit
> vectors that whose 1st, 4th, 5th bits are also zero, leaving
> the 2nd, 3rd bits as either 0 or 1.


Assuming your bit-vectors were stored in an integer type, you could use
a comparison like this:

if ((bit_vector | pattern) == pattern)
// found
else
// not found

If you need to store larger bit-vectors than will fit in any integer
type (only 32 are guaranteed), you can look at std::bit_vector and/or
std::vector<bool>. IIRC, bit_vector supports boolean operations
directly, but for vector<bool> you'll probably need to write your own.

--
Later,
Jerry.

The universe is a figment of its own imagination.
 
Reply With Quote
 
CBFalconer
Guest
Posts: n/a
 
      02-21-2007
Jerry Coffin wrote:
> (E-Mail Removed) says...
>
>> The title is not exactly a disnomer, so pleaes let me explain...
>>
>> Say I have a large number of 0-1 vectors containing only bits
>> corresponding to 0 or 1. Now given an arbitrary 0-1 vector,
>> I wish to find all the 0-1 vectors that are subsumed by that
>> query bit vector.
>>
>> For example, given bit vector query as 01100, i wish to find
>> all bit vectors that whose 1st, 4th, 5th bits are also zero,
>> leaving the 2nd, 3rd bits as either 0 or 1.

>
> Assuming your bit-vectors were stored in an integer type, you
> could use a comparison like this:
>
> if ((bit_vector | pattern) == pattern)
> // found
> else
> // not found
>
> If you need to store larger bit-vectors than will fit in any
> integer type (only 32 are guaranteed), you can look at
> std::bit_vector and/or std::vector<bool>. IIRC, bit_vector
> supports boolean operations directly, but for vector<bool>
> you'll probably need to write your own.


The fault is basically that of the OP, but please do not post
off-topic information on c.l.c. It is easy to overlook the foolish
cross-posting, but you should not answer in any group where the
answer (and question) are off topic. F'ups set.

--
<http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>
<http://www.securityfocus.com/columnists/423>

"A man who is right every time is not likely to do very much."
-- Francis Crick, co-discover of DNA
"There is nothing more amazing than stupidity in action."
-- Thomas Matthews


 
Reply With Quote
 
Randy Howard
Guest
Posts: n/a
 
      03-06-2007
On Tue, 20 Feb 2007 18:56:19 -0600, NAME% wrote
(in article <(E-Mail Removed) .com>):

> The title is not exactly a disnomer, so pleaes let me explain...


What in the world is a disnomer? Is this another one of those "we made
up a word because the mood struck us" deals?


--
Randy Howard (2reply remove FOOBAR)
"The power of accurate observation is called cynicism by those
who have not got it." - George Bernard Shaw





 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Filtered Back Projection Algorithm (FBP Algorithm) Bapaiah Katepalli VHDL 1 06-23-2006 04:50 PM
ForLoop Performance / Algorithm question BlueBall Java 12 04-15-2006 03:45 AM
Graph Algorithm Question Anna Smith Java 2 04-24-2004 04:55 PM
Key generation algorithm and Cipher algorithm Ahmed Moustafa Java 0 11-15-2003 06:35 AM
Search algorithm question ortoped Java 1 08-19-2003 09:53 PM



Advertisments