# two interesting data structure/algorithm questions

de
Guest
Posts: n/a

 11-16-2007
First of all, sorry for cross-posting. I couldn't find a dedicated
newsgroup for this.

My friend asked me two questions about data structures and algorithms.
I think she got them from some job interview books. I found them
interesting but haven't come up with good solutions yet. Could someone
offer some suggestions?

Q1: Suppose you are looking up a contact name in your cell phone's
address book. Suppose your contact's name is "abc123". You type
"a" in the search box. The cell phone shows all names that start with
the letter "a" in sorted order. Then you type "b" after the first "a". The
cell phone shows all names that start with the letters "ab" in sorted order.
Then you type "c" after the "ab" and the cell phone shows all names
that start with the letters "abc" in sorted order. You keep typing the
contact name and the list shown by your cell phone keeps updating and
shrinking until at last either you find your contact name or you
don't.
Question: what's the best data structure and algorithm to implement
this real-time lookup efficiently, and what's the time complexity of
the implementation?

Q2: Suppose you have a book written in English. Your goal is to search
for an arbitrary string and return the page numbers where this string
is found. More than one page number could be returned if the
string spans a page boundary or if it's found on multiple pages. For
example, you have the string "this is an " on page 20 and then " interesting
book" on page 21. The string you search for could be
"this" or "this is" or "is an" or " an interesting" or "an interesting
book". Then you'll need to return page numbers [20], [20], [20],
[20,21], [20,21] respectively. Of course, if you look for "book", it
could return hundreds of page numbers since "book" is really common.
Question: what's the best data structure and algorithm to implement
this real-time lookup efficiently, and what's the time complexity of
the implementation?
I could use a hash table, but since the string to look up can be
arbitrary, the hash table would be huge...

Richard Heathfield
Guest
Posts: n/a

 11-16-2007
de said:

> First of all, sorry for cross-posting. I couldn't find a dedicated
> newsgroup for this.
>
> My friend asked me two questions about data structures and algorithms.

comp.programming might have been a better place.

> I think she got them from some job interview books. I found them
> interesting but haven't come up with good solutions yet. Could someone
> offer some suggestions?
>
> Q1: Suppose you are looking up a contact name in your cell phone's
> address book. Suppose your contact's name is "abc123". You type
> "a" in the search box. The cell phone shows all names that start with
> the letter "a" in sorted order. Then you type "b" after the first "a". The
> cell phone shows all names that start with the letters "ab" in sorted order.
> Then you type "c" [etc]
> Question: what's the best data structure and algorithm to implement
> this real-time lookup efficiently, and what's the time complexity of
> the implementation?

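You want a trie for that. Finding the node for the prefix typed so far
costs O(k), where k is the number of characters typed; it does not depend
on the number of names stored.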
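
A minimal sketch of the trie idea, assuming C++17 and plain ASCII names. The names `TrieNode`, `Trie`, `insert` and `with_prefix` are illustrative only and do not come from any library. Children are kept in a `std::map` so that a depth-first walk yields the matches in sorted order:

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

struct TrieNode {
    bool is_name = false;  // true if a stored name ends at this node
    std::map<char, std::unique_ptr<TrieNode>> children;  // ordered by character
};

class Trie {
public:
    // Adds one name, creating nodes along its path as needed.
    void insert(const std::string& name) {
        TrieNode* node = &root_;
        for (char c : name) {
            auto& child = node->children[c];
            if (!child) child = std::make_unique<TrieNode>();
            node = child.get();
        }
        node->is_name = true;
    }

    // Returns every stored name that starts with prefix, in sorted order.
    std::vector<std::string> with_prefix(const std::string& prefix) const {
        const TrieNode* node = &root_;
        for (char c : prefix) {
            auto it = node->children.find(c);
            if (it == node->children.end()) return {};  // no name has this prefix
            node = it->second.get();
        }
        std::vector<std::string> out;
        std::string current = prefix;
        collect(node, current, out);
        return out;
    }

private:
    // Depth-first walk of the subtree below node, appending complete names.
    static void collect(const TrieNode* node, std::string& current,
                        std::vector<std::string>& out) {
        assert(node != nullptr);
        if (node->is_name) out.push_back(current);
        for (const auto& [c, child] : node->children) {
            current.push_back(c);
            collect(child.get(), current, out);
            current.pop_back();
        }
    }

    TrieNode root_;
};
```

Walking to the prefix node touches one node per typed character; listing the matches then costs time proportional to the size of the subtree below that node. A real phone would probably remember the current node between keystrokes rather than restart from the root each time.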

> Q2: Suppose you have a book written in English. Your goal is to search
> for an arbitrary string and return the page numbers where this string
> is found. More than one page number could be returned if the
> string spans a page boundary or if it's found on multiple pages. For
> example, you have the string "this is an " on page 20 and then " interesting
> book" on page 21. The string you search for could be
> "this" or "this is" or "is an" or " an interesting" or "an interesting
> book". Then you'll need to return page numbers [20], [20], [20],
> [20,21], [20,21] respectively. Of course, if you look for "book", it
> could return hundreds of page numbers since "book" is really common.
> Question: what's the best data structure and algorithm to implement
> this real-time lookup efficiently, and what's the time complexity of
> the implementation?

You might want to ask Google about this. It's their area of expertise.

--
Richard Heathfield <http://www.cpax.org.uk>
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999

James Kanze
Guest
Posts: n/a

 11-16-2007
On Nov 16, 7:29 pm, de wrote:
> First of all, sorry for cross-posting. I couldn't find a dedicated
> newsgroup for this.

> My friend asked me two questions about data structures and algorithms.
> I think she got them from some job interview books. I found them
> interesting but haven't come up with good solutions yet. Could someone
> offer some suggestions?

> Q1: Suppose you are looking up a contact name in your cell phone's
> address book. Suppose your contact's name is "abc123". You type
> "a" in the search box. The cell phone shows all names that start with
> the letter "a" in sorted order. Then you type "b" after the first "a". The
> cell phone shows all names that start with the letters "ab" in sorted order.
> Then you type "c" after the "ab" and the cell phone shows all names
> that start with the letters "abc" in sorted order. You keep typing the
> contact name and the list shown by your cell phone keeps updating and
> shrinking until at last either you find your contact name or you
> don't.
> Question: what's the best data structure and algorithm to implement
> this real-time lookup efficiently, and what's the time complexity of
> the implementation?

If the question is about algorithms, the answer they are looking
for is probably a trie, but if I were writing the program in
C++, I'd actually probably use a sorted std::vector, with
std::upper_bound and std::lower_bound (appending a '\0' to the
end of the string when I call lower_bound, and a '\xFF' for
upper_bound), switching to a straight linear search once the
difference between top and bottom fell below a certain threshold.
The result will use a lot less memory, and will probably be faster
as well. And of course, most of the code is already written.
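
A rough sketch of the sorted-vector approach, assuming ASCII names and C++11 or later (where `std::string` comparison treats characters as unsigned, so '\xFF' sorts last). The function name `prefix_range` is made up for illustration. One deliberate deviation from the description above: the lower bound is taken on the bare prefix rather than on prefix plus '\0', so that a name exactly equal to the prefix is included:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

using NameIter = std::vector<std::string>::const_iterator;

// Returns the half-open range [first, last) of names that start with prefix.
// Precondition: names is sorted and no name contains the byte 0xFF.
std::pair<NameIter, NameIter> prefix_range(const std::vector<std::string>& names,
                                           const std::string& prefix) {
    assert(std::is_sorted(names.begin(), names.end()));
    // First name that is not less than the prefix itself.
    NameIter first = std::lower_bound(names.begin(), names.end(), prefix);
    // First name greater than prefix followed by the largest byte value.
    NameIter last = std::upper_bound(first, names.end(), prefix + '\xFF');
    return {first, last};
}
```

Each keystroke costs two binary searches, O(log n) string comparisons each, and the matches are already in sorted order. Since every new keystroke can only narrow the previous range, the search could be restricted to that range, and the switch to a linear scan for small ranges is left out here for brevity.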

> Q2: Suppose you have a book written in English. Your goal is to search
> for an arbitrary string and return the page numbers where this string
> is found. More than one page number could be returned if the
> string spans a page boundary or if it's found on multiple pages. For
> example, you have the string "this is an " on page 20 and then " interesting
> book" on page 21. The string you search for could be
> "this" or "this is" or "is an" or " an interesting" or "an interesting
> book". Then you'll need to return page numbers [20], [20], [20],
> [20,21], [20,21] respectively. Of course, if you look for "book", it
> could return hundreds of page numbers since "book" is really common.
> Question: what's the best data structure and algorithm to implement
> this real-time lookup efficiently, and what's the time complexity of
> the implementation?
> I could use a hash table, but since the string to look up can be
> arbitrary, the hash table would be huge...

Too large to fit into memory, or even onto a large disk.

The question is how the data is organized. If the text is in a
single array (std::vector<char>, or whatever), and the page
boundaries are maintained as a separate table of positions in
the array (and everything fits in memory), the fastest solution
is probably a Boyer-Moore (BM) search to find the string, then a
binary search (std::lower_bound and std::upper_bound again) to find
the page numbers. (Unless you're getting a hit on every page,
execution time will be dominated by the string search, so organizing
the data as above, to optimize this, is probably the best way to go.
And if the strings being searched for are reasonably long, a BM
search will beat anything in the standard library, perhaps even
by an order of magnitude or more. So it's worth it, even though
you have to write it yourself.)
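
A sketch of this layout, under a few assumptions: the whole book is one `std::string`, `page_starts[i]` holds the offset of the first character of the i-th page, and the function name `pages_containing` and its `first_page` parameter are invented for the example. It relies on `std::boyer_moore_searcher`, which was added to the standard library in C++17, well after this thread, so with a C++17 compiler a hand-written Boyer-Moore is no longer needed:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <string>
#include <vector>

// Returns the page numbers on which any occurrence of needle lies,
// including both pages when an occurrence spans a page boundary.
// Precondition: page_starts is sorted and its first entry is 0.
std::set<int> pages_containing(const std::string& text,
                               const std::vector<std::size_t>& page_starts,
                               const std::string& needle,
                               int first_page = 1) {
    assert(!page_starts.empty() && page_starts.front() == 0);
    std::set<int> pages;
    if (needle.empty()) return pages;

    // Maps a character offset to its page with a binary search.
    auto page_of = [&](std::size_t offset) {
        auto it = std::upper_bound(page_starts.begin(), page_starts.end(), offset);
        return first_page + static_cast<int>(it - page_starts.begin()) - 1;
    };

    std::boyer_moore_searcher searcher(needle.begin(), needle.end());
    auto it = text.begin();
    while (true) {
        it = std::search(it, text.end(), searcher);
        if (it == text.end()) break;
        std::size_t begin = static_cast<std::size_t>(it - text.begin());
        std::size_t end = begin + needle.size() - 1;  // last matched character
        for (int p = page_of(begin); p <= page_of(end); ++p) pages.insert(p);
        ++it;  // continue after the start of this hit to catch overlaps
    }
    return pages;
}
```

Each hit costs two binary searches over the page table, which is negligible next to the scan of the text itself. Nothing is precomputed from the book beyond the page table, so memory use stays at the size of the text.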

--
James Kanze (GABI Software)
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

user923005
Guest
Posts: n/a

 11-16-2007
On Nov 16, 10:29 am, de wrote:
> First of all, sorry for cross-posting. I couldn't find a dedicated
> newsgroup for this.
>
> My friend asked me two questions about data structures and algorithms.
> I think she got them from some job interview books. I found them
> interesting but haven't come up with good solutions yet. Could someone
> offer some suggestions?
>
> Q1: Suppose you are looking up a contact name in your cell phone's
> address book. Suppose your contact's name is "abc123". You type
> "a" in the search box. The cell phone shows all names that start with
> the letter "a" in sorted order. Then you type "b" after the first "a". The
> cell phone shows all names that start with the letters "ab" in sorted order.
> Then you type "c" after the "ab" and the cell phone shows all names
> that start with the letters "abc" in sorted order. You keep typing the
> contact name and the list shown by your cell phone keeps updating and
> shrinking until at last either you find your contact name or you
> don't.
> Question: what's the best data structure and algorithm to implement
> this real-time lookup efficiently, and what's the time complexity of
> the implementation?

Jon L. Bentley (Bell Labs, Lucent Technologies, Murray Hill, NJ) and
Robert Sedgewick (Princeton University, Princeton, NJ),
"Fast algorithms for sorting and searching strings",
Proceedings of the Eighth Annual ACM-SIAM Symposium on Discrete
Algorithms, New Orleans, Louisiana, United States, 1997, pages 360-369.
Society for Industrial and Applied Mathematics, Philadelphia, PA, USA.
ISBN 0-89871-390-0.

> Q2: Suppose you have a book written in English. Your goal is to search
> for an arbitrary string and return the page numbers where this string
> is found. More than one page number could be returned if the
> string spans a page boundary or if it's found on multiple pages. For
> example, you have the string "this is an " on page 20 and then " interesting
> book" on page 21. The string you search for could be
> "this" or "this is" or "is an" or " an interesting" or "an interesting
> book". Then you'll need to return page numbers [20], [20], [20],
> [20,21], [20,21] respectively. Of course, if you look for "book", it
> could return hundreds of page numbers since "book" is really common.
> Question: what's the best data structure and algorithm to implement
> this real-time lookup efficiently, and what's the time complexity of
> the implementation?
> I could use a hash table, but since the string to look up can be
> arbitrary, the hash table would be huge...

http://clucene.sourceforge.net/index.php/Main_Page

HTH

Follow-up set to news:comp.programming
