Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Java > find words that contains some specific letters

Reply
Thread Tools

find words that contains some specific letters

 
 
Fralentino
Guest
Posts: n/a
 
      06-01-2009
Hi!

I want to make an algorithm that searches for all words in a
dictionary that contains some specific set of letters. For example i
want to find all words containing the exact letters t,a,s,m,p so the
word stamp is ok.

So basically i want to compare a String with a set of letters and find
out if you can build this string with these exact letter. I have a
solution for this but i dont think it's the the most optimal one so i
would like som tips on how do to this in a optimal and rather easy
way. Is there some recommended designs in how to do this? Is there
perhaps some method or class in the java api that does this for you?

Thanks.
/Baretos
 
Reply With Quote
 
 
 
 
Andrew Thompson
Guest
Posts: n/a
 
      06-01-2009
On Jun 1, 4:16*pm, Fralentino <bare...@gmail.com> wrote:
>...I have a solution for this ..


What is your solution?

--
Andrew T.
pscode.org
 
Reply With Quote
 
 
 
 
Fralentino
Guest
Posts: n/a
 
      06-01-2009
On 1 Juni, 09:20, Andrew Thompson <andrewtho...@gmail.com> wrote:
> On Jun 1, 4:16*pm, Fralentino <bare...@gmail.com> wrote:
>
> >...I have a solution for this ..

>
> What is your solution?
>
> --
> Andrew T.
> pscode.org


My solution:
boolean compareLetters(String toBeCompared,ArrayList<String> letters)
{

for(int i = 0;i<toBeCompared.length();i++)
{
if(letters.contains(""+toBeCompared.charAt(i)))
{
letters.remove(""+toBeCompared.charAt(i));
}
else
return false;
}
return true;

}
 
Reply With Quote
 
John B. Matthews
Guest
Posts: n/a
 
      06-01-2009
In article
<d121e6f6-6afa-43e7-bcd1->,
Fralentino <> wrote:

> On 1 Juni, 09:20, Andrew Thompson <andrewtho...@gmail.com> wrote:
> > On Jun 1, 4:16Â*pm, Fralentino <bare...@gmail.com> wrote:
> >
> > >...I have a solution for this ..

> >
> > What is your solution?

>
> My solution:
> boolean compareLetters(String toBeCompared,ArrayList<String> letters)
> {
>
> for(int i = 0;i<toBeCompared.length();i++)
> {
> if(letters.contains(""+toBeCompared.charAt(i)))
> {
> letters.remove(""+toBeCompared.charAt(i));
> }
> else
> return false;
> }
> return true;
>
> }


IIUC, you then have to run this method on every word in your dictionary.

Another approach is to permute the letters and search the dictionary.
A binary search of an ordered word list is O(log n), worst case.
Here's an example in Ada:

<http://home.roadrunner.com/~jbmatthews/jumble.html>

In Java, permutations can be checked quickly with the contains() method
of the Set interface, once the dictionary is read:

private static final String NAME = "/usr/share/dict/words";
private static final Set<String> set = new TreeSet<String>();
static {
try {
File file = new File(NAME);
BufferedReader in = new BufferedReader(
new InputStreamReader(new FileInputStream(file)));
String s;
while ((s = in.readLine()) != null) {
set.add(s);
}
} catch (IOException ex) {
System.err.println(ex.getMessage());
}
}

Finding the permutations of a string in Java is straightforward:

<http://www.google.com/search?q=java+permute>

--
John B. Matthews
trashgod at gmail dot com
<http://sites.google.com/site/drjohnbmatthews>
 
Reply With Quote
 
Giovanni Azua
Guest
Posts: n/a
 
      06-01-2009
Hi John,

"John B. Matthews" <> wrote in
> Another approach is to permute the letters and search the dictionary.
> A binary search of an ordered word list is O(log n), worst case.
> Here's an example in Ada:
>
> <http://home.roadrunner.com/~jbmatthews/jumble.html>
>
> In Java, permutations can be checked quickly with the contains() method
> of the Set interface, once the dictionary is read:
>
> private static final String NAME = "/usr/share/dict/words";
> private static final Set<String> set = new TreeSet<String>();
> static {
> try {
> File file = new File(NAME);
> BufferedReader in = new BufferedReader(
> new InputStreamReader(new FileInputStream(file)));
> String s;
> while ((s = in.readLine()) != null) {
> set.add(s);
> }
> } catch (IOException ex) {
> System.err.println(ex.getMessage());
> }
> }
>
> Finding the permutations of a string in Java is straightforward:
>
> <http://www.google.com/search?q=java+permute>
>

I was wondering what the overall lookup complexity would be here ...

Assuming "n" being the number of letters in a search word and "m" the total
number of words in the dictionary then the complexity would be:

o(n! * log2 m) => O(n! * log m) isn't this NP complexity? I am looking for
the exponential function that approximates n! I remember in class we used to
evaluate complexity using the O notation with exponential function rather
than n! ...

Best regards,
Giovanni


 
Reply With Quote
 
Giovanni Azua
Guest
Posts: n/a
 
      06-01-2009

"Giovanni Azua" <> wrote in message
> I was wondering what the overall lookup complexity would be here ...
>
> Assuming "n" being the number of letters in a search word and "m" the
> total number of words in the dictionary then the complexity would be:
>
> o(n! * log2 m) => O(n! * log m) isn't this NP complexity? I am looking for
> the exponential function that approximates n! I remember in class we used
> to evaluate complexity using the O notation with exponential function
> rather than n! ...
>

I found it here "Stirling's approximation"
<http://en.wikipedia.org/wiki/Factorial#Rate_of_growth>

so it would be o(n! * log2 m) => O(n^n * log m) and this is NP ...

Best regards,
Giovanni


 
Reply With Quote
 
Lew
Guest
Posts: n/a
 
      06-01-2009
Giovanni Azua wrote:
> "Giovanni Azua" <> wrote in message
>> I was wondering what the overall lookup complexity would be here ...
>>
>> Assuming "n" being the number of letters in a search word and "m" the
>> total number of words in the dictionary then the complexity would be:
>>
>> o(n! * log2 m) => O(n! * log m) isn't this NP complexity? I am looking for
>> the exponential function that approximates n! I remember in class we used
>> to evaluate complexity using the O notation with exponential function
>> rather than n! ...
>>

> I found it here "Stirling's approximation"
> <http://en.wikipedia.org/wiki/Factorial#Rate_of_growth>
>
> so it would be o(n! * log2 m) => O(n^n * log m) and this is NP ...


You only build the Set of permuted letters once. Then, following John
Matthews's suggestion to which you replied, you look up the n! permutations
with an O(1) lookup in the Set of dictionary words. So the actual complexity
is just O(n!)

Or one build the dictionary as a Map indexed by word letters in alphabetical
order with the values being corresponding Sets of words using those letters.
Then you only do an O(1) lookup into the Map to find the single ordered
permutation of the search term, then return the matching Set directly. So now
the overall lookup complexity is that of sorting the letters in the search term.

--
Lew
 
Reply With Quote
 
Arved Sandstrom
Guest
Posts: n/a
 
      06-01-2009
Fralentino wrote:
> Hi!
>
> I want to make an algorithm that searches for all words in a
> dictionary that contains some specific set of letters. For example i
> want to find all words containing the exact letters t,a,s,m,p so the
> word stamp is ok.
>
> So basically i want to compare a String with a set of letters and find
> out if you can build this string with these exact letter. I have a
> solution for this but i dont think it's the the most optimal one so i
> would like som tips on how do to this in a optimal and rather easy
> way. Is there some recommended designs in how to do this? Is there
> perhaps some method or class in the java api that does this for you?
>
> Thanks.
> /Baretos


Assuming that we are talking about a fresh search every time, your
proposed algorithm (later post) is probably not all that bad, although
I'd be inclined to take the letters of interest, put them in a HashSet,
and then process each letter of a dictionary word and see if the set
contains it.

Any near-optimal method would also take into account the frequency of
letters in a language. For example, if you saw that your set of letters
contained one rare letter, for example, you'd save time overall by first
doing a search for this letter specifically in each dictionary word, and
discarding all of them (the large majority) that do not contain that
rare letter (*). Then you can do the comprehensive search.

AHS

* If your entire dictionary was composed into a single string, and you
maintained an offsets table, you could just burn through that large
string and look for the rare letter locations, and from the table figure
out what words contain it.
 
Reply With Quote
 
Giovanni Azua
Guest
Posts: n/a
 
      06-01-2009
Hi Lew,

"Lew" <> wrote in message
>> I found it here "Stirling's approximation"
>> <http://en.wikipedia.org/wiki/Factorial#Rate_of_growth>
>>
>> so it would be o(n! * log2 m) => O(n^n * log m) and this is NP ...

>
> You only build the Set of permuted letters once. Then, following John

Once per search to be precise.

> Matthews's suggestion to which you replied, you look up the n!
> permutations with an O(1) lookup in the Set of dictionary words. So the
> actual complexity is just O(n!)
>

One word lookup in the Set costs O(log m) binary search and not O(1).
Therefore the O(log m) is *for each* generated permutation, and this is why
the multiplication i.e. O(n! * log m)

> Or one build the dictionary as a Map indexed by word letters in
> alphabetical order with the values being corresponding Sets of words using
> those letters. Then you only do an O(1) lookup into the Map to find the
> single ordered permutation of the search term, then return the matching
> Set directly. So now the overall lookup complexity is that of sorting the
> letters in the search term.
>

I was writing meantime a similar algorithm to this one you explain ... you
have to watch for multiple occurrences of the same letter though and the Set
should be SortedSet so there is calculating intercept of the Sets which is
O(n) if the Sets are SortedSet.

Best regards,
Giovanni





 
Reply With Quote
 
Tom McGlynn
Guest
Posts: n/a
 
      06-01-2009
On Jun 1, 5:35*am, Fralentino <bare...@gmail.com> wrote:
> On 1 Juni, 09:20, Andrew Thompson <andrewtho...@gmail.com> wrote:
>
> > On Jun 1, 4:16*pm, Fralentino <bare...@gmail.com> wrote:

>
> > >...I have a solution for this ..

>


A somewhat different approach to the text manipulation methods
discussed elsewhere on the thread would be to map each letter to a
prime number and compute the product corresponding to the letters in
each word. You would be interested in words that had a specific value
of that product since permutations of the letters will not affect it
-- and since you are using primes factors you can't have any
interlopers. The 26th prime is 101, so you could do up to about 8
letter words easily with longs. After that you'd eventually need to
use BigIntegers. Using the small primes for common letters is an
obvious optimization (e=2,t=3,... for English). I've no idea how this
would stack up in efficiency with the other approaches.

Regards,
Tom McGlynn
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
problem in running a basic code in python 3.3.0 that includes HTML file Satabdi Mukherjee Python 1 04-04-2013 07:48 PM
In css is it posible to use as a selector part of the id? That is: Select if the ID contains the letters dp1 Cal Who ASP .Net 2 03-31-2010 11:10 PM
RE: find words that contains some specific letters [jumble] Giovanni Azua Java 4 07-28-2009 06:11 PM
making all letters Caps/Small Letters Merrigan Python 4 12-14-2007 10:10 AM
Changing from capital letters to small letters using perl Venugopal Perl Misc 11 11-05-2003 06:07 PM



Advertisments
 



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57