Velocity Reviews - Computer Hardware Reviews

Generate English words from a dictionary?

 
 
Alexander
Guest
Posts: n/a
 
      08-04-2010
What choices do I have if I need to be able to generate English words,
or check whether a word exists (like in spell checking)? Language
preferred - C or C++.
 
 
 
 
 
Öö Tiib
Guest
Posts: n/a
 
      08-04-2010
On 4 Aug, 20:46, Alexander <(E-Mail Removed)> wrote:
> What choices do I have if I need to be able to generate English words,
> or check whether a word exists (like in spell checking)? Language
> preferred - C or C++.


Spell checkers are part of well-known open source software such as
Mozilla Firefox and OpenOffice.org, probably written in C or C++. Just
download the source code and use it. What holds you back?
 
 
 
 
 
Juha Nieminen
Guest
Posts: n/a
 
      08-04-2010
Alexander <(E-Mail Removed)> wrote:
> What choices do I have if I need to be able to generate English words,
> or check whether a word exists (like in spell checking)? Language
> preferred - C or C++.


I find this question odd. What is the problem you are seeing? Because
to me it seems kind of *trivial*. Take a big list of words (you can find
extensive lists for free with a little bit of googling), put them sorted
in an array, and that's about it. You can search the array or index it
randomly to get a random word.

(Of course if what you want is for the dictionary to take as little
memory as possible, then that's a completely different optimization
problem, one that entire books have been written about.)
 
 
Alexander
Guest
Posts: n/a
 
      08-04-2010
On Aug 4, 9:11 pm, Juha Nieminen <(E-Mail Removed)> wrote:
> Alexander <(E-Mail Removed)> wrote:
> > What choices do I have if I need to be able to generate English words,
> > or check whether a word exists (like in spell checking)? Language
> > preferred - C or C++.

>
> I find this question odd. What is the problem you are seeing? Because
> to me it seems kind of *trivial*. Take a big list of words (you can find
> extensive lists for free with a little bit of googling), put them sorted
> in an array, and that's about it. You can search the array or index it
> randomly to get a random word.
>
> (Of course if what you want is for the dictionary to take as little
> memory as possible, then that's a completely different optimization
> problem, one that entire books have been written about.)


Yes, the first thing I want is to dynamically allocate a few MB of
memory, and find a good way to convince the user to wait for the
allocation, in case the program even succeeds.

I wanted to know what the usual practice is. The task is quite common, but
not necessarily trivial: even without storing the file in memory
(stupid), I have to find some quick enough algorithm to search for a
given word in it.
 
 
Victor Bazarov
Guest
Posts: n/a
 
      08-04-2010
On 8/4/2010 2:23 PM, Alexander wrote:
>[..]
> I wanted to know what the usual practice is. The task is quite common, but
> not necessarily trivial: even without storing the file in memory
> (stupid), I have to find some quick enough algorithm to search for a
> given word in it.


There are two kinds of people when it comes to dictionaries: those that
use the wheels available today on the market, and those that re-invent
their own. If asked, I would guess that the former kind is more
numerous. If you want to improve the existing wheels, the best way is
to get a job at one of the existing wheel suppliers.

V
--
I do not respond to top-posted replies, please don't ask
 
 
Juha Nieminen
Guest
Posts: n/a
 
      08-05-2010
Alexander <(E-Mail Removed)> wrote:
> Yes, the first thing I want is to dynamically allocate a few MB of
> memory, and find a good way to convince the user to wait for the
> allocation, in case the program even succeeds.


I don't get it. Are you possibly talking about some embedded or
hand-held system with a few tens of kilobytes of RAM, or something?

I have made programs for the iPhone which use full-sized English (and
other language) dictionaries. Loading the dictionary into memory takes
a fraction of a second (even though the dictionary data is actually
*compressed* on the flash drive, so there's a decompression step into
memory involved; it still takes just a fraction of a second).

On a desktop computer you could probably load such a dictionary into
memory an order of magnitude faster (not to mention that its size in memory
is inconsequential, because desktop computers typically have at least an
order of magnitude more RAM available for apps than the iPhone).

Loading and using a full dictionary isn't such a heavy operation in
modern systems (even hand-held ones).

> I wanted to know what the usual practice is. The task is quite common, but
> not necessarily trivial: even without storing the file in memory
> (stupid), I have to find some quick enough algorithm to search for a
> given word in it.


Binary search is the simplest (especially in C++ because the standard
library offers it) and certainly fast enough.
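
For reference, the standard-library binary search mentioned here could be used roughly as follows. This is a sketch that assumes the words are already held in a std::vector sorted with the default std::string ordering; is_word and words_with_prefix are illustrative names, not anything from the thread.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Exact lookup. `sorted_words` must be sorted in ascending order
// with the default std::string comparison.
bool is_word(const std::vector<std::string>& sorted_words,
             const std::string& word) {
    return std::binary_search(sorted_words.begin(), sorted_words.end(), word);
}

// Collect every word that starts with `prefix`. In a sorted list these
// words are contiguous, and std::lower_bound finds the first candidate.
std::vector<std::string> words_with_prefix(
        const std::vector<std::string>& sorted_words,
        const std::string& prefix) {
    std::vector<std::string> matches;
    auto it = std::lower_bound(sorted_words.begin(), sorted_words.end(), prefix);
    for (; it != sorted_words.end() &&
           it->compare(0, prefix.size(), prefix) == 0; ++it) {
        matches.push_back(*it);
    }
    return matches;
}
```

Both functions take O(log n) comparisons to locate their starting point, which for a list of a few hundred thousand words is on the order of twenty string comparisons per lookup.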
 
 
Jorgen Grahn
Guest
Posts: n/a
 
      08-05-2010
On Wed, 2010-08-04, Alexander wrote:
> On Aug 4, 9:11 pm, Juha Nieminen <(E-Mail Removed)> wrote:
>> Alexander <(E-Mail Removed)> wrote:
>> > What choices do I have if I need to be able to generate English words,


The generation part somehow got lost in the discussion. Did that mean
forming new words according to rules, e.g. black, blacker, blackest?
If so I recommend a free library like aspell, ispell, or whatever they
are called.
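
If "generation" does mean rule-based forms, a toy illustration of the idea might look like the following. It simply appends candidate suffixes to a stem and keeps the forms that appear in a word list. The function name inflect and the suffix list are assumptions made for this example; a real spell-checking library would be expected to handle far more (spelling changes such as big/bigger, irregular forms, prefixes).

```cpp
#include <set>
#include <string>
#include <vector>

// Append each suffix to `stem` and keep only the candidates that are
// present in `dictionary`. An empty suffix yields the stem itself.
std::vector<std::string> inflect(const std::string& stem,
                                 const std::vector<std::string>& suffixes,
                                 const std::set<std::string>& dictionary) {
    std::vector<std::string> forms;
    for (const std::string& suffix : suffixes) {
        const std::string candidate = stem + suffix;
        if (dictionary.count(candidate) != 0) {
            forms.push_back(candidate);
        }
    }
    return forms;
}
```

Checking each candidate against the dictionary is what keeps the naive rule from emitting non-words such as "blacking" when that form is not in the list.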

>> > or check whether a word exists (like in spell checking)? Language
>> > preferred - C or C++.

>>
>> I find this question odd. What is the problem you are seeing? Because
>> to me it seems kind of *trivial*. Take a big list of words (you can find
>> extensive lists for free with a little bit of googling), put them sorted
>> in an array, and that's about it. You can search the array or index it
>> randomly to get a random word.
>>
>> (Of course if what you want is for the dictionary to take as little
>> memory as possible, then that's a completely different optimization
>> problem, one that entire books have been written about.)

>
> Yes, the first thing I want is to dynamically allocate a few MB of
> memory, and find a good way to convince the user to wait for the
> allocation, in case the program even succeeds.


If allocating memory takes noticeable time, your system is probably
swapping heavily, and *everything* takes noticeable time.

> I wanted to know what the usual practice is. The task is quite common, but
> not necessarily trivial: even without storing the file in memory
> (stupid), I have to find some quick enough algorithm to search for a
> given word in it.


The first step up from "search a text file" is something like
BerkeleyDB, which is like a std::map or std::tr1::unordered_map stored
on disk. Feed the words into one of those, and you can look them up
quickly later.
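
To make the comparison concrete, here is the in-memory analogue of that approach, using std::unordered_set (the C++11 standardized successor of the TR1 unordered containers). It deliberately does not use BerkeleyDB itself, so it shows only the "feed the words in, look them up later" shape and not the on-disk part. WordSet, build_word_set and is_known are names invented for this sketch.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

using WordSet = std::unordered_set<std::string>;

// Feed all words into a hash set once.
WordSet build_word_set(const std::vector<std::string>& words) {
    return WordSet(words.begin(), words.end());
}

// Average-case constant-time membership test.
bool is_known(const WordSet& set, const std::string& word) {
    return set.find(word) != set.end();
}
```

Unlike the sorted array, the hash set does not need its input sorted, but it also cannot answer prefix queries efficiently.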

Or check what the Unix look(1) command does.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
 
 
Juha Nieminen
Guest
Posts: n/a
 
      08-05-2010
Jorgen Grahn <(E-Mail Removed)> wrote:
> The first step up from "search a text file" is something like
> BerkeleyDB, which is like a std::map or std::tr1::unordered_map stored
> on disk. Feed the words into one of those, and you can look them up
> quickly later.


What's wrong with having the words sorted in an array and using
binary search? It's not like an English dictionary is going to change
during the execution of a typical program...
 
 
Jorgen Grahn
Guest
Posts: n/a
 
      08-06-2010
On Thu, 2010-08-05, Juha Nieminen wrote:
> Jorgen Grahn <(E-Mail Removed)> wrote:
>> The first step up from "search a text file" is something like
>> BerkeleyDB, which is like a std::map or std::tr1::unordered_map stored
>> on disk. Feed the words into one of those, and you can look them up
>> quickly later.

>
> What's wrong with having the words sorted in an array and using
> binary search?


Nothing. It's more I/O than using BerkeleyDB, though, if you are
looking up just a few words. It depends on what he's going to do.

The Unix look(1) command I mentioned is interesting. I traced the
Linux version. It just mmap()s "/usr/share/dict/words" (a sorted text
file), does some magic, and finds the prefix you look for. I bet it
does some kind of modified binary search directly in the mapped file.
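
The guess above can be illustrated with a binary search that works directly on a sorted, newline-separated text buffer. This is only a sketch of the idea, not look(1)'s actual code: a std::string_view stands in for the mmap()ed file, the lines are assumed to be sorted in plain byte order (a real words file may be ordered differently), C++17 is assumed, and first_line_with_prefix is a made-up name.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Offset of the start of the line that contains byte `pos`.
std::size_t line_start(std::string_view text, std::size_t pos) {
    while (pos > 0 && text[pos - 1] != '\n') --pos;
    return pos;
}

// The line beginning at offset `start`, without its trailing newline.
std::string_view line_at(std::string_view text, std::size_t start) {
    const std::size_t end = text.find('\n', start);
    return text.substr(start, end == std::string_view::npos ? end : end - start);
}

// First line that starts with `prefix`, or std::nullopt if there is none.
// `text` must consist of '\n'-separated lines sorted in byte order.
std::optional<std::string_view> first_line_with_prefix(std::string_view text,
                                                       std::string_view prefix) {
    std::size_t lo = 0;            // always a line start (or past the end)
    std::size_t hi = text.size();  // lines starting at or after hi are >= prefix
    while (lo < hi) {
        // Probe the middle byte, then back up to the start of its line.
        const std::size_t mid = line_start(text, lo + (hi - lo) / 2);
        const std::string_view line = line_at(text, mid);
        if (line < prefix) {
            lo = mid + line.size() + 1;  // continue after this line
        } else {
            hi = mid;
        }
    }
    if (lo >= text.size()) return std::nullopt;
    const std::string_view line = line_at(text, lo);
    if (line.substr(0, prefix.size()) == prefix) return line;
    return std::nullopt;
}
```

Because the search only touches the bytes around each probe, a memory-mapped file would need just a handful of pages read from disk per lookup, which fits the observation that look(1) does not load the whole file.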

> It's not like an English dictionary is going to change
> during the execution of a typical program...


That's one hint that a DB may be overkill, yes. You buy a feature you
don't use.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
 