Writing a parser the right way?

 
 
beza1e1
      09-21-2005
I'm writing a parser for the English language. This is a simple function
to identify what kind of sentence we have. Do you think this class
wrapping is the right way to represent the result of the function?
Further parsing then checks isinstance(text, Declarative).

-------------------
class Sentence(str): pass
class Declarative(Sentence): pass
class Question(Sentence): pass
class Command(Sentence): pass

def identify_sentence(text):
    text = text.strip()
    if text.endswith('.'):
        return Declarative(text)
    elif text.endswith('!'):
        return Command(text)
    elif text.endswith('?'):
        return Question(text)
    return text
-------------------

At first I just returned the class; then I decided to derive Sentence
from str so I can include the text as well.

 
Ben Sizer
      09-21-2005
beza1e1 wrote:
> I'm writing a parser for the English language. This is a simple function
> to identify what kind of sentence we have. Do you think this class
> wrapping is the right way to represent the result of the function?
> Further parsing then checks isinstance(text, Declarative).
>
> -------------------
> class Sentence(str): pass
> class Declarative(Sentence): pass
> class Question(Sentence): pass
> class Command(Sentence): pass


As far as the parser is concerned, making these separate classes is
unnecessary when you could just store the sentence type as a normal
data member of Sentence. So the answer to your question is no, in my
opinion.
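
A minimal sketch of the "data member" approach, assuming Python 3 and keeping the str base class from the original post. The `kind` attribute and the `KINDS` table are hypothetical names chosen for illustration, not something from the thread:

```python
class Sentence(str):
    """A string that also remembers what kind of sentence it is."""

    # Hypothetical mapping from terminal punctuation to a type label.
    KINDS = {'.': 'declarative', '!': 'command', '?': 'question'}

    def __new__(cls, text):
        text = text.strip()
        self = super().__new__(cls, text)
        # text[-1:] is '' for an empty string, so the lookup never raises.
        self.kind = cls.KINDS.get(text[-1:], 'unknown')
        return self


s = Sentence("Where is the key?")
print(s.kind)     # question
print(s.upper())  # WHERE IS THE KEY?
```

Later stages would then test `s.kind == 'question'` instead of calling isinstance().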

However, when you come to actually use the resulting Sentence objects,
perhaps the behaviour is different? If you're looking to use a standard
interface to Sentences but are going to be doing substantially
different processing depending on which sentence type you have, then
yes, this class hierarchy may be useful to you.
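
As a rough illustration of that second case, here is a sketch in which each subclass supplies its own behaviour behind one shared method. The `respond` method and its return strings are invented for this example, and Python 3 is assumed:

```python
class Sentence(str):
    def respond(self):
        # Default behaviour for text that could not be classified further.
        return "I don't understand."

class Declarative(Sentence):
    def respond(self):
        return "Noted: " + self

class Question(Sentence):
    def respond(self):
        return "Let me look that up."

class Command(Sentence):
    def respond(self):
        return "Doing it now."


sentences = (
    Declarative("The door is open."),
    Question("Is the door open?"),
    Command("Open the door!"),
)
for s in sentences:
    # The caller never needs isinstance(); the subclass decides.
    print(s.respond())
```

With this shape the calling code stays the same for every sentence type, which is the main thing the hierarchy buys you.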

--
Ben Sizer

 
beza1e1
      09-21-2005
Well, a declarative sentence is essentially subject-predicate-object,
while a question is predicate-subject-object. This is important in
further processing. So perhaps I should code this order into the
classes? I need to think a little bit more about this.

Thanks for your food for thought!

 
Christopher Subich
      09-21-2005
beza1e1 wrote:
> Well, a declarative sentence is essentially subject-predicate-object,
> while a question is predicate-subject-object. This is important in
> further processing. So perhaps I should code this order into the
> classes? I need to think a little bit more about this.


A question is subject-predicate-object?

That was unknown by me.

Honestly, if you're trying a general English parser, good luck.
 
Paul McGuire
      09-21-2005
beza1e1 wrote:
> I'm writing a parser for the English language. This is a simple function
> to identify what kind of sentence we have. Do you think this class
> wrapping is the right way to represent the result of the function?
> Further parsing then checks isinstance(text, Declarative).
>
> -------------------
> class Sentence(str): pass
> class Declarative(Sentence): pass
> class Question(Sentence): pass
> class Command(Sentence): pass
>
> def identify_sentence(text):
>     text = text.strip()
>     if text.endswith('.'):
>         return Declarative(text)
>     elif text.endswith('!'):
>         return Command(text)
>     elif text.endswith('?'):
>         return Question(text)
>     return text
> -------------------
>
> At first I just returned the class; then I decided to derive Sentence
> from str so I can include the text as well.
>

Andreas -

Are you trying to parse any English sentence, or just a limited form of
them? Parsing *any* English sentence (or question or interjection or
command) is a ***huge*** undertaking - Google for "natural language" and you
will find many efforts (with substantial time and money and manpower
resources) working on this problem. Applications range from automated
language translation to helpdesk automated analysis. I really suggest you
do a bit of research on this topic, just to get an idea of how big this job
is. Here's a Wikipedia link:
http://en.wikipedia.org/wiki/Natural...age_processing

Here are some simple examples that quickly go beyond
subject-predicate-object:

I drive a truck.
I drive a red truck.
I drive a red truck to work.
I drive a red truck to the shop to work on it.
I drive a red truck to the shop to have some work done on it.
I drive a red truck very fast.
I drive a red truck through a red light.

Then factor in other sentences (past and future tenses, past and future
perfect tenses, figurative metaphors) and parsing general English is a major
job. The favorite test case of the natural language folks is "Time flies
like an arrow," which early auto-translation software converted to "Temporal
insects enjoy a pointed projectile."
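
To make the ambiguity concrete, here is a toy sketch that counts how many part-of-speech assignments that sentence allows. The lexicon is a hand-written assumption for illustration only; a real one would list far more words and readings:

```python
from itertools import product

# Toy lexicon (an assumption): each word lists the parts of speech it
# can plausibly take.
LEXICON = {
    "time":  ["noun", "verb"],
    "flies": ["noun", "verb"],
    "like":  ["verb", "preposition"],
    "an":    ["determiner"],
    "arrow": ["noun"],
}

def tag_sequences(sentence):
    """Return every combination of tags the lexicon permits."""
    words = sentence.lower().rstrip(".!?").split()
    options = [LEXICON[w] for w in words]
    return [list(zip(words, tags)) for tags in product(*options)]

readings = tag_sequences("Time flies like an arrow.")
print(len(readings))  # 8
```

Most of the eight combinations are not grammatical, but a parser has to rule them out somehow, and the noun-noun-verb reading is exactly the "temporal insects" mistranslation.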

On the other hand, if you plan to limit the type and/or content of the
sentences being parsed (such as computer system commands or adventure game
inputs, or descriptions of physical objects), then you can scope out a
reasonable capability by choosing a vocabulary of known verbs and objects,
and avoiding ambiguities (such as "set", as in "I set the set of glasses
next to the TV set," or "lead" as in "Lead me to the store that sells lead
pencils.").
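
A restricted vocabulary of this kind might be sketched as below. The word lists and the `parse_command` name are made up for the example, and the grammar is deliberately limited to one verb plus one object:

```python
# Hypothetical vocabulary for a small adventure game.
VERBS = {"take", "drop", "open", "go"}
OBJECTS = {"key", "door", "lamp", "north", "south"}
FILLER = {"the", "a", "an", "to"}

def parse_command(text):
    """Return (verb, object); raise ValueError outside the vocabulary."""
    words = [w for w in text.lower().strip(" .!?").split()
             if w not in FILLER]
    if len(words) != 2:
        raise ValueError("expected a verb and an object: %r" % text)
    verb, obj = words
    if verb not in VERBS:
        raise ValueError("unknown verb: %r" % verb)
    if obj not in OBJECTS:
        raise ValueError("unknown object: %r" % obj)
    return verb, obj

print(parse_command("Open the door!"))    # ('open', 'door')
print(parse_command("Go to the north."))  # ('go', 'north')
```

Because every word must be in a known set, an ambiguous word such as "set" or "lead" simply has to be given one role or left out of the vocabulary.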

Hope this sheds some light on your task,
-- Paul


 
Steven Bethard
      09-21-2005
Christopher Subich wrote:
> beza1e1 wrote:
>
>> Well, a declarative sentence is essentially subject-predicate-object,
>> while a question is predicate-subject-object. This is important in
>> further processing. So perhaps I should code this order into the
>> classes? I need to think a little bit more about this.

>
> A question is subject-predicate-object?
>
> That was unknown by me.
>
> Honestly, if you're trying a general English parser, good luck.


I second that. Have you read any of the natural language processing
research in this area? There are a variety of English parsers already
available. Googling for "charniak parser" or "collins parser" should
get you something. I believe Dan Bikel has one too. Those are trained
on Wall Street Journal text. You might also look into Minipar, which is
rule-based and not as WSJ specific.

STeVe
 
beza1e1
      09-22-2005
Thanks for the hints. I just found NLTK and MontyLingua.

And yes, it is just adventure game language. This means every tense
except present tense is discarded as "not changing world". Furthermore
the parser will make a lot of assumptions, which are perhaps 90% right,
not perfect:

if word[-2:] == "ly":
    return Adverb(word)

Note that uppercase words are identified earlier, so Willy is parsed
correctly as a noun. On the other hand, "silly boy" will not return a
correct result.
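
One way such a heuristic could be patched is with a small exception list, sketched here. It returns plain string labels rather than the Adverb class from the snippet above, and `NOT_ADVERBS` with its contents is an assumption, not part of the actual parser:

```python
# Hypothetical exception list: common "-ly" words that are not adverbs.
NOT_ADVERBS = {"silly", "ugly", "lovely", "family", "fly"}

def guess_tag(word):
    # Capitalised words are checked first, as described above, so a
    # name like "Willy" is never mistaken for an adverb.
    if word[:1].isupper():
        return "noun"
    if word.endswith("ly") and word not in NOT_ADVERBS:
        return "adverb"
    return "unknown"

for w in ("Willy", "quickly", "silly"):
    print(w, guess_tag(w))
```

The capital-letter check will still mislabel the first word of a sentence, so this stays a rough guess rather than a reliable tagger.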

Currently it is just a proof of concept. Maybe I can integrate a better
parser engine later. The idea is a kind of MUD where you speak in correct
sentences instead of typing "go north". I envision a difference like that
between Diablo and pen-and-paper. I'd call it more a collaborative
storytelling game than an actual RPG.

I fed it your sentences, Paul. Result:
<['I', 'drive', 'a']> <['red']> <['truck']>
should be:
<['I']> <['drive']> <['a', 'red', 'truck']>

Verbs are the tricky part, I think. There is no way to recognize them.
So I will have to get a database ... work to do.
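
A sketch of how a verb lookup could produce the expected split, with a tiny hard-coded set standing in for the database. `VERBS` and `split_clause` are illustrative names, and the rule "split at the first known verb" is an assumption that only suits simple present-tense sentences:

```python
# Hypothetical verb list standing in for a real verb database.
VERBS = {"drive", "take", "open", "see"}

def split_clause(sentence):
    """Split into (subject, predicate, object) at the first known verb."""
    words = sentence.rstrip(".!?").split()
    for i, word in enumerate(words):
        if word.lower() in VERBS:
            return words[:i], [word], words[i + 1:]
    raise ValueError("no known verb in %r" % sentence)

print(split_clause("I drive a red truck."))
# (['I'], ['drive'], ['a', 'red', 'truck'])
```

Longer sentences from the list, such as "I drive a red truck to work", would put everything after the verb into the object slot, so prepositional phrases still need separate handling.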

 
Steven Bethard
      09-22-2005
beza1e1 wrote:
> Verbs are the tricky part, I think. There is no way to recognize them.
> So I will have to get a database ... work to do.


Try the Brill tagger[1] or MXPOST[2].

STeVe

[1] http://www.cs.jhu.edu/~brill/code.html
[2] ftp://ftp.cis.upenn.edu/pub/adwait/jmx/jmx.tar.gz
 