Velocity Reviews

Maurice Ling 04-03-2005 10:21 AM

text analysis in python
 
Hi,

I'm a postgraduate and my project deals with a fair bit of text
analysis. I'm looking for some libraries and tools that are geared
towards text analysis (and text engineering). So far, the most
comprehensive toolkit in Python for my purpose is NLTK (Natural Language
Toolkit) by Edward Loper and Steven Bird, followed by mxTextTools. Are
there any OSS tools out there that are more comprehensive than NLTK?

In the Java world, there is GATE (General Architecture for Text
Engineering) and it seems very impressive. Is there something like that
for Python?

Thanks in advance.

Cheers
Maurice




Cameron Laird 04-03-2005 12:08 PM

Re: text analysis in python
 
In article <mailman.1255.1112523725.1799.python-list@python.org>,
Maurice Ling <mauriceling@acm.org> wrote:
.
.
.
>In the Java world, there is GATE (general architecture for text
>engineering) and it seems very impressive. Is there something like that
>for Python?

.
.
.
I don't know if you're aware that, in a fairly strong sense,
anything "[i]n the Java world" *is* "for Python". If you
program with Jython (for example--there are other ways to
achieve much the same end), your source code can be in
Python, but you have full access to any library coded in Java.
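
As a minimal sketch of the point above (not taken from the thread): the same Python source can use a Java class when it runs on Jython and fall back to plain Python elsewhere. The helper names `running_on_jython` and `unique_words` are hypothetical; `java.util.TreeSet` is a real JDK class, but the Jython branch is an assumption about how Jython passes a Python list to a Java collection and only executes on a JVM.

```python
import platform


def running_on_jython():
    """Return True when the interpreter is Jython (Python on the JVM)."""
    return platform.python_implementation() == "Jython"


def unique_words(text):
    """Return the distinct lowercase words of text in sorted order."""
    words = text.lower().split()
    if running_on_jython():
        # Reached only on Jython: Java classes are imported like Python modules.
        from java.util import TreeSet
        return list(TreeSet(words))
    # Pure-Python equivalent used on CPython and other implementations.
    return sorted(set(words))


print(unique_words("the cat The dog"))
```

Checking the implementation at run time keeps the file importable under both interpreters, at the cost of maintaining two code paths.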

beliavsky@aol.com 04-03-2005 01:11 PM

Re: text analysis in python
 
The book "Text Processing in Python" by David Mertz, available online
at http://gnosis.cx/TPiP/ , may be helpful.


Maurice LING 04-03-2005 01:38 PM

Re: text analysis in python
 
.
> I don't know if you're aware that, in a fairly strong sense,
> anything "[i]n the Java world" *is* "for Python". If you
> program with Jython (for example--there are other ways to
> achieve much the same end), your source code can be in
> Python, but you have full access to any library coded in Java.


Yes, I do know of Jython but have not used it in any productive way.
So I might need some assistance here... Say I code my stuff in Jython
(importing Java libraries) in a file "text.py"... Will there be any
issues when I try to import text.py into CPython?

My impression is that NLTK is more of a teaching tool than one for
production use. Please correct me if I'm wrong... The main reason I'm
looking at NLTK is that it is pure Python and is about the most
comprehensive text analysis toolkit in Python. Are there any projects
that use NLTK?

Thanks and Cheers
Maurice

Mark Winrock 04-03-2005 06:00 PM

Re: text analysis in python
 
Maurice Ling wrote:
> Hi,
>
> I'm a postgraduate and my project deals with a fair bit of text
> analysis. I'm looking for some libraries and tools that are geared
> towards text analysis (and text engineering). So far, the most
> comprehensive toolkit in Python for my purpose is NLTK (Natural Language
> Toolkit) by Edward Loper and Steven Bird, followed by mxTextTools. Are
> there any OSS tools out there that are more comprehensive than NLTK?
>
> In the Java world, there is GATE (General Architecture for Text
> Engineering) and it seems very impressive. Is there something like that
> for Python?
>
> Thanks in advance.
>
> Cheers
> Maurice
>
>


You might try http://web.media.mit.edu/~hugo/montylingua/

"Liu, Hugo (2004). MontyLingua: An end-to-end natural
language processor with common sense. Available
at: web.media.mit.edu/~hugo/montylingua."

Maurice LING 04-03-2005 10:03 PM

Re: text analysis in python
 
Mark Winrock wrote:


>
> You might try http://web.media.mit.edu/~hugo/montylingua/
>
> "Liu, Hugo (2004). MontyLingua: An end-to-end natural
> language processor with common sense. Available
> at: web.media.mit.edu/~hugo/montylingua."



Thanks Mark. I've downloaded MontyLingua and it looks pretty cool. To
me, it seems pretty much geared to people like myself who need
something to process written text but do not need the hardcore nuts and
bolts of a computational linguist. NLTK is more of the nuts-and-bolts
toolkit. GATE still seems more advanced than MontyLingua but to a
different end.

Is there anyone in this forum who is using or has used MontyLingua and
is happy to comment more on it? I'm happy to get more opinions.

Thanks and cheers
Maurice

Terry Reedy 04-03-2005 10:13 PM

Re: text analysis in python
 

"Maurice LING" <mauriceling@acm.org> wrote in message
news:424FF1CD.2040102@acm.org...
>Say I code my stuff in Jython (importing Java libraries) in a file
>"text.py"


Just to be clear, Jython is not a separate language that you code *in*, but
a separate implementation that you may code *for* slightly differently.

>... Will there be any issues when I try to import text.py into CPython?


If text.py is written in an appropriate version of Python, it itself will
cause no problem. However, when it imports Java code files, as opposed to
CPython bytecode files, CPython will choke.

Terry J. Reedy
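
To illustrate what "choke" means in practice, here is a small sketch (an illustration, not code from the thread). It assumes no third-party package named `java` is installed; the function name `java_import_status` is hypothetical. On CPython the import of a Java package raises `ImportError`, while on Jython it would be expected to succeed.

```python
def java_import_status():
    """Try to import a Java package and report what happened."""
    try:
        # java.util lives on the JVM class path, so only Jython can resolve it.
        import java.util  # noqa: F401
    except ImportError as exc:
        # On CPython the import fails here; no Java code is ever loaded.
        return "Java libraries unavailable: %s" % exc
    return "Java libraries available"


status = java_import_status()
print(status)
```

A module that wraps its Java imports this way can still be imported by CPython, although any function that depends on the Java classes stays unusable there.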




Steven Bethard 04-03-2005 10:19 PM

Re: text analysis in python
 
Maurice Ling wrote:
> In the Java world, there is GATE (general architecture for text
> engineering) and it seems very impressive. Is there something like that
> for Python?


I worked with GATE this last summer and really hated it. Can't decide
whether that was just my growing distaste for Java or actually the GATE
API. Anyway, if you're looking for something like GATE that (in my
experience) runs significantly faster, you should look at Ellogon
(www.ellogon.org). It's written in C and Tcl, with C++, Java, Perl, and
Python bindings. And I believe, if you have any software already
written for GATE, Ellogon can run those modules directly. I've
personally never done so -- all my modules are written in Python (often
simple wrappers for things like MXPOST, MXTerminator, Charniak's parser,
etc.) I find the Python interface simple and easy to use, and they've
added a number of my suggestions to the API in the last release.

STeVe

Maurice LING 04-03-2005 11:36 PM

Re: text analysis in python
 
Terry Reedy wrote:

> "Maurice LING" <mauriceling@acm.org> wrote in message
> news:424FF1CD.2040102@acm.org...
>
>>Say I code my stuff in Jython (importing Java libraries) in a file
>>"text.py"

>
>
> Just to be clear, Jython is not a separate language that you code *in*, but
> a separate implementation that you may code *for* slightly differently.
>

Yes, I do get this point. Jython is just an implementation of the
Python virtual machine using Java. I do note that there are some
differences; for example, Jython can only handle pure Python modules.
However, I'm not enough of a language expert to tell apart the language
differences between these two implementations of Python, Jython and
CPython. If someone cares to enlighten me, I would be grateful. TIA.

>
>>... Will there be any issues when I try to import text.py into CPython?

>
>
> If text.py is written in an appropriate version of Python, it itself will
> cause no problem. However, when it imports Java code files, as opposed to
> CPython bytecode files, CPython will choke.
>

In my example, the file "text.py" is coded in Jython, importing Java
libraries. I do get that I cannot import Java jar files directly into
CPython. What I do not get is what is so special about Jython that
it can "fool" CPython into using Java libraries... or is it that there
will always be a need for both a Java virtual machine and a Python
virtual machine when I use Java libraries in Jython and then import
the Jython-coded files into CPython?

Cheers
Maurice

Dennis Lee Bieber 04-04-2005 01:30 AM

Re: text analysis in python
 
On Mon, 04 Apr 2005 09:36:32 +1000, Maurice LING <mauriceling@acm.org>
declaimed the following in comp.lang.python:

> >

> Yes, I do get this point. Jython is just an implementation of the
> Python virtual machine using Java. I do note that there are some


Pardon? I thought Jython directly used the Java VM... It is not a
Python VM at all. It's the same language at the source level, but a
totally different back-end.

Hence, it requires the JVM to be able to run anything that
imports a Java library. Pure Python (source code) is compatible because
the two implementations will "compile" into either JVM byte code
(Jython) or classic Python byte code (CPython).

The CPython /run time/ has no facilities for interpreting JVM
byte code and cannot, therefore, process Java library imports.
Similarly, the JVM has no facilities for interfacing with CPython
compiled libraries.

--
> ================================================== ============ <
> wlfraed@ix.netcom.com | Wulfraed Dennis Lee Bieber KD6MOG <
> wulfraed@dm.net | Bestiaria Support Staff <
> ================================================== ============ <
> Home Page: <http://www.dm.net/~wulfraed/> <
> Overflow Page: <http://wlfraed.home.netcom.com/> <


