Velocity Reviews


robin 05-29-2006 11:49 AM

summarize text
 
hello list,

does anyone know of a library which permits to summarise text? i've
been looking at nltk but haven't found anything yet. any help would be
very welcome.
thank you all in advance,

robin


Tim Chase 05-29-2006 01:13 PM

Re: summarize text
 
> does anyone know of a library which permits to summarise text?
> i've been looking at nltk but haven't found anything yet. any
> help would be very welcome.


Well, summarizing text is one of those things that generally
takes a brain-cell or two to do. Automating the process would
require doing it either smartly (some sort of
neural-net/NLP/Markov-chain technology, which is a non-trivial
task--something one might consider braving in the 3rd or 4th-year
of a university computer-science program), or doing it fairly
dumbly. As an example of a "dumb" solution, you can use regexps
to trim off the first few words and the last few words and call
that a "summary":

>>> import re
>>> r = re.compile(r'^(.{8}.*?\b)\s.*\s(\b.{8}.*?)', re.DOTALL)
>>> s = """This is the first line
... and it has a second line
... and a third line
... and the last line is the fourth line."""
>>> result = r.sub(r"\1...\2", s.strip())
>>> result
'This is the...fourth line.'

You can adjust the "{8}" portions for more or fewer
leading/trailing context characters.

The regexp might need a bit of tweaking for somewhat short
strings, but if they're fairly short, one might not need to
summarize them ;)
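
To make the "{8}" knob easier to turn, the same regexp can be wrapped in a small function. This is only a rough sketch of the "dumb" approach above: the name dumb_summary and the context parameter are made up for illustration, and the pattern is otherwise unchanged.

```python
import re

def dumb_summary(text, context=8):
    """Keep about `context` leading and trailing characters of `text`.

    Hypothetical helper: builds the same regexp as above, with the
    two "{8}" counts replaced by `context`.
    """
    pattern = re.compile(
        r'^(.{%d}.*?\b)\s.*\s(\b.{%d}.*?)' % (context, context),
        re.DOTALL,  # let "." cross newlines so multi-line text is handled
    )
    # If the pattern does not match (e.g. the text is too short),
    # sub() leaves the stripped text unchanged.
    return pattern.sub(r'\1...\2', text.strip())

sample = """This is the first line
and it has a second line
and a third line
and the last line is the fourth line."""

print(dumb_summary(sample))
print(dumb_summary(sample, context=4))
print(dumb_summary("hi there"))
```

With the default context=8 this gives the same 'This is the...fourth line.' as the session above, and a string too short to match, such as "hi there", comes back untouched.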

-tkc







gene tani 05-29-2006 02:52 PM

Re: summarize text
 

robin wrote:
> hello list,
>
> does anyone know of a library which permits to summarise text? i've
> been looking at nltk but haven't found anything yet. any help would be


unclear what you're asking, maybe look at:
http://www.cs.waikato.ac.nz/~ml/weka/index.html

http://www.kdnuggets.com/software/suites.html
http://www.ailab.si/orange

http://mallet.cs.umass.edu/index.php/Main_Page
http://minorthird.sourceforge.net/
http://www.dia.uniroma3.it/db/roadRunner/

http://www.lemurproject.org/


robin 05-31-2006 09:59 AM

Re: summarize text
 
thanks for all your replies. lemur looks pretty interesting!
robin

gene tani wrote:
> robin wrote:
> > hello list,
> >
> > does anyone know of a library which permits to summarise text? i've
> > been looking at nltk but haven't found anything yet. any help would be

>
> unclear what you're asking, maybe look at:
> http://www.cs.waikato.ac.nz/~ml/weka/index.html
>
> http://www.kdnuggets.com/software/suites.html
> http://www.ailab.si/orange
>
> http://mallet.cs.umass.edu/index.php/Main_Page
> http://minorthird.sourceforge.net/
> http://www.dia.uniroma3.it/db/roadRunner/
>
> http://www.lemurproject.org/



Lawrence D'Oliveiro 06-05-2006 09:29 AM

Re: summarize text
 
... sorry, I thought you said "summarize Proust".

:)

