Velocity Reviews - Computer Hardware Reviews


Re: how to get text from a html file?

 
 
Chris Colbert
      04-13-2010
On Tue, Apr 13, 2010 at 1:58 PM, varnikat t <> wrote:
>
> Hi,
> Can anyone tell me how to get text from a html file? I am trying to display
> the text of an html file in textview (of glade). If I directly display the
> file, it shows with html tags and attributes, etc. in textview. I don't want
> that. I just want the text.
> Can someone help me with this?
>
>
> Regards
> Varnika Tewari
>
> --
> http://mail.python.org/mailman/listinfo/python-list
>
>


You should look into Beautiful Soup:

http://www.crummy.com/software/BeautifulSoup/
 
 
Grant Edwards
      04-13-2010
On Tue, Apr 13, 2010 at 1:58 PM, varnikat t <> wrote:

> Can anyone tell me how to get text from a html file? I am trying to display
> the text of an html file in textview (of glade). If I directly display the
> file, it shows with html tags and attributes, etc. in textview. I don't want
> that. I just want the text.


[Parent article is unavailable on gmane, so my reply isn't quite in
the right place in the tree]

I generally just use something like this:

from subprocess import Popen, PIPE
text = Popen(['w3m', '-dump', filename], stdout=PIPE).stdout.read()

I'm sure there are more complex ways...
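On Python 3.5 and later, the same w3m approach can be wrapped with subprocess.run, which waits for w3m to finish and raises an error if it exits with a non-zero status. This is a minimal sketch, assuming w3m is installed and on PATH and that its output is UTF-8; html_file_to_text is a hypothetical name, not part of any library.

```python
import subprocess


def html_file_to_text(filename: str) -> str:
    """Render an HTML file to plain text using the external w3m browser.

    Assumes the w3m executable is installed and on PATH; if it is missing,
    subprocess.run raises FileNotFoundError.
    """
    # "w3m -dump FILE" renders the page and writes the formatted text to stdout.
    result = subprocess.run(
        ["w3m", "-dump", filename],
        stdout=subprocess.PIPE,
        check=True,  # raise CalledProcessError if w3m exits non-zero
    )
    # w3m writes bytes; decoding them as UTF-8 is an assumption about its output.
    return result.stdout.decode("utf-8", errors="replace")
```

Usage: text = html_file_to_text("page.html"), where page.html stands in for your own file.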

--
Grant Edwards               grant.b.edwards at gmail.com
Yow! I'm having fun HITCHHIKING to CINCINNATI or FAR ROCKAWAY!!
 
 
rake
      04-14-2010
On Apr 13, 2:12 pm, Chris Colbert <sccolb...@gmail.com> wrote:
> On Tue, Apr 13, 2010 at 1:58 PM, varnikat t <varnika...@gmail.com> wrote:
>
> > Hi,
> > Can anyone tell me how to get text from a html file? I am trying to display
> > the text of an html file in textview (of glade). If I directly display the
> > file, it shows with html tags and attributes, etc. in textview. I don't want
> > that. I just want the text.
> > Can someone help me with this?
>
> > Regards
> > Varnika Tewari
>
> You should look into beautiful soup
>
> http://www.crummy.com/software/BeautifulSoup/


For more complex parsing, Beautiful Soup is definitely the way to go.

However, if all you want to do is strip the HTML and keep all the
remaining text, I'd recommend the pyparsing package with this short script:

http://pyparsing.wikispaces.com/file...tmlStripper.py
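If you would rather not add any dependency, the standard library's html.parser module can do the same plain "strip the tags, keep the text" job. This is a swapped-in technique, not a port of the pyparsing script above; TextExtractor and strip_html are hypothetical names, and skipping the contents of script and style elements is an assumption about what counts as visible text.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text content of an HTML document, dropping all tags.

    Text inside <script> and <style> elements is skipped (an assumption:
    it is usually not text a reader should see).
    """

    SKIP = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True turns entities like &amp; into plain characters.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def strip_html(html):
    """Return the text of an HTML string with runs of whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse newlines and indentation into single spaces.
    return " ".join("".join(parser.parts).split())
```

Collapsing whitespace also removes line breaks, so text from adjacent block elements with no whitespace between them (for example `<p>a</p><p>b</p>`) runs together; keep the raw `parser.parts` instead if paragraph structure matters for display.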
 
 
Stefan Behnel
      04-14-2010
rake, 14.04.2010 02:45:
> On Apr 13, 2:12 pm, Chris Colbert wrote:
>> You should look into beautiful soup
>>
>> http://www.crummy.com/software/BeautifulSoup/
>
> For more complex parsing beautiful soup is definitely the way to go.


Why would a library that even the author has lost interest in be "the way
to go"?

Stefan

 
 
Emile van Sebille
      04-14-2010
On 4/13/2010 11:43 PM Stefan Behnel said...
> rake, 14.04.2010 02:45:
>> On Apr 13, 2:12 pm, Chris Colbert wrote:
>>> You should look into beautiful soup
>>>
>>> http://www.crummy.com/software/BeautifulSoup/
>>
>> For more complex parsing beautiful soup is definitely the way to go.
>
> Why would a library that even the author has lost interest in be "the
> way to go"?
>
> Stefan
>

Why not, when the most recent release is only five days old?

Emile

 
 
Grant Edwards
      04-14-2010
On 2010-04-14, Stefan Behnel <> wrote:
>> On Apr 13, 2:12 pm, Chris Colbert wrote:
>>> You should look into beautiful soup
>>>
>>> http://www.crummy.com/software/BeautifulSoup/
>>
>> For more complex parsing beautiful soup is definitely the way to go.
>
> Why would a library that even the author has lost interest in be "the way
> to go"?


Sure, if the library is still being maintained. I can't think of too
many open-source projects where somebody else hasn't taken over from
the original author.

--
Grant Edwards               grant.b.edwards at gmail.com
Yow! I'm dressing up in an ill-fitting IVY-LEAGUE SUIT!! Too late...
 
 
Stefan Behnel
      04-14-2010
Emile van Sebille, 14.04.2010 15:24:
> On 4/13/2010 11:43 PM Stefan Behnel said...
>> rake, 14.04.2010 02:45:
>>> On Apr 13, 2:12 pm, Chris Colbert wrote:
>>>> You should look into beautiful soup
>>>>
>>>> http://www.crummy.com/software/BeautifulSoup/
>>>
>>> For more complex parsing beautiful soup is definitely the way to go.
>>
>> Why would a library that even the author has lost interest in be "the
>> way to go"?
>
> Why not when the recent release dates from only five days ago?


Interesting, even the web site has had a revamp.

Nice. I like competition.

Stefan
