Create a string array of all comments in an HTML file...

 
 
sophie_newbie
      09-30-2007
Hi, I'm wondering how I'd go about extracting a string array of all
comments in an HTML file, HTML comments obviously taking the format
"<!-- Comment text here -->".

I'm fairly stumped on how to do this. Maybe using regular expressions?

Thanks.

 
Robin Becker
      09-30-2007
sophie_newbie wrote:
> Hi, I'm wondering how i'd go about extracting a string array of all
> comments in a HTML file, HTML comments obviously taking the format
> "<!-- Comment text here -->".
>
> I'm fairly stumped on how to do this? Maybe using regular expressions?
>
> Thanks.
>

You should probably try Beautiful Soup at

http://www.crummy.com/software/Beaut...mentation.html

which helps with this sort of task.
--
Robin Becker
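
If installing a third-party package is not an option, a parser-based approach is also possible with the standard library alone. The following is a minimal sketch, assuming Python 3 and its html.parser module; it is not Beautiful Soup's API, and the names CommentCollector and collect_comments are made up for illustration.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Hypothetical helper: gathers the text of every comment the parser sees."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        # `data` is the text between "<!--" and "-->", without the delimiters.
        self.comments.append(data)


def collect_comments(html):
    """Return the comment bodies found in `html`, in document order."""
    collector = CommentCollector()
    collector.feed(html)
    collector.close()
    return collector.comments


sample = "<p>Hello</p><!-- first --><div><!-- second\nline --></div>"
found = collect_comments(sample)
print(found)
```

Note that this returns the comment text only; if the "<!--" and "-->" delimiters are wanted in the result, they would have to be added back around each string.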
 
William James
      09-30-2007
On Sep 30, 10:39 am, sophie_newbie <(E-Mail Removed)> wrote:
> Hi, I'm wondering how i'd go about extracting a string array of all
> comments in a HTML file, HTML comments obviously taking the format
> "<!-- Comment text here -->".
>
> I'm fairly stumped on how to do this? Maybe using regular expressions?
>
> Thanks.


E:\Ruby>irb --prompt xmp
"<!-- Comment
here -->And <i>so</i> funny!
<p>It was a dark and stormy night.
</p><!-- Comment <> -->".scan(/<!--.*?-->/m)
==>["<!-- Comment\nhere -->", "<!-- Comment <> -->"]
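
Since the question was asked on the Python list, here is a rough Python equivalent of the Ruby one-liner above. It is only a sketch: the helper name extract_comments is hypothetical, and re.DOTALL is used to play the role of Ruby's /m flag so that "." also matches newlines.

```python
import re

# Non-greedy match from "<!--" to the nearest "-->"; DOTALL lets "." span newlines.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)


def extract_comments(html):
    """Hypothetical helper: return every HTML comment in `html`, delimiters included."""
    return COMMENT_RE.findall(html)


sample = """<!-- Comment
here -->And <i>so</i> funny!
<p>It was a dark and stormy night.
</p><!-- Comment <> -->"""

comments = extract_comments(sample)
print(comments)
```

A plain regular expression like this has no notion of context, so it will also pick up comment-like text inside a script block or an attribute value. For messy real-world pages, one of the parser-based suggestions in this thread is likely the safer choice.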

 
Paul McGuire
      09-30-2007
On Sep 30, 10:39 am, sophie_newbie <(E-Mail Removed)> wrote:
> Hi, I'm wondering how i'd go about extracting a string array of all
> comments in a HTML file, HTML comments obviously taking the format
> "<!-- Comment text here -->".
>
> I'm fairly stumped on how to do this? Maybe using regular expressions?
>
> Thanks.


>>> from pyparsing import htmlComment
>>> htmlComment.searchString("""<!-- Comment
... here -->And <i>so</i> funny!
... </p><!-- Comment <> -->""").asList()
[['<!-- Comment \nhere -->'], ['<!-- Comment <> -->']]

-- Paul



 
Stefan Behnel
      10-06-2007
sophie_newbie wrote:
> Hi, I'm wondering how i'd go about extracting a string array of all
> comments in a HTML file, HTML comments obviously taking the format
> "<!-- Comment text here -->".
>
> I'm fairly stumped on how to do this? Maybe using regular expressions?



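from lxml import etree

# HTMLParser tolerates the malformed markup common in real-world HTML.
parser = etree.HTMLParser()
tree = etree.parse("somefile.html", parser)

# comment() selects every comment node; use .text on each for the string.
print(tree.xpath("//comment()"))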


http://codespeak.net/lxml

Stefan
 