Parsing HTML

 
 
Mohammad-Reza
Guest
Posts: n/a
 
      02-23-2007
Hi
I want to parse a web page (in a web service) and retrieve some of its
information. I googled MSDN and found a walkthrough (How to: Create Web
Services That Parse the Contents of a Web Page), but the walkthrough is a
little complex and the writer did not completely describe all aspects of
the solution.
Could anyone elaborate on this walkthrough, or direct me to another (or
better) way to deal with such a problem?

Thanks in advance.
 
 
 
 
 
Scott M.
Guest
Posts: n/a
 
      02-23-2007
How about using the W3C Document Object Model, which was designed to do just
what you are trying to do?

 
 
 
 
Mohammad-Reza
Guest
Posts: n/a
 
      02-24-2007
I want to write a web service that extracts some information from a web page
and use that web service in a Windows application. I think the usual approach
to parsing (getting the HTML code and finding the keys using loops) is a
little slow and costs too much. I want to know if there is any way in
.NET to simply extract that information (for example, a method that returns
every HTML tag of the web page with its value).
The processing time of the web service is very important to me.

Thanks in advance.
 
Scott M.
Guest
Posts: n/a
 
      02-24-2007
I don't know where you have gotten your information, but this is exactly
what the DOM is for.
 
John Saunders
Guest
Posts: n/a
 
      02-25-2007
"Scott M." <s-> wrote in message
news:...
>I don't know where you have gotten your information, but this is exactly
>what the DOM is for.


Scott,

I used this approach with a Windows Forms application back in 2001, with
.NET 1.0. It worked, but was a bit clumsy, and it was time-consuming. I used
the ActiveX Internet Browser control to load the page I was interested in,
and once the page was loaded, I could access the DOM from C# code. Did you
have a different technique in mind when you talk about the DOM?

Perhaps a faster technique would be to use regular expressions to parse the
HTML and find what you're looking for.

John
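
A minimal sketch of the regular-expression approach suggested above, assuming
the goal is only to pull one known element (here the page title) out of raw
HTML. The module name RegexSketch, the function ExtractTitle, and the sample
markup are hypothetical and not from the thread. A pattern like this is only
dependable for simple, predictable markup; it knows nothing about nesting,
comments, or scripts.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module RegexSketch
    ' Hypothetical helper: returns the text of the first <title> element,
    ' or Nothing if the markup contains no title.
    Function ExtractTitle(ByVal html As String) As String
        ' IgnoreCase handles <TITLE> as well as <title>;
        ' Singleline lets "." match across line breaks inside the element.
        Dim m As Match = Regex.Match(html, "<title[^>]*>(.*?)</title>", _
            RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        If m.Success Then
            Return m.Groups(1).Value.Trim()
        End If
        Return Nothing
    End Function

    Sub Main()
        ' Sample markup (an assumption); note the unclosed <p>, which would
        ' stop an XML parser but does not matter to the regular expression.
        Dim sample As String = _
            "<html><head><TITLE>Parsing HTML</TITLE></head><body><p>Hi</body></html>"
        Console.WriteLine(ExtractTitle(sample))
    End Sub
End Module
```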


 
 
Scott M.
Guest
Posts: n/a
 
      02-25-2007
What I had in mind was, if the HTML in question is well-formed (XHTML), you
could just load it (from a string) into an XmlDocument object and use the
XML DOM to parse from there.


 
Mohammad-Reza
Guest
Posts: n/a
 
      02-26-2007
Can you give some sample code for loading XHTML into an XmlDocument?
 
Scott M.
Guest
Posts: n/a
 
      02-26-2007
Well, XHTML is XML, so you'd really be loading XML into an XmlDocument, but
once it's loaded, you can pull out whatever you like using the DOM.

Dim xmlDoc As New System.Xml.XmlDocument()

'You can load the XML in one of two ways (use one or the other)...

'docPath is a String holding the path (or URL) of a file containing the XML
xmlDoc.Load(docPath)

'or
'xhtmlString is a String that already holds the markup
xmlDoc.LoadXml(xhtmlString)

'Example of getting all the paragraph tags, and then the text of the
'first one, using the DOM. XHTML element names are lowercase and the
'lookup is case-sensitive, so ask for "p" rather than "P".
Dim theParagraphs As System.Xml.XmlNodeList = xmlDoc.GetElementsByTagName("p")
Dim firstParagraphText As String = theParagraphs(0).InnerText


-Scott
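
As a complement to the snippet above, here is a small self-contained sketch
that queries the loaded document with XPath instead of GetElementsByTagName.
It assumes the page declares the standard XHTML namespace, in which case an
unprefixed XPath such as "//p" matches nothing and a namespace manager is
needed. The module name, the prefix "x", and the sample markup are
hypothetical.

```vbnet
Imports System
Imports System.Xml

Module XhtmlDomSketch
    Sub Main()
        ' Sample well-formed XHTML (an assumption, not a real page).
        Dim xhtml As String = _
            "<html xmlns=""http://www.w3.org/1999/xhtml"">" & _
            "<head><title>Sample</title></head>" & _
            "<body><p>First paragraph</p><p>Second paragraph</p></body></html>"

        Dim xmlDoc As New XmlDocument()
        xmlDoc.LoadXml(xhtml)

        ' Map an arbitrary prefix to the XHTML namespace so XPath can
        ' address elements that live in the document's default namespace.
        Dim ns As New XmlNamespaceManager(xmlDoc.NameTable)
        ns.AddNamespace("x", "http://www.w3.org/1999/xhtml")

        ' Print the text of every paragraph in document order.
        For Each p As XmlNode In xmlDoc.SelectNodes("//x:p", ns)
            Console.WriteLine(p.InnerText)
        Next
    End Sub
End Module
```

Real XHTML pages usually carry a DOCTYPE, and loading one may cause the
parser to try to fetch the DTD, so timing on a live page can differ from
this sample.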

 
John Saunders
Guest
Posts: n/a
 
      02-26-2007
"Scott M." <s-> wrote in message
news:...
> What I had in mind was, if the HTML in question was well-formed (XHTML),
> you could just load it into an XMLDocument (from a string) object and use
> the XML DOM to parse from there.


That works well for XHTML. The problem is that most web sites are still
using HTML, which is not well-formed XML.

John
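
To illustrate the well-formedness point: XmlDocument.LoadXml throws an
XmlException when the markup is not well-formed XML, so one hedged way to
find out whether the DOM approach applies to a given page is to attempt the
load and catch the failure. TryLoadAsXml is a hypothetical helper name, and
both sample strings are made up for the example.

```vbnet
Imports System
Imports System.Xml

Module WellFormedCheckSketch
    ' Hypothetical helper: True if the markup loads as XML, False otherwise.
    Function TryLoadAsXml(ByVal markup As String, ByVal doc As XmlDocument) As Boolean
        Try
            doc.LoadXml(markup)
            Return True
        Catch ex As XmlException
            ' Typical causes in ordinary HTML: unclosed tags such as <br>,
            ' unquoted attribute values, or mismatched nesting.
            Return False
        End Try
    End Function

    Sub Main()
        Dim doc As New XmlDocument()
        ' Every element is closed, so this loads.
        Console.WriteLine(TryLoadAsXml("<html><body><p>closed</p></body></html>", doc))
        ' <br> is never closed, so the parser rejects the markup.
        Console.WriteLine(TryLoadAsXml("<html><body><p>unclosed<br></body></html>", doc))
    End Sub
End Module
```

If the load fails, falling back to another technique (such as the regular
expressions mentioned earlier in the thread) is one option.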


 
 
Scott M.
Guest
Posts: n/a
 
      02-26-2007
But we're not talking about most web pages. We are talking about a
particular page that is being used with a web service. In other words, it's
part of the OP's application, which he should have some control over.




 