Java - Re: Junit - "Credible" HTML checker?

#1

On Thu, 6 Aug 2009, bugbear wrote:
> I have some routines that generate HTML;
> it would be useful if (in my unit testing)
> I had a quick and dirty "is this valid HTML" test.
>
> I don't need an HTML renderer - something
> cruddy based on "likely" looking regexps would
> suit me very well.
>
> I'm simply trying to avoid doing full deploy + interactive
> testing of stuff (HTML) which isn't even "likely".
>
> Does anyone know of anything?

The Rolls-Royce here is HtmlUnit, which is a complete headless browser: it reads HTML, parses CSS, runs JavaScript (courtesy of Rhino), etc. It has interfaces that make it easy to ask questions like "get me all the div elements", "get me all the paragraph elements with class errorReport", or "get me the text content of this element", which is what you need for testing.

It's built on top of NekoHTML, which is a pretty decent HTML parser. Other popular parsers are JTidy and TagSoup, but I think those are more lenient in their parsing (Neko can be lenient, but tends more towards strictness), and for what you want to do, you don't want leniency.

Apologies for the lack of URLs, but you strike me as the kind of chap who is quite capable of using Google!

tom

--
The sunlights differ, but there is only one darkness.
-- Ursula K. Le Guin, 'The Dispossessed'

Tom Anderson
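HtmlUnit's own API is not reproduced here. As a rough, JDK-only analogue of the kinds of questions Tom lists, the sketch below assumes the generated markup is well-formed XHTML without a DOCTYPE (a DOCTYPE pointing at the W3C DTD would make the parser try to load that DTD) and uses the standard `javax.xml.xpath` API. The class `XhtmlQueries` and its `texts` and `withClass` helpers are hypothetical names for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XhtmlQueries {

    // Parses well-formed XHTML and returns the text content of every node
    // matched by an XPath 1.0 expression (hypothetical helper, not HtmlUnit's API).
    static List<String> texts(String xhtml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            // Malformed markup or a bad XPath expression ends up here.
            throw new IllegalArgumentException(e);
        }
    }

    // XPath 1.0 idiom for "element has this class", which also matches
    // multi-valued attributes such as class="errorReport big".
    static String withClass(String tag, String cssClass) {
        return "//" + tag + "[contains(concat(' ', normalize-space(@class), ' '), ' "
                + cssClass + " ')]";
    }

    public static void main(String[] args) {
        String page = "<html><body>"
                + "<div><p class='errorReport big'>Name is required</p></div>"
                + "<div><p>All good</p></div>"
                + "</body></html>";
        System.out.println(texts(page, "//div").size());               // 2
        System.out.println(texts(page, withClass("p", "errorReport"))); // [Name is required]
        System.out.println(texts(page, "//div[2]/p"));                 // [All good]
    }
}
```

Because the parse fails on malformed input, a helper like this doubles as a crude strictness check, which is the opposite trade-off from a browser-like tool that repairs the markup first.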

#2

Tom Anderson wrote:
> The Rolls-Royce here is HtmlUnit, which is a complete headless browser -
> it reads HTML, parses CSS, runs javascript (courtesy of Rhino), etc.

The problem with HtmlUnit (in this particular case) is precisely that it tries to work like a real browser, which means that it'll do its best to give you a DOM tree even if the HTML is not valid at all.

JB.

Jean-Baptiste Nizet
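For contrast, the "cruddy", regexp-based check from the original question sits at the other extreme: it builds no tree and repairs nothing, it only complains about tags that don't pair up. A minimal sketch, where `TagBalanceCheck` is a hypothetical name; it assumes every non-void element is explicitly closed (so HTML's optional end tags, such as a bare `<li>`, are reported), and it can be fooled by `>` inside attribute values or `<` inside script code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagBalanceCheck {

    // HTML void elements: they never take a closing tag, so they are skipped.
    private static final Set<String> VOID = new HashSet<>(Arrays.asList(
            "area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"));

    // Group 1: "/" for an end tag; group 2: tag name; group 3: "/" for <x/>.
    private static final Pattern TAG =
            Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)\\b[^>]*?(/?)>");

    // Returns a list of problems; an empty list means "plausible", not "valid".
    static List<String> problems(String html) {
        String text = html.replaceAll("(?s)<!--.*?-->", ""); // ignore comments
        Deque<String> open = new ArrayDeque<>();
        List<String> problems = new ArrayList<>();
        Matcher m = TAG.matcher(text);
        while (m.find()) {
            boolean endTag = !m.group(1).isEmpty();
            boolean selfClosing = !m.group(3).isEmpty();
            String name = m.group(2).toLowerCase(Locale.ROOT);
            if (VOID.contains(name) || selfClosing) {
                continue; // nothing to balance
            }
            if (!endTag) {
                open.push(name);
            } else if (open.isEmpty()) {
                problems.add("unexpected </" + name + ">");
            } else if (!open.peek().equals(name)) {
                problems.add("expected </" + open.peek() + "> but found </" + name + ">");
                if (open.contains(name)) {
                    // Resynchronise: pop the unclosed elements, then the matching <name>.
                    String popped;
                    do {
                        popped = open.pop();
                    } while (!popped.equals(name));
                }
            } else {
                open.pop();
            }
        }
        for (String name : open) {
            problems.add("unclosed <" + name + ">");
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(problems("<div><p>Hello<br><img src='a.png'/></p></div>")); // []
        System.out.println(problems("<div><p>Hello</div>")); // [expected </p> but found </div>]
    }
}
```

It catches the "isn't even likely" class of mistakes cheaply, but an empty result says nothing about attribute syntax, nesting rules, or entities.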

#3

Jean-Baptiste Nizet wrote:
> The problem with HtmlUnit (in this particular case) is precisely that it
> tries to work like a real browser, which means that it'll do its best to
> give you a DOM tree even if the HTML is not valid at all.

If super-strict parsing is needed, then generating XHTML and parsing it with a regular XML parser is an option.

Arne

Arne Vajhøj
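A minimal sketch of Arne's suggestion using only the JDK's built-in JAXP parser, assuming the generator emits XHTML. It checks well-formedness only (every element closed and properly nested, attribute values quoted), not conformance to the XHTML DTD. Without a DTD, XML predefines only `&amp;`, `&lt;`, `&gt;`, `&quot;` and `&apos;`, so named entities such as `&nbsp;` are reported as errors and numeric references like `&#160;` are needed instead. The class `XhtmlWellFormedness` and its `firstError` method are hypothetical names.

```java
import java.io.StringReader;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class XhtmlWellFormedness {

    // Returns the first well-formedness error as "line:column message",
    // or Optional.empty() if a plain (non-validating) XML parser accepts it.
    static Optional<String> firstError(String xhtml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Fail on errors instead of letting the default handler print to stderr.
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) throws SAXException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xhtml)));
            return Optional.empty();
        } catch (SAXParseException e) {
            return Optional.of(e.getLineNumber() + ":" + e.getColumnNumber() + " " + e.getMessage());
        } catch (Exception e) {
            // Parser configuration or I/O problems; unlikely with an in-memory string.
            return Optional.of(e.toString());
        }
    }

    public static void main(String[] args) {
        System.out.println(firstError("<html><body><p>ok<br/></p></body></html>")); // Optional.empty
        // Prints the parser's own message for the unclosed <br>.
        System.out.println(firstError("<html><body><p>bad<br></p></body></html>").orElse("well-formed"));
    }
}
```

In a unit test, asserting that `firstError(generatedHtml)` is empty gives the strict, no-repair behaviour that a browser-like parser deliberately avoids.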

#4

Arne Vajhøj wrote:
> If super strict parsing is needed, then XHTML and a regular XML
> parser is an option.

Some other options you might want to explore (Validator.nu might be the most appropriate if JTidy isn't):

COBRA: http://lobobrowser.org/cobra.jsp
Validator.nu: http://about.validator.nu/htmlparser/
HTMLCleaner: http://htmlcleaner.sourceforge.net/

Chris Riesbeck

#5

On Fri, 7 Aug 2009, Jean-Baptiste Nizet wrote:
> The problem with HtmlUnit (in this particular case) is precisely that it
> tries to work like a real browser, which means that it'll do its best to
> give you a DOM tree even if the HTML is not valid at all.

Ah, but then it's simply a matter of bending the tool to your will. We adapted HtmlUnit to XHTML, and amongst other things, that means being less tolerant of errors. Basically, we found HtmlUnit's central parsing class, the one which wraps NekoHTML, and changed the set of options it sets on Neko before a parse. We also had to modify a few other spots in the parser chain, I seem to recall.

I'll dig out the details tomorrow,

tom

--
For various unconvincing reasons, your call may be recorded.

Tom Anderson