Velocity Reviews - Computer Hardware Reviews


Parsing xhtml with libxml

 
 
Jon Smirl
 
      12-16-2005
If you get errors complaining about undefined entities like &nbsp; when
parsing XHTML, it means you need to install the DTD for XHTML 1.0 or
1.1.
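For context: XML itself predefines only five entities (amp, lt, gt, quot, apos). Names such as nbsp and copy are declared in the XHTML DTD, so the parser cannot expand them without it. As a rough illustration (a plain regex scan, not a parser; the method name is made up), this Ruby snippet lists the entity references in a document that will need the DTD:

```ruby
# XML predefines only these five entities; every other named reference
# (e.g. &nbsp;) has to be declared in a DTD before a parser can expand it.
PREDEFINED_XML_ENTITIES = %w[amp lt gt quot apos].freeze

# Hypothetical helper: returns the distinct named entity references in
# +xml+ that cannot be resolved without loading the DTD. Character
# references like &#160; or &#xA0; never match the pattern, since they
# need no declaration. Comments and CDATA sections are not skipped.
def entities_needing_dtd(xml)
  names = xml.scan(/&([A-Za-z_][\w.-]*);/).flatten.uniq
  names - PREDEFINED_XML_ENTITIES
end

p entities_needing_dtd("<p>a&nbsp;b &amp; &copy; &#160; &nbsp;</p>")
# => ["nbsp", "copy"]
```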

Example of a doctype for xhtml 1.1:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

You want to install the DTDs locally, following the model in /etc/xml
(the XML catalog that libxml consults). If you don't, libxml will fetch
the DTD from www.w3.org every time you parse a document. Needing to
install these DTDs was not obvious to me and should be part of the
documentation. There is an RPM for XHTML 1.0, "xhtml1-dtds-1.0-7". I
couldn't find one for XHTML 1.1, so I downloaded it piecemeal from
w3.org.
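The /etc/xml model is an XML catalog: the parser looks up the PUBLIC identifier from the DOCTYPE and, if it is mapped to a local file, reads that instead of downloading the system URL. Below is a toy Ruby model of that lookup. It is not libxml's implementation (real catalogs can also map system IDs and chain to other catalogs), and the local paths are hypothetical placeholders for wherever you put the files.

```ruby
# Hypothetical local copies; point these at wherever you installed the DTDs.
LOCAL_DTDS = {
  "-//W3C//DTD XHTML 1.0 Strict//EN" => "/usr/share/xml/xhtml/1.0/xhtml1-strict.dtd",
  "-//W3C//DTD XHTML 1.1//EN"        => "/usr/share/xml/xhtml/1.1/xhtml11.dtd"
}.freeze

# Captures the public and system identifiers of a PUBLIC doctype.
DOCTYPE_PUBLIC = /<!DOCTYPE\s+\S+\s+PUBLIC\s+"([^"]+)"\s+"([^"]+)"/

# Returns [:local, path] if the public identifier is in the catalog,
# [:network, system_url] if the DTD would have to be fetched,
# or nil if the document has no PUBLIC doctype.
def resolve_doctype(xml, catalog = LOCAL_DTDS)
  m = DOCTYPE_PUBLIC.match(xml)
  return nil unless m

  public_id, system_url = m[1], m[2]
  path = catalog[public_id]
  path ? [:local, path] : [:network, system_url]
end

doc = <<~XHTML
  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
  <html xmlns="http://www.w3.org/1999/xhtml"/>
XHTML

p resolve_doctype(doc)      # local copy found in the catalog
p resolve_doctype(doc, {})  # empty catalog: would go to www.w3.org
```

The point of the sketch is the fallback order: with no local mapping, the only thing left to resolve is the http:// system URL, which is why every parse hits the network.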

Installing the DTD does not automatically turn on validation. If you
want to validate, you need to turn it on:
XML::Parser.default_validity_checking = true

XML::Parser.default_load_external_dtd controls loading of the
'external subset' (the DTD that defines character entities like
&nbsp;). It defaults to true.

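XML::Parser.default_load_external_dtd is broken: the getter and setter
are registered the wrong way round (the getter name is bound to the
setter function and vice versa), and line 274 tests
xmlSubstituteEntitiesDefaultValue instead of xmlLoadExtDtdDefaultValue.
This patch fixes it.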

Index: ruby_xml_parser.c
===================================================================
RCS file: /var/cvs/xml-tools/libxml-ruby/ruby_xml_parser.c,v
retrieving revision 1.1.1.1
diff -r1.1.1.1 ruby_xml_parser.c
274c274
< if (xmlSubstituteEntitiesDefaultValue)
---
> if (xmlLoadExtDtdDefaultValue)
916c916
< ruby_xml_parser_default_load_external_dtd_set, 0);
---
> ruby_xml_parser_default_load_external_dtd_get, 0);
918c918
< ruby_xml_parser_default_load_external_dtd_get, 1);
---
> ruby_xml_parser_default_load_external_dtd_set, 1);
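A quick way to catch this kind of accessor bug is a round-trip check: set the value, read it back, compare. Below is a minimal Ruby sketch using hypothetical stand-in modules so it runs without libxml. With a patched binding loaded you could presumably call the helper on XML::Parser directly, assuming it exposes default_load_external_dtd and default_load_external_dtd= as module methods.

```ruby
# Sets obj.<name> to each value, reads it back, and reports whether every
# value survived the round trip. The original setting is restored afterwards.
def accessor_round_trips?(obj, name, values = [true, false])
  getter = name.to_sym
  setter = :"#{name}="
  original = obj.public_send(getter)
  begin
    values.all? do |v|
      obj.public_send(setter, v)
      obj.public_send(getter) == v
    end
  ensure
    obj.public_send(setter, original)
  end
end

# Hypothetical stand-in with correctly wired accessors.
module FixedParser
  @default_load_external_dtd = true
  class << self
    attr_accessor :default_load_external_dtd
  end
end

# Hypothetical stand-in mimicking the line 274 bug: the getter reads
# a different flag from the one the setter writes.
module UnpatchedParser
  @load_ext_dtd = true
  @substitute_entities = false

  def self.default_load_external_dtd
    @substitute_entities
  end

  def self.default_load_external_dtd=(value)
    @load_ext_dtd = value
  end
end

p accessor_round_trips?(FixedParser, :default_load_external_dtd)     # true
p accessor_round_trips?(UnpatchedParser, :default_load_external_dtd) # false
```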


Sam's patches for libxml are also needed:
http://www.intertwingly.net/blog/200...s-Ruby-binding


 
Eero Saynatkari
 
      12-16-2005
Jon Smirl wrote:
> If you get errors complaining of undefined entities like &nbsp; when
> parsing xhtml it means you need to install the DTD for xhtml 1.0 or
> 1.1.
>
> Example of a doctype for xhtml 1.1:
> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
> "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
>
> <snip explanation & code due to ruby-forum.com />
>
> Sam's patches for libxml are also needed:
> http://www.intertwingly.net/blog/200...s-Ruby-binding


Thank you for this!



E
--
This document is NOT valid XHTML 1.0!

--
Posted via http://www.ruby-forum.com/.


 

