perl, XML::LibXML: encoding problems while changing attributes on an XML string

 
 
kellner
      07-23-2006
Hello,

I'm parsing a chunk of XML and would like to add an id attribute to
individual elements that lack one. This is with Perl 5.8.6,
libxml2 2.6.17, XML::LibXML 1.58.

Basically, I have the parser add the attribute values to the respective
nodes and then use the toString method of XML::LibXML::Document to
write the modified text to a scalar. Both the original and the modified
text evaluate properly as utf8, but the modified text doesn't print
properly on the console, nor does it get entered as utf8 into a MySQL
database.

I don't really understand what's going on, and on what level the
error(s) could be located (console encoding, perl encoding, XML
encoding), and would appreciate any help I can get ...
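
One way to separate the Perl level from the console level is to look at the difference between a byte string and a character string. The following is a minimal sketch using only the core Encode module. It assumes, without this being confirmed for XML::LibXML 1.58, that the serialized text comes back as a Perl character string rather than as UTF-8 bytes; the sample word and variable names are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# UTF-8 encoded bytes for the word "vṛtti" (illustrative sample data).
my $bytes = "v\xE1\xB9\x9Btti";

# Decoding turns the byte string into a character string.
my $chars = decode('UTF-8', $bytes);

# length() counts bytes for the first string and characters for the second:
# 7 bytes versus 5 characters.
my $byte_len = length $bytes;
my $char_len = length $chars;

# Encoding again yields UTF-8 octets, which is what a console or a database
# handle that expects UTF-8 bytes needs to receive.
my $out = encode('UTF-8', $chars);

print "bytes: $byte_len, characters: $char_len\n";
print "round trip ", ($out eq $bytes ? "matches" : "differs"), "\n";
print $out, "\n";
```

If the string being printed is a character string, an alternative to encoding by hand is to put an encoding layer on the handle, for example binmode(STDOUT, ':encoding(UTF-8)'), so that print does the encoding.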

Here's the code:
------------------------------------------------

#!/usr/bin/perl

use strict;
use XML::LibXML;
use Encode 'decode_utf8';
use vars qw($parser $p);

$parser = XML::LibXML->new();
my $version = XML::LibXML::LIBXML_DOTTED_VERSION;
print "libxml2 $version\n-------------\nXML::LibXML $XML::LibXML::VERSION\n-------------------\n";

$p->{text} = qq|
<p>
<q who="Blabla">pramāṇavārttikasvavṛttiṭīkā </q> And this is
some further text.<br/>And even more text.<br/>And more.
<q who="Blabla2">The second quotation!</q>.
pramāṇavārttikasvavṛttiṭīkā.
</p>|;

my $a = &validate_text($p->{text});
print "$a \n";

sub validate_text {
    my $text = shift;
    if (decode_utf8($text)) { print "TEXT is utf8\n"; }
    else                    { print "is not utf8\n"; }
    print "TESTING $text\n";

    my $id   = 1;
    my $doc  = $parser->parse_string($text);
    my $root = $doc->getDocumentElement;

    # Give every <q> child of the root an id attribute if it has none.
    my @quotations = $root->findnodes('q');
    foreach my $q (@quotations) {
        unless ($q->hasAttribute('id')) {
            print "NO ID\n";
            $q->setAttribute('id', "$id");
            ++$id;
        }
        else {
            print "HAS ID\n";
        }
        my $id_new = $q->getAttribute('id');
        print "NEW ID: $id_new\n";
    }

    # Serialize the root element only, without an XML declaration.
    my $newtext = $root->toString;
    if (decode_utf8($newtext)) { print "NEW TEXT is utf8\n"; }
    else                       { print "is not utf8\n"; }
    return ($newtext);
}
------------------------------------------------------------
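
A note on the "is utf8" test in the script above: decode_utf8 called without a check argument replaces malformed bytes instead of failing, so its return value is true for practically any non-empty input and the test may not say much about the data. A stricter check can be sketched as follows, using only the core Encode module. The helper name is_valid_utf8_bytes and the sample strings are hypothetical and not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode decode_utf8);

# Hypothetical helper, not part of the script above: true only if $bytes
# is a well-formed UTF-8 byte sequence.
sub is_valid_utf8_bytes {
    my ($bytes) = @_;
    # FB_CROAK makes decode() die on malformed input; LEAVE_SRC keeps the
    # input string unmodified. eval traps the exception.
    my $ok = eval {
        decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
        1;
    };
    return $ok ? 1 : 0;
}

my $good = "pram\xC4\x81\xE1\xB9\x87a";   # UTF-8 bytes for "pramāṇa"
my $bad  = "pram\xE4\xF1a";               # stray high bytes, not valid UTF-8

for my $sample ($good, $bad) {
    # The truthiness test from the script versus the strict check.
    my $truthy = decode_utf8($sample)          ? 'true'  : 'false';
    my $strict = is_valid_utf8_bytes($sample)  ? 'valid' : 'invalid';
    print "decode_utf8: $truthy, strict check: $strict\n";
}
```

Both samples pass the truthiness test, while only the first passes the strict check. Note that this check is meant for byte strings; what it reports for a string that has already been decoded into characters is a separate question.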

I know that I can set a document encoding by creating a new $doc
altogether, but I don't want to do that here: the createDocument
method prepends an XML declaration (<?xml version=...?>) to the
created document, and this breaks the routines that process the text
afterwards.

Thanks in advance,

Birgit Kellner

 