LibXML UTF8 - Input is not proper UTF-8, indicate encoding !

 
 
Vlajko Knezic
      03-05-2005
I'm not sure what is going on here, but it seems to have something to do
with the way UTF-8 is handled in Perl and/or LibXML.



The script below:

- accepts a value from a form text field,
- builds an XML document around it,
- serializes the document to a string using toString(),
- parses the string back into an XML document using parse_string(),
- transforms the XML document into an HTML document using an XSL
transformation.



Everything works well until a non-ASCII character (for example é) is entered
in the text field. In that case the call to parse_string() crashes the code
with the message:

=====================================================================

:2: parser error : Input is not proper UTF-8, indicate encoding !
<test><test_text>abcé</test_text></test>
                    ^
:2: error: Bytes: 0xE9 0x3C 0x2F 0x74
<test><test_text>abcé</test_text></test>
                    ^
 at C:/_work/vsurvey/site/test1.cgi line 24

=====================================================================



I know that the code below does not make much sense on its own, but it is an
abstraction of much more complex code. The environment is Perl 5.8, Apache,
Windows XP.



Hints and/or an explanation of what was coded wrong and how it should be
fixed would be very much appreciated.



Vlajko Knezic,

Toronto, Ontario



---------------------------------------------------------------------------------------------------------------------

test.cgi



#! c:/Perl/bin/Perl.exe

use CGI;
use XML::LibXML;
use XML::LibXSLT;
use CGI::Carp qw( fatalsToBrowser );
use Encode;

my $mDocument = XML::LibXML::Document->new();
my $parser = XML::LibXML->new();

$mDocument->setEncoding("UTF8");
my $mCGI = new CGI;
print $mCGI->header;

# Value submitted from the form's text field
my $mTest_text = $mCGI->param('test');;

# Build <test><test_text>...</test_text></test> around the value
my $mTest = $mDocument->createElement("test");
my $mTestText = $mDocument->createElement("test_text");
$mTestText->appendTextNode($mTest_text);
$mTest->appendChild($mTestText);
$mDocument->setDocumentElement( $mTest );
$mDocument->setEncoding("UTF8");

# Serialize the document, then parse the string again
my $mTestXML = $mDocument->toString();
my $mParsedTestXML = $parser->parse_string($mTestXML);

# Load the stylesheet and transform the parsed document into HTML
my $mParsedXMLXSL = $parser->parse_file('test.xsl');
my $mParserXSL = XML::LibXSLT->new();
my $mParsedXSL = $mParserXSL->parse_stylesheet($mParsedXMLXSL);
my $mPageHTML = $mParsedXSL->transform($mParsedTestXML);
my $mPrintPageHTML = $mParsedXSL->output_string($mPageHTML);
print $mPrintPageHTML;



test.xsl



<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
<xsl:output method="html" encoding="UTF-8" indent="yes"
            omit-xml-declaration="yes"/>
<xsl:template match="//test">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
</head>
<body>
<xsl:value-of select="test_text"/>
<form name="test" method="post" target="_self">
<input type="text" name="test" /><input type="submit" name="button"/>
</form>
</body>
</html>
</xsl:template>
</xsl:stylesheet>






 
Joe Smith
      03-06-2005
Vlajko Knezic wrote:

> $mDocument->setEncoding("UTF8");
> my $mCGI = new CGI;
> my $mTest_text = $mCGI->param('test');;


This is where it goes wrong: you need to encode $mTest_text into
UTF-8 before doing anything else with that string. You have
promised the XML library that you will be working with
UTF-8, so it is up to you to ensure that everything you
hand it is UTF-8 (not ISO-8859-1).
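
The bytes quoted in the error message (0xE9 0x3C 0x2F 0x74) are consistent with this. The small check below uses only the core Encode module; the variable names are illustrative and do not come from the original script. It suggests that 0xE9 is é in ISO-8859-1 but cannot start a valid UTF-8 sequence when followed by "<":

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The four bytes libxml2 reported: 0xE9 0x3C 0x2F 0x74
my $reported = "\xE9\x3C\x2F\x74";

# Read as ISO-8859-1, they are the text "é</t".
my $as_latin1 = decode('ISO-8859-1', $reported);
printf "Latin-1 reading: U+%04X followed by '%s'\n",
    ord($as_latin1), substr($as_latin1, 1);

# Strict UTF-8 decoding rejects them: 0xE9 opens a three-byte sequence,
# but 0x3C ('<') is not a continuation byte.
my $copy = $reported;    # decode() with a CHECK value may modify its input
my $ok = eval { decode('UTF-8', $copy, Encode::FB_CROAK); 1 };
print "Valid UTF-8? ", ($ok ? "yes" : "no"), "\n";

# The same character encoded as UTF-8 is two bytes, 0xC3 0xA9.
my $utf8 = encode('UTF-8', "\x{E9}");
print "UTF-8 bytes for U+00E9: ",
    join(' ', map { sprintf '0x%02X', ord($_) } split //, $utf8), "\n";
```

So the form value reached the script as a single-byte encoding, and the parser, having been told to expect UTF-8, stopped at the first byte that did not fit.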

Any further questions should be posted to comp.lang.perl.misc
and not this newsgroup (comp.lang.perl is defunct).
-Joe
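
A minimal sketch of the fix described above, assuming the form data arrives as ISO-8859-1 and a reasonably current XML::LibXML is installed. $raw_param is a hypothetical stand-in for $mCGI->param('test'), and the encoding name is an assumption to verify (a browser on Windows may send Windows-1252 instead):

```perl
use strict;
use warnings;
use Encode qw(decode);
use XML::LibXML;

# Hypothetical stand-in for $mCGI->param('test'): the raw bytes for "abcé"
# as submitted in ISO-8859-1 (assumed encoding).
my $raw_param = "abc\xE9";

# Turn the bytes into a Perl character string before it touches the DOM.
my $text = decode('ISO-8859-1', $raw_param);

# Build the same <test><test_text>...</test_text></test> document.
my $doc   = XML::LibXML::Document->new('1.0', 'UTF-8');
my $root  = $doc->createElement('test');
my $child = $doc->createElement('test_text');
$child->appendTextNode($text);
$root->appendChild($child);
$doc->setDocumentElement($root);

# toString() serializes in the document encoding, so é becomes 0xC3 0xA9.
my $xml = $doc->toString();

# Parsing the serialized string again should now succeed.
my $reparsed = XML::LibXML->new->parse_string($xml);
my $value    = $reparsed->findvalue('/test/test_text');

binmode STDOUT, ':encoding(UTF-8)';
print "Round-tripped text: $value\n";
```

In the original script the equivalent change would be to decode the result of $mCGI->param('test') before passing it to appendTextNode(); the rest of the pipeline can stay as it is.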
 