Velocity Reviews - Computer Hardware Reviews

Finding and replacing Invalid Tokens in an XML document

 
 
Ben Holness
01-06-2006
Hi all,

I have a system which allows users to enter a message on a (PHP) website.
This message is then put into a (MySQL) Database.

A Perl script then picks up the message and creates an XML document.

The webpages, database and XML are all UTF-8. However, every now and then I
get an error from the XML parser telling me I have an invalid token. This
occurs when the message contains particular characters, although I don't
know which ones - all I can see in the logs is the ANSI
representation (e.g. @^C). If I copy & paste it into Word, I get a square
box after the @ that takes two right cursor presses to get past.

My script catches that there is an invalid token, but rather than fail the
message completely, I would like to replace the bad characters with a
space. Is there a simple way to find these characters, or do I have to
write a function that looks at the output of $@ from the eval and works out
where the character is from the line/column/byte information in order to
fix it?
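
To make the $@ route concrete, here is a rough sketch of what such a function might look like. It assumes the error text contains a fragment of the form "at line N, column N, byte N" and that the byte value is a zero-based offset into the encoded document; both are assumptions to check against the parser actually in use. The helper names error_position and blank_byte_at and the sample error string are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: extract line/column/byte from a parser error string.
# Assumes the text contains "at line N, column N, byte N".
sub error_position {
    my ($err) = @_;
    return unless defined $err
        && $err =~ /at line (\d+), column (\d+), byte (\d+)/;
    return { line => $1, column => $2, byte => $3 };
}

# Hypothetical helper: overwrite a single byte of the encoded document
# with a space. Assumes $offset is zero-based.
sub blank_byte_at {
    my ($bytes, $offset) = @_;
    substr($bytes, $offset, 1) = ' ' if $offset < length $bytes;
    return $bytes;
}

# Made-up error text in the assumed format, for illustration only.
my $sample_err = "not well-formed (invalid token) at line 2, column 12, byte 51";
my $pos        = error_position($sample_err);
my $patched    = blank_byte_at("a\x03b", 1);
```

Since each parse reports only the first problem, a function like this would have to be called in a loop, re-parsing after every fix, which is part of why a simpler way would be welcome.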

FYI, the XML is created and parsed with XML::Simple and UTF-8 encoded with
encode. I have included a simplified snippet (written into this post, so
may contain typos) at the end of the email.

Cheers,

Ben

-- Snippet of Code --

use Encode qw(encode);
use XML::Simple;

# $MessageText is pulled from the database and may contain bad characters.

# Build a hash of the elements
my %arr;
$arr{'Message'} = encode("UTF-8", $MessageText);

# Convert the hash into an XML document with XMLout
my $tempxml = XML::Simple->new(NoAttr => 1, RootName => 'WebMessage');
my $xmldoc = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";
$xmldoc .= $tempxml->XMLout(\%arr);

# Parse the XML document
my $tempxml2 = XML::Simple->new(ForceArray => 1);
eval { $tempxml2->XMLin($xmldoc); };
if ($@)
{
    # An error occurred. Usually an invalid token due to a bad character
    # in $MessageText
}
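
For comparison, replacing the bad characters up front rather than after a failed parse could be sketched as below. This assumes the invalid tokens are characters outside the range XML 1.0 allows (tab, line feed, carriage return, U+0020 to U+D7FF, U+E000 to U+FFFD, U+10000 to U+10FFFF), which would cover a control character such as ^C. The function name replace_invalid_xml_chars is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every character that XML 1.0 does not
# allow with a single space. Intended to run on the message text
# before it is encoded and passed to XMLout.
sub replace_invalid_xml_chars {
    my ($text) = @_;
    $text =~ s/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/ /g;
    return $text;
}

# Illustrative input: an "@" followed by a Ctrl-C (U+0003) character.
my $clean = replace_invalid_xml_chars("user\@\x03example");
```

In the snippet above this would be applied to $MessageText just before the call to encode, so the parser never sees the offending character.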

 