Velocity Reviews - Computer Hardware Reviews



XML::Twig

 
 
c0rk
 
      09-25-2004
OK. I am now desperate. I have written a subroutine to split up large
(~2-3MB) XML documents into separate documents. When I use
$twig->parsefile I get the following error:

"not well-formed (invalid token) at line 27072, column 1934, byte 878399
at C:/Perl/site/lib/XML/Parser.pm line 187"

When I change to $twig->safe_parsefile I can parse the document, but it
only gets a portion of the document (~38 of 83 elements).

I am the first to admit that I am not a Perl hacker by trade, so please
go easy on my code sample. I should also mention that this code
worked great on smaller files (<300KB).

Any help/suggestions would be greatly appreciated.

Brendan


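use XML::Twig;
use Time::HiRes qw(gettimeofday);    # gettimeofday is not a core builtin

sub splitFiles {
    my ($fPath) = @_;
    my $twig = XML::Twig->new;
    logMessage("DEBUG - Build the Twig for " . $fPath);
    # safe_parsefile traps parse errors instead of dying: it returns false
    # and leaves the message in $@. Without this check the code carries on
    # with whatever partial tree was built before the error.
    unless ($twig->safe_parsefile($fPath)) {
        logMessage("ERROR - cannot parse $fPath: $@");
        return;
    }
    logMessage("DEBUG - I can parse the file");
    my $root = $twig->root;          # root of the twig (vdf_metadata_list)
    logMessage("DEBUG - Videos: " . $root->children_count);
    my @videos = $root->children;    # the vdf_metadata elements
    if (@videos) {
        logMessage("DEBUG - Number of videos is " . scalar @videos);
        my $i = 0;
        foreach my $video (@videos) {
            $i++;
            my $timeStamp = gettimeofday;    # float seconds in scalar context
            my $tmpPath = $tmpDir . $timeStamp . $i;    # $tmpDir is set elsewhere
            open(my $FH, '>', $tmpPath) or die "cannot open $tmpPath: $!";
            $video->print($FH);
            close($FH);    # the original closed the bareword FH, not $FH
        }
    } else {
        logMessage("DEBUG - Skipping file " . $fPath);
    }
}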
 
Brian McCauley
 
      09-25-2004


c0rk wrote:
> OK. I am now desperate. I have written a subroutine to split up large
> (~2-3MB) XML documents into separate documents. When I use
> $twig->parsefile I get the following error:
>
> "not well-formed (invalid token) at line 27072, column 1934, byte 878399
> at C:/Perl/site/lib/XML/Parser.pm line 187"


Well, in the absence of any evidence to the contrary I'd be inclined to
accept that at face value.

Do you have a reason to disbelieve it?

 
Tad McClellan
 
      09-25-2004
c0rk <(E-Mail Removed)> wrote:

> When I use $twig->parsefile I get the following error:
>
> "not well-formed (invalid token) at line 27072, column 1934, byte 878399
> at C:/Perl/site/lib/XML/Parser.pm line 187"



This message means that there is something wrong with the _data_
rather than with the code.

Open the data file to the 1934th character on the 27072nd line
and see what it is that makes it invalid XML.
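Tad's suggestion can be scripted. The sketch below uses only core Perl and assumes the `byte 878399` figure in the error message is a zero-based offset from the start of the file; `bytes_at` and the file name `videos.xml` are hypothetical names for illustration. Printing the bytes in hex makes control characters and stray non-UTF-8 bytes visible, which a text editor may hide.

```perl
use strict;
use warnings;

# Return a hex dump of up to $len bytes starting at the zero-based
# byte offset $offset, so that non-printing characters become visible.
# (Hypothetical helper; the offset is assumed to come from the parser error.)
sub bytes_at {
    my ($path, $offset, $len) = @_;
    open(my $fh, '<:raw', $path) or die "cannot open $path: $!";
    seek($fh, $offset, 0)        or die "cannot seek in $path: $!";
    my $got = read($fh, my $buf, $len);
    die "cannot read $path: $!" unless defined $got;
    close($fh);
    # One two-digit hex code per byte, separated by spaces.
    return join ' ', map { sprintf '%02x', ord } split //, $buf;
}

# Usage, with the offset taken from the error message:
# print bytes_at('videos.xml', 878399, 16), "\n";
```

If the assumption about the offset holds, the dump should start at, or very near, the token the parser rejected.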



--
Tad McClellan SGML consulting
(E-Mail Removed) Perl programming
Fort Worth, Texas
 
c0rk
 
      09-26-2004
Brian McCauley <(E-Mail Removed)> wrote in
news:cj46h1$v2m$(E-Mail Removed):

>
>
> c0rk wrote:
>> OK. I am now desperate. I have written a subroutine to split up
>> large (~2-3MB) XML documents into separate documents. When I use
>> $twig->parsefile I get the following error:
>>
>> "not well-formed (invalid token) at line 27072, column 1934, byte
>> 878399 at C:/Perl/site/lib/XML/Parser.pm line 187"

>
> Well, in the absence of any evidence to the contrary I'd be inclined
> to accept that at face value.
>
> Do you have a reason to disbelieve it?
>


Brian

You know, I have been working on this script since Thursday, trying to
determine _my_ problem. When I saw this error, I assumed it pointed to an
error in my processing method (e.g. a memory problem). For whatever reason, I
just didn't read the error message for what it was. It turns out that the XML
has bad characters in it. I replaced those characters and my script
processed a 3MB file in seconds.
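The post doesn't say which characters were bad. One common cause of expat's "invalid token" is an ASCII control character, which XML 1.0 does not allow apart from tab, newline and carriage return. Assuming that is the culprit, a minimal cleanup pass in core Perl might look like this; `strip_xml_controls` and `clean_file` are hypothetical helpers. Working on raw bytes is safe for UTF-8 input, because bytes below 0x20 never occur inside a multi-byte sequence. If the problem is instead something like a Latin-1 byte in a file declared as UTF-8, deleting control characters will not help, and the encoding needs fixing instead.

```perl
use strict;
use warnings;

# Remove the C0 control characters that XML 1.0 forbids
# (everything below 0x20 except tab, newline and carriage return).
# Returns the cleaned text and the number of characters removed.
sub strip_xml_controls {
    my ($text) = @_;
    my $removed = ($text =~ tr/\x00-\x08\x0B\x0C\x0E-\x1F//d);
    return ($text, $removed);
}

# Copy $in to $out with those characters removed, working on raw bytes.
# Returns the total number of characters removed.
sub clean_file {
    my ($in, $out) = @_;
    open(my $ifh, '<:raw', $in)  or die "cannot open $in: $!";
    open(my $ofh, '>:raw', $out) or die "cannot open $out: $!";
    my $total = 0;
    while (my $line = <$ifh>) {
        my ($clean, $n) = strip_xml_controls($line);
        $total += $n;
        print {$ofh} $clean;
    }
    close($ifh);
    close($ofh) or die "cannot close $out: $!";
    return $total;
}
```

For example, `clean_file('videos.xml', 'videos-clean.xml')` returns the number of characters it dropped, so a result of zero suggests the problem lies elsewhere.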

Many thanks for your response!

-c
 
c0rk
 
      09-26-2004
Tad McClellan <(E-Mail Removed)> wrote in
news:(E-Mail Removed):

> c0rk <(E-Mail Removed)> wrote:
>
>> When I use $twig->parsefile I get the following error:
>>
>> "not well-formed (invalid token) at line 27072, column 1934, byte
>> 878399 at C:/Perl/site/lib/XML/Parser.pm line 187"

>
>
> This message means that there is something wrong with the _data_
> rather than with the code.
>
> Open the data file to the 1934th character on the 27072nd line
> and see what it is that makes it invalid XML.
>
>
>


Tad,

Thanks for the response. You are 100% correct. I replaced the bad
characters at the specified location, and life is good!!!

Thanks,

-c
 