Problem with perlsax splitting the calls to characters callback

 
 
raga
      10-13-2008
From the documentation linked here:
http://search.cpan.org/~kmacleod/lib...oc/PerlSAX.pod
PerlSAX seems to split the characters call for a single entity.
Though this is weird (I'm not sure if there is a genuine reason), it is
fine: as all the chunks belong to the same entity, we can simply append
all the characters calls.
However, sadly, it also calls the characters API with an unwanted
space.
E.g. I have the tag < tag1>mynameisrs</tag>
and it calls characters("myname") characters(" ") characters("isrs").
It is not at all predictable why it does this. The problem
is that when I append the chunks, the result becomes "myname isrs".
Any help is appreciated.
Thanks
 
 
 
 
 
RedGrittyBrick
      10-13-2008

raga wrote:
> From the documentation linked here:
> http://search.cpan.org/~kmacleod/lib...oc/PerlSAX.pod
> PerlSAX seems to split the characters call for a single entity.
> Though this is weird (I'm not sure if there is a genuine reason), it is
> fine: as all the chunks belong to the same entity, we can simply append
> all the characters calls.


The URL you provide says this:

"The Parser will call this method to report each chunk of character
data. SAX parsers may return all contiguous character data in a single
chunk, or they may split it into several chunks;"


> However, sadly, it also calls the characters API with an unwanted
> space.
> E.g. I have the tag < tag1>mynameisrs</tag>


That isn't well-formed XML and so can't be parsed.
1. You have a space in front of the first tag name.
2. You open tag1 but close tag.


> and it calls characters("myname") characters(" ") characters("isrs").
> It is not at all predictable why it does this.


In my experience it is always sufficiently predictable. Probably your
mynameisrs data is split over several lines and you've not written your
handler to take this into account.


$ cat sax.pl
#!/usr/local/bin/perl
use strict;
use warnings;
use XML::Parser::PerlSAX;

my $xml="<tag>mynameisrs</tag>";

my $handler = MyHandler->new();
my $parser = XML::Parser::PerlSAX->new(Handler=>$handler);

$parser->parse($xml);


package MyHandler;
use strict;
use warnings;
use Data::Dumper;

sub new {
    my $type = shift;
    return bless {}, $type;
}

my $current_element = '';

sub start_element {
    my ($self, $element) = @_;
    $current_element = $element->{Name};
    print "Start: <$current_element>\n";
}

sub end_element {
    my ($self, $element) = @_;
    print "End: \n";
}

sub characters {
    my ($self, $characters) = @_;
    my $text = $characters->{Data};
    print "Characters: '$text'\n";
}

1;


$ perl sax.pl
Start: <tag>
Characters: 'mynameisrs'
End:
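
The handler above prints each chunk as it arrives. A handler that takes split chunks into account might look like the following minimal sketch. BufferingHandler and its buffer/texts fields are hypothetical names, and the callbacks are invoked by hand to simulate a parser delivering one text node in two chunks, so it runs without XML::Parser::PerlSAX installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical handler (not part of libxml-perl): it buffers character
# data per element instead of printing each chunk as it arrives.
package BufferingHandler;

sub new {
    my $type = shift;
    return bless { buffer => '', texts => [] }, $type;
}

sub start_element {
    my ($self, $element) = @_;
    $self->{buffer} = '';    # start a fresh buffer for this element
}

sub characters {
    my ($self, $characters) = @_;
    $self->{buffer} .= $characters->{Data};    # append, never overwrite
}

sub end_element {
    my ($self, $element) = @_;
    # Only here is the element's text known to be complete.
    push @{ $self->{texts} }, $self->{buffer};
    print "Text of <$element->{Name}>: '$self->{buffer}'\n";
    $self->{buffer} = '';
}

package main;

# Simulated events: one text node delivered in two chunks.
my $handler = BufferingHandler->new();
$handler->start_element({ Name => 'tag' });
$handler->characters({ Data => 'myname' });
$handler->characters({ Data => 'isrs' });
$handler->end_element({ Name => 'tag' });
```

Resetting the buffer in start_element keeps the sketch short, but it discards text that precedes a child element, so mixed content would need a stack of buffers instead.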



--
RGB
 
 
 
 
 
raga
      10-13-2008
Sorry for the wrong input provided earlier; I was in a hurry and typed it
too quickly.
I intended to type <tag>mynameisrs</tag>

Yes, PerlSAX occasionally splits the characters into multiple calls. Your
snippet doesn't seem to handle that!
My actual question is this: in addition to the calls made to the characters
API with the split chunks, it randomly calls the characters API with an
unwanted space.
Thanks again for your earlier reply.
 
 
RedGrittyBrick
      10-13-2008

raga wrote:
> Sorry for the wrong input provided earlier; I was in a hurry and typed it
> too quickly.
> I intended to type <tag>mynameisrs</tag>
>
> Yes, PerlSAX occasionally splits the characters into multiple calls. Your
> snippet doesn't seem to handle that!


My program wasn't intended to handle it, it was intended to show that no
unexpected space characters are inserted.

> My actual question is this: in addition to the calls made to the characters
> API with the split chunks, it randomly calls the characters API with an
> unwanted space.


It never does for me!

Create and post a short working program that shows it!
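
When putting such a program together, it may help to make whitespace visible in every chunk, since a lone "space" chunk could really be a newline from the source file. A small sketch, assuming a hypothetical helper show_chunk and hand-written sample chunks (not output from a real parse):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: render a chunk with its length and with
# newlines, carriage returns and tabs shown as escape sequences.
sub show_chunk {
    my ($data) = @_;
    my %names = ("\n" => '\n', "\r" => '\r', "\t" => '\t');
    (my $visible = $data) =~ s/([\n\r\t])/$names{$1}/g;
    return sprintf "%d char(s): '%s'", length($data), $visible;
}

# Sample chunks, invented for illustration: what a handler might see if
# the element's text contained a line break in the input file.
my @chunks = ("myname", "\n", "isrs");
my @report = map { show_chunk($_) } @chunks;
print "$_\n" for @report;
```

Calling show_chunk($characters->{Data}) inside the characters callback would show whether the extra chunk is a real space or a line break.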

--
RGB
 
 
Tad J McClellan
      10-13-2008
raga wrote:


> Sorry for the wrong input provided earlier; I was in a hurry and typed it
> too quickly.



You should not attempt to type code or data at all.

You should instead copy/paste it so that you do not insert
errors that are not in your real code or data.

Please see the Posting Guidelines that are posted here frequently.


--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
 