Velocity Reviews - Computer Hardware Reviews

Problem with SaxParser. Works Occasionally.

stacey
      02-27-2007
Hello everyone,

I am using SAXParser to parse an XML document, and I noticed that
sometimes it ignores some data while reading the characters inside an
element.
Here is what I mean:

My XML file contains many instances of the following structure:

<pk>
<absi>769.9541477864069</absi>
<area>170.0227589457148</area>
<background>0.0000000000000000</background>
<chisq>1202.267500954470</chisq>
<goodn>1607.355201650164</goodn>
<ind>1106.121302500338</ind>
<lind>1082.000000000000</lind>
<mass>922.4428373952809</mass>
<meth>4</meth>
<reso>5700.586009913091</reso>
<rind>1256.000000000000</rind>
<s2n>6.073893519996163</s2n>
<type>0</type>
</pk>

My Java code is quite big, but I debugged it and saw that the
problem is in the characters function.
From each of the above structures I want to get the absi and mass
values, and I write them to a file.

The characters function:
public void characters(char buf[], int offset, int len) throws SAXException {

String s = new String(buf, offset, len);

The string s is sometimes shorter than it should be. Can we control
the offset and len?

The problem occurs when it reaches the <mass> element. Sometimes it works
fine and reads the whole number 922.4428373952809.
But other times, when, say, the mass value is 1455.668578151738,
the result is just 14.

Do you have any idea what the error is? I can post my code, but I
didn't do it now because this post is already too long. Maybe someone
has already encountered the error.


I would appreciate any help.

Thank you very very much,

Stacey


PS:
I have many files like this, and I have noticed that my output files
all show a similar pattern: one faulty line after every four correct
lines for a while, and then the output is correct again:

922.4428373952809 769.9541477864069
927.4953784899038 37191.92095290756
933.5252259507145 8110.517189035567
940.4653099147753 9868.125486196035
941.518898 4381.813320162202           <-- we lose some digits
947.5404155021193 2787.439831966368
954.4881638784998 392.1130341071628
965.4569335866441 1401.545504962355
978.4210869646715 438.9369494573742
984.4917 886.1359194417274             <-- we lose more digits
1003.550150977367 497.0759433683916
1017.529625718612 3169.151170705610
1055.582684542875 4943.314163415449
1066.107033179408 443.6729762946884
1074.5 3354.853245126279               <-- we lose more digits
1076.531768475310 646.8024403962839
1083.527120777254 498.7760684249872
1311.697689088619 16369.00024571709
1325.729755074315 287.2714228497898
1349 637.2393867567375                 <-- the number is now an integer (error)
1373.598186480195 223.2986354292584
1385.548026377931 431.7554665051648
1387.553176811347 268.0273520356520
1443.594333307889 1317.685936747487
14 661.7093703067692                   <-- it should have read 1455.668578151738
1457.610697327313 768.3194420301912
1467.786043204199 3468.484434990418
1546.734272041830 565.9503240406932
1552.544423206343 610.4527352860962
1566.639317869258 308.7611649665076
1575.708923737670 1524.695259940473

(the rest of the file is ok)

 
Chris Uppal
      02-27-2007
stacey wrote:

> The function characters:
> public void characters(char buf[], int offset, int len) throws
> SAXException {
>
> String s = new String(buf, offset, len);
>
> The s string sometimes is not as big as it should. Can we define the
> offset and the len?
>
> The problem is when it reaches the <mass> element. Sometimes it works
> ok, and it reads all the number 922.4428373952809.
> But other times, when let say the mass value is 1455.668578151738,
> the result is just 14 .


Are you assuming that characters() will always be called with all the text in
one call? If so, don't, because it won't. SAX may supply
"1455.668578151738" in as many separate pieces as it wants to -- even in 17
different calls with one character each.

-- chris
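A minimal sketch of the usual fix for what Chris describes, using the JDK's javax.xml.parsers.SAXParser: append every chunk to a StringBuilder in characters() and read the value only in endElement(). The class MassReader, the readMasses helper and the wrapper <peaks> element are hypothetical, made up for illustration; the sketch also assumes the parser is not namespace-aware (the factory default), so qName holds the plain element name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class MassReader {

    // Hypothetical handler: collects the full text of every <mass> element.
    static class MassHandler extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();
        private boolean inMass = false;
        final List<String> masses = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (qName.equals("mass")) {
                inMass = true;
                text.setLength(0); // fresh buffer for this element
            }
        }

        @Override
        public void characters(char[] buf, int offset, int len) {
            if (inMass) {
                // Only a fragment of the value may arrive here, so append rather than assign.
                text.append(buf, offset, len);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals("mass")) {
                masses.add(text.toString().trim()); // the value is complete only now
                inMass = false;
            }
        }
    }

    static List<String> readMasses(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        MassHandler handler = new MassHandler();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.masses;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<peaks><pk><mass>922.4428373952809</mass></pk>"
                + "<pk><mass>1455.668578151738</mass></pk></peaks>";
        System.out.println(readMasses(xml)); // prints [922.4428373952809, 1455.668578151738]
    }
}
```

Whether the parser happens to deliver "1455.668578151738" in one call or several, the StringBuilder ends up holding the same text by the time endElement() runs.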


 
Thomas Fritsch
      02-27-2007
stacey wrote:
> I am using SaxParser, to parse an xml document, and i noticed that
> sometimes it ignores some data, while reading the characters inside an
> element.
> What i mean is:
>
> My xml file includes many instances of the following structure:
>
> <pk>
> <absi>769.9541477864069</absi>
> <area>170.0227589457148</area>
> <background>0.0000000000000000</background>
> <chisq>1202.267500954470</chisq>
> <goodn>1607.355201650164</goodn>
> <ind>1106.121302500338</ind>
> <lind>1082.000000000000</lind>
> <mass>922.4428373952809</mass>
> <meth>4</meth>
> <reso>5700.586009913091</reso>
> <rind>1256.000000000000</rind>
> <s2n>6.073893519996163</s2n>
> <type>0</type>
> </pk>
>
> My java code is quite big, but i debugged it, and i saw that the
> problem is in the function characters.
> From every of the above structures i want to get the absi and mass
> value, and i write them in a file.
>
> The function characters:
> public void characters(char buf[], int offset, int len) throws
> SAXException {
>
> String s = new String(buf, offset, len);
>
> The s string sometimes is not as big as it should. Can we define the
> offset and the len?
>
> The problem is when it reaches the <mass> element. Sometimes it works
> ok, and it reads all the number 922.4428373952809.
> But other times, when let say the mass value is 1455.668578151738,
> the result is just 14 .

A wild guess:
Could it be that the parser sometimes passes the characters in two chunks
instead of one?
For example, in most cases the parser might call your handler like this:
    startElement(...)   // <mass>
    characters(...)     // 1455.668578151738
    endElement(...)     // </mass>
But in some rare cases the parser might call your handler like this:
    startElement(...)   // <mass>
    characters(...)     // 14
    characters(...)     // 55.668578151738
    endElement(...)     // </mass>

Note that both ways are perfectly valid according to the SAX specification.
Hence your content handler has to cope with the possibility of multiple
chunks (probably by concatenating the chunks into one string).

--
Thomas
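The "rare case" call sequence Thomas lists can be replayed by hand against a handler that concatenates chunks, as a quick check that the joined value comes out whole. This is a sketch: ChunkDemo, JoiningHandler and replaySplitCall are hypothetical names, the handler methods are called directly in place of a real parser, and the handler assumes leaf elements hold only text, as in the <pk> structure above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

public class ChunkDemo {

    // Hypothetical handler: joins all characters() chunks of an element, records them at endElement.
    static class JoiningHandler extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();
        final Map<String, String> values = new LinkedHashMap<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            text.setLength(0); // forget whitespace that preceded this element
        }

        @Override
        public void characters(char[] buf, int offset, int len) {
            text.append(buf, offset, len); // concatenate, never overwrite
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            values.put(qName, text.toString().trim());
            text.setLength(0);
        }
    }

    // Replays: startElement, characters("14"), characters("55.668578151738"), endElement.
    static String replaySplitCall() {
        JoiningHandler h = new JoiningHandler();
        char[] first = "14".toCharArray();
        char[] second = "55.668578151738".toCharArray();
        h.startElement("", "mass", "mass", new AttributesImpl());
        h.characters(first, 0, first.length);
        h.characters(second, 0, second.length);
        h.endElement("", "mass", "mass");
        return h.values.get("mass");
    }

    public static void main(String[] args) {
        System.out.println(replaySplitCall()); // prints 1455.668578151738
    }
}
```

A handler that instead did `s = new String(buf, offset, len)` in characters() would keep only the last chunk ("55.668578151738"), or only the first if it stopped listening after one call, which matches the truncated values in the original output.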
 
angrybaldguy@gmail.com
      02-27-2007
On Feb 27, 9:21 am, "stacey" <(E-Mail Removed)> wrote:

> I am using SaxParser, to parse an xml document, and i noticed that
> sometimes it ignores some data, while reading the characters inside an
> element.
>
> My xml file includes many instances of the following structure:
>
> <pk>
> <absi>769.9541477864069</absi>
> <area>170.0227589457148</area>
> <background>0.0000000000000000</background>
> <chisq>1202.267500954470</chisq>
> <goodn>1607.355201650164</goodn>
> <ind>1106.121302500338</ind>
> <lind>1082.000000000000</lind>
> <mass>922.4428373952809</mass>
> <meth>4</meth>
> <reso>5700.586009913091</reso>
> <rind>1256.000000000000</rind>
> <s2n>6.073893519996163</s2n>
> <type>0</type>
> </pk>
>
> My java code is quite big, but i debugged it, and i saw that the
> problem is in the function characters.
> From every of the above structures i want to get the absi and mass


> The problem is when it reaches the <mass> element. Sometimes it works
> ok, and it reads all the number 922.4428373952809.
> But other times, when let say the mass value is 1455.668578151738,
> the result is just 14 .


This is not a bug, actually. SAX doesn't guarantee that text nodes
will be delivered in a single call to the "characters" method. In
the example you gave, you probably got characters("14") and then
characters("55.66....") immediately afterwards; it's your
responsibility to stitch these back into a single string.

Put off parsing the string into a number until you see the
corresponding endElement call.

Owen
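Following Owen's advice, here is a sketch that collects text in characters() and converts it to a number only in endElement(), emitting one {mass, absi} row per <pk> in the same column order as the output file in the PS. PeakPairs, PeakHandler and extract are hypothetical names; the sketch assumes each <pk> contains one <absi> and one <mass> and simply ignores the other child elements.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class PeakPairs {

    // Hypothetical handler: one {mass, absi} row per <pk>; text becomes a number only in endElement.
    static class PeakHandler extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();
        private double absi = Double.NaN;
        private double mass = Double.NaN;
        final List<double[]> rows = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            text.setLength(0); // drop whitespace left over from before this element
        }

        @Override
        public void characters(char[] buf, int offset, int len) {
            text.append(buf, offset, len); // possibly just a fragment of the element's text
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            String value = text.toString().trim();
            switch (qName) {
                case "absi":
                    absi = Double.parseDouble(value);
                    break;
                case "mass":
                    mass = Double.parseDouble(value);
                    break;
                case "pk":
                    rows.add(new double[] {mass, absi}); // mass first, then absi
                    absi = Double.NaN;
                    mass = Double.NaN;
                    break;
                default:
                    break; // area, chisq, reso, ... are ignored
            }
            text.setLength(0);
        }
    }

    static List<double[]> extract(String xml) throws Exception {
        PeakHandler handler = new PeakHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.rows;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<peaks><pk><absi>661.7093703067692</absi><area>1.0</area>"
                + "<mass>1455.668578151738</mass></pk></peaks>";
        for (double[] row : extract(xml)) {
            System.out.println(row[0] + " " + row[1]);
        }
    }
}
```

Deferring Double.parseDouble to endElement() is what matters here: parsing inside characters() would turn a first chunk such as "14" into 14.0 before the rest of the digits had arrived.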

 
Sem
      02-27-2007
On Feb 27, 12:56 pm, "Chris Uppal" <(E-Mail Removed)> wrote:
> stacey wrote:
> > The function characters:
> > public void characters(char buf[], int offset, int len) throws
> > SAXException {
> >
> > String s = new String(buf, offset, len);
> >
> > The s string sometimes is not as big as it should. Can we define the
> > offset and the len?
> >
> > The problem is when it reaches the <mass> element. Sometimes it works
> > ok, and it reads all the number 922.4428373952809.
> > But other times, when let say the mass value is 1455.668578151738,
> > the result is just 14 .
>
> Are you assuming that characters() will always be called with all the text in
> one call ? If so, then don't because it won't. SAX may supply
> "1455.668578151738" in as many separate peices as it wants to -- even in 17
> different calls with one character each.
>
> -- chris


Where can I learn more about SAXParser?
Please help.

--sem

 
Chris Uppal
      02-27-2007
Sem wrote:

> Where Can I study more on SaxParser?


Read a good book? I liked:
Processing XML with Java
Elliotte Rusty Harold
Online at:
http://www.cafeconleche.org/books/xmljava/

Or you could go to the SAX home page:
http://www.saxproject.org/

Or you could read Sun's JavaDocs:
http://java.sun.com/javase/6/docs/ap...e-summary.html

-- chris


 
Mike Schilling
      02-27-2007
Chris Uppal wrote:
> stacey wrote:
>
>> The function characters:
>> public void characters(char buf[], int offset, int len) throws
>> SAXException {
>>
>> String s = new String(buf, offset, len);
>>
>> The s string sometimes is not as big as it should. Can we define the
>> offset and the len?
>>
>> The problem is when it reaches the <mass> element. Sometimes it works
>> ok, and it reads all the number 922.4428373952809.
>> But other times, when let say the mass value is 1455.668578151738,
>> the result is just 14 .
>
> Are you assuming that characters() will always be called with all the
> text in one call ? If so, then don't because it won't. SAX may
> supply "1455.668578151738" in as many separate peices as it wants to
> -- even in 17 different calls with one character each.


Why are you assuming that each call will supply a non-zero number of
characters?


 
Chris Uppal
      02-27-2007
Mike Schilling wrote:

[me:]
> > SAX may
> > supply "1455.668578151738" in as many separate peices as it wants to
> > -- even in 17 different calls with one character each.
>
> Why are you assuimg that each call will supply a non-zero number of
> characters?

I did consider that issue, but I decided it might be a little confusing to
mention it...

Actually, I can't find anything in the SAX spec (such as it is) that forbids a
0-length sequence. OTOH, I have no reason to suppose that any SAX
implementation would actually do it.

It makes no real difference in practice, since code that is written to work
correctly with variable-length (sub-)sequences at all will automatically cope
with 0-length sequences too.

-- chris
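To illustrate Chris's last point, here is a sketch that feeds an appending characters() implementation the value one character per call, with an empty (len == 0) chunk before each one, i.e. both extreme delivery patterns discussed above. OneCharChunks, TextCollector and deliverPiecewise are hypothetical names, and the handler is called directly rather than by a real parser.

```java
import org.xml.sax.helpers.DefaultHandler;

public class OneCharChunks {

    // Hypothetical handler that only accumulates text and counts characters() calls.
    static class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();
        int calls = 0;

        @Override
        public void characters(char[] buf, int offset, int len) {
            calls++;
            text.append(buf, offset, len); // len == 0 appends nothing: no special case needed
        }
    }

    // Stands in for a parser: an empty chunk, then a one-character chunk, for each character.
    static TextCollector deliverPiecewise(String value) {
        TextCollector collector = new TextCollector();
        char[] buf = value.toCharArray();
        for (int i = 0; i < buf.length; i++) {
            collector.characters(buf, i, 0);
            collector.characters(buf, i, 1);
        }
        return collector;
    }

    public static void main(String[] args) {
        TextCollector c = deliverPiecewise("1455.668578151738");
        System.out.println(c.text + " after " + c.calls + " calls"); // prints 1455.668578151738 after 34 calls
    }
}
```

The 17 one-character calls plus 17 empty ones still rebuild the full value, with no extra code for the empty case.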


 
Sem
      02-27-2007
On Feb 27, 2:01 pm, "Chris Uppal" <(E-Mail Removed)> wrote:
> Sem wrote:
> > Where Can I study more on SaxParser?
>
> Read a good book ? I liked:
> Processing XML with Java
> Elliotte Rusty Harold
> Online at:
> http://www.cafeconleche.org/books/xmljava/
>
> Or you could go the SAX home page:
> http://www.saxproject.org/
>
> Or you could read Sun's JavaDocs:
> http://java.sun.com/javase/6/docs/ap...e-summary.html
>
> -- chris


Thank you very much.
It helps me a lot, and I will chew, swallow and speak it soon.
--sem

 
stacey
      02-27-2007
Thank you all for answering and for your help.

I didn't know that it could take more than one call to get all the
text. I thought one call was the logical behaviour.

Anyway, I will try it now. I hope it works!

My question about the frequency of the "errors" still stands (I mean
the faulty line after every four correct lines).

Thank you very very much again,

Really Best Regards,

Stacey





 