Is it possible with xerces ?
I am trying to parse an indented XML file with the Xerces-C++ DOM parser.
The file looks like this:

<root>
  <child1>
    <field1> foo </field1>
    <field2> bar </field2>
  </child1>
  <child2>
    <field1> foo </field1>
    <field2> bar </field2>
  </child2>
</root>

where line breaks and whitespace are present in the XML file. The program I wrote with DOM gives me this tree: root has five children:

text-node child1 text-node child2 text-node

The text of the first text node is "\n "
The text of the second text node is "\n "
The text of the third text node is "\n"

These whitespace text nodes occur at every level of the tree.

Is it possible to strip these nodes automatically?

XML standard question: does this XML conform to the XML standard?

<child2> some text
  <field1> foo </field1>
  <field2> bar </field2>
</child2>

"some text" is at the same depth as field1 and field2, but it is text, so there is a mix of text and elements. I thought that text had to be a leaf of the tree. Does this conform to the standard?

Thanks
Re: Is it possible with xerces ?
Manuel Yguel wrote:
> The program I wrote with DOM gives me this tree: root has five children:
> text-node child1 text-node child2 text-node
>
> These whitespace text nodes occur at every level of the tree.
>
> Is it possible to strip these nodes automatically?

Yes: there is an option that strips ignorable whitespace, but you must supply a grammar that defines where whitespace is ignorable, like this:

<!ELEMENT root (child1,child2)>

> XML standard question: does this XML conform to the XML standard?
>
> <child2> some text
>   <field1> foo </field1>
>   <field2> bar </field2>
> </child2>
>
> "some text" is at the same depth as field1 and field2, but it is text.
> I thought that text had to be a leaf of the tree.

Yes: an element may contain:
- nothing (empty element)
- subelements
- text
- text and subelements

--
Regards,
Philippe Poulard
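Editor's note: the option described above depends on a grammar. A different approach, not suggested in this thread, is to drop whitespace-only text nodes after parsing. The sketch below shows the idea on a hypothetical stand-in `Node` type (it is not a Xerces class, and `makeNode`, `isXmlWhitespaceOnly` and `stripWhitespaceText` are names invented for illustration). The same check would have to be adapted to the real DOM node types.

```cpp
#include <algorithm>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a DOM node; NOT a Xerces type.
struct Node {
    bool isText;        // true for a text node, false for an element
    std::string value;  // text content, or element name
    std::vector<std::unique_ptr<Node>> children;
    Node(bool text, std::string v) : isText(text), value(std::move(v)) {}
};

// Convenience constructor for the sketch.
std::unique_ptr<Node> makeNode(bool isText, const std::string& value) {
    return std::unique_ptr<Node>(new Node(isText, value));
}

// True if s holds only XML whitespace characters (space, tab, CR, LF).
// An empty string also counts as whitespace-only here.
bool isXmlWhitespaceOnly(const std::string& s) {
    return s.find_first_not_of(" \t\r\n") == std::string::npos;
}

// Removes whitespace-only text children at every level of the tree.
// Text such as " some text " is kept, so mixed content survives.
void stripWhitespaceText(Node& node) {
    auto& kids = node.children;
    kids.erase(std::remove_if(kids.begin(), kids.end(),
                              [](const std::unique_ptr<Node>& child) {
                                  return child->isText &&
                                         isXmlWhitespaceOnly(child->value);
                              }),
               kids.end());
    for (auto& child : kids) {
        stripWhitespaceText(*child);
    }
}
```

Unlike the grammar-based option, this removes every whitespace-only text node, including ones inside elements that are meant to hold text, so it is only appropriate when such whitespace carries no meaning in the document.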
Re: Is it possible with xerces ?
Philippe Poulard wrote:
> Yes: there is an option that strips ignorable whitespace, but you must
> supply a grammar that defines where whitespace is ignorable, like this:
>
> <!ELEMENT root (child1,child2)>

Thanks, but how do you then use the grammar with the parser?
Re: Is it possible with xerces ?
Manuel Yguel wrote:
> Thanks, but how do you then use the grammar with the parser?

Use the <!DOCTYPE> declaration. You should have a look at the spec.

--
Regards,
Philippe Poulard
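Editor's note: as a sketch of that advice, the content models can be placed in an internal DTD subset inside the DOCTYPE declaration at the top of the document itself. The thread gives only the declaration for root; the declarations for child1, child2, field1 and field2 are assumptions made to complete the example, and `exampleDocument` is a hypothetical helper that simply holds the text as a C++ string.

```cpp
#include <string>

// Returns the thread's sample document with an internal DTD subset.
// Only the declaration for <root> comes from the thread; the other
// content models are assumed for illustration.
std::string exampleDocument() {
    return R"xml(<?xml version="1.0"?>
<!DOCTYPE root [
  <!ELEMENT root   (child1,child2)>
  <!ELEMENT child1 (field1,field2)>
  <!ELEMENT child2 (field1,field2)>
  <!ELEMENT field1 (#PCDATA)>
  <!ELEMENT field2 (#PCDATA)>
]>
<root>
  <child1>
    <field1> foo </field1>
    <field2> bar </field2>
  </child1>
  <child2>
    <field1> foo </field1>
    <field2> bar </field2>
  </child2>
</root>
)xml";
}
```

With these declarations, the whitespace between the children of root, child1 and child2 is ignorable because those elements are declared to contain only elements. The whitespace around " foo " is not, because field1 is declared as text. On the Xerces-C++ side, the parser would presumably need validation enabled (for example `setValidationScheme(XercesDOMParser::Val_Always)`) together with `setIncludeIgnorableWhitespace(false)`; that combination has not been tested here. For the mixed-content variant from the first post, child2 would instead need a declaration such as `<!ELEMENT child2 (#PCDATA|field1|field2)*>`, and whitespace inside it would then no longer be ignorable.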