Velocity Reviews - Computer Hardware Reviews

Writing a parser

 
 
Jan Danielsson
      08-17-2005
Hello all,

I guess this is a question for people who have written a parser.

Does an XML parser ever need to be recursive? I mean like:

&fo&bar;o;

I know this particular example is in the XML specs, and it says that
it will not happen. But are there some really wild constructions that
are allowed, that would require recursive parsing?

Like.. <tag <!-- Comment <tag2 attr="<fo&ou<!-- comment!
-->ml;o/>"></tag2> -->></tag>

Please, don't start taking that apart, I know all the errors in it.
However, what I want to demonstrate is the level of complexity I'm
wondering about. Any case where recursion is needed?

--
Kind Regards,
Jan Danielsson
Te audire no possum. Musa sapientum fixa est in aure.
 
 
Gerald Aichholzer
      08-17-2005
Hello,

Jan Danielsson wrote:
>
> I guess this is a question for people who have written a parser.
>
> Does an XML parser ever need to be recursive? I mean like:
>
> &fo&bar;o;
>
> I know this particular example is in the XML specs, and it says that
> it will not happen. But are there some really wild constructions that
> are allowed, that would require recursive parsing?
>
> Like.. <tag <!-- Comment <tag2 attr="<fo&ou<!-- comment!
> -->ml;o/>"></tag2> -->></tag>
>
> Please, don't start taking that apart, I know all the errors in it.
> However, what I want to demonstrate is the level of complexity I'm
> wondering about. Any case where recursion is needed?
>


I'm no expert, but AFAIK an XML parser will have to stop if the XML
file is not well-formed. The above example contains errors (you
said it), so it is not well-formed. There's no need for a parser
to accept the above construct. I even think that a parser is not
allowed to accept it.
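
To see this behaviour in practice, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The choice of parser is an assumption for illustration (nobody in the thread names one), and `is_well_formed` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text, False if it reports an error."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        # The parser stops at the first well-formedness error.
        return False
    return True


# A properly nested document is accepted.
print(is_well_formed("<tag><tag2 attr='x'/></tag>"))

# A comment inside a start tag, as in the example above, is rejected.
print(is_well_formed("<tag <!-- Comment --> ></tag>"))
```

The parser gives up at the first error rather than trying to recover, so it never has to make sense of constructs like the one quoted.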

Gerald
 
 
Jürgen Kahrs
      08-17-2005
Jan Danielsson wrote:

> However, what I want to demonstrate is the level of complexity I'm
> wondering about. Any case where recursion is needed?


Why do you worry about recursion?
Recursive functions usually make parsers easier to implement.
If you *really* can't recurse in your implementation, use stacks
for holding the context.
 
 
Richard Tobin
      08-17-2005
In article <4303a842$(E-Mail Removed)>,
Jan Danielsson <(E-Mail Removed)> wrote:

>Does an XML parser ever need to be recursive? I mean like:


Yes, but not in the way your examples are.

Elements may contain other elements:

<foo>...<bar>...</bar>...</foo>

Even if you don't return this as a nested structure (for example,
a SAX parser just returns start and end tags), you need to maintain
a stack of open elements so you can detect errors like this:

<foo>...<bar>...</bar>...</wrong>
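
The stack of open elements can be sketched as follows. This is a simplified illustration, not a real tokenizer: the regular expression and the name `check_nesting` are assumptions made for the example, and comments, CDATA sections, processing instructions and namespace prefixes are ignored.

```python
import re
from typing import Optional

# Matches start tags, end tags and empty-element tags.
# Groups: leading "/" (end tag), element name, trailing "/" (empty element).
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)[^<>]*?(/?)>")


def check_nesting(text: str) -> Optional[str]:
    """Return None if tags nest properly, else a description of the first error."""
    stack = []  # names of the elements that are currently open
    for match in TAG.finditer(text):
        closing, name, empty = match.groups()
        if empty:
            continue  # <name/> opens and closes at once
        if not closing:
            stack.append(name)
        elif not stack:
            return f"unexpected </{name}>"
        elif stack[-1] != name:
            return f"expected </{stack[-1]}> but found </{name}>"
        else:
            stack.pop()
    if stack:
        return f"unclosed <{stack[-1]}>"
    return None


print(check_nesting("<foo>...<bar>...</bar>...</foo>"))
print(check_nesting("<foo>...<bar>...</bar>...</wrong>"))
```

The stack is needed even though nothing nested is returned to the caller, because an end tag can only be checked against the most recently opened element.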

The replacement text of entities may contain references to other
entities:

<!ENTITY foo "some text">
<!ENTITY bar "contains this [ &bar; ] text">

So that a reference in the document to "&bar;" must be expanded
to "contains this [ some text ] text".

And similarly for external entities.

-- Richard
 
 
Jan Danielsson
      08-17-2005
Jürgen Kahrs wrote:
>>However, what I want to demonstrate is the level of complexity I'm
>>wondering about. Any case where recursion is needed?

>
> Why do you worry about recursion?
> Recursive functions usually make parsers easier to implement.
> If you *really* can't recurse in your implementation, use stacks
> for holding the context.


I'm sorry, but I was talking about recursive *expressions* in *XML*,
not as in "a function calling itself". I already have a stack-based
parser, but I'm beginning to wonder whether it is worth the trouble. I
haven't actually seen any examples where I would need the stack-based
design, and there's a much neater way to solve it, imho, but it would
make certain recursions *in* *XML* impossible.

Sorry for the confusion.

--
Kind Regards,
Jan Danielsson
Te audire no possum. Musa sapientum fixa est in aure.
 
 
Soren Kuula
      08-17-2005
Richard Tobin wrote:
> <!ENTITY foo "some text">
> <!ENTITY bar "contains this [ &bar; ] text">
>
> So that a reference in the document to "&bar;" must be expanded
> to "contains this [ some text ] text".
>

Surely you mean
> <!ENTITY bar "contains this [ &foo; ] text">


?

Soren
 
 
Richard Tobin
      08-17-2005
In article <bmPMe.65151$(E-Mail Removed)>,
Soren Kuula <(E-Mail Removed)> wrote:

>> <!ENTITY bar "contains this [ &bar; ] text">


>Surely you mean
> > <!ENTITY bar "contains this [ &foo; ] text">


Yes, of course.

The one I typed is illegal (and must be reported as such by an XML
parser if it is used).
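
A rough sketch of both cases, nested expansion and the illegal self-reference, might look like this. The function name `expand`, the dictionary of entities and the error messages are all assumptions made for illustration. Predefined entities such as &amp;, character references and external entities are not handled.

```python
import re

# Matches a general entity reference such as &foo;
ENTITY_REF = re.compile(r"&([A-Za-z_][\w.-]*);")


def expand(text, entities, active=()):
    """Expand &name; references recursively.

    `active` holds the names of the entities currently being expanded,
    so a direct or indirect self-reference is detected and reported.
    """
    def replace(match):
        name = match.group(1)
        if name in active:
            raise ValueError(f"entity '{name}' refers to itself")
        if name not in entities:
            raise ValueError(f"entity '{name}' is not declared")
        # The replacement text may itself contain references.
        return expand(entities[name], entities, active + (name,))

    return ENTITY_REF.sub(replace, text)


good = {"foo": "some text", "bar": "contains this [ &foo; ] text"}
print(expand("&bar;", good))

bad = {"bar": "contains this [ &bar; ] text"}
try:
    expand("&bar;", bad)
except ValueError as error:
    print("error:", error)
```

The `active` tuple plays the same role as the stack of open elements: it records what is currently being processed so that a nested reference can be checked against it.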

-- Richard
 