Velocity Reviews - Computer Hardware Reviews

Newbie question about how to solve the use escape characters

 
 
Mark Chao
 
      11-15-2005
Hi, I am a newbie. I spent quite some time searching the web but
didn't find anything. I hope this question is not too basic to ask here.

I am trying to convert an XML document into another form, such as this:

<a>
A
<b>B</b>
<c>C</c>
</a>

should be converted to this:

a A
a b B
a c C

I am using Java's SAX parser with my own handler that extends DefaultHandler.
The XML documents given to me usually have the elements and child
elements properly indented (as above). However, this causes a problem:
characters() in the handler class is also called between two
endElement() calls, and sometimes between two startElement() calls.

It also means that the "A" arrives as "\n\tA", because the text is
reported exactly as it appears in the document. The obvious way to solve
this is to make my handler accept only XML files that contain no
"\n" or "\t" escape characters. I could also strip these characters out
myself, but that would also remove any that were put there
intentionally.

Another way would be to disallow XML documents that have character
data between two startElement() calls or two endElement() calls, i.e. to
allow character data only between a startElement() and its matching
endElement(). However, this constraint is too restrictive and not
appropriate.

This is just a semantic problem, but I would like to know if there are
any other ways to tackle it.

 
 
 
 
 
Peter Flynn
 
      11-16-2005
Mark Chao wrote:

> Hi, I am a newbie. I spent quite some time searching the web but
> didn't find anything. I hope this question is not too basic to ask here.
>
> I am trying to convert an XML document into another form, such as this:
>
> <a>
> A
> <b>B</b>
> <c>C</c>
> </a>


This should ring immediate warning bells. Mixed Content (interspersed
text and markup) is normally the wrong model in data-oriented
applications. A more useful form would be

<a>
<something>A</something>
<b>B</b>
<c>C</c>
</a>

After all, the "A" must have some function, so it should be identified.

> should be converted to this:
>
> a A
> a b B
> a c C


The following XSLT will do this.

<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <xsl:output method="text"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="*">
    <xsl:for-each select="ancestor::*">
      <xsl:value-of select="name()"/>
      <xsl:text>&#x0009;</xsl:text>
    </xsl:for-each>
    <xsl:value-of select="name()"/>
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="text()">
    <xsl:text>&#x0009;</xsl:text>
    <xsl:value-of select="normalize-space(.)"/>
    <xsl:text>&#x000A;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
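
Since the original question was about Java, here is a minimal sketch of one way to run this stylesheet from Java with the standard javax.xml.transform API. The class name PathDump and the idea of inlining the stylesheet as a string are assumptions made only to keep the example self-contained; in practice the stylesheet would more likely be loaded from a file. Note that the stylesheet separates fields with tabs, not spaces.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class; the name is illustrative and not from the thread.
final class PathDump {
    // The stylesheet above, inlined (an assumption, to keep the sketch self-contained).
    static final String XSLT =
        "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
      + "<xsl:output method='text'/>"
      + "<xsl:strip-space elements='*'/>"
      + "<xsl:template match='*'>"
      + "<xsl:for-each select='ancestor::*'>"
      + "<xsl:value-of select='name()'/><xsl:text>&#x0009;</xsl:text>"
      + "</xsl:for-each>"
      + "<xsl:value-of select='name()'/>"
      + "<xsl:apply-templates/>"
      + "</xsl:template>"
      + "<xsl:template match='text()'>"
      + "<xsl:text>&#x0009;</xsl:text>"
      + "<xsl:value-of select='normalize-space(.)'/>"
      + "<xsl:text>&#x000A;</xsl:text>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to an XML string and returns the text output.
    static String transform(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The sample document from the question, with its line breaks.
        System.out.print(transform("<a>\nA\n<b>B</b>\n<c>C</c>\n</a>"));
    }
}
```

The whitespace-only text between elements is dropped by xsl:strip-space, and normalize-space() trims the line breaks around "A", so no trimming is needed on the Java side.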

> I am using Java's SAX parser with my own handler that extends DefaultHandler.
> The XML documents given to me usually have the elements and child
> elements properly indented (as above). However, this causes a problem:
> characters() in the handler class is also called between two
> endElement() calls, and sometimes between two startElement() calls.


That's why I suggest that this is a suboptimal format for the data.

> This is just a semantic problem, but I would like to know if there are
> any other ways to tackle it.


Try XSLT.

///Peter


 
 
 
 
 
mcha226@gmail.com
 
      11-16-2005
Thanks a lot. I'll start learning XSLT as well.

As for what I have done so far: I used the decorator pattern and created
a decorator that wraps my base handler. It buffers the text received in
characters() and passes the complete text on in one go. It also removes
the \n and \t from the beginning and the end of the text.
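
A minimal sketch of what such a decorator could look like, not the actual code. Only the names BufferedHandler and SimpleHandler come from this post; their bodies, the BufferedDemo class, and the use of SAXParserFactory to obtain the reader are assumptions. It also uses String.trim(), which removes leading and trailing spaces as well as \n and \t.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Decorator sketch: buffers characters() calls and forwards one trimmed run.
final class BufferedHandler implements ContentHandler {
    private final ContentHandler next;
    private final StringBuilder text = new StringBuilder();

    BufferedHandler(ContentHandler next) { this.next = next; }

    // Send the buffered text on as a single call; drop it if only whitespace.
    private void flush() throws SAXException {
        String s = text.toString().trim();
        text.setLength(0);
        if (!s.isEmpty()) next.characters(s.toCharArray(), 0, s.length());
    }

    @Override public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }
    @Override public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        flush();
        next.startElement(uri, local, qName, atts);
    }
    @Override public void endElement(String uri, String local, String qName) throws SAXException {
        flush();
        next.endElement(uri, local, qName);
    }
    @Override public void endDocument() throws SAXException { flush(); next.endDocument(); }
    @Override public void processingInstruction(String target, String data) throws SAXException {
        flush();
        next.processingInstruction(target, data);
    }
    // Everything else is passed through unchanged.
    @Override public void setDocumentLocator(Locator locator) { next.setDocumentLocator(locator); }
    @Override public void startDocument() throws SAXException { next.startDocument(); }
    @Override public void startPrefixMapping(String prefix, String uri) throws SAXException {
        next.startPrefixMapping(prefix, uri);
    }
    @Override public void endPrefixMapping(String prefix) throws SAXException {
        next.endPrefixMapping(prefix);
    }
    @Override public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException {
        next.ignorableWhitespace(ch, start, length);
    }
    @Override public void skippedEntity(String name) throws SAXException { next.skippedEntity(name); }
}

// Hypothetical stand-in for SimpleHandler: records "path text" lines.
final class SimpleHandler extends DefaultHandler {
    final List<String> lines = new ArrayList<>();
    private final Deque<String> path = new ArrayDeque<>();

    @Override public void startElement(String uri, String local, String qName, Attributes atts) {
        path.addLast(qName);
    }
    @Override public void endElement(String uri, String local, String qName) {
        path.removeLast();
    }
    @Override public void characters(char[] ch, int start, int length) {
        lines.add(String.join(" ", path) + " " + new String(ch, start, length));
    }
}

final class BufferedDemo {
    static List<String> convert(String xml) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            SimpleHandler handler = new SimpleHandler();
            reader.setContentHandler(new BufferedHandler(handler));
            reader.setErrorHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.lines;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        convert("<a>\n\tA\n\t<b>B</b>\n\t<c>C</c>\n</a>").forEach(System.out::println);
    }
}
```

Flushing before each startElement() and endElement() is what lets the wrapped handler see "A" while the path is still just "a", so mixed content like the example in the question still comes out in document order.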

I found out later that there is an XMLFilterImpl class. Interestingly,
it implements both the reader interface and all of the handler
interfaces, whereas my decorator implements only ContentHandler.
Just a personal opinion, but I think my design can be a little bit more
efficient. For example:

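XMLReader reader = XMLReaderFactory.createXMLReader();
SimpleHandler handler = new SimpleHandler(); // extends DefaultHandler

// Content events go through the buffering decorator;
// error events go straight to the base handler.
reader.setContentHandler(new BufferedHandler(handler));
reader.setErrorHandler(handler);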

My design is easier to understand (it implements only the ContentHandler
part of the interface), and it avoids passing calls through layers
unnecessarily. (If you use XMLFilterImpl to create a filter for each of
the ContentHandler and ErrorHandler, every call has to cross an extra
layer.)
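
For comparison, a rough sketch of the same buffering written as an XMLFilterImpl subclass. The names BufferingFilter and FilterDemo are made up for illustration, and getting the parent reader from SAXParserFactory is an assumption. Here a single filter sits in front of the parser, so error events are delegated through it as well, instead of going straight to the handler.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: same buffering idea, but built on XMLFilterImpl.
final class BufferingFilter extends XMLFilterImpl {
    private final StringBuilder text = new StringBuilder();

    BufferingFilter(XMLReader parent) { super(parent); }

    // Forward the buffered text as one trimmed run; drop whitespace-only runs.
    private void flush() throws SAXException {
        String s = text.toString().trim();
        text.setLength(0);
        if (!s.isEmpty()) super.characters(s.toCharArray(), 0, s.length());
    }

    @Override public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }
    @Override public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        flush();
        super.startElement(uri, local, qName, atts);
    }
    @Override public void endElement(String uri, String local, String qName) throws SAXException {
        flush();
        super.endElement(uri, local, qName);
    }
}

final class FilterDemo {
    // Returns the text runs the downstream handler receives.
    static List<String> textRuns(String xml) {
        List<String> runs = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override public void characters(char[] ch, int start, int length) {
                runs.add(new String(ch, start, length));
            }
        };
        try {
            XMLReader parent = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            BufferingFilter filter = new BufferingFilter(parent);
            filter.setContentHandler(handler);
            filter.setErrorHandler(handler); // errors reach the handler via the filter
            filter.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return runs;
    }

    public static void main(String[] args) {
        System.out.println(textRuns("<a>\n\tA\n\t<b>B</b>\n\t<c>C</c>\n</a>"));
    }
}
```

The filter version needs fewer overrides because XMLFilterImpl already passes every other event through, at the cost of the extra hop for events the filter does not care about.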

Does anyone else think the same?

 