Velocity Reviews - Computer Hardware Reviews


embedding xml in xml as non-xml :)

Mark Van Orman
      09-14-2004
Hi all,

I have an application that logs in xml.

Assume <xmlLog></xmlLog>. In this element the app logs anything it gets
from foreign hosts. Now if a host sends XML data, the structure of the
document changes, e.g. <xmlLog><somTag></somTag></xmlLog>. This causes
problems for my log reader, because it assumes that <xmlLog/>
contains non-XML data.

My question is: is there a way to treat the data in the <xmlLog/>
element as non-XML data, so that anything this element contains is
treated as a literal?

Any help or suggestions would be greatly appreciated.



Regards,


Mark

William Park
      09-14-2004
Mark Van Orman <(E-Mail Removed)> wrote:
> Hi all,
>
> I have an application that logs in xml.
>
> Assume <xmlLog></xmlLog>. In this element the app logs
> anything it gets from foreign hosts. Now if the host sends xml
> data, the structure of the document changes. ie.
> <xmlLog><somTag></somTag></xmlLog>. This will cause problems
> with my log reader, because it assumes that <xmlLog/> contains
> non-xml data.
>
> My question is, is there a way to treat the data in the
> <xmlLog/> element as non xml data. Something I can do that
> would treat anything this element contains as a literal?
>
> Any help or suggestions would be greatly appreciated.


Modify your "log reader". If the remote host can send any ASCII, then
why does the log reader assume a particular format? '<somTag></somTag>'
is just an ASCII string to me.

--
William Park <(E-Mail Removed)>
Open Geometry Consulting, Toronto, Canada

Andy Dingley
      09-14-2004
On Mon, 13 Sep 2004 23:51:39 -0500, Mark Van Orman
<(E-Mail Removed)> wrote:

>In this element the app logs anything it gets from foreign hosts.


Your problem is to map "input" to well-formed character data according
to the rules of
http://www.w3.org/TR/2004/REC-xml11-20040204/#syntax

This is a task as old as computer programming with input files. There
are several techniques to solve it, broadly by "escaping" or by
"wrapping".


Your example of
> <xmlLog><somTag></somTag></xmlLog>

is quite easy, and could indeed be stored and read back, then treated
as ASCII.

However a foreign host that sends "<notATag<><>>" will break things,
because
<xmlLog><notATag<><>></xmlLog>
isn't well-formed XML and so parsers will choke on it.
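To see the difference concretely, here is a small sketch using Python's standard `xml.etree.ElementTree` parser. The helper name `parses` is hypothetical, chosen only for this illustration. Well-formed input survives (though it silently changes the tree), while arbitrary input makes the whole log entry unparseable.

```python
import xml.etree.ElementTree as ET

def parses(document: str) -> bool:
    """Return True if the document is well-formed XML, False otherwise."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False

# Host input that happens to be well-formed markup parses, but adds a child element.
print(parses("<xmlLog><somTag></somTag></xmlLog>"))  # True
# Arbitrary input pasted in raw makes the whole entry unreadable.
print(parses("<xmlLog><notATag<><>></xmlLog>"))      # False
```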


The main problem is to handle the mapping of arbitrary characters into
"character data" (this is a term carefully defined in the XML spec).

The "escaping" way to do this is quite simple, and can be done with a
handful of character substitutions (from the XML spec):

:>The ampersand character (&) and the left angle bracket (<) MUST NOT
:> appear in their literal form, [...] they MUST be escaped using
:> either numeric character references or the strings "&amp;" and "&lt;"
:> respectively. The right angle bracket (>) MAY be represented using
:> the string "&gt;", and MUST, for compatibility, be escaped using
:> either "&gt;" or a character reference when it appears in the string
:> "]]>" in content,

So your example of
<xmlLog><somTag></somTag></xmlLog>
becomes
<xmlLog>&lt;somTag&gt;&lt;/somTag&gt;</xmlLog>
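The substitutions above can be sketched in a few lines of Python. This is a minimal sketch: `escape_chardata` is a hypothetical helper name, and it also escapes every `>` (allowed but not required) so the `]]>` case never needs special handling. Python's `xml.sax.saxutils.escape` performs the same three substitutions.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

def escape_chardata(text: str) -> str:
    # Replace & first so the & introduced by "&lt;" and "&gt;" isn't escaped again.
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")

payload = "<somTag></somTag>"
entry = "<xmlLog>" + escape_chardata(payload) + "</xmlLog>"
print(entry)  # <xmlLog>&lt;somTag&gt;&lt;/somTag&gt;</xmlLog>

# The standard library helper produces the same result.
assert escape(payload) == escape_chardata(payload)

# A parser hands the original string back as the element's text.
assert ET.fromstring(entry).text == payload
```

Because the reader gets the original string back from the parser, the log reader never sees the host's data as markup, whatever the host sends.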


You could also use a "CDATA section", which would be the "wrapping"
approach. This takes the dubious input content and places it between
two markers that say "Between these points is CDATA, not XML markup".

The markers are <![CDATA[ and ]]>

Your example of
<xmlLog><somTag></somTag></xmlLog>
becomes
<xmlLog><![CDATA[<somTag></somTag>]]></xmlLog>

Be warned that you'll still need escaping in case the input contains a
copy of the end marker! (Read the XML spec, or ask again.)



The second problem is to define "input". This is important because in
today's world we really have to face up to internationalization,
character sets and encodings. It's likely that you can redefine input
from "anything" to "anything that is in UTF-8", which will make your
life easier, but be aware that you _have_ made a deliberate choice here.
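A minimal sketch of what that deliberate choice might look like in code, assuming the input is declared to be UTF-8 and the log is an XML 1.0 document. `to_xml_text` is a hypothetical helper name. Bytes that are not valid UTF-8 become U+FFFD, and characters outside the XML 1.0 `Char` production (NUL and most C0 control codes, which cannot appear even as `&#...;` references) are replaced too. XML 1.1, linked above, is more permissive about control characters, so this filter is specific to 1.0.

```python
import re

# Anything NOT in the XML 1.0 Char production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_NOT_XML10_CHAR = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

def to_xml_text(raw: bytes, replacement: str = "\uFFFD") -> str:
    # Assumption: the input is UTF-8; undecodable bytes become U+FFFD.
    text = raw.decode("utf-8", errors="replace")
    # Replace characters that XML 1.0 cannot carry at all.
    return _NOT_XML10_CHAR.sub(replacement, text)

print(to_xml_text(b"caf\xc3\xa9"))  # café
print(repr(to_xml_text(b"a\x00b")))  # 'a\ufffdb'
```

The result still needs escaping (or wrapping) before it goes into the log; this step only decides which characters are allowed in at all.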

It's OK to write code that breaks in Japanese - just be aware that
you've done so, and know what would need changing if you needed to
remedy this.


You'll find that RSS has this same problem when embedding HTML content
within it. Some RSS versions handle this better than others, and
there's an excellent overview here
http://diveintomark.org/archives/200...compatible-rss

--
Smert' spamionam

Kenneth Stephen
      09-14-2004
Andy Dingley wrote:


> It's OK to write code that breaks in Japanese - just be aware that
> you've done so, and know what would need changing if you needed to
> remedy this.
>

Andy,

Why would code break only in Japanese and why is that ok?

Regards,
Kenneth

Andy Dingley
      09-14-2004
On Tue, 14 Sep 2004 12:51:49 GMT, Kenneth Stephen
<(E-Mail Removed)> wrote:

> Why would code break only in Japanese and why is that ok?


That's just an example. Most European-written XML code fails in
CJKV countries (China, Japan, Korea, Vietnam). Most American-written
XML fails in France. Just look at how many RSS feeds choke when they
meet é, or more usually &eacute; without the entity having been defined.

XML _itself_ (and the major tools) are very good at supporting a wide
range of character sets and encodings, but there are rules you have to
follow. For most _applications_, coders don't bother to do this. If
you _know_ your app will never receive something outside ASCII, then
that's all you need - but you should still be aware of what you've
built.
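The `&eacute;` failure is easy to reproduce with Python's standard parser. This sketch is illustrative only (`element_text` is a hypothetical helper): XML predefines just `&amp;`, `&lt;`, `&gt;`, `&quot;` and `&apos;`, so an HTML entity such as `&eacute;` is undefined unless a DTD declares it, while a numeric character reference or the literal character in a correctly encoded document parses fine.

```python
import xml.etree.ElementTree as ET

def element_text(document: str):
    """Return the root element's text, or None if the parser rejects the document."""
    try:
        return ET.fromstring(document).text
    except ET.ParseError:
        return None

# No DTD declares eacute, so this is an undefined entity and parsing fails.
print(element_text("<title>caf&eacute;</title>"))  # None
# A numeric character reference, or the literal character itself, works.
print(element_text("<title>caf&#233;</title>"))    # café
print(element_text("<title>café</title>"))         # café
```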

--
Smert' spamionam

Patrick TJ McPhee
      09-15-2004
In article <(E-Mail Removed)>,
Andy Dingley <(E-Mail Removed)> wrote:

[...]

% The markers are <![CDATA[ and ]]>
%
% Your example of
% <xmlLog><somTag></somTag></xmlLog>
% becomes
% <xmlLog><![CDATA[<somTag></somTag>]]></xmlLog>
%
% be warned that you'll still need escaping in case the input contains a
% copy of the end marker! (read the XML spec, or ask again)

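You don't need escaping so much as you need to end and restart the
CDATA section:

<xmlLog><![CDATA[<somTag><![CDATA[with a CDATA section]]]]><![CDATA[></somTag>]]></xmlLog>

The first ]]> (the last two ] plus the >) ends the first CDATA section;
the two ] before it are data, and the > of the embedded ]]> is carried
into the next CDATA section, because a literal ]]> is not allowed
outside a CDATA section either.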
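The end-and-restart trick can be automated. This is a minimal sketch (the name `wrap_cdata` is hypothetical): every `]]>` in the input is split so that `]]` stays in the current CDATA section, the section is closed, and a new one begins with `>`. A parser concatenates the pieces back into the original text.

```python
import xml.etree.ElementTree as ET

def wrap_cdata(text: str) -> str:
    # A CDATA section ends at the first "]]>", so split each occurrence:
    # keep "]]" in this section, close it, and open a new one starting with ">".
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

payload = "<somTag><![CDATA[with a CDATA section]]></somTag>"
entry = "<xmlLog>" + wrap_cdata(payload) + "</xmlLog>"
print(entry)
# <xmlLog><![CDATA[<somTag><![CDATA[with a CDATA section]]]]><![CDATA[></somTag>]]></xmlLog>

# The parser joins the adjacent CDATA sections back into the original string.
assert ET.fromstring(entry).text == payload
```

Compared with plain escaping, wrapping keeps the logged data readable in a text editor, at the cost of this one special case.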
--

Patrick TJ McPhee
East York Canada

Similar Threads
Thread Thread Starter Forum Replies Last Post
embedding xml in xml as string shaun roe XML 1 11-02-2005 06:13 PM
Embedding xml as remote obects into static html Kevin HTML 1 12-17-2003 01:49 PM
Embedding xml as remote obects into static html Kevin XML 1 12-17-2003 01:49 PM
Embedding XML into ASPX question Victor Fees ASP .Net 5 07-31-2003 04:41 AM
Embedding non standard XML tags in XML comments terry ASP .Net 0 07-09-2003 01:27 PM


