Velocity Reviews > Newsgroups > Programming > XML
Information on XML overhead analysis

 
 
Generic Usenet Account
02-14-2011
Greetings,

Have there been any studies done on the overhead imposed by XML? We
are evaluating whether or not XML imposes an unacceptable overhead for
severely resource constrained devices in M2M (Machine-to-Machine)
deployments. These devices are expected to be very cheap (< $10) and
are expected to run on battery power for years.

Any pointers will be appreciated.

Regards,
Bhat
 
Peter Flynn
02-14-2011
On 14/02/11 21:19, Generic Usenet Account <(E-Mail Removed)> wrote
in comp.text.tex:
> Greetings,
>
> Have there been any studies done on the overhead imposed by XML? We
> are evaluating whether or not XML imposes an unacceptable overhead for
> severely resource constrained devices in M2M (Machine-to-Machine)
> deployments. These devices are expected to be very cheap (< $10) and
> are expected to run on battery power for years.
>
> Any pointers will be appreciated.


I think it depends how *much* XML is "unacceptable". Parsing a very
small, well-formed instance, with no reference to DTDs or Schemas, such
as a simple config file, would not appear to present much difficulty,
and there are libraries for the major scripting languages that could be
cut down for the purpose.
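For a sense of scale, parsing a tiny well-formed instance with a stock library really is only a few lines; a sketch in Python using the standard xml.etree (the config layout and element names here are invented for illustration):

```python
import xml.etree.ElementTree as ET

# A hypothetical device config; element names are invented for illustration.
config_xml = """<config>
  <interval unit="s">60</interval>
  <endpoint>10.0.0.1</endpoint>
</config>"""

root = ET.fromstring(config_xml)           # parse the whole instance in one call
interval = int(root.findtext("interval"))  # -> 60
endpoint = root.findtext("endpoint")       # -> "10.0.0.1"
print(interval, endpoint)
```

No DTD or Schema is referenced, so a non-validating parser cut down to just this subset would suffice.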

Larger files of the "Data" genre may also be "acceptable", as they do
not typically use mixed content, and rarely descend much below 4-5
levels IMHE. "Document" files (eg DocBook, XHTML, TEI, etc) by contrast
can be arbitrarily complex and may nest markup to a considerable depth;
TEI in particular. In both cases, a definition of "severely constrained"
would be needed: is this memory, speed, bandwidth, or all three?

You might want to talk to some of the utility and application authors
who have implemented some very fast XML software, and see what their
approach was. I'm not a computer scientist, so I don't know how you
would measure the balance between the demands of XML and the demands of
the implementation language, but I would expect that there are metrics
for this which would let you take the platform restrictions into account.

There was some discussion of performance and resources at last year's
XML Summerschool in Oxford (http://xmlsummerschool.com), mostly in the
sessions on JSON vs XML. I'm not sure that there was a formal conclusion
at that stage, but the consensus seemed to be that they weren't in
competition; rather, that they addressed different requirements. There
was also a recent tweet from Michael Kay implying that there may be JSON
support in Saxon 3.x, which would make serialisation easier. That,
however, doesn't address the problem for small devices that Java is a hog.

The underlying implication of the XML Spec is that resources (disk
space, bandwidth, processor speed) would become less and less of a
factor: I'm not sure that we envisaged severely resource-constrained
devices as forming part of the immediate future. But perhaps someone out
there has indeed tested and measured the cycles and bytes needed.

///Peter
--
XML FAQ: http://xml.silmaril.ie/
 
Joe Kesselman
02-15-2011
On 2/14/2011 4:19 PM, Generic Usenet Account wrote:
> Have there been any studies done on the overhead imposed by XML?


Depends on the XML, depends on the alternatives, depends on the specific
task being addressed.

Generally, my recommendation is that XML be thought of as a data model
for interchange and toolability. If you're exchanging data entirely
inside of something where nobody else is going to touch it, raw binary
works just fine, and is maximally compact. When the data wants to move
into or out of that controlled environment, XML can be a good choice as
a representation that reliably works across architectures, is easy to
debug, and has a great deal of support already in place which you can
take advantage of.

Tools for tasks. No one tool is perfect for everything, and they *ALL*
involve tradeoffs.



--
Joe Kesselman,
http://www.love-song-productions.com...lam/index.html

{} ASCII Ribbon Campaign | "may'ron DaroQbe'chugh vaj bIrIQbej" --
/\ Stamp out HTML mail! | "Put down the squeezebox & nobody gets hurt."
 
Rui Maciel
02-16-2011
Generic Usenet Account wrote:

> Greetings,
>
> Have there been any studies done on the overhead imposed by XML? We
> are evaluating whether or not XML imposes an unacceptable overhead for
> severely resource constrained devices in M2M (Machine-to-Machine)
> deployments. These devices are expected to be very cheap (< $10) and
> are expected to run on battery power for years.
>
> Any pointers will be appreciated.


XML does impose a considerable overhead, so the answer to your question
depends entirely on what you consider "unacceptable". For example, if
you design a protocol for communication between two systems and need to
exchange data structures, XML forces you either to feed/swallow a lot of
cruft just to get that (tons of convoluted elements whose opening and
closing tags waste several times the bytes used to encode the
information they are meant to convey) or to develop crude hacks to
weasel your way out of the problem XML forced on you (i.e., dump your
data structures into an element in your own format and then re-parse
that a second time on the receiving end).

And there is a good reason for that: XML is a markup language. It was
designed to encode documents, such as XHTML, and nothing else. It may do
that well, but step beyond that and it simply doesn't work as well.
Plus, there are plenty of better-suited alternatives out there.

My suggestion is that if you really want a data interchange language,
go with one designed specifically with that in mind. One such language
is JSON, which, in spite of its name, happens to be a great language.
For example, unlike XML it provides explicit support for data structures
(objects, arrays/lists) and for basic data types (text strings, numbers,
boolean values, null). Another nice property is that it is terribly
simple to parse: you can write a fully conforming parser in a hundred or
so lines of C, including all the state-machine code for the lexer.
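To put a rough number on the markup overhead being described, here is a quick size comparison of the same record serialized both ways; a sketch in Python (the record and its field names are invented for illustration, and exact byte counts depend on formatting):

```python
import json
import xml.etree.ElementTree as ET

# One hypothetical sensor reading, serialized as JSON and as element-per-field XML.
record = {"id": 42, "temp": 21.5, "ok": True}

as_json = json.dumps(record)

root = ET.Element("reading")
for key, value in record.items():
    ET.SubElement(root, key).text = str(value)
as_xml = ET.tostring(root, encoding="unicode")

print(len(as_json), "vs", len(as_xml))  # the XML form carries noticeably more bytes
```

The gap widens further once you add a prolog, namespaces, or deeper nesting on the XML side.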


Hope this helps,
Rui Maciel
 
Roberto Waltman
03-01-2011
GUA wrote:
>Have there been any studies done on the overhead imposed by XML? We
>are evaluating whether or not XML imposes an unacceptable overhead for
>severely resource constrained devices in M2M (Machine-to-Machine)
>deployments.


I personally find that markup/data overheads of several hundred
percent are difficult to justify.

Somewhat related, see "Why the Air Force needs binary XML"
http://www.mitre.org/news/events/xml...an_keynote.pdf
--
Roberto Waltman

[ Please reply to the group.
Return address is invalid ]
 
Roberto Waltman
03-01-2011
>Somewhat related, see "Why the Air Force needs binary XML"
>http://www.mitre.org/news/events/xml...an_keynote.pdf


After I posted that, searching for "Binary XML" brought up this:
http://www.extreme.indiana.edu/~aslom/papers/bxsa.pdf
Joe Kesselman
03-01-2011
On 2/28/2011 7:20 PM, Roberto Waltman wrote:
> I personally find that markup/data overheads of several hundred
> percent are difficult to justify.


XML compresses like a sonofagun. And industry experience has been that
the time needed to parse XML vs. the time needed to reload from a
binary-stream representation aren't all that different. That's the rock
on which past attempts to push the idea of standardizing a binary
equivalent of XML have foundered -- the intuitive sense that binary
should automatically be better hasn't panned out.

Sharing binary representations once the data is in memory makes more
sense. In fact, XML's greatest strength is at the edges of a system --
as a data interchange/standardization/tooling format -- while the
interior of the system would often be better off using a data model
specifically tuned to that system's needs.
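The compressibility point is easy to sanity-check; a rough measurement sketch in Python with the standard gzip module (the sample document is synthetic and highly repetitive, so the ratio is only indicative):

```python
import gzip

# A synthetic, repetitive "data-genre" XML document of 1000 rows.
rows = "".join(
    f"<row><id>{i}</id><value>{i * 3}</value></row>" for i in range(1000)
)
doc = f"<table>{rows}</table>".encode("utf-8")

packed = gzip.compress(doc)
print(f"raw: {len(doc)} bytes, gzipped: {len(packed)} bytes, "
      f"ratio: {len(doc) / len(packed):.1f}x")
```

Because tag names repeat constantly, the dictionary coder in deflate soaks up most of the markup overhead.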


Pascal J. Bourguignon
03-01-2011
Roberto Waltman <(E-Mail Removed)> writes:

> GUA wrote:
>>Have there been any studies done on the overhead imposed by XML? We
>>are evaluating whether or not XML imposes an unacceptable overhead for
>>severely resource constrained devices in M2M (Machine-to-Machine)
>>deployments.

>
> I personally find that markup/data overheads of several hundred
> percent are difficult to justify.
>
> Somehow related, see "Why the Air Force needs binary XML"
> http://www.mitre.org/news/events/xml...an_keynote.pdf


Do they accept proposals?

What about something like:

element ::= 0x28 element-name 0x20 attributes 0x20 contents 0x29 .

attributes ::= 0x28 ( attribute-name 0x20 attribute-value )* 0x29 .

contents ::= ( element | value ) { 0x20 contents } .

value ::= 0x22 ( non-double-quote-character | 0x5c 0x22 | 0x5c 0x5c ) * 0x22
| number
| identifier .

element-name ::= identifier .
attribute-name ::= identifier .
attribute-value ::= value .
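If I read that grammar right, a serializer for it fits in a few lines; a sketch in Python (the grammar is the one above; the function names and the example element are mine):

```python
def emit_value(v):
    # Strings are double-quoted with \" and \\ escapes, per the value rule.
    if isinstance(v, str):
        return '"' + v.replace("\\", "\\\\").replace('"', '\\"') + '"'
    return str(v)  # numbers pass through unquoted

def emit_element(name, attrs, contents):
    # element ::= "(" element-name " " attributes " " contents ")"
    # attributes ::= "(" ( attribute-name " " attribute-value )* ")"
    attr_str = "(" + " ".join(f"{k} {emit_value(v)}" for k, v in attrs.items()) + ")"
    body = " ".join(
        emit_element(*c) if isinstance(c, tuple) else emit_value(c)
        for c in contents
    )
    return f"({name} {attr_str} {body})"

print(emit_element("note", {"lang": "en"}, ["hi"]))
# -> (note (lang "en") "hi")
```

The parenthesized form trades angle-bracket close tags for single-byte delimiters, which is where the savings come from.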


--
__Pascal Bourguignon__ http://www.informatimago.com/
A bad day in () is better than a good day in {}.
 
BGB
03-01-2011
On 2/28/2011 10:22 PM, Joe Kesselman wrote:
> On 2/28/2011 7:20 PM, Roberto Waltman wrote:
>> I personally find that markup/data overheads of several hundred
>> percent are difficult to justify.

>
> XML compresses like a sonofagun. And industry experience has been that
> the time needed to parse XML vs. the time needed to reload from a
> binary-stream representation aren't all that different. That's the rock
> on which past attempts to push the idea of standardizing a binary
> equivalent of XML have foundered -- the intuitive sense that binary
> should automatically be better hasn't panned out.
>
> Sharing binary representations once the data is in memory makes more
> sense. In fact, XML's greatest strength is at the edges of a system --
> as a data interchange/standardization/tooling format -- while the
> interior of the system would often be better off using a data model
> specifically tuned to that system's needs.
>


I think it depends somewhat on the type of data.

In my own binary XML format (SBXE), which is mostly used for compiler
ASTs (for C and several other languages), I often see an approx 6x to 9x
size difference.

Most of the difference likely comes from eliminating redundant strings
and tag names (SBXE handles both via MRU lists).
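The MRU-list trick can be sketched simply: the first occurrence of a string is emitted literally, and repeats become a small index into a most-recently-used list (the actual SBXE encoding isn't shown here, so this is only the general idea):

```python
def mru_encode(tokens):
    # First occurrence: emit the string itself and push it onto the MRU list.
    # Repeat: emit its current index, then move it to the front (most recent first).
    mru, out = [], []
    for t in tokens:
        if t in mru:
            i = mru.index(t)
            out.append(i)               # a small integer replaces the full string
            mru.insert(0, mru.pop(i))   # keep hot strings at low indices
        else:
            out.append(t)
            mru.insert(0, t)
    return out

print(mru_encode(["func", "var", "func", "func", "var"]))
# -> ['func', 'var', 1, 0, 1]
```

Hot tag names sit at low indices, so the common case encodes in very few bits.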


Grabbing a few samples (ASTs in both formats) and running them through
gzip: textual XML compresses by around 29x; SBXE compresses by around
3.7x.

The gzip'ed textual XML is about 1.1x (approx 10%) larger than the
gzip'ed SBXE.

So, purely for the sake of size (if gzip can reasonably be used in a
given context), binary XML is not really needed.

The binary format is likely a little faster to decode, though; and as
typically used, I don't apply deflate. It is mostly used within the same
program, and also for stuffing XML data into a few other misc binary
formats.

However, most common uses of XML don't involve a corresponding use of
deflate, so a format which is partly compressed by default will still
save much over one which is not compressed at all. So one would still
likely need a "special" file format (let's just call it ".xml.gz" or
maybe ".xgz" for the moment...).

Or such...

 
Rui Maciel
03-01-2011
Roberto Waltman wrote:

> I personally find that markup/data overheads of several hundred
> percent are difficult to justify.
>
> Somehow related, see "Why the Air Force needs binary XML"
> http://www.mitre.org/news/events/xml...an_keynote.pdf


At first glance, that presentation is yet another example of how XML is
inexplicably forced into inappropriate uses. The presentation basically
states that the US Air Force needs to implement "seamless
interoperability between the warfighting elements", which means adopting
a protocol to handle communications, and then out of nowhere XML is
presented as a given, without any justification of why it is any good,
let alone why it should be used. As if that weren't enough, half of the
presentation is then spent suggesting ways to mitigate one of XML's many
problems, which incidentally consists of simply eliminating XML's main
(and single?) selling point: being a human-readable format.

So it appears to be yet another example of XML fever, where people
involved in decision-making are attracted to a technology by marketing
buzzwords instead of its technical merits.


Rui Maciel
 