>>>>> Pete Becker <> (PB) wrote:
>PB> The cited news article is rather superficial. Be careful about drawing
>PB> conclusions about how the legal system works from reading such sources.
>PB> They're often wrong.
>PB> The patent itself was filed in 1994 (not 1998, as the article says) and
>PB> issued in 1998. It mentions SGML (the parent of XML) in several places, and
>PB> says that the method at issue is fundamentally different because it does
>PB> not put structural information in the data stream. More particularly:
>PB> Thus, in sharp contrast to the prior art the present
>PB> invention is based on the practice of separating encoding
>PB> conventions from the content of a document. The invention
>PB> does not use embedded metacoding to differentiate the content
>PB> of the document, but rather, the metacodes of the document are
>PB> separated from the content and held in distinct storage in a
>PB> structure called a metacode map, whereas document content is
>PB> held in a mapped content area. Raw content is an extreme
>PB> example of mapped content wherein the latter is totally
>PB> unstructured and has no embedded metacodes in the data stream.
>PB> That doesn't sound like a description of XML.
Well, read the whole patent. What they do is process a document with
embedded markup (like troff, SGML, XML, or maybe even TeX) in such a way
that inside the program the markup is separated from the plain text. The
external representation is still the marked up text. So it does apply to
XML. This is quite a primitive way of parsing the markup: you just scan
the input until you find a tag (called a metacode in the patent), copy
the text before the tag to an output area, and copy the tag to a list of
tags (called a metacode map in the patent). So compared to modern
parsing techniques there are two differences: (1) nowadays you usually
build a parse tree; they have just a degenerate tree (only a list).
(2) Usually the plain text is put in the leaves of the tree; they have
the text in one contiguous area, and the `parse tree' contains pointers
or indices into this area.
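
To make the scanning step concrete, here is a minimal sketch in Python.
It is only an illustration under assumptions of mine, not the patent's
wording: a tag is taken to be anything of the form <...>, and the names
TAG and split_markup are made up for this example.

```python
import re

# Assumption: a "metacode" is anything between '<' and '>'.
TAG = re.compile(r"<[^<>]*>")

def split_markup(text):
    """Split marked-up text into (content, metacode_map).

    metacode_map is a flat list of (position, tag) pairs, where
    position is the offset in content at which the tag occurred.
    """
    pieces = []       # plain-text pieces, joined at the end
    metacodes = []    # the flat list of tags (the "metacode map")
    length = 0        # number of content characters copied so far
    last = 0          # end of the previous tag in the input
    for match in TAG.finditer(text):
        piece = text[last:match.start()]
        pieces.append(piece)
        length += len(piece)
        metacodes.append((length, match.group()))
        last = match.end()
    pieces.append(text[last:])   # text after the final tag
    return "".join(pieces), metacodes

content, metacode_map = split_markup("<p>Hello <b>world</b>!</p>")
```

Here content ends up as the plain text in one contiguous string, and
metacode_map is the degenerate `parse tree': a list whose entries point
into that string by offset.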
The advantage of their structure comes when you need more than one tag
structure on top of the text: for example, when you have both the
hierarchical XML structure and a structure with lines and pages.
SGML allows more than one structure in the same document, and that fact
is mentioned in the patent.
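
As a rough sketch of what two structures over one text could look like
(again my own illustration, with hypothetical names and sample data):
the tag map and a line map are kept as two independent lists of offsets
into the same content string, so either can be consulted or rebuilt
without touching the other.

```python
from bisect import bisect_right

content = "Hello world!\nSecond line."
# Hypothetical tag map over content: (position, tag) pairs.
tag_map = [(0, "<p>"), (6, "<b>"), (11, "</b>"), (25, "</p>")]

def line_starts(content):
    """Offsets in content at which each line begins."""
    starts = [0]
    for i, ch in enumerate(content):
        if ch == "\n":
            starts.append(i + 1)
    return starts

def line_of(position, starts):
    """1-based number of the line containing the given offset."""
    return bisect_right(starts, position)

# A second structure over the same text, independent of tag_map.
line_map = line_starts(content)

# Relating the two structures only needs the shared offsets.
tags_with_lines = [(line_of(pos, line_map), tag) for pos, tag in tag_map]
```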
The only innovative idea in the patent is this separation, because it
makes it easier to edit the document when you have more than one
structure on top of it. And I don't know how innovative it is, because
once you need to edit a marked-up text with more than one (markup)
structure on top of it, this is quite a logical choice. Moreover, ideas
cannot be patented, so the idea doesn't count (but IANAL).
Once you have this idea, implementing it is peanuts. You could give this
to any student in a beginner's programming course once they have covered
strings, arrays and loops, and they should be able to solve it.
So the patent is about the transformation of the marked-up text to the
separated data structure and vice versa, and about calculating another
structure from the first one, plus some minor other things. I find it
really silly that you can get a patent for this kind of thing.
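
For the reverse direction, a minimal sketch (my own illustration, with
the hypothetical name merge_markup, and assuming the map is a list of
(position, tag) pairs sorted by position) just walks the map and
re-inserts each tag at its offset:

```python
def merge_markup(content, metacode_map):
    """Rebuild marked-up text from content and a (position, tag) list.

    Assumes metacode_map is sorted by position; tags that share a
    position are emitted in list order.
    """
    out = []
    last = 0   # offset in content up to which text has been emitted
    for position, tag in metacode_map:
        out.append(content[last:position])
        out.append(tag)
        last = position
    out.append(content[last:])   # text after the final tag
    return "".join(out)

marked_up = merge_markup(
    "Hello world!",
    [(0, "<p>"), (6, "<b>"), (11, "</b>"), (12, "</p>")],
)
```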
I am writing a small Python program that illustrates the patented
algorithms.
--
Piet van Oostrum <>
URL: http://pietvanoostrum.com [PGP 8DAE142BE17999C4]