DOM2 API (Java): how to get namespace declarations?

 
 
Simon Brooke
      02-11-2006
I was debugging a new XML generator tonight and trying to determine why
it wasn't working; and realised my dom printer does not output XML
namespace declarations.

My method to output an Element is as follows:

/**
 * Print an element node and, by recursive descent, its children.
 *
 * @param node the node to print
 * @param out the stream to print it on
 * @param url the base URL to use in expanding relative URLs
 * @param level the indentation level if pretty printing
 */
protected void print( Element node, PrintStream out, URL url,
                      int level )
    throws IOException
{
    indent( out, level );
    out.print( '<' );

    String tagname = node.getNodeName( );
    out.print( tagname );

    NamedNodeMap attrs = node.getAttributes( );
    NodeList children = node.getChildNodes( );

    /* Get the attributes of the node and print their values. */
    for ( int i = 0; i < attrs.getLength( ); i++ )
    {
        print( ( (Attr) attrs.item( i ) ), out, url, level + 1 );
    }

    if ( ( children != null ) && ( children.getLength( ) > 0 ) )
    { // it's a non-empty tag
        out.print( '>' );

        int len = children.getLength( );

        for ( int i = 0; i < len; i++ )
        {
            print( children.item( i ), out, url, level + 1 );
        }

        /* Set the end tag. */
        indent( out, level );
        out.print( '<' );
        out.print( '/' );
        out.print( tagname );
    }
    else // it's an empty tag
    {
        out.print( " /" );
    }

    out.print( '>' );
}

Performing the exact same XSL transform, the Xerces printer emits:

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF version="1.0"
xmlns:syn="http://purl.org/rss/1.0/modules/syndication/"
xmlns:geourl="http://geourl.org/rss/module/"
xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rss version="0.91">
...

whereas my printer emits:

<rdf:RDF version="1.0">
<rss version="0.91">
...

The relevant part of the XSL file reads:

<xsl:template match="category">
<rdf:RDF version="1.0"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
xmlns:geourl="http://geourl.org/rss/module/"
xmlns:syn="http://purl.org/rss/1.0/modules/syndication/">
<rss version="0.91">
...

Clearly what Xerces is emitting is right and what I am emitting is wrong,
but I'm having trouble seeing what I'm doing wrong. My method to output
an attribute node is as follows:

/**
 * Print an attribute node. If url is not null, use it as a base URL
 * for expanding URL values.
 *
 * @param node the node to print
 * @param out the stream to print it on
 * @param url the base URL to use in expanding relative URLs
 * @param level the indentation level if pretty printing
 */
protected void print( Attr node, PrintStream out, URL url,
                      int level )
    throws IOException
{
    String delimiter = "\"";
    String value = node.getNodeValue( );

    if ( value != null )
    {
        /* As I understand it, you aren't allowed unvalued
         * attributes in XML */
        value = cleanString( value, true );
        /* are attribute values allowed to contain *any*
         * characters? */

        if ( value.indexOf( delimiter ) > -1 )
        /* if an attribute has double quotes in its value, we'll use
         * single quotes as the delimiter and vice versa. If it has
         * both we're stuffed. */
        {
            delimiter = "'";
        }

        indent( out, level );
        out.print( " " );
        out.print( node.getNodeName( ) );
        out.print( "=" );
        out.print( delimiter );

        /* If this is an attribute whose value
         * should be a URL. */
        if ( ( node.getNodeName( ).equalsIgnoreCase( "href" ) ||
               node.getNodeName( ).equalsIgnoreCase( "link" ) ||
               node.getNodeName( ).equalsIgnoreCase( "src" ) ) &&
             ( url != null ) )
        {
            /* Change the partial URL to a full URL. */
            try
            {
                String fullURL = new URL( url, value ).toString( );

                out.print( fullURL );
            }
            catch ( MalformedURLException m )
            {
                // log
                m.printStackTrace();
            }
        }
        else
        {   /* If I've got a value, clean it and
             * print it. */
            out.print( value );
        }

        out.print( delimiter );
    }
    else
    {
        System.err.println( "Unvalued attribute: " +
                            node.getNodeName( ) );
    }
}

Neither the MalformedURLException nor the string 'Unvalued attribute'
ever appears in the log. From this it seems that neither
Node.getAttributes() nor Node.getChildNodes() returns the namespace
declarations. Yet I can't see any other no-args get...() method in the
API. Reading through the Xerces XMLSerializer code makes it seem that
they are finding the namespace declarations among the attributes.

Can anyone see what I'm doing wrong? I appreciate it's probably some basic
howler, but I just can't see it myself.

--
(E-Mail Removed) (Simon Brooke) http://www.jasmine.org.uk/~simon/

Hobbit ringleader gives Sauron One in the Eye.

 
Joe Kesselman
      02-11-2006
Simon Brooke wrote:
> I was debugging a new XML generator tonight and trying to determine why
> it wasn't working; and realised my dom printer does not output XML
> namespace declarations.


XML namespace declarations are optional in the DOM, since every node
carries its namespace and bindings can be reconstructed when you
serialize the DOM's contents as XML. The flipside is that it is the
serializer's responsibility to check that the necessary declarations are
present as Attribute nodes, and/or to synthesize those declarations.
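
To make that concrete, here is a minimal sketch, assuming the JAXP parser bundled with the JDK. The class and method names are made up for illustration. It counts the namespace-declaration attributes on a root element that was built through the API, and on one that came out of a namespace-aware parser.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of any library.
class NamespaceDeclCount {
    /** Namespace that xmlns and xmlns:* attributes belong to in the DOM. */
    static final String XMLNS_URI = "http://www.w3.org/2000/xmlns/";
    static final String RDF_URI = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    static DocumentBuilder newBuilder() {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    /** Counts the attributes of an element that are namespace declarations. */
    static int countDeclarations(Element e) {
        NamedNodeMap attrs = e.getAttributes();
        int n = 0;
        for (int i = 0; i < attrs.getLength(); i++) {
            Attr a = (Attr) attrs.item(i);
            if (XMLNS_URI.equals(a.getNamespaceURI())) {
                n++;
            }
        }
        return n;
    }

    /** Root built through the API: it is in a namespace, but nothing declares it. */
    static int declarationsOnBuiltRoot() {
        Document doc = newBuilder().newDocument();
        Element root = doc.createElementNS(RDF_URI, "rdf:RDF");
        doc.appendChild(root);
        return countDeclarations(root);
    }

    /** Root produced by a namespace-aware parser from text that declares rdf. */
    static int declarationsOnParsedRoot() {
        try {
            String xml = "<rdf:RDF xmlns:rdf=\"" + RDF_URI + "\"/>";
            Document doc = newBuilder().parse(new InputSource(new StringReader(xml)));
            return countDeclarations(doc.getDocumentElement());
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("built:  " + declarationsOnBuiltRoot());
        System.out.println("parsed: " + declarationsOnParsedRoot());
    }
}
```

The built root reports no declarations even though getNamespaceURI() is set, which is the situation a hand-written printer has to cope with.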

The DOM Level 3 spec should have a fairly detailed description of one
algorithm for doing that check and fixup. (I drafted the first version
of that logic, though I think it's been tweaked a bit since then.) I'd
suggest reading that before implementing your own DOM-printer.

Alternatively, you can insist that whoever constructs your DOM take
responsibility for making sure that all the necessary Attribute nodes
exist to declare the namespaces. (Note that they have to be in the
correct namespace themselves...). But it's probably better not to count
on that unless you have full control of both sides of the system.
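
For what "in the correct namespace" means in practice, a small sketch follows, again assuming the JDK's bundled DOM. The class and method names are hypothetical. The declaration is an ordinary Attr whose own namespace URI is http://www.w3.org/2000/xmlns/.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class, not part of any library.
class DeclareNamespaceAttr {
    /** The namespace that declaration attributes themselves live in. */
    static final String XMLNS_URI = "http://www.w3.org/2000/xmlns/";
    static final String RDF_URI = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    /** Adds xmlns:rdf to a freshly built root and reads it back as "name=value". */
    static String declareAndReadBack() {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            Document doc = f.newDocumentBuilder().newDocument();

            Element root = doc.createElementNS(RDF_URI, "rdf:RDF");
            doc.appendChild(root);

            // The declaration is created with setAttributeNS, not setAttribute,
            // so that the Attr node ends up in the xmlns namespace.
            root.setAttributeNS(XMLNS_URI, "xmlns:rdf", RDF_URI);

            // Look it up by (namespace URI, local name); the local name is "rdf".
            Attr decl = root.getAttributeNodeNS(XMLNS_URI, "rdf");
            return decl.getNodeName() + "=" + decl.getValue();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(declareAndReadBack());
    }
}
```

Once the attribute exists, a printer that simply walks getAttributes() will emit it like any other attribute.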

Note that most DOM implementations these days ship with serializers that
know how to do the right things, so unless you're creating your own DOM
or have unusual formatting requirements it might be simpler to just use
those rather than reimplementing that code. (And of course DOM Level 3
proposes a standard API for that function.)
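
A rough sketch of the DOM Level 3 route, assuming the DOM implementation in use supports the "LS" feature (the one bundled with the JDK does, but getFeature may return null elsewhere). The class and method names are invented for the example.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

// Hypothetical demo class, not part of any library.
class Level3Serialize {
    static final String RDF_URI = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    /** Builds a namespaced root with no declaration attribute and serializes it. */
    static String serializeUndeclaredRoot() {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            Document doc = f.newDocumentBuilder().newDocument();

            Element root = doc.createElementNS(RDF_URI, "rdf:RDF");
            doc.appendChild(root);

            // Ask the implementation for its Load and Save support.
            // Assumption: the feature is available; otherwise this is null.
            DOMImplementationLS ls =
                (DOMImplementationLS) doc.getImplementation().getFeature("LS", "3.0");
            LSSerializer serializer = ls.createLSSerializer();

            // Namespace fixup is part of the serializer's job, so the
            // missing xmlns:rdf declaration should be supplied on output.
            return serializer.writeToString(doc);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serializeUndeclaredRoot());
    }
}
```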

But doing a recursive-descent DOM printer _is_ a good learning exercise,
so it's probably something you should write at least once. Among other
things, the same tree-walking logic is useful for many other kinds of
DOM processing.

 
Simon Brooke
      02-11-2006
in message <(E-Mail Removed)>, Joe Kesselman
('(E-Mail Removed)') wrote:

> Simon Brooke wrote:
>> I was debugging a new XML generator tonight and trying to determine
>> why it wasn't working; and realised my dom printer does not output XML
>> namespace declarations.

>
> XML namespace declarations are optional in the DOM, since every node
> carries its namespace and bindings can be reconstructed when you
> serialize the DOM's contents as XML. The flipside is that it is the
> serializer's responsibility to check that the necessary declarations
> are present as Attribute nodes, and/or to synthesize those
> declarations.


Thanks very much!

> The DOM Level 3 spec should have a fairly detailed description of one
> algorithm for doing that check and fixup. (I drafted the first version
> of that logic, though I think it's been tweaked a bit since then.) I'd
> suggest reading that before implementing your own DOM-printer.


OK, got it.
<URL:http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/namespaces-algorithms.html>

> Note that most DOM implementations these days ship with serializers
> that know how to do the right things, so unless you're creating your
> own DOM or have unusual formatting requirements it might be simpler to
> just use those rather than reimplementing that code. (And of course DOM
> Level 3 proposes a standard API for that function.)


Yup. The thing is I wrote my printer back in February 2000 when there
weren't a lot of others around - which makes it surprising that its
failure to do the right things with namespaces hasn't tripped me up
before. It would probably be more economical now to just make a call to
the DOM3 serialiser API, but as a matter of craftsmanship I'd like to
get mine right.

OK, so: we look at a node and see if it needs a namespace, and if it does
we generate a namespace declaration. Suppose we have a structure

<a>
  <b>
    <foo:c/>
    <foo:d/>
  </b>
  <bar:e/>
</a>

am I right in thinking that it would be correct to attach the 'foo'
namespace declaration at any of nodes c /and/ d, or at node b, or at
node a, and the 'bar' namespace declaration at either node e or node a?

Clearly not duplicating the declaration makes the job of the parser
easier. Is there any good reason not to pre-scan the tree and collect all
of the namespaces used and declare them on the root element of the
document? Looking at the 'algorithms' page it seems that unless two
elements use the same prefix to indicate different namespaces, there
should be no problem in 'shuffling' namespace declarations as high up the
tree as possible.
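
The pre-scan idea could be sketched roughly as follows. This is only an illustration under stated assumptions: the class and its methods are hypothetical, a clash is reported with a plain IllegalStateException, and the sketch neither looks at QNames inside attribute values or text nor handles unprefixed no-namespace elements sitting below a hoisted default namespace.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of any library.
class NamespaceCollector {

    /** Walks the subtree and returns prefix -> URI; "" stands for the default namespace. */
    static Map<String, String> collect(Element root) {
        Map<String, String> bindings = new LinkedHashMap<>();
        walk(root, bindings);
        return bindings;
    }

    private static void walk(Element e, Map<String, String> bindings) {
        record(e.getPrefix(), e.getNamespaceURI(), bindings);

        NamedNodeMap attrs = e.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Attr a = (Attr) attrs.item(i);
            String p = a.getPrefix();
            // Only prefixed attributes are in a namespace. The reserved
            // xml and xmlns prefixes are skipped: they need no declaration.
            if (p != null && !p.equals("xml") && !p.equals("xmlns")) {
                record(p, a.getNamespaceURI(), bindings);
            }
        }

        NodeList children = e.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                walk((Element) child, bindings);
            }
        }
    }

    private static void record(String prefix, String uri, Map<String, String> bindings) {
        if (uri == null) {
            return; // node is in no namespace
        }
        String key = (prefix == null) ? "" : prefix;
        String previous = bindings.putIfAbsent(key, uri);
        if (previous != null && !previous.equals(uri)) {
            throw new IllegalStateException(
                "prefix '" + key + "' is bound to both " + previous + " and " + uri);
        }
    }

    /** Convenience for trying it out: parse a string, then collect. */
    static String collectFromXml(String xml) {
        Document doc;
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            doc = f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return collect(doc.getDocumentElement()).toString();
    }
}
```

If collect() returns normally, every binding in the map can be declared once on the element it was called with; if it throws, at least one prefix means different things in different places and the declarations have to stay lower in the tree.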

--
(E-Mail Removed) (Simon Brooke) http://www.jasmine.org.uk/~simon/

;; When all else fails, read the distractions.

 
Bjoern Hoehrmann
      02-11-2006
* Simon Brooke wrote in comp.text.xml:
>OK, so: we look at a node and see if it needs a namespace, and if it does
>we generate a namespace declaration. Suppose we have a structure
>
><a>
>  <b>
>    <foo:c/>
>    <foo:d/>
>  </b>
>  <bar:e/>
></a>
>
>am I right in thinking that it would be correct to attach the 'foo'
>namespace declaration at any of nodes c /and/ d, or at node b, or at
>node a, and the 'bar' namespace declaration at either node e or node a?


xmlns:foo must be in scope for c and d; adding it to both would do the
job, as would adding it to one of their ancestors. Adding it to
a, b, c and d would also be possible, for example, but would probably be
redundant.
Note that 'foo' might map to different namespace names on different
elements, e.g.

<x>
<y:z xmlns:y='foo' />
<y:z xmlns:y='bar' />
</x>

would also be possible and there might be content that depends on the
prefixes (e.g., XPath expressions in an XSLT document), so if you have

<x some-qname-attribute='y:z' xmlns:y='foo'>
<y:example />
</x>

mapping that to

<x some-qname-attribute='y:z'>
<y:example xmlns:y='foo' />
</x>

might be a bad idea.
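
To see the difference from code, here is a small sketch using the DOM Level 3 method Node.lookupNamespaceURI, assuming a parser that implements it (the JDK's does). The class and method names are made up. It resolves the prefix y at the x element, which is where a QName-valued attribute on x would be interpreted.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of any library.
class PrefixScope {
    /** Parses the text and resolves the given prefix at the document element. */
    static String lookupOnRoot(String xml, String prefix) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            Document doc = f.newDocumentBuilder()
                            .parse(new InputSource(new StringReader(xml)));
            // Returns null when the prefix is not bound at this element.
            return doc.getDocumentElement().lookupNamespaceURI(prefix);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String declaredOnX =
            "<x some-qname-attribute='y:z' xmlns:y='foo'><y:example/></x>";
        String declaredOnChild =
            "<x some-qname-attribute='y:z'><y:example xmlns:y='foo'/></x>";

        System.out.println(lookupOnRoot(declaredOnX, "y"));     // bound at x
        System.out.println(lookupOnRoot(declaredOnChild, "y")); // not bound at x
    }
}
```

In the second document the prefix in some-qname-attribute has nothing to resolve against, because the declaration is only in scope inside y:example.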

>Clearly not duplicating the declaration makes the job of the parser
>easier. Is there any good reason not to pre-scan the tree and collect all
>of the namespaces used and declare them on the root element of the
>document? Looking at the 'algorithms' page it seems that unless two
>elements use the same prefix to indicate different namespaces, there
>should be no problem in 'shuffling' namespace declarations as high up the
>tree as possible.


This is true in general, but it would turn a probably incorrect document
like

<x some-qname-attribute='y:z'>
<y:example xmlns:y='foo' />
</x>

into a correct document, which might not be intended. Of course, QNames
in content might not be a concern for your application.
--
Björn Höhrmann · (E-Mail Removed) · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
 
Simon Brooke
      02-11-2006
in message <(E-Mail Removed)>,
Bjoern Hoehrmann ('(E-Mail Removed)') wrote:

> * Simon Brooke wrote in comp.text.xml:
>>OK, so: we look at a node and see if it needs a namespace, and if it
>>does we generate a namespace declaration. Suppose we have a structure
>>
>><a>
>>  <b>
>>    <foo:c/>
>>    <foo:d/>
>>  </b>
>>  <bar:e/>
>></a>
>>
>>am I right in thinking that it would be correct to attach the 'foo'
>>namespace declaration at any of nodes c /and/ d, or at node b, or at
>>node a, and the 'bar' namespace declaration at either node e or node a?

>
> xmlns:foo must be in scope for c and d; adding it to both would do the
> job, as would adding it to one of their ancestors. Adding it to
> a, b, c and d would also be possible, for example, but would probably be
> redundant.
> Note that 'foo' might map to different namespace names on different
> elements, e.g.
>
> <x>
> <y:z xmlns:y='foo' />
> <y:z xmlns:y='bar' />
> </x>
>
> would also be possible and there might be content that depends on the
> prefixes (e.g., XPath expressions in an XSLT document), so if you have
>
> <x some-qname-attribute='y:z' xmlns:y='foo'>
> <y:example />
> </x>
>
> mapping that to
>
> <x some-qname-attribute='y:z'>
> <y:example xmlns:y='foo' />
> </x>
>
> might be a bad idea.
>
>>Clearly not duplicating the declaration makes the job of the parser
>>easier. Is there any good reason not to pre-scan the tree and collect
>>all of the namespaces used and declare them on the root element of the
>>document? Looking at the 'algorithms' page it seems that unless two
>>elements use the same prefix to indicate different namespaces, there
>>should be no problem in 'shuffling' namespace declarations as high up
>>the tree as possible.

>
> This is true in general, but it would turn a probably incorrect
> document like
>
> <x some-qname-attribute='y:z'>
> <y:example xmlns:y='foo' />
> </x>
>
> into a correct document, which might not be intended. Of course, QNames
> in content might not be a concern for your application.


OK, my algorithm at this stage is as follows

if ( responsibleForNamespaceDeclarations )
{
    try
    {
        /* Collect every prefix/URI pair used at or below this node. */
        spaces = recursivelyCollectNamespaces( node );

        Enumeration keys = spaces.keys( );

        while ( keys.hasMoreElements( ) )
        {
            String key = keys.nextElement( ).toString( );
            printNS( key, spaces.get( key ).toString( ), out,
                     level + 1 );
        }

        /* Everything below is now declared here. */
        responsibleForNamespaceDeclarations = false;
    }
    catch ( NamespaceCollisionException e )
    {
        /* Collision somewhere below: declare only this node's own
         * namespace and leave the children responsible for theirs. */
        String uri = node.getNamespaceURI( );
        String prefix = node.getPrefix( );

        if ( ( uri != null ) && ( prefix != null ) )
        {
            printNS( prefix, uri, out, level + 1 );
        }

        System.err.println( "Namespace clash: " + e.getMessage( ) );
    }
}
...
for ( int i = 0; i < children.getLength( ); i++ )
{
    print( children.item( i ), out, level + 1,
           responsibleForNamespaceDeclarations );
}

That is to say, when printing an element node, I do recursive descent to
collect all the namespaces down tree from it. If there is a collision,
then if I have a local namespace to deal with, I deal with that locally,
and leave responsibility for printing namespaces set for the child
nodes. If there is no collision, then I deal with all the down-tree
namespaces and clear the responsibleForNamespaceDeclarations flag.

Can anyone see problems with this? And what do I do about the default
namespace? Will the default namespace have getNamespaceURI() non-null
and getPrefix() null?

--
(E-Mail Removed) (Simon Brooke) http://www.jasmine.org.uk/~simon/

The Conservative Party is now dead. The corpse may still be
twitching, but resurrection is not an option - unless Satan
chucks them out of Hell as too objectionable even for him.

 
Bjoern Hoehrmann
      02-11-2006
* Simon Brooke wrote in comp.text.xml:
>Can anyone see problems with this? And what do I do about the default
>namespace? Will the default namespace have getNamespaceURI() non-null
>and getPrefix() null?


http://lists.w3.org/Archives/Public/...5Dec/0017.html
--
Björn Höhrmann · (E-Mail Removed) · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
 
Joe Kesselman
      02-11-2006
Simon Brooke wrote:
>? Will the default namespace have getNamespaceURI() non-null
> and getPrefix() null?


Yes.
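
A quick way to check this for yourself, as a sketch assuming a namespace-aware JAXP parser. The class, its methods and the urn:example URI are all made up for the illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of any library.
class DefaultNamespaceCheck {
    /** Reports the prefix and namespace URI of the document element. */
    static String describeRoot(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            Document doc = f.newDocumentBuilder()
                            .parse(new InputSource(new StringReader(xml)));
            Element root = doc.getDocumentElement();
            return "prefix=" + root.getPrefix() + ", uri=" + root.getNamespaceURI();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    /** Attribute name a printer would emit to declare the given prefix. */
    static String declarationName(String prefix) {
        return (prefix == null) ? "xmlns" : "xmlns:" + prefix;
    }

    public static void main(String[] args) {
        System.out.println(describeRoot("<a xmlns='urn:example'/>"));
        System.out.println(declarationName(null));
        System.out.println(declarationName("foo"));
    }
}
```

So a printer that skips nodes with a null prefix will never emit the default namespace; it needs a branch that writes a plain xmlns attribute in that case.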

--
() ASCII Ribbon Campaign | Joe Kesselman
/\ Stamp out HTML e-mail! | System architexture and kinetic poetry
 
Simon Brooke
      02-11-2006
in message <(E-Mail Removed)>, Joe Kesselman
('(E-Mail Removed)') wrote:

> Simon Brooke wrote:
>>? Will the default namespace have getNamespaceURI() non-null
>> and getPrefix() null?

>
> Yes.


Thanks.

--
(E-Mail Removed) (Simon Brooke) http://www.jasmine.org.uk/~simon/

;; no eternal reward will forgive us now for wasting the dawn.
;; Jim Morrison

 
Simon Brooke
      02-11-2006
in message <(E-Mail Removed)>, Joe Kesselman
('(E-Mail Removed)') wrote:

> Simon Brooke wrote:
>>? Will the default namespace have getNamespaceURI() non-null
>> and getPrefix() null?

>
> Yes.


Thanks.

[did I reply to this already?]

--
(E-Mail Removed) (Simon Brooke) http://www.jasmine.org.uk/~simon/
Iraq war: it's time for regime change...
... go now, Tony, while you can still go with dignity.
[update 18 months after this .sig was written: it's still relevant]
 