Velocity Reviews


Simon Brooke 03-16-2007 12:32 AM

Document.importNode(Node,boolean) - what supports it?
 
The DOM API has included public Node importNode(Node,boolean) as a method
of the Document interface for a long time. Does anything actually
implement it? Xerces 2 is giving me:

org.w3c.dom.DOMException: NOT_SUPPORTED_ERR: The implementation does not support the requested type of object or operation.
at org.apache.xerces.dom.CoreDocumentImpl.importNode( Unknown Source)
at org.apache.xerces.dom.CoreDocumentImpl.importNode( Unknown Source)
at uk.co.weft.domutil.MaybeParseGenerator.maybeParse( MaybeParseGenerator.java:183)

This is so whether the node I'm trying to import is an
org.apache.xerces.dom.DeferredElementImpl (i.e. parsed with Xerces) or a
org.apache.crimson.tree.ElementNode (i.e. parsed with Crimson).

--
simon@jasmine.org.uk (Simon Brooke) http://www.jasmine.org.uk/~simon/
Ye hypocrites! are these your pranks? To murder men and give God thanks?
Desist, for shame! Proceed no further: God won't accept your thanks for
murther
-- Robert Burns, 'Thanksgiving For a National Victory'


Joe Kesselman 03-16-2007 02:07 AM

Re: Document.importNode(Node,boolean) - what supports it?
 
Simon Brooke wrote:
> The DOM API has included public Node importNode(Node,boolean) as a method
> of the Document interface for a long time. Does anything actually
> implement it?


Certainly should work; I wrote Xerces' first implementation of that
function, and in fact was one of those who lobbied the DOM WG to include
it in the standard. If the node being imported properly implements the
DOM APIs, and the implementation being imported into doesn't have some
reason for blocking this (eg, that it's specifically a read-only DOM,
such as the DOM view of Xalan's internal data model), the function
should work. It isn't rocket science, after all; it's just a tree-walker
feeding a tree-builder.

I have to believe the problem resides in something you haven't told us.
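
To make the "tree-walker feeding a tree-builder" description concrete, here is a minimal sketch of what a hand-rolled import might look like. It is only an illustration, not the Xerces implementation: the class and method names (TreeCopySketch, copyInto, describeCopy) are made up for this example, it assumes a non-namespace-aware parse, and it handles only elements, attributes, text, CDATA and comments.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical illustration; this is not how Xerces actually implements importNode.
class TreeCopySketch {

    // The recursion is the tree-walker; target's create* methods are the tree-builder.
    static Node copyInto(Document target, Node source) {
        switch (source.getNodeType()) {
            case Node.ELEMENT_NODE: {
                Element copy = target.createElement(source.getNodeName());
                NamedNodeMap attrs = source.getAttributes();
                for (int i = 0; i < attrs.getLength(); i++) {
                    Node attr = attrs.item(i);
                    copy.setAttribute(attr.getNodeName(), attr.getNodeValue());
                }
                for (Node child = source.getFirstChild(); child != null;
                        child = child.getNextSibling()) {
                    copy.appendChild(copyInto(target, child));
                }
                return copy;
            }
            case Node.TEXT_NODE:
                return target.createTextNode(source.getNodeValue());
            case Node.CDATA_SECTION_NODE:
                return target.createCDATASection(source.getNodeValue());
            case Node.COMMENT_NODE:
                return target.createComment(source.getNodeValue());
            default:
                // Other node types are deliberately left out of this sketch.
                throw new UnsupportedOperationException(
                    "node type " + source.getNodeType() + " not handled");
        }
    }

    // Uses whatever JAXP parser the platform provides by default.
    static DocumentBuilder newBuilder() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Parses xml, copies its root element into a fresh document, and summarises the copy.
    static String describeCopy(String xml) {
        try {
            Document source = newBuilder().parse(new InputSource(new StringReader(xml)));
            Document target = newBuilder().newDocument();
            Element copy = (Element) copyInto(target, source.getDocumentElement());
            return copy.getNodeName() + "|" + copy.getAttribute("class") + "|"
                + copy.getTextContent() + "|" + (copy.getOwnerDocument() == target);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describeCopy("<div class=\"Intro\">Here be dragons!</div>"));
    }
}
```

The point of the sketch is that nothing in the copy depends on which implementation produced the source tree; it only uses the standard Node interface to read and the target Document to build.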

--
() ASCII Ribbon Campaign | Joe Kesselman
/\ Stamp out HTML e-mail! | System architexture and kinetic poetry

Simon Brooke 03-16-2007 10:52 AM

Re: Document.importNode(Node,boolean) - what supports it?
 
in message <iJudnWRzduZsZmTYnZ2dnUVZ_hSdnZ2d@comcast.com>, Joe Kesselman
('keshlam-nospam@comcast.net') wrote:

> Simon Brooke wrote:
>> The DOM API has included public Node importNode(Node,boolean) as a
>> method of the Document interface for a long time. Does anything actually
>> implement it?

>
> Certainly should work; I wrote Xerces' first implementation of that
> function, and in fact was one of those who lobbied the DOM WG to include
> it in the standard. If the node being imported properly implements the
> DOM APIs, and the implementation being imported into doesn't have some
> reason for blocking this (eg, that it's specifically a read-only DOM,
> such as the DOM view of Xalan's internal data model), the function
> should work. It isn't rocket science, after all; it's just a tree-walker
> feeding a tree-builder.
>
> I have to believe the problem resides in something you haven't told us.


OK, then I have to believe that, too. Furthermore, this is another of the
bits of my code that have been around for a long time (since 2003 in this
case), and I'm sure it used to work (but it may only ever have worked with
Crimson). I have had occasions in the past where I have inadvertently
depended on bugs in a library, and when that library has been fixed all my
code broke.

If this class fails, it returns a text node with a 'flat' representation of
the embedded markup. Looking at the production server logs I see that it
has been intermittently failing in this way for some time, but that the
failure simply has not been noticed. The failure on the production servers
is different from the failure on the development server; I'll detail that
difference below. The production servers use Crimson to parse, but Xerces
to construct documents. I can't remember why, but it is probably just an
oversight.

The class in question is:

//***********************************************************************\
// *
// MaybeParseGenerator.java *
// *
// Author: Simon Brooke *
// Created: 17th January 2003 *
// $Revision: 1.7.4.3 $; $Date: 2006/09/04 13:45:54 $ *
// *
//***********************************************************************/
package uk.co.weft.domutil;

import org.w3c.dom.Document;
import org.w3c.dom.Node;

import org.xml.sax.InputSource;

import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;

import uk.co.weft.htform.ResourceConsumerImpl;


/*
* $Log: MaybeParseGenerator.java,v $
* Revision 1.7.4.3 2006/09/04 13:45:54 simon
* Added more debugging output. Have an intermittent bug in PRES which may
* originate here.
*
* Revision 1.7.4.2 2005/12/30 16:54:00 simon
* EkitWidget now working remarkably well. Still some tidying up to do.
*
* Revision 1.7.4.1 2005/12/23 10:48:33 simon
* Brute force tidy up after CVS server crash: this time it should work.
*
* Revision 1.7 2005/02/05 17:40:17 simon
* Improved diagnostics on failure
*
* Revision 1.6 2004/07/14 12:52:34 simon
* Final commit for 1.10.0
*
* Revision 1.5 2004/06/17 15:10:38 simon
* Extends ResourceConsumerImpl to gain access to grs, etc
*
* Revision 1.4 2003/10/30 12:40:21 simon
* Added debug flag in domutil classes
*
* Revision 1.3 2003/08/20 09:38:35 simon
* Code cleanup with eclipse; mostly removal of excessive includes
*
* Revision 1.2 2003/07/09 09:32:07 simon
* Initial work on HTML generation of widgets.
*
* Revision 1.1 2003/02/06 11:22:26 simon
* New superclass for node generators which may want to parse XML text.
*/

/**
* Abstract superclass for TextNodeGenerator and ElementGenerator, which
* may want to parse their content. Parsing is potentially expensive, so if
* you're confident the value won't contain XML markup it may be worth
* calling allowEmbeddedMarkup( false ).
*
* @author Simon Brooke
* @version $Revision: 1.7.4.3 $ This revision: $Author: simon $
*/
public abstract class MaybeParseGenerator extends ResourceConsumerImpl
{
//~ Instance fields -----------------------------------------------------

/**
* whether or not I'm in debug mode; if I am I may print debugging
* messages to System.err
*/
protected boolean debug = false;

/** By default we allow embedded markup in children */
protected boolean embeddedMarkup = true;

//~ Constructors --------------------------------------------------------

/**
* Creates a new MaybeParseGenerator object.
*/
public MaybeParseGenerator( )
{
// ...nothing...
}

//~ Methods -------------------------------------------------------------

/**
* whether or not to set debugging mode. If true, the generator _may_
* write debugging messages to System.err
*
* @param debug whether or not to set debugging mode
*
* @since Jacquard 1.10
*/
public void setDebug( boolean debug )
{
this.debug = debug;
}

/**
* Do we allow (and parse for) embedded markup within the value of this
* node? default is we do.
*
* @param allow if true, then allow embedded markup within my value
*/
public void allowEmbeddedMarkup( boolean allow )
{
embeddedMarkup = allow;
}

/**
* Construct a node representing this value. It's perfectly possible (and
* possibly legitimate) that the value of a child should contain embedded
* markup. If so, try to parse a node out of it.
*
* @param doc the document in which the node is to be created
* @param unparsed the string, possibly with embedded markup, to parse
*
* @exception GenerationException if parsing fails
*/
protected Node maybeParse( Document doc, String unparsed )
throws GenerationException
{
Node val = doc.createTextNode( unparsed ); // safe default

if ( debug )
{
System.err.println( "MaybeParseGenerator.maybeParse: parsing [" +
unparsed + "]" );
}

if ( unparsed != null ) // defensive
{
if ( embeddedMarkup && (
// if we allow embedded markup
unparsed.indexOf( "<" ) > -1 ) ) // it looks like markup
{
if ( !unparsed.trim( ).startsWith( "<" ) )
{
// nasty: if it contains markup, but
// isn't contained in markup, the
// parser will barf.
unparsed = "<parsed>" + unparsed + "</parsed>";
}

try
{
DocumentBuilder parser = DOMStub.getParser( );

if ( parser == null )
{
System.err.println( "Could not initialise XML parser" );
}

InputSource i =
new InputSource( new StringReader( unparsed ) );

// i.setCharacterStream( new StringReader( unparsed ) );
Document parsed = parser.parse( i );

if ( debug )
{
System.err.println( "Parsed document: " +
parsed.toString( ) );

if ( parsed != null )
{
Node root = parsed.getDocumentElement( );

if ( root != null )
{
System.err.println( "Root node: (" +
root.getClass( ).getName( ) + "): " +
root.toString( ) );
}
}
}

val = doc.importNode( parsed, true );

if ( debug )
{
System.err.println(
"MaybeParseGenerator.maybeParse: parse successful" );
new Printer( ).print( val, System.err );
}
}
catch ( Exception e )
{
System.err.println(
"MaybeParseGenerator.maybeParse(): Could not parse '" +
unparsed + "'as XML" );
e.printStackTrace( System.err );
}
}
}

return val;
}
}

/* [end of file] */


What I'm getting in the error stream on the development server is (with
parser unconfigured, i.e. using Tomcat's default, which is Xerces; see
below for Crimson):

ElementGenerator.generate: attempting to parse <div class="Intro">
Here be dragons!
</div>
MaybeParseGenerator.maybeParse: parsing [<div class="Intro">
Here be dragons!
</div>]
Parsed document: [#document: null]
Root node: (org.apache.xerces.dom.DeferredElementImpl): [div: null]
MaybeParseGenerator.maybeParse(): Could not parse '<div class="Intro">
Here be dragons!
</div>'as XML
org.w3c.dom.DOMException: NOT_SUPPORTED_ERR: The implementation does not support the requested type of object or operation.
at org.apache.xerces.dom.CoreDocumentImpl.importNode( Unknown Source)
at org.apache.xerces.dom.CoreDocumentImpl.importNode( Unknown Source)
at uk.co.weft.domutil.MaybeParseGenerator.maybeParse( MaybeParseGenerator.java:183)


(with parser configured as org.apache.crimson.tree.DOMImplementationImpl):

ElementGenerator.generate: attempting to parse <div class="Intro">
Here be dragons!
</div>
MaybeParseGenerator.maybeParse: parsing [<div class="Intro">
Here be dragons!
</div>]
Parsed document: org.apache.crimson.tree.XmlDocument@e9a0e9a
Root node: <div class="Intro">
Here be dragons!
</div>
MaybeParseGenerator.maybeParse(): Could not parse '<div class="Intro">
Here be dragons!
</div>'as XML
org.w3c.dom.DOMException: NOT_SUPPORTED_ERR: The implementation does not support the requested type of object or operation.
at org.apache.xerces.dom.CoreDocumentImpl.importNode( Unknown Source)
at org.apache.xerces.dom.CoreDocumentImpl.importNode( Unknown Source)
at uk.co.weft.domutil.MaybeParseGenerator.maybeParse( MaybeParseGenerator.java:173)


What's showing up in the production server logs is:
(Firstly, evidence that it sometimes does work):
ElementGenerator.generate: attempting to parse <div
class="Introduction"><p>Copies of documentation issued to licensees is
available in this section.</p></div>
ElementGenerator.generate: attempting to parse Cockle Bags - further
information


(Secondly, evidence that it sometimes doesn't):
ElementGenerator.generate: attempting to parse <div class="Introduction">
Ayrshire and Dumfrieshire Cyclists Association is a regional
association
of cycling clubs within the structure of Scottish Cycling.
</div>
MayberParseGenerator.maybeParse(): Could not parse '<div
class="Introduction">
Ayrshire and Dumfrieshire Cyclists Association is a regional
association
of cycling clubs within the structure of Scottish Cycling.
</div>'as XML
java.lang.NullPointerException
at org.apache.xerces.dom.CoreDocumentImpl.importNode( Unknown Source)
at org.apache.xerces.dom.CoreDocumentImpl.importNode( Unknown Source)
at org.apache.xerces.dom.CoreDocumentImpl.importNode( Unknown Source)
at uk.co.weft.domutil.MaybeParseGenerator.maybeParse( MaybeParseGenerator.java:163)

I've checked the libraries and the two instances above use the same
versions of the same libraries with the same configuration, so why

<div class="Introduction"><p>Copies of documentation issued to licensees is
available in this section.</p></div>

parses successfully and

<div class="Introduction">
Ayrshire and Dumfrieshire Cyclists Association is a regional
association
of cycling clubs within the structure of Scottish Cycling.
</div>

fails to parse is frankly baffling me.

--
simon@jasmine.org.uk (Simon Brooke) http://www.jasmine.org.uk/~simon/
;; Let's have a moment of silence for all those Americans who are stuck
;; in traffic on their way to the gym to ride the stationary bicycle.
;; Rep. Earl Blumenauer (Dem, OR)


Joe Kesselman 03-16-2007 12:55 PM

Re: Document.importNode(Node,boolean) - what supports it?
 
Just a quick observation: Your "sometimes works" and "sometimes doesn't"
are significantly different:

> (Firstly, evidence that it sometimes does work):
> ElementGenerator.generate: attempting to parse <div
> class="Introduction"><p>Copies of documentation issued to licensees is
> available in this section.</p></div>


<div> has a <p> child.


> (Secondly, evidence that it sometimes doesn't):
> ElementGenerator.generate: attempting to parse <div class="Introduction">
> Ayrshire and Dumfrieshire Cyclists Association is a regional
> association
> of cycling clubs within the structure of Scottish Cycling.
> </div>


<div> contains only text. Haven't looked at the code yet, but are you
sure you aren't doing something simple like trying to import the string
value rather than a TextNode object?

--
() ASCII Ribbon Campaign | Joe Kesselman
/\ Stamp out HTML e-mail! | System architexture and kinetic poetry

Joe Kesselman 03-17-2007 12:57 AM

Re: Document.importNode(Node,boolean) - what supports it?
 
Also: You didn't show us the implementation of DOMStub... but with that
name, I wouldn't be at all surprised if you've got a subset
implementation there.

--
() ASCII Ribbon Campaign | Joe Kesselman
/\ Stamp out HTML e-mail! | System architexture and kinetic poetry

Joe Kesselman 03-17-2007 01:33 AM

Re: Document.importNode(Node,boolean) - what supports it?
 
Well, I've reproduced the error message under Eclipse. Lemme see if I
can reproduce it with a current version of Xerces...



--
() ASCII Ribbon Campaign | Joe Kesselman
/\ Stamp out HTML e-mail! | System architexture and kinetic poetry

Joe Kesselman 03-17-2007 03:54 AM

Re: Document.importNode(Node,boolean) - what supports it?
 
Oh. That's stupid; I should have remembered this:

http://www.w3.org/TR/2000/REC-DOM-Le...ent-importNode

You're attempting to import a Document node. That's forbidden. Import
its root element instead.

Yes, the error message could have been more helpful. I'd suggest posting
that as a suggestion on the Xerces users mailing list, since I'm not
sure any of the current Xerces maintainers are reading this list.
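
For readers who land here with the same error, the suggested fix can be sketched roughly as follows. This is a cut-down stand-in for the maybeParse method posted earlier in the thread, not the author's actual corrected code: the class name ImportRootSketch and its helpers are invented for the example, and it uses the platform's default JAXP parser instead of DOMStub. The only line that matters is the importNode call, which is given the parsed document's root element instead of the Document itself.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical stand-in for the posted maybeParse method; names are illustrative.
class ImportRootSketch {

    // Uses the default JAXP parser; the original code obtains its parser from DOMStub.
    static DocumentBuilder newBuilder() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the parsed markup as a node owned by doc, or a text node if parsing fails.
    static Node parseAndImport(Document doc, String unparsed) {
        Node fallback = doc.createTextNode(unparsed); // safe default, as in the original
        try {
            Document parsed =
                newBuilder().parse(new InputSource(new StringReader(unparsed)));
            // Import the root element, never the Document node itself.
            return doc.importNode(parsed.getDocumentElement(), true);
        } catch (Exception e) {
            // Malformed input falls back to the 'flat' text representation.
            return fallback;
        }
    }

    // Small helper for experimenting: node type and name of the result.
    static String describe(String unparsed) {
        Document doc = newBuilder().newDocument();
        Node result = parseAndImport(doc, unparsed);
        return result.getNodeType() + ":" + result.getNodeName();
    }

    public static void main(String[] args) {
        System.out.println(describe("<div class=\"Introduction\">text only</div>"));
        System.out.println(describe("a < b"));
    }
}
```

The imported node still has to be appended somewhere in doc by the caller; importNode only creates a copy owned by the target document, it does not attach it to the tree.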


--
() ASCII Ribbon Campaign | Joe Kesselman
/\ Stamp out HTML e-mail! | System architexture and kinetic poetry

Bjoern Hoehrmann 03-17-2007 04:30 AM

Re: Document.importNode(Node,boolean) - what supports it?
 
* Joe Kesselman wrote in comp.text.xml:
>You're attempting to import a Document node. That's forbidden. Import
>its root element instead.


Heh, I actually had a quick look into the Xerces source code when I
looked at the question, but that case was the only one where the specific
claimed exception would be raised, and Simon said he tried to import
element nodes, so I concluded the issue was too weird to investigate
further...
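
A quick way to check which case you are hitting is to try both imports side by side and look at the DOMException code. The sketch below assumes a JAXP implementation that follows the DOM Level 2 rule that Document nodes cannot be imported (the parser bundled with the JDK is Xerces-derived and should behave this way, but treat that as an assumption to verify on your own setup). DocumentImportCheck and its methods are names invented for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical diagnostic helper; not part of any library.
class DocumentImportCheck {

    // Returns the DOMException code raised by importing node, or -1 if the import succeeds.
    static int importErrorCode(Document target, Node node) {
        try {
            target.importNode(node, true);
            return -1;
        } catch (DOMException e) {
            return e.code;
        }
    }

    static DocumentBuilder newBuilder() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static Document parse(String xml) {
        try {
            return newBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Importing the Document node itself: the case the DOM specification forbids.
    static int codeForWholeDocument(String xml) {
        return importErrorCode(newBuilder().newDocument(), parse(xml));
    }

    // Importing the root element: the supported case.
    static int codeForRootElement(String xml) {
        return importErrorCode(newBuilder().newDocument(), parse(xml).getDocumentElement());
    }

    public static void main(String[] args) {
        String xml = "<div class=\"Intro\">Here be dragons!</div>";
        System.out.println("whole document: " + codeForWholeDocument(xml)
            + " (NOT_SUPPORTED_ERR is " + DOMException.NOT_SUPPORTED_ERR + ")");
        System.out.println("root element:   " + codeForRootElement(xml));
    }
}
```

In real code a cheaper guard is to test node.getNodeType() == Node.DOCUMENT_NODE before calling importNode and substitute getDocumentElement() in that case.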
--
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/

Simon Brooke 03-17-2007 09:24 AM

Re: Document.importNode(Node,boolean) - what supports it?
 
in message <96-dnakiEL7D-2bYnZ2dnUVZ_q-vnZ2d@comcast.com>, Joe Kesselman
('keshlam-nospam@comcast.net') wrote:

> Oh. That's stupid; I should have remembered this:
>
> http://www.w3.org/TR/2000/REC-DOM-Le...ent-importNode
>
> You're attempting to import a Document node. That's forbidden. Import
> its root element instead.
>
> Yes, the error message could have been more helpful. I'd suggest posting
> that as a suggestion on the Xerces users mailing list, since I'm not
> sure any of the current Xerces maintainers are reading this list.


Thank you. I was going to say indignantly 'oh no I don't', but on reading
through my code I see I get the root node of the document... and then
don't use it. Having fixed that, /this/ problem is solved, and I can now
replace vintage Crimson with current Xerces and my code still works.

Still can't get it to work with current Xalan, but that's another set of
problems...

--
simon@jasmine.org.uk (Simon Brooke) http://www.jasmine.org.uk/~simon/

;; Good grief, I can remember when England won the Ashes.

Simon Brooke 03-17-2007 09:27 AM

Re: Document.importNode(Node,boolean) - what supports it?
 
in message <H7-dnWYAs-tAoWbYnZ2dnUVZ_q3inZ2d@comcast.com>, Joe Kesselman
('keshlam-nospam@comcast.net') wrote:

> Also: You didn't show us the implementation of DOMStub... but with that
> name, I wouldn't be at all surprised if you've got a subset
> implementation there.


No, it just allows me to select and configure the DOMImplementation I use:

/**
* Should be called before DOMStub is used, but perfectly safe to call
* more than once. If I've already been initialised, don't initialise me
* again.
*
* @param config my configuration
*
* @exception InitialisationException if requested DOM implementation
* can't be found
*/
public static void init( Context config ) throws InitialisationException
{
String s = config.getValueAsString( "dom_implementation_class" );

if ( domImp == null )
{
/* i.e., I have not already been initialised */
try
{
if ( s != null )
{
domImpName = s;
}

domImp =
(DOMImplementation) Class.forName( domImpName )
.newInstance( );
}
catch ( Exception any )
{
throw new InitialisationException( "Could not find DOM " +
"implementation " + domImpName );
}
}

Boolean b = config.getValueAsBoolean( "dom_coalescing" );

if ( b != null )
{
dbf.setCoalescing( b.booleanValue( ) );
}

b = config.getValueAsBoolean( "dom_expand_entity_references" );

if ( b != null )
{
dbf.setExpandEntityReferences( b.booleanValue( ) );
}

b = config.getValueAsBoolean( "dom_ignore_comments" );

if ( b != null )
{
dbf.setIgnoringComments( b.booleanValue( ) );
}

b = config.getValueAsBoolean( "dom_ignore_whitespace" );

if ( b != null )
{
dbf.setIgnoringElementContentWhitespace( b.booleanValue( ) );
}

b = config.getValueAsBoolean( "dom_namespace_aware" );

if ( b != null )
{
dbf.setNamespaceAware( b.booleanValue( ) );
}

b = config.getValueAsBoolean( "dom_validating" );

if ( b != null )
{
dbf.setValidating( b.booleanValue( ) );
}
}
}
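
The getParser( ) method that maybeParse calls is not shown in this thread. Purely as a guess at its general shape, a helper of this kind could look like the sketch below. Everything here is an assumption for illustration: the class name ParserStubSketch, the use of a plain Map in place of the Context class, and the choice to return null on failure (which is at least consistent with the null check in maybeParse). Only the configuration keys and the DocumentBuilderFactory setters are taken from the code above.

```java
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

// Hypothetical sketch; the real DOMStub is not shown in the thread.
class ParserStubSketch {

    // Shared factory, playing the role of the dbf field in the posted init method.
    private static final DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();

    // Applies only the settings that are present; a Map stands in for Context here.
    static synchronized void configure(Map<String, Boolean> config) {
        Boolean b = config.get("dom_coalescing");
        if (b != null) {
            dbf.setCoalescing(b.booleanValue());
        }
        b = config.get("dom_expand_entity_references");
        if (b != null) {
            dbf.setExpandEntityReferences(b.booleanValue());
        }
        b = config.get("dom_ignore_comments");
        if (b != null) {
            dbf.setIgnoringComments(b.booleanValue());
        }
        b = config.get("dom_ignore_whitespace");
        if (b != null) {
            dbf.setIgnoringElementContentWhitespace(b.booleanValue());
        }
        b = config.get("dom_namespace_aware");
        if (b != null) {
            dbf.setNamespaceAware(b.booleanValue());
        }
        b = config.get("dom_validating");
        if (b != null) {
            dbf.setValidating(b.booleanValue());
        }
    }

    // Hands out a fresh builder per call; returns null if the factory cannot create one.
    static synchronized DocumentBuilder getParser() {
        try {
            return dbf.newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        configure(java.util.Collections.singletonMap("dom_namespace_aware", Boolean.TRUE));
        System.out.println("namespace aware: " + getParser().isNamespaceAware());
    }
}
```

Creating a new DocumentBuilder per call avoids sharing one builder between threads, since DocumentBuilder instances are not guaranteed to be thread-safe.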


--
simon@jasmine.org.uk (Simon Brooke) http://www.jasmine.org.uk/~simon/

X-no-archive: No, I'm not *that* naive.


