XML version without UTF8

 
 
Hapa
Guest
Posts: n/a
 
      07-28-2009
Hello all,
we are using msxml.dll (version 1) and Visual C++6.0.

There is a way to automatically write the processing instruction prior
to saving the XMLDOMDocument:

VARIANT NodeType;
NodeType.vt = VT_I4; V_I4(&NodeType) = MSXML::NODE_PROCESSING_INSTRUCTION;
CComBSTR PITarget = ("xml");

XMLDOMNodePtr pProcInstr;
m_pXMLDocumentNode->createNode(NodeType, PITarget, NULL, &pProcInstr);
m_pXMLDocumentNode->appendChild(pProcInstr, NULL);


Question: Why is the encoding always missing from our processing
instruction? It looks like this: <?xml version="1.0" ?> instead of
<?xml version="1.0" encoding="UTF-8" ?>.
Is this related to _UNICODE or _MBCS?

Thanks.
hapa


 
 
 
 
 
Joe Kesselman
Guest
Posts: n/a
 
      07-28-2009
UTF-8 or UTF-16 are the default assumptions if no encoding is declared
(UTF-16 being recognized by its byte order mark).

http://www.w3.org/TR/REC-xml/#charencoding
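
To make the default concrete, here is a minimal sketch of how a reader could guess the encoding family from the first bytes of a document. It follows only a subset of the non-normative Appendix F of the XML 1.0 recommendation: UCS-4/UTF-32 and EBCDIC are left out, and the encoding declaration itself is not parsed. The function name sniff_xml_encoding is made up for this illustration.

```cpp
#include <cstddef>
#include <string>

// Guess the encoding family of an XML entity from its leading bytes.
// Simplified: handles only UTF-8 and UTF-16, with or without a BOM.
// A real parser would go on to read the encoding declaration, which
// can name a different ASCII-compatible encoding.
std::string sniff_xml_encoding(const unsigned char* p, std::size_t n) {
    // Byte order marks.
    if (n >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF) return "UTF-8";
    if (n >= 2 && p[0] == 0xFE && p[1] == 0xFF) return "UTF-16BE";
    if (n >= 2 && p[0] == 0xFF && p[1] == 0xFE) return "UTF-16LE";
    // No BOM: "<?" stored as two-byte code units.
    if (n >= 4 && p[0] == 0x00 && p[1] == 0x3C && p[2] == 0x00 && p[3] == 0x3F)
        return "UTF-16BE";
    if (n >= 4 && p[0] == 0x3C && p[1] == 0x00 && p[2] == 0x3F && p[3] == 0x00)
        return "UTF-16LE";
    // No BOM and no UTF-16 pattern: UTF-8 is the default assumption.
    return "UTF-8";
}
```

So a file that starts with the plain bytes of <?xml version="1.0" ?> and carries no encoding declaration is treated as UTF-8 by a conforming parser.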
 
 
 
 
 
Martin Honnen
Guest
Posts: n/a
 
      07-28-2009
Hapa wrote:
> Hello all,
> we are using msxml.dll (version 1) and Visual C++6.0.


Version 1? The latest is MSXML 6; I don't think MSXML 1 is still supported.

> There is a way to automatically write the processing instruction prior
> to saving the XMLDOMDocument:
>
> VARIANT NodeType;
> NodeType.vt = VT_I4; V_I4(&NodeType) = MSXML::NODE_PROCESSING_INSTRUCTION;
> CComBSTR PITarget = ("xml");
>
> XMLDOMNodePtr pProcInstr;
> m_pXMLDocumentNode->createNode(NodeType, PITarget, NULL, &pProcInstr);
> m_pXMLDocumentNode->appendChild(pProcInstr, NULL);
>
>
> Question: Why is the encoding always missing from our processing
> instruction? It looks like this: <?xml version="1.0" ?> instead of
> <?xml version="1.0" encoding="UTF-8" ?>.
> Is this related to _UNICODE or _MBCS?


I don't see you creating any encoding. You could do it using the
createProcessingInstruction method
(http://msdn.microsoft.com/en-us/libr...39(VS.85).aspx)
(JScript pseudo code, please translate to C++ yourself)
var pi = doc.createProcessingInstruction('xml',
'version="1.0" encoding="UTF-8"');

That way MSXML will write out the XML declaration with the specified
encoding when saving the DOM document to a file or stream. If you simply
access the xml property to get a string serialization of the DOM
document, then I think (depending on the MSXML version) you will either
not see any encoding at all or you will get encoding="UTF-16", as that is
the encoding of the string.
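
For readers who want the C++ side, here is a rough, untested sketch of that JScript line using the raw COM interfaces. It assumes the Windows SDK header msxml2.h and an already created, still empty IXMLDOMDocument; the helper names xml_decl_data and add_xml_declaration are invented for this example, and whether it behaves the same on MSXML 1 is not confirmed anywhere in this thread. The string helper is portable; the COM part only compiles on Windows.

```cpp
#include <string>

// Build the data part of the declaration-as-PI,
// e.g. version="1.0" encoding="UTF-8".
std::wstring xml_decl_data(const std::wstring& version,
                           const std::wstring& encoding) {
    std::wstring data = L"version=\"" + version + L"\"";
    if (!encoding.empty()) {
        data += L" encoding=\"" + encoding + L"\"";
    }
    return data;
}

#ifdef _WIN32
#include <windows.h>
#include <msxml2.h>

// Assumption: doc is an already created, still empty document, so the
// appended node becomes its first child.
HRESULT add_xml_declaration(IXMLDOMDocument* doc,
                            const std::wstring& encoding) {
    BSTR target = SysAllocString(L"xml");
    BSTR data = SysAllocString(xml_decl_data(L"1.0", encoding).c_str());
    IXMLDOMProcessingInstruction* pi = NULL;
    HRESULT hr = doc->createProcessingInstruction(target, data, &pi);
    if (SUCCEEDED(hr) && pi) {
        IXMLDOMNode* added = NULL;
        hr = doc->appendChild(pi, &added);
        if (added) added->Release();
        pi->Release();
    }
    SysFreeString(target);
    SysFreeString(data);
    return hr;
}
#endif
```

Calling add_xml_declaration(doc, L"UTF-8") before appending the root element and then saving to a file should correspond to the approach described above.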





--

Martin Honnen
http://msmvps.com/blogs/martin_honnen/
 
 
Joe Kesselman
Guest
Posts: n/a
 
      07-28-2009
> I don't see you creating any encoding. You could do it using the
> createProcessingInstruction method
> (http://msdn.microsoft.com/en-us/libr...39(VS.85).aspx)
> (JScript pseudo code, please translate to C++ yourself)
> var pi = doc.createProcessingInstruction('xml',
> 'version="1.0" encoding="UTF-8"');


Uhm... No. The XML Declaration, while it has the syntax of a processing
instruction, is not a processing instruction. If MSXML is doing it this
way, MSXML is wrong. Modern versions of XML APIs (DOM, SAX, etc) should
all have an explicit mechanism for specifying encoding, and the
serializer should Do The Right Thing with that information.
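
The grammar backs this up: in the XML 1.0 recommendation the PITarget production excludes the name "xml" in any letter case, so a strict API could reject it as a processing instruction target. A minimal sketch of that one check follows; the function name is made up, and the rest of the Name syntax is not validated here.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// True if target is exactly "xml" in any mix of upper and lower case,
// which the XML 1.0 grammar excludes as a processing instruction target.
// Longer names such as "xml-stylesheet" are not affected by this rule.
bool is_reserved_xml_target(const std::string& target) {
    if (target.size() != 3) return false;
    const char* expected = "xml";
    for (std::size_t i = 0; i < 3; ++i) {
        if (std::tolower(static_cast<unsigned char>(target[i])) != expected[i]) {
            return false;
        }
    }
    return true;
}
```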
 
 
Martin Honnen
Guest
Posts: n/a
 
      07-28-2009
Joe Kesselman wrote:
>> I don't see you creating any encoding. You could do it using the
>> createProcessingInstruction method
>> (http://msdn.microsoft.com/en-us/libr...39(VS.85).aspx)
>> (JScript pseudo code, please translate to C++ yourself)
>> var pi = doc.createProcessingInstruction('xml',
>> 'version="1.0" encoding="UTF-8"');

>
> Uhm... No. The XML Declaration, while it has the syntax of a processing
> instruction, is not a processing instruction. If MSXML is doing it this
> way, MSXML is wrong. Modern versions of XML APIs (DOM, SAX, etc) should
> all have an explicit mechanism for specifying encoding, and the
> serializer should Do The Right Thing with that information.


MSXML has DOM Level 1, but the only way to create an XML declaration is
to create it as a processing instruction, even though it technically is
not one.

--

Martin Honnen
http://msmvps.com/blogs/martin_honnen/
 
 
Martin Honnen
Guest
Posts: n/a
 
      07-28-2009
Martin Honnen wrote:

>> Uhm... No. The XML Declaration, while it has the syntax of a
>> processing instruction, is not a processing instruction. If MSXML is
>> doing it this way, MSXML is wrong. Modern versions of XML APIs (DOM,
>> SAX, etc) should all have an explicit mechanism for specifying
>> encoding, and the serializer should Do The Right Thing with that
>> information.

>
> MSXML has DOM Level 1, but the only way to create an XML declaration is
> to create it as a processing instruction, even though it technically is
> not one.


And MSXML treats the XML declaration as a PI not only on output: if you
load an XML document with an XML declaration, then in MSXML's DOM tree
the declaration shows up as the first child of the document node, and
its nodeType is 7 (processing instruction).

--

Martin Honnen
http://msmvps.com/blogs/martin_honnen/
 
 
Martin Honnen
Guest
Posts: n/a
 
      07-28-2009
Martin Honnen wrote:
> Martin Honnen wrote:
>
>>> Uhm... No. The XML Declaration, while it has the syntax of a
>>> processing instruction, is not a processing instruction. If MSXML is
>>> doing it this way, MSXML is wrong. Modern versions of XML APIs (DOM,
>>> SAX, etc) should all have an explicit mechanism for specifying
>>> encoding, and the serializer should Do The Right Thing with that
>>> information.

>>
>> MSXML has DOM Level 1, but the only way to create an XML declaration is
>> to create it as a processing instruction, even though it technically is
>> not one.


And with the DOM Level 2 implementations in Firefox or Safari, I think
the only way to suggest an encoding for serialization is to use
createProcessingInstruction to create an XML declaration as a PI and
insert that as the first child of the XML DOM document.

DOM Level 3 Load and Save never really made it into browsers, I think,
apart from Opera, which has some support for it.


--

Martin Honnen
http://msmvps.com/blogs/martin_honnen/
 
 
Joe Kesselman
Guest
Posts: n/a
 
      07-28-2009
Martin Honnen wrote:
> MSXML has DOM Level 1


Ah. So they never upgraded to DOM Level 2 or Level 3? That's a distinct
shame -- it means their users are stuck working with a
non-namespace-aware DOM, don't have DOM event handling, don't have the
DOM serialization interface, and are missing some of the other items we
put in because they were clearly needed based on real-world experience
and the evolving standards.

I presume IE's DOM implementation is more up-to-date, though I haven't
checked recently.

(Since I don't use the MS tools, I haven't been keeping track of their
status.)
 
 
Martin Honnen
Guest
Posts: n/a
 
      07-29-2009
Joe Kesselman wrote:
> Martin Honnen wrote:
>> MSXML has DOM Level 1

>
> Ah. So they never upgraded to DOM Level 2 or Level 3? That's a distinct
> shame -- it means their users are stuck working with a
> non-namespace-aware DOM, don't have DOM event handling, don't have the
> DOM serialization interface, and are missing some of the other items we
> put in because they were clearly needed based on real-world experience
> and the evolving standards.


No, MSXML is namespace aware, but not the way the W3C specifies it in
DOM Level 2 or 3. Instead of createElementNS or createAttributeNS, MSXML
has a method createNode on the document interface that allows you to
create elements or attributes in a namespace by passing in the node
type, the name, and the namespace if needed. And to find elements or
attributes in a namespace there is no getElementsByTagNameNS, but rather
methods like selectNodes and selectSingleNode that take an XPath 1.0
expression.

> I presume IE's DOM implementation is more up-to-date, though I haven't
> checked recently.


For XML documents IE uses MSXML anyway. In my view the HTML DOM in IE is
also only at DOM Level 1 as far as compliance with the W3C DOM goes, but
it has rich proprietary extensions to deal with events, stylesheets, and
editing.


--

Martin Honnen
http://msmvps.com/blogs/martin_honnen/
 
 
Joe Kesselman
Guest
Posts: n/a
 
      07-29-2009
Martin Honnen wrote:
> No, MSXML is namespace aware, but not the way the W3C specifies it in
> DOM Level 2 or 3.


So MS's customers can't code portable solutions. As I said, I consider
that a distinct pity, since the whole point of creating the DOM API was
to allow code to be moved/reused without rewriting.

(I will admit that my own code deliberately violated one detail of
another W3C recommendation ... but it's one that almost nobody uses and
that the W3C itself is now phasing out.)
 