Reading UTF-8 Data from XML file

 
 
Matt Hollingworth
      05-26-2005
We have an XML file that contains text in various languages, e.g. English,
French, German and Chinese.
We currently have a StringWriter object that reads this in and transforms
against an XslTransform object.
The problem arises when we encounter Chinese characters: they
just come out as garbage in Internet Explorer.

Setting the charset on the .aspx page, in web.config and in the
.xsl file used for the transform has no effect.

Using a simple transform in classic ASP,
we can display the text correctly, as it's meant to be seen; however, getting
the same output in C# seems a lot trickier.

After trying various 'fixes' posted on several developer sites, nothing has
worked and the problem is still there.
We subclassed StringWriter to allow changing its Encoding
and force UTF-8, but to no avail.

When the transform is complete, we return the result of the StringWriter
object's .ToString() method. This is where the error seems to lie:
checking .Encoding.EncodingName just prior to returning shows it
labelled as 'Unicode (UTF-8)', yet when the string is output
to screen via a Literal control, all we see is garbage.

Some of the characters are replaced with ???????. We know our browser is
functioning correctly because we can see these types of text on
http://www.yahoo.com.hk



 
Joerg Jooss
      05-27-2005
Matt Hollingworth wrote:

> We have an XML file that contains text in various languages, e.g.
> English, French, German and Chinese.
> We currently have a StringWriter object that reads this in and
> transforms against an XslTransform object.


I really don't believe that you use a String*Writer* to *read* input


Characters and strings in .NET are always Unicode and use UTF-16 as
their internal representation. This means
a) a UTF-8 StringWriter is an oxymoron
b) truly character-based operations aren't susceptible to encoding
problems
c) encodings are only relevant when you need to transport strings using
a byte representation, e.g. when rendering a string on a web page. Make
sure that your web application uses UTF-8 (or any other UTF that suits
your needs) as response encoding.

Cheers,
--
http://www.joergjooss.de
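
A minimal C# sketch of points a) to c) above, added for illustration and not part of the original reply. It assumes a plain console program; the class name EncodingBoundaryDemo, the RoundTrip helper and the sample string are hypothetical. It shows that a StringWriter only collects UTF-16 characters, and that text can only be damaged at the point where it is turned into bytes and decoded again with a different encoding.

```csharp
using System;
using System.IO;
using System.Text;

public static class EncodingBoundaryDemo
{
    // Encode the text as UTF-8 bytes, then decode those bytes with the
    // given encoding. This mimics a server sending UTF-8 and a client
    // interpreting the bytes with whatever charset it believes is in use.
    public static string RoundTrip(string text, Encoding decodeWith)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        return decodeWith.GetString(utf8Bytes);
    }

    public static void Main()
    {
        string sample = "中文 text";

        // A StringWriter stores chars, not bytes; it reports UTF-16.
        StringWriter writer = new StringWriter();
        writer.Write(sample);
        Console.WriteLine(writer.Encoding.WebName);      // utf-16
        Console.WriteLine(writer.ToString() == sample);  // True

        // Matching encodings on both sides of the byte boundary: intact.
        Console.WriteLine(RoundTrip(sample, Encoding.UTF8) == sample);  // True

        // Mismatched decoding (ISO-8859-1 here): the text comes back garbled.
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        Console.WriteLine(RoundTrip(sample, latin1) == sample);         // False
    }
}
```

If the string itself is intact when it leaves the StringWriter, the place to look is wherever bytes are produced or consumed: loading the XML file, writing the response, or the charset the browser is told to use.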
 
Matt Hollingworth
      05-31-2005
Joerg,

Thanks - a developer wrote this question...

> > We currently have a StringWriter object that reads this in and
> > transforms against an XslTransform object.

Sorry, to clarify: this means that the result of transforming an XmlDocument
object is written to a StringWriter.

My Webform does use UTF-8 response and request encoding, and I have tried
several other encodings to get it to work.

I can get Chinese characters to display but some of the content is still
broken. Could the fact that my transformation results in a mixture of HTML
code + English text + Chinese text be part of the problem?

It seems I get something like "藛鈥犆b偓鈥?/P>". Notice the question mark and half
a </p> tag. I have disabled output escaping in my XSLT but still to no avail.

Your help appreciated,
Thanks
Matt

Joerg Jooss
      05-31-2005
Matt Hollingworth wrote:

> My Webform does use UTF-8 response and request encoding and I have
> tried using several other encoding types to get it to work.
>
> I can get Chinese characters to display but some of the content is
> still broken. Could the fact that my transformation results in a
> mixture of HTML code + English text + Chinese text be part of the
> problem?


Only if you were not using Unicode. But since you use UTF-8 as response
encoding, and assuming you don't mistreat any string objects in your
code, that should not be a problem.

> It seems I get something like "藛鈥犆b偓鈥?/P>" notice the
> question mark and half a </p> tag.


What characters are missing in this string? Is it only the opening '<'?

Cheers,
--
http://www.joergjooss.de
 
Matt Hollingworth
      06-01-2005


"Joerg Jooss" wrote:

> What characters are missing in this string? Is it only the opening '<'?

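Yes - although if I disable output escaping in my XSL I can see that ?lt; is
in the code, as if the & has been replaced with a ?


Here is the code for your reference:

// Requires: using System.IO; using System.Xml; using System.Xml.Xsl;
XmlDocument oDoc = new XmlDocument();
XslTransform oXsl = new XslTransform();

// Load the source XML (path left empty in the post) and the stylesheet
oDoc.Load(Server.MapPath(""));
oXsl.Load(Server.MapPath("xsl/x_language_test.xsl"));

// Transform into an in-memory StringWriter
StringWriter oSw = new StringWriter();
oXsl.Transform(oDoc, null, oSw);

// Render the transformed markup through a Literal control
litTestText.Text = oSw.ToString();


Thanks
Matt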


 
Matt Hollingworth
      06-01-2005

Having investigated further, I forgot to say that I only see the output above
after changing the browser's encoding to Simplified Chinese. If I choose UTF-8,
it is all still encoded, just as it appears in Notepad when you click View Source.

I did the same page in classic ASP and it all displays correctly without issue.


Joerg Jooss
      06-03-2005
Matt Hollingworth wrote:

> Yes - although if I disable output escaping in my XSL I can see that
> ?lt; is in the code, as if the & has been replaced with a ?
>
>
> Here is the code for your ref:
> XmlDocument oDoc = new XmlDocument();
> XslTransform oXsl = new XslTransform();
>
> oDoc.Load(Server.MapPath(""));
> oXsl.Load(Server.MapPath("xsl/x_language_test.xsl"));
>
> StringWriter oSw = new StringWriter();
>
> oXsl.Transform(oDoc,null,oSw);
>
> litTestText.Text = oSw.ToString();


Save for the weird Server.MapPath(""), there seems to be nothing wrong
here. I can only imagine that there's something wrong with the XSL
itself -- maybe somebody over in the XML group can help out.

Cheers,
--
http://www.joergjooss.de
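
The first post mentions a StringWriter with its Encoding forced to UTF-8. A minimal sketch of what such a subclass might look like follows; it is an illustration, not the poster's code and not a confirmed fix. Assumptions: Utf8StringWriter and TransformSketch are hypothetical names, the inline stylesheet is a stand-in for x_language_test.xsl, and XslCompiledTransform (the later replacement for XslTransform) is swapped in so the sketch also compiles on current .NET. The override only changes the encoding name that writers layered on top of the TextWriter may report, for instance in an XML declaration; the characters in the resulting string are UTF-16 either way.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical helper: a StringWriter that reports UTF-8 instead of UTF-16.
public sealed class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding
    {
        get { return Encoding.UTF8; }
    }
}

public static class TransformSketch
{
    // Tiny inline stylesheet so the sketch is self-contained.
    private const string Xslt =
        "<xsl:stylesheet version='1.0' " +
        "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" +
        "<xsl:template match='/'><p><xsl:value-of select='/text'/></p>" +
        "</xsl:template></xsl:stylesheet>";

    // Run the stylesheet over the given XML and collect the result as text.
    public static string Transform(string xml, TextWriter output)
    {
        XslCompiledTransform transform = new XslCompiledTransform();
        using (XmlReader styleReader = XmlReader.Create(new StringReader(Xslt)))
        {
            transform.Load(styleReader);
        }
        using (XmlReader input = XmlReader.Create(new StringReader(xml)))
        {
            transform.Transform(input, null, output);
        }
        return output.ToString();
    }

    public static void Main()
    {
        string xml = "<text>中文 and English</text>";

        Console.WriteLine(new StringWriter().Encoding.WebName);      // utf-16
        Console.WriteLine(new Utf8StringWriter().Encoding.WebName);  // utf-8

        // Compare the two results, in particular any declared encoding.
        Console.WriteLine(Transform(xml, new StringWriter()));
        Console.WriteLine(Transform(xml, new Utf8StringWriter()));
    }
}
```

Another option that may be worth trying, also unverified against the poster's setup, is the Transform overload that writes to a Stream (such as Response.OutputStream), so that the text is encoded to bytes only once, using the encoding declared in the stylesheet's xsl:output element.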