"Mangled" Servlet Unicode Output Characters

 
 
Wolfgang
      06-09-2004
I have a very simple servlet, see code at

http://www.alexandria.ucsb.edu/~rnott/tmp/Test.java

The servlet reads lines from a file tmp1.txt and just writes them back
to a Web page.

The lines in tmp1.txt contain UTF-8 encoded text, including some
special characters, such as the Norwegian 'ø' as in Magerøya, or the
German 'ö' as in Sömmerda.

The servlet generates the Web page ok, listing the lines from file
tmp1.txt.

However, the special characters like 'ø' and 'ö' don't show up on the
Web page, instead they are mangled, like 'Ã¶' instead of 'ö' (other,
regular characters are fine).

Why are the special characters mangled, and what do I do to have them
show up properly on the Web page?

Thanks for your help and advice.

Wolfgang,
Santa Barbara, CA
 
 
 
 
 
John C. Bollinger
      06-09-2004
Wolfgang wrote:

> I have a very simple servlet, see code at
>
> http://www.alexandria.ucsb.edu/~rnott/tmp/Test.java
>
> The servlet reads lines from a file tmp1.txt and just writes them back
> to a Web page.
>
> The lines in tmp1.txt contain UTF-8 encoded text, including some
> special characters, such as the Norwegian 'ø' as in Magerøya, or the
> German 'ö' as in Sömmerda.
>
> The servlet generates the Web page ok, listing the lines from file
> tmp1.txt.
>
> However, the special characters like 'ø' and 'ö' don't show up on the
> Web page, instead they are mangled, like 'Ã¶' instead of 'ö' (other,
> regular characters are fine).
>
> Why are the special characters mangled, and what do I do to have them
> show up properly on the Web page?


These are typical symptoms of a character encoding mismatch. I see in
your source code that you use the system's default encoding to read the
text file. If the system default is not UTF-8 then that will be a
problem. You almost have it right in that regard: just pass the string
"UTF-8" as an additional parameter to your InputStreamReader's constructor
(you will also have to add a handler for an additional checked exception).
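
For illustration, here is a minimal sketch of that change. It does not reproduce the linked Test.java; the class name Utf8ReadSketch and the method readLines are made up for the example, and an in-memory byte array stands in for tmp1.txt. The extra checked exception is UnsupportedEncodingException, a subclass of IOException, which the sketch catches and rethrows unchecked.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class; not taken from the original servlet.
class Utf8ReadSketch {

    // Reads all lines from the stream, decoding its bytes with the named charset.
    static List<String> readLines(InputStream in, String charsetName) {
        List<String> lines = new ArrayList<>();
        // The two-argument constructor may throw UnsupportedEncodingException,
        // which is an IOException and is handled by the catch below.
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, charsetName))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Stand-in for the UTF-8 file: "Sömmerda" and "Magerøya", one per line.
        byte[] utf8 = "S\u00f6mmerda\nMager\u00f8ya\n".getBytes(StandardCharsets.UTF_8);

        // Decoded with the right charset, the text comes back intact.
        System.out.println(readLines(new ByteArrayInputStream(utf8), "UTF-8"));

        // Decoded as ISO-8859-1, each two-byte character turns into two
        // characters, which is the kind of mangling described in the question.
        System.out.println(readLines(new ByteArrayInputStream(utf8), "ISO-8859-1"));
    }
}
```

What the console actually shows depends on the console's own encoding, so compare the returned strings in code rather than trusting the printed glyphs.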

Your output HTML is also a bit funky, as you are declaring an XML
document with the HTML 4 / transitional DTD. Transitional HTML 4 is not
necessarily well-formed XML. You should probably either drop the XML
declaration or (if you can) go all the way to XHTML. This is probably
not the cause of your current problem, but either way, I recommend that
you specify the charset in the response's content-type, rather than
relying on the XML declaration. To do so, use
response.setContentType("text/html; charset=UTF-8");
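
In a servlet, that call has to come before response.getWriter() for the charset to apply to the writer. The servlet API is not part of the JDK, so the sketch below is only a stand-in: the class ResponseCharsetSketch and its methods are hypothetical, with an OutputStreamWriter playing the role of the response writer and a plain String decode playing the role of the browser. It shows why the charset used for writing and the charset declared to the client need to agree.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical example class; it mimics a response body without the servlet API.
class ResponseCharsetSketch {

    // Encodes the text the way a response writer would once the content
    // type has declared the given charset.
    static byte[] encodeBody(String text, Charset charset) {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(body, charset)) {
            out.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return body.toByteArray();
    }

    // Decodes the body the way a client would, using the declared charset.
    static String decodeBody(byte[] body, Charset declared) {
        return new String(body, declared);
    }

    public static void main(String[] args) {
        String line = "S\u00f6mmerda"; // "Sömmerda"
        byte[] body = encodeBody(line, StandardCharsets.UTF_8);

        // Declared charset matches the bytes: the text round-trips.
        System.out.println(decodeBody(body, StandardCharsets.UTF_8).equals(line));      // true

        // Declared charset differs from the bytes: the text is mangled.
        System.out.println(decodeBody(body, StandardCharsets.ISO_8859_1).equals(line)); // false
    }
}
```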


John Bollinger

 
 
 
 
 
Wolfgang
      06-09-2004
Thanks, John

for your corrections to my code. This makes things work.

For those interested, I also found essentially the same advice (with
more detail) at
http://www.jorendorff.com/articles/unicode/java.html

Wolfgang

 