Wolfgang wrote:
> I have a very simple servlet, see code at
>
> http://www.alexandria.ucsb.edu/~rnott/tmp/Test.java
>
> The servlet reads lines from a file tmp1.txt and just writes them back
> to a Web page.
>
> The lines in tmp1.txt contain UTF-8 encoded text, including some
> special characters, such as the Norwegian 'ø' as in Magerøya, or the
> German 'ö' as in Sömmerda.
>
> The servlet generates the Web page ok, listing the lines from file
> tmp1.txt.
>
> However, the special characters like 'ø' and 'ö' don't show up on the
> Web page, instead they are mangled, like 'Ã¶' instead of 'ö' (other,
> regular characters are fine).
>
> Why are the special characters mangled, and what do I do to have them
> show up properly on the Web page?

These are typical symptoms of a character encoding mismatch. I see in
your source code that you use the system's default encoding to read the
text file. If the system default is not UTF-8, the file's bytes will be
decoded into the wrong characters. You almost have it right in that
regard: just pass the string "UTF-8" as an additional parameter to your
InputStreamReader's constructor (you will also have to add a handler for
an additional checked exception, UnsupportedEncodingException).
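For illustration, here is a minimal sketch of reading a file with an explicit UTF-8 decoder. It is not your servlet's code: the class name Utf8FileReadDemo, the helper names, and the temporary file are assumptions made up for this example, and the sample strings use Unicode escapes so the source compiles the same way under any compiler encoding.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

public class Utf8FileReadDemo {

    // Reads all lines of a file, decoding its bytes explicitly as UTF-8
    // instead of relying on the platform default encoding.
    public static List<String> readLinesUtf8(File file) throws IOException {
        List<String> lines = new ArrayList<>();
        // The two-argument constructor may throw UnsupportedEncodingException,
        // a checked subclass of IOException, so the throws clause covers it.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Writes two sample lines as UTF-8 bytes to a temporary file
    // (a hypothetical stand-in for tmp1.txt) and reads them back.
    public static List<String> demo() {
        try {
            File tmp = File.createTempFile("tmp1", ".txt");
            tmp.deleteOnExit();
            String content = "Mager\u00F8ya\nS\u00F6mmerda\n";
            Files.write(tmp.toPath(), content.getBytes(StandardCharsets.UTF_8));
            return readLinesUtf8(tmp);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (String line : demo()) {
            System.out.println(line);
        }
    }
}
```

On Java 7 or later, passing StandardCharsets.UTF_8 instead of the string "UTF-8" selects the constructor that takes a Charset, which does not throw UnsupportedEncodingException.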

Your output HTML is also a bit funky, as you are declaring an XML
document with the HTML 4 Transitional DTD. Transitional HTML 4 is not
necessarily well-formed XML. You should probably either drop the XML
declaration or (if you can) go all the way to XHTML. This is probably
not the cause of your current problem, but either way, I recommend that
you specify the charset in the response's content type, rather than
relying on the XML declaration. To do so, call the following before you
obtain the response's writer:

    response.setContentType("text/html; charset=UTF-8");
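To see why the declared charset has to agree with the bytes actually sent, here is a rough sketch that uses no servlet API at all. The class CharsetMismatchDemo and its send and receive helpers are hypothetical stand-ins: send encodes text the way a response writer would for a given charset, and receive decodes the bytes the way a browser would for the charset it assumes.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {

    // Hypothetical stand-in for a response writer:
    // encodes the text into bytes using the given charset.
    public static byte[] send(String text, Charset writerCharset) {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        PrintWriter out = new PrintWriter(new OutputStreamWriter(wire, writerCharset));
        out.print(text);
        out.flush();
        return wire.toByteArray();
    }

    // Hypothetical stand-in for a browser:
    // decodes the received bytes using the charset it assumes.
    public static String receive(byte[] wire, Charset assumedCharset) {
        return new String(wire, assumedCharset);
    }

    public static void main(String[] args) {
        String text = "S\u00F6mmerda";
        byte[] utf8 = send(text, StandardCharsets.UTF_8);

        // Sender and receiver agree on UTF-8: the text survives intact.
        System.out.println(text.equals(receive(utf8, StandardCharsets.UTF_8)));

        // Receiver assumes ISO-8859-1: the two UTF-8 bytes of the umlaut
        // are decoded as two separate characters, so the text is mangled.
        System.out.println(text.equals(receive(utf8, StandardCharsets.ISO_8859_1)));
    }
}
```

Setting the charset in the content type tells both sides the same thing: the container uses it to encode what you write, and the browser uses it to decode what it receives.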

John Bollinger