#1

I have the following code fragment in a tiny webserver:

```java
...
os = sock.socket().getOutputStream();
osr = new PrintWriter(new PrintStream(os, true, "UTF-8"));
osr.println("HTTP/1.1 200 OK");
osr.println("Content-Type: text/html; charset=utf-8");
osr.println();
osr.println(test());
...

private String test() {
    String ret = null;
    try {
        StringBuffer tmpl = new StringBuffer
            ("<html><head></head><body>H\u00e2n</body></html>");
        ret = tmpl.toString();
    }
    catch (Exception e) {
        e.printStackTrace();
    }
    System.out.println(ret);
    return ret;
}
```

With Linux, Firefox and Opera there is no problem and the a with circumflex is printed nicely.

On Windows XP I get neither Firefox nor IE to work correctly.

Firefox shows some FFFD square, but when I change from the (detected) UTF-8 encoding to ISO-8859-1, it displays things correctly. But that would be the wrong encoding!?

IE shows some empty rectangle in the main browser window, but when looking at the page source, everything is shown correctly!?

I have seen the correct output, but don't remember how I got it; so it's not missing glyphs.

This is probably not a Java question, as I suspect some Windows magic is happening here. Maybe it has something to do with the infamous BOM? (I tried setting "file.encoding" to "UTF-8", for what it's worth. The cmd prompt then shows an o with circumflex for the System.out.println, but that's due to the Windows legacy encoding, I think.)

Michael

Michael Jung

#2

Michael Jung wrote:
> I have the following code fragment in a tiny webserver:
> [...]

Michael:

I've been playing around with this and I can't get it to work correctly on Windows or Linux. I tried just putting a file with the 0xE2 character on my web server (which is set to default to UTF-8) and got a black square rotated 45 degrees with a white ? in it. If I reset the character encoding to ISO-8859-1 on the browser, the character appears correctly. There is something I don't understand here and hopefully you will get a better answer.

-- Knute Johnson
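Knute's result fits a simple explanation, offered here as a sketch rather than a diagnosis of his server: a file that stores â as the single ISO-8859-1 byte 0xE2 is not valid UTF-8, because 0xE2 announces a three-byte sequence and the next byte is not a continuation byte. A UTF-8 decoder then substitutes U+FFFD, the replacement character browsers draw as a black diamond with a question mark, while decoding the same bytes as ISO-8859-1 gives back â. The same would happen to Michael's page if the body were written in a Windows single-byte code page such as windows-1252, where â is also 0xE2. The class name `InvalidUtf8Demo` is made up for this illustration.

```java
import java.nio.charset.StandardCharsets;

class InvalidUtf8Demo {
    // "Hân" as a single-byte (ISO-8859-1 / windows-1252) encoder writes it:
    // 'H' = 0x48, 'â' = 0xE2, 'n' = 0x6E.
    static final byte[] LATIN1_BYTES = {0x48, (byte) 0xE2, 0x6E};

    static String decodeAsUtf8(byte[] bytes) {
        // 0xE2 starts a three-byte UTF-8 sequence, but 0x6E is not a
        // continuation byte, so the decoder emits the replacement character U+FFFD.
        return new String(bytes, StandardCharsets.UTF_8);
    }

    static String decodeAsLatin1(byte[] bytes) {
        // In ISO-8859-1 every byte maps to the code point with the same value.
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(decodeAsUtf8(LATIN1_BYTES).contains("\uFFFD"));   // true
        System.out.println(decodeAsLatin1(LATIN1_BYTES).equals("H\u00e2n")); // true
    }
}
```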

#3

On Mon, 10 Aug 2009 23:29:04 +0200, Michael Jung wrote, quoted or indirectly quoted someone who said:

>With Linux, firefox and opera there is no problem and
>the a with circumflex is printed nicely.

0x00e2 is supposed to be â in Unicode and in ISO-8859-1 (where it is the single byte 0xE2); UTF-8 encodes the same character as the two bytes 0xC3 0xA2. However, in a proprietary Windows encoding, the byte 0xE2 could be anything.

What encoding is your System.out.println using? To find out, dump a set of chars 0 .. 255 to System.out and redirect them to a file. Then look at the file with the EncodingRecogniser utility. See http://mindprod.com/jgloss/encoding.html

You might find windows-1252, Cp437, Cp850...

Also try dumping out the character as hex. You will likely see it is just fine. It is just System.out screwing it up.

-- Roedy Green Canadian Mind Products http://mindprod.com
"You can have quality software, or you can have pointer arithmetic; but you cannot have both at the same time." ~ Bertrand Meyer (born: 1950 age: 59) 1989, creator of design by contract and the Eiffel language.
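Roedy's "dump it as hex" advice can be sketched in Java as below: print the default charset, the code point of the char, and the bytes the string turns into under two charsets. If the char is U+00E2, the String itself is fine and any damage happens when it is encoded. `CharsetProbe` and `hexBytes` are hypothetical names chosen for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetProbe {
    // Formats the bytes a string encodes to in the given charset, space-separated in hex.
    static String hexBytes(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "H\u00e2n";
        // The charset used when none is given explicitly, e.g. by new OutputStreamWriter(os).
        System.out.println("default charset: " + Charset.defaultCharset());
        // The char itself, independent of any encoding.
        System.out.printf("char 1 is U+%04X%n", (int) s.charAt(1));
        System.out.println("UTF-8:      " + hexBytes(s, StandardCharsets.UTF_8));
        System.out.println("ISO-8859-1: " + hexBytes(s, StandardCharsets.ISO_8859_1));
    }
}
```

On JDK 18 and later the default charset is UTF-8 unless overridden (JEP 400), so the first line will usually differ from what a Windows XP JVM reported at the time.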

#4

On Mon, 10 Aug 2009 23:29:04 +0200, Michael Jung wrote, quoted or indirectly quoted someone who said:

>On Windows xp I get neither firefox nor IE to work correctly.

Some other things to try:

1. Use Wireshark to snoop on the messages your server is sending. See whether the problem is in the server or the client browser. Make sure your headers and body are encoded as you intended. See http://mindprod.com/jgloss/wireshark.html

2. Check the font. If your font does not support â, it won't support an embedded 0x00e2 either. Try embedding `&acirc;` (the entity, not the hex) in your text body. Use http://mindprod.com/jgloss/fontshower.html to make sure the font supports â.

-- Roedy Green
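Roedy's entity suggestion can be taken one step further: if every non-ASCII character in the body is written as an HTML numeric character reference (`&#xE2;` for â), the body is pure ASCII and comes through unchanged in any ASCII-compatible encoding the server happens to use. This is a hedged sketch with a hypothetical helper `toAsciiHtml`, not a substitute for fixing the writer's charset, and it deliberately does not escape markup characters such as `<` or `&`.

```java
class AsciiHtml {
    // Replaces every character outside 7-bit ASCII with a hexadecimal numeric
    // character reference. It iterates over code points, so a supplementary
    // character (a surrogate pair in a Java String) becomes one reference.
    static String toAsciiHtml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                sb.appendCodePoint(cp);
            } else {
                sb.append("&#x").append(Integer.toHexString(cp).toUpperCase()).append(';');
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toAsciiHtml("<body>H\u00e2n</body>")); // <body>H&#xE2;n</body>
    }
}
```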

#5

Steven Simpson writes:

> Michael Jung wrote:
>> I have the following code fragment in a tiny webserver:
>> ...
>> os = sock.socket().getOutputStream();
>> osr = new PrintWriter(new PrintStream(os, true, "UTF-8"));
>
> This looks rather strange. I'd prefer to go for something like this:
>
> new PrintWriter(new OutputStreamWriter(os, "UTF-8"))

I used to have new PrintWriter(os), but wanted to enforce the encoding, and PrintWriter doesn't take one. *That* would be a needed convenience constructor.

> Here's what I suspect:
>
> * PrintStream is an OutputStream, so most of its methods just take
>   bytes, and it happens to have a few more which take chars and
>   Strings. These extra methods will do the char->UTF-8 conversion
>   (an internal OutputStreamWriter is created), but the byte-based
>   methods can't - they're already bytes.
> * PrintWriter can take an OutputStream. If it does so, it will also
>   insert its own OutputStreamWriter (using the local system's charset).
> * Chars passed to the PrintWriter are converted using its
>   OutputStreamWriter, and never get passed on to the
>   char/String-based methods of the PrintStream, so its charset
>   encoder does not get used.
>
> Result: you're writing using the native encoding of your server,
> regardless of what you tell the PrintStream.

Now that you mention it, this is what I found in the PrintStream Javadoc: "All characters printed by a PrintStream are converted into bytes using the platform's default character encoding. The PrintWriter class should be used in situations that require writing characters rather than bytes." It even says so in the Javadoc of the constructor I used. *blush*

Thank you very much.

Bonus question: what is the encoding parameter in the PrintStream constructor good for? It actually led me down the wrong track.

Michael

Michael Jung
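Steven's suggestion, plus one answer to the bonus question, sketched below. The first helper puts the charset on the `OutputStreamWriter` that actually converts chars to bytes. The second shows where a `PrintStream`'s own charset does apply: its `print`/`println` methods for `char` and `String` arguments, that is, only when text is handed to the `PrintStream` directly rather than through a writer wrapped around it. Class and method names are hypothetical; the `PrintStream(OutputStream, boolean, Charset)` overload used here needs Java 10 or later, while the `String` overload from the thread behaves the same way.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintStream;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf8Writers {
    // Steven's suggestion: the charset lives on the OutputStreamWriter doing the
    // char-to-byte conversion, so the PrintWriter on top emits UTF-8.
    static PrintWriter utf8Writer(OutputStream os) {
        return new PrintWriter(new OutputStreamWriter(os, StandardCharsets.UTF_8));
    }

    // Encodes a string through the writer above and returns the raw bytes.
    static byte[] viaWriter(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        PrintWriter w = utf8Writer(bytes);
        w.print(s);
        w.flush(); // OutputStreamWriter buffers; nothing reaches 'bytes' until flushed
        return bytes.toByteArray();
    }

    // The PrintStream's charset is used when chars go to the PrintStream itself.
    static byte[] viaPrintStream(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        PrintStream ps = new PrintStream(bytes, true, StandardCharsets.UTF_8);
        ps.print(s);
        ps.flush();
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        // Both print [-61, -94], i.e. 0xC3 0xA2, the UTF-8 encoding of U+00E2.
        System.out.println(Arrays.toString(viaWriter("\u00e2")));
        System.out.println(Arrays.toString(viaPrintStream("\u00e2")));
    }
}
```

In the web server, `utf8Writer` would wrap the socket's output stream, and the writer would be flushed once the response is complete.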

#6

> osr.println("HTTP/1.1 200 OK");
> osr.println("Content-Type: text/html; charset=utf-8");
> osr.println();
> osr.println(test());
> With Linux, firefox and opera there is no problem and
> the a with circumflex is printed nicely.

I don't think it is required to work even with plain ASCII, especially on Linux:

1. `public void println()`: "Terminate the current line by writing the line separator string. The line separator string is defined by the system property line.separator, and is not necessarily a single newline character ('\n')."

2. The HTTP/1.1 response grammar (RFC 2616):

```
Response = Status-Line                  ; Section 6.1
           *(( general-header           ; Section 4.5
            | response-header           ; Section 6.2
            | entity-header ) CRLF)     ; Section 7.1
           CRLF
           [ message-body ]             ; Section 7.2

CRLF = CR LF
CR   = <US-ASCII CR, carriage return (13)>
LF   = <US-ASCII LF, linefeed (10)>
```

jolz
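jolz's point can be handled by not relying on `println` for the protocol part at all. A minimal sketch, assuming a hypothetical `writeResponse` helper: the status line and headers are written with explicit `\r\n` as US-ASCII, the body is encoded to UTF-8 once, and its byte length is sent as `Content-Length` (an addition not in the original server) so the client does not have to rely on the connection closing to find the end of the body.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class HttpResponseWriter {
    private static final String CRLF = "\r\n"; // HTTP requires CR LF, whatever line.separator says

    // Writes a minimal 200 response whose HTML body is encoded as UTF-8.
    static void writeResponse(OutputStream os, String html) throws IOException {
        byte[] body = html.getBytes(StandardCharsets.UTF_8);
        String head = "HTTP/1.1 200 OK" + CRLF
                + "Content-Type: text/html; charset=utf-8" + CRLF
                + "Content-Length: " + body.length + CRLF // length in bytes, not chars
                + CRLF;                                    // empty line ends the headers
        os.write(head.getBytes(StandardCharsets.US_ASCII));
        os.write(body);
        os.flush();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeResponse(out, "<html><head></head><body>H\u00e2n</body></html>");
        System.out.println(out.size() + " bytes written");
    }
}
```

Writing the headers through a `PrintWriter` with `print(...)` and explicit `\r\n` would work too; going through bytes just keeps the US-ASCII header bytes and the UTF-8 body bytes visibly separate.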

#7

Thomas Pornin writes:

> In the Javadoc of JDK-1.1.8, PrintStream was documented as
> being deprecated. Both public constructors include the comment:
> "Note: PrintStream() is deprecated." and go on to state that
> PrintWriter should be used.
>
> In JDK-1.3.1, the comments about deprecation are gone (I do not have
> the Javadoc for JDK-1.2, so I cannot check there). PrintStream got
> "reprecated". At some point between 1.1.8 and 1.3.1, Sun realized
> that explicit deprecation is not enough to get rid of a troublesome
> class, and that too much code was using PrintStream to allow for
> a simple removal (it would break too much existing code).

It would not be enough, but it would help. Or does the danger of refactoring wrongly (by people trying to get rid of every warning in sight) outweigh the benefits of a cleaner interface with deprecated parts?

Michael

Michael Jung

#8

Thomas Pornin wrote:
> Backward compatibility goes to
> a great extent to explain why Java is as it is nowadays. Examples
> of quirks include the following:
> ....
> -- There are both java.net.URI and java.net.URL, with oh-so-slightly
> different handlings of nominally invalid URLs (especially when there
> are spaces in the string).

That one doesn't belong on your list. The classes exist to handle the functional differences between URIs generally and URLs specifically. As the URI Javadocs state:

> The conceptual distinction between URIs and URLs is reflected in the
> differences between this class and the URL class.

-- Lew
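One concrete instance of the "spaces in the string" handling Thomas mentions, sketched with `java.net.URI` only: the single-string constructor rejects an unescaped space, while the multi-argument constructors quote illegal characters themselves. The class name is hypothetical and `example.com` is a placeholder host.

```java
import java.net.URI;
import java.net.URISyntaxException;

class UriSpaces {
    // True if the single-string constructor (which parses strictly) accepts the input.
    static boolean parses(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // The multi-argument constructor percent-encodes characters that are
    // illegal in the path component, such as the space.
    static URI build(String host, String path) throws URISyntaxException {
        return new URI("http", host, path, null);
    }

    public static void main(String[] args) throws URISyntaxException {
        System.out.println(parses("http://example.com/a b"));       // false
        System.out.println(build("example.com", "/a b"));           // http://example.com/a%20b
        System.out.println(build("example.com", "/a b").getPath()); // /a b (decoded)
    }
}
```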

#9

Lew wrote:
> Thomas Pornin wrote:
>> Backward compatibility goes to
>> a great extent to explain why Java is as it is nowadays. Examples
>> of quirks include the following:
>> Strings consist in sequences of 'char', not 'int'.

I'd put this one as "chars are fixed at 16 bits rather than simply 'big enough to hold all Unicode characters'". 24 bits would be sufficient to get rid of surrogates.

And I'd add: NullPointerExceptions in a language that insists it doesn't have pointers.

In DOM, the null namespace is represented by a null String. In SAX, by an empty string.

>> -- There are both java.net.URI and java.net.URL, with oh-so-slightly
>> different handlings of nominally invalid URLs (especially when there
>> are spaces in the string).
>
> That one doesn't belong on your list. The classes exist to handle the
> functional differences between URIs generally and URLs specifically.
> As the URI Javadocs state:
>> The conceptual distinction between URIs and URLs is reflected in the
>> differences between this class and the URL class.

It belongs on a different list, one where Java accurately models a historical quirk in a different domain.

Mike Schilling
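What the 16-bit `char` means in practice: a character outside the Basic Multilingual Plane, such as U+1F600, occupies two `char`s (a surrogate pair), so `length()` and the number of code points differ. Unicode code points go up to U+10FFFF, which needs 21 bits, hence the remark that 24 bits would have been enough. A small illustrative class (the name is made up):

```java
class SurrogateDemo {
    // "a" followed by U+1F600, which UTF-16 stores as the surrogate pair D83D DE00.
    static final String SAMPLE = "a\uD83D\uDE00";

    public static void main(String[] args) {
        String s = SAMPLE;
        System.out.println(s.length());                             // 3 (chars)
        System.out.println(s.codePointCount(0, s.length()));        // 2 (code points)
        System.out.println(Character.isHighSurrogate(s.charAt(1))); // true
        System.out.printf("U+%X%n", s.codePointAt(1));              // U+1F600
    }
}
```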

#10

jolz writes:

>> osr.println("HTTP/1.1 200 OK");
>> osr.println("Content-Type: text/html; charset=utf-8");
>> osr.println();
>> osr.println(test());
[...]
> I don't think it is required to work even with plain ASCII, especially
> on linux:
> public void println()
> Terminate the current line by writing the line separator
> string. The line separator string is defined by the system property
> line.separator, and is not necessarily a single newline character
> ('\n').
> Response = Status-Line ; Section 6.1
> *(( general-header ; Section 4.5
> | response-header ; Section 6.2
> | entity-header ) CRLF) ; Section 7.1
> CRLF
> [ message-body ] ; Section 7.2

1. What I described would indeed have been a strange symptom of this error.
2. You are right.
3. I have yet to meet a client that complains.

Michael

Michael Jung
|
![]() |
| Thread Tools | Search this Thread |
|
|
Similar Threads
|
||||
| Thread | Thread Starter | Forum | Replies | Last Post |
| How to Reset / Recover Forgotten Windows NT / 2000 / XP / 2003 Administrator Password | wskaihd | Software | 2 | 11-17-2009 02:01 AM |
| How to activate Remote Assistance with XP using Windows Live Messenger | Oziisr | General Help Related Topics | 0 | 02-01-2008 04:45 PM |
| Computer Security | aldrich.chappel.com.use@gmail.com | A+ Certification | 0 | 11-27-2007 02:11 AM |
| MCITP: Enterprise Support Technician | MileHighWelch | MCITP | 1 | 06-19-2007 10:25 PM |
| Re: Question about MS critical updates | John Coode | A+ Certification | 0 | 06-30-2004 06:08 PM |