Character encoding (2)
Hello again,

Sorry for posting this again, but since my thread from last Saturday
kind of hit a dead end, I decided to start it afresh. Refer also

The problem I'm having is basically only on the server side...
I'm working on a server that should receive HTTP requests. However, a
request arriving at the server may not be HTTP at all. I check for
this by looking at the first byte of data.
(in other words:
if the first byte is equal to 0x01,
then not HTTP
else ... )
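
The first-byte check described above could be sketched as follows. This is only an illustration, not the poster's actual code: the class and method names are made up, and treating an empty stream as non-HTTP is an assumption. A PushbackInputStream lets the server look at the first byte and then put it back, so whichever handler runs next still sees the complete, unmodified byte stream.

```java
import java.io.IOException;
import java.io.PushbackInputStream;

// Hypothetical helper; names are illustrative only.
final class ProtocolSniffer {
    /** Marker byte from the post: a leading 0x01 means "not HTTP". */
    static final int NON_HTTP_MARKER = 0x01;

    /**
     * Reads the first byte, pushes it back, and reports whether the
     * stream should be handled as HTTP. The stream is left untouched.
     */
    static boolean looksLikeHttp(PushbackInputStream in) throws IOException {
        int first = in.read();
        if (first == -1) {
            // Assumption: an empty stream is treated as non-HTTP.
            return false;
        }
        in.unread(first); // default pushback buffer holds exactly one byte
        return first != NON_HTTP_MARKER;
    }
}
```

Wrapping the socket's input stream once, for example with new PushbackInputStream(socket.getInputStream()), and passing that same wrapper to the chosen handler keeps the peeked byte available to it.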

Given that the data is posted over HTTP, I'm trying to resolve the
following: I don't know a priori which encoding is used for the data
stream. The following rule for the encoding applies:

If the regex <?xml [^>]+encoding="([^"]+)" matches the data, $1 is
used for decoding; otherwise a default charset is used. My goal is to
use both the characters (i.e. the server's 'interpretation' of the
bytes received) and the original byte stream. I want to write the
original byte stream to a file, while using the derived character
stream for processing (using beans, XSL transformation, etc.)
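
One way to keep both views is to read the body into a byte array first and only then derive the characters. The sketch below assumes the whole body fits in memory and that the declared encoding is ASCII-compatible (it would not find the declaration in UTF-16 data). The class name, the method names, and the 1024-byte scan limit are hypothetical. The leading bytes are decoded as ISO-8859-1 purely for sniffing, because that charset maps every byte to exactly one character and so never fails.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; names and the scan limit are illustrative only.
final class XmlCharsetSniffer {
    // The regex quoted in the post.
    private static final Pattern XML_ENCODING =
            Pattern.compile("<\\?xml [^>]+encoding=\"([^\"]+)\"");

    // Assumption: the XML declaration appears within the first 1024 bytes.
    private static final int SNIFF_LIMIT = 1024;

    /** Returns the charset named in the XML declaration, or the fallback. */
    static Charset detect(byte[] raw, Charset fallback) {
        int n = Math.min(raw.length, SNIFF_LIMIT);
        // Byte-for-byte view of the head, used only to run the regex.
        String head = new String(raw, 0, n, StandardCharsets.ISO_8859_1);
        Matcher m = XML_ENCODING.matcher(head);
        if (m.find()) {
            try {
                return Charset.forName(m.group(1));
            } catch (IllegalArgumentException e) {
                // Illegal or unsupported charset name: use the fallback.
                return fallback;
            }
        }
        return fallback;
    }

    /** Decodes the untouched byte array with the detected charset. */
    static String decode(byte[] raw, Charset fallback) {
        return new String(raw, detect(raw, fallback));
    }
}
```

Because the raw array is never modified, it can be written to the file as-is (for example with java.nio.file.Files.write), while the String returned by decode goes on to the beans and the XSL transformation.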

I tried simulating the client using a basic HTML page, with a FORM
whose action is my server's URL. In HTML I can specify the meta element
Content-type and set it to "text/xml; charset=utf-8" or whatever I
like. I recall that by default HTML forms encode using the platform
default charset and content-type application/x-www-form-urlencoded.

I also tried to simulate the client with a Java application that makes
use of the Here I have set the
requestProperty "Content-type" to "text/xml; charset=utf-8".

Now I'm not sure whether in either one or both cases the stream is mime

Someone in the previous thread suggested that I use HttpURLConnection
on the server side as well, but since I'm also expecting non-HTTP
requests, I'm not sure I can. Most likely I cannot use a BufferedReader,
because it is based on a character stream, so I lose the original byte


Actually, rereading my post, I would like to add: I want to write to a
file the original byte representation of characters after having
processed them. The bean object(s) I'm using have their own write()
method, which writes to an outputstream.

I guess the solution here is to remember the original encoding, use
the bean's write method to write to a ByteArrayOutputStream, then
parse that into a String using the platform default encoding, and then
write that String to the file output stream using the original encoding...
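
The round trip described above might look roughly like this. It is a sketch under assumptions: StreamWritable is a made-up stand-in for the bean's write(OutputStream) method, and the charset the bean writes in is passed explicitly. Decoding with the platform default only round-trips correctly if the bean really wrote its bytes in the platform default encoding, so naming the charset avoids depending on that. If the bean's output contains an XML declaration, its encoding attribute would also need to match the target charset, which this sketch does not handle.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.Charset;

// Hypothetical stand-in for a bean that has its own write(OutputStream).
interface StreamWritable {
    void write(OutputStream out) throws IOException;
}

// Hypothetical helper; names are illustrative only.
final class BeanTranscoder {
    /**
     * Captures the bean's output in memory, decodes it with the charset
     * the bean is assumed to write in, and re-encodes it in the target
     * (original request) charset.
     */
    static byte[] writeInCharset(StreamWritable bean, Charset beanCharset, Charset target)
            throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        bean.write(buffer);
        String text = new String(buffer.toByteArray(), beanCharset);
        return text.getBytes(target);
    }
}
```

The returned array can then be written directly to the file output stream. Characters that the target charset cannot represent are replaced by that charset's default replacement byte, so a lossy target encoding will silently change the data.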
