URI encoding ASCII, LATIN1 or UNICODE?

 
 
Fritz Bayer
      04-08-2005
Hello,

I have stumbled across something that seems to be ambiguous.
Recently I decoded the URI of a servlet request.

At first I could not get the expected result. The umlauts (äüö) would
not show up correctly, which made me wonder why.

I then tried URLDecoder.decode(uri, "UTF-8"), which also did not work.

After googling a bit I found out that Tomcat 5.0 (which I use) used to
interpret URIs in the encoding of the document transferred, but now
always treats the URI as ISO-8859-1 unless another encoding is
specified on the connector by setting the attribute URIEncoding="..".

So I set it to "utf8" and now I can decode the URIs correctly.
However, I was wondering how it can be that this does not seem to be
specified anywhere.
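
To illustrate the effect described above, here is a minimal Java sketch of how the same percent-encoded bytes produce different strings depending on the charset used to decode them. The class and method names (UriCharsetDemo, decode, repairLatin1AsUtf8) are made up for this example, and it assumes the client sent the umlauts as UTF-8; it does not reproduce Tomcat's actual code path.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class UriCharsetDemo {

    // Percent-decodes a URI component, interpreting the bytes in the given charset.
    static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    // Workaround sketch: recover the original bytes of a string that was decoded
    // as ISO-8859-1 and interpret them as UTF-8 instead.
    static String repairLatin1AsUtf8(String misdecoded) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Assumed wire form: the UTF-8 bytes of a-umlaut, u-umlaut, o-umlaut.
        String onTheWire = "%C3%A4%C3%BC%C3%B6";

        String asLatin1 = decode(onTheWire, "ISO-8859-1");
        String asUtf8 = decode(onTheWire, "UTF-8");

        System.out.println(asLatin1.length()); // 6: every byte became its own character
        System.out.println(asUtf8.length());   // 3: the three umlauts
        System.out.println(repairLatin1AsUtf8(asLatin1).equals(asUtf8)); // true
    }
}
```

One plausible reading of the problem above is that the string had already been decoded as ISO-8859-1 by the container, in which case a second URLDecoder.decode call finds no %xx escapes left and cannot change anything; the repair method shows the usual round trip for that situation.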

I thought that the HTTP 1.1 protocol encoding is ASCII only. Of course
the documents transferred can have a different encoding. But the URI
is part of the start line of the message and therefore belongs to the
protocol.

Anyway, if somebody wants to elaborate a bit on this URI issue, I would
be interested in having a little conversation about the subject.

Fritz

BTW: So it seems that how URIs get treated depends on the
implementation of each servlet engine?!
 
Arjunan Venkatesh
      04-08-2005
Hi,
Yes, you are right. I had the same problem two weeks ago, as we
want to use UTF-8 characters (Japanese etc.) in the URI.
The RFC for URIs initially suggested ASCII only and leaves support for
UTF-8 to the server implementation.

Looking at the Tomcat 5.5 source, I realized that Tomcat does
the %xx unescaping first, followed by decoding with the charset
(which defaults to ISO-8859-1), but we can specify UTF-8 as you said.
There is also another flag, 'useBodyEncodingForURI', which it says is
there for compatibility with Tomcat 4.x.
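
The two steps described above (unescape %xx into bytes, then decode the bytes with a charset) can be sketched in Java as follows. This is only an illustration of the idea, not Tomcat's implementation; the class and method names are hypothetical, and the sketch assumes the input contains only ASCII characters outside the escapes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class TwoStepUriDecoder {

    // Step 1: turn each %xx escape into one raw byte; copy everything else as-is.
    static byte[] unescape(String uri) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < uri.length(); i++) {
            char c = uri.charAt(i);
            if (c != '%') {
                out.write(c); // assumption: non-escaped characters are plain ASCII
                continue;
            }
            if (i + 2 >= uri.length()) {
                throw new IllegalArgumentException("Truncated escape at index " + i);
            }
            int hi = Character.digit(uri.charAt(i + 1), 16);
            int lo = Character.digit(uri.charAt(i + 2), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("Invalid escape at index " + i);
            }
            out.write((hi << 4) | lo);
            i += 2; // skip the two hex digits just consumed
        }
        return out.toByteArray();
    }

    // Step 2: interpret the raw bytes in the configured charset.
    static String decode(String uri, Charset charset) {
        return new String(unescape(uri), charset);
    }

    public static void main(String[] args) {
        String expected = "/caf\u00e9"; // "/caf" followed by e-acute

        // Same escapes, different charset, different result.
        System.out.println(decode("/caf%C3%A9", StandardCharsets.UTF_8).equals(expected));      // true
        System.out.println(decode("/caf%C3%A9", StandardCharsets.ISO_8859_1).equals(expected)); // false
        // A client that escaped the Latin-1 byte needs the Latin-1 charset.
        System.out.println(decode("/caf%E9", StandardCharsets.ISO_8859_1).equals(expected));    // true
    }
}
```

The point of the split is that step 1 is charset-independent, while step 2 only gives the right answer if the server's charset matches the one the client used when producing the escapes.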

Yes, if we want compatibility across servers we have to stick with
ASCII only in the URI. One of the RFCs jokingly noted that it recommends
supporting UTF-8 for URIs but that the transition may take 50 years...
(that was written in 1999). Sorry, I forgot the numbers of the RFCs I
looked up.

hope this helps

-v

Fritz Bayer
      04-20-2005
Arjunan Venkatesh wrote:
> Hi,
> Yes, you are right. I had the same problem two weeks ago, as we
> want to use UTF-8 characters (Japanese etc.) in the URI.
> The RFC for URIs initially suggested ASCII only and leaves support for
> UTF-8 to the server implementation.
>


So how is a URI encoded that contains non-ASCII characters like äüö
and, for example, Greek characters?
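
As a rough illustration of the question, the following Java sketch shows how the escaped form of the same character depends on the charset chosen before percent-encoding. It uses java.net.URLEncoder, which implements HTML form encoding (for instance, it turns a space into "+"), so it is only an approximation of URI path encoding; the class name UriEncodeDemo and the helper encode are made up for this example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class UriEncodeDemo {

    // Converts the text to bytes in the given charset and percent-encodes them.
    static String encode(String text, String charsetName) {
        try {
            return URLEncoder.encode(text, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String umlaut = "\u00e4"; // a-umlaut
        String alpha = "\u03b1";  // Greek small letter alpha

        System.out.println(encode(umlaut, "UTF-8"));      // %C3%A4
        System.out.println(encode(umlaut, "ISO-8859-1")); // %E4
        System.out.println(encode(alpha, "UTF-8"));       // %CE%B1
    }
}
```

The escapes themselves are pure ASCII, so the URI on the wire stays ASCII either way; what differs is which bytes the escapes stand for. Greek letters have no ISO-8859-1 representation at all, so they can only survive the round trip with an encoding such as UTF-8.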
