On 6 May 2005 08:33:26 -0700,
wrote:
>Hi folks,
>
>Do you know if there is a way to automatically detect the charset of a
>byte array? I would like to decode a byte array with the correct
>charset decoder, given that I do not know which charset was used to
>encode it.
>
>The CharsetDecoder class seems to have an "isAutoDetecting" boolean
>method. This suggests that there should exist a 'generic' charset
>decoder implementation which could auto-detect the charset. Am I
>right?
Unfortunately, that auto-detect feature is very limited. If you know
you're reading Chinese text, but don't know which of the several
Chinese encodings it was written in, you can use an auto-detecting
"wrapper" Charset that figures it out for you. I think there's one
for Japanese text as well, but there's no built-in universal
auto-detecting Charset.
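
If you want to see which auto-detecting charsets your own JRE actually ships, something like the following rough sketch should do it. The class and method names are my own, made up for illustration; the only library calls it relies on are Charset.availableCharsets(), Charset.newDecoder() and CharsetDecoder.isAutoDetecting(). The result depends on the JRE and its installed charset providers, so the list may well be empty.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
class AutoDetectingCharsets {

    // Returns the canonical names of all installed charsets whose
    // decoder reports that it auto-detects the actual encoding.
    static List<String> find() {
        List<String> names = new ArrayList<>();
        for (Charset cs : Charset.availableCharsets().values()) {
            CharsetDecoder decoder = cs.newDecoder();
            if (decoder.isAutoDetecting()) {
                names.add(cs.name());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Output varies by JRE; ordinary charsets such as UTF-8 never appear.
        System.out.println("Auto-detecting charsets: " + find());
    }
}
```

Ordinary charsets such as UTF-8 or ISO-8859-1 never show up here, because their decoders always return false from isAutoDetecting().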
I use this tool:
http://glaforge.free.fr/wiki/index.p...=GuessEncoding
It only works with a limited set of Unicode and Western encodings, but
it's perfect for my needs. If you need something with broader
applicability, look for the CharDet package from Mozilla.
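
To give an idea of what a limited guesser for Unicode and Western encodings can look like, here is a minimal sketch. It is not the code of GuessEncoding or CharDet; the class name, the method name and the fallback parameter are my own assumptions. It checks for a byte order mark, then tests whether the bytes are strictly valid UTF-8, and otherwise returns whatever fallback charset the caller supplies.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class SimpleCharsetGuesser {

    // Guesses among UTF-8, UTF-16BE and UTF-16LE; returns 'fallback'
    // (an assumption supplied by the caller) when nothing matches.
    static Charset guess(byte[] data, Charset fallback) {
        // 1. Byte order marks. UTF-32 BOMs are deliberately not handled.
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) {
            return StandardCharsets.UTF_8;
        }
        if (startsWith(data, 0xFE, 0xFF)) {
            return StandardCharsets.UTF_16BE;
        }
        if (startsWith(data, 0xFF, 0xFE)) {
            return StandardCharsets.UTF_16LE;
        }
        // 2. Strict UTF-8 validation: report errors instead of replacing them.
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return StandardCharsets.UTF_8;
        } catch (CharacterCodingException e) {
            // 3. Not valid UTF-8, so fall back to the caller's choice.
            return fallback;
        }
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

You would then decode with new String(bytes, SimpleCharsetGuesser.guess(bytes, StandardCharsets.ISO_8859_1)). Note that pure ASCII input is reported as UTF-8, which is harmless, and that the single-byte Western encodings cannot be told apart this way; that takes statistical analysis of the kind CharDet does.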