Is there any way to discover what charset encoding a file is using?

 
 
James
      06-29-2004
Hi all,

Is there any way to discover which charset encoding a file uses
just by reading its content?

For example, I may have a file that contains some Japanese characters.
How could I determine whether those characters are actually Japanese?

Thank you.

James
 
 
 
 
 
Roedy Green
      06-29-2004
On 29 Jun 2004 00:30:56 -0700, James wrote or quoted:

>Is there any way to discover which charset encoding a file uses
>just by reading its content?
>
>For example, I may have a file that contains some Japanese characters.
>How could I determine whether those characters are actually Japanese?


See http://mindprod.com/projects/encodin...ification.html
--
Canadian Mind Products, Roedy Green.
Coaching, problem solving, economical contract programming.
See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
 
 
 
 
 
Michael Borgwardt
      07-01-2004
James wrote:
> Is there any way to discover which charset encoding a file uses
> just by reading its content?


Not with anything remotely approaching certainty.

> For example, I may have a file that contains some Japanese characters.
> How could I determine whether those characters are actually Japanese?


Your best bet would be to take some common Japanese words, encode them
in each of the three (!) charsets commonly used in Japan, plus UTF-8
and UTF-16, and look for matches in the file's bytes.
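
A rough sketch of that word-matching idea, in case it helps. The class name, the probe words and the candidate list are my own picks for illustration, not anything standard. I left ISO-2022-JP out on purpose: its encoded text always starts with an escape sequence, so a word in the middle of a Japanese run will not match a plain byte search. UTF-16 is tried as UTF-16BE and UTF-16LE so that no byte order mark ends up in the search pattern. Treat a hit as a hint, nothing more.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

class CharsetWordProbe {

    // Assumed candidate list; charsets missing from this JVM are skipped.
    static final String[] CANDIDATES = {
        "Shift_JIS", "EUC-JP", "UTF-8", "UTF-16BE", "UTF-16LE"
    };

    // Hypothetical probe words, written as Unicode escapes so the source
    // stays ASCII: "desu", "masu", "shita", "kore", "nihon".
    static final String[] PROBE_WORDS = {
        "\u3067\u3059", "\u307e\u3059", "\u3057\u305f",
        "\u3053\u308c", "\u65e5\u672c"
    };

    /** Counts how often needle occurs in haystack, at any byte offset. */
    static int countOccurrences(byte[] haystack, byte[] needle) {
        if (needle.length == 0) {
            return 0;
        }
        int count = 0;
        for (int i = 0; i + needle.length <= haystack.length; i++) {
            int j = 0;
            while (j < needle.length && haystack[i + j] == needle[j]) {
                j++;
            }
            if (j == needle.length) {
                count++;
            }
        }
        return count;
    }

    /** Returns, per supported candidate charset, the number of probe-word hits. */
    static Map<String, Integer> score(byte[] data) {
        Map<String, Integer> scores = new LinkedHashMap<>();
        for (String name : CANDIDATES) {
            if (!Charset.isSupported(name)) {
                continue;
            }
            Charset cs = Charset.forName(name);
            int hits = 0;
            for (String word : PROBE_WORDS) {
                hits += countOccurrences(data, word.getBytes(cs));
            }
            scores.put(name, hits);
        }
        return scores;
    }

    /** Returns the charset name with the most hits, or null if nothing matched. */
    static String bestGuess(byte[] data) {
        String best = null;
        int bestHits = 0;
        for (Map.Entry<String, Integer> e : score(data).entrySet()) {
            if (e.getValue() > bestHits) {
                best = e.getKey();
                bestHits = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(score(data) + " -> best guess: " + bestGuess(data));
    }
}
```

The search ignores character alignment, so a two-byte pattern can match across a character boundary by accident. Longer probe words and a bigger word list cut down on that, but it stays a guess.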

If you just have a file that might be any language in any encoding,
you're pretty much f*cked. In the worst case, it might be a *mix*
of languages encoded in ISO-2022 (which, if I understood it correctly,
is stateful and uses special command sequences to switch between modes
in which different languages can be encoded).
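
For the Japanese profile of ISO-2022 (ISO-2022-JP), the mode switches are at least easy to spot, since they are literal escape sequences in a pure 7-bit byte stream. A minimal sketch, with a class and method name I made up: it only looks for the two sequences that switch to the two-byte JIS X 0208 set (ESC $ @ and ESC $ B) and says nothing about other ISO-2022 variants.

```java
class Iso2022EscapeScan {

    static final byte ESC = 0x1B;

    /**
     * Returns true if data is pure 7-bit and contains ESC $ @ or ESC $ B,
     * the sequences ISO-2022-JP uses to switch to JIS X 0208.
     * (ESC ( B, which switches back to ASCII, is not checked here.)
     */
    static boolean looksLikeIso2022Jp(byte[] data) {
        boolean sawKanjiEscape = false;
        for (int i = 0; i < data.length; i++) {
            if (data[i] < 0) {
                // High bit set: ISO-2022-JP is a 7-bit encoding, so rule it out.
                return false;
            }
            if (data[i] == ESC && i + 2 < data.length
                    && data[i + 1] == '$'
                    && (data[i + 2] == '@' || data[i + 2] == 'B')) {
                sawKanjiEscape = true;
            }
        }
        return sawKanjiEscape;
    }
}
```

A true result is still only a hint: any 7-bit file that happens to contain those three bytes in a row passes the check.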
 