Velocity Reviews - Computer Hardware Reviews

Reading Windows CSV file with LCID entries under Linux.

 
 
Thomas Troeger
      09-22-2008
Dear all,

I've stumbled over a problem with Windows Locale ID information and
code pages. I'm writing a Python application that parses a CSV file
in which each line has the format "LCID;Text1;Text2". Each line can
contain a different locale ID (LCID), and the text fields contain data
encoded in the code page associated with that LCID. My
current data file contains the codes 1031 for German and 1033 for
English US (as listed in
http://www.microsoft.com/globaldev/r...lcid-all.mspx).
Unfortunately, I cannot find out which code page (like cp1252 or
whatever) belongs to which LCID.

My question is: How can I convert this data into something more
reasonable like Unicode? Basically, what I want is something like
"Text1;Text2", with both fields encoded as UTF-8. Can this be done with
Python? How can I find out which code page I have to use for 1033 and 1031?

Any help appreciated,
Thomas.
 
skip@pobox.com
      09-22-2008

Thomas> My question is: How can I convert this data into something more
Thomas> reasonable like unicode? Basically, what I want is something
Thomas> like "Text1;Text2", both fields encoded as UTF-8. Can this be
Thomas> done with Python? How can I find out which codepage I have to
Thomas> use for 1033 and 1031?

There are examples at the end of the csv module documentation which show how
to create Unicode readers and writers. You can extend the UnicodeReader class
to peek at the LCID field and use the corresponding code page to decode the
remainder of the line. (This assumes your CSV fields never contain embedded
newlines, so that each line read from the file is exactly one record.)
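
A minimal sketch of that idea, written for Python 3, where the csv module works on str directly and the UnicodeReader recipe is no longer needed. The names LCID_TO_CODEC, decode_lcid_lines and read_records are made up for this illustration, and the table only covers the two LCIDs mentioned in the thread (both of which use Windows code page 1252); any other LCID falls back to an assumed default.

```python
import csv

# Illustrative table: LCID -> Python codec name. Only the two LCIDs from
# the thread are listed; extend it for whatever your data contains.
LCID_TO_CODEC = {
    1031: "cp1252",  # German (Germany)
    1033: "cp1252",  # English (United States)
}


def decode_lcid_lines(raw_lines, default="cp1252"):
    """Decode each raw byte line with the codec implied by its LCID prefix."""
    for raw in raw_lines:
        raw = raw.rstrip(b"\r\n")
        # Peek at the first field; it is plain ASCII digits in any code page.
        lcid_field = raw.partition(b";")[0]
        try:
            codec = LCID_TO_CODEC.get(int(lcid_field), default)
        except ValueError:
            codec = default  # not a number, e.g. a header line
        yield raw.decode(codec)


def read_records(raw_lines):
    """Yield [Text1, Text2] as str for every "LCID;Text1;Text2" line."""
    for row in csv.reader(decode_lcid_lines(raw_lines), delimiter=";"):
        if row:
            yield row[1:]  # drop the LCID column


if __name__ == "__main__":
    sample = [b"1031;Gr\xfc\xdfe;Stra\xdfe\r\n", b"1033;Hello;World\r\n"]
    for fields in read_records(sample):
        print(fields)
```

For a real file, open it in binary mode and pass the file object as raw_lines. Writing the rows back out with csv.writer on a file opened with encoding="utf-8" and newline="" should then give the "Text1;Text2" output in UTF-8.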

Skip
 
Tim Golden
      09-22-2008
Thomas Troeger wrote:
> I've stumbled over a problem with Windows Locale ID information and
> codepages. I'm writing a Python application that parses a CSV file,
> the format of a line in this file is "LCID;Text1;Text2". Each line can
> contain a different locale id (LCID) and the text fields contain data
> that is encoded in some codepage which is associated with this LCID. My
> current data file contains the codes 1031 for German and 1033 for
> English US (as listed in
> http://www.microsoft.com/globaldev/r...lcid-all.mspx).
> Unfortunately, I cannot find out which Codepage (like cp-1252 or
> whatever) belongs to which LCID.
>
> My question is: How can I convert this data into something more
> reasonable like unicode? Basically, what I want is something like
> "Text1;Text2", both fields encoded as UTF-8. Can this be done with
> Python? How can I find out which codepage I have to use for 1033 and 1031?



The GetLocaleInfo API call can look up the code page for an LCID:

http://msdn.microsoft.com/en-us/libr...70(VS.85).aspx

You'll need to use ctypes (or write a C extension) to
call it, and it is only available on Windows. Be aware that
if the lookup doesn't succeed you may need to fall back on
cp 65001, i.e. UTF-8.
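
A rough ctypes sketch of that lookup, for illustration only. The function name codec_for_lcid is made up here, and the choice of LOCALE_IDEFAULTANSICODEPAGE (0x1004) as the LCTYPE to query is an assumption about which piece of locale information is wanted. The call only exists on Windows, so on any other platform this sketch simply returns the UTF-8 fallback.

```python
import codecs
import ctypes
import sys

# LCTYPE constant from the Windows SDK headers: default ANSI code page.
LOCALE_IDEFAULTANSICODEPAGE = 0x1004


def codec_for_lcid(lcid, fallback="utf-8"):
    """Return a Python codec name for a Windows LCID, or the fallback."""
    if sys.platform != "win32":
        return fallback  # GetLocaleInfoW is not available here

    buf = ctypes.create_unicode_buffer(16)
    written = ctypes.windll.kernel32.GetLocaleInfoW(
        lcid, LOCALE_IDEFAULTANSICODEPAGE, buf, len(buf)
    )
    # 0 characters written means the call failed; a value of "0" means the
    # locale has no ANSI code page and is Unicode-only.
    if written == 0 or buf.value in ("", "0"):
        return fallback

    name = "cp" + buf.value
    try:
        codecs.lookup(name)  # make sure Python actually ships this codec
    except LookupError:
        return fallback
    return name


if __name__ == "__main__":
    for lcid in (1031, 1033):
        print(lcid, codec_for_lcid(lcid))
```

Since the file is being read under Linux, one option is to run something like this once on a Windows machine and carry the resulting LCID-to-codec table over as a plain dict.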

TJG
 