problems with charsets

 
 
peter pilsl
09-05-2003

I have a long CSV file that needs to be imported into an SQL database. My
problem is that I don't know which charset the file is encoded in, and
even once I know, I don't know how to convert it to what I need (latin1 for
output and utf8 for storage).
For my current unicode<->latin1 conversion I use the Unicode::String module,
but the file seems not to be latin1 after all. (It comes from a
Mac and I'm working on a Linux machine.)

I'm aware that my problem is not really a Perl problem, but I
use Perl to detect and convert the charset, so I hope it's OK here.

An example of the text I need to process is available for download at
http://www.goldfisch.at/temporary/text.cvs (it's only one line
of 276 bytes).


Thanks a lot for your help,
peter





--
peter pilsl
(E-Mail Removed)
http://www.goldfisch.at

 
Alan J. Flavell
09-05-2003
On Fri, Sep 5, peter pilsl inscribed on the eternal scroll:

> I have a long CSV file that needs to be imported into an SQL database. My
> problem is that I don't know which charset the file is encoded in


Text files are meaningless without the accompanying character coding
(MIME terminology: "charset") meta-information, really. That isn't a
Perl problem, no matter that you could use Perl as part of the
solution.

> afterwards I would not know how to convert it to what I need (latin1 for
> output and utf8 for storage).


Easy, once you identify the source coding.
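
To make that concrete, here is a minimal sketch using the Encode module that ships with Perl 5.8. The helper name `recode` is made up for illustration, and 'MacRoman' in the usage note is only an assumption based on the file coming from a Mac; pass whatever source coding you actually identify.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: take raw bytes in a known source coding and
# return them re-encoded as UTF-8 (storage) and latin1 (output).
sub recode {
    my ($bytes, $from) = @_;

    # decode() turns the raw bytes into a Perl character string.
    my $text = decode($from, $bytes);

    # encode() turns characters back into bytes. Characters that do
    # not exist in latin1 are replaced by a substitution character
    # unless you pass a stricter CHECK argument.
    return (
        utf8   => encode('UTF-8',      $text),
        latin1 => encode('iso-8859-1', $text),
    );
}
```

Usage would look like `my %out = recode($raw_bytes, 'MacRoman');`, after which `$out{utf8}` goes to the database and `$out{latin1}` to the output.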

> For current transformation unicode<->latin1 the Unicode::String-module is
> what I use but the file seems not to be latin1 after all. (It comes from a
> mac and I'm working on a linux-machine)


Sounds as if the coding is likely to be MacRoman. Damn it, that is
indeed what it is.

> I'm aware that my problem is not really a Perl problem, but I
> use Perl to detect and convert the charset, so I hope it's OK here.


Not really, but by chance it happens to be one of my specialist
subjects...

> An example of the text I need to process is available for download at
> http://www.goldfisch.at/temporary/text.cvs (it's only one line
> of 276 bytes).


What I did was simply to view it in Mozilla and play with the
view->coding settings until it started to make sense.
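
The same trial-and-error can be done from Perl if no browser is at hand. The following is only a sketch: the candidate list and the helper name `decode_candidates` are assumptions for illustration, and the sample bytes are invented rather than taken from the posted file. It decodes the same bytes under each candidate coding so you can eyeball which result reads as sensible text.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Candidate codings to try. This list is an assumption; extend it
# with any name that Encode supports.
my @candidates = ('iso-8859-1', 'cp1252', 'MacRoman', 'UTF-8');

# Decode the same raw bytes under every candidate coding and return
# a hash of coding name => decoded character string.
sub decode_candidates {
    my ($bytes) = @_;
    my %result;
    for my $name (@candidates) {
        # Without a CHECK argument decode() leaves $bytes untouched
        # and substitutes U+FFFD for byte sequences it cannot map.
        $result{$name} = decode($name, $bytes);
    }
    return %result;
}

# Invented sample: the byte 0xA7 means something different in
# latin1 and in MacRoman.
my $sample = "Stra\xA7e";
my %seen   = decode_candidates($sample);

for my $name (@candidates) {
    # Re-encode as UTF-8 so the result displays on a UTF-8 terminal.
    print "$name: ", encode('UTF-8', $seen{$name}), "\n";
}
```

Whichever line shows real words rather than stray symbols points to the likely source coding.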

Now go to the Perl encoding pages to find out how to define the
encoding layer (5.8.0+) or the explicit en/de/coding calls to handle
it. After that it's a doddle (= a stroll, a trifle, or
whatever).
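
For the encoding-layer route, a minimal sketch might look like this. The helper name `convert_file` and the file names in the usage note are hypothetical; the `:encoding(...)` layer itself is standard from Perl 5.8.0 on.

```perl
use strict;
use warnings;

# Hypothetical helper: copy file $in to file $out, converting from
# coding $from to coding $to by means of PerlIO encoding layers.
sub convert_file {
    my ($in, $from, $out, $to) = @_;

    # The input layer decodes bytes into characters as they are read.
    open my $src, "<:encoding($from)", $in
        or die "Cannot read $in: $!";

    # The output layer encodes characters into bytes as they are written.
    open my $dst, ">:encoding($to)", $out
        or die "Cannot write $out: $!";

    while (my $line = <$src>) {
        print {$dst} $line;
    }

    close $src;
    close $dst or die "Cannot close $out: $!";
    return;
}
```

A call such as `convert_file('text.csv', 'MacRoman', 'text.utf8.csv', 'UTF-8');` would then produce the storage copy. One thing to watch, as an aside: files from classic Mac OS often use a bare CR as line ending, so on Linux the loop may see the whole file as a single line, which is harmless for a plain copy like this.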

see also: http://www.perldoc.com/perl5.8.0/lib/Encode.html
http://www.perldoc.com/perl5.8.0/lib...Supported.html

good luck
 
peter pilsl
09-07-2003
Alan J. Flavell wrote:

>
> Now go to the Perl encoding pages to find out how to define the
> encoding layer (5.8.0+) or the explicit en/de/coding calls to handle
> it. After that it's a doddle (= a stroll, a trifle, or
> whatever).
>
> see also: http://www.perldoc.com/perl5.8.0/lib/Encode.html
> http://www.perldoc.com/perl5.8.0/lib...Supported.html
>


You are one of the good spirits of this group. As so many times before, you
have helped me a lot with accurate information.
Thanks a lot (a thousand thanks and so on),

peter

--
peter pilsl
(E-Mail Removed)
http://www.goldfisch.at

 