Spamtrap wrote:
> Ok let me try to redefine the problem.
>
> I have a text file, [ in Windows 98], which by definition is in plain
> 256 character ASCII. When I view it I see EspaÃ±ol - which I assumed
> was originally UTF8 - but I want to see Español [which of course
> could exist in ASCII, without even having to go to Unicode or anything
> fancy] so the encoding is using the two characters Ã± for the single
> character ñ
ASCII is only 128 characters. Character codes 128 to 255 can be any of
the following:
1) ISO-8859-1 (the Latin-1 alphabet), for western European languages.
2) Some Microsoft CP (code page). There are many.
3) Lead and continuation bytes of a multi-byte UTF-8 sequence.
For EspaÃ±ol, all you need is a UTF-8 to ISO-8859-1 conversion utility.
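To make the difference between the two encodings concrete, here is a minimal sketch using the core Encode module. The sample bytes are just the word from your example, and the variable names are mine, not anything from your setup.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The UTF-8 bytes for "Español": the ñ is stored as two bytes, 0xC3 0xB1.
my $utf8_bytes = "Espa\xC3\xB1ol";

# Decode the bytes into Perl characters, then re-encode as ISO-8859-1,
# where ñ is the single byte 0xF1.
my $text         = decode('UTF-8', $utf8_bytes);
my $latin1_bytes = encode('iso-8859-1', $text);

printf "UTF-8:      %d bytes\n", length $utf8_bytes;
printf "ISO-8859-1: %d bytes\n", length $latin1_bytes;
```

The two stray characters you see are simply those two UTF-8 bytes being shown one at a time by a viewer that assumes a single-byte character set.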
> The data from that text file is being imported into a database [this
> part is not Perl programming]. When I display the data, it displays
> EspaÃ±ol not Español
That means that whatever program you are using to display the data
does not understand UTF8. There are terminal emulators and command
consoles that do understand UTF8.
> Then a program will manipulate that database and create a Microsoft
> Word document [or possibly an Adobe PDF document] and I assume the
> text will continue to be incorrect. Therefore I want to use Perl to
> fix that text data before I do the other processing.
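You could try playing around with
# The mode argument needs a leading '<' or '>' before the layer names.
open IN, '<:utf8', $input_file or die "Cannot read $input_file: $!";
# Re-encode as Latin-1 on output; :crlf gives Windows line endings.
open OUT, '>:crlf:encoding(iso-8859-1)', $output_file
    or die "Cannot write $output_file: $!";
print OUT <IN>;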
> I also have things like Ð¡ÑƒÐ±ÑŠÐµÐ - which is supposed to be Russian
> and judeÅ£ul which is Romanian.
Russian characters simply cannot be displayed in ASCII or ISO-8859-1.
ISO-8859-5 has Cyrillic, but not western European accented characters.
Read
http://czyborra.com/charsets/iso8859.html (or Google's cache).
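If you want to check from Perl which strings will survive a given single-byte character set, something along these lines may help. It is only a sketch: fits_in is a hypothetical helper name of mine, and the sample strings are typed in as code points (the Russian one is just the first five letters of your example).

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: returns 1 if every character of $text can be
# represented in the named encoding, 0 otherwise.
sub fits_in {
    my ($encoding, $text) = @_;
    my $copy = $text;    # encode() may modify its argument when CHECK is set
    return eval { encode($encoding, $copy, Encode::FB_CROAK); 1 } ? 1 : 0;
}

my $spanish = "Espa\x{F1}ol";                          # Spanish sample
my $russian = "\x{421}\x{443}\x{431}\x{44A}\x{435}";   # Cyrillic sample

for my $enc ('iso-8859-1', 'iso-8859-5') {
    printf "%-10s Spanish: %s  Russian: %s\n", $enc,
        fits_in($enc, $spanish) ? 'yes' : 'no',
        fits_in($enc, $russian) ? 'yes' : 'no';
}
```

Each sample fits in only one of the two character sets, which is why no single ISO-8859 part will hold all of your data.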
> It is possible I might have to maintain 2 copies of the strings in the
> database tables, one as an ASCII close match for display purposes,
> [since the database will not support UNICODE directly] and one as
> actual UNICODE for passing into Word.
The major databases do support Unicode directly. Often it is as simple
as exporting the database to a flat file, defining a new database
with UTF8 enabled, and importing the data. You will have to ask the
DBA to perform this operation.
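If the database really cannot be switched over and you do end up needing an ASCII close match, one possible approach is to decompose the accented letters and throw away the accents. This is a rough sketch using the core Unicode::Normalize module. ascii_approx is a hypothetical name of mine, and it will not help with Cyrillic, which has no ASCII base letters to fall back on.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Hypothetical helper: split each accented letter into a base letter plus
# combining mark (NFD), then delete the combining marks.
sub ascii_approx {
    my ($text) = @_;
    my $decomposed = NFD($text);
    $decomposed =~ s/\p{Mn}//g;
    return $decomposed;
}

print ascii_approx("Espa\x{F1}ol"), "\n";     # Spanish sample
print ascii_approx("jude\x{163}ul"), "\n";    # Romanian sample
```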
-Joe