HTML::Entities::encode() returning wrong(?) entities

 
 
Jim Higson
      07-23-2004
I'm calling encode_entities on some text I have read from a file, to turn it
into a webpage. According to file:

$ file text/text.en
$ text/text.en: UTF-8 Unicode English text, with very long lines

(although this might not matter)
Anyway, the letter ä appears in the text, and should be changed to &auml;

However, instead it is changed to:
&Atilde;&curren;

I can't see anything unusual about my code. Any ideas why I'm having this
problem?



 
Jim Higson
      07-23-2004
Jim Higson wrote:

> I'm calling encode_entities on some text I have read from a file, to turn
> it into a webpage. According to file:
>
> $ file text/text.en
> $ text/text.en: UTF-8 Unicode English text, with very long lines
>
> (although this might not matter)
> Anyway, the letter ä appears in the text, and should be changed to &auml;
>
> However, instead it is changed to:
> &Atilde;&curren;
>
> I can't see anything unusual about my code. Any ideas why I'm having this
> problem?



I just found the answer myself - as I suspected it was to do with reading
the unicode in perl. Adding use open ':utf8'; to the top of the source
fixed this (although I'm not quite certain exactly what this means)
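For anyone landing here later, a minimal sketch of what that pragma changes. The temporary file is only a hypothetical stand-in for text/text.en, and HTML::Entities comes from the CPAN HTML-Parser distribution; the braces around the read are there just to show that the pragma is lexically scoped.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use HTML::Entities qw(encode_entities);

# Hypothetical sample file holding the two UTF-8 bytes for "ä" (C3 A4).
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp, ':raw';
print {$tmp} "\xC3\xA4\n";
close $tmp or die "close failed: $!";

my $line;
{
    # Inside this block, opens that name no layer get :utf8 by default.
    use open ':utf8';
    open my $in, '<', $path or die "cannot open $path: $!";
    $line = <$in>;
    close $in;
}
chomp $line;

# $line now holds one character (U+00E4) rather than two bytes.
my $encoded = encode_entities($line);
print "$encoded\n";   # &auml;
```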
 
Joe Smith
      07-25-2004
Jim Higson wrote:

> $ text/text.en: UTF-8 Unicode English text, with very long lines
> Anyway, the letter ä appears in the text, and should be changed to &auml;


In UTF-8 encoding, the single character "ä" (U+00E4) is stored as two bytes:
"\xC3" and "\xA4". If you allow perl to think that the file is ISO-8859-1,
it will interpret those two bytes as "Ã" and "¤", which is where the
&Atilde;&curren; output comes from. You need to tell perl
that the file is :utf8 in order for it to recognize those two bytes as
being a single Unicode character.
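A small sketch of the difference, with no file involved. "ä" is U+00E4, which UTF-8 encodes as the bytes C3 A4; the variable names are only illustrative, and decode() from the core Encode module stands in for what the :utf8 layer does on input.

```perl
use strict;
use warnings;
use Encode qw(decode);
use HTML::Entities qw(encode_entities);

# The two bytes that UTF-8 uses for U+00E4.
my $bytes = "\xC3\xA4";

# Left undecoded, each byte is treated as its own Latin-1 character,
# so each one gets its own entity.
my $from_bytes = encode_entities($bytes);

# Decoded first, the same two bytes become one character and one entity.
my $chars      = decode('UTF-8', $bytes);
my $from_chars = encode_entities($chars);

print "$from_bytes\n";   # &Atilde;&curren;
print "$from_chars\n";   # &auml;
```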

-Joe
 
Eric Amick
      07-25-2004
On Fri, 23 Jul 2004 20:43:44 +0100, Jim Higson wrote:

>Jim Higson wrote:
>
>> I'm calling encode_entities on some text I have read from a file, to turn
>> it into a webpage. According to file:
>>
>> $ file text/text.en
>> $ text/text.en: UTF-8 Unicode English text, with very long lines
>>
>> (although this might not matter)
>> Anyway, the letter ä appears in the text, and should be changed to &auml;
>>
>> However, instead it is changed to:
>> &Atilde;&curren;
>>
>> I can't see anything unusual about my code. Any ideas why I'm having this
>> problem?

>
>
>I just found the answer myself - as I suspected it was to do with reading
>the unicode in perl. Adding use open ':utf8'; to the top of the source
>fixed this (although I'm not quite certain exactly what this means)


It tells Perl to apply the :utf8 layer by default to files opened within the
lexical scope of the pragma, unless the open call names its own layers. Only
you can say whether that is the right thing. If it isn't, you can
specify it for specific files by putting ':utf8' in the mode (second)
argument of a three-argument open, e.g. '<:utf8', or with a binmode call on
the appropriate filehandle.
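A rough sketch of both per-filehandle options, assuming a throwaway temporary file in place of the real input (the file and the variable names are hypothetical):

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use HTML::Entities qw(encode_entities);

# Hypothetical sample file holding the two UTF-8 bytes for "ä" (C3 A4).
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp, ':raw';
print {$tmp} "\xC3\xA4\n";
close $tmp or die "close failed: $!";

# Option 1: name the layer in the mode argument of a three-argument open.
open my $in1, '<:utf8', $path or die "cannot open $path: $!";
chomp(my $via_open = <$in1>);
close $in1;

# Option 2: open without a layer, then add it with binmode before reading.
open my $in2, '<', $path or die "cannot open $path: $!";
binmode $in2, ':utf8';
chomp(my $via_binmode = <$in2>);
close $in2;

print encode_entities($via_open), "\n";      # &auml;
print encode_entities($via_binmode), "\n";   # &auml;
```

Either way only that one filehandle is affected, which is the safer choice when the same program also reads files in other encodings.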

--
Eric Amick
Columbia, MD
 