utf8 pragma - strange behavior

 
 
ryang
      03-17-2005
I am trying to understand how to work with Unicode in Perl. I have
read the relevant man pages (perluniintro, perlunicode, etc.) and have
written several scripts to test/verify my understanding. However, I
created a script that has unexpected output. The script is below and
it contains some UTF-8 encoded characters which represent all five
Spanish accented vowels plus the enye (n with a tilde over it) in upper
and lower case. I hope that this post comes through as UTF-8 encoded
as the source code is. I am posting from Google groups which does use
UTF-8 encoding.

BEGIN CODE >>
#!/usr/bin/perl

use warnings;
use strict;
#use utf8;
use Encode;

# using utf8 causes the characters to be printed in latin-1 encoding

my %table = (
    # spanish
    # hexadecimal UTF-8 => actual UTF-8
    '0xc381' => chr(hex('c3')) . chr(hex('81')), # 'Á',
    '0xc389' => encode("utf8", "\x{00c9}"),      # 'É',
    '0xc38d' => 'Í',
    '0xc393' => 'Ó',
    '0xc391' => 'Ñ',
    '0xc39a' => 'Ú',
    '0xc3a1' => 'á',
    '0xc3a9' => 'é',
    '0xc3ad' => 'í',
    '0xc3b3' => 'ó',
    '0xc3b1' => 'ñ',
    '0xc3ba' => 'ú',
);

foreach (sort keys %table) {
    print "$_ = $table{$_}\n";
}
<< END CODE

When the 'use utf8' line is commented out, the script outputs the UTF-8
characters correctly. However, when the utf8 pragma is used, the
characters that are actually hard coded into the hash as UTF-8 (not the
Á or É) are printed in Latin-1. To my understanding, in Perl 5.8.x,
the only effect of the utf8 pragma is to tell the parser that literals
and variables may contain UTF-8 encoded characters. However, in
practice, the utf8 pragma is affecting the script's output.

I have tested the script on Mac OS X 10.3.8 with Perl 5.8.1 and on
Fedora Core (not sure which version) running Perl 5.8.3.

Can anyone explain why the utf8 pragma affects the output of the script?
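
One likely explanation, offered as a hedged sketch rather than a definitive answer: with 'use utf8' the accented literals in the source are decoded into single characters, and print() on a handle with no encoding layer writes characters below 256 as single bytes, which a terminal then shows as Latin-1. The entries built with chr() and encode() are byte strings either way, so the pragma does not change them. The helper bytes_printed() below is a hypothetical name introduced only for illustration; it prints to an in-memory handle so the emitted bytes can be inspected. The \x escapes stand in for what the literal 'Í' becomes without and with the pragma, and the sketch assumes no default layers are set through -C, PERL_UNICODE, or the open pragma.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: print $string to an in-memory handle opened with
# the given layer ('' for none) and return the bytes written, as hex.
sub bytes_printed {
    my ($string, $layer) = @_;
    my $buffer = '';
    open(my $fh, ">$layer", \$buffer)
        or die "cannot open in-memory handle: $!";
    print {$fh} $string;
    close($fh) or die "cannot close in-memory handle: $!";
    return unpack('H*', $buffer);
}

# Without 'use utf8', the source literal 'Í' is the two bytes C3 8D.
my $as_bytes = "\xC3\x8D";

# With 'use utf8', the same literal is the single character U+00CD.
my $as_char = "\x{00CD}";

# No pragma, no layer: the two bytes pass through unchanged (valid UTF-8).
printf "bytes, no layer:    %s\n", bytes_printed($as_bytes, '');                # c38d

# Pragma, no layer: one character below 256 goes out as one Latin-1 byte.
printf "char,  no layer:    %s\n", bytes_printed($as_char, '');                 # cd

# Pragma plus an encoding layer: the character is encoded as UTF-8 on output.
printf "char,  UTF-8 layer: %s\n", bytes_printed($as_char, ':encoding(UTF-8)'); # c38d
```

If this is what is happening, adding binmode(STDOUT, ':encoding(UTF-8)') near the top of the script should restore UTF-8 output for the hard-coded literals when the pragma is enabled. The chr() and encode() entries would then come out double-encoded, because they hold UTF-8 bytes rather than characters.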

 
 
 
 
 
Wes Groleau
      04-11-2005
ryang wrote:
> I am trying to understand how to work with Unicode in Perl. I have
> read the relevant man pages (perluniintro, perlunicode, etc.) and have
> written several scripts to test/verify my understanding. However, I
> created a script that has unexpected output. The script is below and


Welcome to the club.

> Can anyone explain why the utf8 pragma affects the output of the script?


My problem (different post) is slightly different, but
I'm going to try commenting out the pragma to see what happens.
 