Velocity Reviews

Paul Johnston 02-06-2004 11:38 AM

problem with regex
 
Hi
I have a file encoded in Unicode (UTF-8) on a Red Hat 9 system, and I am
using Perl 5.8.0.
It contains mixed Estonian and English, like below:

<ee> Kaks vana sõpra </ee>
<en> Two old friends </en>
<ee> Tere Piret ! </ee>
<en> Hello Piret ! </en>
<ee> Tere Tõnu ! </ee>
<en> Hello Tõnu ! </en>

I need to do some processing, but the expression
(/õ/) will not match the õ in any line.
The Perl script and the file I wish to process were both created using
the same editor (kedit), so I assume they are encoded using the same
scheme.
Any ideas why I cannot, for example, extract all lines which contain
this symbol "õ"?
TIA
Paul



Paul Lalli 02-06-2004 02:56 PM

Re: problem with regex
 
On Fri, 6 Feb 2004, Paul Johnston wrote:

> Hi
> I have a file encoded using unicode (utf-8) on a Redhat 9 system and
> using Perl 5.8.0
> It contains mixed estonian and English like below:
>
> <ee> Kaks vana sõpra </ee>
> <en> Two old friends </en>
> <ee> Tere Piret ! </ee>
> <en> Hello Piret ! </en>
> <ee> Tere Tõnu ! </ee>
> <en> Hello Tõnu ! </en>
>
> I need to do some processing but the expression
> (/õ/) will not match with the õ in any line
> The perl script and the file I wish to process were both created using
> the same editor (kedit) so I assume they are encoding using the same
> scheme.
> Any ideas why I cannot for example extract all lines which contain
> this symbol "õ"
> TIA
> Paul



Without having seen your code, my guess would be that your locale is not
correctly set up. See perldoc perllocale and perldoc locale

Paul Lalli

Ben Morrow 02-06-2004 03:45 PM

Re: problem with regex
 

Paul Lalli <ittyspam@yahoo.com> wrote:
> Without having seen your code, my guess would be that your locale is not
> correctly set up. See perldoc perllocale and perldoc locale


NO! Don't mix locales and Unicode with 5.8. It doesn't work.

If you wish to use UTF-8 literals in your source, you have to put 'use
utf8;' at the top.
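
A minimal sketch of what that looks like in practice. The sample data and variable names are hypothetical stand-ins for the poster's file, and the explicit ':encoding(UTF-8)' layer is an assumption about how the input should be decoded, not something stated in the thread. The script itself must be saved as UTF-8 for 'use utf8;' to have the intended effect.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;    # tell perl that literals in this source file are UTF-8

# Hypothetical sample input: the UTF-8 bytes of two lines from the post.
# "\xC3\xB5" is the two-byte UTF-8 encoding of õ (U+00F5).
my $utf8_bytes = "<ee> Tere T\xC3\xB5nu ! </ee>\n<en> Hello Piret ! </en>\n";

# Read from an in-memory handle here; with a real file, replace
# \$utf8_bytes with the file name. The layer decodes bytes to characters.
open my $in, '<:encoding(UTF-8)', \$utf8_bytes or die "open: $!";
binmode STDOUT, ':encoding(UTF-8)';    # encode characters again on output

my @matches;
while (my $line = <$in>) {
    # Both the pattern and the line are now character strings,
    # so õ is compared as one character on each side.
    push @matches, $line if $line =~ /õ/;
}
close $in;

print @matches;
```

The point of the design is that both sides of the match use the same representation: the input is decoded on the way in, and the literal in the pattern is decoded by 'use utf8;'.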

Ben

--
Joy and Woe are woven fine,
A Clothing for the Soul divine
Under every grief and pine
Runs a joy with silken twine.
    William Blake, 'Auguries of Innocence'
ben@morrow.me.uk

Paul Johnston 02-09-2004 11:06 AM

Re: problem with regex
 
On Fri, 6 Feb 2004 15:45:44 +0000 (UTC), Ben Morrow
<usenet@morrow.me.uk> wrote:

>
>Paul Lalli <ittyspam@yahoo.com> wrote:
>> Without having seen your code, my guess would be that your locale is not
>> correctly set up. See perldoc perllocale and perldoc locale

>
>NO! Don't mix locales and unicode with 5.8. It doesn't work.
>
>If you wish to use utf8 literals in your source, you have to 'use
>utf8;' at the top.
>
>Ben


Just as a follow-up, I have discovered that the script works, i.e. matches õ, on
Solaris 5.8 with Perl version 5.005.
However, adding
use utf8; to the script on the Redhat machine also works, so my
problems have been solved (for now at least :-) ).
Many thanks
Paul
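
One likely explanation for the difference between the two machines, offered as an assumption rather than something confirmed in the thread: a match succeeds when the pattern and the data are both raw bytes (as on Perl 5.005) or both decoded characters (as with 'use utf8;' on 5.8), and fails when one side is bytes and the other is characters. Perl 5.8.0 run under a UTF-8 locale may decode input automatically, which would produce exactly that mismatch. The sketch below uses hypothetical variable names and ASCII-only source, so it behaves the same however the script file is saved.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# "Tõnu" as raw UTF-8 bytes (5 bytes) and as decoded characters (4 chars).
my $bytes = "T\xC3\xB5nu";
my $chars = decode('UTF-8', $bytes);

# What a literal õ in a UTF-8 source file amounts to:
my $byte_pattern = "\xC3\xB5";    # without 'use utf8;' (two bytes)
my $char_pattern = "\x{F5}";      # with 'use utf8;' (one character)

# Try all four combinations of data and pattern.
my %result = (
    'bytes data, byte pattern' => ($bytes =~ /\Q$byte_pattern\E/ ? 1 : 0),
    'bytes data, char pattern' => ($bytes =~ /\Q$char_pattern\E/ ? 1 : 0),
    'chars data, byte pattern' => ($chars =~ /\Q$byte_pattern\E/ ? 1 : 0),
    'chars data, char pattern' => ($chars =~ /\Q$char_pattern\E/ ? 1 : 0),
);

printf "%-26s %s\n", $_, ($result{$_} ? 'match' : 'no match')
    for sort keys %result;
```

Only the two consistent combinations match, which fits both observations in this post.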

