How to determine if a word has an extended character?

 
 
ambarish.mitra@gmail.com
 
      05-20-2008
I have a file which contains just one word. My task is just to find
out if the word has any extended character. That's all.

I can use regex, but am not able to find out a regex pattern for
extended character. Any hints?


For example, if the file content is: sample, then the Perl code prints
false; and if the file content is samplé, then the Perl code prints
true.

Thanks.
 
 
 
 
 
Jürgen Exner
 
      05-20-2008
(E-Mail Removed) wrote:
>I have a file which contains just one word. My task is just to find
>out if the word has any extended character. That's all.
>
>I can use regex, but am not able to find out a regex pattern for
>extended character. Any hints?


[Interpreting 'extended' as non-ASCII]

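You could simply use the POSIX character class [:ascii:]. In Perl the
name is lowercase and goes inside a bracketed class, so /[[:^ascii:]]/
matches any non-ASCII character.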
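
A minimal sketch of the POSIX-class approach, assuming 'extended' means
non-ASCII as stated above. The helper name has_non_ascii is made up for
illustration and does not come from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the string holds any character outside ASCII.
# [[:^ascii:]] is the negated POSIX class, i.e. any code point above 127.
sub has_non_ascii {
    my ($word) = @_;
    return $word =~ /[[:^ascii:]]/ ? 1 : 0;
}

print has_non_ascii("sample")      ? "true\n" : "false\n";   # false
print has_non_ascii("sampl\x{e9}") ? "true\n" : "false\n";   # true
```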

Another way would be to check, for each character, whether its ord() is
less than 128. That should work at least for the most common encodings
like ISO-Latin-1, Windows-1252, ...
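
The ord() check could look roughly like this. It is only a sketch: the
helper name has_high_char is hypothetical, and the hard-coded string
stands in for the line read from the file.

```perl
use strict;
use warnings;

# Hypothetical helper: true if any character has a code point of 128 or above.
sub has_high_char {
    my ($word) = @_;
    for my $char (split //, $word) {
        return 1 if ord($char) >= 128;
    }
    return 0;
}

# Stand-in for the single line read from the file, newline included.
chomp(my $word = "sampl\x{e9}\n");
print has_high_char($word) ? "true\n" : "false\n";   # true
```

If the file is read as raw bytes, this also flags UTF-8 input, because
every byte of a multi-byte UTF-8 sequence is 128 or above.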

Or: [untested]
if (/^[A-Za-z]*$/) {
    print 'false';
} else {
    print 'true';
}

You could probably also set your locale to EN-US and use
if (/\W/) {
    print 'true';
} else {
    print 'false';
}

All of these do somewhat different things, so you have some options to
choose the one that most closely matches your needs.
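
To make the differences concrete, here is a small comparison sketch. It
assumes plain byte strings with neither "use locale" nor "use utf8" in
effect, so \w covers only letters, digits and underscore. The function
name verdicts and the test words are invented for illustration.

```perl
use strict;
use warnings;

# Each entry: label, then a check returning true if the word counts as "extended".
my @checks = (
    [ 'non-ASCII class' => sub { $_[0] =~ /[[:^ascii:]]/ } ],
    [ 'ord() >= 128'    => sub { scalar grep { ord($_) >= 128 } split //, $_[0] } ],
    [ 'not all letters' => sub { $_[0] !~ /^[A-Za-z]*$/ } ],
    [ '\W match'        => sub { $_[0] =~ /\W/ } ],
);

# Hypothetical helper: run every check and return 'true'/'false' per check.
sub verdicts {
    my ($word) = @_;
    return map { $_->[1]->($word) ? 'true' : 'false' } @checks;
}

print join(' | ', map { $_->[0] } @checks), "\n";
for my $word ("sample", "sampl\x{e9}", "sample_1", "sample-1") {
    print join(' | ', verdicts($word)), "\n";
}
```

Only the last two words separate the approaches: "sample_1" trips the
letters-only test but nothing else, and "sample-1" trips both the
letters-only test and \W, although neither word contains a non-ASCII
character.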

jue
 
 
 
 
 
Hartmut Camphausen
 
      05-20-2008
In <<(E-Mail Removed)>>
wrote ...
> I have a file which contains just one word. My task is just to find
> out if the word has any extended character. That's all.
>
> I can use regex, but am not able to find out a regex pattern for
> extended character. Any hints?
>
>
> For example, if the file content is: sample, then the Perl code prints
> false; and if the file content is samplé, then the Perl code prints
> true.



$string =~ m/[^\w]/ ? print "\nhas extended." : print "\nOK.";

should do the trick.

This prints "has extended" if $string contains any characters other
([^...]) than 'a' to 'z', 'A' to 'Z', '0' to '9' plus '_' (the \w
character class).

If you want to exclude the '_' (contained in \w), use [^a-zA-Z0-9]
If you want to include more "valid" characters, expand the [^...]
accordingly (note: if you want to include '-' as valid character, put it
at the very end of the characters list).
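
As a sketch of such an expanded class, the following treats letters,
digits and '-' as valid and flags everything else. The helper name
has_invalid_char and the sample strings are assumptions made for this
example.

```perl
use strict;
use warnings;

# Hypothetical helper: letters, digits and '-' are valid; anything else is flagged.
# The hyphen sits last in the class so it is taken literally, not as a range.
sub has_invalid_char {
    my ($string) = @_;
    return $string =~ /[^a-zA-Z0-9-]/ ? 1 : 0;
}

for my $string ("well-known", "well_known", "sampl\x{e9}") {
    print has_invalid_char($string) ? "has extended.\n" : "OK.\n";
}
```

Here "well-known" passes, while "well_known" is flagged because the
underscore is no longer in the list of valid characters.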

See
perldoc perlre
perldoc perlrequick
perldoc perlreref
perldoc perlretut



hth, Hartmut

--
------------------------------------------------
Hartmut Camphausen h.camp[bei]textix[punkt]de
 
 
John W. Krahn
 
      05-20-2008
Hartmut Camphausen wrote:
> In <<(E-Mail Removed)>>
> wrote ...
>> I have a file which contains just one word. My task is just to find
>> out if the word has any extended character. That's all.
>>
>> I can use regex, but am not able to find out a regex pattern for
>> extended character. Any hints?
>>
>>
>> For example, if the file content is: sample, then the Perl code prints
>> false; and if the file content is samplé, then the Perl code prints
>> true.

>
>
> $string =~ m/[^\w]/ ? print "\nhas extended." : print "\nOK.";


[^\w] is usually written as \W.


> should do the trick.
>
> This prints "has extended" if $string contains any characters other
> ([^...]) than 'a' to 'z', 'A' to 'Z', '0' to '9' plus '_' (the \w
> character class).


From perlre.pod:

<QUOTE>
If "use locale" is in effect, the list of alphabetic characters
generated by "\w" is taken from the current locale. See perllocale.
</QUOTE>

In other words, if your locale supports it then 'é' will be included in \w.
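
Locale is not the only thing that changes \w. The sketch below shows a
related effect that does not depend on the system's locales: the same
word gives different \W results depending on whether Perl applies byte
or Unicode rules to it. It assumes Perl 5.14 or later for the /a
modifier, which is not mentioned in this thread, and the function name
nonword_flags is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: returns three 'true'/'false' flags for \W matches
# against the same word under different matching rules.
sub nonword_flags {
    my $bytes = "sampl\x{e9}";
    utf8::downgrade($bytes);   # byte string: only ASCII counts as \w here
    my $chars = "sampl\x{e9}";
    utf8::upgrade($chars);     # same characters, matched under Unicode rules

    return (
        ($bytes =~ /\W/  ? 'true' : 'false'),   # e-acute is not a word character
        ($chars =~ /\W/  ? 'true' : 'false'),   # e-acute counts as a letter
        ($chars =~ /\W/a ? 'true' : 'false'),   # /a restricts \w to ASCII
    );
}

print join(' ', nonword_flags()), "\n";   # true false true
```

If the goal is to flag every non-ASCII character regardless of locale
or string internals, the /a form or an explicit class such as
[^A-Za-z0-9_] avoids the surprise.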


> If you want to exclude the '_' (contained in \w), use [^a-zA-Z0-9]


[^a-zA-Z0-9] means any character that is *not* alphanumeric. You
probably meant [a-zA-Z0-9].



John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order. -- Larry Wall
 
 
 
 