Velocity Reviews - Computer Hardware Reviews



How do I get all digits, letters and punctuation characters in Perl?

 
 
Peng Yu
 
      12-04-2012
Hi,

In Python, I can do the following to get a category of characters, but
I can't find a corresponding feature in Perl. Could anybody let me know
if there is one? Thanks!

~/linux/test/python/man/library/string/printable$ cat main.py
#!/usr/bin/env python

import string
print string.digits + string.letters + string.punctuation
print string.printable



Regards,
Peng
 
 
 
 
 
Jürgen Exner
 
      12-04-2012
Peng Yu <(E-Mail Removed)> wrote:
>Hi,
>
>In python, I can do the following to get a category of characters. But
>I don't find a corresponding thing in perl. Could anybody let me know
>if there is one? Thanks!
>
>~/linux/test/python/man/library/string/printable$ cat main.py
>#!/usr/bin/env python
>
>import string
>print string.digits + string.letters + string.punctuation
>print string.printable


I smell an XY problem here. Why do you think you need this set of
characters?

If you just want to test whether a certain character belongs to a specific
class, you can use POSIX character classes in a regular expression, e.g.
m/[[:alpha:]]/;
I suppose you could also use this test to enumerate all characters of
this class, although that does seem somewhat backwards.
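A minimal sketch of that enumeration (not part of the original post), assuming the 7-bit ASCII range is what you want, since Python's string constants are ASCII-only. Each of the 128 ASCII characters is run through a POSIX class. The resulting character sets match the Python script's, but the letters come out in code-point order (uppercase first), whereas Python's `string.letters` lists lowercase first. The printable string is assembled the same way Python defines `string.printable`: digits + letters + punctuation + whitespace.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Only 7-bit ASCII (code points 0..127) is scanned, mirroring Python's
# ASCII-only string constants. Without "use locale", the POSIX classes
# below have their plain ASCII meanings for these code points.
my @ascii = map { chr($_) } 0 .. 127;

# Keep the characters matching each POSIX class, in code-point order.
my $digits      = join '', grep { /[[:digit:]]/ } @ascii;
my $letters     = join '', grep { /[[:alpha:]]/ } @ascii;
my $punctuation = join '', grep { /[[:punct:]]/ } @ascii;
my $whitespace  = join '', grep { /[[:space:]]/ } @ascii;

# Python builds string.printable as digits + letters + punctuation + whitespace.
my $printable = $digits . $letters . $punctuation . $whitespace;

print $digits . $letters . $punctuation, "\n";
print $printable, "\n";
```

Note that `[[:print:]]` is not a substitute for the last line: it matches the space character but not tab, newline, and the other control whitespace that Python's `string.printable` includes.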

jue
 
 
 
 
 
Peter Makholm
 
      12-04-2012
Peng Yu <(E-Mail Removed)> writes:

> In python, I can do the following to get a category of characters. But
> I don't find a corresponding thing in perl. Could anybody let me know
> if there is one? Thanks!


It depends on your definitions.

The Unicode standard defines 9293 letters, 350 digits, and 582
punctuation characters - and this is just the Basic Multilingual Plane.

Tom Christiansen has written a tool called `unichars` that lists characters
matching a number of conditions (available in the Unicode::Tussle
distribution). His code essentially iterates over all relevant
code points, skipping a number of special cases:

use strict;
use warnings;
use Encode qw(decode);

# Scan range for this excerpt; the BMP is assumed here for illustration.
my ($first_codepoint, $last_codepoint) = (0x0000, 0xFFFF);

for my $codepoint ( $first_codepoint .. $last_codepoint ) {

    # gaggy UTF-16 surrogates are invalid UTF-8 code points
    next if $codepoint >= 0xD800 && $codepoint <= 0xDFFF;

    # from utf8.c in perl src; must avoid fatals in 5.10
    next if $codepoint >= 0xFDD0 && $codepoint <= 0xFDEF;

    next if 0xFFFE == ($codepoint & 0xFFFE); # both FFFE and FFFF

    # debug("testing codepoint $codepoint");

    # see "Unicode non-character %s is illegal for interchange" in perldiag(1)
    $_ = do { no warnings "utf8"; chr($codepoint) };

    # fixes "the Unicode bug"
    unless (utf8::is_utf8($_)) {
        $_ = decode("iso-8859-1", $_);
    }

    # Test the given conditions, e.g. /\p{Digit}/
}
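A self-contained sketch of the same loop with the placeholder test filled in (the helper `chars_matching` is hypothetical, not part of unichars). It scans only the BMP, skips the same surrogates and noncharacters, and keeps every character that matches a Unicode property regex. Because `\p{...}` matching always follows Unicode rules, the `decode` step from the excerpt is not needed here. The counts printed depend on the Unicode version bundled with your perl, so they may differ from the figures quoted above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return every BMP character matching $re,
# skipping surrogates and noncharacters as in the loop above.
sub chars_matching {
    my ($re) = @_;
    my @found;
    for my $cp (0x0000 .. 0xFFFF) {
        next if $cp >= 0xD800 && $cp <= 0xDFFF;    # UTF-16 surrogates
        next if $cp >= 0xFDD0 && $cp <= 0xFDEF;    # noncharacters
        next if ($cp & 0xFFFE) == 0xFFFE;          # U+FFFE and U+FFFF
        my $char = chr $cp;
        push @found, $char if $char =~ $re;
    }
    return @found;
}

my @digits  = chars_matching(qr/\p{Digit}/);     # decimal digits (Nd)
my @letters = chars_matching(qr/\p{Letter}/);    # General Category L
my @punct   = chars_matching(qr/\p{Punct}/);     # General Category P

printf "digits: %d, letters: %d, punctuation: %d\n",
    scalar @digits, scalar @letters, scalar @punct;
```

One design caveat: `\p{Punct}` is the Unicode General Category P, so ASCII symbols such as `+` and `$` (category S) are excluded, whereas `[[:punct:]]` and Python's `string.punctuation` include them.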

But given your Python example, this is probably overkill for what
you are trying to do.

//Makholm

 
 
 
 


Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
Re: How include a large array? | Edward A. Falk | C Programming | 1 | 04-04-2013 08:07 PM
Identifying unicode punctuation characters with Python regex | Shiao | Python | 4 | 11-19-2008 11:36 AM
'' is not a valid name. Make sure that it does not include invalid characters or punctuation and that it is not too long. | rote | ASP .Net | 2 | 01-23-2008 03:07 PM
making all letters Caps/Small Letters | Merrigan | Python | 4 | 12-14-2007 10:10 AM
function that removes the punctuation and some characters like (*&^%$#@!<>?"} from a text string | Beznas | ASP General | 8 | 09-10-2003 05:34 PM


