Velocity Reviews - Computer Hardware Reviews


Regular expression for matching IPA characters in Unicode?

 
 
Mickel Grönroos
 
      10-11-2004
Hi Pythoneers,

What is the best way to check that a given Unicode string contains only
IPA characters, e.g. characters in the range \u0250-\u02AF?
I guess a regular expression would do it; I just can't figure out how to
write that expression.

Code snippets are most welcome.

Best regards,

Mickel Grönroos

--
Mickel Grönroos, application specialist, linguistics, CSC
PL 405 (Tekniikantie 15 a D), 02101 Espoo, Finland,
CSC is the Finnish IT center for science, www.csc.fi
 
 
 
 
 
Michael Hoffman
 
      10-11-2004
Mickel Grönroos wrote:

> Which is the best way of checking that a given unicode string only
> contains IPA characters, e.g. characters in the range \u0250-\u02AF?


Well, I'll give you an example that only includes characters in the
range [\u0250, \u02AF], but those are just the IPA *Extensions* block. You also
need to include Basic Latin and Greek characters from other blocks.

See: http://www.unicode.org/charts/PDF/U0250.pdf

And why do you want to do this anyway?

This example uses the all() recipe from the itertools documentation, which
tells you whether a predicate is true for every item in an iterable. The
predicate here is whether the item is contained in IPA_CHARS, which you can expand...

=====

import itertools
from sets import Set # set() is a built-in in 2.4

IPA_CHARS = Set(map(unichr, xrange(0x250, 0x2b0)))

def all(seq, pred=bool):
    # http://www.python.org/doc/current/li...s-example.html
    "Returns True if pred(x) is True for every element in the iterable"
    return False not in itertools.imap(pred, seq)

def is_ipa(iterable):
    return all(iterable, IPA_CHARS.__contains__)

print is_ipa(u"aeiou") # this is valid IPA, but not in the extensions block
print is_ipa(u"\u0260\u02af") # valid IPA in the extensions block

====output===

False
True
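For readers on a current Python, here is a minimal Python 3 sketch of the same set-membership approach. set() has been a built-in since 2.4 and all() since 2.5, so neither sets.Set nor the itertools recipe is needed. EXTRA_IPA is a hypothetical, deliberately incomplete stand-in for the Basic Latin and Greek letters mentioned above (lowercase a-z plus beta, theta and chi); check the IPA chart for the real inventory before relying on it.

```python
# Python 3 sketch of the set-membership check (not the original 2.x code).

# IPA Extensions block: U+0250..U+02AF inclusive (range end is exclusive).
IPA_EXTENSIONS = {chr(cp) for cp in range(0x250, 0x2B0)}

# HYPOTHETICAL and incomplete: a few IPA letters from other blocks
# (lowercase Basic Latin, Greek beta U+03B2, theta U+03B8, chi U+03C7).
EXTRA_IPA = set("abcdefghijklmnopqrstuvwxyz") | set("\u03b2\u03b8\u03c7")

IPA_CHARS = IPA_EXTENSIONS | EXTRA_IPA

def is_ipa(text: str) -> bool:
    """Return True if every character of text is in IPA_CHARS."""
    # Like the original recipe, an empty string counts as valid.
    return all(ch in IPA_CHARS for ch in text)

print(is_ipa("aeiou"))         # accepted via EXTRA_IPA
print(is_ipa("\u0260\u02af"))  # both characters are in the extensions block
print(is_ipa("ABC"))           # uppercase is not in this sketch's set
```

Using a set keeps each membership test O(1); extending coverage is just a matter of adding characters or ranges to IPA_CHARS.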
--
Michael Hoffman
 
 
 
 
 
"Martin v. Löwis"
 
      10-12-2004
Mickel Grönroos wrote:
> Which is the best way of checking that a given unicode string only
> contains IPA characters, e.g. characters in the range \u0250-\u02AF?


The regular expression for that is [\u0250-\u02AF]. You can either make
the regular expression a Unicode string itself, or you can make it a
normal (byte) string and put the backslash-u-number sequence into it
(e.g. by doubling the backslash, as in "[\\u0250-\\u02AF]").
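A minimal Python 3 sketch of both options (the thread itself predates Python 3). In Python 3, str is already Unicode, so the first pattern is an ordinary string literal whose \u escapes Python resolves to characters; the analogue of the second option is a raw string, where the \u sequences reach the re module unchanged and are interpreted by it (supported since Python 3.3). fullmatch() applies the class to the whole string. The function name only_ipa_extensions is illustrative, not from the thread.

```python
import re

# Option 1: Python turns \u0250 and \u02AF into the actual characters
# before re ever sees the pattern.
IPA_EXT_LITERAL = re.compile("[\u0250-\u02AF]+")

# Option 2: raw string, so the backslash-u sequences are passed through
# to the regex engine, which interprets them itself.
IPA_EXT_ESCAPED = re.compile(r"[\u0250-\u02AF]+")

def only_ipa_extensions(text, pattern=IPA_EXT_LITERAL):
    """True if text is non-empty and every character is in U+0250..U+02AF."""
    # fullmatch() requires the pattern to cover the entire string.
    return pattern.fullmatch(text) is not None

for sample in ("\u0260\u02af", "aeiou", ""):
    print(repr(sample),
          only_ipa_extensions(sample),
          only_ipa_extensions(sample, IPA_EXT_ESCAPED))
```

The + quantifier rejects the empty string; use * instead if an empty string should count as valid.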

Regards,
Martin
 
 
 
 


Similar Threads
Thread Thread Starter Forum Replies Last Post
Regular Expression - Matching Multiples of 3 Characters exactly. blaine Python 6 04-28-2008 05:23 PM
Displaying an IPA symbol Dung Ping HTML 2 02-22-2006 03:21 PM
Matching abitrary expression in a regular expression =?iso-8859-1?B?bW9vcJk=?= Java 8 12-02-2005 12:51 AM
RE: Displaying ipa in python.exe Ben Last Python 0 08-19-2004 07:00 AM
Dynamically changing the regular expression of Regular Expression validator VSK ASP .Net 2 08-24-2003 02:47 PM


