regex help

 
 
Gabriel Rossetti
 
      12-16-2009
Hello everyone,

I'm going nuts with some regex, could someone please show me what I'm
doing wrong?

I have an XMPP msg :

<message xmlns='jabber:client' to=''>
<mynode xmlns='myprotocol:core' version='1.0' type='mytype'>
<parameters>
<param1>123</param1>
<param2>456</param2>
</parameters>
<payload type='plain'>...</payload>
</mynode>
<x xmlns='jabber:expire' seconds='15'/>
</message>

The <parameters> node may be absent or empty (<parameters/>), and the <x> node
may be absent. I'd like to grab everything except the <payload> node and
create something new using regex. With the XMPP message example above,
I'd get this:

<message xmlns='jabber:client' to=''>
<mynode xmlns='myprotocol:core' version='1.0' type='mytype'>
<parameters>
<param1>123</param1>
<param2>456</param2>
</parameters>
</mynode>
<x xmlns='jabber:expire' seconds='15'/>
</message>

for some reason my regex doesn't work correctly :

r"(<message .*?>).*?(<mynode .*?>).*?(?:(<parameters>.*?</parameters>)|<parameters/>)?.*?(<x .*/>)?"

I group the opening <message> node, the opening <mynode> node and if the
<parameters> node is present and not empty I group it and if the <x>
node is present I group it. For some reason this doesn't work correctly :

>>> import re
>>> s1 = "<message xmlns='jabber:client' to=''><mynode xmlns='myprotocol:core' version='1.0' type='mytype'><parameters><param1>123</param1><param2>456</param2></parameters><payload type='plain'>...</payload></mynode><x xmlns='jabber:expire' seconds='15'/></message>"
>>> s2 = "<message xmlns='jabber:client' to=''><mynode xmlns='myprotocol:core' version='1.0' type='mytype'><parameters/><payload type='plain'>...</payload></mynode><x xmlns='jabber:expire' seconds='15'/></message>"
>>> s3 = "<message xmlns='jabber:client' to=''><mynode xmlns='myprotocol:core' version='1.0' type='mytype'><payload type='plain'>...</payload></mynode><x xmlns='jabber:expire' seconds='15'/></message>"
>>> s4 = "<message xmlns='jabber:client' to=''><mynode xmlns='myprotocol:core' version='1.0' type='mytype'><parameters><param1>123</param1><param2>456</param2></parameters><payload type='plain'>...</payload></mynode></message>"
>>> s5 = "<message xmlns='jabber:client' to=''><mynode xmlns='myprotocol:core' version='1.0' type='mytype'><parameters/><payload type='plain'>...</payload></mynode></message>"
>>> s6 = "<message xmlns='jabber:client' to=''><mynode xmlns='myprotocol:core' version='1.0' type='mytype'><payload type='plain'>...</payload></mynode></message>"
>>> exp = r"(<message .*?>).*?(<mynode .*?>).*?(?:(<parameters>.*?</parameters>)|<parameters/>)?.*?(<x .*/>)?"
>>>
>>> re.match(exp, s1).groups()
("<message xmlns='jabber:client' to=''>", "<mynode xmlns='myprotocol:core' version='1.0' type='mytype'>", '<parameters><param1>123</param1><param2>456</param2></parameters>', None)
>>>
>>> re.match(exp, s2).groups()
("<message xmlns='jabber:client' to=''>", "<mynode xmlns='myprotocol:core' version='1.0' type='mytype'>", None, None)
>>>
>>> re.match(exp, s3).groups()
("<message xmlns='jabber:client' to=''>", "<mynode xmlns='myprotocol:core' version='1.0' type='mytype'>", None, None)
>>>
>>> re.match(exp, s4).groups()
("<message xmlns='jabber:client' to=''>", "<mynode xmlns='myprotocol:core' version='1.0' type='mytype'>", '<parameters><param1>123</param1><param2>456</param2></parameters>', None)
>>>
>>> re.match(exp, s5).groups()
("<message xmlns='jabber:client' to=''>", "<mynode xmlns='myprotocol:core' version='1.0' type='mytype'>", None, None)
>>>
>>> re.match(exp, s6).groups()
("<message xmlns='jabber:client' to=''>", "<mynode xmlns='myprotocol:core' version='1.0' type='mytype'>", None, None)
>>>



Does someone know what is wrong with my expression? Thank you, Gabriel
 
 
 
 
 
r0g
 
      12-16-2009
Gabriel Rossetti wrote:
> Hello everyone,
>
> I'm going nuts with some regex, could someone please show me what I'm
> doing wrong?
>
> I have an XMPP msg :
>

<snip>
>
>
> Does someone know what is wrong with my expression? Thank you, Gabriel





Gabriel, trying to debug a long regex in situ can be a nightmare; however,
the following technique always works for me...

Use the interactive interpreter and see if half the regex works. If it
does, your problem is in the second half; if not, it's in the first, so try
the first half of that, and so on and so forth. You'll find the point at
which it goes wrong in a snip.

Non-trivial regexes are always best built up and tested a bit at a time;
the interactive interpreter is great for this.
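
A minimal sketch of that build-it-up approach, in case it helps. The pattern pieces below are only a reading of the expression in the original post (the posted text looks garbled in places), and the names `pieces` and `growing_prefixes` are made up for illustration:

```python
import re

# Sample message s1 from the original post, written on several source lines
# but forming one single-line string.
s1 = ("<message xmlns='jabber:client' to=''>"
      "<mynode xmlns='myprotocol:core' version='1.0' type='mytype'>"
      "<parameters><param1>123</param1><param2>456</param2></parameters>"
      "<payload type='plain'>...</payload></mynode>"
      "<x xmlns='jabber:expire' seconds='15'/></message>")

# The pattern split into pieces (an assumed reconstruction of the posted regex).
pieces = [
    r"(<message .*?>)",
    r".*?(<mynode .*?>)",
    r".*?(?:(<parameters>.*?</parameters>)|<parameters/>)?",
    r".*?(<x .*/>)?",
]

def growing_prefixes(pieces, text):
    """Match text against pieces[:1], pieces[:2], ... and collect the groups."""
    results = []
    for i in range(1, len(pieces) + 1):
        pattern = "".join(pieces[:i])
        m = re.match(pattern, text)
        results.append((pattern, m.groups() if m else None))
    return results

# Print each partial pattern with what it captured, to spot the first bad step.
for pattern, groups in growing_prefixes(pieces, s1):
    print(pattern)
    print("   ->", groups)
```

With s1, the first three prefixes capture what they should, while the fourth leaves its new group as None even though the <x> node is present, so the last piece is the one to look at. One likely reason: a lazy `.*?` followed by an optional group can succeed by matching nothing at all.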

Roger.
 
 
 
 
 
Intchanter / Daniel Fackrell
 
      12-16-2009
On Dec 16, 10:22 am, r0g <aioe....@technicalbloke.com> wrote:
> Gabriel Rossetti wrote:
> > Hello everyone,
> >
> > I'm going nuts with some regex, could someone please show me what I'm
> > doing wrong?
> >
> > I have an XMPP msg :
>
> <snip>
>
> > Does someone know what is wrong with my expression? Thank you, Gabriel
>
> Gabriel, trying to debug a long regex in situ can be a nightmare; however,
> the following technique always works for me...
>
> Use the interactive interpreter and see if half the regex works. If it
> does, your problem is in the second half; if not, it's in the first, so try
> the first half of that, and so on and so forth. You'll find the point at
> which it goes wrong in a snip.
>
> Non-trivial regexes are always best built up and tested a bit at a time;
> the interactive interpreter is great for this.
>
> Roger.

I'll just add that the "now you have two problems" quip applies here,
especially when there are very good XML parsing libraries for Python
that will keep you from having to reinvent the wheel for every little
change.

See sections 20.5 through 20.13 of the Python Documentation for
several built-in options, and I'm sure there are many community
projects that may fit the bill if none of those happen to.
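
As a rough sketch of the parser route, assuming the standard library's xml.etree.ElementTree is an acceptable choice for this job: the helper name `strip_payload` is made up for illustration, and the sample string is s1 from the original post.

```python
import xml.etree.ElementTree as ET

# Sample message s1 from the original post, as one single-line string.
s1 = ("<message xmlns='jabber:client' to=''>"
      "<mynode xmlns='myprotocol:core' version='1.0' type='mytype'>"
      "<parameters><param1>123</param1><param2>456</param2></parameters>"
      "<payload type='plain'>...</payload></mynode>"
      "<x xmlns='jabber:expire' seconds='15'/></message>")

def strip_payload(xml_text):
    """Return the message re-serialized without any <payload> elements."""
    root = ET.fromstring(xml_text)
    # Take a snapshot of the elements first so removal does not disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            # Tags are namespace-qualified, e.g. "{myprotocol:core}payload".
            if child.tag == "payload" or child.tag.endswith("}payload"):
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")

print(strip_payload(s1))
```

Missing or empty <parameters> and a missing <x> need no special handling here, since the code only looks for <payload>. Note that the serialized text may use generated namespace prefixes rather than the original xmlns layout, so the output is an equivalent tree and not a character-for-character copy.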

Personally, I consider regular expressions of any substantial length
and complexity to be bad practice, as they inhibit readability and
maintainability. They are also decidedly non-Zen on at least
"Readability counts" and "Sparse is better than dense".

Intchanter
Daniel Fackrell

P.S. I'm not sure how any of these libraries are implemented yet. I'd
hope they're using a finite state machine tailored to the parsing
task rather than regexes, but even if they do the latter, having
that abstracted out in a mature library with a clean interface is
still a huge win.
 