Velocity Reviews - Computer Hardware Reviews


brackets content regular expression

 
 
netimen
      10-31-2008
I have a text containing brackets (or what is the correct term for
'<>'?). I'd like to match the text at the top level of brackets.

So, I have something like: 'aaaa 123 < 1 aaa < t bbb < a <tt > ff > > 2 >
bbbbb'. How can I match the text between the outermost brackets (' 1 aaa < t
bbb < a <tt > ff > > 2 ')?

P.S. Sorry for my English.
 
 
 
 
 
Paul McGuire
      10-31-2008
On Oct 31, 12:25 pm, netimen wrote:
> I have a text containing brackets (or what is the correct term for
> '<>'?). I'd like to match text in the uppermost level of brackets.
>
> So, I have sth like: 'aaaa 123 < 1 aaa < t bbb < a <tt > ff > > 2 >
> bbbbb'. How to match text between the uppermost brackets ( 1 aaa < t
> bbb < a <tt > ff > > 2 )?
>
> P.S. sorry for my english.


To match opening and closing parens, delimiters, whatever (I refer to
these '<>' as "angle brackets" when talking about them in this
context; otherwise they are just "less than" and "greater than"), you
will need some kind of stack-based parser. You can write your own
without much trouble, or use pyparsing, whose built-ins do most
of the work.

Here is the nestedExpr method:
>>> from pyparsing import nestedExpr
>>> print(nestedExpr('<','>').searchString('aaaa 123 < 1 aaa < t bbb < a <tt > ff > > 2 > bbbbb'))

[[['1', 'aaa', ['t', 'bbb', ['a', ['tt'], 'ff']], '2']]]

Note that the results show not the original nested text, but the
parsed words in a fully nested structure.
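For comparison, a rough standard-library sketch of what nestedExpr produces here: a stack of Python lists turns each top-level `<...>` group into nested lists of words. It is not pyparsing's implementation and reproduces none of nestedExpr's other options; the name `nested_groups` is hypothetical, and the sketch assumes words are separated by whitespace and that every `<` and `>` acts as a bracket.

```python
import re

# A token is either a bracket or a run of characters that are neither
# whitespace nor brackets (so "<tt" splits into "<" and "tt").
TOKEN = re.compile(r"[<>]|[^\s<>]+")


def nested_groups(text):
    """Return each top-level <...> group in text as a nested list of words."""
    groups = []
    stack = []  # lists for the groups that are currently open, innermost last
    for tok in TOKEN.findall(text):
        if tok == "<":
            group = []
            if stack:
                stack[-1].append(group)  # a nested group lives inside its parent
            stack.append(group)
        elif tok == ">":
            if not stack:
                raise ValueError("unmatched '>'")
            group = stack.pop()
            if not stack:
                groups.append(group)  # just closed a top-level group
        elif stack:
            stack[-1].append(tok)  # a word inside the innermost open group
        # words outside every group are skipped
    if stack:
        raise ValueError("unmatched '<'")
    return groups


print(nested_groups("aaaa 123 < 1 aaa < t bbb < a <tt > ff > > 2 > bbbbb"))
# [['1', 'aaa', ['t', 'bbb', ['a', ['tt'], 'ff']], '2']]
```

It returns a plain list of top-level groups, so the result has one fewer level of brackets than the searchString output above.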

If all you want is the highest-level text, then you can wrap your
nestedExpr parser inside a call to originalTextFor:

>>> from pyparsing import originalTextFor
>>> print(originalTextFor(nestedExpr('<','>')).searchString('aaaa 123 < 1 aaa < t bbb < a <tt > ff > > 2 > bbbbb'))

[['< 1 aaa < t bbb < a <tt > ff > > 2 >']]
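Likewise, a minimal sketch of the originalTextFor result without pyparsing (the name `outer_spans` is hypothetical): a regular expression visits only the bracket characters, a depth counter pairs them up, and each top-level group is sliced out of the original string, brackets included. Raising `ValueError` on unbalanced brackets is a choice made for this sketch.

```python
import re

# Matches a single bracket character; everything else is skipped.
BRACKET = re.compile(r"[<>]")


def outer_spans(text):
    """Return the original text of each top-level <...> group, brackets included."""
    spans = []
    depth = 0
    start = None
    for m in BRACKET.finditer(text):
        if m.group() == "<":
            if depth == 0:
                start = m.start()  # first '<' of a new top-level group
            depth += 1
        else:  # '>'
            if depth == 0:
                raise ValueError("unmatched '>' at index %d" % m.start())
            depth -= 1
            if depth == 0:
                spans.append(text[start:m.end()])  # slice includes both brackets
    if depth:
        raise ValueError("group opened at index %d is never closed" % start)
    return spans


print(outer_spans("aaaa 123 < 1 aaa < t bbb < a <tt > ff > > 2 > bbbbb"))
# ['< 1 aaa < t bbb < a <tt > ff > > 2 >']
```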

More on pyparsing at http://pyparsing.wikispaces.com.

-- Paul
 
 
 
 
 
Matimus
      10-31-2008
On Oct 31, 10:25 am, netimen wrote:
> I have a text containing brackets (or what is the correct term for
> '<>'?). I'd like to match text in the uppermost level of brackets.
>
> So, I have sth like: 'aaaa 123 < 1 aaa < t bbb < a <tt > ff > > 2 >
> bbbbb'. How to match text between the uppermost brackets ( 1 aaa < t
> bbb < a <tt > ff > > 2 )?
>
> P.S. sorry for my english.


I think most people call them "angle brackets". Anyway, it should be
easy to match just the outermost brackets:

>>> import re
>>> text = "aaaa 123 < 1 aaa < t bbb < a <tt > ff > > 2 >"
>>> r = re.compile("<(.+)>")
>>> m = r.search(text)
>>> m.group(1)

' 1 aaa < t bbb < a <tt > ff > > 2 '

In this case the regular expression is automatically greedy, matching
the largest area possible, so the match runs to the last '>'. Note,
however, that it won't work if you have something like "<first> <second>":
there it would match "first> <second" as a single group.
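A quick check of that caveat with Python's re module: the greedy `.+` runs from the first `<` to the last `>`, while the non-greedy `.+?` stops at the nearest `>`, which handles flat groups but cuts nested ones short. The sample strings are made up for illustration.

```python
import re

flat = "<first> <second>"
nested = "< 1 aaa < t bbb > 2 >"

# Greedy: one match spanning from the first '<' to the last '>'.
print(re.findall(r"<(.+)>", flat))     # ['first> <second']
# Non-greedy: each match ends at the nearest '>', which works for flat groups...
print(re.findall(r"<(.+?)>", flat))    # ['first', 'second']
# ...but stops too early once groups are nested.
print(re.findall(r"<(.+?)>", nested))  # [' 1 aaa < t bbb ']
```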

Matt
 
 
netimen
      10-31-2008
Thanks, but what if I have several top-level groups and want to match them
one by one:

text = "a < b < O > d > here starts a new group: < e < f > g >"

I want to match first " b < O > d " and then " e < f > g ", but not
" b < O > d > here starts a new group: < e < f > g "


 
 
netimen
      10-31-2008
There may be different levels of nesting:

"a < b < O > d > here starts a new group: < 1 < e < f > g > 2 >
another group: < 3 >"



 
 
bearophileHUGS@lycos.com
      10-31-2008
netimen:
> Thank's but if i have several top-level groups and want them match one
> by one:
> text = "a < b < O > d > here starts a new group: < e < f > g >"


What other requirements do you have? If you list them all at once,
people will write you the code faster.

bye,
Bearophile
 
 
Pierre Quentel
      10-31-2008
On 31 Oct, 20:38, netimen wrote:
> there may be different levels of nesting:
>
> "a < b < O > d > here starts a new group: < 1 < e < f > g > 2 >
> another group: < 3 >"


Hi,

Regular expressions or pyparsing might be overkill for this problem;
you can use a simple algorithm that reads each character, increments a
counter when it finds a < and decrements it when it finds a >. When the
counter goes back to its initial value (zero), you have reached the end
of a top-level group.

Something like:

def top_level(txt):
    level = 0
    start = None
    groups = []
    for i, car in enumerate(txt):
        if car == "<":
            level += 1
            # compare with None: a group may start at index 0
            if start is None:
                start = i
        elif car == ">":
            level -= 1
            if start is not None and level == 0:
                groups.append(txt[start+1:i])
                start = None
    return groups

print(top_level("a < b < 0 > d > < 1 < e < f > g > 2 > < 3 >"))

>> [' b < 0 > d ', ' 1 < e < f > g > 2 ', ' 3 ']


Best,
Pierre
 
 
Matimus
      10-31-2008
On Oct 31, 11:57 am, netimen wrote:
> Thank's but if i have several top-level groups and want them match one
> by one:
>
> text = "a < b < O > d > here starts a new group: < e < f > g >"
>
> I want to match first " b < O > d " and then " e < f > g " but not "
> b < O > d > here starts a new group: < e < f > g "


As far as I know, you can't do that with a regular expression (by
definition, regular expressions aren't recursive). You can use a
regular expression to aid you, but there is no magic expression that
will give it to you for free.
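That holds for nesting of unbounded depth. If the maximum depth is known in advance, though, an ordinary re pattern can be built for it by wrapping the "no brackets" case once per allowed level. A sketch, with a hypothetical helper `nested_pattern(inner_levels)`, where `inner_levels` is the number of bracket levels allowed inside the outermost pair:

```python
import re


def nested_pattern(inner_levels):
    """Compile a pattern for <...> groups with at most inner_levels levels
    of brackets nested inside the outermost pair (hypothetical helper)."""
    body = r"[^<>]*"  # contents with no brackets at all
    for _ in range(inner_levels):
        # allow either a plain character or a complete group of the previous level
        body = r"(?:[^<>]|<%s>)*" % body
    return re.compile(r"<(%s)>" % body)


text = "a < b < O > d > here starts a new group: < 1 < e < f > g > 2 > another group: < 3 >"
print(nested_pattern(2).findall(text))
# [' b < O > d ', ' 1 < e < f > g > 2 ', ' 3 ']
print(nested_pattern(1).findall(text))
# [' b < O > d ', ' e < f > g ', ' 3 ']
```

The pattern grows with every extra level, and input nested deeper than the limit is not rejected: as the second call shows, the search simply finds a smaller, inner group instead. For arbitrary depth, a counter or stack like the ones in this thread is the simpler tool.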

In this case it is actually pretty easy to do it without regular
expressions at all:

>>> text = "a < b < O > d > here starts a new group: < e < f > g >"
>>> def get_nested_strings(text, depth=0):

.... stack = []
.... for i, c in enumerate(text):
.... if c == '<':
.... stack.append(i)
.... elif c == '>':
.... start = stack.pop() + 1
.... if len(stack) == depth:
.... yield text[start:i]
....
>>> for seg in get_nested_strings(text):

.... print seg
....
b < O > d
e < f > g


Matt
 
 
netimen
      11-01-2008
Yeah, I know it's quite simple to do manually. I was just interested in
whether it could be done with regular expressions. Thank you anyway.


 