How to insert string in each match using RegEx iterator

 
 
504crank@gmail.com
      06-10-2009
By what method would a string be inserted at each instance of a RegEx
match?

For example:

string = '123 abc 456 def 789 ghi'
newstring = ' INSERT 123 abc INSERT 456 def INSERT 789 ghi'

Here's the code I started with:

>>> import re
>>> rePatt = re.compile(r'\d+\s')
>>> iterator = rePatt.finditer(string)
>>> count = 0
>>> for match in iterator:
...     if count < 1:
...         print string[0:match.start()] + ' INSERT ' + string[match.start():match.end()]
...     elif count >= 1:
...         print ' INSERT ' + string[match.start():match.end()]
...     count = count + 1

My code returns an empty string.

I'm new to Python, but I'm finding it really enjoyable (with the
exception of this challenging puzzle).

Thanks in advance.
 
Roy Smith
      06-10-2009
In article
<cb258e51-8c54-4b33-9b88-f23fc70a3...@z14g2000yqa.googlegroups.com>,
"504cr...@gmail.com" <504cr...@gmail.com> wrote:

> By what method would a string be inserted at each instance of a RegEx
> match?
>
> For example:
>
> string = '123 abc 456 def 789 ghi'
> newstring = ' INSERT 123 abc INSERT 456 def INSERT 789 ghi'


If you want to do what I think you are saying, you should be looking at the
join() string method. I'm thinking something along the lines of:

groups = match_object.groups()
newstring = " INSERT ".join(groups)
 
504crank@gmail.com
      06-10-2009
On Jun 9, 11:19 pm, Roy Smith <r...@panix.com> wrote:
> In article
> <cb258e51-8c54-4b33-9b88-f23fc70a3...@z14g2000yqa.googlegroups.com>,
> "504cr...@gmail.com" <504cr...@gmail.com> wrote:
>
> > By what method would a string be inserted at each instance of a RegEx
> > match?
> >
> > For example:
> >
> > string = '123 abc 456 def 789 ghi'
> > newstring = ' INSERT 123 abc INSERT 456 def INSERT 789 ghi'
>
> If you want to do what I think you are saying, you should be looking at the
> join() string method. I'm thinking something along the lines of:
>
> groups = match_object.groups()
> newstring = " INSERT ".join(groups)


Fast answer, Roy. Thanks. That would be a graceful solution if it
works. I'll give it a try and post a solution.

Meanwhile, I know there's a logical problem with the way I was
concatenating strings in the iterator loop.

Here's a single instance example of what I'm trying to do:

>>> string = 'abc 123 def 456 ghi 789'
>>> match = rePatt.search(string)
>>> print string[0:match.start()] + 'INSERT ' + string[match.end():len(string)]

abc INSERT def 456 ghi 789
 
504crank@gmail.com
      06-10-2009
On Jun 9, 11:35 pm, "504cr...@gmail.com" <504cr...@gmail.com> wrote:
> On Jun 9, 11:19 pm, Roy Smith <r...@panix.com> wrote:
>
> > In article
> > <cb258e51-8c54-4b33-9b88-f23fc70a3...@z14g2000yqa.googlegroups.com>,
> > "504cr...@gmail.com" <504cr...@gmail.com> wrote:
> >
> > > By what method would a string be inserted at each instance of a RegEx
> > > match?
> > >
> > > For example:
> > >
> > > string = '123 abc 456 def 789 ghi'
> > > newstring = ' INSERT 123 abc INSERT 456 def INSERT 789 ghi'
> >
> > If you want to do what I think you are saying, you should be looking at the
> > join() string method. I'm thinking something along the lines of:
> >
> > groups = match_object.groups()
> > newstring = " INSERT ".join(groups)
>
> Fast answer, Roy. Thanks. That would be a graceful solution if it
> works. I'll give it a try and post a solution.
>
> Meanwhile, I know there's a logical problem with the way I was
> concatenating strings in the iterator loop.
>
> Here's a single instance example of what I'm trying to do:
>
> >>> string = 'abc 123 def 456 ghi 789'
> >>> match = rePatt.search(string)
> >>> print string[0:match.start()] + 'INSERT ' + string[match.end():len(string)]
>
> abc INSERT def 456 ghi 789


Thanks Roy. A little closer to a solution. I'm still processing how to
step forward, but this is a good start:

>>> string = 'abc 123 def 456 ghi 789'
>>> rePatt = re.compile('\s\d+\s')
>>> foundGroup = rePatt.findall(string)
>>> newstring = ' INSERT '.join(foundGroup)
>>> print newstring

 123  INSERT  456 

What I really want to do is return the full string, not just the
matches -- concatenated around the ' INSERT ' string.
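
One way to keep the whole string while walking the matches is to copy the text between matches as well as the matches themselves. The following is only a sketch in Python 3 syntax (the transcripts in this thread use Python 2 print statements), and the helper name insert_before_matches is made up for illustration:

```python
import re

def insert_before_matches(pattern, text, marker):
    """Return text with marker inserted in front of every match of pattern."""
    pieces = []
    last_end = 0
    for match in re.finditer(pattern, text):
        # copy the unmatched text since the previous match, then the marker,
        # then the matched text itself
        pieces.append(text[last_end:match.start()])
        pieces.append(marker)
        pieces.append(match.group())
        last_end = match.end()
    # copy whatever follows the final match
    pieces.append(text[last_end:])
    return ''.join(pieces)

result = insert_before_matches(r'\d+\s', '123 abc 456 def 789 ghi', 'INSERT ')
print(result)
```

Collecting the pieces in a list and joining once at the end avoids the repeated string concatenation of the earlier loop, and a string with no matches comes back unchanged.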
 
Peter Otten
      06-10-2009
504cr...@gmail.com wrote:

> By what method would a string be inserted at each instance of a RegEx
> match?
>
> For example:
>
> string = '123 abc 456 def 789 ghi'
> newstring = ' INSERT 123 abc INSERT 456 def INSERT 789 ghi'


Have a look at re.sub():

>>> s = '123 abc 456 def 789 ghi'
>>> re.compile(r"(\d+\s)").sub(r"INSERT \1", s)

'INSERT 123 abc INSERT 456 def INSERT 789 ghi'

Peter
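
For comparison, here are two other spellings of the same re.sub() substitution that should give the same result. This is a sketch in Python 3 syntax, and the function name add_marker is hypothetical:

```python
import re

s = '123 abc 456 def 789 ghi'

# \g<0> stands for the whole match, so no capturing group is needed
with_template = re.sub(r'\d+\s', r'INSERT \g<0>', s)

# a callable receives each match object and returns the replacement text
def add_marker(match):
    return 'INSERT ' + match.group(0)

with_function = re.sub(r'\d+\s', add_marker, s)

print(with_template)
print(with_function)
```

The callable form is handy when the inserted text has to depend on what was matched.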

 
Paul McGuire
      06-10-2009
On Jun 9, 11:13 pm, "504cr...@gmail.com" <504cr...@gmail.com> wrote:
> By what method would a string be inserted at each instance of a RegEx
> match?
>


Some might say that using a parsing library for this problem is
overkill, but let me just put this out there as another data point for
you. Pyparsing (http://pyparsing.wikispaces.com) supports callbacks
that allow you to embellish the matched tokens, and create a new
string containing the modified text for each match of a pyparsing
expression. Hmm, maybe the code example is easier to follow than the
explanation...


from pyparsing import Word, nums, Regex

# an integer is a 'word' composed of numeric characters
integer = Word(nums)

# or use this if you prefer
integer = Regex(r'\d+')

# attach a parse action to prefix 'INSERT ' before the matched token
integer.setParseAction(lambda tokens: "INSERT " + tokens[0])

# use transformString to search through the input, applying the
# parse action to all matches of the given expression
test = '123 abc 456 def 789 ghi'
print integer.transformString(test)

# prints
# INSERT 123 abc INSERT 456 def INSERT 789 ghi


I offer this because often the simple examples that get posted are
just the barest tip of the iceberg of what the poster eventually plans
to tackle.

Good luck in your Pythonic adventure!
-- Paul
 
Brian D
      06-10-2009
On Jun 10, 5:17 am, Paul McGuire <pt...@austin.rr.com> wrote:
> On Jun 9, 11:13 pm, "504cr...@gmail.com" <504cr...@gmail.com> wrote:
>
> > By what method would a string be inserted at each instance of a RegEx
> > match?
>
> Some might say that using a parsing library for this problem is
> overkill, but let me just put this out there as another data point for
> you. Pyparsing (http://pyparsing.wikispaces.com) supports callbacks
> that allow you to embellish the matched tokens, and create a new
> string containing the modified text for each match of a pyparsing
> expression. Hmm, maybe the code example is easier to follow than the
> explanation...
>
> from pyparsing import Word, nums, Regex
>
> # an integer is a 'word' composed of numeric characters
> integer = Word(nums)
>
> # or use this if you prefer
> integer = Regex(r'\d+')
>
> # attach a parse action to prefix 'INSERT ' before the matched token
> integer.setParseAction(lambda tokens: "INSERT " + tokens[0])
>
> # use transformString to search through the input, applying the
> # parse action to all matches of the given expression
> test = '123 abc 456 def 789 ghi'
> print integer.transformString(test)
>
> # prints
> # INSERT 123 abc INSERT 456 def INSERT 789 ghi
>
> I offer this because often the simple examples that get posted are
> just the barest tip of the iceberg of what the poster eventually plans
> to tackle.
>
> Good luck in your Pythonic adventure!
> -- Paul


Thanks for all of the instant feedback. I have enumerated three
responses below:

First response:

Peter,

I wonder if you (or anyone else) might attempt a different explanation
for the use of the special sequence '\1' in the RegEx syntax.

The Python documentation explains:

\number
Matches the contents of the group of the same number. Groups are
numbered starting from 1. For example, (.+) \1 matches 'the the' or
'55 55', but not 'the end' (note the space after the group). This
special sequence can only be used to match one of the first 99 groups.
If the first digit of number is 0, or number is 3 octal digits long,
it will not be interpreted as a group match, but as the character with
octal value number. Inside the '[' and ']' of a character class, all
numeric escapes are treated as characters.

In practice, this appears to be the key device in your clever
solution:

>>> re.compile(r"(\d+)").sub(r"INSERT \1", string)

'abc INSERT 123 def INSERT 456 ghi INSERT 789'

>>> re.compile(r"(\d+)").sub(r"INSERT ", string)

'abc INSERT  def INSERT  ghi INSERT '

I don't, however, precisely understand what is meant by "the group of
the same number" -- or maybe I do, but it isn't explicit. Is this just
a shorthand reference to match.group(1) -- if that were valid --
implying that the group match result is printed in the compile
execution?


Second response:

I've encountered a problem with my RegEx learning curve which I'll be
posting in a new thread -- how to escape hash characters # in strings
being matched, e.g.:

>>> string = re.escape('123#456')
>>> match = re.match('\d+', string)
>>> print match

<_sre.SRE_Match object at 0x00A6A800>
>>> print match.group()

123
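
A small sketch of what may be going on here, in Python 3 syntax: re.escape() is meant for literal text that will become part of a pattern, not for the string being searched. The variable names below are illustrative only:

```python
import re

subject = '123#456'

# escape a literal fragment before embedding it in a pattern
literal_hash = re.escape('#')
match = re.search(r'\d+' + literal_hash + r'\d+', subject)
print(match.group())

# escaping the subject instead changes the data being searched
escaped_subject = re.escape(subject)
print(escaped_subject)
```

Outside of re.VERBOSE mode a plain '#' in a pattern matches itself, so the escape is only strictly needed for verbose patterns, but it does no harm elsewhere.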


Third response:

Paul,

Thanks for referring me to the Pyparsing module. I'm thoroughly
enjoying Python, but I'm not prepared right now to say I've mastered
the Pyparsing module. As I continue my work, however, I'll be tackling
the problem of parsing addresses, exactly as the Pyparsing module
example illustrates. I'm sure I'll want to use it then.
 
 
Peter Otten
      06-10-2009
504cr...@gmail.com wrote:

> I wonder if you (or anyone else) might attempt a different explanation
> for the use of the special sequence '\1' in the RegEx syntax.
>
> The Python documentation explains:
>
> \number
> Matches the contents of the group of the same number. Groups are
> numbered starting from 1. For example, (.+) \1 matches 'the the' or
> '55 55', but not 'the end' (note the space after the group). This
> special sequence can only be used to match one of the first 99 groups.
> If the first digit of number is 0, or number is 3 octal digits long,
> it will not be interpreted as a group match, but as the character with
> octal value number. Inside the '[' and ']' of a character class, all
> numeric escapes are treated as characters.
>
> In practice, this appears to be the key to the key device to your
> clever solution:
>
> >>> re.compile(r"(\d+)").sub(r"INSERT \1", string)
>
> 'abc INSERT 123 def INSERT 456 ghi INSERT 789'
>
> >>> re.compile(r"(\d+)").sub(r"INSERT ", string)
>
> 'abc INSERT  def INSERT  ghi INSERT '
>
> I don't, however, precisely understand what is meant by "the group of
> the same number" -- or maybe I do, but it isn't explicit. Is this just
> a shorthand reference to match.group(1) -- if that were valid --
> implying that the group match result is printed in the compile
> execution?


If I understand you correctly you are right. Another example:

>>> re.compile(r"([a-z]+)(\d+)").sub(r"number=\2 word=\1", "a1 zzz42")

'number=1 word=a number=42 word=zzz'

For every match of "[a-z]+\d+" in the original string, "\1" in
"number=\2 word=\1" is replaced with the actual match for "[a-z]+" and
"\2" is replaced with the actual match for "\d+".

The result, e.g. "number=1 word=a", is then used to replace the actual
match for group 0, i.e. "a1" in the example.

Peter
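
The link between "\1" in the replacement template and match.group(1) can be made explicit by writing the same substitution with a callable. This is a sketch in Python 3 syntax, and the function name swap is hypothetical:

```python
import re

pattern = re.compile(r'([a-z]+)(\d+)')
text = 'a1 zzz42'

# template form: \1 and \2 are filled in from each match's groups
by_template = pattern.sub(r'number=\2 word=\1', text)

# callable form: the same substitution written out with match.group()
def swap(match):
    return 'number=' + match.group(2) + ' word=' + match.group(1)

by_function = pattern.sub(swap, text)

# match.expand() applies a template to a single match object
first_only = pattern.search(text).expand(r'number=\2 word=\1')

print(by_template)
print(by_function)
print(first_only)
```

The two sub() calls should produce identical strings, and expand() shows the template being filled in for the first match alone.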


 
504crank@gmail.com
      06-11-2009
On Jun 10, 10:13 am, Peter Otten <__pete...@web.de> wrote:
> 504cr...@gmail.com wrote:
> > I wonder if you (or anyone else) might attempt a different explanation
> > for the use of the special sequence '\1' in the RegEx syntax.

>
> > The Python documentation explains:

>
> > \number
> > Matches the contents of the group of the same number. Groups are
> > numbered starting from 1. For example, (.+) \1 matches 'the the' or
> > '55 55', but not 'the end' (note the space after the group). This
> > special sequence can only be used to match one of the first 99 groups.
> > If the first digit of number is 0, or number is 3 octal digits long,
> > it will not be interpreted as a group match, but as the character with
> > octal value number. Inside the '[' and ']' of a character class, all
> > numeric escapes are treated as characters.

>
> > In practice, this appears to be the key to the key device to your
> > clever solution:

>
> >>>> re.compile(r"(\d+)").sub(r"INSERT \1", string)

>
> > 'abc INSERT 123 def INSERT 456 ghi INSERT 789'

>
> >>>> re.compile(r"(\d+)").sub(r"INSERT ", string)

>
> > 'abc INSERT  def INSERT  ghi INSERT '

>
> > I don't, however, precisely understand what is meant by "the group of
> > the same number" -- or maybe I do, but it isn't explicit. Is this just
> > a shorthand reference to match.group(1) -- if that were valid --
> > implying that the group match result is printed in the compile
> > execution?

>
> If I understand you correctly you are right. Another example:
>
> >>> re.compile(r"([a-z]+)(\d+)").sub(r"number=\2 word=\1", "a1 zzz42")

>
> 'number=1 word=a number=42 word=zzz'
>
> For every match of "[a-z]+\d+" in the original string "\1" in
> "number=\2 word=\1" is replaced with the actual match for "[a-z]+" and
> "\2" is replaced with the actual match for "\d+".
>
> The result, e. g. "number=1 word=a", is then used to replace the actual
> match for group 0, i. e. "a1" in the example.
>
> Peter


Wow! That is so cool. I had to process it for a little while to get
it.

>>> s = '111bbb333'
>>> re.compile('(\d+)([b]+)(\d+)').sub(r'First string: \1 Second string: \2 Third string: \3', s)

'First string: 111 Second string: bbb Third string: 333'
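
The same three-group substitution can also be written with named groups, which may be easier to read once a pattern has several groups. This is a sketch in Python 3 syntax, and the group names first, second and third are made up for illustration:

```python
import re

s = '111bbb333'

# (?P<name>...) names a group; \g<name> refers to it in the replacement
pattern = re.compile(r'(?P<first>\d+)(?P<second>b+)(?P<third>\d+)')
result = pattern.sub(
    r'First string: \g<first> Second string: \g<second> Third string: \g<third>',
    s)
print(result)
```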

MRI scans would no doubt reveal that people who attain a mastery of
RegEx expressions must have highly developed areas of the brain. I
wonder where the RegEx part of the brain might be located.

That was a really clever teaching device. I really appreciate you
taking the time to post it, Peter. I'm definitely getting a schooling
on this list.

Thanks!
 