python replace/sub/wildcard/regex issue

 
 
tom
      01-19-2010
hi...

trying to figure out how to solve what should be an easy python/regex/
wildcard/replace issue.

i've tried a number of different approaches.. so i must be missing
something...

my initial sample text are:

Soo Choi</span>LONGEDITBOX">Apryl Berney
Soo Choi</span>LONGEDITBOX">Joel Franks
Joel Franks</span>GEDITBOX">Alexander Yamato

and i'm trying to get

Soo Choi foo Apryl Berney
Soo Choi foo Joel Franks
Joel Franks foo Alexander Yamato

the issue i'm facing.. is how to start at "</" and end at '">' and
substitute inclusive of the stuff inside the regex...

i've tried derivations of

name=re.sub("</s[^>]*\">"," foo ",name)

but i'm missing something...

thoughts... thanks

tom
 
 
Chris Rebert
      01-19-2010
On Mon, Jan 18, 2010 at 8:04 PM, tom <(E-Mail Removed)> wrote:
> hi...
>
> trying to figure out how to solve what should be an easy python/regex/
> wildcard/replace issue.
>
> i've tried a number of different approaches.. so i must be missing
> something...
>
> my initial sample text are:
>
> Soo Choi</span>LONGEDITBOX">Apryl Berney
> Soo Choi</span>LONGEDITBOX">Joel Franks
> Joel Franks</span>GEDITBOX">Alexander Yamato
>
> and i'm trying to get
>
> Soo Choi foo Apryl Berney
> Soo Choi foo Joel Franks
> Joel Franks foo Alexander Yamato
>
> the issue i'm facing.. is how to start at "</" and end at '">' and
> substitute inclusive of the stuff inside the regex...
>
> i've tried derivations of
>
> name=re.sub("</s[^>]*\">"," foo ",name)
>
> but i'm missing something...
>
> thoughts... thanks


"Some people, when confronted with a problem, think 'I know, I'll use
regular expressions.' Now they have two problems."

Assuming your sample text is representative of all your text:

new_text = "\n".join(line[:line.index('<')] +
line[line.rindex('>')+1:] for line in your_text.split('\n'))

Cheers,
Chris
--
http://blog.rebertia.com
 
 
alex23
      01-19-2010
On Jan 19, 2:04 pm, tom <(E-Mail Removed)> wrote:
> trying to figure out how to solve what should be an easy python/regex/
> wildcard/replace issue.
> but i'm missing something...


Well, some would say you've missed the most obvious solution of _not_
using regexps.

I'd probably do it via string methods wrapped up in a helper function:

>>> def extract(text):
...     first, rest = text.split('<', 1)
...     ignore, last = rest.rsplit('>', 1)
...     return '%s foo %s' % (first, last)
...
>>> extract('Soo Choi</span>LONGEDITBOX">Apryl Berney')
'Soo Choi foo Apryl Berney'
>>> extract('Soo Choi</span>LONGEDITBOX">Joel Franks')
'Soo Choi foo Joel Franks'
>>> extract('Joel Franks</span>GEDITBOX">Alexander Yamato')
'Joel Franks foo Alexander Yamato'
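For lines that might not contain any markup at all, the same idea can be sketched with str.partition and str.rpartition, which return empty separators instead of raising when the delimiter is missing. The name extract_safe and the filler parameter are made up for this illustration; they are not from the thread.

```python
def extract_safe(text, filler=" foo "):
    """Replace everything from the first '<' to the last '>' with filler.

    Hypothetical variant of extract(): lines without both delimiters
    are returned unchanged instead of raising ValueError.
    """
    first, lt, rest = text.partition("<")
    ignored, gt, last = rest.rpartition(">")
    if not lt or not gt:
        # No '<' found, or no '>' after it: nothing to strip.
        return text
    return first + filler + last


print(extract_safe('Soo Choi</span>LONGEDITBOX">Apryl Berney'))
print(extract_safe("no markup here"))
```

Whether silently passing such lines through is the right behaviour depends on the data; raising, as extract() does, may be preferable when every line is expected to contain the markup.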



 
Chris Rebert
      01-19-2010
On Mon, Jan 18, 2010 at 8:31 PM, Chris Rebert <(E-Mail Removed)> wrote:
>
> "Some people, when confronted with a problem, think 'I know, I'll use
> regular expressions.' Now they have two problems."
>
> Assuming your sample text is representative of all your text:
>
> new_text = "\n".join(line[:line.index('<')] + line[line.rindex('>')+1:] for line in your_text.split('\n'))


Erm, remembering to intersperse the "foo" (should be all 1-line, bloody Gmail):
new_text = "\n".join(line[:line.index('<')] + " foo " +
line[line.rindex('>')+1:] for line in your_text.split('\n'))

Or just use alex23's method, which seems all-round superior.
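For readers who find the one-liner dense, here is the same slicing unpacked into a small helper. This is only a sketch: strip_markup and the filler parameter are names invented here, and your_text is rebuilt from the three sample lines in the original post.

```python
def strip_markup(line, filler=" foo "):
    # Index of the first '<'; str.index raises ValueError if it is absent.
    start = line.index("<")
    # Index of the last '>' on the line.
    end = line.rindex(">")
    # Keep the text before '<' and after '>', with the filler in between.
    return line[:start] + filler + line[end + 1:]


your_text = "\n".join([
    'Soo Choi</span>LONGEDITBOX">Apryl Berney',
    'Soo Choi</span>LONGEDITBOX">Joel Franks',
    'Joel Franks</span>GEDITBOX">Alexander Yamato',
])

new_text = "\n".join(strip_markup(line) for line in your_text.splitlines())
print(new_text)
```

Like the original one-liner, this raises ValueError on any line that lacks a '<' or a '>'.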

Cheers,
Chris
 
 
dippim
      01-19-2010
On Jan 18, 11:04 pm, tom <(E-Mail Removed)> wrote:
> hi...
>
> trying to figure out how to solve what should be an easy python/regex/
> wildcard/replace issue.
>
> i've tried a number of different approaches.. so i must be missing
> something...
>
> my initial sample text are:
>
> Soo Choi</span>LONGEDITBOX">Apryl Berney
> Soo Choi</span>LONGEDITBOX">Joel Franks
> Joel Franks</span>GEDITBOX">Alexander Yamato
>
> and i'm trying to get
>
> Soo Choi foo Apryl Berney
> Soo Choi foo Joel Franks
> Joel Franks foo Alexander Yamato
>
> the issue i'm facing.. is how to start at "</" and end at '">' and
> substitute inclusive of the stuff inside the regex...
>
> i've tried derivations of
>
> name=re.sub("</s[^>]*\">"," foo ",name)
>
> but i'm missing something...
>
> thoughts... thanks
>
> tom


The problem here is that </s matches itself correctly. However, [^>]*
consumes anything that's not > and stops at the first >. So [^>]*
consumes "pan" in each case, and the regex then tries to match ", but
the next character is the > that closes </span>, so the match fails. It
never makes it to the second >.
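The failure described above can be checked directly. The sketch below also tries one possible minimal repair, which is my own assumption rather than anything proposed in the thread: swap s[^>]* for [^"]* so the pattern runs from </ up to the first "> instead of stopping at the first >.

```python
import re

samples = [
    'Soo Choi</span>LONGEDITBOX">Apryl Berney',
    'Soo Choi</span>LONGEDITBOX">Joel Franks',
    'Joel Franks</span>GEDITBOX">Alexander Yamato',
]

# The pattern from the original post: [^>]* stops at the '>' of </span>,
# where a '"' is required, so nothing matches.
original = re.compile(r'</s[^>]*">')
original_hits = [original.search(s) for s in samples]

# Assumed repair: consume everything that is not '"', then require '">'.
repaired = re.compile(r'</[^"]*">')
results = [repaired.sub(" foo ", s) for s in samples]

print(original_hits)
print(results)
```

The repaired pattern relies on there being no other double quote between </ and the closing ">, which holds for the three samples but may not hold for other input.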

I agree with Chris Rebert: regexes are dangerous because the number of
possible cases where you can match isn't always clear (see the above
explanation). Also, if the number of comparisons you have to do
isn't high, they can be inefficient. However, for your limited set of
examples the following should work:

import re

aList = ['Soo Choi</span>LONGEDITBOX">Apryl Berney',
         'Soo Choi</span>LONGEDITBOX">Joel Franks',
         'Joel Franks</span>GEDITBOX">Alexander Yamato']

# Greedy: matches from the first '<' to the last '>' in each string.
matcher = re.compile(r"<[\w\W]*>")

newList = []
for x in aList:
    newList.append(matcher.sub(" foo ", x))

print(newList)
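One caveat with the pattern above: [\w\W]* is greedy, so if a string ever contained two runs of markup, everything between the first < and the last > would be replaced in one go. The input below is hypothetical (nothing like it appears in the sample data), and the non-greedy alternative is just one way to handle it.

```python
import re

# Hypothetical line with two markup runs; not from the original samples.
line = 'A</span>X">B</span>Y">C'

# Greedy pattern from the post: swallows "B" along with the markup.
greedy = re.sub(r"<[\w\W]*>", " foo ", line)

# Non-greedy alternative: each match ends at the nearest '">'.
lazy = re.sub(r'</.*?">', " foo ", line)

print(greedy)
print(lazy)
```

For the three sample lines both patterns give the same result, so this only matters if the real data is less regular than the samples.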

David
 