Velocity Reviews - Computer Hardware Reviews

Re: substitution

 
 
Iain King
      01-18-2010
On Jan 18, 10:21 am, superpollo <(E-Mail Removed)> wrote:
> superpollo wrote:
>
> > hi.
>
> > what is the most pythonic way to substitute substrings?
>
> > eg: i want to apply:
>
> > foo --> bar
> > baz --> quux
> > quuux --> foo
>
> > so that:
>
> > fooxxxbazyyyquuux --> barxxxquuxyyyfoo
>
> > bye
>
> let me explain better:
>
> say the subs are:
>
> quuux --> foo
> foo --> bar
> baz --> quux
>
> then i cannot apply the subs in sequence (say, .replace() in a loop),
> otherwise:
>
> fooxxxbazyyyquuux --> fooxxxbazyyyfoo --> barxxxbazyyybar -->
> barxxxquuxyyybar
>
> not as intended...
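To make the failure concrete, here is a minimal sketch (Python 3; the helper name `sequential_replace` is hypothetical, not from the thread) that applies the pairs one after another with `str.replace()`. Each pass rescans the text produced by the previous one, so the "foo" inserted for "quuux" is turned into "bar" by the next pair.

```python
def sequential_replace(string, subs):
    # Naive approach: apply each (old, new) pair to the whole string in turn.
    # Later pairs also see text that earlier pairs inserted, which is the bug.
    for old, new in subs:
        string = string.replace(old, new)
    return string

subs = [("quuux", "foo"), ("foo", "bar"), ("baz", "quux")]
print(sequential_replace("fooxxxbazyyyquuux", subs))  # barxxxquuxyyybar, not barxxxquuxyyyfoo
```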



Not sure if it's the most pythonic, but I'd probably do it like this:

def token_replace(string, subs):
    subs = dict(subs)
    # Map each search string to an integer token and back again.
    tokens = {}
    for i, sub in enumerate(subs):
        tokens[sub] = i
        tokens[i] = sub
    # Split the text on each search string in turn, leaving an integer
    # token where it was found; tokens are never split again.
    current = [string]
    for sub in subs:
        new = []
        for piece in current:
            if isinstance(piece, str):
                chunks = piece.split(sub)
                new.append(chunks[0])
                for chunk in chunks[1:]:
                    new.append(tokens[sub])
                    new.append(chunk)
            else:
                new.append(piece)
        current = new
    # Swap every token for its replacement and reassemble the string.
    output = []
    for piece in current:
        if isinstance(piece, str):
            output.append(piece)
        else:
            output.append(subs[tokens[piece]])
    return ''.join(output)

>>> token_replace("fooxxxbazyyyquuux", [("quuux", "foo"), ("foo", "bar"), ("baz", "quux")])
'barxxxquuxyyyfoo'
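For comparison, a common single-pass alternative (a different technique from the one posted here) builds one regular expression that alternates over all the escaped search strings and lets `re.sub()` call a function for each match. The text inserted for a match is never rescanned, so chained substitutions cannot happen. The name `regex_replace` is hypothetical, and the sketch assumes non-empty search strings.

```python
import re

def regex_replace(string, subs):
    # Hypothetical helper: subs is an iterable of (find, replace) pairs.
    table = dict(subs)
    if not table:
        return string
    # Longest search strings first, so a longer match wins over a shorter
    # one that starts at the same position.
    alternatives = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in alternatives))
    # One left-to-right scan; each match is replaced via the lookup table
    # and the inserted text is never searched again.
    return pattern.sub(lambda m: table[m.group(0)], string)

subs = [("quuux", "foo"), ("foo", "bar"), ("baz", "quux")]
print(regex_replace("fooxxxbazyyyquuux", subs))
```

One design difference worth noting: sorting by length makes the longest match win at any position, whereas the token approach gives priority by the order in which it processes the search strings, so the two can disagree when search strings overlap.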

I'm sure someone could whittle that down to a handful of list comps...
Iain
 
Iain King
      01-18-2010
On Jan 18, 12:41 pm, Iain King <(E-Mail Removed)> wrote:
> [snip]


Slightly better (lets you have overlapping search strings, used in the
order they are fed in):

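def token_replace(string, subs):
    tokens = {}
    if isinstance(subs, dict):
        # Dict input: keys are search strings, values are replacements
        # (processed in dict iteration order).
        for i, sub in enumerate(subs):
            tokens[sub] = i
            tokens[i] = subs[sub]
    else:
        # Sequence of (find, replace) pairs: keep the caller's order so
        # that earlier pairs take priority over later, overlapping ones.
        s = []
        for i, (k, v) in enumerate(subs):
            tokens[k] = i
            tokens[i] = v
            s.append(k)
        subs = s
    current = [string]
    for sub in subs:
        new = []
        for piece in current:
            if isinstance(piece, str):
                chunks = piece.split(sub)
                new.append(chunks[0])
                for chunk in chunks[1:]:
                    new.append(tokens[sub])
                    new.append(chunk)
            else:
                new.append(piece)
        current = new
    # Integer tokens now map straight to their replacement strings.
    output = []
    for piece in current:
        if isinstance(piece, str):
            output.append(piece)
        else:
            output.append(tokens[piece])
    return ''.join(output)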
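The order-sensitive behaviour of this version can also be expressed recursively in a few lines. A sketch (Python 3), assuming `subs` is a sequence of (find, replace) pairs with non-empty search strings (pass `d.items()` for a dict); `ordered_replace` is a hypothetical name. It splits on the first search string, processes the gaps with the remaining pairs only, and joins the results back with the first replacement.

```python
def ordered_replace(string, subs):
    # Hypothetical helper: subs is a sequence of (find, replace) pairs,
    # earlier pairs taking priority; search strings must be non-empty.
    subs = list(subs)
    if not subs:
        return string
    (find, replace), rest = subs[0], subs[1:]
    # Split on the highest-priority search string, handle each gap with the
    # remaining pairs only, then rejoin the gaps with this pair's replacement.
    # Replacement text is never rescanned.
    return replace.join(ordered_replace(piece, rest) for piece in string.split(find))

print(ordered_replace("fooxxxbazyyyquuux",
                      [("quuux", "foo"), ("foo", "bar"), ("baz", "quux")]))
print(ordered_replace("abc", [("bc", "X"), ("ab", "Y")]))
print(ordered_replace("abc", [("ab", "Y"), ("bc", "X")]))
```

With the overlapping search strings "bc" and "ab", whichever pair is listed first claims the shared "b", giving "aX" in the first case and "Yc" in the second, the same result the token-based function above produces for those inputs.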
 
Peter Otten
      01-18-2010
Iain King wrote:

> Not sure if it's the most pythonic, but I'd probably do it like this:
> [snip]
> I'm sure someone could whittle that down to a handful of list comps...


I tried, but failed:

def join(chunks, separator):
    # Yield the chunks with separator between consecutive ones.
    chunks = iter(chunks)
    yield next(chunks)
    for chunk in chunks:
        yield separator
        yield chunk

def token_replace(string, subs):
    tokens = {}

    current = [string]
    for i, (find, replace) in enumerate(subs):
        tokens[i] = replace
        new = []
        for piece in current:
            if piece in tokens:
                # Already a token: never search inside it again.
                new.append(piece)
            else:
                new.extend(join(piece.split(find), i))
        current = new

    return ''.join(tokens.get(piece, piece) for piece in current)

You could replace the inner loop with sum(..., []), but that would be really
ugly.
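On the `sum(..., [])` idea: `sum()` builds a new list for every addition, so the flattening cost grows roughly quadratically with the number of pieces. Below is a sketch (Python 3) of Peter's function with the inner loop replaced by `itertools.chain.from_iterable()` instead, a swapped-in technique rather than anything from the post, keeping his `join()` helper unchanged.

```python
from itertools import chain

def join(chunks, separator):
    # Peter's helper: yield the chunks with separator between neighbours.
    chunks = iter(chunks)
    yield next(chunks)
    for chunk in chunks:
        yield separator
        yield chunk

def token_replace(string, subs):
    tokens = {}
    current = [string]
    for i, (find, replace) in enumerate(subs):
        tokens[i] = replace
        # Existing tokens pass through untouched; text pieces are split on
        # `find` with the integer token i in each gap. chain.from_iterable
        # flattens the per-piece results in a single linear pass.
        current = list(chain.from_iterable(
            [piece] if piece in tokens else join(piece.split(find), i)
            for piece in current
        ))
    return "".join(tokens.get(piece, piece) for piece in current)

print(token_replace("fooxxxbazyyyquuux",
                    [("quuux", "foo"), ("foo", "bar"), ("baz", "quux")]))
```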

Peter
 
Dennis Lee Bieber
      01-19-2010
On Mon, 18 Jan 2010 14:43:46 +0100, superpollo <(E-Mail Removed)>
declaimed the following in gmane.comp.python.general:

>
> i guess that the algorithm would be easier if it was known in advance
> that the string to substitute must have some specific property, say:
>
> 1) they all must start with "XYZ"
> 2) they all have the same length N (e.g. 5)
>

That now seems to conflict with your previous sample where old=>new
terms were different lengths.

The original description is one in which I'd probably have done a
series of .split()/.join() operations, using some sort of marker string
that cannot occur in the original input to hold the position of each
"old" (repeating for each "old", with a unique marker per substitution),
then repeated the .split()/.join() pass to replace the markers with the
proper "new" strings.
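The marker idea described above can be sketched as follows. This is one reading of the description, not Dennis's code: the helper name `marker_replace` is hypothetical, it targets Python 3, and it assumes that no character in the Unicode Private Use Area (U+E000 to U+F8FF) occurs in the input or in any search or replacement string, so one such character per substitution can serve as a marker (which limits it to 6400 substitutions).

```python
def marker_replace(text, subs):
    # Hypothetical helper. Assumes no Private Use Area characters
    # (U+E000..U+F8FF) appear in the input, search or replacement strings.
    subs = list(subs)
    if len(subs) > 0xF8FF - 0xE000 + 1:
        raise ValueError("too many substitutions for single-character markers")
    markers = [chr(0xE000 + i) for i in range(len(subs))]
    # Pass 1: split/join each "old" string out, leaving its unique marker.
    for (old, _new), marker in zip(subs, markers):
        text = marker.join(text.split(old))
    # Pass 2: split/join each marker out, leaving the proper "new" string.
    for (_old, new), marker in zip(subs, markers):
        text = new.join(text.split(marker))
    return text

print(marker_replace("fooxxxbazyyyquuux",
                     [("quuux", "foo"), ("foo", "bar"), ("baz", "quux")]))
```

Because the markers cannot be matched by any search string, the "foo" produced for "quuux" only appears in pass 2 and is never turned into "bar".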

So how do you combine item 1 above with the prior multiple
replacements?

-=-=-=-=-=-=-=-=-

INPUT = "qweXYZ12asdXYZ1345XYZ"
OLD = "XYZ"
NEW = "IWAS"
MINLEN = 5

res = []
oldlen = len(OLD)
# Characters that must follow OLD for the whole term to reach MINLEN.
tail = MINLEN - oldlen
parts = INPUT.split(OLD)
res.append(parts[0])
for term in parts[1:]:
    # Replace OLD only if enough text follows before the next OLD (or the end).
    if len(term) >= tail:
        res.append(NEW + term)
    else:
        res.append(OLD + term)
output = "".join(res)

print("'%s'" % output)
-=-=-=-=-=-=-=-=-=-
'qweIWAS12asdIWAS1345XYZ'
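The same conditional rule can also be written as a single `re.sub()` call with a lookahead, a swapped-in technique rather than Dennis's approach. A sketch only: `conditional_replace` is a hypothetical name, and it should agree with the split loop above as long as OLD cannot overlap with itself (true for "XYZ"); for self-overlapping strings the regex scan and `str.split()` may pick different occurrences.

```python
import re

def conditional_replace(text, old, new, minlen):
    # Characters that must follow `old` (before the next `old` or the end)
    # for the occurrence to be replaced; never negative.
    tail = max(minlen - len(old), 0)
    esc = re.escape(old)
    # Lookahead: `tail` more characters exist and none of them starts another
    # occurrence of `old`. DOTALL lets "." match newlines as well.
    pattern = re.compile("%s(?=(?:(?!%s).){%d})" % (esc, esc, tail), re.DOTALL)
    # A function replacement keeps backslashes in `new` from being treated as
    # group references.
    return pattern.sub(lambda m: new, text)

print("'%s'" % conditional_replace("qweXYZ12asdXYZ1345XYZ", "XYZ", "IWAS", 5))
```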


--
Wulfraed Dennis Lee Bieber KD6MOG
(E-Mail Removed) HTTP://wlfraed.home.netcom.com/

 


Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
Should this substitution be compilable? | valentin tihomirov | VHDL | 12 | 11-30-2004 03:44 PM
Substitution Problem | Ashok | Perl | 1 | 07-18-2004 09:33 PM
adobe multiline substitution | Justin | Perl | 0 | 12-08-2003 08:28 PM
Q: string substitution in a file | Troll | Perl | 6 | 09-26-2003 01:50 PM
Converted to Mozilla but one thing missing - key macro/substitution | no-spam | Firefox | 5 | 07-29-2003 08:07 PM


