Python - Re: How to write the replace string for an object which will be substituted? [regexp]
#1
MRAB wrote:
> ryniek90 wrote:
>> Hi.
>> I started learning regexps, and some things go well, but most of
>> them still don't.
>>
>> I've got a problem with a regexp. Better to post the code here:
>>
>> "
>> >>> import re
>> >>> mail = '\nn...@mail.com\nname1 [at] mail [dot] com\nname2 [$at$] mail [$dot$] com\n'
>> '\nn...@mail.com\nname1 [at] mail [dot] com\nname2 [$at$] mail [$dot$] com\n'
>> >>> print mail
>>
>> n...@mail.com
>> name1 [at] mail [dot] com
>> name2 [$at$] mail [$dot$] com
>>
>> >>> maail = re.sub('^\n|$\n', '', mail)
>> >>> print maail
>> n...@mail.com
>> name1 [at] mail [dot] com
>> name2 [$at$] mail [$dot$] com
>> >>> maail = re.sub(' ', '', maail)
>> >>> print maail
>> n...@mail.com
>> name1[at]mail[dot]com
>> name2[$at$]mail[$dot$]com
>> >>> maail = re.sub('\[at\]|\[\$at\$\]', '@', maail)
>> >>> print maail
>> n...@mail.com
>> name1@mail[dot]com
>> name2@mail[$dot$]com
>> >>> maail = re.sub('\[dot\]|\[\$dot\$\]', '.', maail)
>> >>> print maail
>> n...@mail.com
>> na...@mail.com
>> na...@mail.com
>> >>> # How must I write the replace string to replace all these regexps with just ONE command, in string 'mail'?
>> >>> maail = re.sub('^\n|$\n| |\[at\]|\[\$at\$\]|\[dot\]|\[\$dot\$\]', ?, mail)
>> "
>>
>> How must I write that replace pattern (look at the question mark) to
>> make that substitution work? I didn't see anything helpful while
>> reading the re docs and the HOWTO (from the Python docs). I tried
>> 'MatchObject.group()', but something went wrong; I didn't write it right.
>> Is there a more user-friendly HOWTO for Python's re than this one?
>>
>> I'm new to programming and regexps, sorry for the inconvenience.
>>
> I don't think you can do it in one regex, nor would I want to. Just use
> the string's replace() method.
>
> >>> mail = '\nn...@mail.com\nname1 [at] mail [dot] com\nname2 [$at$] mail [$dot$] com\n'
> '\nn...@mail.com\nname1 [at] mail [dot] com\nname2 [$at$] mail [$dot$] com\n'
> >>> print mail
>
> n...@mail.com
> name1 [at] mail [dot] com
> name2 [$at$] mail [$dot$] com
>
> >>> maail = mail.strip()
> >>> print maail
> n...@mail.com
> name1 [at] mail [dot] com
> name2 [$at$] mail [$dot$] com
>
> >>> maail = maail.replace(' ', '')
> >>> print maail
> n...@mail.com
> name1[at]mail[dot]com
> name2[$at$]mail[$dot$]com
> >>> maail = maail.replace('[at]', '@').replace('[$at$]', '@')
> >>> print maail
> n...@mail.com
> name1@mail[dot]com
> name2@mail[$dot$]com
> >>> maail = maail.replace('[dot]', '.').replace('[$dot$]', '.')
> >>> print maail
> n...@mail.com
> na...@mail.com
> na...@mail.com

This is a good learning exercise demonstrating the impracticality of regular expressions in a given situation. Given the fascination regular expressions seem to exert in general, one might conclude that knowing regular expressions is, in essence, knowing when not to use them.

There is nothing wrong with cascading substitutions through multiple expressions. The OP's solution, wrapped up in a function and streamlined to remove needless regex overkill, might look something like this:

    import re

    def translate(s):
        s1 = s.strip()              # Instead of: s1 = re.sub('^\n|$\n', '', s)
        s2 = s1.replace(' ', '')    # Instead of: s2 = re.sub(' ', '', s1)
        s3 = re.sub(r'\[at\]|\[\$at\$\]', '@', s2)
        s4 = re.sub(r'\[dot\]|\[\$dot\$\]', '.', s3)
        return s4

    print(translate(mail))  # Tested

MRAB's solution using replace() avoids needless regex complexity, but it doesn't spare you tedious coding when the number of substitutions is large. Some time ago I proposed a little module I made to alleviate the tedium. It would handle this case like this:

    import SE
    Translator = SE.SE(' (32)= [at]=@ [$at$]=@ [dot]=. [$dot$]=. ')
    print(Translator(mail.strip()))  # Tested

So SE.SE compiles a string composed of any number of substitution definitions into an object that translates anything given to it.
In a running-speed contest it would surely come in last, although in most cases the disadvantage would be imperceptible. Coding speed is another matter. Here the advantage is obvious, even with a set of substitutions as small as this one, let alone with sets in the tens or even hundreds.

One inconspicuous but significant feature of SE is that it handles precedence correctly when targets overlap (upstream over downstream, and long over short). As far as I know, nothing in the Python standard library handles substitution precedence. It always has to be hand-coded case by case, and that isn't exactly trivial.

SE can be downloaded from http://pypi.python.org/pypi/SE/2.3.

Frederic
Anthra Norell
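For what it's worth, the OP's single-call version can be written with the standard re module. re.sub() accepts a function in place of the replacement string. It calls that function once for each match object and substitutes whatever the function returns, which is what the 'MatchObject.group()' attempt was reaching for.

The following is a minimal sketch that assumes Python 3 and a sample string modeled on the thread, with the elided first address left out. The names REPLACEMENTS, PATTERN and repl are illustrative only.

```python
import re

# Sample input modeled on the thread; the first, elided address is left out.
mail = '\nname1 [at] mail [dot] com\nname2 [$at$] mail [$dot$] com\n'

# What each matched target becomes. Anything not listed here
# (the newline at each end and the spaces) is deleted.
REPLACEMENTS = {
    '[at]': '@',
    '[$at$]': '@',
    '[dot]': '.',
    '[$dot$]': '.',
}

# The OP's alternation from the thread, written as a raw string.
PATTERN = re.compile(r'^\n|$\n| |\[at\]|\[\$at\$\]|\[dot\]|\[\$dot\$\]')


def repl(match):
    # re.sub() calls this once per match and substitutes the return value.
    return REPLACEMENTS.get(match.group(0), '')


result = PATTERN.sub(repl, mail)
print(result)
```

Printing result gives the two addresses on separate lines, name1@mail.com and name2@mail.com. Whether this reads better than MRAB's chained replace() calls is a matter of taste. The point is only that the replacement argument may be a callable rather than a string.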
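The long-over-short precedence Frederic mentions can be hand-coded with the standard re module:

1. Escape each literal target.
2. Sort the targets longest first.
3. Join them into one alternation.

The regex engine scans the text from left to right and tries alternatives in order. As a result, an earlier match in the text wins over a later one (which appears to correspond to "upstream over downstream"). At the same position, the longer target wins.

The following is a sketch under those assumptions. make_translator is a hypothetical helper, not SE's API or implementation.

```python
import re


def make_translator(table):
    """Build a one-pass translator from a {target: replacement} dict.

    Hypothetical helper (not part of SE or the standard library).
    Targets are treated as literal text and must be non-empty.
    """
    if not table:
        return lambda text: text  # nothing to substitute
    # Longest targets first, so a longer target beats a shorter one
    # that starts at the same position.
    targets = sorted(table, key=len, reverse=True)
    pattern = re.compile('|'.join(re.escape(t) for t in targets))
    # Each match is exactly one of the keys, so look up its replacement.
    return lambda text: pattern.sub(lambda m: table[m.group(0)], text)


translate = make_translator({
    ' ': '',
    '[at]': '@',
    '[$at$]': '@',
    '[dot]': '.',
    '[$dot$]': '.',
})
print(translate('name1 [at] mail [dot] com'))  # name1@mail.com

# Overlapping targets: 'ab' wins over 'a' because it is tried first.
print(make_translator({'a': '1', 'ab': '2'})('abc'))  # 2c
```

Sorting by length is what makes the overlap case come out right. With the alternation 'a|ab' instead, the same input would yield '1bc'.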
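In Frederic's translate(), the comment pairs s.strip() with the OP's re.sub('^\n|$\n', '', s). The two agree on the thread's input, but they are not strictly equivalent:

- strip() removes all leading and trailing whitespace.
- The regex removes at most one newline at the very start and one at the very end.

The following quick check uses a made-up sample string, and strip_edge_newlines is a hypothetical name.

```python
import re


def strip_edge_newlines(s):
    # The OP's pattern: one newline at the very start, one at the very end.
    return re.sub(r'^\n|$\n', '', s)


sample = '\n  name1 [at] mail [dot] com \n\n'
print(repr(sample.strip()))              # 'name1 [at] mail [dot] com'
print(repr(strip_edge_newlines(sample)))  # '  name1 [at] mail [dot] com \n'
```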
#2
On 5 Aug, 13:28, Anthra Norell <anthra.nor...@bluewin.ch> wrote:
> [...]
> SE can be downloaded from http://pypi.python.org/pypi/SE/2.3.
>
> Frederic

Thanks again. I saw that MRAB is actively developing a new implementation of the re module.

MRAB: do you think it would be a good idea to add some of the best features of the SE module to your project? I haven't seen the features of your re module yet, but I'll try to find time today to see what's going on.

Greets,
ryniek