string stripping issues

 
 
orangeDinosaur
Guest
Posts: n/a
 
      03-02-2006
Hello,

I am encountering a behavior I can't think of a reason for. Sometimes,
when I use the .strip method for strings, it takes away more than what
I've specified. For example:

>>> a = ' <TD WIDTH=175><FONT SIZE=2>Hughes. John</FONT></TD>\r\n'
>>> a.strip(' <TD WIDTH=175><FONT SIZE=2>')

returns:

'ughes. John</FONT></TD>\r\n'

However, if I take another string, for example:

>>> b = ' <TD WIDTH=175><FONT SIZE=2>Kim, Dong-Hyun</FONT></TD>\r\n'
>>> b.strip(' <TD WIDTH=175><FONT SIZE=2>')

returns:

'Kim, Dong-Hyun</FONT></TD>\r\n'

I don't understand why in one case it eats up the 'H' but in the next
case it leaves the 'K' alone.

 
 
 
 
 
Ben Cartwright
Guest
Posts: n/a
 
      03-02-2006
That method... I do not think it means what you think it means. The
argument to str.strip is a *set* of characters, e.g.:

>>> foo = 'abababaXabbaXabababbbb'
>>> foo.strip('ab')
'XabbaX'
>>> foo.strip('aabababaab') # no difference!
'XabbaX'
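
To see the set behaviour on the strings from the question, here is a minimal sketch. The helper `strip_like` is a hypothetical name, written only to illustrate what `str.strip(chars)` does; it is not how CPython implements it.

```python
a = ' <TD WIDTH=175><FONT SIZE=2>Hughes. John</FONT></TD>\r\n'
b = ' <TD WIDTH=175><FONT SIZE=2>Kim, Dong-Hyun</FONT></TD>\r\n'
chars = ' <TD WIDTH=175><FONT SIZE=2>'

# Only membership matters, not order or repetition.
strip_set = set(chars)
print('H' in strip_set)  # True: 'H' occurs in 'WIDTH'
print('K' in strip_set)  # False: 'K' appears nowhere in chars


def strip_like(s, chars):
    """Illustrative equivalent of s.strip(chars) for a non-empty chars."""
    start, end = 0, len(s)
    # Advance past every leading character that belongs to the set.
    while start < end and s[start] in chars:
        start += 1
    # Retreat past every trailing character that belongs to the set.
    while end > start and s[end - 1] in chars:
        end -= 1
    return s[start:end]


print(repr(strip_like(a, chars)))  # 'ughes. John</FONT></TD>\r\n'
print(repr(strip_like(b, chars)))  # 'Kim, Dong-Hyun</FONT></TD>\r\n'
```

Nothing is removed from the right-hand end because the final '\n' is not in the set, so the scan from the right stops immediately.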

For more info, see the string method docs:
http://docs.python.org/lib/string-methods.html
To do what you're trying to do, try this:

>>> prefix = 'hello '
>>> bar = 'hello world!'
>>> if bar.startswith(prefix): bar = bar[:len(prefix)]
...
>>> bar
'world!'

--Ben

 
 
 
 
 
ianaré
Guest
Posts: n/a
 
      03-02-2006
From the Python manual:

strip( [chars])
The chars argument is not a prefix or suffix; rather, all combinations
of its values are stripped:
>>> 'www.example.com'.strip('cmowz.')
'example'

In your case, since the letter 'H' is in your [chars] and the name
starts with an H, it gets stripped; in the second one the first
letter is a K, so it stops there.
Maybe you can use:

>>> a[28:]
'Hughes. John</FONT></TD>\r\n'
>>> b[28:]
'Kim, Dong-Hyun</FONT></TD>\r\n'

but maybe what you REALLY want is:

>>> a[28:-14]
'Hughes. John'
>>> b[28:-14]
'Kim, Dong-Hyun'

 
 
Ben Cartwright
Guest
Posts: n/a
 
      03-02-2006
Ben Cartwright wrote:
> To do what you're trying to do, try this:
>
> >>> prefix = 'hello '
> >>> bar = 'hello world!'
> >>> if bar.startswith(prefix): bar = bar[:len(prefix)]
> ...
> >>> bar
> 'world!'

Apologies, that should be:

>>> prefix = 'hello '
>>> bar = 'hello world!'
>>> if bar.startswith(prefix): bar = bar[len(prefix):]
...
>>> bar
'world!'
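
The corrected idiom generalises to a pair of small helpers. This is only a sketch: `remove_prefix` and `remove_suffix` are illustrative names, not functions from this thread. On Python 3.9 and later, the built-in `str.removeprefix` and `str.removesuffix` methods do the same job.

```python
def remove_prefix(text, prefix):
    """Return text without prefix if it starts with it, else text unchanged."""
    if text.startswith(prefix):
        return text[len(prefix):]
    return text


def remove_suffix(text, suffix):
    """Return text without suffix if it ends with it, else text unchanged."""
    # The emptiness check matters: text[:-0] would evaluate to ''.
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    return text


a = ' <TD WIDTH=175><FONT SIZE=2>Hughes. John</FONT></TD>\r\n'
name = remove_suffix(remove_prefix(a, ' <TD WIDTH=175><FONT SIZE=2>'),
                     '</FONT></TD>\r\n')
print(repr(name))  # 'Hughes. John'
```

Unlike `strip`, these remove the exact substring once, and only when it sits at the very start or end.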

--Ben

 
 
orangeDinosaur
Guest
Posts: n/a
 
      03-02-2006
thanks!

 
 
P Boy
Guest
Posts: n/a
 
      03-03-2006
This seems like a web page parsing question. Another approach can be as
follows if you know the limiting token strings:

a.split(' <TD WIDTH=175><FONT SIZE=2>')[1].split('</FONT></TD>\r\n')[0]
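
Note that the `[1]` index raises IndexError when the opening token is absent. A hedged variant using `str.partition`, which returns None instead of raising, might look like this; the function name `between` is made up for illustration.

```python
def between(text, start, end):
    """Return the text between the first `start` and the next `end`.

    Returns None if either delimiter is missing.
    """
    _, found_start, rest = text.partition(start)
    if not found_start:
        return None
    inner, found_end, _ = rest.partition(end)
    if not found_end:
        return None
    return inner


a = ' <TD WIDTH=175><FONT SIZE=2>Hughes. John</FONT></TD>\r\n'
print(between(a, ' <TD WIDTH=175><FONT SIZE=2>', '</FONT></TD>\r\n'))
# Hughes. John
print(between('no markup here', '<TD>', '</TD>'))
# None
```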

 
 
Iain King
Guest
Posts: n/a
 
      03-03-2006

or instead of:

a.strip(' <TD WIDTH=175><FONT SIZE=2>')

use:

a.replace(' <TD WIDTH=175><FONT SIZE=2>','')
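
One caveat, sketched below with a made-up string in which the tag occurs twice: `str.replace` removes every occurrence wherever it appears, not only a leading one. The optional third argument limits the number of replacements, but it still does not anchor the match to the start of the string.

```python
tag = '<B>'
s = '<B>bold</B> and <B>again'

# Without a count, both occurrences of the tag are removed.
all_removed = s.replace(tag, '')
print(all_removed)    # bold</B> and again

# With count=1, only the first occurrence is removed.
first_removed = s.replace(tag, '', 1)
print(first_removed)  # bold</B> and <B>again
```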

Iain

 
 
Larry Bates
Guest
Posts: n/a
 
      03-03-2006
Others have explained the exact problem, so I'll make a suggestion.
Take a few minutes to look at BeautifulSoup. It parses HTML code
and makes it very easy to extract data from strings like this.
If this is a one-off thing, don't bother.
If you do this commonly, BeautifulSoup is worth a little study.
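
For a sense of what parser-based extraction looks like, here is a dependency-free sketch that uses the standard library's `html.parser` as a stand-in. It is not BeautifulSoup's API, and the class name `CellText` is made up for illustration.

```python
from html.parser import HTMLParser


class CellText(HTMLParser):
    """Collect the text found inside <td> elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # how many <td> elements we are currently inside
        self.cells = []  # one stripped string per non-empty text node

    def handle_starttag(self, tag, attrs):
        if tag == 'td':  # HTMLParser lower-cases tag names
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == 'td' and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.depth and text:
            self.cells.append(text)


parser = CellText()
parser.feed(' <TD WIDTH=175><FONT SIZE=2>Hughes. John</FONT></TD>\r\n')
parser.close()
print(parser.cells)  # ['Hughes. John']
```

The point of a parser is that the extraction no longer depends on the exact attribute text of the surrounding tags.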

-Larry Bates
 