feature request: a better str.endswith

 
 
Michele Simionato
 
      07-18-2003
I often feel the need to extend the string method ".endswith" to tuple
arguments, in such a way to automatically check for multiple endings.
For instance, here is a typical use case:

if filename.endswith(('.jpg','.jpeg','.gif','.png')):
    print "This is a valid image file"

Currently this is not valid Python and I must use the ugly

if filename.endswith('.jpg') or filename.endswith('.jpeg') \
   or filename.endswith('.gif') or filename.endswith('.png'):
    print "This is a valid image file"

Of course a direct implementation is quite easy:

import sys

class Str(str):
    def endswith(self,suffix,start=0,end=sys.maxint):#not sure about sys.maxint
        endswith=super(Str,self).endswith
        if isinstance(suffix,tuple):
            return sum([endswith(s,start,end) for s in suffix]) # multi-or
        return endswith(suffix,start,end)

if Str(filename).endswith(('.jpg','.jpeg','.gif','.png')):
    print "This is a valid image file"

Nevertheless, I think this kind of check is quite common, and it would be
worth having in standard Python.

Any reactions or comments?


Michele
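
For readers on current Python: str.endswith (and str.startswith) have accepted a tuple of suffixes since Python 2.5, so the call proposed above now works as written. A minimal Python 3 sketch of the check, next to an equivalent spelled out with any(); the names `IMAGE_SUFFIXES`, `has_image_suffix` and `has_image_suffix_explicit` are hypothetical and used only for illustration:

```python
# Hypothetical names, chosen only for illustration.
IMAGE_SUFFIXES = ('.jpg', '.jpeg', '.gif', '.png')

def has_image_suffix(filename, suffixes=IMAGE_SUFFIXES):
    """Return True if filename ends with any of the given suffixes."""
    # str.endswith accepts a tuple of suffixes (Python 2.5 and later).
    return filename.endswith(suffixes)

def has_image_suffix_explicit(filename, suffixes=IMAGE_SUFFIXES):
    """Same check spelled out one suffix at a time (the "multi-or")."""
    return any(filename.endswith(s) for s in suffixes)

if __name__ == "__main__":
    for name in ("photo.jpeg", "notes.txt"):
        print(name, has_image_suffix(name), has_image_suffix_explicit(name))
```

Note that the argument has to be a tuple; a list of suffixes raises TypeError.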
 
Jp Calderone
 
      07-18-2003
On Fri, Jul 18, 2003 at 05:01:47AM -0700, Michele Simionato wrote:
> I often feel the need to extend the string method ".endswith" to tuple
> arguments, in such a way to automatically check for multiple endings.
> For instance, here is a typical use case:
>
> if filename.endswith(('.jpg','.jpeg','.gif','.png')):
>     print "This is a valid image file"
>
> Currently this is not valid Python and I must use the ugly
>
> if filename.endswith('.jpg') or filename.endswith('.jpeg') \
>    or filename.endswith('.gif') or filename.endswith('.png'):
>     print "This is a valid image file"


extensions = ('.jpg', '.jpeg', '.gif', '.png')
if filter(filename.endswith, extensions):
    print "This is a valid image file"

Jp

--
"Pascal is Pascal is Pascal is dog meat."
-- M. Devine and P. Larson, Computer Science 340
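
The filter-based check works because filter() returns the suffixes that matched, and an empty result is false. A rough Python 3 equivalent can be sketched with a list comprehension; the helper name `matching_suffixes` is made up for this example:

```python
def matching_suffixes(filename, suffixes):
    """Return the list of suffixes that filename ends with."""
    return [s for s in suffixes if filename.endswith(s)]

extensions = ('.jpg', '.jpeg', '.gif', '.png')

if __name__ == "__main__":
    for filename in ("holiday.png", "readme.txt"):
        # A non-empty list is truthy, an empty list is falsy.
        if matching_suffixes(filename, extensions):
            print(filename, "is a valid image file")
        else:
            print(filename, "is not an image file")
```

Every suffix is tested even after one has matched, which is the cost of building the full result.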

 
Thomas Güttler
 
      07-18-2003
Michele Simionato wrote:

> I often feel the need to extend the string method ".endswith" to tuple
> arguments, in such a way to automatically check for multiple endings.
> For instance, here is a typical use case:
>
> if filename.endswith(('.jpg','.jpeg','.gif','.png')):
>     print "This is a valid image file"
>
> Currently this is not valid Python and I must use the ugly
>
> if filename.endswith('.jpg') or filename.endswith('.jpeg') \
>    or filename.endswith('.gif') or filename.endswith('.png'):
>     print "This is a valid image file"
>
> Of course a direct implementation is quite easy:
>
> import sys
>
> class Str(str):
>     def endswith(self,suffix,start=0,end=sys.maxint):#not sure about sys.maxint
>         endswith=super(Str,self).endswith
>         if isinstance(suffix,tuple):
>             return sum([endswith(s,start,end) for s in suffix]) # multi-or
>         return endswith(suffix,start,end)
>
> if Str(filename).endswith(('.jpg','.jpeg','.gif','.png')):
>     print "This is a valid image file"
>
> Nevertheless, I think this kind of check is quite common, and it would be
> worth having in standard Python.


Hi,

I like this feature request.

If the argument to endswith is not a string,
it should try to treat the argument as a list or tuple.

thomas


 
Skip Montanaro
 
      07-18-2003

Michele> I often feel the need to extend the string method ".endswith"
Michele> to tuple arguments, in such a way to automatically check for
Michele> multiple endings. For instance, here is a typical use case:

Michele> if filename.endswith(('.jpg','.jpeg','.gif','.png')):
Michele>     print "This is a valid image file"

This is analogous to how isinstance works, where its second arg can be a
class or type or a tuple containing classes and types.

I suggest you submit a feature request to SF. A patch to stringobject.c and
unicodeobject.c would help improve chances of acceptance, and for symmetry
you should probably also modify the startswith methods of both types.

Skip
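
The isinstance precedent mentioned above fits in a couple of lines. A small sketch, in which the helper `is_number` is hypothetical:

```python
def is_number(value):
    """True if value is an int or a float."""
    # The second argument may be a single type or a tuple of types.
    return isinstance(value, (int, float))

if __name__ == "__main__":
    print(is_number(3))
    print(is_number(2.5))
    print(is_number("3"))
```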


 
Michele Simionato
 
      07-19-2003
Irmen de Jong wrote:
> Jp Calderone wrote:
> > On Fri, Jul 18, 2003 at 05:01:47AM -0700, Michele Simionato wrote:
> >
> >>I often feel the need to extend the string method ".endswith" to tuple
> >>arguments, in such a way to automatically check for multiple endings.
> >>For instance, here is a typical use case:
> >>
> >>if filename.endswith(('.jpg','.jpeg','.gif','.png')):
> >>    print "This is a valid image file"
> >>
> >>Currently this is not valid Python and I must use the ugly
> >>
> >>if filename.endswith('.jpg') or filename.endswith('.jpeg') \
> >>   or filename.endswith('.gif') or filename.endswith('.png'):
> >>    print "This is a valid image file"
> >
> > extensions = ('.jpg', '.jpeg', '.gif', '.png')
> > if filter(filename.endswith, extensions):
> >     print "This is a valid image file"
> >
> > Jp
>
> Using filter Michele's original statement becomes:
>
> if filter(filename.endswith, ('.jpg','.jpeg','.gif','.png')):
>     print "This is a valid image file"
>
> IMHO this is simple enough to not require a change to the
> .endswith method...
>
> --Irmen


I hadn't thought of "filter". It is true, it works, but is it really
readable? I had to think to understand what it is doing.
My (implicit) rationale for

filename.endswith(('.jpg','.jpeg','.gif','.png'))

was that it works exactly like "isinstance", so it is quite
obvious what it is doing. I am asking just for a convenience
that already has a precedent in the language and respects
the Principle of Least Surprise.

Michele
 
Michele Simionato
 
      07-19-2003
Skip Montanaro wrote:
> Michele> I often feel the need to extend the string method ".endswith"
> Michele> to tuple arguments, in such a way to automatically check for
> Michele> multiple endings. For instance, here is a typical use case:
>
> Michele> if filename.endswith(('.jpg','.jpeg','.gif','.png')):
> Michele>     print "This is a valid image file"
>
> This is analogous to how isinstance works, where its second arg can be a
> class or type or a tuple containing classes and types.
>
> I suggest you submit a feature request to SF. A patch to stringobject.c and
> unicodeobject.c would help improve chances of acceptance, and for symmetry
> you should probably also modify the startswith methods of both types.
>
> Skip


Too bad my skills with C are essentially nonexistent.


Michele
 
Skip Montanaro
 
      07-19-2003
>> I suggest you submit a feature request to SF. A patch to
>> stringobject.c and unicodeobject.c would help improve chances of
>> acceptance, and for symmetry you should probably also modify the
>> startswith methods of both types.


Michele> Too bad my skills with C are essentially nonexistent.

Look at it as an opportunity to enhance those skills. You have plenty of
time until 2.4.

In any case, even if you can't whip up the actual C code, a complete feature
request on SF would keep it from being entirely forgotten.

Skip


 
Raymond Hettinger
 
      07-19-2003
[Michele Simionato]
> > >>I often feel the need to extend the string method ".endswith" to tuple
> > >>arguments, in such a way to automatically check for multiple endings.
> > >>For instance, here is a typical use case:
> > >>
> > >>if filename.endswith(('.jpg','.jpeg','.gif','.png')):
> > >>    print "This is a valid image file"


[Jp]
> > > extensions = ('.jpg', '.jpeg', '.gif', '.png')
> > > if filter(filename.endswith, extensions):
> > >     print "This is a valid image file"
> > >
> > > Jp



[Irmen]
> > Using filter Michele's original statement becomes:
> >
> > if filter(filename.endswith, ('.jpg','.jpeg','.gif','.png')):
> >     print "This is a valid image file"
> >
> > IMHO this is simple enough to not require a change to the
> > .endswith method...


[Michele]
> I haven't thought of "filter". It is true, it works, but is it really
> readable? I had to think to understand what it is doing.
> My (implicit) rationale for
>
> filename.endswith(('.jpg','.jpeg','.gif','.png'))
>
> was that it works exactly as "isinstance", so it is quite
> obvious what it is doing. I am asking just for a convenience,
> which has already a precedent in the language and respects
> the Principle of Least Surprise.


I prefer that this feature not be added. Convenience functions
like this one rarely pay for themselves because:

-- The use case is not that common (after all, endswith() isn't even
   used that often).

-- It complicates the heck out of the C code.

-- Checking for optional arguments results in a slight slowdown
   for the normal case.

-- It is easy to implement a readable version in only two or three
   lines of pure Python.

-- It is harder to read because it requires background knowledge
   of how endswith() handles a tuple (quick: does it take any
   iterable or just a tuple? how about a subclass of tuple? is it
   like min() and max() in that *args works just as well as an
   argument tuple? which Python version implemented it? etc.).

-- It is a pain to keep the language consistent. Change endswith()
   and you should change startswith(). Change the string object and
   you should also change the unicode object and UserString and
   perhaps mmap. Update the docs for each and add test cases for
   each (including weird cases with zero-length tuples and such).

-- The use case above encroaches on scanning patterns that are
   already efficiently implemented by the re module.

-- Worst of all, it increases the sum total of the Python language to be
   learned without providing much in return.

-- In general, the language can be kept more compact, efficient, and
   maintainable by not trying to vectorize everything (the recent addition
   of __builtin__.sum() is a rare exception that is worth it). It is
   better to use a general-purpose vectorizing function (like map, filter,
   or reduce). This particular case is best implemented in terms of the
   some() predicate documented in the examples for the new itertools module
   (though any() might have been a better name for it):

some(filename.endswith, ('.jpg','.jpeg','.gif','.png'))

The implementation of some() is better than the filter version because
it provides an "early-out" upon the first successful hit.


Raymond Hettinger
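
A minimal sketch of a some()-style predicate with the early-out behaviour described above. This is a plain-loop version written in Python 3 for illustration, not the recipe from the itertools documentation; the `calls` list and the `make_logged_endswith` wrapper are hypothetical and exist only to make the short-circuit visible:

```python
def some(pred, seq):
    """Return True as soon as pred(x) is true for an item of seq."""
    for x in seq:
        if pred(x):
            return True   # early-out: remaining items are never tested
    return False

calls = []  # records which suffixes were actually tested

def make_logged_endswith(filename):
    """Wrap filename.endswith so each tested suffix is recorded."""
    def logged(suffix):
        calls.append(suffix)
        return filename.endswith(suffix)
    return logged

if __name__ == "__main__":
    exts = ('.jpg', '.jpeg', '.gif', '.png')
    print(some(make_logged_endswith("cat.jpeg"), exts))
    print(calls)  # only the suffixes tried before the first hit
```

In current Python the built-in any(), fed a generator expression, short-circuits in the same way.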

 
Michele Simionato
 
      07-20-2003
Raymond Hettinger wrote:
> I prefer that this feature not be added. Convenience functions
> like this one rarely pay for themselves because:
>
> -- The use case is not that common (after all, endswith() isn't even
> used that often).


This is arguable.

> -- It complicates the heck out of the C code


Really? Of course, you are the expert. I would do it by analogy with
"isinstance", internally calling "ifilter" as you suggest.

> -- Checking for optional arguments results in a slight slowdown
> for the normal case.


Perhaps slight enough to be negligible? Of course, without an
implementation we cannot say, but I would be surprised to see a
noticeable slowdown.

> -- It is easy to implement a readable version in only two or three
> lines of pure python.


Yes, but not immediately obvious. See later.

> -- It is harder to read because it requires background knowledge
> of how endswith() handles a tuple (quick, does it take any
> iterable or just a tuple, how about a subclass of tuple; is it
> like min() and max() in that it *args works just as well as
> argtuple; which python version implemented it, etc).


I have used "isinstance" and never wondered about these technicalities,
so I guess the average user should not be more concerned with .endswith.

> -- It is a pain to keep the language consistent. Change endswith()
> and you should change startswith(). Change the string object and
> you should also change the unicode object and UserString and
> perhaps mmap. Update the docs for each and add test cases for
> each (including weird cases with zero-length tuples and such).


This is true for any modification of the language. One has to balance
costs and benefits. The balance is still largely subjective.

> -- The use case above encroaches on scanning patterns that are
> already efficiently implemented by the re module.


I think the general rule is to avoid regular expressions when
possible.
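
For comparison, a sketch of what a re-based version of the same check might look like in Python 3. The pattern is built from the suffix tuple with re.escape; `SUFFIXES`, `IMAGE_RE` and `is_image` are names invented for the example:

```python
import re

SUFFIXES = ('.jpg', '.jpeg', '.gif', '.png')

# Escape each suffix (the dot is special in a regex) and anchor at the end.
IMAGE_RE = re.compile(r'(?:%s)\Z' % '|'.join(re.escape(s) for s in SUFFIXES))

def is_image(filename):
    """True if filename ends with one of SUFFIXES."""
    return IMAGE_RE.search(filename) is not None

if __name__ == "__main__":
    for name in ("a.jpeg", "a.jpeg.txt", "ajpg"):
        print(name, is_image(name))
```

The escaping and anchoring are exactly the details a plain suffix check lets the reader skip.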

> -- Worst of all, it increases the sum total of python language to be
> learned without providing much in return.


That is exactly what I am arguing *against*: there is no additional
learning effort needed, since a similar feature is already present in
"isinstance", and a user could even be surprised that it is not
implemented in .endswith.

> -- In general, the language can be kept more compact, efficient, and
> maintainable by not trying to vectorize everything (the recent addition
> of the __builtin__.sum() is a rare exception that is worth it). It is
> better to use a general purpose vectorizing function (like map, filter,
> or reduce). This particular case is best implemented in terms of the
> some() predicate documented in the examples for the new itertools module
> (though any() might have been a better name for it):
>
> some(filename.endswith, ('.jpg','.jpeg','.gif','.png'))


Uhm... don't like "some", nor "any"; what about "the"?

import os
import itertools
the=lambda pred,seq: list(itertools.ifilter(pred,seq))
for filename in os.listdir('.'):
    if the(filename.endswith, ('.jpg','.jpeg','.gif','.png')):
        print "This is a valid image"

That's readable enough for me, though still not completely obvious. The
first time, I got it wrong by defining "the=itertools.ifilter". I had the
idea that "ifilter" acts just like "filter", which of course is not the
case in this example.

> The implementation of some() is better than the filter version because
> it provides an "early-out" upon the first successful hit.


No point against that.
>
> Raymond Hettinger


Michele Simionato

P.S. I am not going to pursue this further, since I quite like

if the(filename.endswith, ('.jpg','.jpeg','.gif','.png')):
    dosomething()

Instead, I will suggest that this example be added to the itertools
documentation. I could also submit it as a cookbook recipe, since I think
it is quite a useful trick.
Also, it is good to make people aware of the itertools goodies
(myself I have learned something in this thread).
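
The pitfall described above (a lazy iterator is always true, whether or not anything matches) can be reproduced in Python 3, where the built-in filter() is lazy in the way itertools.ifilter was. A sketch, with filter() standing in for ifilter; `the_lazy` is a hypothetical name for the mistaken definition:

```python
exts = ('.jpg', '.jpeg', '.gif', '.png')

def the(pred, seq):
    """Materialise the matches so that an empty result is falsy."""
    return list(filter(pred, seq))

def the_lazy(pred, seq):
    """The mistaken version: returns the lazy iterator itself."""
    return filter(pred, seq)

if __name__ == "__main__":
    name = "notes.txt"  # not an image
    print(bool(the(name.endswith, exts)))       # empty list
    print(bool(the_lazy(name.endswith, exts)))  # iterator object
```

The list() call is what makes the truth test meaningful, at the price of testing every suffix.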
 
Hartmut Goebel
 
      07-21-2003
Skip Montanaro wrote:

> I suggest you submit a feature request to SF.


+1 from me

This is a commonly used case. Using things like stripext() is only a
solution for this specific case where filename-extensions are matched.
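
The thread does not show stripext(), so as an assumption the sketch below uses the standard os.path.splitext to illustrate the extension-only approach, plus a suffix check on ordinary words to show a case it does not cover. The names `IMAGE_EXTS`, `is_image` and `is_plural_like` are hypothetical:

```python
import os.path

IMAGE_EXTS = {'.jpg', '.jpeg', '.gif', '.png'}

def is_image(filename):
    """Extension-based check: works only for filename extensions."""
    ext = os.path.splitext(filename)[1]
    return ext in IMAGE_EXTS

def is_plural_like(word, endings=('ses', 'ies', 'xes')):
    """A suffix check that has nothing to do with file extensions."""
    return any(word.endswith(e) for e in endings)

if __name__ == "__main__":
    print(is_image("pic.png"), is_image("pic.txt"))
    print(is_plural_like("boxes"), is_plural_like("box"))
```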

Michele: I suggest mentioning this in the feature request, or simply using
a different example (not based on filename extensions).

Regards
Hartmut Goebel
--
| Hartmut Goebel | IT-Security -- effizient |
| www.goebel-consult.de |

 