Good use for itertools.dropwhile and itertools.takewhile

 
 
Nick Mellor
 
      12-04-2012
Hi,

I came across itertools.dropwhile only today, then shortly afterwards found Raymond Hettinger wondering, in 2007, whether to drop [sic] dropwhile and takewhile from the itertools module.

Fate of itertools.dropwhile() and itertools.takewhile() - Python
bytes.com
http://bit.ly/Vi2PqP

Almost nobody else of the 18 respondents seemed to be using them.

And then 2 hours later, a use case came along. I think. Anyone have any better solutions?

I have a file full of things like this:

"CAPSICUM RED fresh from Queensland"

Product names (all caps, at start of string) and descriptions (mixed case, to end of string) all muddled up in the same field. And I need to split them into two fields. Note that if the text had said:

"CAPSICUM RED fresh from QLD"

I would want QLD in the description, not shunted forwards and put in the product name. So (uncontrived) list comprehensions and regex's are out.

I want to split the above into:

("CAPSICUM RED", "fresh from QLD")

Enter dropwhile and takewhile. 6 lines later:

from itertools import takewhile, dropwhile
def split_product_itertools(s):
    words = s.split()
    allcaps = lambda word: word == word.upper()
    product, description = takewhile(allcaps, words), dropwhile(allcaps, words)
    return " ".join(product), " ".join(description)


When I tried to refactor this code to use while or for loops, I couldn't find any way that felt shorter or more pythonic:

(9 lines: using for)

def split_product_1(s):
    words = s.split()
    product = []
    for word in words:
        if word == word.upper():
            product.append(word)
        else:
            break
    return " ".join(product), " ".join(words[len(product):])


(12 lines: using while)

def split_product_2(s):
    words = s.split()
    i = 0
    product = []
    while 1:
        word = words[i]
        if word == word.upper():
            product.append(word)
            i += 1
        else:
            break
    return " ".join(product), " ".join(words[i:])


Any thoughts?

Nick
 
Neil Cerutti
 
      12-04-2012
On 2012-12-04, Nick Mellor <(E-Mail Removed)> wrote:
> I have a file full of things like this:
>
> "CAPSICUM RED fresh from Queensland"
>
> Product names (all caps, at start of string) and descriptions
> (mixed case, to end of string) all muddled up in the same
> field. And I need to split them into two fields. Note that if
> the text had said:
>
> "CAPSICUM RED fresh from QLD"
>
> I would want QLD in the description, not shunted forwards and
> put in the product name. So (uncontrived) list comprehensions
> and regex's are out.
>
> I want to split the above into:
>
> ("CAPSICUM RED", "fresh from QLD")
>
> Enter dropwhile and takewhile. 6 lines later:
>
> from itertools import takewhile, dropwhile
> def split_product_itertools(s):
>     words = s.split()
>     allcaps = lambda word: word == word.upper()
>     product, description = takewhile(allcaps, words), dropwhile(allcaps, words)
>     return " ".join(product), " ".join(description)
>
> When I tried to refactor this code to use while or for loops, I
> couldn't find any way that felt shorter or more pythonic:


I'm really tempted to import re, and that means takewhile and
dropwhile need to stay.

But seriously, this is a quick implementation of my first thought.

import string
description = s.lstrip(string.ascii_uppercase + ' ')
product = s[:-len(description)-1]

--
Neil Cerutti
 
Nick Mellor
 
      12-04-2012
Hi Neil,

Nice! But fails if the first word of the description starts with a capital letter.
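
The failure is easy to reproduce. Below is a minimal sketch that wraps Neil's two lines in a helper; the name `split_lstrip` and the second sample string are only illustrative, not something from the original posts.

```python
import string

def split_lstrip(s):
    # Neil's two-liner: strip leading uppercase letters and spaces,
    # then treat what was stripped (minus one space) as the product.
    description = s.lstrip(string.ascii_uppercase + " ")
    product = s[:-len(description) - 1]
    return product, description

# Lowercase description: splits as intended.
print(split_lstrip("CAPSICUM RED fresh from Queensland"))
# -> ('CAPSICUM RED', 'fresh from Queensland')

# Capitalised first word: the leading "F" is stripped along with the product.
print(split_lstrip("CAPSICUM RED Fresh from Queensland"))
# -> ('CAPSICUM RED ', 'resh from Queensland')
```

lstrip() removes characters from a set rather than whole words, so it cannot tell where the last all-caps word ends.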

Nick


On Wednesday, 5 December 2012 01:23:34 UTC+11, Neil Cerutti wrote:
> On 2012-12-04, Nick Mellor <(E-Mail Removed)> wrote:
> > I have a file full of things like this:
> >
> > "CAPSICUM RED fresh from Queensland"
> >
> > Product names (all caps, at start of string) and descriptions
> > (mixed case, to end of string) all muddled up in the same
> > field. And I need to split them into two fields. Note that if
> > the text had said:
> >
> > "CAPSICUM RED fresh from QLD"
> >
> > I would want QLD in the description, not shunted forwards and
> > put in the product name. So (uncontrived) list comprehensions
> > and regex's are out.
> >
> > I want to split the above into:
> >
> > ("CAPSICUM RED", "fresh from QLD")
> >
> > Enter dropwhile and takewhile. 6 lines later:
> >
> > from itertools import takewhile, dropwhile
> > def split_product_itertools(s):
> >     words = s.split()
> >     allcaps = lambda word: word == word.upper()
> >     product, description = takewhile(allcaps, words), dropwhile(allcaps, words)
> >     return " ".join(product), " ".join(description)
> >
> > When I tried to refactor this code to use while or for loops, I
> > couldn't find any way that felt shorter or more pythonic:
>
> I'm really tempted to import re, and that means takewhile and
> dropwhile need to stay.
>
> But seriously, this is a quick implementation of my first thought.
>
> description = s.lstrip(string.ascii_uppercase + ' ')
> product = s[:-len(description)-1]
>
> --
> Neil Cerutti

 
Neil Cerutti
 
      12-04-2012
On 2012-12-04, Nick Mellor <(E-Mail Removed)> wrote:
> Hi Neil,
>
> Nice! But fails if the first word of the description starts
> with a capital letter.


Darn edge cases.

--
Neil Cerutti
 
Nick Mellor
 
      12-04-2012
I love the way you guys can write a line of code that does the same as 20 of mine.

I can turn up the heat on your regex by feeding it a null description or multiple white space (both are in the original file). I'm sure you'd adjust, but at the cost of a more complex regex.

Meanwhile takewhile and dropwhile are behaving themselves impeccably, but my while loop has fallen over.
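
A quick check of both edge cases, as a sketch only: the two functions are copied from the first post, and the sample inputs are made up for illustration. Which input actually broke the while loop is not stated above; a null description is one plausible culprit, because `words[i]` then runs off the end of the list.

```python
from itertools import dropwhile, takewhile

def split_product_itertools(s):
    words = s.split()
    allcaps = lambda word: word == word.upper()
    return (" ".join(takewhile(allcaps, words)),
            " ".join(dropwhile(allcaps, words)))

def split_product_2(s):
    # The while-loop version from the first post, behaviour unchanged.
    words = s.split()
    i = 0
    product = []
    while 1:
        word = words[i]
        if word == word.upper():
            product.append(word)
            i += 1
        else:
            break
    return " ".join(product), " ".join(words[i:])

# Null description.
print(split_product_itertools("CAPSICUM RED"))
# -> ('CAPSICUM RED', '')

# Multiple white space: str.split() with no argument collapses the runs.
print(split_product_itertools("CAPSICUM  RED   fresh from  QLD"))
# -> ('CAPSICUM RED', 'fresh from QLD')

# The while loop indexes past the end when every word is all caps.
try:
    split_product_2("CAPSICUM RED")
except IndexError as exc:
    print("while loop failed:", exc)
```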

Best,

Nick

On Wednesday, 5 December 2012 01:31:48 UTC+11, Vlastimil Brom wrote:
> 2012/12/4 Nick Mellor <(E-Mail Removed)>:
> > Hi,
> >
> > I came across itertools.dropwhile only today, then shortly afterwards found Raymond Hettinger wondering, in 2007, whether to drop [sic] dropwhile and takewhile from the itertools module.
> >
> > Fate of itertools.dropwhile() and itertools.takewhile() - Python
> > bytes.com
> > http://bit.ly/Vi2PqP
> >
> > Almost nobody else of the 18 respondents seemed to be using them.
> >
> > And then 2 hours later, a use case came along. I think. Anyone have any better solutions?
> >
> > I have a file full of things like this:
> >
> > "CAPSICUM RED fresh from Queensland"
> >
> > Product names (all caps, at start of string) and descriptions (mixed case, to end of string) all muddled up in the same field. And I need to split them into two fields. Note that if the text had said:
> >
> > "CAPSICUM RED fresh from QLD"
> >
> > I would want QLD in the description, not shunted forwards and put in the product name. So (uncontrived) list comprehensions and regex's are out.
> >
> > I want to split the above into:
> >
> > ("CAPSICUM RED", "fresh from QLD")
> >
> > Enter dropwhile and takewhile. 6 lines later:
> >
> > from itertools import takewhile, dropwhile
> > def split_product_itertools(s):
> >     words = s.split()
> >     allcaps = lambda word: word == word.upper()
> >     product, description = takewhile(allcaps, words), dropwhile(allcaps, words)
> >     return " ".join(product), " ".join(description)
> >
> > When I tried to refactor this code to use while or for loops, I couldn't find any way that felt shorter or more pythonic:
> >
> > (9 lines: using for)
> >
> > def split_product_1(s):
> >     words = s.split()
> >     product = []
> >     for word in words:
> >         if word == word.upper():
> >             product.append(word)
> >         else:
> >             break
> >     return " ".join(product), " ".join(words[len(product):])
> >
> > (12 lines: using while)
> >
> > def split_product_2(s):
> >     words = s.split()
> >     i = 0
> >     product = []
> >     while 1:
> >         word = words[i]
> >         if word == word.upper():
> >             product.append(word)
> >             i += 1
> >         else:
> >             break
> >     return " ".join(product), " ".join(words[i:])
> >
> > Any thoughts?
> >
> > Nick
> > --
> > http://mail.python.org/mailman/listinfo/python-list
>
> Hi,
> the regex approach doesn't actually seem to be very complex, given the
> mentioned specification, e.g.
>
> >>> import re
> >>> re.findall(r"(?m)^([A-Z\s]+) (.+)$", "CAPSICUM RED fresh from QLD\nCAPSICUM RED fresh from Queensland")
> [('CAPSICUM RED', 'fresh from QLD'), ('CAPSICUM RED', 'fresh from Queensland')]
> >>>
>
> (It might be necessary to account for some punctuation, whitespace etc. too.)
>
> hth,
> vbr
 
Alexander Blinne
 
      12-04-2012
Another neat solution with a little help from

http://stackoverflow.com/questions/1...s-a-passed-fun

>>> def split_product(p):
...     w = p.split(" ")
...     j = (i for i,v in enumerate(w) if v.upper() != v).next()
...     return " ".join(w[:j]), " ".join(w[j:])

Greetings
 
Neil Cerutti
 
      12-04-2012
On 2012-12-04, Nick Mellor <(E-Mail Removed)> wrote:
> I love the way you guys can write a line of code that does the
> same as 20 of mine
>
> I can turn up the heat on your regex by feeding it a null
> description or multiple white space (both in the original
> file.) I'm sure you'd adjust, but at the cost of a more complex
> regex.


A re.split should be able to handle this without too much hassle.
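
For comparison, here is one way a regex could cope with both a null description and runs of white space. It is only a sketch under stated assumptions: it uses re.match rather than the re.split suggested above, the name `split_product_re` is made up, an all-caps word is taken to mean "no ASCII lowercase letter", and white space is normalised rather than preserved.

```python
import re

# A leading run of whitespace-separated words that contain no ASCII
# lowercase letter, followed by everything else.
_SPLIT = re.compile(r"\s*((?:[^a-z\s]+(?:\s+|$))*)(.*)", re.DOTALL)

def split_product_re(s):
    # Every part of the pattern may match the empty string, so match()
    # always succeeds, even on an empty input.
    product, description = _SPLIT.match(s).groups()
    # Collapse runs of white space in both fields.
    return " ".join(product.split()), " ".join(description.split())

print(split_product_re("CAPSICUM RED fresh from QLD"))
# -> ('CAPSICUM RED', 'fresh from QLD')
print(split_product_re("CAPSICUM  RED"))
# -> ('CAPSICUM RED', '')
```

A word such as "Fresh" ends the product because its "F" is not followed by white space or end of string, so the capitalised-description case is handled too.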

The simplicity of my two-line version will evaporate pretty
quickly to compensate for edge cases.

Here's one that can handle one of the edge cases you mention, but
it's hardly any shorter than what you had, and it doesn't
preserve non-standard white space, like double spaces.

def prod_desc(s):
    """split s into product name and product description. Product
    name is a series of one or more capitalized words followed
    by white space. Everything after the trailing white space is
    the product description.

    >>> prod_desc("CAR FIFTY TWO Chrysler LeBaron.")
    ['CAR FIFTY TWO', 'Chrysler LeBaron.']
    """
    prod = []
    desc = []
    target = prod
    for word in s.split():
        if target is prod and not word.isupper():
            target = desc
        target.append(word)
    return [' '.join(prod), ' '.join(desc)]

When str methods fail I'll usually write my own parser before
turning to re. The following is no longer nice looking at all.

def prod_desc(s):
    """split s into product name and product description. Product
    name is a series of one or more capitalized words followed
    by white space. Everything after the trailing white space is
    the product description.

    >>> prod_desc("CAR FIFTY TWO Chrysler LeBaron.")
    ['CAR FIFTY TWO', 'Chrysler LeBaron.']

    >>> prod_desc("MR. JONESEY Saskatchewan's finest")
    ['MR. JONESEY', "Saskatchewan's finest"]
    """
    i = 0
    while not s[i].islower():
        i += 1
    i -= 1
    while not s[i].isspace():
        i -= 1
    start_desc = i+1
    while s[i].isspace():
        i -= 1
    end_prod = i+1
    return [s[:end_prod], s[start_desc:]]

--
Neil Cerutti
 
DJC
 
      12-04-2012
On 04/12/12 17:18, Alexander Blinne wrote:
> Another neat solution with a little help from
>
> http://stackoverflow.com/questions/1...s-a-passed-fun
>
> >>> def split_product(p):
> ...     w = p.split(" ")
> ...     j = (i for i,v in enumerate(w) if v.upper() != v).next()
> ...     return " ".join(w[:j]), " ".join(w[j:])
>

Python 2.7.3 (default, Sep 26 2012, 21:51:14)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> w1 = "CAPSICUM RED Fresh from Queensland"
>>> w1.split()
['CAPSICUM', 'RED', 'Fresh', 'from', 'Queensland']
>>> w = w1.split()
>>> (i for i,v in enumerate(w) if v.upper() != v)
<generator object <genexpr> at 0x18b1910>
>>> (i for i,v in enumerate(w) if v.upper() != v).next()
2

Python 3.2.3 (default, Oct 19 2012, 19:53:16)
>>> (i for i,v in enumerate(w) if v.upper() != v).next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'generator' object has no attribute 'next'

 
Alexander Blinne
 
      12-04-2012
Am 04.12.2012 19:28, schrieb DJC:
> >>> (i for i,v in enumerate(w) if v.upper() != v).next()
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> AttributeError: 'generator' object has no attribute 'next'

Yeah, I saw this problem right after I sent the posting. It is now
supposed to read like this:

>>> def split_product(p):
...     w = p.split(" ")
...     j = next(i for i,v in enumerate(w) if v.upper() != v)
...     return " ".join(w[:j]), " ".join(w[j:])
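
One caveat with the version above: if every word is upper-case (a null description), the generator yields nothing and next() raises StopIteration. A possible variant, sketched under the assumption that an empty description is the wanted result in that case, passes a default to next(). The name `split_product_safe` is hypothetical, and split() without an argument is used here so that repeated spaces collapse.

```python
def split_product_safe(p):
    w = p.split()
    # Index of the first word that is not all upper-case; fall back to
    # len(w) when there is none, so that the description comes out empty.
    j = next((i for i, v in enumerate(w) if v.upper() != v), len(w))
    return " ".join(w[:j]), " ".join(w[j:])

print(split_product_safe("CAPSICUM RED fresh from QLD"))
# -> ('CAPSICUM RED', 'fresh from QLD')
print(split_product_safe("CAPSICUM RED"))
# -> ('CAPSICUM RED', '')
```

The built-in next(iterator, default) works on Python 2.6 and later as well as on Python 3.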

Greetings
 