Velocity Reviews

Roy Smith 11-10-2012 10:58 PM

A gnarly little python loop
 
I'm trying to pull down tweets with one of the many twitter APIs. The
particular one I'm using (python-twitter) has a call:

    data = api.GetSearch(term="foo", page=page)

The way it works, you start with page=1. It returns a list of tweets.
If the list is empty, there are no more tweets. If the list is not
empty, you can try to get more tweets by asking for page=2, page=3, etc.
I've got:

    page = 1
    while 1:
        r = api.GetSearch(term="foo", page=page)
        if not r:
            break
        for tweet in r:
            process(tweet)
        page += 1

It works, but it seems excessively fidgety. Is there some cleaner way
to refactor this?
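To make the paging protocol concrete, here is a minimal sketch (Python 3) that runs the loop above against a stand-in for python-twitter. `FakeApi`, its canned pages, `process` and `seen` are hypothetical names invented for illustration; only the `GetSearch(term=..., page=...)` calling convention and the "empty list means no more tweets" rule come from the post.

```python
class FakeApi:
    """Hypothetical stand-in for python-twitter's Api; not the real class."""

    def __init__(self, pages):
        self._pages = pages  # list of pages, each page a list of tweets

    def GetSearch(self, term, page):
        # term is ignored by this stub; pages are 1-based and any page
        # past the last one comes back as an empty list.
        if 1 <= page <= len(self._pages):
            return list(self._pages[page - 1])
        return []


api = FakeApi([["t1", "t2"], ["t3"]])
seen = []


def process(tweet):
    seen.append(tweet)  # hypothetical: just record each tweet


# The loop from the post, unchanged apart from indentation.
page = 1
while 1:
    r = api.GetSearch(term="foo", page=page)
    if not r:
        break
    for tweet in r:
        process(tweet)
    page += 1

print(seen)  # ['t1', 't2', 't3']
print(page)  # 3, because the loop stopped when page 3 came back empty
```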

Ian Kelly 11-10-2012 11:17 PM

Re: A gnarly little python loop
 
On Sat, Nov 10, 2012 at 3:58 PM, Roy Smith <roy@panix.com> wrote:
> I'm trying to pull down tweets with one of the many twitter APIs. The
> particular one I'm using (python-twitter) has a call:
>
>     data = api.GetSearch(term="foo", page=page)
>
> The way it works, you start with page=1. It returns a list of tweets.
> If the list is empty, there are no more tweets. If the list is not
> empty, you can try to get more tweets by asking for page=2, page=3, etc.
> I've got:
>
>     page = 1
>     while 1:
>         r = api.GetSearch(term="foo", page=page)
>         if not r:
>             break
>         for tweet in r:
>             process(tweet)
>         page += 1
>
> It works, but it seems excessively fidgety. Is there some cleaner way
> to refactor this?


I'd do something like this:

    import itertools

    def get_tweets(term):
        # Generator: fetch pages lazily and hand back tweets one at a time
        for page in itertools.count(1):
            r = api.GetSearch(term=term, page=page)
            if not r:
                break
            for tweet in r:
                yield tweet

    for tweet in get_tweets("foo"):
        process(tweet)
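A Python 3 sketch of the same generator, run against a hypothetical `FakeApi` stub that records which pages were requested. It passes `api` in explicitly rather than using a global, and uses `yield from` (Python 3.3+) in place of the inner `for`/`yield`. The recorded requests show the practical benefit of the generator: pages are fetched only as the caller consumes tweets.

```python
import itertools


class FakeApi:
    """Hypothetical stand-in for python-twitter's Api."""

    def __init__(self, pages):
        self._pages = pages
        self.requested = []  # page numbers asked for, to show laziness

    def GetSearch(self, term, page):
        self.requested.append(page)  # term is ignored by this stub
        return list(self._pages[page - 1]) if page <= len(self._pages) else []


def get_tweets(api, term):
    """Yield tweets one at a time, fetching pages until one comes back empty."""
    for page in itertools.count(1):
        r = api.GetSearch(term=term, page=page)
        if not r:
            return  # ends the generator
        yield from r  # same as: for tweet in r: yield tweet


api = FakeApi([["t1", "t2"], ["t3"]])
all_tweets = list(get_tweets(api, "foo"))
print(all_tweets)     # ['t1', 't2', 't3']
print(api.requested)  # [1, 2, 3]

lazy = FakeApi([["t1", "t2"], ["t3"]])
first = next(get_tweets(lazy, "foo"))
print(first)           # t1
print(lazy.requested)  # [1]
```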

Steven D'Aprano 11-11-2012 12:23 AM

Re: A gnarly little python loop
 
On Sat, 10 Nov 2012 17:58:14 -0500, Roy Smith wrote:

> The way it works, you start with page=1. It returns a list of tweets.
> If the list is empty, there are no more tweets. If the list is not
> empty, you can try to get more tweets by asking for page=2, page=3, etc.
> I've got:
>
>     page = 1
>     while 1:
>         r = api.GetSearch(term="foo", page=page)
>         if not r:
>             break
>         for tweet in r:
>             process(tweet)
>         page += 1
>
> It works, but it seems excessively fidgety. Is there some cleaner way
> to refactor this?



Seems clean enough to me. It does exactly what you need: loop until there
are no more tweets, process each tweet.

If you're allergic to nested loops, move the inner for-loop into a
function. You could also get rid of the "if not r: break".

    page = 1
    r = ["placeholder"]  # any non-empty list, so the loop body runs at least once
    while r:
        r = api.GetSearch(term="foo", page=page)
        process_all(r)  # does nothing if r is empty
        page += 1


Another way would be to use a for loop for the outer loop.

    import sys

    for page in xrange(1, sys.maxint):
        r = api.GetSearch(term="foo", page=page)
        if not r: break
        process_all(r)
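`xrange` and `sys.maxint` no longer exist in Python 3, so a sketch of the same for-loop shape there would use `itertools.count(1)` as the unbounded page counter. `FakeApi` and `process_all` are hypothetical helpers assumed for illustration (the post names `process_all` but does not define it).

```python
import itertools


class FakeApi:
    """Hypothetical stand-in for python-twitter's Api."""

    def __init__(self, pages):
        self._pages = pages

    def GetSearch(self, term, page):
        return list(self._pages[page - 1]) if page <= len(self._pages) else []


processed = []


def process_all(tweets):
    """Hypothetical helper: process every tweet on one page (no-op if empty)."""
    for tweet in tweets:
        processed.append(tweet)


api = FakeApi([["t1", "t2"], ["t3"]])

# Python 3: count(1) yields 1, 2, 3, ... without an upper bound.
for page in itertools.count(1):
    r = api.GetSearch(term="foo", page=page)
    if not r:
        break
    process_all(r)

print(processed)  # ['t1', 't2', 't3']
```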



--
Steven

Steve Howell 11-11-2012 03:03 AM

Re: A gnarly little python loop
 
On Nov 10, 2:58 pm, Roy Smith <r...@panix.com> wrote:
> I'm trying to pull down tweets with one of the many twitter APIs. The
> particular one I'm using (python-twitter) has a call:
>
>     data = api.GetSearch(term="foo", page=page)
>
> The way it works, you start with page=1. It returns a list of tweets.
> If the list is empty, there are no more tweets. If the list is not
> empty, you can try to get more tweets by asking for page=2, page=3, etc.
> I've got:
>
>     page = 1
>     while 1:
>         r = api.GetSearch(term="foo", page=page)
>         if not r:
>             break
>         for tweet in r:
>             process(tweet)
>         page += 1
>
> It works, but it seems excessively fidgety. Is there some cleaner way
> to refactor this?


I think your code is perfectly readable and clean, but you can flatten
it like so:

    import itertools

    def get_tweets(term, get_page):
        # get_page is the fetch function, e.g. api.GetSearch
        page_nums = itertools.count(1)
        pages = itertools.imap(lambda n: get_page(term=term, page=n), page_nums)
        valid_pages = itertools.takewhile(bool, pages)  # stop at first empty page
        tweets = itertools.chain.from_iterable(valid_pages)
        return tweets
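In Python 3, `itertools.imap` is gone because the built-in `map` is already lazy. The sketch below ports the pipeline under that assumption and treats `get_page` as the page-fetching function (an interpretation of the second parameter, not something the post spells out). `FakeApi` is a hypothetical stub that records requests, showing that `takewhile(bool, ...)` stops at the first empty page, so the fourth page is never fetched.

```python
import itertools


class FakeApi:
    """Hypothetical stand-in for python-twitter's Api."""

    def __init__(self, pages):
        self._pages = pages
        self.requested = []

    def GetSearch(self, term, page):
        self.requested.append(page)
        return list(self._pages[page - 1]) if page <= len(self._pages) else []


def get_tweets(term, get_page):
    page_nums = itertools.count(1)                                  # 1, 2, 3, ...
    pages = map(lambda n: get_page(term=term, page=n), page_nums)   # lazy fetches
    valid_pages = itertools.takewhile(bool, pages)                  # stop at first empty page
    return itertools.chain.from_iterable(valid_pages)               # pages -> tweets


api = FakeApi([["t1", "t2"], ["t3"], [], ["never fetched"]])
tweets = list(get_tweets("foo", api.GetSearch))
print(tweets)         # ['t1', 't2', 't3']
print(api.requested)  # [1, 2, 3]
```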

Stefan Behnel 11-11-2012 07:56 AM

Re: A gnarly little python loop
 
Steve Howell, 11.11.2012 04:03:
> On Nov 10, 2:58 pm, Roy Smith <r...@panix.com> wrote:
>> I'm trying to pull down tweets with one of the many twitter APIs. The
>> particular one I'm using (python-twitter) has a call:
>>
>>     data = api.GetSearch(term="foo", page=page)
>>
>> The way it works, you start with page=1. It returns a list of tweets.
>> If the list is empty, there are no more tweets. If the list is not
>> empty, you can try to get more tweets by asking for page=2, page=3, etc.
>> I've got:
>>
>>     page = 1
>>     while 1:
>>         r = api.GetSearch(term="foo", page=page)
>>         if not r:
>>             break
>>         for tweet in r:
>>             process(tweet)
>>         page += 1
>>
>> It works, but it seems excessively fidgety. Is there some cleaner way
>> to refactor this?
>
> I think your code is perfectly readable and clean, but you can flatten
> it like so:
>
>     import itertools
>
>     def get_tweets(term, get_page):
>         # get_page is the fetch function, e.g. api.GetSearch
>         page_nums = itertools.count(1)
>         pages = itertools.imap(lambda n: get_page(term=term, page=n), page_nums)
>         valid_pages = itertools.takewhile(bool, pages)  # stop at first empty page
>         tweets = itertools.chain.from_iterable(valid_pages)
>         return tweets


I'd prefer the original code ten times over this inaccessible beast.

Stefan



rusi 11-12-2012 07:09 AM

Re: A gnarly little python loop
 
On Nov 11, 3:58 am, Roy Smith <r...@panix.com> wrote:
> I'm trying to pull down tweets with one of the many twitter APIs. The
> particular one I'm using (python-twitter) has a call:
>
>     data = api.GetSearch(term="foo", page=page)
>
> The way it works, you start with page=1. It returns a list of tweets.
> If the list is empty, there are no more tweets. If the list is not
> empty, you can try to get more tweets by asking for page=2, page=3, etc.
> I've got:
>
>     page = 1
>     while 1:
>         r = api.GetSearch(term="foo", page=page)
>         if not r:
>             break
>         for tweet in r:
>             process(tweet)
>         page += 1
>
> It works, but it seems excessively fidgety. Is there some cleaner way
> to refactor this?


This is a classic problem -- structure clash of parallel loops -- and
Steve Howell has given the classic solution, using the fact that
generators in Python simulate/implement lazy lists.
As David Beazley explains (http://www.dabeaz.com/coroutines/),
coroutines are more general than generators and you can use those if
you prefer.

The classic problem used to be stated like this:
There is an input in cards of 80 columns.
It needs to be copied onto printer of 132 columns.

The structure clash arises because after reading 80 chars a new card
has to be read; after printing 132 chars a linefeed has to be given.

To pythonize the problem, let's replace 80 and 132 by 3 and 4, i.e. take the
character square
abc
def
ghi

and produce
abcd
efgh
i

The important difference (explained nicely by Beazley) is that with
generators, the for-loop pulls values out of the generator; with
coroutines, the 'generator' pushes values into the consuming coroutines.


---------------
    from __future__ import print_function
    s = ["abc", "def", "ghi"]

    # Coroutine infrastructure from PEP 342
    def consumer(func):
        def wrapper(*args, **kw):
            gen = func(*args, **kw)
            next(gen)  # prime the coroutine: run it up to its first yield
            return gen
        return wrapper

    @consumer
    def endStage():
        while True:
            for i in range(0, 4):
                print((yield), sep='', end='')
            print("\n", sep='', end='')


    def genStage(s, target):
        for line in s:
            for i in range(0, 3):
                target.send(line[i])


    if __name__ == '__main__':
        genStage(s, endStage())
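For contrast with the push-style coroutines above, here is a pull-style sketch of the same 3-to-4 regrouping using plain generators (Python 3). `regroup` and `width` are names chosen here for illustration. Each structure gets its own piece: `chain.from_iterable` walks the input cards character by character, and `islice` pulls at most `width` characters per output line from that same shared iterator.

```python
from itertools import chain, islice


def regroup(cards, width):
    """Yield output lines of at most `width` characters from a list of cards."""
    stream = chain.from_iterable(cards)  # one character at a time, card after card
    while True:
        line = "".join(islice(stream, width))  # pull up to `width` characters
        if not line:
            return  # input exhausted
        yield line


lines = list(regroup(["abc", "def", "ghi"], 4))
for line in lines:
    print(line)
# abcd
# efgh
# i
```

The same function handles the original card/printer sizes: three 80-column cards regrouped at width 132 give one full 132-character line and one 108-character line.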







rusi 11-12-2012 03:21 PM

Re: A gnarly little python loop
 
On Nov 12, 12:09 pm, rusi <rustompm...@gmail.com> wrote:
> This is a classic problem -- structure clash of parallel loops

<rest snipped>

Sorry, wrong solution :D

The fidgetiness is entirely due to Python not allowing C-style loops
like these:

    while ((c = getchar()) != EOF) { ... }



Putting it into coroutine form, it becomes something like the
following [untested, since I don't have the API]. Clearly the
fidgetiness is there as before, and now with extra coroutine plumbing.

    def genStage(term, target):
        page = 1
        while 1:
            r = api.GetSearch(term=term, page=page)
            if not r: break
            for tweet in r: target.send(tweet)
            page += 1


    @consumer  # the PEP 342 priming decorator from the previous post
    def endStage():
        while True: process((yield))

    if __name__ == '__main__':
        genStage("foo", endStage())
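A self-contained Python 3 sketch of this push-style version, so it can actually be run without the real API. It assumes a hypothetical `FakeApi` stub and has `endStage` append to a caller-supplied list (`sink`) instead of calling `process`; `next(gen)` replaces the Python 2 `gen.next()` when priming the coroutine.

```python
import functools


def consumer(func):
    """Prime a generator-based coroutine so it is ready for .send()."""
    @functools.wraps(func)
    def wrapper(*args, **kw):
        gen = func(*args, **kw)
        next(gen)  # advance to the first yield
        return gen
    return wrapper


class FakeApi:
    """Hypothetical stand-in for python-twitter's Api."""

    def __init__(self, pages):
        self._pages = pages

    def GetSearch(self, term, page):
        return list(self._pages[page - 1]) if page <= len(self._pages) else []


def genStage(api, term, target):
    """Producer: fetch pages and push each tweet into the target coroutine."""
    page = 1
    while True:
        r = api.GetSearch(term=term, page=page)
        if not r:
            break
        for tweet in r:
            target.send(tweet)
        page += 1


@consumer
def endStage(sink):
    """Consumer: receive pushed tweets; this sketch just records them."""
    while True:
        sink.append((yield))


received = []
genStage(FakeApi([["t1", "t2"], ["t3"]]), "foo", endStage(received))
print(received)  # ['t1', 't2', 't3']
```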

Peter Otten 11-12-2012 03:49 PM

Re: A gnarly little python loop
 
rusi wrote:

> The fidgetiness is entirely due to python not allowing C-style loops
> like these:
>     while ((c = getchar()) != EOF) { ... }


    for c in iter(getchar, EOF):
        ...
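The two-argument form `iter(callable, sentinel)` calls `callable` with no arguments over and over and stops as soon as a result compares equal to `sentinel`. Python has no built-in `getchar`, so this Python 3 sketch assumes a stand-in built from `io.StringIO.read(1)`, which returns `''` at end of input; the `getchar` and `EOF` names are only there to mirror the C loop.

```python
import io
from functools import partial

stream = io.StringIO("hi!")
getchar = partial(stream.read, 1)  # hypothetical getchar: one character per call
EOF = ""                           # read(1) returns '' once the input is exhausted

got = []
for c in iter(getchar, EOF):
    got.append(c)

print(got)  # ['h', 'i', '!']
```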

> Clearly the fidgetiness is there as before and now with extra coroutine
> plumbing


Hmm, very funny...



Steve Howell 11-12-2012 04:09 PM

Re: A gnarly little python loop
 
On Nov 12, 7:21 am, rusi <rustompm...@gmail.com> wrote:
> On Nov 12, 12:09 pm, rusi <rustompm...@gmail.com> wrote:
> > This is a classic problem -- structure clash of parallel loops
>
> <rest snipped>
>
> Sorry, wrong solution :D
>
> The fidgetiness is entirely due to python not allowing C-style loops
> like these:
>
>     while ((c = getchar()) != EOF) { ... }
>
> [...]


There are actually three fidgety things going on:

1. The API is 1-based instead of 0-based.
2. You don't know the number of pages in advance.
3. You want to process tweets, not pages of tweets.

Here's yet another take on the problem:

    from itertools import chain, count

    # wrap fidgety 1-based api
    def search(i):
        return api.GetSearch(term="foo", page=i+1)

    paged_tweets = (search(i) for i in count())

    # handle sentinel: stop at the first empty page
    paged_tweets = iter(paged_tweets.next, [])

    # flatten pages
    tweets = chain.from_iterable(paged_tweets)
    for tweet in tweets:
        process(tweet)
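A runnable Python 3 sketch of these three steps, assuming a hypothetical `FakeApi` stub in place of the real API. Python 3 generators have no `.next` method, so `partial(next, paged_tweets)` supplies the zero-argument callable that two-argument `iter` needs.

```python
from functools import partial
from itertools import chain, count


class FakeApi:
    """Hypothetical stand-in for python-twitter's Api."""

    def __init__(self, pages):
        self._pages = pages

    def GetSearch(self, term, page):
        return list(self._pages[page - 1]) if page <= len(self._pages) else []


api = FakeApi([["t1", "t2"], ["t3"]])


# 1. Wrap the 1-based API so callers can count from 0.
def search(i):
    return api.GetSearch(term="foo", page=i + 1)


paged_tweets = (search(i) for i in count())

# 2. Handle the sentinel: stop at the first page equal to [].
paged_tweets = iter(partial(next, paged_tweets), [])

# 3. Flatten pages into individual tweets.
tweets = list(chain.from_iterable(paged_tweets))
print(tweets)  # ['t1', 't2', 't3']
```

This relies on an empty page comparing equal to `[]`. If an API signalled "no more results" with `None` or an empty tuple instead, the sentinel would have to change to match.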


rusi 11-13-2012 04:14 AM

Re: A gnarly little python loop
 
On Nov 12, 9:09 pm, Steve Howell <showel...@yahoo.com> wrote:
> On Nov 12, 7:21 am, rusi <rustompm...@gmail.com> wrote:
>
> > On Nov 12, 12:09 pm, rusi <rustompm...@gmail.com> wrote:
> > > This is a classic problem -- structure clash of parallel loops
> >
> > <rest snipped>
> >
> > Sorry, wrong solution :D
> >
> > The fidgetiness is entirely due to python not allowing C-style loops
> > like these:
> >
> >     while ((c = getchar()) != EOF) { ... }
> > [...]
>
> There are actually three fidgety things going on:
>
>  1. The API is 1-based instead of 0-based.
>  2. You don't know the number of pages in advance.
>  3. You want to process tweets, not pages of tweets.
>
> Here's yet another take on the problem:
>
>     from itertools import chain, count
>
>     # wrap fidgety 1-based api
>     def search(i):
>         return api.GetSearch(term="foo", page=i+1)
>
>     paged_tweets = (search(i) for i in count())
>
>     # handle sentinel: stop at the first empty page
>     paged_tweets = iter(paged_tweets.next, [])
>
>     # flatten pages
>     tweets = chain.from_iterable(paged_tweets)
>     for tweet in tweets:
>         process(tweet)


[Steve Howell]
Nice on the whole -- thanks
Could not the 1-based-ness be dealt with by using count(1)?
i.e. use

    paged_tweets = (api.GetSearch(term="foo", page=i) for i in count(1))

[Peter]
>     while ((c = getchar()) != EOF) { ... }


    for c in iter(getchar, EOF):
        ...

Thanks. Learnt something.


All times are GMT.
