A gnarly little python loop
I'm trying to pull down tweets with one of the many twitter APIs. The
particular one I'm using (python-twitter), has a call:

    data = api.GetSearch(term="foo", page=page)

The way it works, you start with page=1. It returns a list of tweets.
If the list is empty, there are no more tweets. If the list is not
empty, you can try to get more tweets by asking for page=2, page=3, etc.
I've got:

    page = 1
    while 1:
        r = api.GetSearch(term="foo", page=page)
        if not r:
            break
        for tweet in r:
            process(tweet)
        page += 1

It works, but it seems excessively fidgety. Is there some cleaner way
to refactor this?
Re: A gnarly little python loop
On Sat, Nov 10, 2012 at 3:58 PM, Roy Smith <roy@panix.com> wrote:
> I'm trying to pull down tweets with one of the many twitter APIs. The
> particular one I'm using (python-twitter), has a call:
>
>     data = api.GetSearch(term="foo", page=page)
>
> The way it works, you start with page=1. It returns a list of tweets.
> If the list is empty, there are no more tweets. If the list is not
> empty, you can try to get more tweets by asking for page=2, page=3, etc.
> I've got:
>
>     page = 1
>     while 1:
>         r = api.GetSearch(term="foo", page=page)
>         if not r:
>             break
>         for tweet in r:
>             process(tweet)
>         page += 1
>
> It works, but it seems excessively fidgety. Is there some cleaner way
> to refactor this?

I'd do something like this:

    import itertools

    def get_tweets(term):
        for page in itertools.count(1):
            # keyword arguments, as in the original post
            r = api.GetSearch(term=term, page=page)
            if not r:
                break
            for tweet in r:
                yield tweet

    for tweet in get_tweets("foo"):
        process(tweet)
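For anyone who wants to try the generator approach without Twitter credentials, here is a minimal sketch in Python 3. The `fake_search` function and the `FAKE_PAGES` data are hypothetical stand-ins for `api.GetSearch`, assumed only to return an empty list once the pages run out, as described in the original post.

```python
import itertools

# Hypothetical canned data standing in for the real service:
# three pages of results, then nothing.
FAKE_PAGES = {1: ["t1", "t2"], 2: ["t3"], 3: ["t4", "t5"]}


def fake_search(term, page):
    """Return one page of fake tweets; an empty list means no more pages."""
    return FAKE_PAGES.get(page, [])


def get_tweets(term, search=fake_search):
    """Yield tweets one at a time, fetching pages until an empty one appears."""
    for page in itertools.count(1):
        results = search(term=term, page=page)
        if not results:
            break
        for tweet in results:
            yield tweet


collected = list(get_tweets("foo"))
print(collected)
```

The caller sees a flat stream of tweets and never touches page numbers, which is the point of the refactoring.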
Re: A gnarly little python loop
On Sat, 10 Nov 2012 17:58:14 -0500, Roy Smith wrote:
> The way it works, you start with page=1. It returns a list of tweets.
> If the list is empty, there are no more tweets. If the list is not
> empty, you can try to get more tweets by asking for page=2, page=3, etc.
> I've got:
>
>     page = 1
>     while 1:
>         r = api.GetSearch(term="foo", page=page)
>         if not r:
>             break
>         for tweet in r:
>             process(tweet)
>         page += 1
>
> It works, but it seems excessively fidgety. Is there some cleaner way
> to refactor this?

Seems clean enough to me. It does exactly what you need: loop until there
are no more tweets, process each tweet.

If you're allergic to nested loops, move the inner for-loop into a
function. Also you could get rid of the "if not r: break".

    page = 1
    r = ["placeholder"]
    while r:
        r = api.GetSearch(term="foo", page=page)
        process_all(r)  # does nothing if r is empty
        page += 1

Another way would be to use a for loop for the outer loop.

    import sys

    for page in xrange(1, sys.maxint):
        r = api.GetSearch(term="foo", page=page)
        if not r:
            break
        process_all(r)

--
Steven
Re: A gnarly little python loop
On Nov 10, 2:58 pm, Roy Smith <r...@panix.com> wrote:
> I'm trying to pull down tweets with one of the many twitter APIs. The
> particular one I'm using (python-twitter), has a call:
>
>     data = api.GetSearch(term="foo", page=page)
>
> The way it works, you start with page=1. It returns a list of tweets.
> If the list is empty, there are no more tweets. If the list is not
> empty, you can try to get more tweets by asking for page=2, page=3, etc.
> I've got:
>
>     page = 1
>     while 1:
>         r = api.GetSearch(term="foo", page=page)
>         if not r:
>             break
>         for tweet in r:
>             process(tweet)
>         page += 1
>
> It works, but it seems excessively fidgety. Is there some cleaner way
> to refactor this?

I think your code is perfectly readable and clean, but you can flatten
it like so:

    import itertools

    def get_tweets(term):
        page_nums = itertools.count(1)
        # keyword arguments, as in the original post
        pages = itertools.imap(
            lambda page: api.GetSearch(term=term, page=page), page_nums)
        valid_pages = itertools.takewhile(bool, pages)
        tweets = itertools.chain.from_iterable(valid_pages)
        return tweets
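A rough Python 3 rendering of the same pipeline, for comparison. In Python 3 the built-in `map` is already lazy, so `itertools.imap` is not needed. `fake_search` and the `calls` log are hypothetical stand-ins for the real API, added only so the sketch runs by itself and shows how many pages get fetched.

```python
import itertools

calls = []  # records which pages were requested


def fake_search(term, page):
    """Hypothetical stand-in for api.GetSearch: two pages, then empty."""
    calls.append(page)
    return {1: ["a", "b"], 2: ["c"]}.get(page, [])


def get_tweets(term, search=fake_search):
    page_nums = itertools.count(1)
    # map() is lazy in Python 3, so no page is fetched until it is needed
    pages = map(lambda page: search(term=term, page=page), page_nums)
    # stop at the first empty (falsy) page
    valid_pages = itertools.takewhile(bool, pages)
    # flatten the pages into a single stream of tweets
    return itertools.chain.from_iterable(valid_pages)


flat = list(get_tweets("foo"))
print(flat, calls)
```

The `calls` log shows that the pipeline requests exactly one page past the last non-empty one, the same as the explicit loop.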
Re: A gnarly little python loop
Steve Howell, 11.11.2012 04:03:
> On Nov 10, 2:58 pm, Roy Smith <r...@panix.com> wrote:
>> I'm trying to pull down tweets with one of the many twitter APIs. The
>> particular one I'm using (python-twitter), has a call:
>>
>>     data = api.GetSearch(term="foo", page=page)
>>
>> The way it works, you start with page=1. It returns a list of tweets.
>> If the list is empty, there are no more tweets. If the list is not
>> empty, you can try to get more tweets by asking for page=2, page=3, etc.
>> I've got:
>>
>>     page = 1
>>     while 1:
>>         r = api.GetSearch(term="foo", page=page)
>>         if not r:
>>             break
>>         for tweet in r:
>>             process(tweet)
>>         page += 1
>>
>> It works, but it seems excessively fidgety. Is there some cleaner way
>> to refactor this?
>
> I think your code is perfectly readable and clean, but you can flatten
> it like so:
>
>     import itertools
>
>     def get_tweets(term):
>         page_nums = itertools.count(1)
>         # keyword arguments, as in the original post
>         pages = itertools.imap(
>             lambda page: api.GetSearch(term=term, page=page), page_nums)
>         valid_pages = itertools.takewhile(bool, pages)
>         tweets = itertools.chain.from_iterable(valid_pages)
>         return tweets

I'd prefer the original code ten times over this inaccessible beast.

Stefan
Re: A gnarly little python loop
On Nov 11, 3:58 am, Roy Smith <r...@panix.com> wrote:
> I'm trying to pull down tweets with one of the many twitter APIs. The
> particular one I'm using (python-twitter), has a call:
>
>     data = api.GetSearch(term="foo", page=page)
>
> The way it works, you start with page=1. It returns a list of tweets.
> If the list is empty, there are no more tweets. If the list is not
> empty, you can try to get more tweets by asking for page=2, page=3, etc.
> I've got:
>
>     page = 1
>     while 1:
>         r = api.GetSearch(term="foo", page=page)
>         if not r:
>             break
>         for tweet in r:
>             process(tweet)
>         page += 1
>
> It works, but it seems excessively fidgety. Is there some cleaner way
> to refactor this?

This is a classic problem -- structure clash of parallel loops -- and
Steve Howell has given the classic solution using the fact that
generators in python simulate/implement lazy lists.
As David Beazley (http://www.dabeaz.com/coroutines/) explains,
coroutines are more general than generators and you can use those if
you prefer.

The classic problem used to be stated like this:
There is an input in cards of 80 columns.
It needs to be copied onto a printer of 132 columns.

The structure clash arises because after reading 80 chars a new card
has to be read; after printing 132 chars a linefeed has to be given.

To pythonize the problem, let's replace the 80,132 by 3,4, i.e.
take the char-square

    abc
    def
    ghi

and produce

    abcd
    efgh
    i

The important difference (explained nicely by Beazley) is that with
generators the for-loop pulls values out of the generator, whereas with
coroutines the 'generator' pushes values into the consuming coroutines.

---------------
    from __future__ import print_function
    s = ["abc", "def", "ghi"]

    # Coroutine-infrastructure from pep 342
    def consumer(func):
        def wrapper(*args, **kw):
            gen = func(*args, **kw)
            gen.next()  # prime the coroutine (Python 2; next(gen) on Python 3)
            return gen
        return wrapper

    @consumer
    def endStage():
        while True:
            # print 4 received chars, then a linefeed
            for i in range(0, 4):
                print((yield), sep='', end='')
            print("\n", sep='', end='')

    def genStage(s, target):
        # push the 3 chars of each input line into the consumer
        for line in s:
            for i in range(0, 3):
                target.send(line[i])

    if __name__ == '__main__':
        genStage(s, endStage())
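For contrast with the push-style coroutine version, the same 3-to-4 re-chunking can be sketched pull-style with plain iterators. This is only an illustrative Python 3 sketch; the helper name `rechunk` is made up for this example and does not come from the thread.

```python
import itertools


def rechunk(lines, width):
    """Yield strings of `width` characters, ignoring input line boundaries."""
    # One flat stream of characters across all input lines
    chars = itertools.chain.from_iterable(lines)
    while True:
        # Pull up to `width` characters from the shared stream
        chunk = "".join(itertools.islice(chars, width))
        if not chunk:
            return
        yield chunk


result = list(rechunk(["abc", "def", "ghi"], 4))
print(result)
```

Here the consumer drives the loop: each chunk pulls characters only as it needs them, so neither the input line length nor the output line length has to know about the other.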
Re: A gnarly little python loop
On Nov 12, 12:09 pm, rusi <rustompm...@gmail.com> wrote:
> This is a classic problem -- structure clash of parallel loops

<rest snipped>

Sorry wrong solution :D

The fidgetiness is entirely due to python not allowing C-style loops
like these:

    while ((c = getchar()) != EOF) { ... }

Putting it into coroutine form, it becomes something like the
following [Untested since I don't have the API]. Clearly the
fidgetiness is there as before and now with extra coroutine plumbing:

    # uses the consumer decorator from my previous post
    def genStage(term, target):
        page = 1
        while 1:
            r = api.GetSearch(term=term, page=page)
            if not r:
                break
            for tweet in r:
                target.send(tweet)
            page += 1

    @consumer
    def endStage():
        while True:
            process((yield))

    if __name__ == '__main__':
        genStage("foo", endStage())
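For readers on a newer interpreter: Python 3.8 and later have assignment expressions (PEP 572), which allow something close to the C idiom. The sketch below assumes Python 3.8+ and uses a hypothetical `fake_search` in place of `api.GetSearch`.

```python
# Requires Python 3.8+ for the := operator (PEP 572).

# Hypothetical canned data standing in for the real service
FAKE_PAGES = {1: ["t1", "t2"], 2: ["t3"]}


def fake_search(term, page):
    """Return one page of fake tweets; an empty list means no more pages."""
    return FAKE_PAGES.get(page, [])


processed = []
page = 1
# Fetch and test in one expression; the loop ends on the first empty page
while (r := fake_search(term="foo", page=page)):
    for tweet in r:
        processed.append(tweet)
    page += 1

print(processed)
```

The page counter still has to be managed by hand, so this removes the `if not r: break` but not the rest of the fidgeting.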
Re: A gnarly little python loop
rusi wrote:
> The fidgetiness is entirely due to python not allowing C-style loops
> like these:
>
>     while ((c = getchar()) != EOF) { ... }

    for c in iter(getchar, EOF):
        ...

> Clearly the fidgetiness is there as before and now with extra coroutine
> plumbing

Hmm, very funny...
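A small runnable illustration of the two-argument form of `iter()`, which calls the given function repeatedly until it returns the sentinel. Python has no `getchar` or `EOF` built in, so both names below are stand-ins defined for this sketch, reading from an in-memory stream.

```python
import io
from functools import partial

stream = io.StringIO("abc")

# Stand-in for C's getchar(): read one character per call
getchar = partial(stream.read, 1)
# StringIO.read() returns an empty string at end of input
EOF = ""

chars = []
for c in iter(getchar, EOF):
    chars.append(c)

print(chars)
```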
Re: A gnarly little python loop
On Nov 12, 7:21 am, rusi <rustompm...@gmail.com> wrote:
> On Nov 12, 12:09 pm, rusi <rustompm...@gmail.com> wrote:
> > This is a classic problem -- structure clash of parallel loops
>
> <rest snipped>
>
> Sorry wrong solution :D
>
> The fidgetiness is entirely due to python not allowing C-style loops
> like these:
>
>     while ((c = getchar()) != EOF) { ... }
> [...]

There are actually three fidgety things going on:

 1. The API is 1-based instead of 0-based.
 2. You don't know the number of pages in advance.
 3. You want to process tweets, not pages of tweets.

Here's yet another take on the problem:

    from itertools import chain, count

    # wrap fidgety 1-based api
    def search(i):
        # keyword arguments, as in the original post
        return api.GetSearch(term="foo", page=i+1)

    paged_tweets = (search(i) for i in count())

    # handle sentinel
    paged_tweets = iter(paged_tweets.next, [])

    # flatten pages
    tweets = chain.from_iterable(paged_tweets)
    for tweet in tweets:
        process(tweet)
Re: A gnarly little python loop
On Nov 12, 9:09 pm, Steve Howell <showel...@yahoo.com> wrote:
> On Nov 12, 7:21 am, rusi <rustompm...@gmail.com> wrote:
>
> > On Nov 12, 12:09 pm, rusi <rustompm...@gmail.com> wrote:
> > > This is a classic problem -- structure clash of parallel loops
>
> > <rest snipped>
>
> > Sorry wrong solution :D
>
> > The fidgetiness is entirely due to python not allowing C-style loops
> > like these:
>
> >     while ((c = getchar()) != EOF) { ... }
> > [...]
>
> There are actually three fidgety things going on:
>
>  1. The API is 1-based instead of 0-based.
>  2. You don't know the number of pages in advance.
>  3. You want to process tweets, not pages of tweets.
>
> Here's yet another take on the problem:
>
>     from itertools import chain, count
>
>     # wrap fidgety 1-based api
>     def search(i):
>         # keyword arguments, as in the original post
>         return api.GetSearch(term="foo", page=i+1)
>
>     paged_tweets = (search(i) for i in count())
>
>     # handle sentinel
>     paged_tweets = iter(paged_tweets.next, [])
>
>     # flatten pages
>     tweets = chain.from_iterable(paged_tweets)
>     for tweet in tweets:
>         process(tweet)

[Steve Howell]
Nice on the whole -- thanks.
Could not the 1-based-ness be dealt with by using count(1)?
i.e. use

    paged_tweets = (api.GetSearch(term="foo", page=i) for i in count(1))

[Peter]
> >     while ((c = getchar()) != EOF) { ... }
>
>     for c in iter(getchar, EOF):
>         ...

Thanks. Learnt something.
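Putting the `count(1)` suggestion together with the sentinel form of `iter()`, a Python 3 sketch might look like the following. `fake_search` is a hypothetical stand-in for `api.GetSearch`, and `next(pages)` replaces the Python 2 `.next` method used earlier in the thread.

```python
from itertools import chain, count

# Hypothetical canned data standing in for the real service
FAKE_PAGES = {1: ["t1"], 2: ["t2", "t3"]}


def fake_search(term, page):
    """Return one page of fake tweets; an empty list means no more pages."""
    return FAKE_PAGES.get(page, [])


# count(1) matches the 1-based API directly, so no i+1 wrapper is needed
pages = (fake_search(term="foo", page=i) for i in count(1))

# handle sentinel: stop as soon as a page compares equal to []
paged_tweets = iter(lambda: next(pages), [])

# flatten pages
tweets = list(chain.from_iterable(paged_tweets))
print(tweets)
```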