Velocity Reviews - Computer Hardware Reviews

how to start thread by group?

 
 
oyster
      10-06-2008
My code is not right; can somebody give me a hand? Thanks.

For example, I have 1000 URLs to download, but want only 5 threads running at one time:

def threadTask(url):
    download(url)

threadsAll = []
for url in all_url:
    task = threading.Thread(target=threadTask, args=[url])
    threadsAll.append(task)

for every5task in groupcount(threadsAll, 5):
    for everytask in every5task:
        everytask.start()

    for everytask in every5task:
        everytask.join()

    for everytask in every5task:  # this does not run ok
        while everytask.isAlive():
            pass
 
bieffe62@gmail.com
      10-06-2008
On 6 Oct, 15:24, oyster <lepto.pyt...@gmail.com> wrote:
> My code is not right; can somebody give me a hand? Thanks.
>
> For example, I have 1000 URLs to download, but want only 5 threads running at one time:
>
> def threadTask(url):
>     download(url)
>
> threadsAll = []
> for url in all_url:
>     task = threading.Thread(target=threadTask, args=[url])
>     threadsAll.append(task)
>
> for every5task in groupcount(threadsAll, 5):
>     for everytask in every5task:
>         everytask.start()
>
>     for everytask in every5task:
>         everytask.join()
>
>     for everytask in every5task:  # this does not run ok
>         while everytask.isAlive():
>             pass


Thread.join() blocks until the thread has finished, so the isAlive()
loop after it has nothing left to wait for. Your loop joins the threads
in the order in which they were started, whatever order they actually
finish in. Moreover, before starting the next 5 threads you wait until
all of the previous 5 have completed, while I believe your intention was
to always have the full load of 5 threads downloading.
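
For reference, the batch-at-a-time behaviour described above can be sketched as follows. This is only an illustration under assumptions: the original post does not show groupcount(), so chunks() is a hypothetical stand-in for it, and fake_download() is a placeholder for the real download(). It is written for Python 3.

```python
import threading
import time


def chunks(items, size):
    """Yield successive slices of `items`, each at most `size` long.

    Hypothetical stand-in for the OP's groupcount(), which is not shown.
    """
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fake_download(url, results):
    # Placeholder for the real download(); sleeps instead of using the network.
    time.sleep(0.01)
    results.append(url)  # list.append is thread-safe in CPython


def download_in_batches(urls, batch_size=5):
    results = []
    for batch in chunks(urls, batch_size):
        threads = [threading.Thread(target=fake_download, args=(url, results))
                   for url in batch]
        for t in threads:
            t.start()
        # join() blocks until the thread has finished, so no polling of
        # is_alive() is needed afterwards.
        for t in threads:
            t.join()
        # The next batch starts only here, even if four of the five
        # threads finished long before the slowest one.
    return results
```

The sketch works, but it shows the drawback mentioned above: one slow download holds back the whole next batch.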

I would restructure the code with something like this (WARNING: the
following code is ABSOLUTELY UNTESTED and should be considered only as
pseudo-code to express my idea of the algorithm, which could also be
wrong):


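import threading, time

MAX_THREADS = 5
DELAY = 0.01 # or whatever

def task_function(url):
    download(url)  # download() is the OP's function, defined elsewhere

def start_thread(url):
    task = threading.Thread(target=task_function, args=[url])
    task.start()  # the thread must actually be started
    return task

def main():
    all_urls = load_urls()  # assumed to return a list of URLs
    all_threads = []
    # keep going until no URLs are left and every thread has finished
    while all_urls or all_threads:
        # top up to MAX_THREADS while there are URLs left
        while all_urls and len(all_threads) < MAX_THREADS:
            url = all_urls.pop(0)
            all_threads.append(start_thread(url))
        # reap finished threads; iterate over a copy because we remove items
        for t in all_threads[:]:
            if not t.is_alive():
                t.join()
                all_threads.remove(t)
        time.sleep(DELAY)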


HTH

Ciao
-----
FB
 
Gabriel Genellina
      10-07-2008
On Mon, 06 Oct 2008 11:24:51 -0300, <> wrote:

> On 6 Oct, 15:24, oyster <lepto.pyt...@gmail.com> wrote:
>> My code is not right; can somebody give me a hand? Thanks.
>>
>> For example, I have 1000 URLs to download, but want only 5 threads
>> running at one time

> I would restructure the code with something like this (WARNING: the
> following code is ABSOLUTELY UNTESTED and should be considered only as
> pseudo-code to express my idea of the algorithm, which could also be
> wrong):


Your code creates one thread per url (but never more than MAX_THREADS
alive at the same time). Usually it's more efficient to create all the
MAX_THREADS at once, and continuously feed them with tasks to be done. A
Queue object is the way to synchronize them; from the documentation:

<code>
from Queue import Queue
from threading import Thread

num_worker_threads = 3
list_of_urls = ["http://foo.com", "http://bar.com",
                "http://baz.com", "http://spam.com",
                "http://egg.com",
               ]

def do_work(url):
    from time import sleep
    from random import randrange
    from threading import currentThread
    print "%s downloading %s" % (currentThread().getName(), url)
    sleep(randrange(5))
    print "%s done" % currentThread().getName()

# from this point on, copied almost verbatim from the Queue example
# at the end of http://docs.python.org/library/queue.html

def worker():
    while True:
        item = q.get()
        do_work(item)
        q.task_done()

q = Queue()
for i in range(num_worker_threads):
    t = Thread(target=worker)
    t.setDaemon(True)
    t.start()

for item in list_of_urls:
    q.put(item)

q.join()       # block until all tasks are done
print "Finished"
</code>
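
The example above is Python 2 (the Queue module and the print statement). A rough Python 3 equivalent might look like the sketch below. It keeps the same worker-pool design; run_pool(), the results list and the exception handling are additions for illustration, not part of the original example, and do_work() is still only a placeholder for a real download.

```python
import queue
import threading

NUM_WORKER_THREADS = 3


def do_work(url):
    # Placeholder for the real download.
    return "downloaded %s" % url


def run_pool(urls, num_workers=NUM_WORKER_THREADS):
    q = queue.Queue()
    results = []  # list.append is thread-safe in CPython

    def worker():
        while True:
            item = q.get()
            try:
                results.append((item, do_work(item)))
            except Exception as exc:
                # Record the failure instead of letting the worker die.
                results.append((item, exc))
            finally:
                q.task_done()  # every get() needs a matching task_done()

    # Daemon threads do not keep the interpreter alive once the work is done.
    for _ in range(num_workers):
        threading.Thread(target=worker, daemon=True).start()

    for url in urls:
        q.put(url)

    q.join()  # block until every queued item has been processed
    return results
```

Calling run_pool(list_of_urls) returns one (url, result) pair per URL, in completion order rather than input order.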


--
Gabriel Genellina

 
Lawrence D'Oliveiro
      10-07-2008
In message <mailman.2088.1223354239.3487.python->, Gabriel
Genellina wrote:

> Usually it's more efficient to create all the MAX_THREADS at once, and
> continuously feed them with tasks to be done.


Given that the bottleneck is most likely to be the internet connection, I'd
say the "premature optimization is the root of all evil" adage applies
here.
 
Terry Reedy
      10-07-2008
Lawrence D'Oliveiro wrote:
> In message <mailman.2088.1223354239.3487.python->, Gabriel
> Genellina wrote:
>
>> Usually it's more efficient to create all the MAX_THREADS at once, and
>> continuously feed them with tasks to be done.

>
> Given that the bottleneck is most likely to be the internet connection, I'd
> say the "premature optimization is the root of all evil" adage applies
> here.


There is also the bottleneck of programmer time to understand, write,
and maintain. In this case, 'more efficient' is simpler, and to me,
more efficient of programmer time. Feeding a fixed pool of worker
threads with a Queue() is a standard design that is easy to understand
and one the OP should learn. Re-using tested code is certainly
efficient of programmer time. Managing a variable pool of workers that
die and need to be replaced is more complex (two loops nested within a
loop) and error prone (though learning that alternative is probably not
a bad idea also).
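
On re-using tested code: in current Python 3 the standard library ships this design as concurrent.futures.ThreadPoolExecutor, which did not exist when this thread was written. A minimal sketch, assuming a placeholder fetch() in place of the real download:

```python
from concurrent.futures import ThreadPoolExecutor


def fetch(url):
    # Placeholder for the real download; returns something derived from the URL.
    return len(url)


def fetch_all(urls, max_workers=5):
    # The executor manages at most `max_workers` threads and its own work queue.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as `urls`.
        return list(pool.map(fetch, urls))
```

Leaving the with block waits for all submitted work to finish, so no explicit join() is needed.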

tjr

 
Gabriel Genellina
      10-07-2008
On Tue, 07 Oct 2008 13:25:01 -0300, Terry Reedy <> wrote:
> Lawrence D'Oliveiro wrote:
>> In message <mailman.2088.1223354239.3487.python->,
>> Gabriel Genellina wrote:
>>
>>> Usually it's more efficient to create all the MAX_THREADS at once, and
>>> continuously feed them with tasks to be done.

>> Given that the bottleneck is most likely to be the internet
>> connection, I'd
>> say the "premature optimization is the root of all evil" adage applies
>> here.

>
> There is also the bottleneck of programmer time to understand, write,
> and maintain. In this case, 'more efficient' is simpler, and to me,
> more efficient of programmer time. Feeding a fixed pool of worker
> threads with a Queue() is a standard design that is easy to understand
> and one the OP should learn. Re-using tested code is certainly
> efficient of programmer time. Managing a variable pool of workers that
> die and need to be replaced is more complex (two loops nested within a
> loop) and error prone (though learning that alternative is probably not
> a bad idea also).


I'd like to add that debugging a program that continuously creates and
destroys threads is a real PITA.

--
Gabriel Genellina

 
bieffe62@gmail.com
      10-08-2008
On 7 Oct, 06:37, "Gabriel Genellina" <gagsl-...@yahoo.com.ar> wrote:
> On Mon, 06 Oct 2008 11:24:51 -0300, <bieff...@gmail.com> wrote:
>
> > On 6 Oct, 15:24, oyster <lepto.pyt...@gmail.com> wrote:
> >> My code is not right; can somebody give me a hand? Thanks.
> >>
> >> For example, I have 1000 URLs to download, but want only 5 threads
> >> running at one time
>
> > I would restructure the code with something like this (WARNING: the
> > following code is ABSOLUTELY UNTESTED and should be considered only as
> > pseudo-code to express my idea of the algorithm, which could also be
> > wrong):
>
> Your code creates one thread per url (but never more than MAX_THREADS
> alive at the same time). Usually it's more efficient to create all the
> MAX_THREADS at once, and continuously feed them with tasks to be done. A
> Queue object is the way to synchronize them; from the documentation:
>
> <code>
> from Queue import Queue
> from threading import Thread
>
> num_worker_threads = 3
> list_of_urls = ["http://foo.com", "http://bar.com",
>                 "http://baz.com", "http://spam.com",
>                 "http://egg.com",
>                ]
>
> def do_work(url):
>     from time import sleep
>     from random import randrange
>     from threading import currentThread
>     print "%s downloading %s" % (currentThread().getName(), url)
>     sleep(randrange(5))
>     print "%s done" % currentThread().getName()
>
> # from this point on, copied almost verbatim from the Queue example
> # at the end of http://docs.python.org/library/queue.html
>
> def worker():
>     while True:
>         item = q.get()
>         do_work(item)
>         q.task_done()
>
> q = Queue()
> for i in range(num_worker_threads):
>     t = Thread(target=worker)
>     t.setDaemon(True)
>     t.start()
>
> for item in list_of_urls:
>     q.put(item)
>
> q.join()       # block until all tasks are done
> print "Finished"
> </code>
>
> --
> Gabriel Genellina



Agreed.
I was trying to do what the OP was trying to do, but in a way that
works.
But keeping the threads alive and feeding them the URLs is a better
design, definitely.
And no, I don't think it's 'premature optimization': it is just
cleaner.

Ciao
------
FB
 
Lawrence D'Oliveiro
      10-13-2008
In message <mailman.2151.1223416240.3487.python->, Gabriel
Genellina wrote:

> On Tue, 07 Oct 2008 13:25:01 -0300, Terry Reedy <> wrote:
>
>> Lawrence D'Oliveiro wrote:
>>
>>> In message <mailman.2088.1223354239.3487.python->,
>>> Gabriel Genellina wrote:
>>>
>>>> Usually it's more efficient to create all the MAX_THREADS at once, and
>>>> continuously feed them with tasks to be done.
>>>
>>> Given that the bottleneck is most likely to be the internet
>>> connection, I'd say the "premature optimization is the root of all evil"
>>> adage applies here.

>>
>> Feeding a fixed pool of worker threads with a Queue() is a standard
>> design that is easy to understand and one the OP should learn. Re-using
>> tested code is certainly efficient of programmer time.

>
> I'd like to add that debugging a program that continuously creates and
> destroys threads is a real PITA.


That's God trying to tell you to avoid threads altogether.
 
sjdevnull@yahoo.com
      10-13-2008
On Oct 13, 6:54 am, Lawrence D'Oliveiro <l...@geek-central.gen.new_zealand> wrote:
> In message <mailman.2151.1223416240.3487.python-l...@python.org>, Gabriel
> Genellina wrote:
> > On Tue, 07 Oct 2008 13:25:01 -0300, Terry Reedy <tjre...@udel.edu>
> > wrote:

>
> >> Lawrence D'Oliveiro wrote:

>
> >>> In message <mailman.2088.1223354239.3487.python-l...@python.org>,
> >>> Gabriel Genellina wrote:

>
> >>>> Usually it's more efficient to create all the MAX_THREADS at once, and
> >>>> continuously feed them with tasks to be done.

>
> >>> Given that the bottleneck is most likely to be the internet
> >>> connection, I'd say the "premature optimization is the root of all evil"
> >>> adage applies here.

>
> >> Feeding a fixed pool of worker threads with a Queue() is a standard
> >> design that is easy to understand and one the OP should learn. Re-using
> >> tested code is certainly efficient of programmer time.

>
> > I'd like to add that debugging a program that continuously creates and
> > destroys threads is a real PITA.

>
> That's God trying to tell you to avoid threads altogether.


Especially in a case like this, which is tailor-made for a trivial
state-machine solution if you really want multiple connections.
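
The poster does not say which state-machine approach is meant. One present-day way to get several concurrent connections without threads is an event loop, sketched here with Python 3's asyncio (not available in 2008, and coroutine-based rather than a hand-written state machine). fake_fetch() is a placeholder; a real version would need a non-blocking HTTP client.

```python
import asyncio


async def fake_fetch(url):
    # Placeholder for a non-blocking download.
    await asyncio.sleep(0.01)
    return url.upper()


async def fetch_all(urls, limit=5):
    semaphore = asyncio.Semaphore(limit)  # at most `limit` fetches in flight

    async def bounded(url):
        async with semaphore:
            return await fake_fetch(url)

    # gather() returns results in the same order as its arguments.
    return await asyncio.gather(*(bounded(u) for u in urls))


def main(urls):
    # Runs the event loop in the calling thread; no extra threads are created.
    return asyncio.run(fetch_all(urls))
```

Everything runs in a single thread, and the semaphore plays the role of the 5-thread limit from the original question.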
 