Is it better to use threads or fork in the following case

 
 
grocery_stocker
 
      05-03-2009
Let's say there is a new zip file with updated information every 30
minutes on a remote website. Now, I wanna connect to this website
every 30 minutes, download the file, extract the information, and then
have the program search the file for certain items.

Would it be better to use threads to break this up? I'd have one thread
download the data and then have another actually process the data.
Or would it be better to use fork?

 
 
 
 
 
Diez B. Roggisch
 
      05-03-2009
grocery_stocker wrote:
> Let's say there is a new zip file with updated information every 30
> minutes on a remote website. Now, I wanna connect to this website
> every 30 minutes, download the file, extract the information, and then
> have the program search the file search for certain items.
>
> Would it be better to use threads to break this up? I have one thread
> download the data and then have another to actually process the data .
> Or would it be better to use fork?


Neither. Why do you think you need concurrency at all?

Diez

 
 
 
 
 
grocery_stocker
 
      05-03-2009
On May 3, 1:16 pm, "Diez B. Roggisch" <(E-Mail Removed)> wrote:
> grocery_stocker schrieb:
>
> > Let's say there is a new zip file with updated information every 30
> > minutes on a remote website. Now, I wanna connect to this website
> > every 30 minutes, download the file, extract the information, and then
> > have the program search the file search for certain items.

>
> > Would it be better to use threads to break this up? I have one thread
> > download the data and then have another to actually process the data .
> > Or would it be better to use fork?

>
> Neither. Why do you think you need concurrency at all?
>


Okay, here is what was going through my mind. I'm on a 56k dialup modem.
What happens if it takes me 15 minutes to download the file? Now let's
say during those 15 minutes, the program needs to parse the data in
the existing file.

 
 
Diez B. Roggisch
 
      05-03-2009
grocery_stocker wrote:
> On May 3, 1:16 pm, "Diez B. Roggisch" <(E-Mail Removed)> wrote:
>> grocery_stocker schrieb:
>>
>>> Let's say there is a new zip file with updated information every 30
>>> minutes on a remote website. Now, I wanna connect to this website
>>> every 30 minutes, download the file, extract the information, and then
>>> have the program search the file search for certain items.
>>> Would it be better to use threads to break this up? I have one thread
>>> download the data and then have another to actually process the data .
>>> Or would it be better to use fork?

>> Neither. Why do you think you need concurrency at all?
>>

>
> Okay, here is what was going through my mind. I'm a 56k dialup modem.
> What happens it takes me 15 minutes to download the file? Now let's
> say during those 15 minutes, the program needs to parse the data in
> the existing file.


Is this an exercise in asking 20 hypothetical questions?

Getting concurrency right isn't trivial, so if you absolutely don't need
this, don't do it.

Diez
 
 
CTO
 
      05-03-2009
Probably better just to send an HTTP HEAD request and see if the file
has been updated within the time you're looking at before any unpack.
Even on a 56k that's going to be pretty fast, and you don't risk
unpacking an old file while a new version is on the way.
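
As a rough illustration of the HEAD check suggested here, the sketch below asks the server for headers only and compares the Last-Modified value with the timestamp of the copy already on disk. It assumes the server actually sends a Last-Modified header, which nothing in this thread confirms; the function names are made up for the example.

```python
import urllib.request
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_last_modified(header_value):
    """Turn an HTTP date header into a datetime, or None if it is missing."""
    if not header_value:
        return None
    return parsedate_to_datetime(header_value)


def fetch_last_modified(url, timeout=30):
    """Send a HEAD request (no body is transferred) and read Last-Modified."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_last_modified(response.headers.get("Last-Modified"))


def needs_download(remote_time, local_time):
    """Download when either time is unknown or the remote copy is newer."""
    if remote_time is None or local_time is None:
        return True
    return remote_time > local_time


# Offline demonstration with header values as a server might send them.
older = parse_last_modified("Sun, 03 May 2009 20:00:00 GMT")
newer = parse_last_modified("Sun, 03 May 2009 20:30:00 GMT")
print(needs_download(newer, older))
print(needs_download(older, older))
```

Only fetch_last_modified() touches the network, so the comparison logic can be tried without a connection.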

If you still want to be able to unpack the old file while an update
is downloading, then you're probably right about needing to run it
concurrently, and personally I'd just fork it for ease of use. It
doesn't sound like you're trying to run 100,000 of these at the same
time, and you're saving the file anyway.

Geremy Condra

 
 
Paul Hankin
 
      05-03-2009
On May 3, 10:29 pm, grocery_stocker <(E-Mail Removed)> wrote:
> On May 3, 1:16 pm, "Diez B. Roggisch" <(E-Mail Removed)> wrote:
>
> > grocery_stocker schrieb:

>
> > > Let's say there is a new zip file with updated information every 30
> > > minutes on a remote website. Now, I wanna connect to this website
> > > every 30 minutes, download the file, extract the information, and then
> > > have the program search the file search for certain items.

>
> > > Would it be better to use threads to break this up? I have one thread
> > > download the data and then have another to actually process the data ..
> > > Or would it be better to use fork?

>
> > Neither. Why do you think you need concurrency at all?

>
> Okay, here is what was going through my mind. I'm a 56k dialup modem.
> What happens it takes me 15 minutes to download the file? Now let's
> say during those 15 minutes, the program needs to parse the data in
> the existing file.


If your modem is going at full speed for those 15 minutes, you'll have
around 6.3MB of data. Even after decompressing, and unless the data is
in some quite difficult to parse format, it'll take seconds to
process.

--
Paul Hankin
 
 
grocery_stocker
 
      05-03-2009
On May 3, 1:40 pm, "Diez B. Roggisch" <(E-Mail Removed)> wrote:
> grocery_stocker schrieb:
>
>
>
> > On May 3, 1:16 pm, "Diez B. Roggisch" <(E-Mail Removed)> wrote:
> >> grocery_stocker schrieb:

>
> >>> Let's say there is a new zip file with updated information every 30
> >>> minutes on a remote website. Now, I wanna connect to this website
> >>> every 30 minutes, download the file, extract the information, and then
> >>> have the program search the file search for certain items.
> >>> Would it be better to use threads to break this up? I have one thread
> >>> download the data and then have another to actually process the data .
> >>> Or would it be better to use fork?
> >> Neither. Why do you think you need concurrency at all?

>
> > Okay, here is what was going through my mind. I'm a 56k dialup modem.
> > What happens it takes me 15 minutes to download the file? Now let's
> > say during those 15 minutes, the program needs to parse the data in
> > the existing file.

>
> Is this an exercise in asking 20 hypothetical questions?
>


No. This is the prelude to me writing a real life python program.
 
 
Gabriel Genellina
 
      05-03-2009
On Sun, 03 May 2009 17:45:36 -0300, Paul Hankin <(E-Mail Removed)>
wrote:
> On May 3, 10:29 pm, grocery_stocker <(E-Mail Removed)> wrote:
>> On May 3, 1:16 pm, "Diez B. Roggisch" <(E-Mail Removed)> wrote:
>> > grocery_stocker schrieb:


>> > > Would it be better to use threads to break this up? I have one thread
>> > > download the data and then have another to actually process the data.
>> > > Or would it be better to use fork?

>>
>> > Neither. Why do you think you need concurrency at all?

>>
>> Okay, here is what was going through my mind. I'm a 56k dialup modem.
>> What happens it takes me 15 minutes to download the file? Now let's
>> say during those 15 minutes, the program needs to parse the data in
>> the existing file.

>
> If your modem is going at full speed for those 15 minutes, you'll have
> around 6.3Mb of data. Even after decompressing, and unless the data is
> in some quite difficult to parse format, it'll take seconds to
> process.


In addition, the zip file format stores the directory at the end of the
file. So you can't process it until it's completely downloaded.
Concurrency doesn't help here.
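
The point about the directory sitting at the end of the archive can be checked with the standard zipfile module. This is only a sketch: it builds a small archive in memory, cuts it in half to imitate a partial download, and tries to open both versions. The helper names and the member name items.txt are invented for the example.

```python
import io
import zipfile


def build_zip(member_name, text):
    """Create a one-member zip archive in memory and return its bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_STORED) as archive:
        archive.writestr(member_name, text)
    return buffer.getvalue()


def can_open(data):
    """Return True if zipfile can locate and read the archive directory."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            archive.namelist()
        return True
    except zipfile.BadZipFile:
        return False


complete = build_zip("items.txt", "item\n" * 1000)
partial = complete[: len(complete) // 2]  # imitate an interrupted download

print(can_open(complete))
print(can_open(partial))
```

The truncated copy still contains the start of the member data, but zipfile refuses it because the end-of-archive directory record is missing.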

--
Gabriel Genellina

 
 
CTO
 
      05-03-2009
> In addition, the zip file format stores the directory at the end of the
> file. So you can't process it until it's completely downloaded.
> Concurrency doesn't help here.


Don't think that's relevant, if I'm understanding the OP correctly.
Let's say you've downloaded the file once and you're doing whatever
the app does with it. While that's happening, the half-hour
interval comes up. Now you want to start another download, but
you also want to continue to work with the old version. Voila,
concurrency.

 
 
Dennis Lee Bieber
 
      05-03-2009
On Sun, 3 May 2009 13:59:11 -0700 (PDT), grocery_stocker
<(E-Mail Removed)> declaimed the following in
gmane.comp.python.general:

> No. This the prelude to me writing a real life python program.


Lots of "real life python programs" don't need threading or other
spawned processes...

Your 56K dial-up is probably only running around 44kbps (no "56K"
modem in the US ever reaches that speed: the FCC limited the maximum
allowed bit-rate on phone lines to around 52kbps, and since the actual
speed is affected by the cleanliness of the signal on the lines, it rarely
hits even 50kbps). Assuming 44,000bps and no handshake/protocol overhead,
that comes to 5,500 bytes/sec => 330,000 bytes/min => 4,950,000 bytes in 15
minutes... call it 5MB. What type of processing are you planning that
would take any fairly recent computer 15 minutes to handle 5MB of data?
5MB is about 6 minutes of MP3 audio, or 3-4 3.5MP JPEGs.
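
The arithmetic above is easy to reproduce. A small sketch, using the 44,000 bps estimate from this post and the 56,000 bps best case used earlier in the thread, and ignoring protocol overhead as the post does (the function name is made up for the example):

```python
def bytes_transferred(bits_per_second, minutes):
    """Bytes moved at a constant line rate, ignoring protocol overhead."""
    return bits_per_second * 60 * minutes // 8


for rate in (44_000, 56_000):
    total = bytes_transferred(rate, 15)
    print(f"{rate} bps for 15 minutes: {total:,} bytes")
```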

Presuming your processing really does have the risk of running over
into the next download interval, I'd suggest at most two threads
(pseudo-code):

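import queue    # named Queue in Python 2
import time

BASEFILENAME = "update-"    # placeholder prefix for downloaded files
INTERVAL = 30 * 60          # seconds between download starts

worklist = queue.Queue()

def downloader():
    while True:
        startTime = time.time()
        # the timestamp gives each downloaded file a unique name
        filename = "%s%d.zip" % (BASEFILENAME, startTime)
        doDownload(filename)            # placeholder: fetch the zip
        worklist.put(filename)
        # compute next download time taking into account elapsed time;
        # clamp at zero because time.sleep() rejects negative values
        time.sleep(max(0, startTime + INTERVAL - time.time()))

def processor():
    while True:
        filename = worklist.get()       # blocks until a file is queued
        doFileProcessing(filename)      # placeholder: unzip and search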


This ensures that downloads start every 30 minutes regardless of the
processing duration. The exception is a download that runs over 30
minutes: the computed sleep is then negative, and since time.sleep()
rejects negative values in current Python it should be clamped to zero
so the next download starts immediately. The queue also ensures that the
files are processed IN ORDER OF DOWNLOAD with NO OVERLAPS.

Threads are probably a good fit here, since the downloader spends most
of its time blocked on a sleep call (or on network I/O), letting the
processor run full speed; and if the processor is fast, it will block
waiting for the next file to be available, meaning the downloader gets
full CPU usage.
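
For anyone who wants to try the queue hand-off without waiting 30 minutes, here is a small self-contained sketch of the same two-thread pattern. It is not the code from this post: the download and processing steps are passed in as plain functions, the file list is finite, and a sentinel object is added so both threads can stop. All names are hypothetical.

```python
import queue
import threading


def seconds_until_next_start(start_time, now, interval):
    """Time left until the next scheduled start, never negative."""
    return max(0, start_time + interval - now)


def run_pipeline(filenames, download, process):
    """Download in one thread, process in another, in download order."""
    worklist = queue.Queue()
    results = []
    stop = object()  # sentinel telling the processor there is no more work

    def downloader():
        for name in filenames:
            download(name)
            worklist.put(name)
        worklist.put(stop)

    def processor():
        while True:
            name = worklist.get()  # blocks until something is queued
            if name is stop:
                break
            results.append(process(name))

    threads = [threading.Thread(target=downloader),
               threading.Thread(target=processor)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


print(run_pipeline(["a.zip", "b.zip", "c.zip"], lambda name: None, str.upper))
print(seconds_until_next_start(0, 600, 1800))
print(seconds_until_next_start(0, 2000, 1800))
```

Because there is a single consumer reading from a FIFO queue, results come back in the order the files were queued, which is the ordering guarantee described above.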

--
Wulfraed Dennis Lee Bieber KD6MOG
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: (E-Mail Removed))
HTTP://www.bestiaria.com/

 