Re: How to safely maintain a status file

 
 
Plumo
      07-09-2012
> What are you keeping in this status file that needs to be saved
> several times per second? Depending on what type of state you're
> storing and how persistent it needs to be, there may be a better way
> to store it.
>
> Michael


This is for a threaded web crawler. I want to cache which URLs are
currently in the queue so that, if it is terminated, the crawler can
continue from the same point next time.
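
For illustration only, here is a minimal sketch of one common way to keep such a queue file consistent: write the new contents to a temporary file in the same directory, then rename it over the old file, so a reader never sees a half-written status. The function names `save_queue` and `load_queue` and the JSON format are assumptions made for this example, not something from the crawler itself.

```python
import json
import os
import tempfile


def save_queue(urls, path):
    """Write the pending URLs to `path` so the file is never half-written."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live on the same filesystem as the target,
    # otherwise the final rename cannot be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(list(urls), tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to disk before renaming
        os.replace(tmp_path, path)  # replaces the old file in one step
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise


def load_queue(path):
    """Return the saved URLs, or an empty list if no status file exists."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return []
```

In a threaded crawler, calls to `save_queue` would still need to be serialized, for example with a `threading.Lock`, so that two threads do not race to replace the file.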
 
Michael Hrivnak
      07-09-2012
Please consider batching this data and doing larger writes. Thrashing
the hard drive is not a good plan for performance or hardware
longevity. For example, crawl an entire FQDN and then write out the
results in one operation. If your job fails in the middle and you
have to start that FQDN over, no big deal. If that's too big of a
chunk for your purposes, perhaps break each FQDN up into top-level
directories and crawl each of those in one operation before writing to
disk.
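
A rough sketch of that batching idea, assuming absolute URLs and a single-threaded loop for simplicity: group the URLs by host, crawl one host at a time, and write the status file only once per completed host. The names `group_by_host`, `crawl_in_batches`, and the `fetch` callback are hypothetical and exist only for this example.

```python
import json
import os
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_host(urls):
    """Group absolute URLs by hostname, preserving first-seen order."""
    groups = defaultdict(list)
    for url in urls:
        groups[urlsplit(url).hostname or ""].append(url)
    return dict(groups)


def crawl_in_batches(urls, fetch, status_path):
    """Crawl host by host, recording each finished host with one write."""
    done = set()
    if os.path.exists(status_path):
        with open(status_path) as f:
            done = set(json.load(f))

    for host, host_urls in group_by_host(urls).items():
        if host in done:
            continue  # finished in an earlier run, nothing to redo
        for url in host_urls:
            fetch(url)  # if this raises, the whole host is retried next run
        done.add(host)
        # One write per completed host instead of one per URL.
        tmp_path = status_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(sorted(done), f)
        os.replace(tmp_path, status_path)
    return done
```

If a run dies partway through a host, only that host is crawled again on restart, which is the "no big deal" trade-off described above.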

There are existing solutions for managing job queues, so you can
choose what you like. If you're unfamiliar, maybe start by looking at
celery.

Michael

On Mon, Jul 9, 2012 at 1:52 AM, Plumo <(E-Mail Removed)> wrote:
>> What are you keeping in this status file that needs to be saved
>> several times per second? Depending on what type of state you're
>> storing and how persistent it needs to be, there may be a better way
>> to store it.
>>
>> Michael

>
> This is for a threaded web crawler. I want to cache which URLs are
> currently in the queue so that, if it is terminated, the crawler can
> continue from the same point next time.

 