Mechanize MySQL and threads - deadlock?

 
 
Marc Weber
      02-26-2010
First of all: I'm still new to Ruby.

So pointing me to documentation or books is fine.

Use case:

Use mechanize to gather information. Because there are many pages I'd
like to run multiple threads each fetching pages. The fetched data
should be written to a MySQL database.

Can you point me to information telling me how to do this?

The failure looks like this now:

/pr/tasks/get_data_ruby/tasks.rb:364:in `join': deadlock detected (fatal)
from /pr/tasks/get_data_ruby/tasks.rb:364:in `block in run_tasks_wait'
from /pr/tasks/get_data_ruby/tasks.rb:364:in `each'
from /pr/tasks/get_data_ruby/tasks.rb:364:in `run_tasks_wait'
from get-data.rb:37:in `<mai

What is causing such deadlocks at all?
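One plausible cause, sketched under the assumption that `tasks` is a standard `Queue`: `Queue#pop` blocks while the queue is empty, so a worker that pops from a drained queue sleeps indefinitely. If every worker ends up in that state while the main thread waits in `join`, no runnable thread is left and the interpreter aborts with a deadlock error. The helper name `blocked_pop_demo` is made up for illustration; it joins with a timeout so the demo itself cannot hang.

```ruby
require 'thread' # provides Queue on Ruby 1.9; harmless on newer versions

# Hypothetical helper: starts a worker that pops from an empty queue and
# reports what the main thread observes after waiting `timeout` seconds.
def blocked_pop_demo(timeout = 0.2)
  queue  = Queue.new
  worker = Thread.new { queue.pop } # nothing is ever pushed, so this blocks
  joined = worker.join(timeout)     # nil when the worker is still running
  status = worker.status            # state of the worker at this moment
  worker.kill                       # clean up the blocked worker
  [joined, status]
end

p blocked_pop_demo
```

In this sketch `join` gives up after the timeout instead of returning the thread, and the worker reports that it is sleeping inside `pop`. Without the timeout, the main thread would wait forever on a worker that can never wake up.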

Details about my implementation:
=================================
Ruby version: ruby 1.9.1p378 (2010-01-10 revision 26273) [x86_64-linux]
sequel-3.8.0
mysqlplus-0.1.1

Because things always go wrong, I'd like to store state in the database so I
can resume work where the script failed.

To keep things simple I tried giving each thread its own agent and DB
connection:


def newDBConnection
  Sequel.connect(
    :adapter  => 'mysql',
    :user     => 'root',
    :host     => 'localhost',
    :database => 'get_data',
    :password => 'XXX')
end

# share one agent and db connection per thread
class MyThread < Thread
  def agent
    if !@agent
      @agent = Mechanize.new
      @agent.max_history = 1
    end
    @agent
  end

  def db
    @dbCache ||= newDBConnection
  end
end
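An alternative that avoids subclassing `Thread`, offered as an assumption rather than as what the script does: keep per-thread objects in `Thread.current[...]`. The helper `thread_local` is hypothetical, and a plain `Object.new` stands in for the Mechanize agent or the Sequel connection so the snippet runs without those gems. In newer Ruby versions `Thread#[]` is local to the current fiber, which matters only if fibers are used inside a worker.

```ruby
# Hypothetical helper: returns the value stored under `key` for the current
# thread, building it with the given block on first use.
def thread_local(key)
  Thread.current[key] ||= yield
end

# Each thread builds its own resource once and then reuses it.
# In the real script the block could be { Mechanize.new } or { newDBConnection }.
pairs = 2.times.map do
  Thread.new do
    first  = thread_local(:resource) { Object.new }
    second = thread_local(:resource) { Object.new }
    [first, second]
  end.value
end

puts "reused within a thread: #{pairs[0][0].equal?(pairs[0][1])}"
puts "shared across threads:  #{pairs[0][0].equal?(pairs[1][0])}"
```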

Next I defined a task which reuses the db and Mechanize agent from the
thread which is running the task:

class Task
  def run
    # override
    @thread = Thread.current
    task
  end

  def agent
    @agent ||= @thread.agent
  end

  def db
    @dbCache ||= @thread.db
  end
end



Next I wrote a simple function taking a list of tasks and a thread class
(MyThread). It spawns parallel threads, each taking a task from the task
list (a Queue). Any task may add more tasks to the queue.
The script should run until all tasks are done.

# t: class extending Thread
# tasks: type Queue.new
# parallel: num of threads used to run those tasks
def run_tasks_wait(t, tasks, parallel)
  working = 0
  threads = []
  # run `parallel` threads
  (1..parallel).each {|i|
    threads << t.new {
      firstTime = true
      while working > 0 || firstTime
        firstTime = false
        while task = tasks.pop
          working += 1
          $log.debug("starting task #{task.to_s}")
          $log.catchAndLog "caught exception in main worker thread" do
            task.run if !task.nil?
          end
          $log.debug("finished task #{task.to_s} threads-working: #{working}")
          working -= 1
        end
        # even if there is nothing left in the queue, keep this thread running while
        # another thread is still working: that thread may push additional tasks to the queue
        sleep 1
      end
    } }
  # wait for threads
  threads.each {|t| t.join() }
end
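A minimal sketch of one way to keep the same design without blocking forever, assuming tasks are objects that respond to `run` and that only running tasks push new work. It uses a non-blocking `Queue#pop(true)` and guards the busy counter with a `Mutex`, because `working += 1` from several threads is not atomic. The name `run_tasks_nonblocking` and the `Struct`-based demo task are made up for illustration; the original `$log` helper is replaced by `warn`.

```ruby
require 'thread'

# Hypothetical variant of run_tasks_wait: workers exit once the queue is
# empty and no other worker is still running a task.
def run_tasks_nonblocking(tasks, parallel)
  lock = Mutex.new
  busy = 0
  threads = (1..parallel).map do
    Thread.new do
      loop do
        # Take a task and mark this worker busy in one atomic step.
        task = lock.synchronize do
          t = begin
                tasks.pop(true)      # non-blocking: raises when empty
              rescue ThreadError
                nil
              end
          busy += 1 if t
          t
        end
        if task
          begin
            task.run
          rescue StandardError => e
            warn "task failed: #{e.message}"
          ensure
            lock.synchronize { busy -= 1 }
          end
        else
          # Nothing queued: stop only if nobody can still push new tasks.
          break if lock.synchronize { busy.zero? && tasks.empty? }
          sleep 0.05
        end
      end
    end
  end
  threads.each(&:join)
end

# Usage: one task pushes a follow-up task while it runs.
demo_task = Struct.new(:blk) { def run; blk.call; end }
queue = Queue.new
done  = Queue.new
queue << demo_task.new(lambda { done << :first; queue << demo_task.new(lambda { done << :follow_up }) })
run_tasks_nonblocking(queue, 3)
puts "tasks completed: #{done.size}"
```

The short `sleep` is a simple polling choice; pushing one sentinel value per worker once all work is known to be done would be another option.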


Thanks for any pointers
Marc Weber

 
Marc Weber
      02-26-2010
> # t: class extending Thread
> # tasks: type Queue.new
> # parallel: num of threads used to run those tasks
> def run_tasks_wait(t, tasks, parallel)

Replacing the Queue with an Array seems to fix the issue.

Marc
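A likely explanation for why the Array appears to help: `Array#pop` returns nil on an empty array, so the `while task = tasks.pop` loop ends, whereas `Queue#pop` blocks until something is pushed. `Array` makes no thread-safety guarantees, though, so a `Queue` with a non-blocking pop may be the safer way to get the same behaviour. The helper `pop_or_nil` below is hypothetical.

```ruby
require 'thread'

# Hypothetical helper: returns the next item from a Queue, or nil when the
# queue is empty, without ever blocking the calling thread.
def pop_or_nil(queue)
  queue.pop(true)   # non-blocking pop raises ThreadError when empty
rescue ThreadError
  nil
end

empty_array = []
empty_queue = Queue.new

p empty_array.pop           # Array#pop on an empty array
p pop_or_nil(empty_queue)   # same "nothing left" signal from a Queue
```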

 