Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Ruby > Help with Excel (win32ole) and threads!?

Reply
Thread Tools

Help with Excel (win32ole) and threads!?

 
 
Alex Ciarlillo
Guest
Posts: n/a
 
      03-23-2007
I have an application which is supposed to create a set of excel
documents to display sales information for dining areas. The core of it
is a class called SaleReport which is initialized with a sqlite database
connection. The main function takes an excel worksheet object, a row
number and some info about the location and queries the database to
populate the excel sheet. Since none of the locations rely on each
others data, I thought it would be pretty useful to thread that part, so
that each location report is run in its own thread. The problem is, even
though this seems to work, it has not improved performance at all and I
am not sure where the bottleneck is. Here are my theories and the
example code is at the bottom:

1) My first theory was that using a single excel application instance
was blocking the threads so that only one could have access at a time,
but now I rewrote it to use multiple excel instances and still no dice.

2) The connection to the database is limiting access to a single thread
at a time. This shouldnt be the case since each instance of the
SaleReport class gets it own connection, and SQLite is threadsafe.

3) I'm flat out using the threads incorrectly.

The example code:
################################################## ###############################
require 'win32ole'
require 'SaleReport.rb'

startTime = Time.now

excel_1 = WIN32OLE::new('excel.Application')
WIN32OLE.const_load(excel_1)
excel_2 = WIN32OLE::new('excel.Application')

book1 = excel_1.Workbooks.add
book2 = excel_2.Workbooks.add

books = {'book1' => book1, 'book2' => book2}

locations = [ {'name' => 'location1', 'num' => 841},
{'name' => 'location2', 'num' => 842},
{'name' => 'location3', 'num' => 843} ]

books.each do |name, book|
puts "Working on #{name}."

threads = []
locations.each do |loc|
threads << Thread.new(loc, name, book) { |l, n, b|
row = 1
worksheet = b.Worksheets(1)
sales = SaleReport.new
puts "Starting thread - #{l} || #{n} || #{b}"
sales.outputLocation(ws, row, l['num'], l['name'], 2007)
}
end
puts "Joining..."
threads.each { |t| t.join }

book.SaveAs "#{name}.xls"
book.Close(1)
end
excel_1.quit
excel_2.quit
excel_1 = excel_2 = nil
GC.start #garbage collect

endTime = Time.now

puts "Time to run: #{endTime-startTime} seconds."
################################################## #############################

Running this with 2 locations without any threading gave a running time
of ~240 seconds. When I added the threading it was almost exactly the
same. The running time increases linearly with each location I add even
with multiple threads.

Thanks,
-Alex

--
Posted via http://www.ruby-forum.com/.

 
Reply With Quote
 
 
 
 
ara.t.howard@noaa.gov
Guest
Posts: n/a
 
      03-23-2007
On Fri, 23 Mar 2007, Alex Ciarlillo wrote:

> I have an application which is supposed to create a set of excel
> documents to display sales information for dining areas. The core of it
> is a class called SaleReport which is initialized with a sqlite database
> connection. The main function takes an excel worksheet object, a row
> number and some info about the location and queries the database to
> populate the excel sheet. Since none of the locations rely on each
> others data, I thought it would be pretty useful to thread that part, so
> that each location report is run in its own thread. The problem is, even
> though this seems to work, it has not improved performance at all and I
> am not sure where the bottleneck is. Here are my theories and the
> example code is at the bottom:
>
> 1) My first theory was that using a single excel application instance
> was blocking the threads so that only one could have access at a time,
> but now I rewrote it to use multiple excel instances and still no dice.
>
> 2) The connection to the database is limiting access to a single thread
> at a time. This shouldnt be the case since each instance of the
> SaleReport class gets it own connection, and SQLite is threadsafe.


sqlite is threadsafe, but supports access by only one thread at a time. eg
it's not concurrent at the c level. the only level of concurancy sqlite
provides is at the process level.

> 3) I'm flat out using the threads incorrectly.


it's easy to do on windows - anything which blocks one thread as the os level
will block all threads. this is suprisingly easy to do. your code looks
fine. i'm not on windows, but if i were you i'd write some code that proves
to myself that concurent access to an excel doc by threads does not end up
blocking the whole process as i suspect it does. same goes for your
SaleReport object.

cheers.

-a
--
be kind whenever possible... it is always possible.
- the dalai lama

 
Reply With Quote
 
 
 
 
Jano Svitok
Guest
Posts: n/a
 
      03-24-2007
On 3/23/07, http://www.velocityreviews.com/forums/(E-Mail Removed) <(E-Mail Removed)> wrote:
> On Fri, 23 Mar 2007, Alex Ciarlillo wrote:
>
> > I have an application which is supposed to create a set of excel
> > documents to display sales information for dining areas. The core of it
> > is a class called SaleReport which is initialized with a sqlite database
> > connection. The main function takes an excel worksheet object, a row
> > number and some info about the location and queries the database to
> > populate the excel sheet. Since none of the locations rely on each
> > others data, I thought it would be pretty useful to thread that part, so
> > that each location report is run in its own thread. The problem is, even
> > though this seems to work, it has not improved performance at all and I
> > am not sure where the bottleneck is. Here are my theories and the
> > example code is at the bottom:
> >
> > 1) My first theory was that using a single excel application instance
> > was blocking the threads so that only one could have access at a time,
> > but now I rewrote it to use multiple excel instances and still no dice.
> >
> > 2) The connection to the database is limiting access to a single thread
> > at a time. This shouldnt be the case since each instance of the
> > SaleReport class gets it own connection, and SQLite is threadsafe.

>
> sqlite is threadsafe, but supports access by only one thread at a time. eg
> it's not concurrent at the c level. the only level of concurancy sqlite
> provides is at the process level.
>
> > 3) I'm flat out using the threads incorrectly.

>
> it's easy to do on windows - anything which blocks one thread as the os level
> will block all threads. this is suprisingly easy to do. your code looks
> fine. i'm not on windows, but if i were you i'd write some code that proves
> to myself that concurent access to an excel doc by threads does not end up
> blocking the whole process as i suspect it does. same goes for your
> SaleReport object.


I suppose that as well. As ruby threads are only interpreter threads,
I assume that each call to OLE blocks the entire interpreter.
Therefore it should not make any difference if you call it in threads
or not. Threads may be even slower, due to more overhead. You should
be able to check this by printing something to screen repeatedly in
one thread (remember setting $stdout.sync=true), and doing a long OLE
operation in another. The hyphothesis is that the printing will stop
while OLE is running.

To parallelize this you'd probably use more processes, either using
Win32:rocess from win32utils or manually spawning some worker
processes and communicating with the main process using drb or
similar.

 
Reply With Quote
 
Alex Ciarlillo
Guest
Posts: n/a
 
      03-28-2007
Jano Svitok wrote:
> To parallelize this you'd probably use more processes, either using
> Win32:rocess from win32utils or manually spawning some worker
> processes and communicating with the main process using drb or
> similar.


I did a test in the way you suggested and the WIN32OLE is blocking I'm
pretty sure. I don't think its blocking continuously until the thread is
finished, but when I output 1000 rows of data to excel in one thread and
print the numbers 1-50 to the screen in another, there is a noticeable
pause between each number being printed (as opposed to them printing
almost instantly w/o win32ole). I figure this means that the OLE is
blocking on each output call as you suggested.

I did figured out how to spawn the new process with win32-process and
found out about Marshaling. I figured I could just create a process so
that it takes a Worksheet object (marshaled from a file) and does all
the output to that sheet. The only problem is WIN32OLE objects have no
marshal_dump method defined, so I have no way to pass the sheets around
to different processes. I did not try DRb but I figured it would have
the same problem in this case. I could write my own marshal_dump but I
doubt that writing such a method for a win32ole object is within my
scope of knowledge.

I have pretty much eliminated SQLite as the reason for the threads not
running in parallel since I read up more about read/write locks and how
they are used in sqlite3. All my database connections are reading and
should be able to do so concurrently with no problems.

So anymore ideas on how to parallelize excel output over multiple
worksheets?

Thanks for all the help!
-Alex

--
Posted via http://www.ruby-forum.com/.

 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
How to return an excel file or excel data from ASP.NET Anonieko ASP .Net 2 02-12-2008 07:03 PM
Problem with Excel reports ::::Excel 2003 Migration To Excel 2007 =?Utf-8?B?c2hhc2hhbmsga3Vsa2Fybmk=?= ASP .Net 15 10-24-2007 01:34 PM
using Microsoft Excel image for Export to Excel button =?Utf-8?B?U3JpZGhhcg==?= ASP .Net 0 12-09-2005 08:58 PM
exporting an excel file from database; making changes to excel file and updating the database by importing it back Luis Esteban Valencia ASP .Net 1 01-12-2005 12:28 AM
Excel _WorkBook vs. Excel.Workbook =?Utf-8?B?SmltIEhlYXZleQ==?= ASP .Net 1 09-29-2004 03:48 PM



Advertisments