Help with Excel (win32ole) and threads!?

A

Alex Ciarlillo

I have an application which is supposed to create a set of excel
documents to display sales information for dining areas. The core of it
is a class called SaleReport which is initialized with a sqlite database
connection. The main function takes an excel worksheet object, a row
number and some info about the location and queries the database to
populate the excel sheet. Since none of the locations rely on each
others data, I thought it would be pretty useful to thread that part, so
that each location report is run in its own thread. The problem is, even
though this seems to work, it has not improved performance at all and I
am not sure where the bottleneck is. Here are my theories and the
example code is at the bottom:

1) My first theory was that using a single excel application instance
was blocking the threads so that only one could have access at a time,
but now I rewrote it to use multiple excel instances and still no dice.

2) The connection to the database is limiting access to a single thread
at a time. This shouldnt be the case since each instance of the
SaleReport class gets it own connection, and SQLite is threadsafe.

3) I'm flat out using the threads incorrectly.

The example code:
#################################################################################
require 'win32ole'
require 'SaleReport.rb'

startTime = Time.now

excel_1 = WIN32OLE::new('excel.Application')
WIN32OLE.const_load(excel_1)
excel_2 = WIN32OLE::new('excel.Application')

book1 = excel_1.Workbooks.add
book2 = excel_2.Workbooks.add

books = {'book1' => book1, 'book2' => book2}

locations = [ {'name' => 'location1', 'num' => 841},
{'name' => 'location2', 'num' => 842},
{'name' => 'location3', 'num' => 843} ]

books.each do |name, book|
puts "Working on #{name}."

threads = []
locations.each do |loc|
threads << Thread.new(loc, name, book) { |l, n, b|
row = 1
worksheet = b.Worksheets(1)
sales = SaleReport.new
puts "Starting thread - #{l} || #{n} || #{b}"
sales.outputLocation(ws, row, l['num'], l['name'], 2007)
}
end
puts "Joining..."
threads.each { |t| t.join }

book.SaveAs "#{name}.xls"
book.Close(1)
end
excel_1.quit
excel_2.quit
excel_1 = excel_2 = nil
GC.start #garbage collect

endTime = Time.now

puts "Time to run: #{endTime-startTime} seconds."
###############################################################################

Running this with 2 locations without any threading gave a running time
of ~240 seconds. When I added the threading it was almost exactly the
same. The running time increases linearly with each location I add even
with multiple threads.

Thanks,
-Alex
 
A

ara.t.howard

I have an application which is supposed to create a set of excel
documents to display sales information for dining areas. The core of it
is a class called SaleReport which is initialized with a sqlite database
connection. The main function takes an excel worksheet object, a row
number and some info about the location and queries the database to
populate the excel sheet. Since none of the locations rely on each
others data, I thought it would be pretty useful to thread that part, so
that each location report is run in its own thread. The problem is, even
though this seems to work, it has not improved performance at all and I
am not sure where the bottleneck is. Here are my theories and the
example code is at the bottom:

1) My first theory was that using a single excel application instance
was blocking the threads so that only one could have access at a time,
but now I rewrote it to use multiple excel instances and still no dice.

2) The connection to the database is limiting access to a single thread
at a time. This shouldnt be the case since each instance of the
SaleReport class gets it own connection, and SQLite is threadsafe.

sqlite is threadsafe, but supports access by only one thread at a time. eg
it's not concurrent at the c level. the only level of concurancy sqlite
provides is at the process level.
3) I'm flat out using the threads incorrectly.

it's easy to do on windows - anything which blocks one thread as the os level
will block all threads. this is suprisingly easy to do. your code looks
fine. i'm not on windows, but if i were you i'd write some code that proves
to myself that concurent access to an excel doc by threads does not end up
blocking the whole process as i suspect it does. same goes for your
SaleReport object.

cheers.

-a
 
J

Jano Svitok

sqlite is threadsafe, but supports access by only one thread at a time. eg
it's not concurrent at the c level. the only level of concurancy sqlite
provides is at the process level.


it's easy to do on windows - anything which blocks one thread as the os level
will block all threads. this is suprisingly easy to do. your code looks
fine. i'm not on windows, but if i were you i'd write some code that proves
to myself that concurent access to an excel doc by threads does not end up
blocking the whole process as i suspect it does. same goes for your
SaleReport object.

I suppose that as well. As ruby threads are only interpreter threads,
I assume that each call to OLE blocks the entire interpreter.
Therefore it should not make any difference if you call it in threads
or not. Threads may be even slower, due to more overhead. You should
be able to check this by printing something to screen repeatedly in
one thread (remember setting $stdout.sync=true), and doing a long OLE
operation in another. The hyphothesis is that the printing will stop
while OLE is running.

To parallelize this you'd probably use more processes, either using
Win32::process from win32utils or manually spawning some worker
processes and communicating with the main process using drb or
similar.
 
A

Alex Ciarlillo

Jano said:
To parallelize this you'd probably use more processes, either using
Win32::process from win32utils or manually spawning some worker
processes and communicating with the main process using drb or
similar.

I did a test in the way you suggested and the WIN32OLE is blocking I'm
pretty sure. I don't think its blocking continuously until the thread is
finished, but when I output 1000 rows of data to excel in one thread and
print the numbers 1-50 to the screen in another, there is a noticeable
pause between each number being printed (as opposed to them printing
almost instantly w/o win32ole). I figure this means that the OLE is
blocking on each output call as you suggested.

I did figured out how to spawn the new process with win32-process and
found out about Marshaling. I figured I could just create a process so
that it takes a Worksheet object (marshaled from a file) and does all
the output to that sheet. The only problem is WIN32OLE objects have no
marshal_dump method defined, so I have no way to pass the sheets around
to different processes. I did not try DRb but I figured it would have
the same problem in this case. I could write my own marshal_dump but I
doubt that writing such a method for a win32ole object is within my
scope of knowledge.

I have pretty much eliminated SQLite as the reason for the threads not
running in parallel since I read up more about read/write locks and how
they are used in sqlite3. All my database connections are reading and
should be able to do so concurrently with no problems.

So anymore ideas on how to parallelize excel output over multiple
worksheets?

Thanks for all the help!
-Alex
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,995
Messages
2,570,236
Members
46,821
Latest member
AleidaSchi

Latest Threads

Top