rake task dependencies: via timestamps table in database

jandot · Jun 6, 2008

Hi all,

There is some interest in the bioinformatics community in using rake
as a workflow tool (see e.g. http://www.bioinformaticszen.com/2008/05/organised-bioinformatics-experiments/).
Rake could be ideal for this type of work: a typical workflow will
take data and perform a first set of conversions on it (i.e. a task),
followed by a second set of conversions (that is dependent on the
first task), and so on.

However, bioinformaticians try to keep their data in databases rather
than files. And we found we need some workarounds to get dependencies
working. Does anyone know if it would be very difficult to add
functionality to rake to check a meta table in a database for
timestamps of tasks rather than looking at timestamps of files? I was
thinking of a table looking like the one below:

table: meta

task                            modified_on
==============================================
001_load_data                   20080602_0831
002_calculate_averages          20080602_0845
003_make_histogram_of_averages  20080602_0851

The rakefile would then contain:

# Task names that start with a digit are not valid bare symbols in Ruby,
# so they are written as strings here.
task "001_load_data" do
  # do stuff
  # automatically update record in meta table
end

task "002_calculate_averages" => ["001_load_data"] do
  # do stuff
  # automatically update record in meta table
end

task "003_make_histogram_of_averages" => ["002_calculate_averages"] do
  # do stuff
  # automatically update record in meta table
end

So if we had reloaded the data (001), then the timestamp for that task
in the meta table would be later than the one for task 002. As a
result, task 002 would automatically have to be rerun if we were to
run task 003.
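The check described above can be sketched in plain Ruby, independent of rake. This is only an illustration under stated assumptions: META_TABLE and DEPS are made-up in-memory stand-ins for the meta table and the Rakefile dependencies, stale? and tasks_to_run are hypothetical helper names, and the timestamp for 001 has been changed to show the "data reloaded" case.

```ruby
# Hypothetical stand-in for the meta table: task name => modified_on.
# 001 carries a later timestamp than 002, as if the data had been reloaded.
META_TABLE = {
  "001_load_data"                  => "20080602_0900",
  "002_calculate_averages"         => "20080602_0845",
  "003_make_histogram_of_averages" => "20080602_0851",
}

# Direct prerequisites, mirroring the Rakefile above.
DEPS = {
  "001_load_data"                  => [],
  "002_calculate_averages"         => ["001_load_data"],
  "003_make_histogram_of_averages" => ["002_calculate_averages"],
}

# A task is stale if it has no record, if any prerequisite is stale,
# or if any prerequisite has a later timestamp than the task itself.
# The YYYYMMDD_HHMM format sorts chronologically as a plain string.
def stale?(task, meta = META_TABLE, deps = DEPS)
  stamp = meta[task]
  return true if stamp.nil?
  deps.fetch(task, []).any? { |dep| stale?(dep, meta, deps) || meta[dep] > stamp }
end

# Tasks that would have to run, in dependency order, to bring `task` up to date.
def tasks_to_run(task, meta = META_TABLE, deps = DEPS)
  upstream = deps.fetch(task, []).flat_map { |dep| tasks_to_run(dep, meta, deps) }
  (upstream + (stale?(task, meta, deps) ? [task] : [])).uniq
end
```

With these values, asking for 003 leaves 001 alone and reruns 002 and then 003, which is the behaviour we are after.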

I'd very much like to know if anyone has an idea how rake can be
extended this way. Basically, the dependency checker has to be
extended to look into a fixed table in a database...

Many thanks,
Jan Aerts

--
=================================
Dr Jan Aerts
Senior Bioinformatician
Genome Dynamics and Evolution Group
Wellcome Trust Sanger Institute
Hinxton
Cambridge CB10 1SA
UK

phone: +44 (0)1223 - 494732
web: http://www.sanger.ac.uk/Teams/Team29/

Pit Capitain · Jun 9, 2008

2008/6/6 jandot said:
There is some interest in the bioinformatics community for using rake
as a workflow tool (...)
However, bioinformaticians try to keep their data in databases rather
than files. And we found we need some workarounds to get dependencies
working. Does anyone know if it would be very difficult to add
functionality to rake to check a meta table in a database for
timestamps of tasks rather than looking at timestamps of files?

Hi Jan, if you look at the source code of rake's FileTask, you'll see
that this shouldn't be very difficult. The code consists of only four
methods and is easy to read. Feel free to ask again if you have more
questions.

Regards,
Pit
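For readers following along, here is a minimal sketch of what a FileTask-like task backed by recorded timestamps might look like. It is not the biorake code and not rake's own implementation: MetaTask, STAMPS and RAN are hypothetical names, an in-memory Hash stands in for the meta table, and it assumes a reasonably recent rake that provides prerequisite_tasks.

```ruby
require "rake"

# Sketch only: MetaTask and STAMPS are hypothetical names, and the Hash
# stands in for the meta table. A real version would query the database.
class MetaTask < Rake::Task
  STAMPS = {} # task name => Time of the last recorded run

  # Needed if the task has no record yet, or if any prerequisite has a
  # later timestamp. Rake invokes prerequisites before asking this.
  def needed?
    !STAMPS.key?(name) || prerequisite_tasks.any? { |pre| pre.timestamp > timestamp }
  end

  # Time of the last recorded run; falls back to "now" like a plain Rake::Task.
  def timestamp
    STAMPS.fetch(name) { Time.now }
  end

  # Run the task's blocks, then record the new timestamp.
  def execute(args = nil)
    super
    STAMPS[name] = Time.now
  end
end

RAN = [] # hypothetical log of executed task names, for illustration only

MetaTask.define_task("001_load_data") { |t| RAN << t.name }
MetaTask.define_task("002_calculate_averages" => ["001_load_data"]) { |t| RAN << t.name }
MetaTask.define_task("003_make_histogram_of_averages" => ["002_calculate_averages"]) { |t| RAN << t.name }
```

If STAMPS holds a record for 001 that is later than the one for 002, then Rake::Task["003_make_histogram_of_averages"].invoke should skip 001 and rerun 002 and 003, since each rerun gives the task a fresh timestamp that its dependents then compare against.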

jandot · Jun 11, 2008

2008/6/9 Pit Capitain said:
Hi Jan, if you look at the source code of rake's FileTask, you'll see
that this shouldn't be very difficult. The code consists of only four
methods and is easy to read. Feel free to ask again if you have more
questions.

Regards,
Pit

Thanks for that pointer, Pit. I think I have gotten quite far based on
FileTask, but something is still wrong. The trouble is that I have no
idea where, so I can't really ask specific questions.
It looks like the block passed to a task is not executed.

I've put what I already have on github: http://github.com/jandot/biorake/tree/master

There's a sample directory with an example Rakefile that should work
once the extension is fixed. In addition, there are two test suites
copied from the file tests. Unfortunately, many of the tests still
fail.

If anybody could have a look at the tests and help to get them
running, I would be very thankful.

Cheers,
jan.
