1. Ruby result: 101 seconds, 2. Java result: 9.8 seconds, 3. Perl result: 62 seconds


Isaac Gouy

Ryan Davis wrote:
-snip-
That said... what did we learn from this thread?

Yes... benchmarks are dumb (see also, 3 lines down)
That there is no accounting for taste (ugliest code/
algorithm ever).
A good algorithm goes a long way.
2/3rds of the ppl are going to mis-analyze anyhow.

.. Nothing really.

What we learn depends on many things, including
- initial expectations, are we even in the right ball-park?
- open-mindedness, are we willing to hear other people's interpretations?
- ...

So if we initially expected a little Ruby program to have the same
performance characteristics as a roughly equivalent Java program, we
learned quite a lot.

(We don't all know the same things.)
 

Josef 'Jupp' SCHUGT

Hi!

1. Ruby result: 101 seconds
2. Java result: 9.8 seconds
3. Perl result: 62 seconds

My Ruby implementation of a sieve fishing for primes takes 4.8 seconds
or so on an AMD K6 running at 350 MHz (128 MB, Aurox Linux) while the
original program takes about 1362 seconds. Rule of thumb: improving the
algorithm is good, improving hardware or language is bad (to use the famous
Animal Farm style :)

time = Time.now.to_f
p, f = [2], 2

3.step(50000, 2) do |i|
  r = Math.sqrt(i).to_i
  p.each { |f| break if (i % f).zero? or f > r }
  p.push(i) if (i % f).nonzero?
end
puts p
puts p.length
puts Time.now.to_f - time

What does this program do? First it uses the fact that 2 is the only even
prime, so after initializing the list of primes to [2] it can restrict the
search to odd numbers. For each candidate it first computes the square root
(converted to an integer to keep the comparisons cheap). It then iterates
over the primes collected so far, trying to find a prime factor of the
candidate, i.e. a prime for which the remainder after division is zero. The
loop is aborted as soon as such a factor is found, or as soon as the prime
being tested exceeds the pre-computed square root. Right after the loop the
program checks whether the most recently tested prime is in fact not a
factor, i.e. whether dividing the candidate by it leaves a nonzero remainder;
if so, a new prime has been found and the candidate is appended to the list.
(This last check works because, under Ruby 1.8 block scoping, the block
parameter |f| is the same variable as the outer f, so f still holds the last
prime tested once the each loop has finished.)
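
In other words, as a restated sketch (mine, not Josef's code, and written so
it does not depend on the block-scoping detail just mentioned):

# trial division of each odd candidate by the primes found so far,
# stopping once the divisor exceeds sqrt(i)
primes = [2]
3.step(50_000, 2) do |i|
  r = Math.sqrt(i).to_i
  divisor = primes.find { |q| q > r || (i % q).zero? }
  primes << i if divisor.nil? || divisor > r   # nothing <= sqrt(i) divides i
end
puts primes
puts primes.length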

Josef 'Jupp' SCHUGT
 

Josef 'Jupp' SCHUGT

Hi!

time = Time.now.to_i
p, f = [2], 2

3.step(50000, 2) do |i|
  r = Math.sqrt(i).to_i
  p.each { |f| break if (i % f).zero? or f > r }
  p.push(i) if (i % f).nonzero?
end
puts p
puts p.length
puts Time.now.to_i - time

The C implementation follows:

#include <stdio.h>
#include <math.h>

int main(void) {
    unsigned p[25000], f, r, idx = 1;
    int i, j;

    p[0] = 2;

    for (i = 3; i <= 50000; i += 2) {
        r = sqrt(i);
        for (f = p[j = 0]; j < idx; f = p[++j]) {
            if (!(i % f) || f > r) break;
        }
        if (i % f) p[idx++] = i;
    }
    for (i = 0; i < idx; i++) printf("%u\n", p[i]);
    printf("%u\n", idx);
    return 0;
}

Runtime (this time estimated using the 'time' command):

real: 0.113s
user: 0.064s
sys: 0.009s

I should add that I of course redirect the output to a file rather than
letting it go to the terminal, because otherwise I would essentially be
measuring the terminal's scroll speed.

The C program is almost a 1:1 equivalent of the Ruby one. I know that I am
wasting memory with "unsigned p[25000]", but I wanted to avoid the overhead
of dynamic memory allocation while still pretending not to know the actual
number of primes in advance. 25000 is an obvious upper limit for the number
of primes up to 50000, since only 2 and the odd numbers can be prime.

Note that the C program could be optimized further (using pointer
arithmetic), but that would only be a speedup from "fast as hell" to
"ridiculously fast" - pure nonsense.

What does one learn from the speedup? That using prior knowledge can
tremendously improve speed.

In this case the algorithmic knowledge is that one need not check whether a
number is divisible by every number smaller than it; it is sufficient to
check divisibility by all primes up to its square root (and that no even
number besides 2 can be prime). The factual knowledge is the list of all
primes smaller than the present candidate.

Note that it is not necessary to collect all primes! To find all primes up
to 50000 one only needs to store the primes up to 223 - the integer part of
the square root of 50000. I store all of them because in this case memory is
not a problem and adding checks would slow down the program (see the sketch
below).
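
As a sketch of that bounded-storage variant (my illustration, not Josef's
program): keep only the primes up to 223 as trial divisors, and merely print
and count everything else.

limit = 50_000
root  = Math.sqrt(limit).to_i        # 223
small = [2]                          # the only primes ever used as divisors
count = 1
puts 2

3.step(limit, 2) do |i|
  r = Math.sqrt(i).to_i
  next if small.any? { |q| q <= r && (i % q).zero? }   # composite
  small << i if i <= root   # keep it only if it could divide a later candidate
  count += 1
  puts i
end

puts count
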
Josef 'Jupp' SCHUGT
 

Keith Fahlgren

Hey,

I'd like to have a daemon process that watches for new files in a
specific directory and then runs a command on them once they're there.
It seems like a problem someone would have solved before but I haven't
been able to dig up anything about it on the web yet (perhaps I'm not
phrasing it correctly). Just thought I'd bounce a question here before
I started coding it myself.

Any thoughts?


thanks,
Keith
 

Daniel Berger

Keith said:
Hey,

I'd like to have a daemon process that watches for new files in a
specific directory and then runs a command on them once they're there.
It seems like a problem someone would have solved before but I haven't
been able to dig up anything about it on the web yet (perhaps I'm not
phrasing it correctly). Just thought I'd bounce a question here before
I started coding it myself.

Any thoughts?

There's Ara Howard's "dirwatch". For Win32, there's win32-changejournal.

I don't know if dirwatch works on Win32.

Regards,

Dan
 

Belorion

Hey,

I'd like to have a daemon process that watches for new files in a
specific directory and then runs a command on them once they're there.
It seems like a problem someone would have solved before but I haven't
been able to dig up anything about it on the web yet (perhaps I'm not
phrasing it correctly). Just thought I'd bounce a question here before
I started coding it myself.

Any thoughts?


thanks,
Keith

Take a look at "Daedalus"; it's part of the FreeBSD sysutils ports
(http://www.freebsd.org/es/ports/sysutils.html). For some help on how to
configure and use it, see
http://manuals.textdrive.com/read/chapter/61#page147.

Basically, it runs in the background and will execute any system command
you want - which in your case might be a Ruby script that detects when a
file has been added to the directory and responds by executing another
script of your choosing (a minimal polling sketch follows).
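
For the detection part, a rough polling sketch in Ruby could look like this
(the directory, interval, and "process_file" command are placeholders of
mine, not anything Daedalus provides):

dir      = "/some/incoming"     # hypothetical directory to watch
interval = 5                    # seconds between scans
seen     = {}

loop do
  Dir.glob(File.join(dir, "*")).each do |path|
    next if seen[path] || !File.file?(path)
    seen[path] = true
    system("process_file", path)   # run your command on the new file
  end
  sleep interval
end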

Matt
 

Ara.T.Howard

There's Ara Howard's "dirwatch". For Win32, there's win32-changejournal.

I don't know if dirwatch works on Win32.

nope - though it could be made to pretty easily. basically i wrote it because
of the lack of changejournal-type functionality for *nix filesystems in
general. plus dirwatch is really designed to set up a processing system which
runs external programs on files as they arrive in directories vs. running a
ruby block or some such.

cheers.

-a
--
===============================================================================
| email :: ara [dot] t [dot] howard [at] noaa [dot] gov
| phone :: 303.497.6469
| My religion is very simple. My religion is kindness.
| --Tenzin Gyatso
===============================================================================
 

Michael Neumann

Keith said:
Hey,

I'd like to have a daemon process that watches for new files in a
specific directory and then runs a command on them once they're there.
It seems like a problem someone would have solved before but I haven't
been able to dig up anything about it on the web yet (perhaps I'm not
phrasing it correctly). Just thought I'd bounce a question here before
I started coding it myself.

A very simple one is here:

http://www.ntecs.de/viewcvs/viewcvs/Utils/file_change_notify.rb?rev=232&view=auto

Regards,

Michael
 

John Carter

I'd like to have a daemon process that watches for new files in a
specific directory and then runs a command on them once they're there.

If you are on Linux, it would be trivial to wrap something around
/dev/inotify

From /usr/src/linux/Documentation/filesystems/inotify.txt
or..
http://www.ibiblio.org/peanut/Kernel-2.6.12/filesystems/inotify.txt

inotify
a powerful yet simple file change notification system



Document started 15 Mar 2005 by Robert Love <[email protected]>

(i) User Interface

Inotify is controlled by a device node, /dev/inotify. If you do not use udev,
this device may need to be created manually. First step, open it:

int dev_fd = open ("/dev/inotify", O_RDONLY);

Change events are managed by "watches". A watch is an (object,mask) pair where
the object is a file or directory and the mask is a bitmask of one or more
inotify events that the application wishes to receive. See <linux/inotify.h>
for valid events. A watch is referenced by a watch descriptor, or wd.

Watches are added via a file descriptor.

Watches on a directory will return events on any files inside of the
directory.

Adding a watch is simple,

/* 'wd' represents the watch on fd with mask */
struct inotify_request req = { fd, mask };
int wd = ioctl (dev_fd, INOTIFY_WATCH, &req);

You can add a large number of files via something like

for each file to watch {
    struct inotify_request req;
    int file_fd;

    file_fd = open (file, O_RDONLY);
    if (file_fd < 0) {
        perror ("open");
        break;
    }

    req.fd = file_fd;
    req.mask = mask;

    wd = ioctl (dev_fd, INOTIFY_WATCH, &req);

    close (file_fd);
}
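
For completeness: the device-node interface quoted above comes from the early
inotify patches; mainline kernels ended up exposing inotify through syscalls
instead. A hedged Ruby sketch that reaches the same functionality by shelling
out to inotifywait (from the inotify-tools package, assumed to be installed;
the directory and "process_file" handler are placeholders of mine):

IO.popen("inotifywait -m -e create --format '%w%f' /some/dir") do |io|
  io.each_line do |line|
    path = line.chomp              # one path per created file
    system("process_file", path)   # run your command on the new file
  end
end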



John Carter Phone : (64)(3) 358 6639
Tait Electronics Fax : (64)(3) 359 4632
PO Box 1645 Christchurch Email : (e-mail address removed)
New Zealand

Carter's Clarification of Murphy's Law.

"Things only ever go right so that they may go more spectacularly wrong later."

From this principle, all of life and physics may be deduced.
 

gabriele renzi

Ara.T.Howard wrote:
plus dirwatch is really designed to set up a processing system which
runs external programs on files as they arrive in directories vs. running a
ruby block or some such.

I don't know the internals nor the api for dirwatch, but could you
explain where the difference would be?
 

Ara.T.Howard

Ara.T.Howard wrote:
plus dirwatch is really designed to set up a processing system

I don't know the internals nor the api for dirwatch, but could you explain
where the difference would be?

well, dirwatch is an application vs. an api. so you don't have something
like

open('directory').on('created') do |file|
  puts "#{ file } created"
end

or however you might imagine an api for watching directory events...


with dirwatch, which is a command line tool, you'd do something like this to
set up a watch

~ > dirwatch some_directory create

this initializes an sqlite database, config files, and log files, and generates
sample scripts, etc. all of this ends up in ./some_directory/.dirwatch/. example:

jib:~ > mkdir some_directory

jib:~ > dirwatch some_directory/ create
---
/home/ahoward/some_directory:
dirwatch_dir : /home/ahoward/some_directory/.dirwatch
db : /home/ahoward/some_directory/.dirwatch/db
logs_dir : /home/ahoward/some_directory/.dirwatch/logs
config : /home/ahoward/some_directory/.dirwatch/dirwatch.conf
commands_dir : /home/ahoward/some_directory/.dirwatch/commands


if we peeked in dirwatch.conf we'd see something like
...
...
...
actions:
  updated :
    -
      command: simple.sh
      type: simple
      pattern: ^.*$
      timing: sync
    -
      command: yaml.rb
      type: simple
      pattern: ^.*$
      timing: sync
...
...
...

(did i mention i love yaml? ;-) )

the 'actions' section is where you set up what to do on certain events. the
possible events are 'created', 'modified', 'deleted', or 'existing' (all of
which are pretty obvious) plus the action 'updated', which is the union of
'created' and 'modified'. so this config is saying that whenever a file is
updated we'll run two commands, 'simple.sh' and 'yaml.rb'. note that a list of
commands can be specified - they will be run in that order. the commands
themselves are configured with a few parameters

command:

the command to run. the .dirwatch/commands_dir/ is pre-pended to PATH
when running commands so it's convenient to put them there. the
example/auto-generated commands are in that directory.

type:

this is the calling convention. for example simple commands are called
like

simple.sh file_that_was_updated mtime_of_that_file

and is called once for each file. yaml commands are called like

yaml.rb < (list of __every__ updated file and its mtime on stdin in yaml format)

there are two other types but essentially you just have a choice - your
script is run once with every file or it gets all the files at once on
stdin (see the sketch after this list of parameters).

pattern:

only files matching this regex will get passed to this command. dirwatch
itself has a --pattern option which causes it to see only files matching
that pattern but that affects everything. this is on a per command basis.
so you might see

updated :
  -
    command: gif2png
    type: simple
    pattern: ^.*\.gif$
    timing: sync
  -
    command: png2ps
    type: simple
    pattern: ^.*\.png$
    timing: sync

timing:

whether we wait for each command to finish ('sync') or just spawn it in the
background and collect its exit_status later ('async'). async timing is
extremely dangerous on systems that could update 1,000,000 files at once.
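
for illustration, a 'yaml' type command might be sketched like this (my
sketch, not the auto-generated yaml.rb; the only assumption is the stdin
schema described in --help below - an array of hashes with 'path' and
'mtime' keys):

#!/usr/bin/env ruby
# reads every updated path and its mtime at once from stdin, in yaml format
require 'yaml'

(YAML.load($stdin.read) || []).each do |entry|
  path, mtime = entry['path'], entry['mtime']
  warn "handling #{path} (mtime #{mtime})"
  # real per-file work goes here
end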



next you'd simply start dirwatch using

jib:~ > dirwatch some_directory/ watch
I, [2005-07-21T09:04:48.668571 #27750] INFO -- : ** STARTED **
I, [2005-07-21T09:04:48.669050 #27750] INFO -- : config </home/ahoward/some_directory/.dirwatch/dirwatch.conf>
I, [2005-07-21T09:04:48.669252 #27750] INFO -- : flat <false>
I, [2005-07-21T09:04:48.669324 #27750] INFO -- : files_only <false>
I, [2005-07-21T09:04:48.682278 #27750] INFO -- : no_follow <false>
I, [2005-07-21T09:04:48.682358 #27750] INFO -- : pattern <>
I, [2005-07-21T09:04:48.682461 #27750] INFO -- : n_loops <>
I, [2005-07-21T09:04:48.682629 #27750] INFO -- : interval <00:05:00>
I, [2005-07-21T09:04:48.683028 #27750] INFO -- : lockfile </home/ahoward/some_directory/.dirwatch.lock>
I, [2005-07-21T09:04:48.683147 #27750] INFO -- : tmpwatch[all] <false>
I, [2005-07-21T09:04:48.683213 #27750] INFO -- : tmpwatch[nodirs] <false>
I, [2005-07-21T09:04:48.683278 #27750] INFO -- : tmpwatch[force] <true>
I, [2005-07-21T09:04:48.683454 #27750] INFO -- : tmpwatch[age] <30 days> == <2592000.0s>
I, [2005-07-21T09:04:48.683530 #27750] INFO -- : tmpwatch[rm] <rm_rf>
...
...
...

now, if i dropped a file into some_directory/ in another terminal:

jib:~/some_directory > touch a

i'd see this in the terminal running dirwatch

I, [2005-07-21T09:06:13.721967 #27839] INFO -- : ACTION.UPDATED.0.0 - cmd : simple.sh '/home/ahoward/some_directory/a' '2005-07-21 15:05:38.000000'
I, [2005-07-21T09:06:13.795296 #27839] INFO -- : ACTION.UPDATED.0.0 - exit_status : 0

the 'ACTION.UPDATED.0.0' is a unique tag that makes finding the exit_status
easy in the event that the command was run 'async' and its exit_status ends
up in the log 4000 lines later...


when running from the console like this the stdout of the command run shows
too, so i also saw this - the output of running simple.sh - in the terminal
running dirwatch:

dirwatch_dir: </home/ahoward/some_directory>
dirwatch_action: <updated>
dirwatch_type: <simple>
dirwatch_n_paths: <1>
dirwatch_path_idx: <0>
dirwatch_path: </home/ahoward/some_directory/a>
dirwatch_mtime: <2005-07-21 15:05:38.000000>
dirwatch_pid: <27839>
dirwatch_id: <ACTION.UPDATED.0.0>
command_line: </home/ahoward/some_directory/a 2005-07-21 15:05:38.000000>
path: </home/ahoward/some_directory/a>
mtime: <2005-07-21 15:05:38.000000>


simple.sh basically just prints its environment and the argv it was called
with; here's the whole script:

jib:~/some_directory > cat .dirwatch/commands/simple.sh
#!/bin/sh
echo "dirwatch_dir: <$DIRWATCH_DIR>"
echo "dirwatch_action: <$DIRWATCH_ACTION>"
echo "dirwatch_type: <$DIRWATCH_TYPE>"
echo "dirwatch_n_paths: <$DIRWATCH_N_PATHS>"
echo "dirwatch_path_idx: <$DIRWATCH_PATH_IDX>"
echo "dirwatch_path: <$DIRWATCH_PATH>"
echo "dirwatch_mtime: <$DIRWATCH_MTIME>"
echo "dirwatch_pid: <$DIRWATCH_PID>"
echo "dirwatch_id: <$DIRWATCH_ID>"
echo "command_line: <$@>"
path=$1
mtime=$2
echo "path: <$path>"
echo "mtime: <$mtime>"

you'll notice quite a bit of information is passed via the environment and
that the mtime is also passed in on the command line. typical programs won't
use all this - but it's there. 'dirwatch --help' explains the meaning of
these environment variables.


so, normally you don't run it like that (from the console); instead you have
something like this in your crontab to maintain an 'immortal' daemon

*/15 * * * * dirwatch /home/ahoward/some_directory watch --daemon

this does NOT start a daemon every fifteen minutes. the daemon always sets up
a lockfile and refuses to start if one is already running. so, this just
makes sure exactly one daemon is running at all times - even after machine
reboots or if some bug causes dirwatch to crash. this may seem a bit odd but
those of you that don't have root on all your boxes in the office will
understand why it works like that - you can set up robust daemons without
any special privileges. of course you can start it from init.d, and it
supports 'start', 'stop', and 'restart' arguments too, so this is trivial.

so that's it basically. dirwatch simply scans a directory, compares what it
finds to what's in its database (sqlite), runs the appropriate actions in the
way you've configured it to, and then sleeps for a while (a rough sketch of
that cycle follows). it never stops, automatically rolls its logs, and does
some other stuff too. there's a whole lot of options like recursing into
subdirectories, ignoring anything that's not a file, a tmpwatch-like facility
built in, etc. but you can read about all that with --help.
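
the cycle can be sketched roughly like this (my sketch, with an in-memory hash
standing in for the sqlite database; the watched directory and the 'on_*'
handler commands are hypothetical placeholders for the configured triggers):

require 'find'

state = {}   # path => mtime recorded on the previous scan

loop do
  current = {}
  Find.find("/some/watched/dir") do |p|
    current[p] = File.mtime(p) if File.file?(p)
  end

  (current.keys - state.keys).each { |p| system("on_created", p) }   # created
  (state.keys - current.keys).each { |p| system("on_deleted", p) }   # deleted
  (current.keys & state.keys).each do |p|
    current[p] == state[p] ? system("on_existing", p) : system("on_modified", p)
  end

  state = current
  sleep 300   # the default interval is five minutes
end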

cheers.

btw. i inlined the output of --help below. note that i just did a massive
re-write so some of this is a little off, but it's close.


-a
--
===============================================================================
| email :: ara [dot] t [dot] howard [at] noaa [dot] gov
| phone :: 303.497.6469
| My religion is very simple. My religion is kindness.
| --Tenzin Gyatso
===============================================================================

NAME
dirwatch v0.9.0

SYNOPSIS
dirwatch [ options ]+ mode [ directory = ./ ]

DESCRIPTION
dirwatch is a tool used to rapidly build processing systems from file system
events.

dirwatch manages an sqlite database that mirrors the state of a directory and
then triggers user definable event handlers for certain filesystem activities
such as file creation, modification, deletion, etc. dirwatch can also implement
a tmpwatch-like behaviour to ensure files of a certain age are removed from
the directory being watched. dirwatch normally runs as a daemon process by
first synchronizing the database inventory with that of the directory and then
firing appropriate triggers as they occur.

-----------------------------------------------------------------------------
the following actions may have triggers configured for them
-----------------------------------------------------------------------------

created -> a file was detected that was not already in the database
modified -> a file in the database was detected as being modified
updated -> a file was created or modified (union of these two actions)
deleted -> a file in the database is no longer in the directory
existing -> a file in the database still exists in the directory and has not
been modified

-----------------------------------------------------------------------------
the command line 'mode' must be one of the following
-----------------------------------------------------------------------------

create (c) -> initialize the database and supporting files
watch (w) -> monitor directory and trigger actions in the foreground
start (S) -> spawn a daemon watcher in the background
restart (R) -> (re)spawn a daemon watcher in the background
stop (H) -> stop/halt any currently running watcher
status (T) -> determine if any watcher is currently running
truncate (D) -> truncate/delete all entries from the database
archive (a) -> create a tar.gz archive of a watch's directory contents
list (l) -> dump database to stdout in silky smooth yaml format

for all modes the command line argument must be the name of the directory to
which to apply the operation - which defaults to the current directory.

-----------------------------------------------------------------------------
mode: create (c)
-----------------------------------------------------------------------------

initializes a storage directory with all required database files, logs,
command directories, sample configuration, sample programs, etc.

by default the storage dir will be created as a subdirectory of the directory
specified as the 'directory' command line argument, eg:

directory/.dirwatch/

the --dirwatch_dir option can be used to specify an alternate location. this
is particularly important to use if you, for instance, have an external
program like tmpwatch running which might delete this directory!

when a dirwatch storage directory is created a few files and directories are
created underneath it. the hierarchy is

directory/.dirwatch/
  commands/
  logs/
  db
  dirwatch.conf
  dirwatch.pid

where

commands/ -> any programs placed here will be automatically found as
this location is added to PATH
logs/ -> logs are kept here and are auto-rolled so no scrubbing is needed
db -> this is an sqlite database file
dirwatch.conf -> a yaml configuration file used to configure which commands
to trigger for which actions
dirwatch.pid -> a file containing the pid of the daemon process

examples:

0) initialize the directory incoming_data/ to be dirwatched using all
defaults

~ > dirwatch create incoming_data/

1) initialize the directory incoming_data/ to be dirwatched storing all
metadata in /usr/local/dirwatch/incoming_data

~ > dirwatch create incoming_data/ --dirwatch_dir=/usr/local/dirwatch/incoming_data/

-----------------------------------------------------------------------------
mode: start (S)
-----------------------------------------------------------------------------

dirwatch is normally run in daemon mode. the start mode is equivalent to
running in 'watch' mode with the '--daemon' and '--quiet' flags.

examples:

~ > dirwatch start incoming_data/

-----------------------------------------------------------------------------
mode: restart (R)
-----------------------------------------------------------------------------

'restart' mode checks a watcher's pidfile and either restarts the currently
running watcher or starts a new one as in 'start' mode. this is equivalent to
sending SIGHUP to the watcher daemon process.

examples:

~ > dirwatch restart incoming_data/

-----------------------------------------------------------------------------
mode: stop (H)
-----------------------------------------------------------------------------

'stop' mode checks for any process watching the specified directory and kills
this process if it exists. this is equivalent to sending TERM to the watcher
daemon process. the process will not exit immediately but will do so at the
first possible safe opportunity. do not kill -9 the daemon process.

examples:

~ > dirwatch stop incoming_data/

-----------------------------------------------------------------------------
mode: status (T)
-----------------------------------------------------------------------------

'status' mode reports whether or not a watcher is running for the given
directory.

examples:

~ > dirwatch status incoming_data/

-----------------------------------------------------------------------------
mode: truncate (D)
-----------------------------------------------------------------------------

'truncate' (delete) mode atomically empties the database of all state.

examples:

~ > dirwatch truncate incoming_data/

-----------------------------------------------------------------------------
mode: archive (a)
-----------------------------------------------------------------------------

archive mode is used to atomically create a tgz file of the storage
directory for a given directory while respecting the locking subsystem.

examples:

~ > dirwatch archive incoming_data/

essentially this is useful for making hot backups. your system must have the
tar command for this to operate.

-----------------------------------------------------------------------------
mode: watch (w)
-----------------------------------------------------------------------------

this is the biggie.

dirwatch is designed to run as a daemon, updating the database inventory at
the interval specified by the '--interval' option (5 minutes by default) and
firing appropriate trigger commands. two watchers may not watch the same
dir simultaneously, and attempting to start a second watcher will fail when
the second watcher is unable to obtain the pid lockfile. it is a non-fatal
error to attempt to start another watcher when one is running and this failure
can be made silent by using the '--quiet' option. the reason for this is to
allow a crontab entry to be used to make the daemon 'immortal'. for example,
the following crontab entry

*/15 * * * * dirwatch directory --daemon --dbdir=0 \
--files_only --flat \
--interval=10minutes --quiet

or (same but shorter)

*/15 * * * * dirwatch directory -D -d0 -f -F -i10m -q

will __attempt__ to start a daemon watching 'directory' every fifteen minutes.
if the daemon is not already running one will be started, otherwise dirwatch
will simply fail silently (so no cron email is sent, since nothing is written
to stderr).

this feature allows a normal user to set up daemon processes that not only
will run after a machine reboot, but which will also come back after crashes
or other abnormal terminations.

the meaning of the options in the above crontab entry are as follows

--daemon -> become a child of init and run forever
--dbdir -> the storage directory, here the default is specified
--files_only -> inventory files only (default is files and directories)
--flat -> do not recurse into subdirectories (default recurses)
--interval -> generate inventory, at minimum, every 10 minutes
--quiet -> be quiet when failing due to another daemon already watching

as the watcher runs and maintains the inventory it is noted when
files/directories (entries) have been created, modified, updated, deleted, or
are existing. these entries are then handled by user definable triggers as
specified in the config file. the config file is of the format

...
actions :
  created :
    commands :
      ...
  updated :
    commands :
      ...
...
...

where the commands to be run for each trigger type are enumerated. each
command entry is of the following format:
...
-
  command : command to run
  type    : calling convention
  pattern : filter files further by this pattern
  timing  : synchronous or asynchronous execution
...

the meaning of each field is as follows:

command: this is the program to run. the search path for the program is
determined dynamically by the action run. for instance, when a
file is discovered to be 'modified' the search path for the
command will be

dbdir/commands/modified/ + dbdir/commands/ + $PATH

this dynamic path setting simply allows for short pathnames if
commands are stored in the dbdir/commands/* subdirectories.

type: there are four types of commands. the type merely indicates the
calling convention of the program. when commands are run there
are two pieces of information which must be passed to the
program, the file in question and the mtime of that file. the
mtime is less important but programs may use it to know if the file
has been changed since they were spawned. mtime will probably be
ignored for most commands. the four types of commands fall into
two categories: those commands called once for each file and those
types of commands called once with __all__ files

each file:

simple: the command will be called with three arguments: the file
in question, the mtime date, and the mtime time. eg:

command foobar.txt 2002-11-04 01:01:01.1234

expanded: the command will have the strings '@file' and
'@mtime' replaced with appropriate values. eg:

command '@file' '@mtime'

expands to (and is called as)

command 'foobar.txt' '2002-11-04 01:01:01.1234'

all at once:

filter: the stdin of the program will be given a list where each
line contains three items: the file, the mtime date, and
the mtime time.

yaml: the stdin of the program will be given a list where each
entry contains two items, the file and the mtime. the
format of the list is valid yaml and the schema is an
array of hashes with the keys 'path' and 'mtime'.

pattern: all the files for a given action are filtered by this pattern,
and only those files matching pattern will have triggers fired.


timing: if timing is asynchronous the command will be run and not waited
for before starting the next command. asynchronous commands may
yield better performance but may also result in many commands
being run at once. asynchronous commands should not load the
system heavily unless one is looking to freeze a machine.
synchronous commands are spawned and waited for before the next
command is started. a side effect of synchronous commands is
that the time spent waiting may sum to an amount of time greater
than the interval ('--interval' option) specified - if the amount
of time running commands exceeds the interval the next inventory
simply begins immediately with no pause. because of this one
should think of the interval used as a minimum bound only,
especially when synchronous commands are used.


note that sample commands of each type are auto-generated in the
dbdir/commands directory. reading these should answer any questions regarding
the calling conventions of any of the four types. for other questions,
consult the sample config, which is also auto-generated.


-----------------------------------------------------------------------------
mode: list (l)
-----------------------------------------------------------------------------

dump the contents of the database in yaml format for easy viewing/parsing


ENVIRONMENT

for dirwatch itself:

export SLDB_DEBUG=1 -> cause sldb library actions (sql) to be logged
export LOCKFILE_DEBUG=1 -> cause lockfile library actions to be logged

for programs run by dirwatch the following environment variables will be set:

DIRWATCH_DIR -> the directory being watched
DIRWATCH_ACTION -> action type, one of 'instance', 'created', 'modified',
'updated', 'deleted', or 'existing'
DIRWATCH_TYPE -> command type, one of 'simple', 'expanded', 'filter', or
'yaml'
DIRWATCH_N_PATHS -> the total number of paths for this action. the paths
themselves will be passed to the program in a different
way depending on DIRWATCH_TYPE, for instance on the
command line or on stdin, but this number will always
be the total number of paths the program should expect.
DIRWATCH_PATH_IDX -> for some command types, like 'simple', the program will
be run more than once to handle all paths since the calling
convention only allows the program to be called with
one path at a time. this number is the index of the
current path in such cases. for instance, a 'simple'
program may only be called with one path at a time so
if 10 files were created in the directory that would
result in the program being called 10 times. in each
case DIRWATCH_N_PATHS would be 10 and DIRWATCH_PATH_IDX
would range from 0 to 9 for each of the 10 calls to the
program. in the case of 'filter' and 'yaml' command
types, where every path is given at once on stdin this
value will be equal to DIRWATCH_N_PATHS
DIRWATCH_PATH -> for 'simple' and 'expanded' command types, which are
called once for each path, this will contain the path
the program is being called with. in the case of
'filter' or 'yaml' command types the variable contains
the string 'stdin' implying that all paths are
available on stdin.
DIRWATCH_MTIME -> for 'simple' and 'expanded' command types, which are
called once for each path, this will contain the mtime
the program is being called with. in the case of
'filter' or 'yaml' command types the variable contains
the string 'stdin' implying that all mtimes are
available on stdin.
DIRWATCH_PID -> the pid of dirwatch watcher process
DIRWATCH_ID -> an identifier for this action that will be unique for
any given run of a dirwatch watcher process.
restarting the watcher resets the generator. this
identifier is logged in the dirwatch watcher logs, so it is
useful for matching program logs with dirwatch logs
PATH -> the normal shell path. for each program run the PATH
is modified to contain the commands dir of the dirwatch
watcher process. normally this is
$DIRWATCH_DIR/.dirwatch/commands/


FILES
directory/.dirwatch/ -> dirwatch data files
directory/.dirwatch/dirwatch.conf -> default configuration file
directory/.dirwatch/commands/ -> default location for triggers
directory/.dirwatch/db -> sldb/sqlite database
directory/.dirwatch/dirwatch.pid -> default pidfile
directory/.dirwatch/logs/ -> automatically rolled log files

DIAGNOSTICS
success -> $? == 0
failure -> $? != 0


AUTHOR
(e-mail address removed)


BUGS
1 < bugno && bugno < 42

OPTIONS
--help, -h
this message
--log=path, -l
set log file - (default stderr)
--verbosity=verbosity, -v
0|fatal < 1|error < 2|warn < 3|info < 4|debug - (default info)
--config=path
valid path - specify config file (default nil)
--template=[path]
valid path - generate a template config file in path (default stdout)
--dirwatch_dir=dirwatch_dir
specify dirwatch storage dir
--daemon, -d
specify daemon mode
--quiet, -q
be wery wery quiet
--flat, -F
do not recurse into subdirectories
--files_only, -f
consider only files
--no_follow, -n
do not follow links
--pattern=pattern, -p
consider only entries that match pattern
--n_loops=n_loops, -N
loop only this many times before exiting
--interval=interval, -i
sleep at least this long between loops
--lockfile=[lockfile], -k
specify a lockfile path
--show_input, -s
show input to all commands run
 
