Communication across Perl scripts

Jean

I am searching for efficient ways of communication across two Perl
scripts. I have two scripts; Script 1 generates some data, and I want
Script 2 to be able to access that information. The easiest/dumbest
way is to write the data generated by Script 1 to a file and read it
later with Script 2. Is there any other way than this? Can I store
the data in memory and make it available to Script 2 (with support
from my Linux, of course)? Meaning, malloc some data in Script 1 and
make Script 2 able to access it.

There is no guarantee that Script 2 will be run after Script 1, so
there should be some way to free that memory using a watchdog timer.
 
Ted Zlatanov

J> I am searching for efficient ways of communication across two Perl
J> scripts. I have two scripts; Script 1 generates some data, and I want
J> Script 2 to be able to access that information. The easiest/dumbest
J> way is to write the data generated by Script 1 to a file and read it
J> later with Script 2. Is there any other way than this? Can I store
J> the data in memory and make it available to Script 2 (with support
J> from my Linux, of course)? Meaning, malloc some data in Script 1 and
J> make Script 2 able to access it.

J> There is no guarantee that Script 2 will be run after Script 1, so
J> there should be some way to free that memory using a watchdog timer.

Depends on your latency and load requirements.

If you need speed, shared memory is probably your best bet (a sketch
follows below).

If you need an easy, reliable implementation, put the information in
files (you can notify the reader that a new file is ready with
fam/inotify or SIGUSR1). That's not a dumb way as long as you
implement it properly and it fits your requirements.

If you need low latency, use a message queue.
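
For the shared-memory option, a minimal sketch using Perl's built-in
SysV shared memory calls might look like this (the key, segment size,
and cleanup policy are placeholders you'd adapt; a real version also
needs a semaphore or similar to coordinate access):

use strict;
use warnings;
use IPC::SysV qw(IPC_CREAT IPC_RMID);

my $key  = 0x1234;       # both scripts must agree on this key
my $size = 64 * 1024;    # fixed segment size

# Writer and reader are shown together for brevity; in practice each
# half lives in its own script.

# Script 1 (writer): create the segment and store a string in it.
my $id = shmget($key, $size, IPC_CREAT | 0666);
defined $id or die "shmget: $!";
my $data = "data produced by Script 1";
shmwrite($id, $data, 0, length $data) or die "shmwrite: $!";

# Script 2 (reader): attach to the same key, read, then free the segment.
my $rid = shmget($key, $size, 0666);
defined $rid or die "shmget: $!";
my $buf;
shmread($rid, $buf, 0, $size) or die "shmread: $!";
$buf =~ s/\0+\z//;            # strip the zero padding
print "got: $buf\n";
shmctl($rid, IPC_RMID, 0);    # remove the segment once it's been consumed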

Ted
 
Ted Zlatanov

SP> Options are plentiful. Have a look at "perldoc perlipc" for a good
SP> overview.

Unfortunately that page doesn't mention (nor should it) databases,
message queues, ESBs, loopback network interfaces, etc. Each one of
those may have distinct advantages over plain IPC, depending on the OS,
environment, policies, and existing infrastructure.

Ted
 
jl_post

I am searching for efficient ways of communication across two Perl
scripts. I have two scripts; Script 1 generates some data, and I want
Script 2 to be able to access that information.
There is no guarantee that Script 2 will be run after Script 1, so
there should be some way to free that memory using a watchdog timer.

It sounds like there's no guarantee that the two scripts will even
overlap while running, either. Unless you write your data to a file
on disk, you'll need another program to act as some sort of broker to
manage the data you want to share.

You could try using a third-party broker, or perhaps use an SQL
database to store your data. ...or you could just write what you want
to share to disk, to be picked up by Script 2.

The easiest/dumbest way is to write the data generated by
Script 1 to a file and read it later with Script 2.

That may be easiest, but I don't think it's the dumbest. And if
you use this approach, I highly recommend using the "Storable" module
(it's a standard module so you should already have it). If you have a
reference to data in Script 1 (for example, $dataReference), you can
save it in one line (if you don't count the "use Storable" line), like
this:

use Storable qw(lock_nstore lock_retrieve);
lock_nstore($dataReference, "file_name");

and then Script 2 can read it in with one line like this:

use Storable qw(lock_nstore lock_retrieve);
my $dataReference = lock_retrieve("file_name");

Now Script 1 and Script 2 should both have a $dataReference that
refers to identical data.

Type "perldoc Storable" at the Unix/DOS prompt to read more about
this module.

It's hard to get much simpler than this. You might be tempted to
write your own file-writing and file-reading code, but if you do,
you'll have to handle your own file locking and your own serialization
to and from the file. (And that'll probably take more than just two
lines of code to implement.)

If you're good with SQL, you may want to try a DBI module like
DBD::SQLite. The SQLite database is stored on disk (so you don't
need a third-party program to manage the data), and it gives you
flexibility: if you ever have to move your shared data to a database
server, most of the data-sharing code will remain unchanged.
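
For example, a bare-bones sketch of that approach (the file name and
table layout here are just placeholders, and the two halves would of
course live in the two separate scripts):

use strict;
use warnings;
use DBI;

# Both scripts connect to the same database file on disk.
my $dbh = DBI->connect("dbi:SQLite:dbname=shared_data.db", "", "",
                       { RaiseError => 1, AutoCommit => 1 });

# Script 1: create the table if needed and store its results.
$dbh->do("CREATE TABLE IF NOT EXISTS results (name TEXT, value TEXT)");
$dbh->do("INSERT INTO results (name, value) VALUES (?, ?)",
         undef, "answer", 42);

# Script 2: read the results back later.
my $rows = $dbh->selectall_arrayref("SELECT name, value FROM results");
print "$_->[0] = $_->[1]\n" for @$rows;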

Also, don't forget to "use strict;" and "use warnings;" if you
aren't using them already; they'll save you lots of headaches in the
long run.

I hope this helps,

-- Jean-Luc
 
C.DeRykus

I am searching for efficient ways of communication across two Perl
scripts. I have two scripts; Script 1 generates some data, and I want
Script 2 to be able to access that information. The easiest/dumbest
way is to write the data generated by Script 1 to a file and read it
later with Script 2. Is there any other way than this? Can I store
the data in memory and make it available to Script 2 (with support
from my Linux, of course)? Meaning, malloc some data in Script 1 and
make Script 2 able to access it.

There is no guarantee that Script 2 will be run after Script 1, so
there should be some way to free that memory using a watchdog timer.

It sounds like a named pipe (see perlipc) would be
the easiest, most straightforward solution. (See
T. Zlatanov's suggestions, though, for other possible
non-IPC solutions which, depending on the exact
scenario, may be a better fit.)

With a named pipe though, each script just deals
with the named file for reading or writing while
the OS takes care of the messy IPC details for
you. The 2nd script will just block until data
is available so running order isn't a concern. As
long as the two scripts are running more or less
concurrently, I would guess memory use will be
manageable too since the reader will be draining
the pipe as the data arrives.
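
Something like this, for instance (untested sketch; the pipe path is
arbitrary, and the two halves go into the two separate scripts):

use strict;
use warnings;
use POSIX qw(mkfifo);

my $fifo = "/tmp/script_data.fifo";   # any agreed-upon path will do
unless (-p $fifo) {
    mkfifo($fifo, 0700) or die "mkfifo $fifo: $!";
}

# Script 1 (writer): open() blocks until a reader opens the other end.
open my $out, '>', $fifo or die "open for writing: $!";
print {$out} "a line of data from Script 1\n";
close $out;

# Script 2 (reader): blocks until data arrives, then drains the pipe.
open my $in, '<', $fifo or die "open for reading: $!";
while (my $line = <$in>) {
    print "got: $line";
}
close $in;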
 
Xho Jingleheimerschmidt

Jean said:
I am searching for efficient ways of communication across two Perl
scripts. I have two scripts; Script 1 generates some data, and I want
Script 2 to be able to access that information. The easiest/dumbest
way is to write the data generated by Script 1 to a file and read it
later with Script 2.

This is usually not dumb. It is often the best way to do it.
Intermediate files and shell pipelines are by far the most common way
for me to do this; I never use anything other than those two unless I
have a compelling reason. Maybe you have a compelling reason; I don't
know, and you haven't given us enough information to determine that.

(Well, the third default option is to reconsider whether these two
scripts really need to be separate rather than one script. I assume
you already did that and rejected it for some good reason.)
Is there any other way than this? Can I store
the data in memory and make it available to Script 2 (with support
from my Linux, of course)? Meaning, malloc some data in Script 1 and
make Script 2 able to access it.

There are many ways to do this, and AFAIK they all either leave a lot to
be desired, or introduce annoying and subtle complexities.
There is no guarantee that Script 2 will be run after Script 1, so
there should be some way to free that memory using a watchdog timer.

Can't you control the timing of the execution of your scripts?

Xho
 
Ted Zlatanov

ML> Speaking of message queues, what do people recommend on Unix/Linux?

I've heard positive things about http://www.rabbitmq.com/ but haven't
used it myself. There are a lot of others; see
http://en.wikipedia.org/wiki/Category:Message-oriented_middleware

Depending on your needs, TIBCO may fit. It's very popular in the
financial industry and in my experience has been a pretty good system
over the last 3 years I've used it. The Perl bindings
are... well... usable. The major headaches I've had were when the
process was slow handling incoming data. Unless you write your Perl very
carefully, it's easy to block and balloon the memory size (because
TIBCO's queue uses your own application's memory) to multi-gigabyte
footprints. So forget about database interactions, for instance--you
have to move them to a separate process and use IPC or file drops.
Threads (as in "use threads") are probably a bad idea too.

Ted
 
Ted Zlatanov

CD> With a named pipe though, each script just deals with the named file
CD> for reading or writing while the OS takes care of the messy IPC
CD> details for you. The 2nd script will just block until data is
CD> available so running order isn't a concern. As long as the two
CD> scripts are running more or less concurrently, I would guess memory
CD> use will be manageable too since the reader will be draining the
CD> pipe as the data arrives.

The only warning I have there is that pipes are pretty slow and have
small buffers by default in the Linux kernel (assuming Linux). I forget
exactly why; I think it's due to terminal disciplines or something, and
I didn't dig too much. I ran into this earlier this year.

So if you have a fast writer, pipes can be problematic.

Ted
 
Peter Makholm

That may be easiest, but I don't think it's the dumbest. And if
you use this approach, I highly recommend using the "Storable" module
(it's a standard module so you should already have it).

As long as you just use it on a single host for very temporary files,
Storable is fine. But I have been bitten by Storable not being
compatible between versions or different installations one time too
many to call it 'highly recommended'.

If you need support for every possible Perl structure, then Storable
is probably the only almost-viable solution. But if simple trees of
hashrefs and arrayrefs are good enough, then I consider JSON::XS a
better choice.


But it all depends on the exact needs; the original poster might
never run into the situations where Storable shows its nasty sides,
and may not need the extra speed of JSON::XS or its more future-proof
and portable format.
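
To show what the JSON::XS flavour of the file drop looks like (untested
sketch; the file name is arbitrary and the two halves belong in the two
scripts):

use strict;
use warnings;
use JSON::XS;

# Script 1: encode a tree of hashrefs/arrayrefs and drop it to a file.
my $data = { status => "done", results => [ 1, 2, 3 ] };
open my $out, '>', 'shared_data.json' or die "open: $!";
print {$out} encode_json($data);
close $out;

# Script 2: slurp the file and decode it back into a Perl structure.
open my $in, '<', 'shared_data.json' or die "open: $!";
my $json = do { local $/; <$in> };
close $in;
my $copy = decode_json($json);
print "status: $copy->{status}\n";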

//Makholm
 
Dr.Ruud

I have two scripts; Script 1 generates some data, and I want
Script 2 to be able to access that information. The easiest/dumbest
way is to write the data generated by Script 1 to a file and read it
later with Script 2. Is there any other way than this?

I normally use a database for that. Script-1 can normally be scaled up
by making it do things in parallel (by chunking the input in an obvious,
non-interdependent way).

Script-2 can also just be a phase in Script-1. Once all children are
done processing, there normally is a reporting phase.

There is no guarantee that Script 2 will be run after Script 1, so
there should be some way to free that memory using a watchdog timer.

When the intermediate data is in temporary database tables, they
disappear automatically with the close of the connection.
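
As a sketch (connection details and table layout are made up, and
DBD::mysql is just one example driver): a temporary table is visible
only on the connection that created it, which is why the reporting
phase has to run as a later phase of the same script/connection.

use strict;
use warnings;
use DBI;

my ($user, $password) = ('worker', 'secret');   # placeholders
my $dbh = DBI->connect("dbi:mysql:database=work", $user, $password,
                       { RaiseError => 1 });

# The table is private to this connection and vanishes on disconnect,
# so there is nothing to clean up if the reporting phase never runs.
$dbh->do("CREATE TEMPORARY TABLE staging (chunk INT, result TEXT)");
$dbh->do("INSERT INTO staging (chunk, result) VALUES (?, ?)",
         undef, 1, "partial output");

my ($count) = $dbh->selectrow_array("SELECT COUNT(*) FROM staging");
print "staged rows: $count\n";

$dbh->disconnect;    # the temporary table disappears here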
 
Martijn Lievaart

As long as you just use it on a single host for very temporary files,
Storable is fine. But I have been bitten by Storable not being
compatible between versions or different installations one time too
many to call it 'highly recommended'.

Another way might be Data::Dumper.
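
Roughly like this, say (untested sketch; the dump file name is
arbitrary, and Data::Dumper writes Perl source, so the reader simply
evaluates it):

use strict;
use warnings;
use Data::Dumper;

# Script 1: dump the structure as Perl source text.
my $dataReference = { numbers => [ 1, 2, 3 ] };
$Data::Dumper::Purity = 1;     # keep self-referential data round-trippable
open my $out, '>', 'shared_data.pl' or die "open: $!";
print {$out} Dumper($dataReference);
close $out;

# Script 2: evaluate the dump to get the structure back.
my $copy = do './shared_data.pl';   # returns the value assigned to $VAR1
defined $copy or die "couldn't read dump: ", $@ || $!;
print "first number: $copy->{numbers}[0]\n";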

M4
 
Randal L. Schwartz

Jean> I am searching for efficient ways of communication across two Perl
Jean> scripts. I have two scripts; Script 1 generates some data, and I want
Jean> Script 2 to be able to access that information.

Look at DBM::Deep for a trivial way to store structured data, including
having transactions so the data will change "atomically".

And despite the name... DBM::Deep has no XS components... so it can even
be installed in a hosted setup with limited ("no") access to compilers.

Disclaimer: Stonehenge paid for part of the development of DBM::Deep,
because yes, it's *that* useful.
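
For anyone who hasn't seen it, a quick sketch of what that looks like
(file name arbitrary; DBM::Deep also provides begin_work/commit/rollback
for the transactional updates mentioned above):

use strict;
use warnings;
use DBM::Deep;

# Script 1: store a nested structure straight into a file on disk.
my $db = DBM::Deep->new("shared.db");
$db->{results} = { answer => 42, items => [ 1, 2, 3 ] };

# Script 2: open the same file later and read the data back.
my $db2 = DBM::Deep->new("shared.db");
print "answer: $db2->{results}{answer}\n";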

print "Just another Perl hacker,"; # the original
 
jl_post

As long as you just use it on a single host for very temporary files,
Storable is fine. But I have been bitten by Storable not being
compatible between versions or different installations one time too
many to call it 'highly recommended'.


I was under the impression that Storable::nstore() was cross-
platform compatible (as opposed to Storable::store(), which isn't).
"perldoc Storable" has this to say about it:
    You can also store data in network order to allow easy
    sharing across multiple platforms, or when storing on a
    socket known to be remotely connected. The routines to
    call have an initial "n" prefix for *network*, as in
    "nstore" and "nstore_fd".

Unfortunately, it doesn't really specify the extent of what was
meant by "multiple platforms". I always thought that meant any
platform could read data written out by nstore(), but since I've never
tested it, I can't really be sure.

When you said you were "bitten" by Storable, were you using
Storable::store(), or Storable::nstore()?

-- Jean-Luc
 
Peter J. Holzer

CD> With a named pipe though, each script just deals with the named file
CD> for reading or writing while the OS takes care of the messy IPC
CD> details for you. The 2nd script will just block until data is
CD> available so running order isn't a concern. As long as the two
CD> scripts are running more or less concurrently, I would guess memory
CD> use will be manageable too since the reader will be draining the
CD> pipe as the data arrives.

The only warning I have there is that pipes are pretty slow and have
small buffers by default in the Linux kernel (assuming Linux).

Hmm. On my system (a 1.86 GHz Core2 - not ancient, but not the latest
and greatest, either) I can transfer about 800 MB/s through a pipe at
32 kB buffer size. For larger buffers it gets a bit slower, but a buffer
size of 1MB is still quite ok.
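
If anyone wants to reproduce that kind of measurement, something along
these lines works as a rough test (untested sketch; chunk size and total
volume are arbitrary, and partial syswrites are ignored for simplicity):

use strict;
use warnings;
use Time::HiRes qw(time);

my $chunk = 'x' x (32 * 1024);        # 32 kB writes
my $total = 800 * 1024 * 1024;        # push roughly 800 MB through the pipe
my $count = int($total / length $chunk);

pipe(my $rd, my $wr) or die "pipe: $!";

my $pid = fork();
defined $pid or die "fork: $!";
if ($pid == 0) {                      # child: the reader just drains the pipe
    close $wr;
    my $buf;
    1 while sysread($rd, $buf, 64 * 1024);
    exit 0;
}

close $rd;
my $start = time();
syswrite($wr, $chunk) for 1 .. $count;  # parent: the writer
close $wr;
waitpid($pid, 0);
printf "%.0f MB/s\n", ($total / (1024 * 1024)) / (time() - $start);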

You may be confusing that with other systems. Windows pipes have a
reputation for being slow. Traditionally, Unix pipes were restricted
to a rather small buffer (8 or 10 kB). I do think Linux pipes become
synchronous for large writes, though.
I forget exactly why; I think it's due to terminal disciplines or
something, and I didn't dig too much.

Unix pipes have nothing to do with terminals. Originally they were
implemented as files; BSD 4.x reimplemented them on top of Unix sockets.
I don't know how Linux implements them, but I'm quite sure that no
terminals are involved, and certainly no terminal disciplines.
Are you confusing them with ptys, perhaps?
I ran into this earlier this year.

Can you dig up the details?

hp
 
Bart Lateur

Randal said:
Look at DBM::Deep for a trivial way to store structured data, including
having transactions so the data will change "atomically".

And despite the name... DBM::Deep has no XS components... so it can even
be installed in a hosted setup with limited ("no") access to compilers.

Disclaimer: Stonehenge paid for part of the development of DBM::Deep,
because yes, it's *that* useful.

Ouch. DBM::Deep is buggy, in my experience.

I don't know the exact circumstances, but when using it to cache the XML
contents of user home nodes on Perlmonks, I regularly got crashes in it.
It had something to do with the data changing size, IIRC from larger
than 8 kB to below 8 kB. But I could have gotten these details wrong, as
it has been a long time since I last tried it.
 
paul

Ouch. DBM::Deep is buggy, in my experience.

I don't know the exact circumstances, but when using it to cache the XML
contents of user home nodes on Perlmonks, I regularly got crashes in it.
It had something to do with the data changing size, IIRC from larger
than 8 kB to below 8 kB. But I could have gotten these details wrong, as
it has been a long time since I last tried it.

You can try named pipes, a special type of file that allows for
interprocess communication. By using the "mknod" command you can
create a named pipe file, which one process can open for reading and
another for writing.
 
Ted Zlatanov

CD> With a named pipe though, each script just deals with the named file
CD> for reading or writing while the OS takes care of the messy IPC
CD> details for you. The 2nd script will just block until data is
CD> available so running order isn't a concern. As long as the two
CD> scripts are running more or less concurrently, I would guess memory
CD> use will be manageable too since the reader will be draining the
CD> pipe as the data arrives.
PJH> Hmm. On my system (a 1.86 GHz Core2 - not ancient, but not the latest
PJH> and greatest, either) I can transfer about 800 MB/s through a pipe at
PJH> 32 kB buffer size. For larger buffers it gets a bit slower, but a buffer
PJH> size of 1MB is still quite ok.

Hmm, sorry for stating that badly.

The biggest problem is that pipes *block* normally. So even if your
reader is slow only once in a while, as long as you're using the default
buffer (which is small), your writer will block too. In my situation
(the writer was receiving data from TIBCO) that was deadly.

I meant to say that but somehow it turned into "pipes are slow" between
brain and keyboard. Sorry.

PJH> You may be confusing that with other systems. Windows pipes have a
PJH> reputation for being slow.

Yes, on Windows we had even more trouble for many reasons. But I was
only talking about Linux so I won't take that bailout :)

PJH> Unix pipes have nothing to do with terminals. Originally they were
PJH> implemented as files; BSD 4.x reimplemented them on top of Unix sockets.
PJH> I don't know how Linux implements them, but I'm quite sure that no
PJH> terminals are involved, and certainly no terminal disciplines.
PJH> Are you confusing them with ptys, perhaps?

Probably. I was on a tight deadline and the pipe approach simply did
not work, so I couldn't investigate in more detail. There's a lot more
resiliency in a file drop approach, too: if either side dies, the other
one is not affected. There is no leftover mess like with shared memory,
either. So I've been pretty happy with the file drop.
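
One common way to make the file drop robust, incidentally, is to write
to a temporary name and rename() it into place, since rename within a
filesystem is atomic and the reader never sees a half-written file. A
sketch (directory and naming convention made up):

use strict;
use warnings;
use File::Temp qw(tempfile);

my $dropdir = "/var/tmp/drops";             # agreed-upon drop directory
my ($fh, $tmp) = tempfile(DIR => $dropdir);
print {$fh} "payload from the writer\n";
close $fh or die "close: $!";

# The reader only ever looks for *.ready files, so it never picks up a
# partially written drop.
rename $tmp, "$dropdir/batch-$$.ready" or die "rename: $!";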

Ted
 
