Thoughts on speeding up PDF::API2

Bill H

In a recent post I asked about speeding up a perl script that uses
PDF::API2. I did some profiling of the code and see that the vast
majority of the time (about 90%) is used in going through all
the .pm's in the PDF::API2 library. Once it gets past all of the
initialization, my code that uses the api goes very fast, creating a
20+ page pdf document with separate image thumbnail files of each page (via
ImageMagick) in less than 2 seconds.

In a meeting we were having tonight we were tossing around the idea of
having the program go through its initial setup and then "pause" to
wait for a signal to create a pdf file, then create the pdf, images
and then go back to the pause. Basically running all the time as a
service. Anyone see any reason why this would be a bad idea?

We further started wondering, instead of pausing, then running on a
signal and then going back to pause for next signal to make a pdf,
would it be possible to fork off a child at that point and have the
child create the pdf / images and end, while the parent stayed at the
pause position waiting for another signal to fork off a child. If we
forked off a child, would it start from the beginning of the script or
would it start at the same place (probably next line) in the perl
script it was forked off of?

Any thoughts?

Bill H
 
Ben Morrow

Bill H said:
In a recent post I asked about speeding up a perl script that uses
PDF::API2. I did some profiling of the code and see that the vast
majority of the time (about 90%) is used in going through all
the .pm's in the PDF::API2 library. Once it gets past all of the
initialization, my code that uses the api goes very fast, creating a
20+ page pdf document with separate image thumbnail files of each page (via
ImageMagick) in less than 2 seconds.

In a meeting we were having tonight we were tossing around the idea of
having the program go through its initial setup and then "pause" to
wait for a signal to create a pdf file, then create the pdf, images
and then go back to the pause. Basically running all the time as a
service. Anyone see any reason why this would be a bad idea?

No, it's a very good idea. This is exactly what systems like mod_perl
and FastCGI do to speed things up. You do have to be careful to clear
everything out between one run and the next...

Bill H said:
We further started wondering, instead of pausing, then running on a
signal and then going back to pause for next signal to make a pdf,
would it be possible to fork off a child at that point and have the
child create the pdf / images and end, while the parent stayed at the
pause position waiting for another signal to fork off a child.

...which is something fork allows you to avoid :). fork does have some
overhead, which is why programs like Apache go to some trouble to avoid
forking a new process as each request comes in, but since your previous
model was a whole new perl process for each run this probably isn't
significant.

If anyone suggests using threads from perl on a system that has a real
fork, laugh :).

Bill H said:
If we forked off a child, would it start from the beginning of the
script or would it start at the same place (probably next line) in the
perl script it was forked off of?

perldoc -f fork
man 2 fork

Basically, both old and new processes will return from the fork call,
the only difference between them at that point being what is returned.
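
A minimal sketch of the "load once, then fork a child per job" model discussed above. It assumes jobs arrive as plain strings and that the real work lives in a handler sub supplied by the caller; `run_job_in_child` is a hypothetical name used only for illustration. The expensive `use` lines would go at the top, so every child inherits the already-loaded modules.

```perl
use strict;
use warnings;
use POSIX ();

# Heavy modules (e.g. PDF::API2) would be loaded here, once, in the parent.

# Hypothetical helper: run $handler->($job) in a forked child and
# return the child's exit status (0 on success, 1 if the handler died).
sub run_job_in_child {
    my ($job, $handler) = @_;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: starts with a copy of everything the parent had loaded.
        my $ok = eval { $handler->($job); 1 };
        # _exit skips END blocks and destructors inherited from the parent.
        # It also skips flushing buffered output, so the handler should
        # close any files it writes.
        POSIX::_exit($ok ? 0 : 1);
    }

    # Parent: wait for this child, then go back to waiting for work.
    waitpid($pid, 0);
    return $? >> 8;
}
```

Because each job runs in a throwaway child, nothing needs to be cleared out between runs. This version waits for each child before taking the next job; overlapping jobs would need the parent to reap children asynchronously instead.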

Ben
 
xhoster

Bill H said:
In a recent post I asked about speeding up a perl script that uses
PDF::API2. I did some profiling of the code and see that the vast
majority of the time (about 90%) is used in going through all
the .pm's in the PDF::API2 library. Once it gets past all of the
initialization, my code that uses the api goes very fast, creating a
20+ page pdf document with separate image thumbnail files of each page (via
ImageMagick) in less than 2 seconds.

If 10% of the time is spent doing something that takes 2 seconds,
then 100% of the time is 20 seconds and the module loading must be taking
almost 18 seconds. That is outrageous on any modestly recent
computer. On my machine, loading PDF::API2 takes ~0.5 seconds.

One possible problem is if the PDF::API2 install location shows up late in
@INC, and the stuff earlier in @INC is on slow network drives. For each of
the files it opens as part of loading PDF::API2, it has to "stat" its way
through the entire @INC list before finally finding it.
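
One way to check this, as a rough sketch: print the @INC search order, then time a `require` and look up in %INC where the file was actually found. `time_module_load` is a hypothetical helper name, and List::Util stands in for PDF::API2 so the snippet runs on any perl.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Load a module by name; return the elapsed seconds and the path it came from.
sub time_module_load {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # PDF::API2 -> PDF/API2.pm

    my $start = time;
    require $file;
    my $elapsed = time - $start;

    return ($elapsed, $INC{$file});
}

# Show the search order, then time a load. Substitute PDF::API2 for
# List::Util on a machine where it is installed.
print "$_\n" for @INC;
my ($secs, $path) = time_module_load('List::Util');
printf "loaded in %.3fs from %s\n", $secs, $path;
```

A module that is already loaded returns from `require` almost immediately, so the timing is only meaningful on the first load in a fresh process.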

In a meeting we were having tonight we were tossing around the idea of
having the program go through its initial setup and then "pause" to
wait for a signal to create a pdf file, then create the pdf, images
and then go back to the pause. Basically running all the time as a
service. Anyone see any reason why this would be a bad idea?

Nope. Sounds like a good idea. Working out the "signal" could be tricky.

Bill H said:
We further started wondering, instead of pausing, then running on a
signal and then going back to pause for next signal to make a pdf,
would it be possible to fork off a child at that point and have the
child create the pdf / images and end, while the parent stayed at the
pause position waiting for another signal to fork off a child.

Yes, you can do that, but it probably wouldn't be worthwhile. Since the
pdf-making part is fast, what is the point of parallelizing it? It would
add complexity for probably little to no benefit.

Bill H said:
If we forked off a child, would it start from the beginning of the script or
would it start at the same place (probably next line) in the perl
script it was forked off of?

The new process and the old process start/continue at the same place. It
isn't the next line, it is the "returning" of the fork:

    $x = fork();

The fork itself only happens in the parent, but the assignment to $x
happens in both the parent and the child.
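
A small runnable sketch of that point: the same `if` is evaluated in both processes, and only the value assigned to `$x` tells them apart. The exit status 7 is an arbitrary value chosen for the demonstration.

```perl
use strict;
use warnings;
use POSIX ();

my $x = fork();
die "fork failed: $!" unless defined $x;

if ($x == 0) {
    # Child: fork returned 0 here. Leave immediately with a
    # recognizable status so the parent can see it.
    POSIX::_exit(7);
}

# Parent: fork returned the child's process ID here.
waitpid($x, 0);
my $child_status = $? >> 8;
print "parent saw child $x exit with status $child_status\n";
```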

Xho

--
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.
 
Bill H

Thanks Ben, Xho for your comments. I am glad to see the idea we had
wasn't that far-fetched.

On the signal to do something, the part of the website that calls the
perl program using PDF::API2 is written in php, and its pages use php
sessions to talk back and forth to each other. I saw a perl module that
lets you access php sessions and wonder about using that method to send
the signal. Has anyone had any experience using php sessions in perl? Are
they continuously updated? Or can anyone think of a better way of signaling
the perl script from another program?
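
One possible alternative to polling sessions, sketched here under assumptions: the long-running perl process listens on a local Unix domain socket and treats each line it receives as a job request. The socket path, the one-line request format, and the helper names `open_listener` and `serve_one` are all hypothetical; IO::Socket::UNIX itself ships with perl.

```perl
use strict;
use warnings;
use IO::Socket::UNIX;

# Create a listening socket at $path (location chosen by the caller).
sub open_listener {
    my ($path) = @_;
    unlink $path;   # remove a stale socket file from a previous run
    my $listener = IO::Socket::UNIX->new(
        Type   => SOCK_STREAM(),
        Local  => $path,
        Listen => 5,
    ) or die "cannot listen on $path: $!";
    return $listener;
}

# Block until one client connects, read one line as the job request,
# pass it to $handler, and send the handler's reply back.
sub serve_one {
    my ($listener, $handler) = @_;
    my $client = $listener->accept or die "accept failed: $!";
    my $request = <$client>;
    chomp $request if defined $request;
    my $reply = $handler->($request);
    print {$client} "$reply\n";
    close $client;
    return $request;
}
```

The service would call `serve_one` in a loop after its initial setup, so the blocking `accept` plays the role of the "pause". On the php side, something like `stream_socket_client` with a `unix://` address could open the same socket and write the request line, though that half is not shown here.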

Bill H
 
