Thoughts on speeding up PDF::API2

 
 
Bill H
09-12-2008
In a recent post I asked about speeding up a perl script that uses
PDF::API2. I did some profiling of the code and see that the vast
majority of the time (about 90%) is spent going through all
the .pm's in the PDF::API2 library. Once it gets past all of the
initialization, my code that uses the API goes very fast, creating a
20+ page PDF document with separate image thumbnail files of each page
(via ImageMagick) in less than 2 seconds.

In a meeting we were having tonight we were tossing around the idea of
having the program go through its initial setup and then "pause" to
wait for a signal to create a pdf file, then create the pdf and images,
and then go back to the pause. Basically running all the time as a
service. Does anyone see any reason why this would be a bad idea?

We further started wondering: instead of pausing, running on a
signal, and then going back to the pause to wait for the next signal,
would it be possible to fork off a child at that point and have the
child create the pdf / images and exit, while the parent stayed at the
pause position waiting for another signal to fork off a child? If we
forked off a child, would it start from the beginning of the script, or
would it start at the same place (probably the next line) in the perl
script it was forked off from?

Any thoughts?

Bill H
 
Ben Morrow
09-12-2008

Quoth Bill H <(E-Mail Removed)>:
> In a recent post I asked about speeding up a perl script that uses
> PDF::API2. I did some profiling of the code and see that the vast
> majority of the time (about 90%) is used in going through all
> the .pm's in the PDF::API2 library. Once it gets past all of the
> initialization, my code that uses the api goes very fast, creating a
> 20+ page PDF document with separate image thumbnail files of each page
> (via ImageMagick) in less than 2 seconds.
>
> In a meeting we were having tonight we were tossing around the idea of
> having the program go through its initial setup and then "pause" to
> wait for a signal to create a pdf file, then create the pdf, images
> and then go back to the pause. Basically running all the time as a
> service. Anyone see any reason why this would be a bad idea?


No, it's a very good idea. This is exactly what systems like mod_perl
and FastCGI do to speed things up. You do have to be careful to clear
everything out between one run and the next...

> We further started wondering, instead of pausing, then running on a
> signal and then going back to pause for next signal to make a pdf,
> would it be possible to fork off a child at that point and have the
> child create the pdf / images and end, while the parent stayed at the
> pause position waiting for another signal to fork off a child.


...which is something fork allows you to avoid. fork does have some
overhead, which is why programs like Apache go to some trouble to avoid
forking a new process as each request comes in, but since your previous
model was a whole new perl process for each run this probably isn't
significant.

If anyone suggests using threads from perl on a system that has a real
fork, laugh.

> If we forked off a child, would it start from the begining of the
> script or would it start at the same place (probably next line) in the
> perl script it was forked off of?


perldoc -f fork
man 2 fork

Basically, both old and new processes will return from the fork call,
the only difference between them at that point being what is returned.
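A minimal sketch of what that looks like in practice (not from the thread, just illustrating the return values):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# fork() returns the child's PID in the parent, 0 in the child,
# and undef if the fork failed.
my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: this is where the per-request work would go.
    exit 0;
}

# Parent: reap the child so it doesn't linger as a zombie.
waitpid($pid, 0);
print "child $pid exited with status ", $? >> 8, "\n";
```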

Ben

--
Every twenty-four hours about 34k children die from the effects of poverty.
Meanwhile, the latest estimate is that 2800 people died on 9/11, so it's like
that image, that ghastly, grey-billowing, double-barrelled fall, repeated
twelve times every day. Full of children. [Iain Banks]
 
xhoster@gmail.com
09-12-2008
Bill H <(E-Mail Removed)> wrote:
> In a recent post I asked about speeding up a perl script that uses
> PDF::API2. I did some profiling of the code and see that the vast
> majority of the time (about 90%) is used in going through all
> the .pm's in the PDF::API2 library. Once it gets past all of the
> initialization, my code that uses the api goes very fast, creating a
> 20+ page PDF document with separate image thumbnail files of each page
> (via ImageMagick) in less than 2 seconds.


If 10% of the time is spent doing something that takes 2 seconds,
then 100% of the time is 20 seconds, and the module loading must be taking
almost 18 seconds. That is outrageous on any modestly recent
computer. On my machine, loading PDF::API2 takes ~0.5 seconds.

One possible problem is if the PDF::API2 install location shows up late in
@INC, and the stuff earlier in @INC is on slow network drives. For each of
the files it opens as part of loading PDF::API2, perl has to "stat" its way
through the entire @INC list before finally finding it.
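A quick way to check both theories — time the load itself, then see which @INC entry actually satisfied it (a sketch; Time::HiRes is core, and the eval just keeps it from dying on machines without PDF::API2):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Time the module load itself, then see which @INC entry satisfied it.
my $t0 = [gettimeofday];
my $ok = eval { require PDF::API2; 1 };
my $elapsed = tv_interval($t0);

if ($ok) {
    printf "PDF::API2 loaded in %.2fs from %s\n",
        $elapsed, $INC{'PDF/API2.pm'};
} else {
    print "PDF::API2 is not installed here\n";
}

# If the directory it loaded from is near the end of this list, every
# internal .pm triggers a stat() against each earlier entry first.
print "\@INC search order:\n";
print "  $_\n" for @INC;
```

If the winning directory sits late in a long @INC, moving it to the front (e.g. with `use lib`) avoids all those failed stats.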


> In a meeting we were having tonight we was tossing around the idea of
> having the program go through its initial setup and then "pause" to
> wait for a signal to create a pdf file, then create the pdf, images
> and then go back to the pause. Basically running all the time as a
> service. Anyone see any reason why this would be a bad idea?


Nope. Sounds like a good idea. Working out the "signal" could be tricky.

> We further started wondering, instead of pausing, then running on a
> signal and then going back to pause for next signal to make a pdf,
> would it be possible to fork off a child at that point and have the
> child create the pdf / images and end, while the parent stayed at the
> pause position waiting for another signal to fork off a child.


Yes, you can do that, but it probably wouldn't be worthwhile. Since the
make-a-pdf part is fast, what is the point of parallelizing it? It would
add complexity for probably little to no benefit.


> If we
> forked off a child, would it start from the begining of the script or
> would it start at the same place (probably next line) in the perl
> script it was forked off of?


The new process and the old process start/continue at the same place. It
isn't the next line; it is the "return" of the fork itself.

$x = fork();

The fork itself only happens in the parent, but the assignment to $x
happens in both the parent and the child.
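Putting the two ideas together, the pause-then-fork service might look roughly like this — a hedged sketch, using SIGUSR1 as a hypothetical trigger and running a single self-triggered iteration where the real service would loop forever:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The expensive `use PDF::API2;` would go here, paid once at startup;
# every forked child inherits the already-loaded module for free.

my $got_signal = 0;
$SIG{USR1} = sub { $got_signal = 1 };   # the "make a pdf" trigger

# The real service would loop forever; one iteration is shown here,
# triggered by signaling ourselves.
kill 'USR1', $$;

sleep 1 until $got_signal;              # the "pause"
$got_signal = 0;

my $pid = fork();
die "fork failed: $!" unless defined $pid;
if ($pid == 0) {
    # Child: build the PDF and thumbnails here, then exit.
    exit 0;
}
waitpid($pid, 0);   # a long-running parent might use $SIG{CHLD} = 'IGNORE' instead
print "job finished, back to waiting\n";
```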

Xho

--
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.
 
Bill H
09-12-2008
On Sep 11, 11:27 pm, (E-Mail Removed) wrote:
> [full quote snipped]

Thanks Ben, Xho for your comments. I am glad to see the idea we had
wasn't that far-fetched.

On the signal to do something: the part of the website that calls the
perl program using PDF::API2 is in php, and the two use php sessions to
talk back and forth to each other. I saw a perl module that lets you
access php sessions and wondered about using that method to send the
signal. Has anyone had any experience using php sessions in perl? Are
they continuously updated? Or can anyone think of a better way of
signaling the perl script from another program?
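One alternative to polling session data is a named pipe (FIFO): the PHP side writes one line per job, and the perl service blocks on the read, so nothing polls at all. A self-contained sketch — the path and payload are made up, and a forked writer stands in for the PHP side:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(mkfifo);
use File::Temp qw(tempdir);

# The real setup would use a fixed, agreed-upon path; a temp dir is
# used here so the sketch is self-contained.
my $dir  = tempdir(CLEANUP => 1);
my $fifo = "$dir/pdf_jobs.fifo";
mkfifo($fifo, 0600) or die "mkfifo: $!";

# Stand-in for the PHP side: fork a writer that sends one job request.
my $pid = fork();
die "fork failed: $!" unless defined $pid;
if ($pid == 0) {
    open my $out, '>', $fifo or die "open for write: $!";
    print $out "session=abc123\n";      # hypothetical job payload
    close $out;
    exit 0;
}

# The perl service side: the read blocks until a request arrives.
open my $jobs, '<', $fifo or die "open for read: $!";
my $request = <$jobs>;
chomp $request;
print "got job request: $request\n";
waitpid($pid, 0);
```

The line written to the pipe could just as easily carry the php session id, so the service can still look up whatever job details the website stored.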

Bill H
 