Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Perl > Perl Misc > speeding up cgi perl

Reply
Thread Tools

speeding up cgi perl

 
 
skeldoy
Guest
Posts: n/a
 
      12-17-2008
Hey! I am working on an application that features a huge database
(mysql) and some cgi (perl) for listing, sorting, searching,
dupechecking and more. I see that the configuration for the mysql is
pretty much spot on - most of the data is cached and so the mysqld-
process isn't really doing much in terms of causing bottlenecks. But I
believe that the perl-code may be the bottleneck here. I have turned
off buffering completely and I render pretty much just the things that
are needed. Still it can take up to a minute to print (in html) a
query that returns in a second on the mysql-console.

The output from the cgi is around 15MB for every operation the user
does, so I see the potential for slowness right there, in the sheer
amount of data that has to be produced and transmitted over the net,
but still I don't really understand what I have done to make the cgi
so slow.

The cgi is mostly taking a parameter like $query and doing a "select
from db where value like "%$query%"" and returning that in pretty <td>
$_</td>-form. That seems to work reasonably fast. But when I do a
"select * from db" things tend to get really slow when dealing with
15000-entries++ (even though mysql has it all cached and spits it out
in a split second). The cgi-process sits there, spitting out html to
the client, using up 95% of the cpu-time of one of the cores and using
50MBs of memory or so. I have no idea what it does. I have replaced
most of the " with '. And I can't really see that I am doing something
that needn't be done. Is there an issue with creating multiple
database-connections (DBD::mysql) that I should be aware of?

If somebody has experience in doing huge db's with perl, can you
please give me some pointers? Is this a code issue or is it a network-
issue or is it a browser issue? Does anyone have any tips for doing
huge databases with perl?
 
Reply With Quote
 
 
 
 
smallpond
Guest
Posts: n/a
 
      12-17-2008
On Dec 17, 11:04 am, skeldoy <skel...@gmail.com> wrote:

"select * from db"

If you're sending back everything, why not just render the page once
and
return static html? You can schedule updates when the db changes.
 
Reply With Quote
 
 
 
 
Peter J. Holzer
Guest
Posts: n/a
 
      12-17-2008
On 2008-12-17 16:04, skeldoy <> wrote:
> Hey! I am working on an application that features a huge database
> (mysql) and some cgi (perl) for listing, sorting, searching,
> dupechecking and more. I see that the configuration for the mysql is
> pretty much spot on - most of the data is cached and so the mysqld-
> process isn't really doing much in terms of causing bottlenecks. But I
> believe that the perl-code may be the bottleneck here. I have turned
> off buffering completely and I render pretty much just the things that
> are needed. Still it can take up to a minute to print (in html) a
> query that returns in a second on the mysql-console.
>
> The output from the cgi is around 15MB for every operation the user
> does, so I see the potential for slowness right there, in the sheer
> amount of data that has to be produced and transmitted over the net,
> but still I don't really understand what I have done to make the cgi
> so slow.
>
> The cgi is mostly taking a parameter like $query and doing a "select
> from db where value like "%$query%""


You should use placeholders instead for security, but that won't affect
speed much.

> and returning that in pretty <td> $_</td>-form. That seems to work
> reasonably fast. But when I do a "select * from db" things tend to get
> really slow when dealing with 15000-entries++ (even though mysql has
> it all cached and spits it out in a split second). The cgi-process
> sits there, spitting out html to the client, using up 95% of the
> cpu-time of one of the cores and using 50MBs of memory or so. I have
> no idea what it does.


To find out where it spends the time, use Devel::NYTProf.

The simplest method to use it is usually to just invoke it from the
commandline:

export REQUEST_METHOD=GET
export QUERY_STRING='whatever'
perl -d:NYTProf yourscript.cgi

This creates a file nytprof.out in the current directory which you can
convert to a nice HTML report with the nytprofhtml command.

There are also other ways for invoking it, see the docs.


CGI does have considerable overhead per row. However, for just 15000
rows I'd still expect that to be finished in a few seconds at most.

> If somebody has experience in doing huge db's with perl, can you
> please give me some pointers?


Is 700 GB and 6 billion rows in the largest table huge enough? I don't
use mysql for that one, though.

> Is this a code issue or is it a network-
> issue or is it a browser issue?


Since you wrote that your CGI script uses 95% CPU time during the time
it runs it is unlikely to be a browser problem. But browsers in general
are rather slow at rendering tables. For 15000 rows, the browser may
easily take a few dozen seconds to render the table (depending on the
speed of your workstation, how much memory you have, etc.) Try accessing
the CGI with wget or a similar tool - is it much faster than with the
browser?

hp
 
Reply With Quote
 
xhoster@gmail.com
Guest
Posts: n/a
 
      12-17-2008
skeldoy <> wrote:
> Hey! I am working on an application that features a huge database
> (mysql) and some cgi (perl) for listing, sorting, searching,
> dupechecking and more.


Isn't dupechecking more a back-office thing best done from the command
line, not CGI?

> I see that the configuration for the mysql is
> pretty much spot on - most of the data is cached and so the mysqld-
> process isn't really doing much in terms of causing bottlenecks. But I
> believe that the perl-code may be the bottleneck here. I have turned
> off buffering completely


What does that mean? $|=1? mysql_use_result?

I wouldn't know how to identify *all* sources of buffering present in
a complex system, let alone know how to disable them completely.

> and I render pretty much just the things that
> are needed. Still it can take up to a minute to print (in html) a
> query that returns in a second on the mysql-console.


It takes less than a second to stream the entire <big_num> of data
to your screen on the mysql-console? Or do you somehow tell it to
run the query but suppress the actual results?

> The output from the cgi is around 15MB for every operation the user
> does, so I see the potential for slowness right there, in the sheer
> amount of data that has to be produced and transmitted over the net,
> but still I don't really understand what I have done to make the cgi
> so slow.


Does the CGI have to be invoked by POST, or will GET work?

Use wget to fetch the page. See how long it takes when the results just
have to fetched and stored and not rendered.

And once you do that and have the results stored as a static file, you
might as well put that in a location that is served up via http and open it
with the client browser. See how long does it take to pull the data over
the net and render it in a browser when it is coming from a static file
rather than a CGI.

>
> The cgi is mostly taking a parameter like $query and doing a "select
> from db where value like "%$query%"" and returning that in pretty <td>
> $_</td>-form. That seems to work reasonably fast. But when I do a
> "select * from db" things tend to get really slow when dealing with
> 15000-entries++ (even though mysql has it all cached and spits it out
> in a split second). The cgi-process sits there, spitting out html to
> the client, using up 95% of the cpu-time of one of the cores


You can try adding -dProf or -d:SmallProf (or you favorite profiler) to
the CGI shebang line. Or come up with an adapter run the CGI in a
command-line environment, and profile it from there.

> and using
> 50MBs of memory or so.


On most modern computers, 50MB is a rounding error. Is that the case for
your computer?


> I have no idea what it does. I have replaced
> most of the " with '.


???

> And I can't really see that I am doing something
> that needn't be done. Is there an issue with creating multiple
> database-connections (DBD::mysql) that I should be aware of?


*Anything* can be done incorrectly, in which case there are things to be
aware of. Are you using multiple MySQL connections in your CGI? If so,
how are you doing it?


> If somebody has experience in doing huge db's with perl, can you
> please give me some pointers?


Profile, profile, profile. If that doesn't work, comment out chunks
of code and see if that affects speed (for example, do all the work except
for the actual printing of the html table.)

> Is this a code issue or is it a network-
> issue or is it a browser issue?


If the CGI script itself is using 95% of a CPU, then it probably isn't
a browser or network issue. Unless your CPU utilization reporting tool
accounts for I/O waits as CPU usage (I've seen some that do).

> Does anyone have any tips for doing
> huge databases with perl?


15,000 records is not huge. It is just barely even medium. Unless you
have giant blob or text fields. If you do have a large database with
DBD::mysql, it is important to know the benefits and drawbacks of
mysql_use_result.

Xho

--
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.
 
Reply With Quote
 
skeldoy
Guest
Posts: n/a
 
      12-17-2008
> ... why not just render the page once
> return static html? *You can schedule updates when the db changes.


I think that I am somewhat doing that with configuring the mysql to
cache almost everything it can. The server basically is just one big
mysql-cache and some memory for the cgi-process to run.. But I see
your point.. I can do it to test out if the browser may be the issue
 
Reply With Quote
 
skeldoy
Guest
Posts: n/a
 
      12-17-2008
> You should use placeholders instead for security, but that won't affect
> speed much.


I know. But it is intended to be used by inhouse people, and if they
mess it up, I will just restore it.

> To find out where it spends the time, use Devel::NYTProf.
> The simplest method to use it is usually to just invoke it from the
> commandline:
> export REQUEST_METHOD=GET
> export QUERY_STRING='whatever'
> perl -d:NYTProf yourscript.cgi
> This creates a file nytprof.out in the current directory which you can
> convert to a nice HTML report with the nytprofhtml command.
> There are also other ways for invoking it, see the docs.


Good tip! I will try it out and see what it says. Thanks man, this is
really good stuff right there.

>
> CGI does have considerable overhead per row. However, for just 15000
> rows I'd still expect that to be finished in a few seconds at most.


I know! when I do a "mysql -u root \n select * from db" it returns
pretty much in the speed that my gfx-adapter can print it out, and
within a couple of secs if I pipe it to a file.

> Is 700 GB and 6 billion rows in the largest table huge enough? I don't
> use mysql for that one, though.


Is it a mysql issue that make you use another db?

> Since you wrote that your CGI script uses 95% CPU time during the time
> it runs it is unlikely to be a browser problem. But browsers in general
> are rather slow at rendering tables. For 15000 rows, the browser may
> easily take a few dozen seconds to render the table (depending on the
> speed of your workstation, how much memory you have, etc.) Try accessing
> the CGI with wget or a similar tool - is it much faster than with the
> browser?


A really good question. Never really thought about seeing whether the
data-transmission is the issue in that way.
Just another thing for me to do..

Really appreciate your time/help on this. Good pointers there. I got
pretty one lane looking at the code
and tuning the mysql there. Never occurred to me that there is basic
ways of testing the data-transmission (ie. curl/wget).
And the Devel::NYTProf package is a definate plus in finding the
culprit here. Thanks again!

 
Reply With Quote
 
skeldoy
Guest
Posts: n/a
 
      12-18-2008
> Isn't dupechecking more a back-office thing best done from the command
> line, not CGI?


I know. I got some very particular non-technical customers that wants
to oversee the entire process to spend time. I think the bureaucrats
have too little work and too much time on their hands. But hey! I
can't really tell them what THEY want to do, right?

> What does that mean? *$|=1? *mysql_use_result?


$|=1 on everything, everywhere. The mysql doesn't really buffer output
anymore, but the whole thing is in memory - so it is, as I said,
probably not the culprit.

> I wouldn't know how to identify *all* sources of buffering present in
> a complex system, let alone know how to disable them completely.


I know, but I thought there might be something trivial that I didn't
know about. Good to hear though.

> It takes less than a second to stream the entire <big_num> of data
> to your screen on the mysql-console? *Or do you somehow tell it to
> run the query but suppress the actual results?


If i render it on screen from the mysql - I get bottlenecked by the
graphics-adapter/Xorg. When I pipe it to a file it gets done in like a
couple of seconds, and I figure that is how long it really does to do
the query.

> Does the CGI have to be invoked by POST, or will GET work?


I use GET.

> Use wget to fetch the page. *See how long it takes when the results just
> have to fetched and stored and not rendered.


I will.

> And once you do that and have the results stored as a static file, you
> might as well put that in a location that is served up via http and open it
> with the client browser. *See how long does it take to pull the data over
> the net and render it in a browser when it is coming from a static file
> rather than a CGI.


The next step. I agree. I plainly just have to be more thorough.. See
your pointers there.

> You can try adding -dProf or -d:SmallProf (or you favorite profiler) to
> the CGI shebang line. *Or come up with an adapter run the CGI in a
> command-line environment, and profile it from there.


Yeah. Mister Peter mentioned a Devel::NYTProf that I intend to run.
But I will definatly try out DProf and SmallProf like you say.
Glad some of you actually know about these things (like profiling and
such other stuff that I am totally blind to).

> On most modern computers, 50MB is a rounding error. *Is that the case for
> your computer?


? huh ? Is this a new feature of the VonNeumann machine that slipped
my attention?

> *Anything* can be done incorrectly, in which case there are things to be
> aware of. *Are you using multiple MySQL connections in your CGI? *If so,
> how are you doing it?


doing multiple "my $db1 = the connection stuff ; my $db2 = the
connection stuff ; my $db...
I open them all up in the beginning of the code and keep them over the
length of the run.
Strange thing is that when I parse the data into the DB I do it in the
exactly same manner and the
multiple connections speed that up. Can't really see how multiple
connections would suddenly be an issue,
but strange things happen in the Perl-world, so I have all
possibilities open.
>
> Profile, profile, profile. *If that doesn't work, comment out chunks
> of code and see if that affects speed (for example, do all the work except
> for the actual printing of the html table.)


Again with the profiling. I will man. Thanks for that. Never actually
been a profiling man. Just code and it normally works the way I
intended. Will definelty try it out now..

> If the CGI script itself is using 95% of a CPU, then it probably isn't
> a browser or network issue. *Unless your CPU utilization reporting tool
> accounts for I/O waits as CPU usage (I've seen some that do).


I don't really know. All I have done is run top on the linux-server
that runs it.
Not really into the whole userland thing - can it be an apache issue??

> 15,000 records is not huge. *It is just barely even medium. *Unless you
> have giant blob or text fields. *If you do have a large database with
> DBD::mysql, it is important to know the benefits and drawbacks of
> mysql_use_result.


I know 15000 is supposed to be nada. But 15000 is the magic number
where things tend to slow down.
You must excuse me, I don't normally type English, so I may tend to
write awkwardly.
I will read up on mysql_use_result though. Thanks for the tip. I see
there is much more docmentation to dive into.

Thanks for all the tips and pointers and not least your time!
 
Reply With Quote
 
Bart Lateur
Guest
Posts: n/a
 
      12-18-2008
skeldoy wrote:

>> You should use placeholders instead for security, but that won't affect
>> speed much.

>
>I know. But it is intended to be used by inhouse people, and if they
>mess it up, I will just restore it.


Tsk.

It's not hard, and here is how:

>>>select from db where value like "%$query%"


my $sth = $dbh->prepare("select * from db '%' || ? || '%'");
$sth->execute($query);

How hard was that?

--
Bart.
 
Reply With Quote
 
skeldoy
Guest
Posts: n/a
 
      12-18-2008
Update: Found out (with curl) that the problem is in the browsers. The
queries completes and downloads in a second or so but it takes forever
to render in all browsers I have tested.

Thanks for your help guys!
 
Reply With Quote
 
Peter J. Holzer
Guest
Posts: n/a
 
      12-18-2008
On 2008-12-17 23:53, skeldoy <> wrote:
>> Is 700 GB and 6 billion rows in the largest table huge enough? I don't
>> use mysql for that one, though.

>
> Is it a mysql issue that make you use another db?


Actually for us, mysql would be "another db". We are traditionally an
Oracle shop, so using Oracle was the natural thing to do for us. We are
using mysql for a few smaller databases, and I'm planning to look at
postgres again - the inverted indices could be really useful for our
application.

hp
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Recommendation for a small web framework like Perl's CGI::Applicationto run as CGI? excord80 Python 17 01-29-2009 06:02 PM
what's wrong calling a Perl/CGI script in Perl/CGI script under Tomcat server? kath Perl Misc 4 04-09-2007 09:21 PM
Python-cgi or Perl-cgi script doubt praba kar Python 1 07-30-2005 08:25 AM
speeding up perl script execution under apache stig erikson Perl Misc 3 10-29-2004 03:57 PM
Re: CGI Perl "use CGI" statement fail Jürgen Exner Perl 0 07-31-2003 02:00 PM



Advertisments