Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Perl > Perl Misc > DBI Performance Issues

DBI Performance Issues

 
 
Chris H.
      08-25-2006
I'm currently working on an issue for work that involves processing a pipe-delimited
text file of 356,400 lines, with 19 fields in each row.

The issue I'm having is not the performance of reading and splitting this
data, but of writing it to the database. Using DBI/DBD::mysql, with
'localhost' as the server the script connects to, I get the following
results:

> time ./ushbh-dev.pl hbh.txt

Reading input file - 08/25/06 10:19:09
Updating us_hourly table - 08/25/06 10:19:09
1000 records processed.

real 0m1.096s
user 0m0.424s
sys 0m0.008s

------------------------

Using 'servername.tld' from a remote machine, I get the following
performance:


> time ./ushbh-dev.pl hbh.txt

Reading input file - 08/25/06 10:17:49
Updating us_hourly table - 08/25/06 10:17:49
1000 records processed.

real 1m11.606s
user 0m0.250s
sys 0m0.034s

------------------------

The issue seems to be with remote connections, either through the DBI
module or the DBD::mysql driver. The same thing happens with the Sybase
DBI driver going from a Unix machine to a MSSQL machine.

Are there any remote connection variables or performance-enhancing calls
that I'm missing?

The SQL query being run is:

UPDATE foo SET var1 = 'foo', var2 = 'foo2' WHERE var3 = 'bar' AND
var4 = 'bar2';

The table is indexed on var3 and var4, which gave a huge increase in
performance once the index was in place.

This is data that needs to be updated every 15 minutes, so dropping the
data and re-inserting it is not an option.
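For what it's worth, one standard DBI optimization here is to prepare the statement once with placeholders and reuse the handle for every row, with AutoCommit off and a single commit at the end. A minimal sketch, where the connection parameters, field order, and the table/column names from the query above are all placeholders:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Hypothetical connection values; substitute your own host/db/credentials.
my $dbh = DBI->connect("DBI:mysql:database=mydb;host=localhost",
                       "user", "password",
                       { RaiseError => 1, AutoCommit => 0 })
    or die $DBI::errstr;

# Prepare once, execute per row: the server parses the statement a single
# time, and AutoCommit => 0 groups all the updates into one commit.
my $sth = $dbh->prepare(
    "UPDATE foo SET var1 = ?, var2 = ? WHERE var3 = ? AND var4 = ?"
);

while (my $line = <>) {
    chomp $line;
    my @f = split /\|/, $line;       # pipe-delimited input
    $sth->execute(@f[0, 1, 2, 3]);   # which fields map to which column is an assumption
}

$dbh->commit;
$dbh->disconnect;
```

Note that this still costs one network round trip per execute(), so it mainly saves parse and commit overhead rather than latency.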

thanks for any help or insight you can provide on this issue.

--
Chris H.
http://www.nasland.nu

 
xhoster@gmail.com
 
      08-25-2006
"Chris H." <(E-Mail Removed)> wrote:
> im currently working on an issue for work, in which processing of a pipe
> delimited text file consisting of 356,400 lines of data with 19 fields in
> each row.
>
> the issue that i am having is not the performance of reading and
> splitting this data, but in writing to the database. using
> dbi/dbd::mysql, using 'localhost' as the server the script connects to i
> get the following results:
>
> > time ./ushbh-dev.pl hbh.txt

> Reading input file - 08/25/06 10:19:09
> Updating us_hourly table - 08/25/06 10:19:09
> 1000 records processed.
>
> real 0m1.096s
> user 0m0.424s
> sys 0m0.008s
>
> ------------------------
>
> using 'servername.tld' from a remote machine, i get the following
> performance:
>
> > time ./ushbh-dev.pl hbh.txt

> Reading input file - 08/25/06 10:17:49
> Updating us_hourly table - 08/25/06 10:17:49
> 1000 records processed.
>
> real 1m11.606s
> user 0m0.250s
> sys 0m0.034s
>
> ------------------------


How fast is your network connection? I see barely any change in
performance at all between local and remote.


> the issue seems to be with remote connections either through the dbi
> module, or the dbd::mysql driver.


I would suggest it is at a lower level: in the network, or in the compiled
binaries that DBD::mysql links against. Have you implemented the same
thing in another language (C, Java, Python) and seen better performance?
If not, I doubt your problem has anything to do with Perl or Perl modules
per se.

Xho

--
-------------------- http://NewsReader.Com/ --------------------
Usenet Newsgroup Service $9.95/Month 30GB
 
Chris H.
 
      08-28-2006
On Fri, 25 Aug 2006 17:56:42 +0000, xhoster wrote:
>
> How fast is your network connection? I see barely any change in
> performance at all between local and remote.
>


The real machine will be 100Mbit to 100Mbit via LAN. The testing machines
I used go from my home cable connection (8Mb down/2Mb up) to a 100Mbit
colo. It's easy to blame the cable connection, but the traffic going
across the line during the updates is a whopping 7kbit/s, which a 14.4k
modem could handle. That's why I'm at a loss for ideas; it's as if it's
refusing to use the CPU or the network hardware during remote tests. As I
don't have iptraf on the actual machine that will run this, I can't say
for certain how much bandwidth it uses on the LAN, but I'd imagine not
much more than in my home-to-colo tests.


> I would suggest it is at a lower lever, at the network or at the compiled
> binaries that DBD::mysql links against. Have you implemented the same
> thing in another language (C, Java, Python) and seen better performance?
> If not, I doubt your problem has anything to do with Perl or perl modules
> per se.
>
> Xho


I have not implemented this in another language; however, there are a few
differences. The live script runs on a Sun Solaris machine using DBI's
Sybase driver to connect to a MSSQL database on a Windows 2003 Server
machine. The dev script I've modified to suit my environment, which uses
MySQL. The tables are set up identically, and the data file was scp'ed
from the live machine and run against both databases to check for
performance differences.

The only other difference I can find is that on my colo box, running MySQL
locally and running the script locally with 'localhost' as the server to
connect to, the Perl script uses 50% CPU and the mysql process uses the
other 50%.

Running it from my home machine to the MySQL box with 'domain.tld' as the
server to connect to, it uses less than 0.5% CPU, as if it's not even
working or attempting to work.

The same effect shows up in the live version of the script: with the
remote server in the connect string, the script on the Solaris box uses
less than 1% CPU, and Task Manager on the MSSQL server machine reports
8%-14% CPU usage (it doesn't run this database exclusively).

I realize there are quite a few variables at play here, but I'm hoping
it's something really simple I'm missing that's keeping it from running at
its true speed.

I can't say how the DBI or DBD drivers on the Solaris machine are
compiled, as I didn't set them up. The DBI and DBD drivers on both my home
machine and the colo box are standard Gentoo-compiled versions; as far as
the ebuilds are concerned, they set no options beyond the defaults.

thanks.

--
Chris H.
http://www.nasland.nu

 
xhoster@gmail.com
 
      08-28-2006
"Chris H." <(E-Mail Removed)> wrote:
> On Fri, 25 Aug 2006 17:56:42 +0000, xhoster wrote:
> >
> > How fast is your network connection? I see barely any change in
> > performance at all between local and remote.
> >

>
> the real machine will be 100mbit to 100mbit via lan. the testing machines
> that i used are from my home cable connection (8mb down/2mb up) to
> 100mbit colo. its easy to blame the cable connection, however the traffic
> going across the line during the updates are a whopping 7kbit/s which a
> 14.4k could handle. thats why im at a loss for ideas.


You are not making the important distinction between throughput and latency.
Your Perl script sends the SQL command, then waits for a response. While
waiting, it does nothing: it doesn't use CPU or bandwidth by "sending
ahead on prospect" more update statements. So you are limited by latency,
not by throughput. With an insert, you can circumvent this by using the
MySQL-specific multiple-row insert statement to add many rows per network
round trip. Maybe you can use the MySQL-specific "INSERT ... ON DUPLICATE
KEY UPDATE" syntax to accomplish the same thing--just a suggestion, I've
never looked into it in detail myself. However, if your real situation
will have the two machines on the same LAN, then latency will probably not
be a problem. You simply can't benchmark on such disparate setups and
expect the results to be meaningful.
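To illustrate the batching idea: assuming there is a UNIQUE index on (var3, var4), which is what makes ON DUPLICATE KEY UPDATE fire, a sketch along these lines sends many rows per round trip. The connection values, table/column names, and sample data are all hypothetical:

```perl
use strict;
use warnings;
use DBI;

# Assumes a UNIQUE index on (var3, var4) -- ON DUPLICATE KEY UPDATE only
# triggers when the insert would violate a unique or primary key.
my $dbh = DBI->connect("DBI:mysql:database=mydb;host=servername.tld",
                       "user", "password", { RaiseError => 1 });

my @rows = (
    [ 'foo',  'foo2', 'bar',  'bar2'  ],   # sample rows, hypothetical
    [ 'foo3', 'foo4', 'bar3', 'bar4'  ],
);

# Build one multi-row statement: (?, ?, ?, ?), (?, ?, ?, ?), ...
my $placeholders = join ", ", ("(?, ?, ?, ?)") x @rows;
my $sql = "INSERT INTO foo (var1, var2, var3, var4)
           VALUES $placeholders
           ON DUPLICATE KEY UPDATE var1 = VALUES(var1),
                                   var2 = VALUES(var2)";

# Flatten the rows into one bind list: one statement, one round trip.
$dbh->do($sql, undef, map { @$_ } @rows);
```

Each batch of N rows then costs one round trip instead of N, which is exactly what a latency-bound workload needs.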

Xho

 
Chris H.
 
      08-28-2006
On Mon, 28 Aug 2006 16:27:31 +0000, xhoster wrote:

> You are not making the important distiction between throughput and latency.
> Your Perl script sends the sql command, then waits to get a response.
> While waiting, it does nothing--it doesn't use CPU and/or bandwidth by
> "sending ahead on prospect" more update statements. So you are limited by
> latency, not by throughput. With an insert, you can circumvent this by
> using the MySQL-specific multiple row insert statement to add many rows per
> network round-trip. Maybe you can use the MySQL-specific "INSERT...ON
> DUPLICATE KEY UPDATE" syntax to accomplish the same thing--just a
> suggestion, I've never looked into it in detail myself. However, if your
> real situation will have the two machines on the same LAN, then latency
> will probably not be a problem. You simply can't do benchmarking on such
> disparate setups and expect the results to be meaningful.
>
> Xho


I understand the differences in the setup, but I don't see latency playing
that much of a role in this issue. Ping response from Solaris to Windows
on the LAN (the live setup) is 2.73ms; ping response from my home machine
to my colo box is 17.5ms. My issue is that the two setups perform equally
slowly in getting the information to the server. I can understand it being
slower, that's a given, but if it takes just as long to update 10 records
remotely as 1,000 locally, then something is going wrong somewhere. I'd
suspect it should be faster than that.

Thanks again for all the information so far; it's giving me different
ideas to try. I'll definitely look at the 'INSERT ... ON DUPLICATE KEY
UPDATE' documentation to see if it'll work.

--
Chris H.
http://www.nasland.nu

 
 
xhoster@gmail.com
 
      08-28-2006
"Chris H." <(E-Mail Removed)> wrote:
> On Mon, 28 Aug 2006 16:27:31 +0000, xhoster wrote:
>
> > You are not making the important distiction between throughput and
> > latency. Your Perl script sends the sql command, then waits to get a
> > response. While waiting, it does nothing--it doesn't use CPU and/or
> > bandwidth by "sending ahead on prospect" more update statements. So
> > you are limited by latency, not by throughput. With an insert, you can
> > circumvent this by using the MySQL-specific multiple row insert
> > statement to add many rows per network round-trip. Maybe you can use
> > the MySQL-specific "INSERT...ON DUPLICATE KEY UPDATE" syntax to
> > accomplish the same thing--just a suggestion, I've never looked into it
> > in detail myself. However, if your real situation will have the two
> > machines on the same LAN, then latency will probably not be a problem.
> > You simply can't do benchmarking on such disparate setups and expect
> > the results to be meaningful.
> >
> > Xho

>
> i understand the differences in the setup, but i dont see latency playing
> that much of a role in this issue. ping response from solaris to windows
> on the lan (live setup) is 2.73ms. ping response from my home machine to
> my colo box is 17.5ms.


Is that under load or under no load? Anyway, you reported 71 seconds for
1000 records, which is 71ms per record. 71ms is only fourfold more than
17.5ms, so I would say that latency is definitely in the ballpark to be a
problem. (I would expect MySQL to have more overhead than ping--4 times
more would not greatly surprise me.)

> my issue is that the two setups perform the same
> in regards to how slow they are getting the information to the server. i
> can understand it being slower, that's a given, but what im having an
> issue with is that it takes just as long to update 10 records remotely as
> 1,000 locally...then something is going wrong somewhere. i'd suspect it
> should be faster than that.


I'm afraid you lost me. You have a localhost MySQL connection doing about
1000 records per second, and an over-cable-modem MySQL connection doing 14
records per second. Do you have a *mysql* connection on a LAN? If so,
what speed did that provide?

Xho

 
Peter J. Holzer
 
      08-29-2006
On 2006-08-28 19:47, Chris H. <(E-Mail Removed)> wrote:
> On Mon, 28 Aug 2006 16:27:31 +0000, xhoster wrote:
>
>> You are not making the important distiction between throughput and latency.
>> Your Perl script sends the sql command, then waits to get a response.
>> While waiting, it does nothing--it doesn't use CPU and/or bandwidth by
>> "sending ahead on prospect" more update statements. So you are limited by
>> latency, not by throughput. With an insert, you can circumvent this by
>> using the MySQL-specific multiple row insert statement to add many rows per
>> network round-trip.


DBI also has an execute_array method, but I don't know whether it is
implemented by DBD::mysql. If it is, it can improve performance quite
dramatically - I've measured a factor of 50 in an (admittedly very simple)
benchmark with Oracle.
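For reference, execute_array takes one array reference per placeholder, column-wise, and DBI pairs them up row by row. A minimal sketch against the table from this thread (connection values, names, and data are hypothetical, and whether the driver turns this into fewer network round trips depends on the DBD):

```perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect("DBI:mysql:database=mydb;host=servername.tld",
                       "user", "password", { RaiseError => 1 });

my $sth = $dbh->prepare(
    "UPDATE foo SET var1 = ?, var2 = ? WHERE var3 = ? AND var4 = ?"
);

# Column-wise bind arrays: row i is (var1[i], var2[i], var3[i], var4[i]).
my @var1 = ('foo',  'foo3');
my @var2 = ('foo2', 'foo4');
my @var3 = ('bar',  'bar3');
my @var4 = ('bar2', 'bar4');

# ArrayTupleStatus collects per-row success/failure. Drivers without
# native array support fall back to an internal per-row loop, so the
# speedup is driver-dependent.
my @tuple_status;
$sth->execute_array(
    { ArrayTupleStatus => \@tuple_status },
    \@var1, \@var2, \@var3, \@var4,
);
```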

> i understand the differences in the setup, but i dont see latency playing
> that much of a role in this issue. ping response from solaris to windows
> on the lan (live setup) is 2.73ms.


This seems very slow for a 100Mbit network. 0.27ms is more what I would
expect.

> ping response from my home machine to my colo box is 17.5ms. my issue
> is that the two setups perform the same in regards to how slow they
> are getting the information to the server. i can understand it being
> slower, that's a given, but what im having an issue with is that it
> takes just as long to update 10 records remotely as 1,000 locally...


ping localhost reports an RTT of 0.05ms on my (rather old) laptop. I
assume your server is at least as fast, so the latency of your broadband
connection is at least 350 times longer than that of the loopback
interface. I don't find it very surprising that a mostly latency-bound job
takes 71 times as long. Assuming there are four round trips per update
(I don't know why there should be so many) and an actual processing time
of 0.8ms, we arrive almost exactly at your measurements:

broadband: 4 * 17.5ms + 0.8ms = 70.8ms
loopback:  4 * 0.05ms + 0.8ms =  1.0ms

If this theory is correct, you should expect
4 * 2.73ms + 0.8ms = 11.72ms
per update (or about 85 updates/second) over the LAN.


hp

--
Peter J. Holzer | Sysadmin WSR | http://www.hjp.at/
 