Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Perl > Perl Misc > Better performance ...

Reply
Thread Tools

Better performance ...

 
 
cms-team@ivi.de
Guest
Posts: n/a
 
      08-28-2006
Hy!

I've got the following code and would like to ask, if there is any
possibilty for more performance. The script inserts about 240
recordsets per second into a MySQL database.

The file (input.txt) contains about 365.000 lines/recordsets.

================================================== ==
# Grundwerte initialisieren
require "/home/vip/my.properties";
$IMPDAT="$HOME_HOST/input.txt";

if (!open (FP1,"<$IMPDAT"))
{
print ("$IMPDAT nicht vorhanden!\n");
exit(3);
}
#Mit der DB verbinden
$rc=&connect_to_db();
$linecount = 0;

# Alte Tabelleninhalte vorher löschen
$statment="DROP TABLE IF EXISTS `testtabelle`";
$rv=$dbh->do($statment);
$statment="CREATE TABLE `testtabelle` (`apl` varchar(2) NOT NULL
default '',`vsnr` varchar(11) NOT NULL default '',`pvsnr` varchar(
NOT NULL default '',`art` char(1) NOT NULL default '',`aktiv` char(1)
NOT NULL default '',`vmnr` varchar(4) NOT NULL default '',`vbeginn`
varchar( NOT NULL default '',`vende` varchar( default NULL,`zw`
int(1) NOT NULL default '0',`ikm` char(1) NOT NULL default '',`ktoidx`
char(1) default NULL,`nettobeitrag` decimal(9,2) NOT NULL default
'0.00',`bruttoratenbeitrag` decimal(9,2) NOT NULL default
'0.00',`produkttext` varchar(150) NOT NULL default '',`rohbau`
varchar(6) default NULL,PRIMARY KEY (`vsnr`),KEY `pvsnr` (`pvsnr`),KEY
`vm` (`vmnr`)) ENGINE=InnoDB DEFAULT CHARSET=utf8;";
$rv=$dbh->do($statment);

# Nimmt die Startzeit der Anwendung
$startzeit = time;

# Zeilenweise Auslesen der Datei
while (<FP1>)
{
$line = $_;
chop ($line);
$linecount += 1;

$line=~s/\"//g;
($apl,$vsnr1,$vsnr2,$pvsnr,$art,$aktiv,$vmnr,$vbeg inn,$vende,$zw,$ikm,$ktoidx,$nettobeitrag,$bruttor atenbeitrag,$produkttext,$rohbau)
= split(/;/,$line);

$vende=~s/ *$//g;
$vende=~s/,/\./g;

$ktoidx=~s/ *$//g;
$ktoidx=~s/,/\./g;

$nettobeitrag=~s/ *$//g;
$nettobeitrag=~s/,/\./g;

$bruttoratenbeitrag=~s/ *$//g;
$bruttoratenbeitrag=~s/,/\./g;

$produkttext=~s/ *$//g;
$produkttext=~s/,/\./g;

$rohbau=~s/ *$//g;
$rohbau=~s/,/\./g;

$statment="INSERT into testtabelle
(apl,vsnr,pvsnr,art,aktiv,vmnr,vbeginn,vende,zw,ik m,ktoidx,nettobeitrag,bruttoratenbeitrag,produktte xt,rohbau)
value
('$apl','$vsnr1$vsnr2','$pvsnr','$art','$aktiv','$ vmnr','$vbeginn',".($vende
eq "" ? "NULL" : "'".$vende."'").",$zw,'$ikm',".($ktoidx eq "" ? "NULL"
:
"'".$ktoidx."'").",$nettobeitrag,$bruttoratenbeitr ag,'$produkttext',".($rohbau
eq "" ? "NULL" : "'".$rohbau."'").")";

$rv=$dbh->do($statment);
}
$rc = $dbh->disconnect;
close(FP1);

# Zeit ausgeben
&Dauer;
$est = $est == 0 ? 1 : $est;
$dsek = int($linecount/$est);
print "Das Script 'laden_all_daten.pl' hat fuer
".punktuiere($linecount)." Datensaetze insgesamt ".$dauer." Minuten
gebraucht (".$dsek." Datensaetze pro Sekunde)\n";
unlink($TEMPDAT);
exit(0);
================================================== ==

Where can I make this thing run a little faster?

Thx in advance!

Greetings
Marcus Dau

 
Reply With Quote
 
 
 
 
Marc Espie
Guest
Posts: n/a
 
      08-28-2006
In article <(E-Mail Removed) .com>,
<(E-Mail Removed)> wrote:
>Hy!
>
>I've got the following code and would like to ask, if there is any
>possibilty for more performance. The script inserts about 240
>recordsets per second into a MySQL database.
>
>The file (input.txt) contains about 365.000 lines/recordsets.
>


>Where can I make this thing run a little faster?
>

Turn autocommit off, and commit just once every n transactions.

Not sure how good it will be with mysql, but sqlite loves it quite
a bit.

Otherwise, use some other database engine. mysql is simple to set up,
but definitely not the fastest.
 
Reply With Quote
 
 
 
 
cms-team@ivi.de
Guest
Posts: n/a
 
      08-28-2006
Thx a lot. This does the trick.

Performance went up from 50 minutes runtime for 30 scripts (tables) to
10 minutes!!!

Greetings
Marcus

 
Reply With Quote
 
goho
Guest
Posts: n/a
 
      08-28-2006

http://www.velocityreviews.com/forums/(E-Mail Removed) wrote:
> Thx a lot. This does the trick.
>
> Performance went up from 50 minutes runtime for 30 scripts (tables) to
> 10 minutes!!!
>
> Greetings
> Marcus


Your problem doesn't have a lot to do with perl; I suggest you look
towards the mysql group, however, that said:

I've not used MySQL but from experience with other DBMS you may get
significant improvement by creating the pvsnr and vm keys (indexes)
AFTER loading the data.

In some databases it helps to offline pre-sort in order of the primary
key - try it. Without knowing the internal structure of InnoDB I cant
say whether it will improve things or be a disaster but you wont know
until you try.

Be aware of ROLLBACK issues if you dont regularly commit; not a subject
for discussion here, I'll leave that for you to follow up. If you are
looking for a book, Michael Kofler's 'The Definitive Guide to MySQL'
looks pretty reasonable

Good luck.

 
Reply With Quote
 
Paul Lalli
Guest
Posts: n/a
 
      08-28-2006
(E-Mail Removed) wrote:
> I've got the following code and would like to ask, if there is any
> possibilty for more performance. The script inserts about 240
> recordsets per second into a MySQL database.



<snip>

> while (<FP1>)
> {
> $line = $_;


Why make two assignments?

while (my $line = <FP1>) {

> chop ($line);


You want chomp(), not chop(). There is a difference. Look at perldoc
-f chomp and perldoc -f chop

> $linecount += 1;


No need for this variable. Perl maintains the linecount in the $.
variable for you.

>
> $line=~s/\"//g;
> ($apl,$vsnr1,$vsnr2,$pvsnr,$art,$aktiv,$vmnr,$vbeg inn,$vende,$zw,$ikm,$ktoidx,$nettobeitrag,$bruttor atenbeitrag,$produkttext,$rohbau)
> = split(/;/,$line);
>
> $vende=~s/ *$//g;
> $vende=~s/,/\./g;
>
> $ktoidx=~s/ *$//g;
> $ktoidx=~s/,/\./g;
>
> $nettobeitrag=~s/ *$//g;
> $nettobeitrag=~s/,/\./g;
>
> $bruttoratenbeitrag=~s/ *$//g;
> $bruttoratenbeitrag=~s/,/\./g;
>
> $produkttext=~s/ *$//g;
> $produkttext=~s/,/\./g;
>
> $rohbau=~s/ *$//g;
> $rohbau=~s/,/\./g;


For readability sake, make this one for loop that iterates over your
six variables. For performance sake, change the * to a + and change
the second s///g to a tr///

for ($vende, $ktoidx, $nettobeitrag, $bruttoratenbeitrag, $producttext,
$rohbau) {
s/ +$//g;
tr/,/./;
}

>
> $statment="INSERT into testtabelle
> (apl,vsnr,pvsnr,art,aktiv,vmnr,vbeginn,vende,zw,ik m,ktoidx,nettobeitrag,bruttoratenbeitrag,produktte xt,rohbau)
> value
> ('$apl','$vsnr1$vsnr2','$pvsnr','$art','$aktiv','$ vmnr','$vbeginn',".($vende
> eq "" ? "NULL" : "'".$vende."'").",$zw,'$ikm',".($ktoidx eq "" ? "NULL"
> :
> "'".$ktoidx."'").",$nettobeitrag,$bruttoratenbeitr ag,'$produkttext',".($rohbau
> eq "" ? "NULL" : "'".$rohbau."'").")";
>
> $rv=$dbh->do($statment);


Here's a large performance hit. Instead of compiling this SQL each
time through, prepare this SQL *once*, outside the loop. Then execute
it many times in the loop.

#outside the loop:
my $statement = <<"EO_SQL";
INSERT INTO testabelle
(apl,vsnr,pvsnr,art,aktiv,vmnr,vbeginn,vende,zw,ik m,ktoidx,nettobeitrag,bruttoratenbeitrag,produktte xt,rohbau)
VALUE
(?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?);
EO_SQL
my $sth = $dbh->prepare($statement);
#...

#later, inside the loop:
$sth->execute($apl, $vsnr1 . $vnsr2, $pvsnr, $art, $aktiv, $vmnr,
$vbeginn, ($vende eq q{} ? 'NULL' : $vende), $zw, $ikm, ($ktoidx eq q{}
? 'NULL' : $ktoidx), $nettobeitrag, $bruttoratenbeitrag, $produkttext,
($rohbau eq q{} ? 'NULL' : $rohbau));

You'll note that this also saves you the annoyance of having to figure
out all those quotes in your original SQL.


The previous recommendations given to you are good - committing only
after all the inserts will definately save you time - but this should
help as well.

Paul Lalli

 
Reply With Quote
 
Dr.Ruud
Guest
Posts: n/a
 
      08-28-2006
(E-Mail Removed) schreef:

> $vende=~s/ *$//g;


I don't think the g-modifier is needed. Also change the "*" to a "+",
but Paul already showed you that.

--
Affijn, Ruud

"Gewoon is een tijger."


 
Reply With Quote
 
J. Gleixner
Guest
Posts: n/a
 
      08-28-2006
(E-Mail Removed) wrote:
> Hy!
>
> I've got the following code and would like to ask, if there is any
> possibilty for more performance. The script inserts about 240
> recordsets per second into a MySQL database.
>
> The file (input.txt) contains about 365.000 lines/recordsets.


Also, see MySQL specific techniques, especially "LOAD DATA INFILE",
which they say "is usually 20 times faster than using INSERT statements."

http://dev.mysql.com/doc/refman/5.0/...ert-speed.html
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
GL2 better than the XLs? Consumer grade HDs better than pro-sumer Mini DVs? dh@. DVD Video 1 08-28-2008 07:20 PM
The SCO case gets better and better.... thingy NZ Computing 2 12-10-2006 11:33 AM
Is splint really better than lint? Is there a better tool than splint? Peter Bencsik C Programming 2 09-21-2006 10:02 PM
Build a Better Blair (like Build a Better Bush, only better) Kenny Computer Support 0 05-06-2005 04:50 AM
Why doesn't the better camera have a better dpi? Tony Carlisle Digital Photography 6 10-04-2003 10:40 AM



Advertisments