Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Python > Newbie Question: python mysqldb performance question

Reply
Thread Tools

Newbie Question: python mysqldb performance question

 
 
cjl
Guest
Posts: n/a
 
      05-20-2007
Group:

I'm new to python and new to mysql.

I have a csv file that is about 200,000 rows that I want to add to a
mysql database. Yes, I know that I can do this directly from the
mysql command line, but I am doing it through a python script so that
I can munge the data before adding it.

I have the following example code:

conn = MySQLdb.connect(db="database", host="localhost", user="root",
passwd="password")
c = conn.cursor()

reader = csv.reader(open(sys.argv[1]))
for row in reader:
data1, data2, data3, data4 = row
data = (data1,data2,data3,data4)
c.execute("""insert into datatable values (%s, %s, %s, %s)""",
data)
conn.commit()

This takes a really long time to execute, on the order of minutes.
Directly importing the csv file into mysql using 'load infile' takes
seconds.

What am I doing wrong? What can I do to speed up the operation?

Thanks in advance,
cjl

 
Reply With Quote
 
 
 
 
Dennis Lee Bieber
Guest
Posts: n/a
 
      05-21-2007
On 20 May 2007 16:55:07 -0700, cjl <(E-Mail Removed)> declaimed the
following in comp.lang.python:

> reader = csv.reader(open(sys.argv[1]))
> for row in reader:
> data1, data2, data3, data4 = row
> data = (data1,data2,data3,data4)


That set is redundant in this example -- It's the very same as
saying

data = row

> c.execute("""insert into datatable values (%s, %s, %s, %s)""",
> data)
> conn.commit()
>
> This takes a really long time to execute, on the order of minutes.
> Directly importing the csv file into mysql using 'load infile' takes
> seconds.


"load infile", for local to server files (if so configured) means
all processing is done directly by the server. Using Python, you are
going through a TCP/IP connection with send/response traffic for each
command.


Step one: Don't commit after every single row... Feed all the rows
and then commit (among other things, that means an error in the middle
of reading the CSV rolls back all database activity, allowing you to try
again without having duplicate records in the database).

Step two: Don't process the rows one-at-a-time. Build up batches,
and use .executemany() on the batch.
--
Wulfraed Dennis Lee Bieber KD6MOG
http://www.velocityreviews.com/forums/(E-Mail Removed) (E-Mail Removed)
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: (E-Mail Removed))
HTTP://www.bestiaria.com/
 
Reply With Quote
 
 
 
 
rurpy@yahoo.com
Guest
Posts: n/a
 
      05-21-2007
On May 20, 5:55 pm, cjl <(E-Mail Removed)> wrote:
....snip...
> conn = MySQLdb.connect(db="database", host="localhost", user="root",
> passwd="password")
> c = conn.cursor()
>
> reader = csv.reader(open(sys.argv[1]))
> for row in reader:
> data1, data2, data3, data4 = row
> data = (data1,data2,data3,data4)
> c.execute("""insert into datatable values (%s, %s, %s, %s)""",
> data)
> conn.commit()
>
> This takes a really long time to execute, on the order of minutes.
> Directly importing the csv file into mysql using 'load infile' takes
> seconds.
>
> What am I doing wrong? What can I do to speed up the operation?


In addition to the previous poster's suggestions,
if you have indexes, foreign keys, or other constraints
on the table, and you are sure that you are loading
"good" data, you may want to drop them before doing
the inserts, and recreate them after. Updating indexes
and checking contraints can be time consuming.

But you will probabably never get your code to run
as fast as the mysql "load" command, which (I'm
guessing here, don't use mysql much) skips all
the sql machinery and writes directly to the table.

 
Reply With Quote
 
John Nagle
Guest
Posts: n/a
 
      05-21-2007
cjl wrote:
> Group:
>
> I'm new to python and new to mysql.
>
> I have a csv file that is about 200,000 rows that I want to add to a
> mysql database. Yes, I know that I can do this directly from the
> mysql command line, but I am doing it through a python script so that
> I can munge the data before adding it.
>
> I have the following example code:
>
> conn = MySQLdb.connect(db="database", host="localhost", user="root",
> passwd="password")
> c = conn.cursor()
>
> reader = csv.reader(open(sys.argv[1]))
> for row in reader:
> data1, data2, data3, data4 = row
> data = (data1,data2,data3,data4)
> c.execute("""insert into datatable values (%s, %s, %s, %s)""",
> data)
> conn.commit()
>
> This takes a really long time to execute, on the order of minutes.
> Directly importing the csv file into mysql using 'load infile' takes
> seconds.
>
> What am I doing wrong? What can I do to speed up the operation?


"LOAD INFILE" is generally faster than doing
many inserts. There are ways to speed things up, especially if
you can reconfigure MySQL to use more memory. You may want to
load the data without indexing it, then build the indices. Ask
in a MySQL group, or read the manual.

If you can use LOAD INFILE, do so. Preferably from an empty
database. Then it can sort the records and insert them all at once.
You can create a file for LOAD INFILE from Python, then issue the
LOAD INFILE command.

A few minutes isn't a "really long time" for that. I had to do
15,000,000 INSERT operations a few months back, and it took three days.

John Nagle

 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Newbie: Limesurvey and... MySQLdb? Albert-jan Roskam Python 0 03-20-2009 06:15 PM
Will MySQLdb, the Python shim, be supported for Python 2.6 or 3.x? John Nagle Python 4 11-19-2008 08:38 PM
Problems installing MySQLdb on Windows [newbie] Alex Meier Python 5 12-29-2004 01:21 PM
Newbie MySQLdb / MySQL version problem, I think Dave Merrill Python 7 11-29-2004 05:23 PM
python + MySQLDB distibution on Win .How? Igor Prischepoff Python 0 11-12-2003 07:55 AM



Advertisments