Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Perl > Perl Misc > design strategy on handling large DB

Reply
Thread Tools

design strategy on handling large DB

 
 
szr
Guest
Posts: n/a
 
      04-13-2008
Peter J. Holzer wrote:
> On 2008-04-13 02:05, <> wrote:
>> "Ela" <> wrote:
>>> It's only 3GB... but I find when I run the current java program
>>> with the mysql, it uses up all the 2GB memory...

>>
>> What uses up all 2GB? MySQL or Java? If MySQL, then you have it
>> poorly configured. If Java, then you must be loading all the data
>> into memory at one time, either accidentally or on purpose.

>
> I don't know about Java, but in Perl this is easy to do accidentally.
>
> if you do something like
>
> my $sth = $dbh->prepare("select * from largetable");
> $sth->execute();
>
> while (my $r = $sth->fetchrow_hashref) {
> process_one_row($r);
> }
>
> You read one row at a time into memory. Right?
>
> Wrong. DBD::mysql by default uses mysql_store_result, so the
> $sth->execute will (try to) load the entire table into memory. You
> need to tell DBD::mysql to use mysql_use_result instead:
>
> $dbh->{mysql_use_result} = 1;
>
> But then you cannot issue other queries until you have finished
> reading the results ...


Couldn't one get around that by just using multiple DBD objects (with
mysql_use_result set) ?

--
szr


 
Reply With Quote
 
 
 
 
Peter J. Holzer
Guest
Posts: n/a
 
      04-13-2008
On 2008-04-13 15:28, szr <> wrote:
> Peter J. Holzer wrote:
>> DBD::mysql by default uses mysql_store_result, so the $sth->execute
>> will (try to) load the entire table into memory. You need to tell
>> DBD::mysql to use mysql_use_result instead:
>>
>> $dbh->{mysql_use_result} = 1;
>>
>> But then you cannot issue other queries until you have finished
>> reading the results ...

>
> Couldn't one get around that by just using multiple DBD objects (with
> mysql_use_result set) ?
>


Yes, that would work. But then you have two sessions and have to be
careful with transactions, locking, etc. All solvable, of course, you
just have to be aware of it.

hp
 
Reply With Quote
 
 
 
 
nolo contendere
Guest
Posts: n/a
 
      04-14-2008
On Apr 13, 7:26*am, "Peter J. Holzer" <hjp-usen...@hjp.at> wrote:

> if you do something like
>
> * * my $sth = $dbh->prepare("select * from largetable");
> * * $sth->execute();
>
> * * while (my $r = $sth->fetchrow_hashref) {
> * * * * process_one_row($r);
> * * }
>
> You read one row at a time into memory. Right?
>
> Wrong. DBD::mysql by default uses mysql_store_result, so the
> $sth->execute will (try to) load the entire table into memory. You need
> to tell DBD::mysql to use mysql_use_result instead:
>
> * * $dbh->{mysql_use_result} = 1;
>
> But then you cannot issue other queries until you have finished reading
> the results ...



This is weird. So what would be the difference between any of the
fetch/select<row> methods and the fetch/select<all> methods for mysql
if the default 'mysql_store_result' were used?
 
Reply With Quote
 
nolo contendere
Guest
Posts: n/a
 
      04-14-2008
On Apr 14, 10:04*am, nolo contendere <simon.c...@fmr.com> wrote:
> On Apr 13, 7:26*am, "Peter J. Holzer" <hjp-usen...@hjp.at> wrote:
>
>
>
> > if you do something like

>
> > * * my $sth = $dbh->prepare("select * from largetable");
> > * * $sth->execute();

>
> > * * while (my $r = $sth->fetchrow_hashref) {
> > * * * * process_one_row($r);
> > * * }

>
> > You read one row at a time into memory. Right?

>
> > Wrong. DBD::mysql by default uses mysql_store_result, so the
> > $sth->execute will (try to) load the entire table into memory. You need
> > to tell DBD::mysql to use mysql_use_result instead:

>
> > * * $dbh->{mysql_use_result} = 1;

>
> > But then you cannot issue other queries until you have finished reading
> > the results ...

>
> This is weird. So what would be the difference between any of the
> fetch/select<row> methods and the fetch/select<all> methods for mysql
> if the default 'mysql_store_result' were used?


Sorry, my question should be: Why would you choose to use any of the
'row' methods over the 'all' methods if the default were being used?
 
Reply With Quote
 
Ted Zlatanov
Guest
Posts: n/a
 
      04-14-2008
On Sat, 12 Apr 2008 08:57:19 +0800 "Ela" <> wrote:

E> "Ted Zlatanov" <> wrote in message
E> news:...
>> On Fri, 11 Apr 2008 23:03:20 +0800 "Ela" <> wrote:
>>
>> You are loading all your data into memory. That's almost certainly the
>> wrong approach. Do you need all the data? What does the data look
>> like? What operations are you doing across the data? SQL has almost
>> every operation covered, and it's rare indeed that you will find it
>> inadequate for managing structured data.


E> Do you think the bottleneck according to the following statistics is from
E> mysqld?

E> 08:56:56 up 11:45, 5 users, load average: 1.00, 1.00, 1.00
E> 84 processes: 82 sleeping, 2 running, 0 zombie, 0 stopped
E> CPU states: cpu user nice system irq softirq iowait idle
E> total 47.5% 0.0% 2.6% 0.0% 0.0% 0.0% 49.8%
E> cpu00 95.1% 0.0% 4.8% 0.0% 0.0% 0.0% 0.0%
E> cpu01 0.0% 0.0% 0.4% 0.0% 0.0% 0.0% 99.6%
E> Mem: 2061424k av, 2043432k used, 17992k free, 0k shrd, 17308k
E> buff
E> 515388k actv, 12k in_d, 37452k in_c
E> Swap: 2040212k av, 0k used, 2040212k free 1278708k
E> cached

E> PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND
E> 3825 ela 25 0 20732 20M 3024 R 49.7 1.0 550:05 0 mysqld
E> 3815 ela 24 0 60772 59M 10428 S 0.2 2.9 1:11 0 java
E> 4373 ela 15 0 1044 1044 808 R 0.2 0.0 0:00 1 top

I don't mean to be rude, but you haven't answered any of the questions
above, and as I said, no one can help you without understanding what
you're trying to do.

You could post your source code, but that's less useful than some
answers.

For what it's worth, the Java interpreter is taking up just a few MB of
memory in the `top' output you've pasted, and MySQL is not using much
either. Hit `M' in `top' to see the processes sorted by memory usage,
which should be helpful.

Ted
 
Reply With Quote
 
ccc31807
Guest
Posts: n/a
 
      04-14-2008
On Apr 11, 8:46 am, "Ela" <e...@yantai.org> wrote:
> except in using mysql or related programs, how can perl be used to deal with
> large databases? Furthermore, how can I load that large db into cache in
> advance and later use another perl to access/search that db?


What kind of user interface do you want? If you have your database,
you already have all you need to access and search the data, i.e.,
SQL. You can certainly do it from the command line, and depending of
the kind of database you may have graphical utilities. Enterprise
Manager for SQL Server, or Toad for Mysql, or even Access (via ODBC)
for Mysql.

I would suggest using Access as a front end for your database is a
good, efficient, easy way for users that are comfortable for using
Access. They can easily do inserts, updates, deletes, and select
queries using query objects and the graphical query builder.

If this doesn't work, you can always build an HTML like network
interface that uses a browser for user interaction. This is commonly
done on the web and isn't difficult to do. You can program dynamic
queries on the fly using user supplied parameters.

The important thing to remember is that the RDBMS already does
everything you want -- no need to reinvent the wheel. All you have to
do is connect it to your chosen interface. Perl can easily translate
between the two.

CC
 
Reply With Quote
 
nolo contendere
Guest
Posts: n/a
 
      04-14-2008
On Apr 14, 12:27*pm, Lawrence Statton <yankeeinex...@gmail.com> wrote:
> nolo contendere <simon.c...@fmr.com> writes:
>
> > Sorry, my question should be: Why would you choose to use any of the
> > 'row' methods over the 'all' methods if the default were being used?

>
> Because DBI is providing an abstraction layer -- it is a "quirk" of
> the DBD::mysql driver that the row methods have the same footprint as
> the "all" methods.


Ok, didn't know if there was some non-obvious benefit for the mysql
driver to implement it that way...
 
Reply With Quote
 
xhoster@gmail.com
Guest
Posts: n/a
 
      04-14-2008
nolo contendere <> wrote:
> On Apr 14, 10:04=A0am, nolo contendere <simon.c...@fmr.com> wrote:
> > On Apr 13, 7:26=A0am, "Peter J. Holzer" <hjp-usen...@hjp.at> wrote:

....
> > > Wrong. DBD::mysql by default uses mysql_store_result, so the
> > > $sth->execute will (try to) load the entire table into memory. You
> > > need to tell DBD::mysql to use mysql_use_result instead:

> >
> > > =A0 =A0 $dbh->{mysql_use_result} =3D 1;

> >
> > > But then you cannot issue other queries until you have finished
> > > reading the results ...

> >
> > This is weird. So what would be the difference between any of the
> > fetch/select<row> methods and the fetch/select<all> methods for mysql
> > if the default 'mysql_store_result' were used?

>
> Sorry, my question should be: Why would you choose to use any of the
> 'row' methods over the 'all' methods if the default were being used?


For one, because you are using DBI and trying to abstract away the
peculiarities of the particular DBDs, to the extent possible. The behavior
of DBD::mysql is certainly peculiar; you should only bow down to it as
a last resort.

For another, store_result is much more memory efficient, as it seems to
store all the data in a highly compact low level structure. Then fetchrow
parcels it out into memory-inefficient Perl structures one row at a time.
While fetchall stores the whole result set in Perl structures, taking
several times as much memory.

Xho

--
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.
 
Reply With Quote
 
xhoster@gmail.com
Guest
Posts: n/a
 
      04-14-2008
"Peter J. Holzer" <hjp-> wrote:
> On 2008-04-13 02:05, <> wrote:
> > "Ela" <> wrote:
> >> It's only 3GB... but I find when I run the current java program with
> >> the mysql, it uses up all the 2GB memory...

> >
> > What uses up all 2GB? MySQL or Java? If MySQL, then you have it
> > poorly configured. If Java, then you must be loading all the data into
> > memory at one time, either accidentally or on purpose.

>
> I don't know about Java, but in Perl this is easy to do accidentally.
>
> if you do something like
>
> my $sth = $dbh->prepare("select * from largetable");
> $sth->execute();
>
> while (my $r = $sth->fetchrow_hashref) {
> process_one_row($r);
> }
>
> You read one row at a time into memory. Right?
>
> Wrong. DBD::mysql by default uses mysql_store_result, so the
> $sth->execute will (try to) load the entire table into memory. You need
> to tell DBD::mysql to use mysql_use_result instead:
>
> $dbh->{mysql_use_result} = 1;
>
> But then you cannot issue other queries until you have finished reading
> the results ...


I wasn't aware of that last limitation, or at least had forgotten it. I
thought store_result was the default because mysql locks table during the
read and they wanted to make those locks last as briefly as possible, by
reading the all of the data up front. But I guess it is also there to work
around this other limitation you point out.


Xho

--
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.
 
Reply With Quote
 
nolo contendere
Guest
Posts: n/a
 
      04-14-2008
On Apr 14, 12:43*pm, xhos...@gmail.com wrote:
> nolo contendere <simon.c...@fmr.com> wrote:
> > On Apr 14, 10:04=A0am, nolo contendere <simon.c...@fmr.com> wrote:
> > > On Apr 13, 7:26=A0am, "Peter J. Holzer" <hjp-usen...@hjp.at> wrote:

> ...
> > > > Wrong. DBD::mysql by default uses mysql_store_result, so the
> > > > $sth->execute will (try to) load the entire table into memory. You
> > > > need to tell DBD::mysql to use mysql_use_result instead:

>
> > > > =A0 =A0 $dbh->{mysql_use_result} =3D 1;

>
> > > > But then you cannot issue other queries until you have finished
> > > > reading the results ...

>
> > > This is weird. So what would be the difference between any of the
> > > fetch/select<row> methods and the fetch/select<all> methods for mysql
> > > if the default 'mysql_store_result' were used?

>
> > Sorry, my question should be: Why would you choose to use any of the
> > 'row' methods over the 'all' methods if the default were being used?

>
> For one, because you are using DBI and trying to abstract away the
> peculiarities of the particular DBDs, to the extent possible. *The behavior
> of DBD::mysql is certainly peculiar; you should only bow down to it as
> a last resort.
>
> For another, store_result is much more memory efficient, as it seems to
> store all the data in a highly compact low level structure. *Then fetchrow
> parcels it out into memory-inefficient Perl structures one row at a time.
> While fetchall stores the whole result set in Perl structures, taking
> several times as much memory.
>


Ok, that makes sense--just seems like a 'gotcha' to keep in mind when
using DBI with mysql, as Peter pointed out earlier in this thread.
Thx.
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Design of a pipelined architecture/framework for handling large data sets nish Java 1 11-30-2006 07:19 PM
what is the reasonable (best?) Exception handling strategy? Petr Jakes Python 2 06-01-2006 12:49 PM
Best strategy for handling memory allocations Fr?d?ric Ledain C Programming 1 02-08-2005 02:45 PM
Python design strategy (was Python evolution: Unease) ajsiegel@optonline.net Python 1 01-04-2005 07:28 PM
Exception handling strategy Do ASP .Net 1 12-16-2003 04:43 AM



Advertisments
 



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57