Velocity Reviews

Velocity Reviews (http://www.velocityreviews.com/forums/index.php)
-   Java (http://www.velocityreviews.com/forums/f30-java.html)
-   -   Java heap and big database queries (http://www.velocityreviews.com/forums/t390758-java-heap-and-big-database-queries.html)

Patricia Shanahan 02-10-2007 05:59 PM

Java heap and big database queries
 
I'm using Java to query a MySQL database through com.mysql.jdbc.Driver.

My Java program calculates a relatively small summary of the query
result, and I'm sure my Java structure would fit in the default heap memory.

However, I run out of heap memory while running with -Xmx700m.

The program runs perfectly on a million line subset of the database, but
fails on the full database, over 88 million lines.

My query is:

String getData = "select count(*), "
+ "sum(net_amt), "
+ "exists(select * from stores where stores.store_id =
transactions.store_id) "
+ "from transactions group by txn_id;";

....

Statement st = con.createStatement();
ResultSet result = st.executeQuery(getData);

There are about 10,160,053 distinct values of txn_id.

Any ideas for what I may be doing wrong? Suggestions for how to obtain
and scan a large query result without Java heap problems?

Thanks,

Patricia


Chris Uppal 02-10-2007 07:22 PM

Re: Java heap and big database queries
 
Patricia Shanahan wrote:

> The program runs perfectly on a million line subset of the database, but
> fails on the full database, over 88 million lines.


See this page:

http://dev.mysql.com/doc/refman/5.0/...ion-notes.html

and scan down about 2/3 to a sub-section titled "ResultSet".

(Made my hair stand on end... There is no way, just /no way/, I am ever going
to choose MySQL for /anything/.)

-- chris




Patricia Shanahan 02-10-2007 07:44 PM

Re: Java heap and big database queries
 
Chris Uppal wrote:
> Patricia Shanahan wrote:
>
>> The program runs perfectly on a million line subset of the database, but
>> fails on the full database, over 88 million lines.

>
> See this page:
>
> http://dev.mysql.com/doc/refman/5.0/...ion-notes.html
>
> and scan down about 2/3 to a sub-section titled "ResultSet".


Thanks. That certainly matches the symptoms I'm seeing.

> (Made my hair stand on end... There is no way, just /no way/, I am ever going
> to choose MySQL for /anything/.)


I picked MySQL because I had worked on a project that used it, but
for far smaller tables, and the DB comparison web pages I found seemed
to indicate it was reasonable performance for large tables.

I'm very open to suggestions for better choices, especially after this
experience. I need free or cheap for academic and personal use. No need
for commercial use.

Patricia

Paul Tomblin 02-10-2007 07:52 PM

Re: Java heap and big database queries
 
In a previous article, Patricia Shanahan <pats@acm.org> said:
>I'm very open to suggestions for better choices, especially after this
>experience. I need free or cheap for academic and personal use. No need
>for commercial use.


Postgresql implemented more of the SQL standard (although MySQL has caught
up in the last couple of releases as long as you use InnoDB) and is really
free, even for commercial use


--
Paul Tomblin <ptomblin@xcski.com> http://blog.xcski.com/
Last I checked, it wasn't the power cord for the Clue Generator
that was sticking up your ass.
-- John Novak

Chris Smith 02-10-2007 08:31 PM

Re: Java heap and big database queries
 
Patricia Shanahan <pats@acm.org> wrote:
> I'm very open to suggestions for better choices, especially after this
> experience. I need free or cheap for academic and personal use. No need
> for commercial use.


I second the recommendation for PostgreSQL. I have used it extensively
from Java applications with large amounts of data, and have run into few
problems. I did run into a quirk in the 7.x release cycle, but I was
able to easily patch the JDBC driver, and the problem is completely
solved with the JDBC drivers for the 8.x versions.

PostgreSQL still does not implement all of JDBC perfectly. In
particular, PostgreSQL does fall back to pre-fetching more complex
result sets (e.g., scrollable result sets). A full list of limitations
is available at http://jdbc.postgresql.org/todo.html

That said, I am not aware of any low-cost database that does better than
PostgreSQL. It is far better than anything else I've tried.

--
Chris Smith

=?ISO-8859-1?Q?Arne_Vajh=F8j?= 02-10-2007 08:59 PM

Re: Java heap and big database queries
 
Patricia Shanahan wrote:
> I'm very open to suggestions for better choices, especially after this
> experience. I need free or cheap for academic and personal use. No need
> for commercial use.


MySQL, PostgreSQL and FireBird are the usual suspects for a
free database server.

Arne

David Segall 02-11-2007 04:52 AM

Re: Java heap and big database queries
 
Patricia Shanahan <pats@acm.org> wrote:

>Chris Uppal wrote:
>> Patricia Shanahan wrote:
>>
>>> The program runs perfectly on a million line subset of the database, but
>>> fails on the full database, over 88 million lines.

>>
>> See this page:
>>
>> http://dev.mysql.com/doc/refman/5.0/...ion-notes.html
>>
>> and scan down about 2/3 to a sub-section titled "ResultSet".

>
>Thanks. That certainly matches the symptoms I'm seeing.
>
>> (Made my hair stand on end... There is no way, just /no way/, I am ever going
>> to choose MySQL for /anything/.)

>
>I picked MySQL because I had worked on a project that used it, but
>for far smaller tables, and the DB comparison web pages I found seemed
>to indicate it was reasonable performance for large tables.
>
>I'm very open to suggestions for better choices, especially after this
>experience. I need free or cheap for academic and personal use. No need
>for commercial use.

I can't compare them but I have what I believe to be a complete list
of freely distributable "heavy duty" databases at
<http://database.profectus.com.au/>. Of these, MySql has the most
restrictive license and that includes IBM, Oracle and Microsoft.

Mike Schilling 02-11-2007 07:54 PM

Re: Java heap and big database queries
 

"Patricia Shanahan" <pats@acm.org> wrote in message
news:eql7a4$ivp$1@ihnp4.ucsd.edu...

> I'm very open to suggestions for better choices, especially after this
> experience. I need free or cheap for academic and personal use. No need
> for commercial use.


Piggybacking in Patricia's question, I'm curious if anyone has experience
with the newly open-sourced Ingres. (The commerical version, not University
Ingres.)



=?ISO-8859-1?Q?Arne_Vajh=F8j?= 02-12-2007 12:21 AM

Re: Java heap and big database queries
 
David Segall wrote:
> I can't compare them but I have what I believe to be a complete list
> of freely distributable "heavy duty" databases at
> <http://database.profectus.com.au/>. Of these, MySql has the most
> restrictive license and that includes IBM, Oracle and Microsoft.


The text for MySQL is rather imprecise.

You can use MySQL all you want with closed source.

You can use the MySQL connectors all you want with closed source.

As long as you do not distribute.

If you distribute your stuff and it is "linked" with MySQL
code then then you need to buy a commercial license or
make your stuff GPL (or at least FLOSS when it is MySQL).

How to define "linked" can be a bit fuzzy.

But it is not something special for MySQL - this is
how GPL works.

It is rather unusual to use GPL for a library - often
LGPL (or an Apache/BSD style license) is chosen.

The true GPL believers will not consider that a problem.

But there are a reason that Apache/BSD sometimes is called
business friendly open source license.

Arne






David Segall 02-12-2007 01:09 PM

Re: Java heap and big database queries
 
Arne Vajh°j <arne@vajhoej.dk> wrote:

>If you distribute your stuff and it is "linked" with MySQL
>code then then you need to buy a commercial license or
>make your stuff GPL (or at least FLOSS when it is MySQL).
>
>How to define "linked" can be a bit fuzzy.
>
>But it is not something special for MySQL - this is
>how GPL works.

I am not qualified to interpret the GPL license but MySQL AB's
interpretation contrasts with the thousands of commercial applications
that are "linked" to Linux and dozens of other GPLd programs. Since,
as far as I am aware, these problems don't arise with the other freely
distributable databases I list at <http://database.profectus.com.au>,
including those from Oracle and Microsoft, it seems foolish to choose
MySQL.


All times are GMT. The time now is 12:06 PM.

Powered by vBulletin®. Copyright ©2000 - 2014, vBulletin Solutions, Inc.
SEO by vBSEO ©2010, Crawlability, Inc.