Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Java > Java heap and big database queries

Reply
Thread Tools

Java heap and big database queries

 
 
Chris Uppal
Guest
Posts: n/a
 
      02-15-2007
David Segall wrote:

> I have not seen _any_ argument for preferring MySQL over the other
> databases I listed at http://database.profectus.com.au. Why risk a
> legal battle when you have several alternatives that do not attempt to
> restrict how you use the database?


I have my own opinions on the legal matters, but I think there is -- or at
least may be -- a technical argument for MySQL (in this case).

MySQL is, as far as I can tell, alone amongst the heavyweight databases in that
it can run without full ACID support (and may even defaults to doing so!).
Normally I would say that was a huge argument /against/ MySQL, but for the
particular application under discussion (Patricia's), I think that might be an
advantage. It ought to allow MySQL to run faster, perhaps significantly so.

Whether there is any real advantage, I have no idea. But I have assumed (maybe
wrongly) for a long time that the frequently heard justification: "I use MySQL
'cos it's lightning fast", is (if it's not pure myth) a consequence of running
without a the safety net that sensible DBMSs provide.

-- chris



 
Reply With Quote
 
 
 
 
Patricia Shanahan
Guest
Posts: n/a
 
      02-15-2007
Chris Uppal wrote:
> David Segall wrote:
>
>> I have not seen _any_ argument for preferring MySQL over the other
>> databases I listed at http://database.profectus.com.au. Why risk a
>> legal battle when you have several alternatives that do not attempt to
>> restrict how you use the database?

>
> I have my own opinions on the legal matters, but I think there is -- or at
> least may be -- a technical argument for MySQL (in this case).
>
> MySQL is, as far as I can tell, alone amongst the heavyweight databases in that
> it can run without full ACID support (and may even defaults to doing so!).
> Normally I would say that was a huge argument /against/ MySQL, but for the
> particular application under discussion (Patricia's), I think that might be an
> advantage. It ought to allow MySQL to run faster, perhaps significantly so.
>
> Whether there is any real advantage, I have no idea. But I have assumed (maybe
> wrongly) for a long time that the frequently heard justification: "I use MySQL
> 'cos it's lightning fast", is (if it's not pure myth) a consequence of running
> without a the safety net that sensible DBMSs provide.


Avoiding ACID overhead was one of my thoughts in picking MySQL originally.

However, the non-streaming of query results was very discouraging. It's
not just the immediate problem. It is the implication that the way I
want to work is not something MySQL is tuned for.

My Ph.D. adviser has also come down in favor of PostgreSQL, so I would
need a really strong reason to go the other way.

I had a few days delay while I did some other work, but now I'm in the
process of reconstructing the database. I'll soon find out how well it
works in practice.

Patricia
 
Reply With Quote
 
 
 
 
Mike Schilling
Guest
Posts: n/a
 
      02-15-2007

"Patricia Shanahan" <(E-Mail Removed)> wrote in message
news:eM5Bh.2552$(E-Mail Removed) nk.net...
>
> Avoiding ACID overhead was one of my thoughts in picking MySQL originally.
>
> However, the non-streaming of query results was very discouraging.


And the alternative being to transmit one row at a time isn't promising
either; for a large dataset, network overhead will dwarf any other
processing you might do.

> It's not just the immediate problem. It is the implication that the way I
> want to work is not something MySQL is tuned for.


Or that MySQL isn't tuned, period.


 
Reply With Quote
 
aloha.kakuikanu
Guest
Posts: n/a
 
      02-15-2007
On Feb 10, 9:59 am, Patricia Shanahan <(E-Mail Removed)> wrote:
> I'm using Java to query a MySQL database through com.mysql.jdbc.Driver.
>
> My Java program calculates a relatively small summary of the query
> result, and I'm sure my Java structure would fit in the default heap memory.
>
> However, I run out of heap memory while running with -Xmx700m.
>
> The program runs perfectly on a million line subset of the database, but
> fails on the full database, over 88 million lines.
>
> My query is:
>
> String getData = "select count(*), "
> + "sum(net_amt), "
> + "exists(select * from stores where stores.store_id =
> transactions.store_id) "
> + "from transactions group by txn_id;";
>
> ...
>
> Statement st = con.createStatement();
> ResultSet result = st.executeQuery(getData);
>
> There are about 10,160,053 distinct values of txn_id.
>
> Any ideas for what I may be doing wrong? Suggestions for how to obtain
> and scan a large query result without Java heap problems?


It can be argued that it is entirely possible to design your client
application is such a way that it never is required to fetch huge
result set. After all query results are presented to the end user, who
is capable digesting only rather limited amount of data. In your
example, why do you need to do partial aggregation and fetch
10,160,053 values, especially that you projected away txn_id column?



 
Reply With Quote
 
Patricia Shanahan
Guest
Posts: n/a
 
      02-16-2007
aloha.kakuikanu wrote:
....
> It can be argued that it is entirely possible to design your client
> application is such a way that it never is required to fetch huge
> result set. After all query results are presented to the end user, who
> is capable digesting only rather limited amount of data. In your
> example, why do you need to do partial aggregation and fetch
> 10,160,053 values, especially that you projected away txn_id column?

....

I'm doing data mining, not transaction processing. I am the end user.

If I were more skilled at SQL than Java, I might do more of the
processing in SQL, and have smaller results sets. As it is, I've been
using SQL for a few days, and Java for years...

I know, for example, exactly how to generate histogram tables in Java,
in a form that can be read into a spreadsheet for display.

Patricia
 
Reply With Quote
 
Vadim Tropashko
Guest
Posts: n/a
 
      02-16-2007
On Feb 15, 4:10 pm, Patricia Shanahan <(E-Mail Removed)> wrote:
> I'm doing data mining, not transaction processing. I am the end user.


More power to SQL, then.

> If I were more skilled at SQL than Java, I might do more of the
> processing in SQL, and have smaller results sets. As it is, I've been
> using SQL for a few days, and Java for years...
>
> I know, for example, exactly how to generate histogram tables in Java,
> in a form that can be read into a spreadsheet for display.


Here is sample book chapter with section on histograms http://
vadimtropashko.wordpress.com/files/2007/02/ch3.pdf
The whole SQL patterns book is here
http://www.rampant-books.com/book_20...ing_styles.htm


 
Reply With Quote
 
=?ISO-8859-1?Q?Arne_Vajh=F8j?=
Guest
Posts: n/a
 
      02-16-2007
Chris Uppal wrote:
> MySQL is, as far as I can tell, alone amongst the heavyweight databases in that
> it can run without full ACID support (and may even defaults to doing so!).
> Normally I would say that was a huge argument /against/ MySQL, but for the
> particular application under discussion (Patricia's), I think that might be an
> advantage. It ought to allow MySQL to run faster, perhaps significantly so.
>
> Whether there is any real advantage, I have no idea. But I have assumed (maybe
> wrongly) for a long time that the frequently heard justification: "I use MySQL
> 'cos it's lightning fast", is (if it's not pure myth) a consequence of running
> without a the safety net that sensible DBMSs provide.


I think that is correct.

Arne
 
Reply With Quote
 
aloha.kakuikanu
Guest
Posts: n/a
 
      02-16-2007
On Feb 15, 4:10 pm, Patricia Shanahan <(E-Mail Removed)> wrote:
> If I were more skilled at SQL than Java, I might do more of the
> processing in SQL, and have smaller results sets. As it is, I've been
> using SQL for a few days, and Java for years...


What people usually when they find a challenging query, is they post
it to a proper forum. You write a short description of a problem, a
sample data, and the expected output. Unlike typical java help
request, each SQL query is like a tiny math puzzle. This is why there
would be plenty of people who are eager to take on your challenge.

Unfortunately, the big 3 (and open source) communities has grown
apart, so that there is no single the most appropriate spot where to
post generic SQL problems. For Oracle the best place would be OTN, for
SQLServer public.microsoft.sqlserver.programming. I don't know what is
the best for MySQL.

 
Reply With Quote
 
Mark Thornton
Guest
Posts: n/a
 
      02-16-2007
Mike Schilling wrote:
> "Patricia Shanahan" <(E-Mail Removed)> wrote in message
> news:eM5Bh.2552$(E-Mail Removed) nk.net...
>
>>Avoiding ACID overhead was one of my thoughts in picking MySQL originally.
>>
>>However, the non-streaming of query results was very discouraging.

>
>
> And the alternative being to transmit one row at a time isn't promising
> either; for a large dataset, network overhead will dwarf any other
> processing you might do.
>


Transmitting rows a batch at a time (say 100) avoids that problem and
also has only a modest heap impact.

Mark Thornton
 
Reply With Quote
 
Mike Schilling
Guest
Posts: n/a
 
      02-16-2007

"Mark Thornton" <(E-Mail Removed)> wrote in message
news:MJeBh.11577$(E-Mail Removed)...
> Mike Schilling wrote:
>> "Patricia Shanahan" <(E-Mail Removed)> wrote in message
>> news:eM5Bh.2552$(E-Mail Removed) nk.net...
>>
>>>Avoiding ACID overhead was one of my thoughts in picking MySQL
>>>originally.
>>>
>>>However, the non-streaming of query results was very discouraging.

>>
>>
>> And the alternative being to transmit one row at a time isn't promising
>> either; for a large dataset, network overhead will dwarf any other
>> processing you might do.
>>

>
> Transmitting rows a batch at a time (say 100) avoids that problem and also
> has only a modest heap impact.


Yes. That MySQL doesn't batch rows is a sign of its primitiveness.


 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
GIDS 2009 .Net:: Save Big, Win Big, Learn Big: Act Before Dec 29 2008 Shaguf ASP .Net 0 12-26-2008 09:29 AM
GIDS 2009 .Net:: Save Big, Win Big, Learn Big: Act Before Dec 29 2008 Shaguf ASP .Net Web Controls 0 12-26-2008 06:11 AM
GIDS 2009 Java:: Save Big, Win Big, Learn Big: Act Before Dec 29 2008 Shaguf Python 0 12-24-2008 07:35 AM
GIDS 2009 Java:: Save Big, Win Big, Learn Big: Act Before Dec 29 2008 Shaguf Ruby 0 12-24-2008 05:07 AM
xslt queries in xml to SQL queries Ian Roddis Python 3 02-26-2006 06:49 PM



Advertisments