Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Perl > Perl Misc > Data structure problem to solve Linux memory fault

Reply
Thread Tools

Data structure problem to solve Linux memory fault

 
 
dalyea@gmail.com
Guest
Posts: n/a
 
      05-10-2007
I am running into a memory fault error when processing the
contents of a database table into a lookup hash.

The problem is that I store item equivalencies (IE) in a db table IE
with primary key (a, b). a and b are integers and correspond
to productId values.

For IE products a and b, I insert as primary key (a, b) and, notably,
leave out (b, a). This obviously reduces the IE table size by 50%.

To build a single lookup data structure prior to writing to a lookup
file,
I try to create hash %ie which for each row has:

$ie{a}{b}=1;
$ie{b}{a}=1;

Then to get all the IE for a product a, one just looks at the slice
%ie{a} and gets all the values b1, b2, b3, ... for it.

The IE table has 1.4 million rows for 80,000 products, and sometimes
(b, a) is written [which is not a problem really]. When I create %ie,
it must
be upwards of 2.7 million rows, and hence, I am getting a memory fault
error.

The problem is, can I use a better (meaning smaller) data structure
to capture the IE data?

I could lookup each product's IE data as needed, but that would be
80,000+ single queries to the database. I don't even want to try
that.

Also, my algorithm for matching products as IE pretty much puts
all combinations of (a, b, c, d) in the table for products a b c d.
There are 4c2 or 6 combinations, not counting order, but up to 12
combinations if pairs such as (a, b) and (b, a) are entered. But it
is
possible that (a, b) and (b, c) exist, but not (a, c). Therefore, my
%ie
structure is technically incomplete as it is today. (Though I know
from the way I entered IE data that is probably 99% complete.)
The point is, a single query won't necessarily get all the IE data
for a single product. The whole data structure should be pre-loaded
all at once, as I'm trying to do.

Any ideas?

Thanks,
David

 
Reply With Quote
 
 
 
 
xhoster@gmail.com
Guest
Posts: n/a
 
      05-10-2007
"(E-Mail Removed)" <(E-Mail Removed)> wrote:
> I am running into a memory fault error when processing the
> contents of a database table into a lookup hash.
>
> The problem is that I store item equivalencies (IE) in a db table IE
> with primary key (a, b). a and b are integers and correspond
> to productId values.
>
> For IE products a and b, I insert as primary key (a, b) and, notably,
> leave out (b, a). This obviously reduces the IE table size by 50%.


Is that important? If your data is so large that you need to resort to
space saving tricks even when storing it in a disk-based database, then
trying to load the whole thing into memory in an uncompressed format
seems to be a losing proposition.

>
> To build a single lookup data structure prior to writing to a lookup
> file,


Why? Database systems are designed to efficiently handle such tasks.
If you need to take the data *out* of a database to make a lookup file,
then what is the point of having it in a database in the first place?

> I try to create hash %ie which for each row has:
>
> $ie{a}{b}=1;
> $ie{b}{a}=1;
>
> Then to get all the IE for a product a, one just looks at the slice
> %ie{a} and gets all the values b1, b2, b3, ... for it.
>
> The IE table has 1.4 million rows for 80,000 products, and sometimes
> (b, a) is written [which is not a problem really]. When I create %ie,
> it must
> be upwards of 2.7 million rows, and hence, I am getting a memory fault
> error.


2.7e6/80000 = almost 34 equivalences per item on average. Is that right?

>
> The problem is, can I use a better (meaning smaller) data structure
> to capture the IE data?


Yes, but perhaps you should try to implement that structure in the
database, rather than Perl's memory. Probably the best way would be to
define a canonical identifier to each equivalence class. For example, just
pick the alphabetically first member of each class to be the canonical id
of that class.

In Perl , you would have two top-level hashes. One would be a simple
single-level hash which would map each item ID to the canonical ID for the
class it belongs to, and would have 80,000 entries (the number of items).
The other would have one entry for each canonical item, and the value would
be a reference to either a hash or an array with all the other items in
that class. For example, if you have A1, A2, A3 and B1, B2, B3, with the
As being equivalent to each other and the Bs being equivalent to each
other, you would have something like:

my %equiv=(
'A3' => 'A1',
'B3' => 'B1',
'B1' => 'B1',
'A1' => 'A1',
'B2' => 'B1',
'A2' => 'A1',
);
my %class=(
'B1' => {
'B3' => undef,
'B1' => undef,
'B2' => undef,
},
'A1' => {
'A3' => undef,
'A1' => undef,
'A2' => undef,
},
);


> I could lookup each product's IE data as needed, but that would be
> 80,000+ single queries to the database. I don't even want to try
> that.


If you are going to read 1.4 million rows anyway, I don't see why issuing
80,000 queries would be out of the question.

>
> Also, my algorithm for matching products as IE pretty much puts
> all combinations of (a, b, c, d) in the table for products a b c d.


If you are motivated to save 50% by only storing one ordering of each pair,
it seems silly to use many times as much space to store each pairing
separately. You are straining at a gnat but swallowing a camel.

Xho

--
-------------------- http://NewsReader.Com/ --------------------
Usenet Newsgroup Service $9.95/Month 30GB
 
Reply With Quote
 
 
 
 
Dr.Ruud
Guest
Posts: n/a
 
      05-10-2007
http://www.velocityreviews.com/forums/(E-Mail Removed) schreef:

> To build a single lookup data structure prior to writing to a lookup
> file, I try to create hash %ie which for each row has:
>
> $ie{a}{b}=1;
> $ie{b}{a}=1;


No need to add both, no need to give it a real value. Just make sure
that a < b, and let the value be undefined.


> Then to get all the IE for a product a, one just looks at the slice
> %ie{a} and gets all the values b1, b2, b3, ... for it.


So you also need a hash with a > b. (I am assuming that a and b are
never equal.)


> The problem is, can I use a better (meaning smaller) data structure
> to capture the IE data?


Use a proper database, with both (a,b) and (b,a) as index, and guard at
insert/update that a <= b.

> The point is, a single query won't necessarily get all the IE data
> for a single product. The whole data structure should be pre-loaded
> all at once, as I'm trying to do.


SELECT t1.a AS a, t1.b AS b, t2.b AS c FROM ab AS t1 JOIN ab AS t2 ON
t1.b = t2.a WHERE ...

--
Affijn, Ruud

"Gewoon is een tijger."

 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Simple structure and copying data to pointer of the same structure A C++ 27 04-16-2011 11:07 PM
How to solve problem about "segmentation fault; core dump" in GCC? jaswinder C Programming 15 08-14-2010 12:37 AM
Please solve this(Structure Program) sdinu96 C Programming 1 05-22-2010 10:33 AM
Memory allocation in Structure to Structure pra_ramli@rediffmail.com C++ 2 03-09-2006 05:51 AM
Stack fault and page fault help S.Flournoy Computer Support 2 04-17-2004 04:23 PM



Advertisments