Velocity Reviews - Computer Hardware Reviews

A suggestion on data structures.

 
 
pereges
 
      08-06-2008
Hi, I'm in a bit of dilemma here.

I want to use an adjacency matrix for building edge lists in my
project.

The thing is, an adjacency matrix has nv * nv elements, where nv is the
number of vertices. The advantage is that it makes things very fast in my
project: only one pass over the list of triangles as opposed to two. But
I'm a little skeptical about allocating memory for nv * nv elements; nv
can be as high as 10,000,000, and storing 10,000,000 * 10,000,000 elements
might be an extremely stupid idea that may not even be possible. A lot of
time may also be spent creating the matrix:

typedef int **matrix;

matrix create_matrix(size_t cols, size_t rows)
{
    size_t i;
    matrix m;

    m = malloc(rows * sizeof(int *));

    for (i = 0; i < rows; i++)
        m[i] = malloc(cols * sizeof(int));

    return m;
}

You still have to make nv passes here.

And then I want to do some initialization using memset, i.e. all values
must be zero initially. I don't know the internal workings of memset, so
once again I fear it will slow my program down even more. Is there an
easier way out of this?
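As an aside, the zeroing pass can be folded into the allocation: calloc is guaranteed to return already-zeroed storage, so no separate memset is needed. A minimal sketch (create_zeroed_row is a made-up name, not from the project):

```c
#include <stdlib.h>

/* Sketch: allocate one zeroed row with calloc instead of malloc + memset.
   calloc returns all-bits-zero storage, which for int is the value 0. */
static int *create_zeroed_row(size_t cols)
{
    return calloc(cols, sizeof(int));
}
```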
 
Ben Bacarisse
 
      08-06-2008
pereges <(E-Mail Removed)> writes:

> I want to use an adjacency matrix for building edge lists in my
> project.
> [...]


You would not do this, for sure. First, an adjacency matrix has only
0/1 bits in it, so there is no need for an array of ints. Secondly, such
matrices are almost always sparse. You need to read up on representations
for sparse boolean matrices. Initially, of course, this is not a C
question, but it will turn into one later!
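To illustrate the first point only (this does nothing about sparsity, and at nv = 1e7 even one bit per entry is about 12.5 TB): a packed bit-matrix stores one bit per entry instead of one int. The helper names below are hypothetical, not from any post in this thread:

```c
#include <limits.h>
#include <stdlib.h>

/* Sketch of a packed boolean matrix: one bit per (row, col) entry.
   calloc gives an all-zero (edge-free) matrix to start from. */
static unsigned char *bitmatrix_create(size_t n)
{
    return calloc((n * n + CHAR_BIT - 1) / CHAR_BIT, 1);
}

static void bitmatrix_set(unsigned char *m, size_t n, size_t r, size_t c)
{
    size_t i = r * n + c;
    m[i / CHAR_BIT] |= (unsigned char)(1u << (i % CHAR_BIT));
}

static int bitmatrix_get(const unsigned char *m, size_t n, size_t r, size_t c)
{
    size_t i = r * n + c;
    return (m[i / CHAR_BIT] >> (i % CHAR_BIT)) & 1;
}
```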

--
Ben.
 
Harold Aptroot
 
      08-06-2008
"pereges" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> I want to use an adjacency matrix for building edge lists in my
> project.
> [...]
> You still have to make nv passes here.


A rather useless suggestion in this case (it will eat all your RAM and
then some): since all rows are going to be the same size, you don't
actually need a jagged array (unless the algorithm somehow manages to
require it); you could just malloc(rows * cols * sizeof(int)). In this
situation you wouldn't need an int, though, and as others have pointed
out, a sparse set would be much better.
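For what it's worth, the single-allocation version might look like this (a sketch; create_matrix_flat and MAT_AT are made-up names, and at nv = 1e7 this still won't fit in memory):

```c
#include <stdlib.h>

/* Sketch: one contiguous allocation instead of rows + 1 separate ones,
   indexed manually as m[r * cols + c].  Using calloc also zeroes the
   whole block, so no separate memset pass is needed. */
static int *create_matrix_flat(size_t rows, size_t cols)
{
    return calloc(rows * cols, sizeof(int));
}

#define MAT_AT(m, cols, r, c) ((m)[(size_t)(r) * (cols) + (c)])
```

One allocation is also friendlier to the cache and to free() than nv separate row allocations.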

 
Bartc
 
      08-06-2008

"pereges" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> I want to use an adjacency matrix for building edge lists in my
> project.
> [...]


It's unlikely any triangle could be adjacent to up to 10 million others;
usually it's just up to 3, unless they are connected in unusual ways.

I suggest 3 ints per triangle, containing the indices of the other
triangles. When there are more than 3, use some overflow scheme (one of
the 3 points to an allocated list, perhaps).

To test whether two triangles A and B are adjacent, test B's list to see
whether A is in it.

You might also try a hash table: it has to be set up first with all the
adjacencies. Once set up, create a hash value from the indices or
addresses of triangles A and B, and look it up; if the pair is present in
the table, they are adjacent. (Each table entry contains A:B, but you
need many unused entries too.)
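A sketch of such a pair hash (hash_pair is a made-up name and the constants are common multiplicative-mixing ones; any decent mixer would do, since collisions just fall through to a short bucket search):

```c
#include <stdint.h>
#include <stddef.h>

/* Sketch: mix a pair of triangle indices (a, b) into a bucket index.
   The hash need not be unique -- the table resolves collisions, e.g.
   by chaining -- it only needs to spread pairs reasonably well. */
static size_t hash_pair(uint64_t a, uint64_t b, size_t nbuckets)
{
    uint64_t h = a * 0x9E3779B97F4A7C15ULL + b;
    h ^= h >> 33;
    h *= 0xFF51AFD7ED558CCDULL;
    h ^= h >> 33;
    return (size_t)(h % nbuckets);
}
```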

--
Bartc

 
soscpd
 
      08-07-2008
Hello Pereges, List

On Aug 6, 6:20 pm, pereges <(E-Mail Removed)> wrote:


I have worked with matrices of that size and bigger (snapshots of
magnetic spin over a surface), and databases really helped me a lot (they
also make the code easier to adapt to parallel computing). A simple
SQLite db can be really fast, and you will not waste precious time
writing 1k lines of code to make the application run 5% faster (I know...
over 10 days, 5% comes to 12 hours). You can also hack into the db engine
core and drop some useless stuff to make it run faster. A 1 TB database
isn't anything outstanding, anyway (ok... it is outstanding for an
SQLite-like db; my advice: don't do that. Get a client/server engine
instead, preferably with the server running on another machine).

Rafael
 
pereges
 
      08-07-2008
Yeah, I've tried looking into a hash table for edges. But the thing is,
every edge is represented by two vertex indices, and finding a hash
function that takes the two indices as input and produces a unique
address is quite difficult, especially with so many vertices.
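The hash doesn't actually need to produce a unique address. One sketch: canonicalize the pair so both orientations of an edge map to the same key, and let the table handle collisions (edge_key is a made-up name; the shift assumes vertex indices fit in 32 bits):

```c
#include <stdint.h>

/* Sketch: both half-edges (iv0, iv1) and (iv1, iv0) must map to the
   same table key, so order the pair before folding it into 64 bits.
   Collisions in the table are fine -- they only cost a bucket walk. */
static uint64_t edge_key(uint64_t iv0, uint64_t iv1)
{
    uint64_t lo = iv0 < iv1 ? iv0 : iv1;
    uint64_t hi = iv0 < iv1 ? iv1 : iv0;
    return (hi << 32) | lo;   /* assumes indices fit in 32 bits */
}
```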
 
Bartc
 
      08-07-2008
[Rapid detection of any of 10 million items touching another]

"pereges" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> yeah, I've tried looking into hash table for edges. But the thing is
> every edge is represented by two vertex indices and finding a has
> function that takes the two indices as input and results in a unique
> address is quite difficult especially with so many vertices.


Your idea of having a 100-million-million-element set or array is not
practical to do in a simple way; using sparse arrays or databases (as soscpd
suggested) is going to slow things down.

How fast do you want this test to be?

Depending on how your triangles are stored, you might as well just examine
them without building any table. (I take it each triangle is defined as 3
edge indices, and each edge by 2 vertex indices? Or is the triangle defined
also by 3 vertices?)

A hash function does not need to give a unique result; only reasonably well
spread. However even such a table is going to need a lot of memory:

How many entries will there be? I'm guessing 3 entries per triangle (A:B,
A:C, A:D), which will include duplicating some information (so there will
be a B:A in there somewhere). So about 30 million entries. Each entry is
2 ints (8 bytes); with some extra capacity in the table, this will be at
least 300 Mbytes.

My original idea was to store 3 ints per triangle: total cost about
120Mbytes. So the options include:

- Store no extra info; just analyze your existing data to see if A and B
  touch. Cost: 0 Mbytes.
- Store touching indices with each triangle. Cost: 120 Mbytes.
- Hash table. Cost: 300 Mbytes.
- Simple bit-matrix. Cost: 12.5 million million bytes.

Each option is a little faster than the one before, or would be if the
12,500 Gbytes were feasible.

--
Bartc

 
pereges
 
      08-07-2008
On Aug 7, 2:51 pm, "Bartc" <(E-Mail Removed)> wrote:

> Your idea of having a 100-million-million-element set or array is not
> practical to do in a simple way; using sparse arrays or databases (as soscpd
> suggested) is going to slow things down.
> [...]


Hi, I think it should be reasonably fast, because in my project there are
a few preprocessing steps to be carried out before the calculation
begins:

- Read the mesh file and allocate memory for triangles, vertices, etc.
- Traverse the lists to build the edge list.
- From the list of edges, determine the list of sharp edges and
corners in the object.

Step 2 takes a massive amount of time because of the two passes. If the
mesh is very dense, with a large number of vertices and triangles, the
user will have to wait a long time before any actual calculation begins.

The triangle is represented by 3 indices into the vertex array:

v0 v1 v2

v0 to v1 is edge 1
v1 to v2 is edge 2
v2 to v0 is edge 3.

This is my data structure for an edge:

typedef struct
{
    size_t triangle_index[2];
    size_t vertex_index[2];
} edge;

triangle_index stores the indices of the neighbouring triangles (exactly
2 in a manifold mesh) which share this edge. vertex_index stores the
vertex indices of the end points of the edge.

This is my program for building the edge list, and you can clearly see
the problem of the two passes:

void build_edge_list(void)
{
    size_t i, j;
    size_t iv0, iv1;
    node *ptr;
    edge *e;

    /* PASS 1 OVER THE LIST OF TRIANGLES */

    for (i = 0; i < mesh_ptr->ntri; i++)
    {
        for (j = 0; j < 3; j++)
        {
            iv0 = mesh_ptr->tri[i].v[j];
            iv1 = mesh_ptr->tri[i].v[(j + 1) % 3];

            if (iv0 < iv1)
            {
                add_to_list(&mesh_ptr->edge_list);
                mesh_ptr->edge_list->data = malloc(sizeof(edge));
                assert(mesh_ptr->edge_list->data != NULL);
                e = (edge *)mesh_ptr->edge_list->data;
                e->vertex_index[0] = iv0;
                e->vertex_index[1] = iv1;
                e->triangle_index[0] = i;
                e->triangle_index[1] = i;
            }
        }
    }

    /* PASS 2 OVER THE LIST OF TRIANGLES */

    for (i = 0; i < mesh_ptr->ntri; i++)
    {
        for (j = 0; j < 3; j++)
        {
            iv0 = mesh_ptr->tri[i].v[j];
            iv1 = mesh_ptr->tri[i].v[(j + 1) % 3];

            if (iv0 > iv1)
            {
                ptr = mesh_ptr->edge_list;
                while (ptr != NULL)
                {
                    e = (edge *)ptr->data;

                    if (e->vertex_index[0] == iv1 &&
                        e->vertex_index[1] == iv0 &&
                        e->triangle_index[0] == e->triangle_index[1])
                    {
                        e->triangle_index[1] = i;
                    }
                    ptr = ptr->next;
                }
            }
        }
    }
}


Basically, this algorithm traverses the list of triangles and visits
every edge. Edges that satisfy iv0 < iv1 are put into the edge list, with
vertex indices iv0, iv1 in that order. The corresponding opposite edge is
obviously iv1, iv0; let's call those two components half-edges. For this
edge, record the same triangle index in both entries.

In the second pass, we are only interested in edges where iv0 > iv1. We
then pass linearly through the edge list and compare the vertex indices
with each stored edge's vertex indices. If there is a match, we have
found the other half-edge, with opposite orientation, and we make the
necessary change to its triangle index.
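For what it's worth, the linear search in the second pass is what makes this quadratic; a single pass with a small chained hash table keyed on the canonical (smaller, larger) vertex pair avoids it. This is only a sketch: tri, edge_rec, and build_edges are hypothetical stand-ins, not the actual mesh structures from the post.

```c
#include <stdlib.h>
#include <stddef.h>

/* Hypothetical stand-ins for the poster's mesh structures. */
typedef struct { size_t v[3]; } tri;
typedef struct edge_rec {
    size_t vertex_index[2];    /* endpoints, smaller index first   */
    size_t triangle_index[2];  /* the (up to) two incident faces   */
    struct edge_rec *next;     /* hash-bucket chain                */
} edge_rec;

/* One pass: for each directed edge of each triangle, look the canonical
   pair up once.  The first sighting inserts the edge; the second one
   fills in triangle_index[1].  Returns the bucket array, or NULL. */
static edge_rec **build_edges(const tri *t, size_t ntri, size_t nbuckets)
{
    edge_rec **bucket = calloc(nbuckets, sizeof *bucket);
    if (!bucket) return NULL;
    for (size_t i = 0; i < ntri; i++) {
        for (size_t j = 0; j < 3; j++) {
            size_t a = t[i].v[j], b = t[i].v[(j + 1) % 3];
            size_t lo = a < b ? a : b, hi = a < b ? b : a;
            size_t h = (lo * 2654435761u ^ hi) % nbuckets;
            edge_rec *e;
            for (e = bucket[h]; e; e = e->next)
                if (e->vertex_index[0] == lo && e->vertex_index[1] == hi)
                    break;
            if (e) {
                e->triangle_index[1] = i;   /* second half-edge */
            } else {
                e = malloc(sizeof *e);
                if (!e) return NULL;        /* (leaks on failure; a sketch) */
                e->vertex_index[0] = lo;
                e->vertex_index[1] = hi;
                e->triangle_index[0] = e->triangle_index[1] = i;
                e->next = bucket[h];
                bucket[h] = e;
            }
        }
    }
    return bucket;
}
```

With a bucket count on the order of the expected edge count (about 1.5 per triangle for a closed mesh), each lookup is expected O(1), so the whole build is one O(ntri) pass.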

 
 
Bartc
 
      08-07-2008

"pereges" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...

> - Read mesh file and allocate memory for triangles, vertices etc.
> - Traverse the lists to build edge list.
> - From the list of edges, determine the list of sharp edges and
> corners in the object.


Just to clarify, your input includes all the vertices, and specifies all the
triangles in terms of those vertices?

So your real problem (the one you were considering the huge 1e7 x 1e7 array
for) is building the edge table (your struct edge) quickly?


--
Bartc

 