Velocity Reviews > clustering in C

# clustering in C

querypk@gmail.com
Guest
Posts: n/a

 05-29-2005
Can someone help me with C programming here.
If I have
x is a 2D array of 0's and 1's
ex: x = array([1,1,1,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0],
[1,1,1,0,1,1,0,0,0,0,0,1,1,1,1,0,0,0],
[1,1,1,0,1,1,1,0,1,1,0,1,1,1,0,0,0,0],
[0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0])
what I want is a boundingbox over the region where we find clusters of
1's.So for instance in the above list first 3 roes and colums have 1's
so the area of that box is 3x3
so my final array should have an array of clusters of
1's like
coords = [ (x1,y1,x2,y2), ....]
where x1,y1 -> top left point
x2,y2 -> bottom right point of that rectangle.
Hope I am clear with my question.

[0, 0, 0, 0, 0,
+-----+
0,|1, 1,|0, 0,
| |
0,|0, 1,|0, 0,
+-----+
0, 0, 0, 0, 0]

something like above. The resultant array should have the area of the
each such box. SOmething like a flood fill algorithm.
I would like the algorithm to return all the rectangles .Not just the
minimum ones.

jburgy
Guest
Posts: n/a

 05-29-2005
http://www.velocityreviews.com/forums/(E-Mail Removed) wrote:
> Can someone help me with C programming here.

<snip>

Hi there "querypk",

Jan

Malcolm
Guest
Posts: n/a

 05-29-2005

<(E-Mail Removed)> wrote
> Can someone help me with C programming here.
> If I have
> x is a 2D array of 0's and 1's
> ex: x = array([1,1,1,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0],
> [1,1,1,0,1,1,0,0,0,0,0,1,1,1,1,0,0,0],
> [1,1,1,0,1,1,1,0,1,1,0,1,1,1,0,0,0,0],
> [0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0])
> what I want is a boundingbox over the region where we find clusters of
> 1's.So for instance in the above list first 3 roes and colums have 1's
> so the area of that box is 3x3
> so my final array should have an array of clusters of
> 1's like
> coords = [ (x1,y1,x2,y2), ....]
> where x1,y1 -> top left point
> x2,y2 -> bottom right point of that rectangle.
> Hope I am clear with my question.
>
> [0, 0, 0, 0, 0,
> +-----+
> 0,|1, 1,|0, 0,
> | |
> 0,|0, 1,|0, 0,
> +-----+
> 0, 0, 0, 0, 0]
>
> something like above. The resultant array should have the area of the
> each such box. SOmething like a flood fill algorithm.
> I would like the algorithm to return all the rectangles .Not just the
> minimum ones.
>

First thing to do is to allocate a temporary array of integers. Set the ones
to one and the zeroes to zero.

Then iterate over the array. When you hit a one, do a flood fill, filling
with 2, 3, 4 etc, like this

void floodfill(int * array, int x, int y, int width, int height, int fill)
{
if(x < 0 || x >= width)
return;
if(y < 0 || y >= height)
return;
if(array[y * width + x] == fill)
return;
if(array[y * width + x] == 0)
return;
array[y * width + x] = fill;
floodfill(array, x + 1, y, width, height, fill);
floodfill(array, x - 1, y, width, height, fill);
floodfill(array, x, y + 1, width, height, fill);
floodfill(array, x, y -1, width, height, fill);
}

The resulting array should be a list of zeroes, and clusters from 2 up. Also
keep a count of how many clusters you have found.

Now allocate an array of rectangle structures, and an array of flags. Set
all the falgs to zero.

Iterate over the array a second time. When you hit a non-zero, check the
flag. If it is clear, you have the top point of the cluster. If the array is
relatively small and time isn't critical, it is probably easiest to iterate
from the top left going down columns to find the left point, to iterate from
the bottom up to find the bottom point, and from the right going up columns
to find the right point.
(If you have a huge array with small clusters you can search the cluster
using the floodfill again, and store maximum and minimum points).

Then set the flag to mark the fact that you have found the rectnagle for
that cluster, and continue until you reach the end of the array.

Guest
Posts: n/a

 06-02-2005
hi jburgy,
that kind of helped I tied to modify the code so that instead of
clustering 1's it would cluster 0's just to better understand the
algorithm..But I guess I am making some mistake and it does'nt do the
revers of what it does now. Can you help here. basically I am trying to
understand. what should I change in the C code to make it work for 0

int hoshen_kopelman(int **matrix, int m, int n) {

uf_initialize(m * n / 2);

/* scan the matrix */

for (int i=0; i<m; i++)
for (int j=0; j<n; j++)
if (!matrix[i][j]) { // if occupied ...

int up = (i==0 ? 0 : matrix[i-1][j]); // look up
printf("%d\n",up);
int left = (j==0 ? 0 : matrix[i][j-1]); // look left

switch (!!up + !!left) {

case 0:
matrix[i][j] = uf_make_set(); // a new cluster
break;

case 1: // part of an existing cluster
matrix[i][j] = max(up,left); // whichever is nonzero is
labelled
break;

case 2: // this site binds two clusters
matrix[i][j] = uf_union(up, left);
break;
}

}

/* apply the relabeling to the matrix */

/* This is a little bit sneaky.. we create a mapping from the
canonical labels
determined by union/find into a new set of canonical labels, which
are
guaranteed to be sequential. */

int *new_labels = calloc(sizeof(int), n_labels); // allocate array,
initialized to zero

for (int i=0; i<m; i++)
for (int j=0; j<n; j++)
if (matrix[i][j]) {
int x = uf_find(matrix[i][j]);
if (new_labels[x] == 3) {
new_labels[0]++;
new_labels[x] = new_labels[0];
}
matrix[i][j] = new_labels[x];
}

int total_clusters = new_labels[0];

free(new_labels);
uf_done();

}
I canges the if(matrix[i][j]) to if(!matrix[i][j]) but it did not
update the other clusters.

http://splorg.org:8080/people/tobin/...nkopelman/hk.c

jburgy
Guest
Posts: n/a

 06-02-2005
(E-Mail Removed) wrote:
> hi jburgy,
> that kind of helped I tied to modify the code so that instead of
> clustering 1's it would cluster 0's just to better understand the
> algorithm..But I guess I am making some mistake and it does'nt do the
> revers of what it does now. Can you help here. basically I am trying to
> understand. what should I change in the C code to make it work for 0
>

<snip>

Did you honestly believe you could change a single statement in Tobin's
code and it would magically do what you want? If programming were that
simple, we probably wouldn't discuss it so much... If you looked
carefully, Tobin overwrites the 1's in 'matrix' with the appropriate
cluster label. That's a common trick in implementations of the
Hoshen-Kopelman algorithm. Unfortunately for you, it also means that
you can't simply switch 0 and 1. What you could do is preprocess your
matrix thusly:

for (i = 0; i < m; i++)
for (j = 0; j < n; j++)
matrix[i][j] = 1 - matrix[i][j];

before you feed it into the subroutine. Otherwise, you could take a
simpler implementation (sorry Tobin, but I find yours unnecessarily
complicated), sit down, have a beer, and only get up when you
completely understand what it does. Then, and only then, can you start
hacking it for your need. I'll look around home for a simpler
implementation.

Good luck,

Jan

Guest
Posts: n/a

 06-02-2005
Hi Jan,
I did preprocess my matrix replacing 1's with 0's. But the mistake i
did was I also modified the code for that '!' condition and some more
changes. That was the reason probably my resultant matrix was not
giving what I expected.But if the kept the code un modified and pre
processed the matrix alone it worked. Thanks for that. Yes as you said
I have to take more time to understand what exactly is going on.
Definitely a simpler implementation would really help.Thanks again for

Guest
Posts: n/a

 06-02-2005
One more question. If I pre process the matrix then what happens is we
are replacing 1's with 0's and otherway. So does Hoshen-Kopelman
algorithm always replaces the 1's ie as i read the occupied cell. What
if you want to replace un occupied cells as i discussed the previous
post. And what is the necessity for 'union and find'. Cant we just
scan through the matrix and look for the 0's or 1's and compare the
neighbors and replace the neighbors with the same label as the parent.

jburgy
Guest
Posts: n/a

 06-02-2005
(E-Mail Removed) wrote:
> One more question. If I pre process the matrix then what happens is we
> are replacing 1's with 0's and otherway. So does Hoshen-Kopelman
> algorithm always replaces the 1's ie as i read the occupied cell. What
> if you want to replace un occupied cells as i discussed the previous
> post. And what is the necessity for 'union and find'. Cant we just
> scan through the matrix and look for the 0's or 1's and compare the
> neighbors and replace the neighbors with the same label as the parent.

I reread you original question. Why did you want to modify Tobin's code
in the first place? It already labels clusters of 1's exactly like you
demand. It doesn't return a list of cluster coordinates because that's
not a well-defined question (consider the following situation):

000010000
000111000
001110000
000111100
000001000

what are the "corners" of this cluster of ones? In case I wasn't clear
enough in my first post, here's what the Hoshen-Kopelman algorithm does
(and if it's not what you're after, please accept my apologies)

110000000 110000000
100001110 100002220
000001110 -> 000002220
110000100 330000200
011000000 330000000

That's why it's called cluster labeling.

Yours,

Jan

Guest
Posts: n/a

 06-02-2005
I understand what you say. Ok let me explain clearly. I think
Hoshen-Kopelman algorithm is what I might need but ok consider this...
_ _ _ _ _ _ _
|2222| 1|3333 | 000010000
|2221| 1|1333 | <------ 000111000
|2211| 1|3333 | 001110000
- - - - - --------
111111111 000111100
_ _ _ _ _ _
|44444|1|555| 000001000
---------- ------
Considering that we are looking for un occupied cells (clustering 0's
rather than 1's)...
this is one cluster . and corners of cluster are in this case
{1,1,4,4}(topleft and bottom right corners of this rectangle)something
like a bounding box...
so that the resultant array will have
{ {1,1,4,4},{-,-,-,-} .........includes all rectangles. If there is a
cluster inside another cluster even that is considered as a seperate
rectangle...

jburgy
Guest
Posts: n/a

 06-02-2005
(E-Mail Removed) wrote:
> I understand what you say. Ok let me explain clearly. I think
> Hoshen-Kopelman algorithm is what I might need but ok consider this...
>

Dude, you need to work on your ascii art! Alright, this is what I
understood: you define the bounding box of an irregularly shaped
cluster as the smallest rectangle which inscribes it entirely, don't
you. For example

111111111 111111111
111111111 1+-----+1
111 11111 1|1 111|1
11 1 111 is bounded 1| 1 1|1
11 11 as follows 1| |1
1111 111 1|11 1|1
111111111 1+-----+1
111111111 111111111

Is that correct? Assuming it is, here's what you need to do:

1. make a backup copy of your matrix 'coz we're gonna overwrite it
2. swap its 0's and 1's
3. label the clusters of 1's (which now represent /empty/ cells)
4. allocate as many quadruples (for bbox coordinates) as you have
clusters, initialize them as follows (xmin=ymin=0, xmax=ncols,
ymax=nrows)
5. do a one-pass scan of the labeled matrix, updating the bbox
coordinates of the corresponding cluster each time you encounter one

That should be pretty simple. My initial guess was still right, step 3
does the "hard" work, 1 and 2 some pre-processing and 4 and 5 some
post-processing. I've gotta ask now: why exactly are you doing this?

Jan

P.S. By now, we are completely off-topic on comp.lang.c (I'm surprised
we haven't been flamed yet), you better follow-up on comp.programming.