Velocity Reviews > Suggest improvements

# Suggest improvements

Michael Angelo Ravera
Guest
Posts: n/a

 10-14-2010
On Oct 14, 4:51*am, Gus Gassmann <(E-Mail Removed)>
wrote:
> I have a large vector i of integers and I need to identify repeated
> values and runs (of length at least 3) in it. For instance,
> i = 1 2 3 5 8 8 12 6 5 4
> has a run of 3 (with increment 1), a singleton 5, a duplicated 8, a
> singleton 12, and a run of 3 (with increment -1). (The decomposition
> might not be unique, but that is a secondary concern.)
>
> My approach has been to write a function to identify the first pattern
> only, and to call this function repeatedly starting right after the
> last identified pattern. In the above example, I would call the
> function successively with
> i[0],i[3],i[4],i[6], and i[7].
>
> I am concerned about performance and would be grateful for insights
> and suggestions.
>
> inline void getMultIncr(int* i, int *mult, int *incr, int size)
> {
> // i contains a pointer to the (remainder of the) vector
> // *mult is a pointer to the number of elements in the current pattern
> // *incr is a pointer to the increment in the current pattern
> // size contains the number of elements in the (remainder of the)
> vector
>
> * * * * int mark;
> * * * * int k;
>
> * * * * *mult = 1;
> * * * * *incr = 0;
>
> * * * * if (size == 1) return;
>
> //try to find duplicated values
> * * * * mark = i[0];
> * * * * for (k=1; k < size; k++)
> * * * * {
> * * * * * * * * if (i[k] != mark) break;
> * * * * * * * * (*mult)++;
> * * * * }
> * * * * if (*mult > 1 || size == 2) return;
>
> // now look for possible runs
> * * * * *incr = i[1] - i[0];
> * * * * if (i[2] - i[1] != *incr) return;
>
> * * * * *mult = 3;
> * * * * for (k=3; k < size; k++)
> * * * * {
> * * * * * * * * if (i[k] - i[k-1] != *incr) break;
> * * * * * * * * (*mult)++;
> * * * * }
> * * * * return;
>
>
>
> }- Hide quoted text -
>
> - Show quoted text -

If you are trying to compress
1 2 3 5 8 8 12 6 5 4
into 1, next two +1, 5, 8 twice, 12, 6, next 2 -1 so that it looks
something like:
1 2+1 5 2*8 12 6 2-1

and you only care about duplicated consecutive values, then you can do
the whole calculation in one pass with a difference and run length
counter

int idx_in;
int idx_out;
int diff;
int prev_diff;
int run_len;
int out_array [];
int in_array [];
int in_size; /* Argument */

for (out_array [0] = in_array [0], idx_in = 1, idx_out = 1,
prev_diff = impossible_value, run_len = 0;
idx_in < in_size; idx_in ++)
{
if (!(diff = in_array [idx_in] - in_array [idx_in - 1]) ||
(diff == prev_diff))
run_len ++;
else
{
if (run_len)
{
if (prev_diff)
{
out_array [idx_out ++] = previous_item_runs (run_len)
/* with or without +1 */
out_array [idx_out ++] = prev_diff;
}
else
out_array [idx_out ++] = previous_item_is_dup (run_len);
run_len = 0;
}
out_array [idx_out ++] = in)array [idx_in];
}
prev_diff = diff;
}

There are a couple of subtleties not handled, but it will be hard to
do much better.

The reason for requiring a run of three but only a duplicate of two is
that you probably need three cells to note a run, but only two to note
a duplicate.

Dann Corbit
Guest
Posts: n/a

 10-14-2010
In article <aeebbeec-b6aa-47d3-9e31-6257da2573e5
> If you are trying to compress
> 1 2 3 5 8 8 12 6 5 4
> into 1, next two +1, 5, 8 twice, 12, 6, next 2 -1 so that it looks
> something like:
> 1 2+1 5 2*8 12 6 2-1
>
> and you only care about duplicated consecutive values, then you can do
> the whole calculation in one pass with a difference and run length
> counter
>

If he just wants to compress the data, there are more effective methods
than RLL.

Michael Angelo Ravera
Guest
Posts: n/a

 10-15-2010
On Oct 14, 1:34*pm, Dann Corbit <(E-Mail Removed)> wrote:
> In article <aeebbeec-b6aa-47d3-9e31-6257da2573e5
>
> > If you are trying to compress
> > 1 2 3 5 8 8 12 6 5 4
> > into 1, next two +1, 5, 8 twice, 12, 6, next 2 -1 so that it looks
> > something like:
> > 1 2+1 5 2*8 12 6 2-1

>
> > and you only care about duplicated consecutive values, then you can do
> > the whole calculation in one pass with a difference and run length
> > counter

>
> If he just wants to compress the data, there are more effective methods
> than RLL.

Oh, I agree, but, if you want to do compression for something like a
real-time video, this is the kind of problem that you are trying to
solve.

Tim Rentsch
Guest
Posts: n/a

 10-17-2010
Gus Gassmann <(E-Mail Removed)> writes:

> I have a large vector i of integers and I need to identify repeated
> values and runs (of length at least 3) in it. For instance,
> i = 1 2 3 5 8 8 12 6 5 4
> has a run of 3 (with increment 1), a singleton 5, a duplicated 8, a
> singleton 12, and a run of 3 (with increment -1). (The decomposition
> might not be unique, but that is a secondary concern.)
>
> My approach has been to write a function to identify the first pattern
> only, and to call this function repeatedly starting right after the
> last identified pattern. In the above example, I would call the
> function successively with
> i[0],i[3],i[4],i[6], and i[7].
>
> I am concerned about performance and would be grateful for insights
> and suggestions. [snip]

Two suggestions (given the later information that this is being
done to transmit an array as XML):

1. A better perspective might be to focus on what
are the endpoints of the relevant sub-array, and
leave the size of the sub-array as a secondary
(derived) quantity;

2. The larger problem (of transmitting the entire array)
is still small enough so that it might reasonably be
done as just a single function.

As the saying goes, I have a modest example here:

void
transmit_array_values( int values[], unsigned n ){
int delta;
unsigned i, j;

for( i = 0, j = 1; ALWAYS( j-i == 1 ) && j < n; j++ ){
delta = values[j] - values[i];
while( j < n && values[j] - values[j-1] == delta ) j++;

if( ALWAYS( j-i > 1 ) && delta == 0 ){
repetition( j-i, values[i] );
i = j;

} else if( j-i > 2 ){
progression( j-i, values[i], delta );
i = j;

} else if( ALWAYS( j-i == 2 ) ){
singleton( values[i] );
i = --j;

}
}

if( i < n ) singleton( values[i] );
}

Notes --

The variable 'i' holds the start of the (next) relevant sub-array,
the variable 'j' holds one past the end (of the same sub-array);
this makes the size be 'j-i' for the non-singleton cases. Since
the singleton case sends only one value, but j-i == 2, we
decrement j to start the next sub-array at the next array element
(ie, right after the single value just transmitted).

The different kinds of subsequences -- duplicates, runs, single
values -- are handled by 'repetition()', 'progression()',
and 'singleton()', respectively. (Yes, clearly these function
names could be better; I thought these names would serve
their purpose well enough.)

The ALWAYS() macro yields the logical value of its argument
expression, but also requires the value to be true. Sort of
a combination of a truth value and an assertion -- in fact a
plausible definition could be

#define ALWAYS(e) (assert(e), 1)

I think it should be easy enough to look at the function
logic and construct a convincing argument that each of the
ALWAYS() conditions given above is in fact always true.
(Disclaimer: assuming no bad effects from integer overflow,
etc, which of course would mean all bets are off.)

At the end of the outer loop there might be one element
left over that hasn't yet been transmitted, which occurs
exactly when 'i < n'. Hence the last statement.