Suggest improvements

Michael Angelo Ravera · Oct 14, 2010

I have a large vector i of integers and I need to identify repeated
values and runs (of length at least 3) in it. For instance,
i = 1 2 3 5 8 8 12 6 5 4
has a run of 3 (with increment 1), a singleton 5, a duplicated 8, a
singleton 12, and a run of 3 (with increment -1). (The decomposition
might not be unique, but that is a secondary concern.)

My approach has been to write a function to identify the first pattern
only, and to call this function repeatedly starting right after the
last identified pattern. In the above example, I would call the
function successively with
i[0],i[3],i[4],i[6], and i[7].

I am concerned about performance and would be grateful for insights
and suggestions.

inline void getMultIncr(int* i, int *mult, int *incr, int size)
{
// i contains a pointer to the (remainder of the) vector
// *mult is a pointer to the number of elements in the current pattern
// *incr is a pointer to the increment in the current pattern
// size contains the number of elements in the (remainder of the)
vector

int mark;
int k;

*mult = 1;
*incr = 0;

if (size == 1) return;

//try to find duplicated values
mark = i[0];
for (k=1; k < size; k++)
{
if (i[k] != mark) break;
(*mult)++;
}
if (*mult > 1 || size == 2) return;

// now look for possible runs
*incr = i[1] - i[0];
if (i[2] - i[1] != *incr) return;

*mult = 3;
for (k=3; k < size; k++)
{
if (i[k] - i[k-1] != *incr) break;
(*mult)++;
}
return;

}- Hide quoted text -

- Show quoted text -

If you are trying to compress
1 2 3 5 8 8 12 6 5 4
into 1, next two +1, 5, 8 twice, 12, 6, next 2 -1 so that it looks
something like:
1 2+1 5 2*8 12 6 2-1

and you only care about duplicated consecutive values, then you can do
the whole calculation in one pass with a difference and run length
counter

int idx_in;
int idx_out;
int diff;
int prev_diff;
int run_len;
int out_array [];
int in_array [];
int in_size; /* Argument */

for (out_array [0] = in_array [0], idx_in = 1, idx_out = 1,
prev_diff = impossible_value, run_len = 0;
idx_in < in_size; idx_in ++)
{
if (!(diff = in_array [idx_in] - in_array [idx_in - 1]) ||
(diff == prev_diff))
run_len ++;
else
{
if (run_len)
{
if (prev_diff)
{
out_array [idx_out ++] = previous_item_runs (run_len)
/* with or without +1 */
out_array [idx_out ++] = prev_diff;
}
else
out_array [idx_out ++] = previous_item_is_dup (run_len);
run_len = 0;
}
out_array [idx_out ++] = in)array [idx_in];
}
prev_diff = diff;
}

There are a couple of subtleties not handled, but it will be hard to
do much better.

The reason for requiring a run of three but only a duplicate of two is
that you probably need three cells to note a run, but only two to note
a duplicate.

Dann Corbit · Oct 14, 2010

If you are trying to compress
1 2 3 5 8 8 12 6 5 4
into 1, next two +1, 5, 8 twice, 12, 6, next 2 -1 so that it looks
something like:
1 2+1 5 2*8 12 6 2-1

and you only care about duplicated consecutive values, then you can do
the whole calculation in one pass with a difference and run length
counter

If he just wants to compress the data, there are more effective methods
than RLL.

Michael Angelo Ravera · Oct 15, 2010

If he just wants to compress the data, there are more effective methods
than RLL.

Oh, I agree, but, if you want to do compression for something like a
real-time video, this is the kind of problem that you are trying to
solve.

Tim Rentsch · Oct 17, 2010

Gus Gassmann said:
I have a large vector i of integers and I need to identify repeated
values and runs (of length at least 3) in it. For instance,
i = 1 2 3 5 8 8 12 6 5 4
has a run of 3 (with increment 1), a singleton 5, a duplicated 8, a
singleton 12, and a run of 3 (with increment -1). (The decomposition
might not be unique, but that is a secondary concern.)

My approach has been to write a function to identify the first pattern
only, and to call this function repeatedly starting right after the
last identified pattern. In the above example, I would call the
function successively with
i[0],i[3],i[4],i[6], and i[7].

I am concerned about performance and would be grateful for insights
and suggestions. [snip]

Two suggestions (given the later information that this is being
done to transmit an array as XML):

1. A better perspective might be to focus on what
are the endpoints of the relevant sub-array, and
leave the size of the sub-array as a secondary
(derived) quantity;

2. The larger problem (of transmitting the entire array)
is still small enough so that it might reasonably be
done as just a single function.

As the saying goes, I have a modest example here:

void
transmit_array_values( int values[], unsigned n ){
int delta;
unsigned i, j;

for( i = 0, j = 1; ALWAYS( j-i == 1 ) && j < n; j++ ){
delta = values[j] - values;
while( j < n && values[j] - values[j-1] == delta ) j++;

if( ALWAYS( j-i > 1 ) && delta == 0 ){
repetition( j-i, values );
i = j;

} else if( j-i > 2 ){
progression( j-i, values, delta );
i = j;

} else if( ALWAYS( j-i == 2 ) ){
singleton( values );
i = --j;

}
}

if( i < n ) singleton( values );
}

Notes --

The variable 'i' holds the start of the (next) relevant sub-array,
the variable 'j' holds one past the end (of the same sub-array);
this makes the size be 'j-i' for the non-singleton cases. Since
the singleton case sends only one value, but j-i == 2, we
decrement j to start the next sub-array at the next array element
(ie, right after the single value just transmitted).

The different kinds of subsequences -- duplicates, runs, single
values -- are handled by 'repetition()', 'progression()',
and 'singleton()', respectively. (Yes, clearly these function
names could be better; I thought these names would serve
their purpose well enough.)

The ALWAYS() macro yields the logical value of its argument
expression, but also requires the value to be true. Sort of
a combination of a truth value and an assertion -- in fact a
plausible definition could be

#define ALWAYS(e) (assert(e), 1)

I think it should be easy enough to look at the function
logic and construct a convincing argument that each of the
ALWAYS() conditions given above is in fact always true.
(Disclaimer: assuming no bad effects from integer overflow,
etc, which of course would mean all bets are off.)

At the end of the outer loop there might be one element
left over that hasn't yet been transmitted, which occurs
exactly when 'i < n'. Hence the last statement.

How to multiply two matrices of size in using inline assembly in C++	3	Mar 3, 2024
Could you fix my code please? (i get no output after inputing the number)	1	Oct 16, 2023
Fibonacci	0	May 13, 2023
Code working properly in VS code for every test case but assigned wrong when submitted why?	0	Aug 21, 2022
Difference between using "let" in a "for" loop	0	Jul 3, 2022
C program: memory leak/ segmentation fault/ memory limit exceeded	0	Nov 12, 2022
Minimum Total Difficulty	0	Nov 15, 2023
JavaFX tags not wrapping around	0	Sep 25, 2024

Suggest improvements

Michael Angelo Ravera

Dann Corbit

Michael Angelo Ravera

Tim Rentsch

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads