M
Michael Angelo Ravera
I have a large vector i of integers and I need to identify repeated
values and runs (of length at least 3) in it. For instance,
i = 1 2 3 5 8 8 12 6 5 4
has a run of 3 (with increment 1), a singleton 5, a duplicated 8, a
singleton 12, and a run of 3 (with increment -1). (The decomposition
might not be unique, but that is a secondary concern.)
My approach has been to write a function to identify the first pattern
only, and to call this function repeatedly starting right after the
last identified pattern. In the above example, I would call the
function successively with
i[0],i[3],i[4],i[6], and i[7].
I am concerned about performance and would be grateful for insights
and suggestions.
inline void getMultIncr(int* i, int *mult, int *incr, int size)
{
// i contains a pointer to the (remainder of the) vector
// *mult is a pointer to the number of elements in the current pattern
// *incr is a pointer to the increment in the current pattern
// size contains the number of elements in the (remainder of the)
vector
int mark;
int k;
*mult = 1;
*incr = 0;
if (size == 1) return;
//try to find duplicated values
mark = i[0];
for (k=1; k < size; k++)
{
if (i[k] != mark) break;
(*mult)++;
}
if (*mult > 1 || size == 2) return;
// now look for possible runs
*incr = i[1] - i[0];
if (i[2] - i[1] != *incr) return;
*mult = 3;
for (k=3; k < size; k++)
{
if (i[k] - i[k-1] != *incr) break;
(*mult)++;
}
return;
}- Hide quoted text -
- Show quoted text -
If you are trying to compress
1 2 3 5 8 8 12 6 5 4
into 1, next two +1, 5, 8 twice, 12, 6, next 2 -1 so that it looks
something like:
1 2+1 5 2*8 12 6 2-1
and you only care about duplicated consecutive values, then you can do
the whole calculation in one pass with a difference and run length
counter
int idx_in;
int idx_out;
int diff;
int prev_diff;
int run_len;
int out_array [];
int in_array [];
int in_size; /* Argument */
for (out_array [0] = in_array [0], idx_in = 1, idx_out = 1,
prev_diff = impossible_value, run_len = 0;
idx_in < in_size; idx_in ++)
{
if (!(diff = in_array [idx_in] - in_array [idx_in - 1]) ||
(diff == prev_diff))
run_len ++;
else
{
if (run_len)
{
if (prev_diff)
{
out_array [idx_out ++] = previous_item_runs (run_len)
/* with or without +1 */
out_array [idx_out ++] = prev_diff;
}
else
out_array [idx_out ++] = previous_item_is_dup (run_len);
run_len = 0;
}
out_array [idx_out ++] = in)array [idx_in];
}
prev_diff = diff;
}
There are a couple of subtleties not handled, but it will be hard to
do much better.
The reason for requiring a run of three but only a duplicate of two is
that you probably need three cells to note a run, but only two to note
a duplicate.