An algorithm questions

Mathieu Dutour · May 31, 2011

I want to be able to store some numbers (in an interval [1..N] with N small
about 200) in a set X and be able to do the following
---test if a number i is inside
---remove an element of X
---add an element to X.
---get one element from X if it is non-empty.
And I want the maximum speed.

Right now, I am using the following structure

typedef struct {
int MaxElement;
int *ListElementInSubset;
int *ListNext;
int *ListPrev;
} IntegerSubsetStorage;

That is the code reads as
---MaxElement is the number N
---ListElementInSubset has size N and is a 0/1 vector to answer to the
first question.
---ListNext and ListPrev are array of size N+2 that indicate what is the
next element in X and what is the previous element in X (this allows
to answer the last 3 questions).

All this works fine and I can answer all questions without any iterative
loop. But can it be done better?

Mathieu

Stefan Ram · May 31, 2011

I want to be able to store some numbers (in an interval [1..N] with N small
about 200) in a set X and be able to do the following
---test if a number i is inside

inline int contains( int const set[], int const i ){ return set[ i ]; }

---remove an element of X

inline void remove( int set[], int const i ){ set[ i ]= 0; }

---add an element to X.

inline void add( int set[], int const i ){ set[ i ]= 1; }

---get one element from X if it is non-empty.

int get( int set[], int const n )
{ int i = -1; set[ n ]= -1;
while( !set[ ++i ]); return set[ i ]; }

Roberto Waltman · May 31, 2011

I want to be able to store some numbers (in an interval [1..N] with N small
about 200) in a set X and be able to do the following
---test if a number i is inside
---remove an element of X
---add an element to X.
---get one element from X if it is non-empty.
And I want the maximum speed.

Right now, I am using the following structure

typedef struct {
int MaxElement;
int *ListElementInSubset;
int *ListNext;
int *ListPrev;
} IntegerSubsetStorage;

I would use a bitmap. 200 booleans could be stored in 7 32-bit
integers.
You will still need loops to find the first/next bit in a set, but the
loop bounds are small (7, 32) and the loops could be unrolled.

Gene · Jun 1, 2011

I want to be able to store some numbers (in an interval [1..N] with N small
about 200) in a set X and be able to do the following
---test if a number i is inside
---remove an element of X
---add an element to X.
---get one element from X if it is non-empty.
And I want the maximum speed.

Right now, I am using the following structure

typedef struct {
int MaxElement;
int *ListElementInSubset;
int *ListNext;
int *ListPrev;

} IntegerSubsetStorage;

That is the code reads as
---MaxElement is the number N
---ListElementInSubset has size N and is a 0/1 vector to answer to the
first question.
---ListNext and ListPrev are array of size N+2 that indicate what is the
next element in X and what is the previous element in X (this allows
to answer the last 3 questions).

All this works fine and I can answer all questions without any iterative
loop. But can it be done better?

Mathieu

The advice to use bitmaps for N=200 is good.
I'll point out however, that your method can be made simpler and
probably faster by implementing the linked list in the same array you
use to indicate membership.

in other words, use something like the following (UNTESTED) cod:

// Sets hold 1 .. SET_SIZE
#define SET_SIZE 200

typedef unsigned char INDEX; // big enough for N < 256

typedef struct {
INDEX next, prev;
} SET_NODE;

// Use 0'th element for list header...
typedef SET_NODE SET[SET_SIZE];

// Initialize each item to a self-circular list. There are other
// possibilities, but this has a nice symmetry, and 0 won't work.
void init(SET set)
{
INDEX i;
for (i = 0; i < SET_SIZE; i++)
set.next = set.prev = i;
}

// If item is not self-circular, it's in the set.
int is_member(SET set, int item)
{
return set[item].next != item;
}

// Join item's list to head's list.
void add(SET set, int item)
{
if (!is_member(set, item)) {
INDEX head = set[0].next;
set[item].prev = (INDEX)0;
set[item].next = head;
set[0].next = set[head].prev = (INDEX)item;
}
}

// Unlink item from head's list and restore it's self-loop.
void remove(SET set, int item)
{
if (is_member(set, item)) {
set[set[item].next].prev = set[item].prev;
set[set[item].prev].next = set[item].next;
set[item].prev = set[item].next = (INDEX)item;
}
}

// Returns 0 if there is no member, else the last member added.
int get_member(SET set)
{
return set[0].next;
}

Gene · Jun 4, 2011

I want to be able to store some numbers (in an interval [1..N] with N small
about 200) in a set X and be able to do the following
---test if a number i is inside
---remove an element of X
---add an element to X.
---get one element from X if it is non-empty.
And I want the maximum speed.

Click to expand...

Right now, I am using the following structure

Click to expand...

typedef struct {
int MaxElement;
int *ListElementInSubset;
int *ListNext;
int *ListPrev;
} IntegerSubsetStorage;

Click to expand...

That is the code reads as
---MaxElement is the number N
---ListElementInSubset has size N and is a 0/1 vector to answer to the
first question.
---ListNext and ListPrev are array of size N+2 that indicate what is the
next element in X and what is the previous element in X (this allows
to answer the last 3 questions).

Click to expand...

All this works fine and I can answer all questions without any iterative
loop. But can it be done better?

Click to expand...

Nice little problem. Here is a list of the different answers you got
plus a couple not mentioned.

Storage:
Mathieu Dutour -- Doubly linked list in two arrays + a {1,0} array
Gene -- Doubly linked list in one array
Stefan Ram -- Int array with {1,0} entries with linear search
China Blue Angels -- Bit vector, keep min and max
Roberto Waltman -- Bit vector, short search for get
Richard Harter -- Tiered bit vector (see below)
Richard Harter -- The devil's list algorithm (see below)
Richard Harter -- The packed set algorithm

One of the problems with asking for the "best" solution is that there
is no best solution for all environments. "Fast" depends upon sundry
considerations. Thus in the past century execution times were roughly
proportional to instructions. In this century cache misses are often
the dominant factor affecting execution times. In algorithm analysis
it is customary to characterize algorithms by their big O behaviour.
However this often is inappropriate for small n. Another factor is
that the cost of very simple functions is dominated by context
switching costs. Etc.

All of that said, it is useful to look at (a) the instruction cost,
and (b) the storage used. The storage costs are:

Storage use:
Devil's list algorithm: 3 words per item
Doubly linked list: 3 words per item (Dotour)
Doubly linked list: 2 words per item (Gene)

Just saying... The array implementation of the linked list can use
indexes sized for the set. Thus the items in my code only use half as
word (2 bytes) if N < 256, and of course 1 word (4 bytes) would
suffice for N < 2^16.

This is an old space-saving trick. Back in the day, gcc allocated
syntax tree nodes in arrays so "pointers" could be 2 bytes rather than
4. Doubt that's still true because it limited syntax trees to 64K
nodes.

Optimize a discovery algorithm	15	Jan 1, 2014
Algorithm	1	Dec 15, 2019
Program to find the largest integer element of an array.	1	Mar 2, 2022
Rock paper scissors in python with "algorithm"	1	Feb 27, 2022
Java MemoryLayout/ValueLayout Questions.	2	Feb 5, 2023
Accessing array index addresses with custom datatype in a function	0	Jun 2, 2022
A process take input from /proc/<pid>/fd/0, but won't process it	0	Oct 29, 2023
Help with an algorythm	5	Aug 30, 2024

An algorithm questions

Mathieu Dutour

Stefan Ram

Roberto Waltman

Gene

Gene

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads