An algorithm questions

M

Mathieu Dutour

I want to be able to store some numbers (in an interval [1..N] with N small
about 200) in a set X and be able to do the following
---test if a number i is inside
---remove an element of X
---add an element to X.
---get one element from X if it is non-empty.
And I want the maximum speed.

Right now, I am using the following structure

typedef struct {
int MaxElement;
int *ListElementInSubset;
int *ListNext;
int *ListPrev;
} IntegerSubsetStorage;

That is the code reads as
---MaxElement is the number N
---ListElementInSubset has size N and is a 0/1 vector to answer to the
first question.
---ListNext and ListPrev are array of size N+2 that indicate what is the
next element in X and what is the previous element in X (this allows
to answer the last 3 questions).

All this works fine and I can answer all questions without any iterative
loop. But can it be done better?

Mathieu
 
S

Stefan Ram

I want to be able to store some numbers (in an interval [1..N] with N small
about 200) in a set X and be able to do the following
---test if a number i is inside

inline int contains( int const set[], int const i ){ return set[ i ]; }
---remove an element of X

inline void remove( int set[], int const i ){ set[ i ]= 0; }
---add an element to X.

inline void add( int set[], int const i ){ set[ i ]= 1; }
---get one element from X if it is non-empty.

int get( int set[], int const n )
{ int i = -1; set[ n ]= -1;
while( !set[ ++i ]); return set[ i ]; }
 
R

Roberto Waltman

I want to be able to store some numbers (in an interval [1..N] with N small
about 200) in a set X and be able to do the following
---test if a number i is inside
---remove an element of X
---add an element to X.
---get one element from X if it is non-empty.
And I want the maximum speed.

Right now, I am using the following structure

typedef struct {
int MaxElement;
int *ListElementInSubset;
int *ListNext;
int *ListPrev;
} IntegerSubsetStorage;

I would use a bitmap. 200 booleans could be stored in 7 32-bit
integers.
You will still need loops to find the first/next bit in a set, but the
loop bounds are small (7, 32) and the loops could be unrolled.
 
G

Gene

I want to be able to store some numbers (in an interval [1..N] with N small
about 200) in a set X and be able to do the following
---test if a number i is inside
---remove an element of X
---add an element to X.
---get one element from X if it is non-empty.
And I want the maximum speed.

Right now, I am using the following structure

typedef struct {
  int MaxElement;
  int *ListElementInSubset;
  int *ListNext;
  int *ListPrev;

} IntegerSubsetStorage;

That is the code reads as
---MaxElement is the number N
---ListElementInSubset has size N and is a 0/1 vector to answer to the
   first question.
---ListNext and ListPrev are array of size N+2 that indicate what is the
   next element in X and what is the previous element in X (this allows
   to answer the last 3 questions).

All this works fine and I can answer all questions without any iterative
loop. But can it be done better?

  Mathieu

The advice to use bitmaps for N=200 is good.
I'll point out however, that your method can be made simpler and
probably faster by implementing the linked list in the same array you
use to indicate membership.

in other words, use something like the following (UNTESTED) cod:

// Sets hold 1 .. SET_SIZE
#define SET_SIZE 200

typedef unsigned char INDEX; // big enough for N < 256

typedef struct {
INDEX next, prev;
} SET_NODE;

// Use 0'th element for list header...
typedef SET_NODE SET[SET_SIZE];

// Initialize each item to a self-circular list. There are other
// possibilities, but this has a nice symmetry, and 0 won't work.
void init(SET set)
{
INDEX i;
for (i = 0; i < SET_SIZE; i++)
set.next = set.prev = i;
}

// If item is not self-circular, it's in the set.
int is_member(SET set, int item)
{
return set[item].next != item;
}

// Join item's list to head's list.
void add(SET set, int item)
{
if (!is_member(set, item)) {
INDEX head = set[0].next;
set[item].prev = (INDEX)0;
set[item].next = head;
set[0].next = set[head].prev = (INDEX)item;
}
}

// Unlink item from head's list and restore it's self-loop.
void remove(SET set, int item)
{
if (is_member(set, item)) {
set[set[item].next].prev = set[item].prev;
set[set[item].prev].next = set[item].next;
set[item].prev = set[item].next = (INDEX)item;
}
}

// Returns 0 if there is no member, else the last member added.
int get_member(SET set)
{
return set[0].next;
}
 
G

Gene

I want to be able to store some numbers (in an interval [1..N] with N small
about 200) in a set X and be able to do the following
---test if a number i is inside
---remove an element of X
---add an element to X.
---get one element from X if it is non-empty.
And I want the maximum speed.
Right now, I am using the following structure
typedef struct {
 int MaxElement;
 int *ListElementInSubset;
 int *ListNext;
 int *ListPrev;
} IntegerSubsetStorage;
That is the code reads as
---MaxElement is the number N
---ListElementInSubset has size N and is a 0/1 vector to answer to the
  first question.
---ListNext and ListPrev are array of size N+2 that indicate what is the
  next element in X and what is the previous element in X (this allows
  to answer the last 3 questions).
All this works fine and I can answer all questions without any iterative
loop. But can it be done better?

Nice little problem.  Here is a list of the different answers you got
plus a couple not mentioned.

Storage:
Mathieu Dutour     -- Doubly linked list in two arrays + a {1,0} array
Gene               -- Doubly linked list in one array
Stefan Ram         -- Int array with {1,0} entries with linear search
China Blue Angels  -- Bit vector, keep min and max
Roberto Waltman    -- Bit vector, short search for get
Richard Harter     -- Tiered bit vector (see below)
Richard Harter     -- The devil's list algorithm (see below)
Richard Harter     -- The packed set algorithm

One of the problems with asking for the "best" solution is that there
is no best solution for all environments.  "Fast" depends upon sundry
considerations.  Thus in the past century execution times were roughly
proportional to instructions.  In this century cache misses are often
the dominant factor affecting execution times.  In algorithm analysis
it is customary to characterize algorithms by their big O behaviour.
However this often is inappropriate for small n.  Another factor is
that the cost of very simple functions is dominated by context
switching costs.  Etc.

All of that said, it is useful to look at (a) the instruction cost,
and (b) the storage used.  The storage costs are:

Storage use:
Devil's list algorithm: 3 words per item
Doubly linked list:     3 words per item (Dotour)  
Doubly linked list:     2 words per item (Gene)

Just saying... The array implementation of the linked list can use
indexes sized for the set. Thus the items in my code only use half as
word (2 bytes) if N < 256, and of course 1 word (4 bytes) would
suffice for N < 2^16.

This is an old space-saving trick. Back in the day, gcc allocated
syntax tree nodes in arrays so "pointers" could be 2 bytes rather than
4. Doubt that's still true because it limited syntax trees to 64K
nodes.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,952
Messages
2,570,111
Members
46,692
Latest member
NewtonChri

Latest Threads

Top