The SETT Programming Contest: The fastest set<T> implementation

  • Thread starter SETT Programming Contest
  • Start date
S

SETT Programming Contest

The SETT Programming Contest: The fastest set<T> implementation

Write the fastest set<T> implementation using only standard C++/C.
Ideally it should have the same interface like std::set.
At least the following methods must be implemented:
insert(), find(), begin(), end(), erase(), size(), operator<(),
and at least the forward iterator.

Here, speed and correctness are the 2 most important factors.
Functionally it should behave similar to std::set,
have practically no limitation on the number of items,
and be significantly faster, either always or under specified
conditions (for example for random items etc.).

The class should be placed in an own namespace.
Supply also a benchmark routine which compares your
implementation with that of std::set, and briefly document
and comment these results.

Hint: Since items in a set are always in an ordered sequence
(as determined by operator<()) you need to use some tree
routines (AVL, red-black, splay, etc.) or use your own tree
method in the heart of your implementation.

See also this paper:
Ben Pfaff, "Performance Analysis of BST in System Software", Stanford University
http://www.stanford.edu/~blp/papers/libavl.pdf

Publish your finished project somewhere on the Web
(for example at your own homepage or at http://sourceforge.net/ )
and announce it in comp.lang.c++ .

There is no deadline on this contest. The fastest method
will be the winner of the SETT Programming Contest,
until someone comes up with a faster method.

Good Luck!
 
S

SETT Programming Contest

Chris Thomasson said:
What's the prize?

The winner will get the title "Winner of the SETT Programming Contest".
Also a wiki page will be created and the winner presented there.
Sponsors from the industry, academic institutions, and individuals
can decide on their own to honor the winner with additional prizes.
 
L

Lars Uffmann

SETT said:
The winner will get the title "Winner of the SETT Programming Contest".
Also a wiki page will be created and the winner presented there.
Sponsors from the industry, academic institutions, and individuals
can decide on their own to honor the winner with additional prizes.

[x] Do your own homework
 
S

shablool

The SETT Programming Contest: The fastest set<T> implementation

Write the fastest set<T> implementation using only standard C++/C.
Ideally it should have the same interface like std::set.
At least the following methods must be implemented:
insert(), find(), begin(), end(), erase(), size(), operator<(),
and at least the forward iterator.

Here, speed and correctness are the 2 most important factors.
Functionally it should behave similar to std::set,
have practically no limitation on the number of items,
and be significantly faster, either always or under specified
conditions (for example for random items etc.).

The class should be placed in an own namespace.
Supply also a benchmark routine which compares your
implementation with that of std::set, and briefly document
and comment these results.

Hint: Since items in a set are always in an ordered sequence
(as determined by operator<()) you need to use some tree
routines (AVL, red-black, splay, etc.) or use your own tree
method in the heart of your implementation.

See also this paper:
Ben Pfaff, "Performance Analysis of BST in System Software", Stanford Universityhttp://www.stanford.edu/~blp/papers/libavl.pdf

Publish your finished project somewhere on the Web
(for example at your own homepage or athttp://sourceforge.net/)
and announce it in comp.lang.c++ .

There is no deadline on this contest. The fastest method
will be the winner of the SETT Programming Contest,
until someone comes up with a faster method.

Good Luck!

Interesting contest. However, the assumption that there should be no
limitation on the number of items implies that the underlying memory
management model will be similar to the one used by std::set. It
appears that a change to this memory model (particularly within modern
multi-threaded environments) may have a more dramatic affect on
performance benchmarks, then the selection of the underlying data
structure, as demonstrated by the String library
(see: http://strinx.sourceforge.net/docs/strinx.html).

ShacharS.
 
K

Keith Thompson

SETT Programming Contest said:
The SETT Programming Contest: The fastest set<T> implementation

Write the fastest set<T> implementation using only standard C++/C.
Ideally it should have the same interface like std::set.
[snip]

What is "C++/C"?

Since C doesn't have C++-style templates or namespaces, the problem as
stated cannot be solved in C (though you could probably do the
equivalent).

Why not just say "standard C++" and leave C out of it?
 
C

Christopher

The SETT Programming Contest: The fastest set<T> implementation
[snip drabble]

I am running a SCTMM contest: Send Chris the most money contest.
The contest has no deadline.
The person who sends the most money will be the winner.
Sponsors will decide on prizes.
I'll create a wiki page and be sure to thank you.
Send entries to:

SCTMM Contest
P.O box 50001
Austin TX 78735
 
S

SETT Programming Contest

Keith Thompson said:
SETT Programming Contest said:
The SETT Programming Contest: The fastest set<T> implementation

Write the fastest set<T> implementation using only standard C++/C.
Ideally it should have the same interface like std::set.
[snip]

What is "C++/C"?

Since C doesn't have C++-style templates or namespaces, the problem as
stated cannot be solved in C (though you could probably do the
equivalent).

Why not just say "standard C++" and leave C out of it?

It means that you can also use pure C for the underlying methods,
ie. by using the 'extern "C"' interface mechanism of the C++ compiler.
Only the public interface must be in C++, the rest can also be
in portable standard C. For example if you already have some
working C code (for example the tree routines) for re-use here...
 
S

SETT Programming Contest

Richard Harter said:
I take it that you really mean C++ and not C.

It means that you can also use pure C for the underlying methods,
ie. by using the 'extern "C"' interface mechanism of the C++ compiler.
Only the public interface must be in C++, the rest can also be
in portable standard C. For example if you already have some
working C code (for example the tree routines) for re-use here...
Must entries use binary trees? What about T trees or radix trees?

You are free to use any tree mechanism you like. It's your choice.
Are we talking about in-memory methods here or
do we have to account for disk based methods too?

In-memory methods only; much like std::set .
 
J

James Dow Allen

In-memory methods only; much like std::set

Too bad the contest doesn't include memory needs as a
criterion; I've a method with the same compaction as
Cleary's Compact Tree, but faster -- though not as fast as
non-compact methods.

Obviously, a compact method will be faster than a "fast"
method, if the former fits in core and the latter doesn't.

James Dow Allen
 
J

James Kanze

The SETT Programming Contest: The fastest set<T> implementation
Write the fastest set<T> implementation using only standard
C++/C.

Fastest at what? On what machine? With which implementation of
C++?

[...]
See also this paper:
Ben Pfaff, "Performance Analysis of BST in System Software", Stanford Universityhttp://www.stanford.edu/~blp/papers/libavl.pdf

Which more or less establishes that "fastest" is meaningless
unless you define at what?
 
B

Bartc

SETT Programming Contest said:
The SETT Programming Contest: The fastest set<T> implementation

Write the fastest set<T> implementation using only standard C++/C.
Ideally it should have the same interface like std::set.
At least the following methods must be implemented:
insert(), find(), begin(), end(), erase(), size(), operator<(),
and at least the forward iterator.

I'm not going to enter this (since, until today, I'd never heard of set<T>
or BST).

But I'm interested in clearing up what the task is.

A set<T> is just an arbitrary collection of values (each unique) of type T?
Like a Pascal set? (OK a Pascal set is limited to scalars and has a pre-set
range).

And T can be anything at all, provided operator < is meaningful for it?

In this case it all sounds pretty straightforward, so why limit the contest
to C++ only? Is it to eliminate languages that already have very fast
implementations built-in?
 
B

Ben Pfaff

SETT Programming Contest said:
Write the fastest set<T> implementation using only standard C++/C.
Ideally it should have the same interface like std::set.
At least the following methods must be implemented:
insert(), find(), begin(), end(), erase(), size(), operator<(),
and at least the forward iterator.

Wouldn't it be easier to compare the implementations if you
required them to meet all the requirements placed on std::set by
the C++ standard?
See also this paper:
Ben Pfaff, "Performance Analysis of BST in System Software", Stanford University
http://www.stanford.edu/~blp/papers/libavl.pdf

How amusing.
 
M

Mike

The SETT Programming Contest: The fastest set<T> implementation

Write the fastest set<T> implementation using only standard C++/C.
Ideally it should have the same interface like std::set.
At least the following methods must be implemented:
insert(), find(), begin(), end(), erase(), size(), operator<(),
and at least the forward iterator.

Here, speed and correctness are the 2 most important factors.

Does this mean that a totally incorect but exceedingly fast implementation can still get a 50% pass mark?
 
S

shablool

Wouldn't it be easier to compare the implementations if you
required them to meet all the requirements placed on std::set by
the C++ standard?


How amusing.


As stated in previous post, this requirement alone assumes certain
memory model, which may affect the actual performance. Nevertheless, I
would like propose the following code fragment as possible candidate
for the actual test (after-all, we should start somewhere):


template <typename SetT>
void sett_contest(SetT& s, int n)
{
typedef typename SetT::value_type value_type;

int i;
std::vector<value_type> v(n);

for (i = 0; i < n; ++i)
{
v = value_type(i);
}

std::random_shuffle(v.begin(), v.end());
s.insert(v.begin(), v.end());

std::random_shuffle(v.begin(), v.end());
for (i = 0; i < n; ++i)
{
s.erase(s.find(v));
}
}

ShacharS.
 
B

Ben Pfaff

I would like propose the following code fragment as possible
candidate for the actual test (after-all, we should start
somewhere):
[...]

std::random_shuffle(v.begin(), v.end());
s.insert(v.begin(), v.end());

std::random_shuffle(v.begin(), v.end());
for (i = 0; i < n; ++i)
{
s.erase(s.find(v));
}


That fragment is going to strongly favor implementations that do
little or no balancing of the tree, because balancing is just a
waste of time when you know that insertions and deletions occur
in random order. Is that your intention?

I don't see any specifications on iterator invalidation in the
original post. Perhaps a hash table implementation would be
acceptable. If so, that's probably going to beat any tree-based
implementation, especially as the sets grow larger. I wonder
whether that is the intention, too.
 
S

shablool

shablool said:
"SETT Programming Contest" <[email protected]>
writes:
Write the fastest set<T> implementation using only standard C++/C.
Ideally it should have the same interface like std::set.
At least the following methods must be implemented:
insert(), find(), begin(), end(), erase(), size(), operator<(),
and at least the forward iterator.
I would like propose the following code fragment as possible
candidate for the actual test (after-all, we should start
somewhere):
[...]

std::random_shuffle(v.begin(), v.end());
s.insert(v.begin(), v.end());
std::random_shuffle(v.begin(), v.end());
for (i = 0; i < n; ++i)
{
s.erase(s.find(v));
}


That fragment is going to strongly favor implementations that do
little or no balancing of the tree, because balancing is just a
waste of time when you know that insertions and deletions occur
in random order. Is that your intention?

I don't see any specifications on iterator invalidation in the
original post. Perhaps a hash table implementation would be
acceptable. If so, that's probably going to beat any tree-based
implementation, especially as the sets grow larger. I wonder
whether that is the intention, too.


The original post said nothing about the underlying data-structure
representation. In particular, it did not assume usage of self
balancing binary search trees. The above code fragment is merely a
proposal for the actual test, which implicitly defines the interface
requirements. If you think it is incompatible with the spirit of the
original post, why don't you propose an alternative one?

ShacharS.
 
A

Anders Dalvander

Write the fastest set<T> implementation using only standard C++/C.
Ideally it should have the same interface like std::set.
At least the following methods must be implemented:
insert(), find(), begin(), end(), erase(), size(), operator<(),
and at least the forward iterator.

Is that you Razii? What are you trying to prove?

Regards,
Anders
 
B

Ben Pfaff

shablool said:
The original post said nothing about the underlying data-structure
representation. In particular, it did not assume usage of self
balancing binary search trees.

What non-tree-like data structures fulfill the asymptotic
requirements for std::set?
The above code fragment is merely a proposal for the actual
test, which implicitly defines the interface requirements.

The interface requirements were defined by the OP.
If you think it is incompatible with the spirit of the original
post, why don't you propose an alternative one?

I already proposed an alternative for the interface requirements,
in a direct followup to the OP.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,962
Messages
2,570,134
Members
46,690
Latest member
MacGyver

Latest Threads

Top