The SETT Programming Contest: The fastest set<T> implementation

SETT Programming Contest · Jun 3, 2008

The SETT Programming Contest: The fastest set<T> implementation

Write the fastest set<T> implementation using only standard C++/C.
Ideally it should have the same interface like std::set.
At least the following methods must be implemented:
insert(), find(), begin(), end(), erase(), size(), operator<(),
and at least the forward iterator.

Here, speed and correctness are the 2 most important factors.
Functionally it should behave similar to std::set,
have practically no limitation on the number of items,
and be significantly faster, either always or under specified
conditions (for example for random items etc.).

The class should be placed in an own namespace.
Supply also a benchmark routine which compares your
implementation with that of std::set, and briefly document
and comment these results.

Hint: Since items in a set are always in an ordered sequence
(as determined by operator<()) you need to use some tree
routines (AVL, red-black, splay, etc.) or use your own tree
method in the heart of your implementation.

See also this paper:
Ben Pfaff, "Performance Analysis of BST in System Software", Stanford University
http://www.stanford.edu/~blp/papers/libavl.pdf

Publish your finished project somewhere on the Web
(for example at your own homepage or at http://sourceforge.net/ )
and announce it in comp.lang.c++ .

There is no deadline on this contest. The fastest method
will be the winner of the SETT Programming Contest,
until someone comes up with a faster method.

Good Luck!

SETT Programming Contest · Jun 3, 2008

Chris Thomasson said:
What's the prize?

The winner will get the title "Winner of the SETT Programming Contest".
Also a wiki page will be created and the winner presented there.
Sponsors from the industry, academic institutions, and individuals
can decide on their own to honor the winner with additional prizes.

Juha Nieminen · Jun 3, 2008

SETT said:
The SETT Programming Contest: The fastest set<T> implementation

Considerable speedups can be achieved by simply using the default
std::set with a custom memory allocator. I have done exactly that here:

http://warp.povusers.org/FSBAllocator/

I suppose that could be my entry.

Lars Uffmann · Jun 3, 2008

SETT said:
The winner will get the title "Winner of the SETT Programming Contest".
Also a wiki page will be created and the winner presented there.
Sponsors from the industry, academic institutions, and individuals
can decide on their own to honor the winner with additional prizes.

[x] Do your own homework

shablool · Jun 3, 2008

The SETT Programming Contest: The fastest set<T> implementation

Write the fastest set<T> implementation using only standard C++/C.
Ideally it should have the same interface like std::set.
At least the following methods must be implemented:
insert(), find(), begin(), end(), erase(), size(), operator<(),
and at least the forward iterator.

Here, speed and correctness are the 2 most important factors.
Functionally it should behave similar to std::set,
have practically no limitation on the number of items,
and be significantly faster, either always or under specified
conditions (for example for random items etc.).

The class should be placed in an own namespace.
Supply also a benchmark routine which compares your
implementation with that of std::set, and briefly document
and comment these results.

Hint: Since items in a set are always in an ordered sequence
(as determined by operator<()) you need to use some tree
routines (AVL, red-black, splay, etc.) or use your own tree
method in the heart of your implementation.

See also this paper:
Ben Pfaff, "Performance Analysis of BST in System Software", Stanford Universityhttp://www.stanford.edu/~blp/papers/libavl.pdf

Publish your finished project somewhere on the Web
(for example at your own homepage or athttp://sourceforge.net/)
and announce it in comp.lang.c++ .

There is no deadline on this contest. The fastest method
will be the winner of the SETT Programming Contest,
until someone comes up with a faster method.

Good Luck!

Interesting contest. However, the assumption that there should be no
limitation on the number of items implies that the underlying memory
management model will be similar to the one used by std::set. It
appears that a change to this memory model (particularly within modern
multi-threaded environments) may have a more dramatic affect on
performance benchmarks, then the selection of the underlying data
structure, as demonstrated by the String library
(see: http://strinx.sourceforge.net/docs/strinx.html).

ShacharS.

Keith Thompson · Jun 3, 2008

SETT Programming Contest said:
The SETT Programming Contest: The fastest set<T> implementation

Write the fastest set<T> implementation using only standard C++/C.
Ideally it should have the same interface like std::set.

[snip]

What is "C++/C"?

Since C doesn't have C++-style templates or namespaces, the problem as
stated cannot be solved in C (though you could probably do the
equivalent).

Why not just say "standard C++" and leave C out of it?

Christopher · Jun 3, 2008

The SETT Programming Contest: The fastest set<T> implementation

[snip drabble]

I am running a SCTMM contest: Send Chris the most money contest.
The contest has no deadline.
The person who sends the most money will be the winner.
Sponsors will decide on prizes.
I'll create a wiki page and be sure to thank you.
Send entries to:

SCTMM Contest
P.O box 50001
Austin TX 78735

SETT Programming Contest · Jun 3, 2008

Keith Thompson said:
SETT Programming Contest said:

The SETT Programming Contest: The fastest set<T> implementation

Write the fastest set<T> implementation using only standard C++/C.
Ideally it should have the same interface like std::set.

Click to expand...

[snip]

What is "C++/C"?

Since C doesn't have C++-style templates or namespaces, the problem as
stated cannot be solved in C (though you could probably do the
equivalent).

Why not just say "standard C++" and leave C out of it?

It means that you can also use pure C for the underlying methods,
ie. by using the 'extern "C"' interface mechanism of the C++ compiler.
Only the public interface must be in C++, the rest can also be
in portable standard C. For example if you already have some
working C code (for example the tree routines) for re-use here...

SETT Programming Contest · Jun 3, 2008

Richard Harter said:
I take it that you really mean C++ and not C.

It means that you can also use pure C for the underlying methods,
ie. by using the 'extern "C"' interface mechanism of the C++ compiler.
Only the public interface must be in C++, the rest can also be
in portable standard C. For example if you already have some
working C code (for example the tree routines) for re-use here...

Must entries use binary trees? What about T trees or radix trees?

You are free to use any tree mechanism you like. It's your choice.

Are we talking about in-memory methods here or
do we have to account for disk based methods too?

In-memory methods only; much like std::set .

James Dow Allen · Jun 4, 2008

In-memory methods only; much like std::set

Too bad the contest doesn't include memory needs as a
criterion; I've a method with the same compaction as
Cleary's Compact Tree, but faster -- though not as fast as
non-compact methods.

Obviously, a compact method will be faster than a "fast"
method, if the former fits in core and the latter doesn't.

James Dow Allen

James Kanze · Jun 4, 2008

The SETT Programming Contest: The fastest set<T> implementation

Write the fastest set<T> implementation using only standard
C++/C.

Fastest at what? On what machine? With which implementation of
C++?

[...]

See also this paper:
Ben Pfaff, "Performance Analysis of BST in System Software", Stanford Universityhttp://www.stanford.edu/~blp/papers/libavl.pdf

Which more or less establishes that "fastest" is meaningless
unless you define at what?

Bartc · Jun 4, 2008

SETT Programming Contest said:
The SETT Programming Contest: The fastest set<T> implementation

Write the fastest set<T> implementation using only standard C++/C.
Ideally it should have the same interface like std::set.
At least the following methods must be implemented:
insert(), find(), begin(), end(), erase(), size(), operator<(),
and at least the forward iterator.

I'm not going to enter this (since, until today, I'd never heard of set<T>
or BST).

But I'm interested in clearing up what the task is.

A set<T> is just an arbitrary collection of values (each unique) of type T?
Like a Pascal set? (OK a Pascal set is limited to scalars and has a pre-set
range).

And T can be anything at all, provided operator < is meaningful for it?

In this case it all sounds pretty straightforward, so why limit the contest
to C++ only? Is it to eliminate languages that already have very fast
implementations built-in?

Ben Pfaff · Jun 4, 2008

SETT Programming Contest said:
Write the fastest set<T> implementation using only standard C++/C.
Ideally it should have the same interface like std::set.
At least the following methods must be implemented:
insert(), find(), begin(), end(), erase(), size(), operator<(),
and at least the forward iterator.

Wouldn't it be easier to compare the implementations if you
required them to meet all the requirements placed on std::set by
the C++ standard?

See also this paper:
Ben Pfaff, "Performance Analysis of BST in System Software", Stanford University
http://www.stanford.edu/~blp/papers/libavl.pdf

How amusing.

Mike · Jun 5, 2008

The SETT Programming Contest: The fastest set<T> implementation

Write the fastest set<T> implementation using only standard C++/C.
Ideally it should have the same interface like std::set.
At least the following methods must be implemented:
insert(), find(), begin(), end(), erase(), size(), operator<(),
and at least the forward iterator.

Here, speed and correctness are the 2 most important factors.

Does this mean that a totally incorect but exceedingly fast implementation can still get a 50% pass mark?

shablool · Jun 5, 2008

Wouldn't it be easier to compare the implementations if you
required them to meet all the requirements placed on std::set by
the C++ standard?

How amusing.

As stated in previous post, this requirement alone assumes certain
memory model, which may affect the actual performance. Nevertheless, I
would like propose the following code fragment as possible candidate
for the actual test (after-all, we should start somewhere):

template <typename SetT>
void sett_contest(SetT& s, int n)
{
typedef typename SetT::value_type value_type;

int i;
std::vector<value_type> v(n);

for (i = 0; i < n; ++i)
{
v = value_type(i);
}

std::random_shuffle(v.begin(), v.end());
s.insert(v.begin(), v.end());

std::random_shuffle(v.begin(), v.end());
for (i = 0; i < n; ++i)
{
s.erase(s.find(v));
}
}

ShacharS.

Ben Pfaff · Jun 5, 2008

I would like propose the following code fragment as possible
candidate for the actual test (after-all, we should start
somewhere):
[...]

std::random_shuffle(v.begin(), v.end());
s.insert(v.begin(), v.end());

std::random_shuffle(v.begin(), v.end());
for (i = 0; i < n; ++i)
{
s.erase(s.find(v));
}

That fragment is going to strongly favor implementations that do
little or no balancing of the tree, because balancing is just a
waste of time when you know that insertions and deletions occur
in random order. Is that your intention?

I don't see any specifications on iterator invalidation in the
original post. Perhaps a hash table implementation would be
acceptable. If so, that's probably going to beat any tree-based
implementation, especially as the sets grow larger. I wonder
whether that is the intention, too.

shablool · Jun 5, 2008

shablool said:
shablool said:

"SETT Programming Contest" <[email protected]>
writes:
Write the fastest set<T> implementation using only standard C++/C.
Ideally it should have the same interface like std::set.
At least the following methods must be implemented:
insert(), find(), begin(), end(), erase(), size(), operator<(),
and at least the forward iterator.

Click to expand...

I would like propose the following code fragment as possible
candidate for the actual test (after-all, we should start
somewhere):
[...]

std::random_shuffle(v.begin(), v.end());
s.insert(v.begin(), v.end());

Click to expand...

std::random_shuffle(v.begin(), v.end());
for (i = 0; i < n; ++i)
{
s.erase(s.find(v));
}

Click to expand...

That fragment is going to strongly favor implementations that do
little or no balancing of the tree, because balancing is just a
waste of time when you know that insertions and deletions occur
in random order. Is that your intention?

I don't see any specifications on iterator invalidation in the
original post. Perhaps a hash table implementation would be
acceptable. If so, that's probably going to beat any tree-based
implementation, especially as the sets grow larger. I wonder
whether that is the intention, too.

The original post said nothing about the underlying data-structure
representation. In particular, it did not assume usage of self
balancing binary search trees. The above code fragment is merely a
proposal for the actual test, which implicitly defines the interface
requirements. If you think it is incompatible with the spirit of the
original post, why don't you propose an alternative one?

ShacharS.

Anders Dalvander · Jun 5, 2008

Write the fastest set<T> implementation using only standard C++/C.
Ideally it should have the same interface like std::set.
At least the following methods must be implemented:
insert(), find(), begin(), end(), erase(), size(), operator<(),
and at least the forward iterator.

Is that you Razii? What are you trying to prove?

Regards,
Anders

Ben Pfaff · Jun 5, 2008

shablool said:
The original post said nothing about the underlying data-structure
representation. In particular, it did not assume usage of self
balancing binary search trees.

What non-tree-like data structures fulfill the asymptotic
requirements for std::set?

The above code fragment is merely a proposal for the actual
test, which implicitly defines the interface requirements.

The interface requirements were defined by the OP.

If you think it is incompatible with the spirit of the original
post, why don't you propose an alternative one?

I already proposed an alternative for the interface requirements,
in a direct followup to the OP.

Cognizance presents INSOMNIA: The Midnight Programming Contest	0	Mar 26, 2010
C++ Contest Challenge	15	Mar 31, 2007
Astronomy image processing contest	0	Nov 19, 2010
The POTM - Programming COntest - Create A Crossword	0	Dec 18, 2004
Fastest ip address searching method/implementation	6	Nov 26, 2003
The container abstraction and parallel programming	38	Jan 6, 2012
A Sample auto_ptr implementation	13	Oct 13, 2008
[ANNOUNCE] OpenECG programming contest - new data set	0	Jul 16, 2003

The SETT Programming Contest: The fastest set<T> implementation

SETT Programming Contest

SETT Programming Contest

Juha Nieminen

Lars Uffmann

shablool

Keith Thompson

Christopher

SETT Programming Contest

SETT Programming Contest

James Dow Allen

James Kanze

Bartc

Ben Pfaff

Mike

shablool

Ben Pfaff

shablool

Anders Dalvander

Ben Pfaff

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads