On 21/12/10 03:08, Joshua Maurice wrote:
Well, that could have made sense in 1990, but for a modern software
component in 2010 it is more important to have in mind things like:
(1) Robust software that fails gracefully without undefined behavior.
All errors must always produce the same result and must be
anticipated by the library.
(2) Speed is mainly a consequence of the algorithm. Do not risk a
crash to save a few instructions. Remember that an integer
comparison and a (not taken) conditional branch do not disturb
the pipeline and have a cost of 1/(length of the pipeline) cycles.
Do not try to save on software safety! (A minimal sketch of such a
check follows this list.)
(3) Offer simple interfaces with a lot of default values that can be
changed later. Avoid complexity, and use a small brain memory
footprint. Remember that you can buy a RAM extension but brain
memory extensions aren't available. KEEP IT SIMPLE.
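For illustration, here is a minimal sketch (mine, not code from either
library) of the kind of check being argued for: the per-access cost is
one integer comparison plus a conditional branch that is not taken in
the normal, in-range case.

    #include <cstddef>
    #include <stdexcept>

    // Hypothetical bounds-checked accessor. In the common case the
    // branch is not taken, so the extra cost is one compare and one
    // predictable, untaken branch per access.
    inline int checked_get(const int *data, std::size_t size, std::size_t i)
    {
        if (i >= size)
            throw std::out_of_range("index out of range");
        return data[i];
    }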
That's nice. However, in the real world, it's different. For example,
in my company, we sell a product that has to move gigabytes of data at
a time, apply various transformations and cleanup to it, and put it
elsewhere, on a regular, daily basis. The same product is also
used for sub-second response times for SQL queries against this data.
If we coded as you suggested, then we would not meet our performance
criteria. I've done some non-trivial benchmarks of some of our more
critical code paths, written in sane C++ and in sane Java 1.6 on the
Sun JVM, and the sane Java was about 4.5 times slower.
When I switched to using naked Java arrays instead of the container
library's ArrayList, it went down to only ~2.3x slower. And I'm talking
about really simple code, for which the compiler ought to be able to
prove that no polymorphism is going on, yet I still paid the penalty
for using the abstraction. This is fundamentally different from the C++ world
where the implementations will quite likely produce code with
std::vector that is as good as code you wrote by hand to implement a
dynamically resizing array. (One caveat: for types with trivial
construction and no trap values, vector can be slower because it
initializes all of its elements. Generally that's not a big deal.)
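To make that caveat concrete, here is a hedged sketch (my example, not
the benchmark code): constructing a vector of n elements value-initializes
all of them before you overwrite them with real data, while a hand-managed
buffer skips that first pass over the memory.

    #include <vector>
    #include <cstring>
    #include <cstddef>

    // Stand-in for whatever actually produces the data (hypothetical).
    void fill_from_source(char *dst, std::size_t n)
    {
        std::memset(dst, 'x', n);
    }

    void demo(std::size_t n)           // assumes n > 0
    {
        // std::vector<char> v(n) zero-initializes all n elements,
        // an extra pass over memory before the real data arrives.
        std::vector<char> v(n);
        fill_from_source(&v[0], n);

        // A raw buffer leaves the bytes uninitialized until
        // fill_from_source writes them.
        char *raw = new char[n];
        fill_from_source(raw, n);
        delete[] raw;
    }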
Yes, I hesitate to extrapolate too far in a language pissing match
between C++ and Java, but the fact remains that the additional "safety
features" you propose would easily degrade my particular product's
runtime by a factor of 2, and that would drive our customers to
our competitors.
You know how we got a significant performance boost? We changed the
layout of the data from row major to column major. The speedup comes
from the data access patterns and caching.
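A toy illustration of why the layout matters (my sketch, not the
product's code): if the hot loop walks one column at a time, column-major
storage makes those accesses a contiguous sweep, while row-major storage
turns them into large strides that the cache handles poorly.

    #include <vector>
    #include <cstddef>

    // Sum column c of an R x C matrix of doubles, stored two ways.

    // Row-major: element (r, c) lives at r * C + c, so scanning a column
    // jumps through memory with a stride of C elements.
    double sum_column_row_major(const std::vector<double>& m,
                                std::size_t R, std::size_t C, std::size_t c)
    {
        double s = 0.0;
        for (std::size_t r = 0; r < R; ++r)
            s += m[r * C + c];
        return s;
    }

    // Column-major: element (r, c) lives at c * R + r, so the same scan
    // is a contiguous walk that the cache and prefetcher handle well.
    double sum_column_col_major(const std::vector<double>& m,
                                std::size_t R, std::size_t /*C*/, std::size_t c)
    {
        double s = 0.0;
        for (std::size_t r = 0; r < R; ++r)
            s += m[c * R + r];
        return s;
    }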
Big-O is indeed a useful tool, but only when used correctly, and when
you understand its limitations. First, you must be accurately modeling
your problem, and for many years now, desktop and server machines
have not behaved like uniform-cost random-access machines, which
makes the usual Big-O analysis sketchy at best.
This is just not true. My library is as fast as optimized C++ STL
to within a few percent.
This is not surprising since it does essentially the same thing.
Those people do not use the STL either, because they always roll
their own. If your application is so CPU bound that you have to roll
your own list package, you can't afford a vector and you use
a simple array.
No. That's the beauty of the STL. It's almost certainly faster than
what you're going to roll by hand for the common use case. You
generally cannot write a faster dynamically resizing array than
std::vector.
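To make the comparison concrete, here is a sketch (mine, not benchmark
code from this thread) of what a hand-rolled growable array ends up
looking like; it reimplements exactly what std::vector already does,
geometric growth and all, so there is little room left to beat it.

    #include <vector>
    #include <cstring>
    #include <cstddef>

    // Hand-rolled dynamically resizing int array with geometric growth.
    // (Copy constructor/assignment omitted for brevity.)
    struct IntArray {
        int        *data;
        std::size_t size;
        std::size_t cap;

        IntArray() : data(0), size(0), cap(0) {}
        ~IntArray() { delete[] data; }

        void push_back(int v)
        {
            if (size == cap) {                  // grow geometrically
                std::size_t ncap = cap ? cap * 2 : 16;
                int *ndata = new int[ncap];
                std::memcpy(ndata, data, size * sizeof(int));
                delete[] data;
                data = ndata;
                cap  = ncap;
            }
            data[size++] = v;
        }
    };

    // The library version does essentially the same thing, with the
    // option of reserving up front to avoid reallocations entirely.
    void fill(std::vector<int>& v, int n)
    {
        v.reserve(n);
        for (int i = 0; i < n; ++i)
            v.push_back(i);
    }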
Your library is almost certainly measurably slower because you're
maintaining this indexed thingy to track iterator positions so you
can always do constant-time iterator difference. I suspect
you've done some micro benchmarks and completely ignored any
significant cache miss effects from your extra code.
Business projects are interested more in speed than in safety?
You prefer brittle software to software that runs at
essentially the same speed but performs more checks and is safer?
Interesting point of view.
It makes money. That's what a professional programmer is concerned
with at the end of the day. (Well, that and being moral, such as not
killing people, not lying, not stealing. I hope that's implicitly
understood, but just in case it isn't, I'll mention it explicitly.)
This discussion was about whether "mismatch" should receive information
about the length of its second argument and test whether there is a
buffer overflow when the second container is too short. In my opinion
3 more assembler instructions do not make ANY difference.
True, it *could* be that calculating the length between two iterators
is a lengthy operation for some containers. Then my library would
be slower for those containers. I do not design for the worst case;
I just let the worst case be the worst case and try to improve the
average case.
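For concreteness, here is a hedged sketch of the two options being
discussed (my code, not either poster's library; the first form later
became the four-iterator overload of std::mismatch in C++14): either
check against the end of the second range on every step, or pre-compute
its length, which std::distance does in O(1) only for random-access
iterators and in O(n) otherwise.

    #include <utility>
    #include <iterator>

    // Stop at last2 as well, so a too-short second range ends the scan
    // instead of running past its buffer. Extra cost per iteration: one
    // more comparison.
    template <class In1, class In2>
    std::pair<In1, In2> checked_mismatch(In1 first1, In1 last1,
                                         In2 first2, In2 last2)
    {
        while (first1 != last1 && first2 != last2 && *first1 == *first2) {
            ++first1;
            ++first2;
        }
        return std::pair<In1, In2>(first1, first2);
    }

    // Alternative: pre-compute the second range's length. Cheap for
    // random-access iterators; a full walk of the range for list- or
    // tree-style iterators.
    template <class In1, class In2>
    std::pair<In1, In2> checked_mismatch_by_length(In1 first1, In1 last1,
                                                   In2 first2, In2 last2)
    {
        typename std::iterator_traits<In2>::difference_type n =
            std::distance(first2, last2);
        while (first1 != last1 && n > 0 && *first1 == *first2) {
            ++first1;
            ++first2;
            --n;
        }
        return std::pair<In1, In2>(first1, first2);
    }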
It's more than 3 assembly instructions. You're talking about
keeping extra memory for this indexed lookup table to do
constant-time iterator difference, even for iterators into a binary
tree data structure. This is not just 3 assembly instructions. It means more
memory, a lot more instructions for iterator manipulation, insertion,
removal, everything. We're also talking about potentially significant
cache effects, because a simple access now has to touch another
allocated block, potentially far from the first.