Clean ways to count remove_if() removals?

R

Richard Herring

In message <[email protected]>, Victor Bazarov

[of std::remove_if]
IIRC, 'remove_if' only places the elements to be removed to the end of
the container.

No, it copies the elements not to be removed to the beginning. Nothing
is said about the end; it's not a partition or a partial sort.

The end is not guaranteed to contain exactly those elements that are to
be removed. On the contrary, it's likely to contain copies of some of
the elements that are being kept, and there may be no instances at all
of those to be removed.
 
J

James Kanze

The standard says it "should have constant complexity". That's a
recommendation, not a requirement.

Ah, yes. Should, instead of shall. (But without some
qualifier, I don't see the point of the text.)
One issue here is the requirement that list::splice have
constant complexity, which can't be done if size is also
constant complexity.

I know there had been a lot of discussion about this; that's why
I actually looked it up in my copy of C++03. There, for the one
the one case of splice where it is relevant, the complexity is
given as "Constant time if &x == this; otherwise, linear time."
That sounded to me as if the issue had been resolved, in favor
of requiring size() to have constant complexity.

Interestingly, the SGI documentation says the opposite: size()
is linear, and all of the splice() functions are of constant
complexity, always. I suspect (but I don't know) that the SGI
documentation actually reflects the original, pre-standard STL
more closely. Which again suggests that the committee
"resolved" the issue in favor of requiring size() to have
constant complexity, rather than splice.

The issue is further clouded by the fact that in the standard,
complexity doesn't refer to time, but to a number of specific
operations, and the standard neglects to specify which
operations are being considered in the case of size() or
splice(). Even in the case where splice (or size()) is O(n), I
would expect zero calls to the copy constructor, or anything
else which might be expensive. (If the current wording in the
standard is maintained, and splice is allowed to be O(n), it
might be nice to specify that no calls to the copy constructor
or destructor. Perhaps at the top of the subsection on splice,
since this is really the whole point of the function. One could
argue that the name of the function already says this, but given
the semantics of functions like std::remove, it's hard to argue
anything based on the name.)
 
J

James Kanze

Well, here is stlport4 std::list<>::size() (bundled with Sun
Studio 12):
size_type size() const {
size_type __result = distance(begin(), end());
return __result;
}

The STL port is hardly a reference.

In this case: the STL port is based on the SGI implemenation.
And the SGI implementation clearly offers a different set of
guarantees than the standard---in the SGI documentation, size()
is O(n), and list<>::splice() is required to be constant in all
cases. The standard allows list<>::splice() to be linear when
it is necessary to recalculate the size, but says that size()
should be constant. (Of course, as Pete has pointed out, in the
standard, "should" is not the same thing as "shall".)
 
J

James Kanze

Jerry said:
[ ... ]
The standard says it "should have constant complexity".
That's a recommendation, not a requirement. One issue here
is the requirement that list::splice have constant
complexity, which can't be done if size is also constant
complexity.
I was looking at that, but after looking carefully, I
believe size() can be made constant complexity while meeting
all the complexity requirements for splice().
I think you're right. I was relying on received lore, and
didn't check the standard. But I'm _sure_ there was supposed
to be a conflict here. <g>

The conflict may be in what people want. I'm not too sure about
the history, and all that, but it's possible that the standard
is hedging it---that it was (intentionally) written in a way
that would allow both possibilities. (Although the "should"
still sounds like a strong recommendation, if not a requirement,
whereas the complexity requirements for splice are clearly
written in a way to allow recomputing the size.)
 
J

James Kanze

* joseph cook:
It doesn't.
There are two possible in practice list implementations: one
where size is O(1), and one where splicing is O(1).
The standard allows both.

That depends on how you interpret the "should". Formally, yes,
it allows both. But the intent is clear: size() should be
constant, and splice has linear complexity in the cases where
this would be necessary to update size().
In order to make splicing O(1).

The SGI documentation requires splice() to be O(1), in all
cases, and says that size() is O(n). This is in direct
contradiction with the standard. The SGI documentation is, I
believe, pre-standard, which suggests that the standards
committee made an intentional change here, which is, in itself,
significant.
 
J

James Kanze

[ ... ]
The standard requires list<>::size() to have constant time,
which means that it must be cached.
Where is that requirement? The only thing I know of is the
infamouse "note A" attached to Table 65, which only says it:
"should have constant complexity."

Yes. That's the passage I was referring to. But as Pete has
reminded me, "should" doesn't mean much in standardese---to be a
requirement, it should say "shall". (At least, that's what I've
been told. I don't have access to ISO/IEC 2382 to verify it.
If someone does have access to it, I'd be curious to hear what
it says about "should" and "shall".)
 
A

Alf P. Steinbach

* Jerry Coffin:
[ ... ]
The standard says it "should have constant complexity". That's a
recommendation, not a requirement. One issue here is the requirement
that list::splice have constant complexity, which can't be done if size
is also constant complexity.

I was looking at that, but after looking carefully, I believe size()
can be made constant complexity while meeting all the complexity
requirements for splice().

There are three overloads of splice:
void splice(iterator position, list<T, Allocator> &x);
Inserts the entirety of x at before position. The cached size can be
updated in constant time:
cached_size += x.size();
x.cached_size = 0;

void splice(iterator position, list<T, Allocator>& x, iterator i);

This removes x and inserts it as this[position]. The sizes can be
updated in consant time:
++cached_size;
--x.cached_size;

void splice(iterator position, list<T, Allocator> &x, iterator first,
iterator last);

This is the (somewhat) tricky one: the requirement is:
Constant time if &x == this; otherwise, linear time

If &x==this, we're just moving elements around in the same container
-- so the container's size doesn't change. We can just modify the
pointers in constant time, and never have to touch the cached size at
all.


Well, the problem is making this last case O(1) when moving a sequence of
elements from from one list to another, while keeping size() as O(1).

Can't be done.

If we're moving elements from one container to another, linear time
is allowed. This allows walking the elements in the list from first
to last to count the elements.

Yeah, the standard's requirements are achievable, but the conflict between
size() and splice() still stands.

If there was no requirement for updating a cached count, all the
overloads of splice could always have constant complexity.

Yep, that's the conflict.

I'm leaning in the direction of having splice() as guaranteed constant time.

For without it a list doesn't support much more functionality than a vector
(disregarding possible C++0x relaxation of assignable requirement for elements).
After all, how often does one insert and/or delete via 2 or more iterators at
the same time in the same list? And when there's just one iterator then a
vector, used as a cursor gap array, offers O(1) insert/delete of elements at
iterator position (keeping the sequence for other elements), as well as O(1)
read/write random access, better than the list.

That is, I think the original STL had it right.

Provided I have it right that the original STL had O(1) splice.


Cheers & hth.,

- Alf
 
A

Alf P. Steinbach

* James Kanze:
That depends on how you interpret the "should".

That's simple. Where constant time is a requirement that table just says
"constant". But in cases where that might not be practically achievable in all
cases, it says "Note A", which says "should", i.e. it's not required.

Formally, yes,
it allows both. But the intent is clear: size() should be
constant, and splice has linear complexity in the cases where
this would be necessary to update size().



The SGI documentation requires splice() to be O(1), in all
cases, and says that size() is O(n). This is in direct
contradiction with the standard. The SGI documentation is, I
believe, pre-standard, which suggests that the standards
committee made an intentional change here, which is, in itself,
significant.

See my posting else-thread: I think SGI and original STL had it right, and the
standard has it wrong, with respect to usefulness/practicality. I used to have
Stepanov's STL spec. But I had a disk crash which selectively removed whole
directories and one time before a very similar thing made files with a certain
/name pattern/ unreadable (I guess there's something physically wrong, but
mostly it's the darned driver software and caching that messes it up, and then
confuses Windows, which goes on rampant kill! delete! spree).


- Alf
 
M

Martin Eisenberg

Pete said:
One reason for all this handwaving is a desire to not preclude
debugging versions of splice, which sometimes have to root
around in the underlying data in ways that can't be done in
constant time.

Would it raise any technical problem for a standard (be it C++ or
something newly drawn up) to give debug implementations of any
facility a waiver up front to do whatever they have to, leaving
reasonableness to QoI, rather than muddy descriptions of ordinary
cases?


Martin
 
J

Jorgen Grahn

[snip a linear-time implementation]
The STL port is hardly a reference.

But the implementation which comes with gcc does the same thing, so
you cannot ignore that possibility in practice.

IMO, the standard should specify the complexity of the STL, even if it
means leaving little freedom for different implementation techniques.
You *have* to know if it's O(1) or O(n) when you're designing your
code (unless your containers are small). But maybe it's too late for
that.

/Jorgen
 
M

Martin Eisenberg

Pete said:
For some folks, debugging implementations are "ordinary cases."

The standard doesn't require that implementations conform to it.
It can't. So for the standard to say "debugging implementations
need not conform to this standard" would be vacuous. That's an
issue between you and your vendor.

Thanks, that's logical. But... then it shouldn't leak to the surface
of the splice specification either, right?


Martin
 
J

Jerry Coffin

[ ... ]
I think you're right. I was relying on received lore, and didn't check
the standard. But I'm _sure_ there was supposed to be a conflict here. <g>

Well, as I pointed out (and Alf did as well), there is sort of a
conflict there: you can make either splice or size O(1), but not
both. The standard's wording allows the implementer to choose either
(or neither, if you get down to it, though I can't imagine why
anybody would do that).

It's a question of which is more important. At least to me, that
comes down to which you want to emphasize: the uniformity of
containers or the unique capability of the list container.

I think it's better to cache the size so size() become O(k). My
reasoning behind that probably qualifies as somewhat condescending
though: I think 95% of the time that people choose std::list, it's a
mistake. It's better to protect people from the worst effects of
their mistake than to optimize a capability they rarely use.
Aside from the issue of unequal allocators, which the current standard
dodges.

Well, yes. Despite the standard's encouragement to support unequal
allocators, I pretty much treat it as a dead issue. I'm not entirely
convinced that portable code will ever be able to count on its being
supported; whether it will someday, it certainly can't now.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,159
Messages
2,570,879
Members
47,417
Latest member
DarrenGaun

Latest Threads

Top