Comparing two vectors of bools

F

fgh.vbn.rty

I know that std::vector<bool> is specialized to store individual bools
as single bits. I have a question about the comparison operation on a
pair of vector<bool>s.

Is the comparison done bit by bit? That would be slow. In theory it's
possible to just do a bit-wise XOR on all bits and then OR it all
together and compare the single bit right? That would make it O(1)
instead of O(n). What's bad about this method?

Does the size of the vector matter here? For instance if the vectors
are <= 32bits they can fit into two registers and it's easy to do the
XOR. But if they are more than that then it's more complicated

What about comparison on std::bitset? Does compare work the same way
for it?

And what about boost::dynamic_bitset? Any difference for that?

Thanks.
 
R

Rafał

I know that std::vector<bool> is specialized to store individual bools
as single bits.

Actually this is considered a bit of "hack" and AFAIR it will be removed
soon (in C++0x?).
 
F

fgh.vbn.rty

No, it wouldn't be O(1). It's still O(n). This would be faster, of course,
but the complexity is still linear, since the number of XOR operations is
linearly proportional to the vector's size.

Hmm, if we have a 32-bit ALU then we could load both vectors into a
pair of registers and use the comparator to get the result in "one
cycle" right? In other words, we are doing all XORs in parallel
followed by one OR. Please correct me if I'm wrong here.

Of course, I have no idea if the compiler would actually do this.

Thanks.
 
J

Jerry Coffin

Actually this is considered a bit of "hack" and AFAIR it will be removed
soon (in C++0x?).

I doubt it. The current draft (N2691) still includes it, with minor
changes from C++ 03. It now explicitly recommends: "a space optimized
representation of bits...", whereas C++ 03 merely described the class as
being provided: "To optimize space allocation..."
 
J

James Kanze

[/QUOTE]

The library is free to implement the function however it wants.
From a QoI point of view, of course, it would be a very poor
library that does the comparison bit by bit.
Hmm, if we have a 32-bit ALU then we could load both vectors
into a pair of registers and use the comparator to get the
result in "one cycle" right? In other words, we are doing all
XORs in parallel followed by one OR. Please correct me if I'm
wrong here.

I'm not sure I understand. How can the code load both vectors
into a pair of registers, if, say, the vectors each contain
several thousand bits.

What I would expect is that the bits are actually kept in an
array of unsigned, and that the library compare this. Something
like:

return lhs.size() == rhs.size()
&& std::equal( lhs.buffer_start,
lhs.buffer_end,
rhs.buffer_start ) ;

(Some additional logic is necessary to ensure that a possible
partial word at the end doesn't compare unequal, even though all
of the significant bits are equal.)

The algorithm remains O(N), but:

-- is O(1) is the lengths are different, and
-- only actually loops N/32 (or whatever) times, rather than N
(but this doesn't change the complexity, of course).
Of course, I have no idea if the compiler would actually do
this.

Typicallyl, the compiler will do whatever the code in the
library implementation tells it to do. (It doesn't have to;
it's allowed to build in knowledge of std::vector<bool>, and use
that knowledge to generate special code. But I don't know of
any compiler which does this.)
 
J

Juha Nieminen

Hmm, if we have a 32-bit ALU then we could load both vectors into a
pair of registers and use the comparator to get the result in "one
cycle" right? In other words, we are doing all XORs in parallel
followed by one OR. Please correct me if I'm wrong here.

If the std::vector<bool> has a size of 1 million, how do you suppose
it can load all at once into the ALU?

Btw, what do you need the xors and ors for? If you are comparing for
equality or inequality, just compare the array elements (eg. if they are
for example size_t elements) for equality of inequality. If any bit
differs, the results will be unequal.
 
J

Juha Nieminen

Rafał said:
Actually this is considered a bit of "hack" and AFAIR it will be removed
soon (in C++0x?).

Exactly how is it a "hack" and why should it be removed? Has a better
alternative been suggested?

I certainly would like to have some kind of bool data container where
each bool indeed takes only one bit, to save memory.

(OTOH, I assume the dynamic bitvector in boost is more of the kind of
data container people like rather than std::vector<bool>? I would be
fine with that too, but will such a data container be included in the
next standard?)
 
B

Bo Persson

Juha said:
Exactly how is it a "hack" and why should it be removed? Has a
better alternative been suggested?

It's strange because it is the only container that does not store
elements of the contained type. It also uses a proxy class for
vector<bool>::reference, which is not allowed by the standard for any
other container.

The real hack is that the standard assumes that everyone benefits from
a space optimization for vector<bool> and speed optimizations for
vector<everything_else>. Why?


It hasn't been removed so far, and as its section has actually been
edited in the latest draft for the next standard, it is unlikely to go
away anytime soon. We have also seen that other containers, like
std::string and std::array, are allowed to somewhat deviate from the
general container requirements, so why not?

I certainly would like to have some kind of bool data container
where each bool indeed takes only one bit, to save memory.

Always, or sometimes? :)
(OTOH, I assume the dynamic bitvector in boost is more of the kind
of data container people like rather than std::vector<bool>? I
would be fine with that too, but will such a data container be
included in the next standard?)

Doesn't seem so (yet).


Bo Persson
 
F

fgh.vbn.rty

I'm not sure I understand.  How can the code load both vectors
into a pair of registers, if, say, the vectors each contain
several thousand bits.

I was only referring to the case when size <= 32. If it is more than
that then clearly, the N/32 factor that you have mentioned will come
in.

Thanks.
 
R

Rafał

Juha said:
Exactly how is it a "hack" and why should it be removed?

This container do not act like a container, it has other semantics.
For example, taking address of N-th element will not work.

#include <vector>

template <typename T> void Test() {
std::vector<T> V(3);
T& third = V.at(2);
}

int main() {
Test<int>();
Test<bool>();
}

Works for int,
but for bool:
x.cpp:5: error: invalid initialization of non-const reference of
type ‘bool&’ from a temporary of type ‘std::_Bit_reference
I certainly would like to have some kind of bool data container where
each bool indeed takes only one bit, to save memory.

Yes, but this should be separate object - because it does not behave like a
vector<>
 
J

James Kanze

It's strange because it is the only container that does not
store elements of the contained type.

But it does. At least as far as client code can tell. And
bitset is in a similar situation. It's strange because it's
but it doesn't conform to the said:
It also uses a proxy class for vector<bool>::reference, which
is not allowed by the standard for any other container.

Again: bitset.

The problem isn't that it's there. The problem is that it is
called std::vector< bool >, and not something else. (Or
alternatively, the problem is that the contract for std::vector
is too constraining. You can argue it both ways. But if you
take this second approach, you're going to have to change some
of the basic principles of the STL.)
The real hack is that the standard assumes that everyone
benefits from a space optimization for vector<bool> and speed
optimizations for vector<everything_else>. Why?

In many cases, the space optimization may also be a speed
optimization (because of reduced cache misses). The difference
is that this optimization only applies to vector<bool>, and that
it isn't transparent, and that there's no way of making it
transparent, given the design of the STL and the integration of
std::vector into the STL.
It hasn't been removed so far, and as its section has actually
been edited in the latest draft for the next standard, it is
unlikely to go away anytime soon. We have also seen that other
containers, like std::string and std::array, are allowed to
somewhat deviate from the general container requirements, so
why not?

Because the name std::vector gives, or should give, certain
guarantees. std::bitset has a different name, and you don't
expect it to define the same contract as std::vector, so there's
no problem with it, and there'd be no problem with an
std::dynamic_bitset, either.
Always, or sometimes? :)

Well, if it's not in a container, a normal bool is fine.

My own experience is that I've never needed a vector of bool
(with the interface of vector, or something similar), but if I
did, I'd want it to behave like vector. My own experience is
that I often need something like bitset (with the interface of
bitset), but with dynamic length. Except that there is a lot
more functionality which is useful: isSubsetOf( other ), etc.;
bitset doesn't seem to be able to make up its mind what it
really is either. The result is that I still use (and actively
maintain) my pre-standard implementations, even today.
 
B

Bo Persson

James said:
But it does. At least as far as client code can tell. And
bitset is in a similar situation. It's strange because it's


Again: bitset.

Yes, I forgot about that one. :)

It claims to be an associative container, but doesn't have the
required interface.

I should have written that the general container requirements (section
23.1) doesn't allow for this (reference as a proxy). On the other hand
it is still required in some places...
The problem isn't that it's there. The problem is that it is
called std::vector< bool >, and not something else. (Or
alternatively, the problem is that the contract for std::vector
is too constraining. You can argue it both ways. But if you
take this second approach, you're going to have to change some
of the basic principles of the STL.)

The contract is cracking up anyway, as the revisions for the next
standard introduces new containers and new requirements that just
don't go well together.

Just like bitset is (supposedly) an associative container with a fixed
size (and therefore has a non-conforming different interface), C++0x
introduces a std::array as a fixed size sequence container which
cannot implement insert and erase, for example. We also have
basic_string claiming to be a sequence container, but falling short.

In comparison, vector<bool> seems like a minor problem. .-)


Bo Persson
 
J

Juha Nieminen

Rafał said:
This container do not act like a container, it has other semantics.
For example, taking address of N-th element will not work.

Ok, that's *one* example. You write as if there were numerous. Any
other examples of std::vector<bool> not behaving like other types of
std::vector?

(One could argue that not being able to get a pointer to a bool value
inside the vector is not such a big of a loss...)
 
E

Erik Wikström

Ok, that's *one* example. You write as if there were numerous. Any
other examples of std::vector<bool> not behaving like other types of
std::vector?

(One could argue that not being able to get a pointer to a bool value
inside the vector is not such a big of a loss...)

The loss is quite small until you try to use vector<bool> just like any
other vector in one of your nice parametrised libraries and stuff no
longer works and you are forced to rewrite half the library just to get
it working with vector<bool>.
 
R

Rafał

Juha said:
Ok, that's *one* example. You write as if there were numerous. Any
other examples of std::vector<bool> not behaving like other types of
std::vector?

(One could argue that not being able to get a pointer to a bool value
inside the vector is not such a big of a loss...)

Well, vector claims to allow to do T& x = &V.at(n); yet it brakes that
promise.

This is not USA government, but an official standardized language, so I
would expect it to keep such "promises".

It's unfortunate that some programs do rely on this behavior, and now it
will not be change therefore.

IMHO they should right away start with some vector_packed class.
 
J

James Kanze

Yes, I forgot about that one. :)
It claims to be an associative container, but doesn't have the
required interface.

And shouldn't have. Basically, the requirements for containers
don't really mean anything; it doesn't start becoming serious
until you consider the requirements for sequence.

[...]
The contract is cracking up anyway, as the revisions for the
next standard introduces new containers and new requirements
that just don't go well together.

The problem is that the original specifications aren't really
well designed. They work (more or less) for the containers that
were present in the original standard (with the exception of
bitset), but they aren't nearly flexible enough to cover all, or
even most, of the useful cases.
Just like bitset is (supposedly) an associative container with
a fixed size (and therefore has a non-conforming different
interface), C++0x introduces a std::array as a fixed size
sequence container which cannot implement insert and erase,
for example. We also have basic_string claiming to be a
sequence container, but falling short.

basic_string should be fixed so that it is a sequence.
Otherwise, however, I don't see anything wrong with introducing
containers which aren't sequences, and it seems obvious (to me,
anyways) that the basic concepts need working on: ) there should
be a distinction in the notion of mutable---you need containers
whose values can be modified, but whose topology is fixed (like
tr1::array), for example, and there are more than a few problems
with the iterators: why, for example, should an iterator into a
non-mutable sequence still require operator* to return a
reference, rather than a value? Or for that matter, why not
rework the concepts to allow proxies in general?
In comparison, vector<bool> seems like a minor problem. .-)

The problem isn't that it exists. The problem is that it is an
std::vector, and not something else.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,997
Messages
2,570,240
Members
46,828
Latest member
LauraCastr

Latest Threads

Top