C++ is Slow?

N

nw

Thanks for all the comments and suggestions, they've been very useful.

The consensus seems to be that yes, iostream is slower than stdio but
that it's largely down to poor implementations. I guess if I want to
know exactly why this is so, I'd need to dig around in the
implementations. Probably the best way to go is use iostreams for
small amounts of IO and write my own custom, possibly platform
specific, IO code when speed is critical.

The array versus vector discussion seems more problematic. A couple of
people made the point that actually it's not down to using vectors,
it's how you use them, it's basically down to allocating large blocks
versus lots of small ones. It's a good point and I'll keep it in mind.

The point was also made that if I decide to allocate large blocks and
calculate indexes then I should wrap this functionally in a class. I
guess it would also make sense to wrap it in a class that has a
similar interface to vector, assuming this can be done? This way I
could pass vector or my own object as a template parameter and easily
switch between the two.

For my own VectorContiguous, I guess operator[] would have to return a
dummy object which could be automatically type converted to the
storage type of the VectorContiguous object? Has this already been
done?

It sounds like if pushed I should be able to create a drop in
replacement for vector that contiguously allocates memory. Whither
this is valuable or not seems to be an open question, and probably
architecture dependent.
 
A

Alf P. Steinbach

* nw:
For my own VectorContiguous, I guess operator[] would have to return a
dummy object which could be automatically type converted to the
storage type of the VectorContiguous object? Has this already been
done?

It sounds like if pushed I should be able to create a drop in
replacement for vector that contiguously allocates memory. Whither
this is valuable or not seems to be an open question, and probably
architecture dependent.

Not sure what you're talking about here, but std::vector does allocate
contigious memory. Guaranteed by the (C++03) standard. And so does
std::string, in practice, but that won't be guaranteed until C++0x.


Cheers, & hth.,

- Alf
 
N

nw

Not sure what you're talking about here, but std::vector does allocate
contigious memory. Guaranteed by the (C++03) standard. And so does
std::string, in practice, but that won't be guaranteed until C++0x.

I'm talking about contiguous allocation of multidimensional vectors.
 
V

Victor Bazarov

nw said:
I'm talking about contiguous allocation of multidimensional vectors.

There is no such thing. There can be a vector of vectors, but it
is not a "multidimensional vector", unfortunately. The main reason
it isn't that is the ability of all [second-tier] vectors to have
different sizes.

V
 
D

Daniel T.

nw said:
The array versus vector discussion seems more problematic. A couple
of people made the point that actually it's not down to using
vectors, it's how you use them, it's basically down to allocating
large blocks versus lots of small ones. It's a good point and I'll
keep it in mind.

The point was also made that if I decide to allocate large blocks
and calculate indexes then I should wrap this functionally in a
class.

Actually, you should wrap the functionality in a class no matter how you
decide to implement it, that way you can change the implementation
easily.
I guess it would also make sense to wrap it in a class that has a
similar interface to vector, assuming this can be done? This way I
could pass vector or my own object as a template parameter and
easily switch between the two.

For my own VectorContiguous, I guess operator[] would have to
return a dummy object which could be automatically type converted
to the storage type of the VectorContiguous object? Has this
already been done?

It sounds like if pushed I should be able to create a drop in
replacement for vector that contiguously allocates memory. Whither
this is valuable or not seems to be an open question, and probably
architecture dependent.

Having op[] return a dummy object is a poor idea, it makes it harder to
drop in different implementations. Only do something like that if you
absolutely have to. See the FAQ for more about this subject.

http://www.parashift.com/c++-faq-lite/operator-overloading.html#faq-13.10
 
N

nw

Having op[] return a dummy object is a poor idea, it makes it harder to
drop in different implementations. Only do something like that if you
absolutely have to. See the FAQ for more about this subject.

http://www.parashift.com/c++-faq-lite/operator-overloading.html#faq-1...

hmm... ok, the FAQ makes a good point. But raises a couple of
questions.

Firstly is it possible to create a generalizable n-dimensional matrix
using this method?

Secondly is there anyway to make this compatible with (have a similar
interface to) a STL vector? It would be nice to use a STL container
for now and then implement my own classes later if required and select
between them using a template parameter.

Basically does this mean I should never use a multi-dimensional
vector, because I'm cutting myself off from the possibility of
changing the container later?
 
N

nw

I'm talking about contiguous allocation of multidimensional vectors.

There is no such thing. There can be a vector of vectors, but it
is not a "multidimensional vector", unfortunately. The main reason
it isn't that is the ability of all [second-tier] vectors to have
different sizes.

ok, I'm talking about implementing a vectorlike multidimensional
datastructure that uses contiguous memory (see initial message for
details of the problem I'm addressing).
 
J

Juha Nieminen

Jim said:
Unfortunately with my compiler my optimizations are disabled so I don't know
how it would be optimized.

You *can't* make any speed comparisons between things when compiling
without optimizations. In debug mode the speed of the code will be
completely random, depending on what debug checks the compiler chooses
to insert there.

Moreover, printing to a console makes the comparison pointless as
well, because printing to a console has an enormous overhead which
nullifies any speed difference there may be between different I/O
libraries. You have to print to a file to get a better comparison.
After all, the question was about reading (and presumably parsing)
enormous amounts of data, and outputting enormous amounts of data. These
things are never done with a console terminal as the output, but a file.
Writing to a file is quite a fast operation (probably hundreds of times
faster than printing to a console), and thus the speed difference
between the different I/O routines will have more effect.
 
D

Daniel T.

nw said:
Having op[] return a dummy object is a poor idea, it makes it harder to
drop in different implementations. Only do something like that if you
absolutely have to. See the FAQ for more about this subject.

http://www.parashift.com/c++-faq-lite/operator-overloading.html#faq-1...

hmm... ok, the FAQ makes a good point. But raises a couple of
questions.

Firstly is it possible to create a generalizable n-dimensional matrix
using this method?

Of course.
Secondly is there anyway to make this compatible with (have a similar
interface to) a STL vector? It would be nice to use a STL container
for now and then implement my own classes later if required and select
between them using a template parameter.

I'm guessing your nomenclature is confused again. Do you mean a vector
of vectors, when you say "vector" above?

If a Matrix (2 dimensional array, whatever you want to call it) is part
of the problem space and can be implemented in more than one way, then
it should be represented as a class. It would not be "nice" to use a
vector of vectors now and then try to shoehorn some other implementation
into the code later.
Basically does this mean I should never use a multi-dimensional
vector, because I'm cutting myself off from the possibility of
changing the container later?

By all means, use a vector of vectors when you need the functionality
that such a construct would provide. A ragged array, for example, not a
2 dimensional array.
 
I

Ioannis Vranos

nw said:
I'm talking about contiguous allocation of multidimensional vectors.

There is no such thing. There can be a vector of vectors, but it
is not a "multidimensional vector", unfortunately. The main reason
it isn't that is the ability of all [second-tier] vectors to have
different sizes.

ok, I'm talking about implementing a vectorlike multidimensional
datastructure that uses contiguous memory (see initial message for
details of the problem I'm addressing).


Where computational efficiency is a primary concern, you should use
valarray and its facilities (slice_array etc).
 
I

Ioannis Vranos

nw said:
Firstly is it possible to create a generalizable n-dimensional matrix
using this method?


Where computational efficiency is a primary concern you should use
valarray, and use its facilities for "simulating" n-dimensional matrices.
 
E

Erik Wikström

nw said:
I'm talking about contiguous allocation of multidimensional vectors.

There is no such thing. There can be a vector of vectors, but it
is not a "multidimensional vector", unfortunately. The main reason
it isn't that is the ability of all [second-tier] vectors to have
different sizes.

ok, I'm talking about implementing a vectorlike multidimensional
datastructure that uses contiguous memory (see initial message for
details of the problem I'm addressing).


Where computational efficiency is a primary concern, you should use
valarray and its facilities (slice_array etc).

I disagree, unless you have some commercial library which includes
specific optimisations for valarray using vector will be just as good.
The reason being that valarray was designed to allow optimisations but
they do not require it, and since valarray was a bit of a failure no
implementations that I know of perform any kinds of optimisations. The
only advantage of valarray is that it comes with a few mathematical
operators, but at least in the implementation that comes with Visual C++
they are implemented using normal loops.

In fact I would argue that writing your own might quite easily achieve
greater performance for some operations using OpenMP.

As for implementing a two-dimensional array I would do something like this:

template<typename T>
class Matrix
{
std::vector<T> m_vec;
size_t m_rows, m_cols;
public:
Matrix(size_t r, size_t c)
: m_vec(r * c), m_rows(r), m_cols(c)
{ }
T& operator()(size_t r, size_t c)
{
// Perhaps check if r < m_rows && c < m_cols
return m_vec[r * m_cols + c];
}
};

I might have messed up the calculation of where the element is (mixed
rows and columns) but I think that the general idea is clear. To make it
kind of usable with standard algorithms and containers you can offer
begin() and end() which just returns m_vec.begin()/end(), or you can
make more advanced stuff like allowing iterating over a row or column.
 
E

Erik Wikström

Having op[] return a dummy object is a poor idea, it makes it harder to
drop in different implementations. Only do something like that if you
absolutely have to. See the FAQ for more about this subject.

http://www.parashift.com/c++-faq-lite/operator-overloading.html#faq-1...

hmm... ok, the FAQ makes a good point. But raises a couple of
questions.

Firstly is it possible to create a generalizable n-dimensional matrix
using this method?

Secondly is there anyway to make this compatible with (have a similar
interface to) a STL vector? It would be nice to use a STL container
for now and then implement my own classes later if required and select
between them using a template parameter.

Basically does this mean I should never use a multi-dimensional
vector, because I'm cutting myself off from the possibility of
changing the container later?

That is why you should wrap it all in a class, that way you should be
able to change the internal workings without having to change the
class's external interface. That means that functions/classes using your
wrapper class can not tell the difference between the implementations.
 
I

Ioannis Vranos

Erik said:
I disagree, unless you have some commercial library which includes
specific optimisations for valarray using vector will be just as good.
The reason being that valarray was designed to allow optimisations but
they do not require it, and since valarray was a bit of a failure no
implementations that I know of perform any kinds of optimisations. The
only advantage of valarray is that it comes with a few mathematical
operators, but at least in the implementation that comes with Visual C++
they are implemented using normal loops.


Although this may be the case for the implementations you know, valarray
is still the type intended for high performance computations.

If we want to use the right tools for the right job, valarray is
intended for high performance computations.

Under usual circumstances, valarray will never be slower than vector,
while it can be faster than vector in a given implementation, or in a
future version of a given implementation. Also its facilities allow us
to "simulate" n-dimensional matrices without having to define our types,
elegantly and efficiently.
 
N

nw

Under usual circumstances, valarray will never be slower than vector,
while it can be faster than vector in a given implementation, or in a
future version of a given implementation. Also its facilities allow us
to "simulate" n-dimensional matrices without having to define our types,
elegantly and efficiently.

I'm finding it difficult to locate a good example of using valarray to
simulate
a multi-dimensional array. Can you give me an example or point me
towards one?
 
I

Ioannis Vranos

nw said:
I'm finding it difficult to locate a good example of using valarray to
simulate
a multi-dimensional array. Can you give me an example or point me
towards one?

"The C++ Programming Language" 3rd Edition or Special Edition by Bjarne
Stroustrup (the creator of C++), Chapter 22 "Numerics" (on page 657).
 
J

James Kanze

Actually, this only proves that this particular library has
both C and C++ implementations inefficient...

Or more likely, he has a very slow disk or IO bus. The more
time you spend in I/O, the less the differences in CPU make when
expressed as a percentage.

Not really related to the original question (we use Posix level
I/O for the critical parts in our application), but we've
noticed this a lot when migrating from Sparc/Solaris to
Linux/PC. Our Sparc park is old and slow---the new PC's are
often more than ten times as fast. But the Sparc I/O bus was
apparently a lot faster: our application actually runs at about
the same speed on both systems---with close to 100% use of the
CPU on the Sparcs, but less than 15% on the PC's.

My experience also suggests that SBM makes signficantly less
efficient use of the network than NFS (or maybe it is just Samba
which is a lot slower than the NFS servers), which means that
accessing a remote file under Windows will be really, really
slow. And that of course, any difference between stdio.h and
iostream will be negligeable, because even a poor and slow
implementation won't use measurable CPU compared to data
transfer times.
 
J

James Kanze

The array versus vector discussion seems more problematic. A
couple of people made the point that actually it's not down to
using vectors, it's how you use them, it's basically down to
allocating large blocks versus lots of small ones. It's a good
point and I'll keep it in mind.
The point was also made that if I decide to allocate large
blocks and calculate indexes then I should wrap this
functionally in a class. I guess it would also make sense to
wrap it in a class that has a similar interface to vector,
assuming this can be done? This way I could pass vector or my
own object as a template parameter and easily switch between
the two.

The trick is for operator[] to return a proxy object, which
supports operator[] to access the final value. As it happens,
T* fits the bill (although there'll be no bounds checking), so
implementing two dimensional arrays is particularly simple.
For my own VectorContiguous, I guess operator[] would have to
return a dummy object which could be automatically type
converted to the storage type of the VectorContiguous object?
Has this already been done?

class Vector2D
{
public:
Vector2D( size_t i, size_t j )
: m( i )
, n( j )
, data( i * j )
{
}
double* operator[]( size_t i )
{
return &data[ i * m ] ;
}
// ...

private:
size_t m ;
size_t n ;
std::vector< double > data ;
} ;

For starters.
It sounds like if pushed I should be able to create a drop in
replacement for vector that contiguously allocates memory.
Whither this is valuable or not seems to be an open question,
and probably architecture dependent.

The nice thing about using a class like the above is that you
can change the implementation at will without any changes in the
client code.
 
J

James Kanze


[...]
For my own VectorContiguous, I guess operator[] would have to
return a dummy object which could be automatically type converted
to the storage type of the VectorContiguous object? Has this
already been done?
It sounds like if pushed I should be able to create a drop in
replacement for vector that contiguously allocates memory. Whither
this is valuable or not seems to be an open question, and probably
architecture dependent.
Having op[] return a dummy object is a poor idea, it makes it harder to
drop in different implementations. Only do something like that if you
absolutely have to. See the FAQ for more about this subject.

The FAQ is wrong about this. The choice between [j] and
(i,j) should depend on personal preference and local
conventions; both offer exactly the same possibilities for
optimization and supporting different implementations.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,183
Messages
2,570,965
Members
47,513
Latest member
JeremyLabo

Latest Threads

Top