STL speed

T

Tõnu Aas

The task was indeed simple. Read 2.000.000 words (average length = 9),
sort
them and write it to new file.

So, STL can reserve some space for each string.
+ string object overhead + vector object and you can have 512 bytes for each
word.
That is ~ 1GB for all you data !!! + program itselt & OS.
So you OS uses virtual memory, which is SLOOOOW.
STL is nice toy, but not for this kind of things.

Tõnu.
 
E

EnTn

I'm quite new to STL too...

but one thought was about the dynamic resizing of the vector as you
fill it.
AFAIK the vector will extend itself dynamically until it can no longer
remain a continuous allocation in memory, upon which is reallocates
itself in a new area of memory... (?)

Anyway, Can expensive allocations / deallocations / copies be avoided
by by using the reserve() member function to *try to* ensure that
there is enough contiginous space around your vector to avoid
reallocations....?

MHTWEOTSH...
 
J

Jeff Schwab

Please don't top-post.
I'm quite new to STL too...

but one thought was about the dynamic resizing of the vector as you
fill it.
AFAIK the vector will extend itself dynamically until it can no longer
remain a continuous allocation in memory, upon which is reallocates
itself in a new area of memory... (?)

That's the idea.
Anyway, Can expensive allocations / deallocations / copies be avoided
by by using the reserve() member function to *try to* ensure that
there is enough contiginous space around your vector to avoid
reallocations....?

Yes, that's the whole purpose of the "reserve" method.
MHTWEOTSH...

I give up. Is that a common greeting in some language I don't speak, or
does it stand for something?
 
S

Sean Clarke

tom_usenet said:
Why have wordstruct? What's wrong with using string directly?


The above would be much more efficient as:

if (q.word == "0")

since that saves an extra memory allocation per iteration.

Are you sure here that the compiler would not just create a string("0") ?

Personally I would use a either a const static or pre constructed const
string, is this overkill?

class HowIWouldDoIt
{
private:
const static String _testString;
public:
read_names(std::vector<wordstruct>& x)
};

HowIWouldDoIt::_testString = "0";

void read_names(vector<wordstruct>& x)
{
wordstruct q;
while (true) {
cin >> q.word;
if (q.word == _testString )
break;

cont...


Here you should add:
std::ios::sync_with_stdio(false);
since buffering will be disabled on cin and cout on some
implementations if you don't.

Interesting... I've not used this before.
Here you might want to reserve some space in the vector:
x.reserve(1000); //or more

I think this will probably be the crux of the problem, the other stuff I'd
consider "fine tuning", but worthwhile none the less.

Out of intestest, I wonder how this would perform using a set<> ? ie sorted
(operator<) on insert......

:)
 
K

Karl Heinz Buchegger

EnTn said:
Anyway, Can expensive allocations / deallocations / copies be avoided
by by using the reserve() member function to *try to* ensure that
there is enough contiginous space around your vector to avoid
reallocations....?

That's the whole idea of reserve.

Another possibility for the OP would be to change strategie. He could
try to use a std::map instead of the vector. I did this once when
toying around (counting and sorting the words of the bible) and achieved
a noticable speedup.
 
M

Martijn Lievaart

Anyway, Can expensive allocations / deallocations / copies be avoided
by by using the reserve() member function to *try to* ensure that
there is enough contiginous space around your vector to avoid
reallocations....?

As long as you interpret *try to* as *throws an exception when out of
memory*, yes. A vectors contents is always contiginous.

HTH,
M4
 
M

Martijn Lievaart

So, STL can reserve some space for each string.
+ string object overhead + vector object and you can have 512 bytes for each
word.
That is ~ 1GB for all you data !!! + program itselt & OS.
So you OS uses virtual memory, which is SLOOOOW.
STL is nice toy, but not for this kind of things.

No. This is an algorithmic issue. You can do the same in C which will be
equally slow, or you can choose another algortihm.

This case is one where the overhead of the STL (if any) completely
disapears.

HTH,
M4
 
T

tom_usenet

Are you sure here that the compiler would not just create a string("0") ?

Well, its a QOI issue (operator== can make as many allocations as it
likes), but there is a
template<class charT, class traits, class Allocator>
bool operator==(const basic_string<charT,traits,Allocator>& lhs, const
charT* rhs);

An implementation would have to be amazingly stupid to just forward
the call to the 2 string version, triggering an unnecessary string
construction.
Personally I would use a either a const static or pre constructed const
string, is this overkill?

A pre-constructed one is a better solution - in theory the comparison
can be slightly faster.
class HowIWouldDoIt
{
private:
const static String _testString;
public:
read_names(std::vector<wordstruct>& x)
};

HowIWouldDoIt::_testString = "0";

void read_names(vector<wordstruct>& x)
{
wordstruct q;
while (true) {
cin >> q.word;
if (q.word == _testString )
break;

How about just:

wordstruct q;
std::string const testString("0");
while (true) {
cin >> q.word;
if (q.word == testString)
break;
I think this will probably be the crux of the problem, the other stuff I'd
consider "fine tuning", but worthwhile none the less.

vector::reserve is useful, but because of the exponential growth
behaviour of vector, it only tends to make a small difference. If you
don't pre-allocate, you typically only construct twice as many
objects. With a reference counted string (rare these days) the gain
will be even smaller.
Out of intestest, I wonder how this would perform using a set<> ? ie sorted
(operator<) on insert......

Probably quite a bit worse. The O complexity will be the same, but the
constant is likely to be bigger.

Tom

C++ FAQ: http://www.parashift.com/c++-faq-lite/
C FAQ: http://www.eskimo.com/~scs/C-faq/top.html
 
D

David White

P.J. Plauger said:
possible.

Well, I didn't have access to any of that $50B when I wrote that code
in 1993, but I did write significant chunks of the C and C++ Standards
in those areas. I knew that printf gets right all sorts of subtle
corner cases that practically every iostreams implementation botched
one way or the other. I also had mineral rights to all the code I needed
to do the job other than `cheap and nasty,' and I was unable to get any
significant improvement over fabricating a format string and calling
sprintf.

That certainly sounds remarkable. In the case of string output with a given
field width there doesn't _seem_ to be a lot to do if it is done directly
(speaking from zero experience in implementing such things). To create a
format string and then have sprintf interpret it and then do the output
sounds like a lot of added overhead for a short string.

What about input? I remember reading a large text file full of numbers in
VC++ 5 or 6 and having to rewrite the code the C way because the C++
ifstream was many, many times slower. Maybe this is a clue to why sprintf
didn't make much difference to the output speed: there's already so much
overhead in C++ streams that using the C library didn't matter. If so,
programmers used to C won't exactly by encouraged to switch to streams.
FWIW, Microsoft's stash has roughly doubled since the day they chose to
adopt our cheap and nasty approach. Coincidence? (Probably.)

No, I think you deserve a cut :)

DW

P.S. Something I couldn't remember was whether you used sprintf to fabricate
the format string as well. Did you?
 
P

P.J. Plauger

That certainly sounds remarkable. In the case of string output with a given
field width there doesn't _seem_ to be a lot to do if it is done directly
(speaking from zero experience in implementing such things). To create a
format string and then have sprintf interpret it and then do the output
sounds like a lot of added overhead for a short string.

That's the trouble with software. What *seems* inefficient can only be
verified by measurement. Ones intuition is so often wrong.
What about input? I remember reading a large text file full of numbers in
VC++ 5 or 6 and having to rewrite the code the C way because the C++
ifstream was many, many times slower. Maybe this is a clue to why sprintf
didn't make much difference to the output speed: there's already so much
overhead in C++ streams that using the C library didn't matter. If so,
programmers used to C won't exactly by encouraged to switch to streams.

Perhaps you ran afoul of the regrettable bug we had in that version.
See http://www.dinkumware.com/vc_fixes.html for the one-line fix.
If you opened a stream by filename, the bug defeated file buffering.
But absent this bug, the raw overhead of shoveling bytes through
iostreams isn't all that bad.

And to answer your leadoff question, we supply our own scanners for
integers and floating-point fields, since scanf has always been
klunkier than printf.
No, I think you deserve a cut :)

I keep trying...
P.S. Something I couldn't remember was whether you used sprintf to fabricate
the format string as well. Did you?

No.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
 
R

red floyd

Przemo Drochomirecki wrote:
[code redacted]
compiled under VC++6.0

There's your problem. VC5 and VC6 had a known buffering issue with ifstreams.
PJ has commented on it, and there is a fix somewhere on the DinkumWare website.

red floyd
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,159
Messages
2,570,879
Members
47,414
Latest member
GayleWedel

Latest Threads

Top