Building a Large Container

Geoff

Well, I've changed the whole process to use a std::array...and it
hasn't helped at all. 8<{{
So, okay, I've got some other problem - somewhere in the data
conversion and preparation process. Here's what I'm doing (without
exposing you all to an enormous amount of detail code):
1. I am reading a CSV file, which I parse into individual fields.

Simple enough, you're looking for commas and gobbling the data between
them.
2. For each field (~15 of them), I either trim the alphanumeric data or
convert the numeric data to populate the data object (which has been a
std::map, std::vector, and now is a std::array object).
3. I do some data validation on the data fields, before I post the
object to the container.

Describe "validation". String matching can be expensive. Post the
code.
4. As each object is stored, I display a running count of objects
processed. This is where I can see that the activity is running slowly
and erratically - the running counter does not update smoothly, and it
is quite slow for the large volume of records (~30,000) I'm processing.

Is this running count appearing in a window? Is that code running
inside your processing loop or monitoring a counter for which you have
read-only asynchronous access? Multithreading? Mutex?
The problem wasn't really noticeable prior to running this large
input file. I had been dealing with ~2000 input records and never
noticed a concern.
I now see that something in the above processing has substantial
processing, but I don't yet know where or what. For now, I'm going to
comment out various functions to see if I can determine where the
overhead is. <sigh...>

Post the code.
 
Alf P. Steinbach

So, okay, I've got some other problem - somewhere in the data
conversion and preparation process.
[snip]

Really, get a profiler. Or if you like, adapt the primitive one I
posted to comp.lang.asm.x86 a few years ago:

https://groups.google.com/forum/#!msg/comp.lang.asm.x86/4AmQm_G2mAg/q28bTyfRLVEJ

It's nice with tool suggestions.

Perhaps someone can confirm or disprove that the following works also
for Visual C++ Express for Desktop, which seems to be Mike's IDE:

http://www.codeproject.com/Articles/144643/Profiling-of-C-Applications-in-Visual-Studio-for-F

I have too many variants of Visual Studio installed to say for sure.

But as I recall from many years ago, the MS tool side (enabling
profiling information in the build, running vsperfcmd.exe) is there also
for VS Express, and I seem to recall that I looked at the (possibly
processed?) data in Excel -- not using a viewer like the above site
offers.


Cheers,

- Alf (in "tools you didn't even know you have, yay!" mode)
 
Öö Tiib

I now see that something in the above processing has substantial
processing, but I don't yet know where or what. For now, I'm going to
comment out various functions to see if I can determine where the
overhead is. <sigh...>

Are you joking? Why not use a profiler? It is such a huge slowdown.
If you really think that the code is too long to post, then break the
program in the debugger a couple of times and post the call stacks. :D
 
Mike Copeland

You've probably got some detail wrong some place, but if you
Like others have explained over and over again, no one
can help you without seeing that code.
I've found the problem: it was in some of the data validation
processing, where I had a large vector array of support data that I was
adding to for many input records' processing. I was doing a linear
search of this vector, and I was also sorting it by various fields and
using that data for other functions.
Suffice to say that when I blocked this processing the performance of
the whole program improved a great deal, and the "counter display" is
consistent. 8<}}
Thanks to all who contributed to helping my plight!
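
If a "have I seen this key before?" check like the one described above were still needed, a hashed container avoids the repeated linear scan. A sketch under assumptions: the keys are taken to be strings, and `add_if_new_linear` and `add_if_new_hashed` are made-up names, since the original validation code was never posted:

```cpp
#include <algorithm>
#include <string>
#include <unordered_set>
#include <vector>

// Linear-search version: each call scans the whole vector, so adding n
// distinct keys costs on the order of n*n/2 comparisons in total.
bool add_if_new_linear(std::vector<std::string>& seen, const std::string& key)
{
    if (std::find(seen.begin(), seen.end(), key) != seen.end())
        return false;                 // already present
    seen.push_back(key);
    return true;
}

// Hash-set version: lookup and insert take constant time on average.
bool add_if_new_hashed(std::unordered_set<std::string>& seen, const std::string& key)
{
    return seen.insert(key).second;   // .second is true only for a new key
}
```

Both functions return true when the key was added and false when it was already there, so one can be swapped for the other without changing the calling code's logic.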
 
Mike Copeland

I now see that something in the above processing has substantial
Are you joking? Why not use a profiler? It is such a huge slowdown.
If you really think that the code is too long to post, then break the
program in the debugger a couple of times and post the call stacks. :D
I've downloaded a profiler, but in the interest of getting this
application debugged I didn't want to spend time installing, configuring
and learning how to use it. Perhaps later...
 
Geoff

I've downloaded a profiler, but in the interest of getting this
application debugged I didn't want to spend time installing, configuring
and learning how to use it. Perhaps later...

Interesting. Waste time and effort guessing where your bottleneck is
rather than test and measure and fix it with certainty. Of course you
could put some timing functions or debug outputs into your program and
get a handle on which function is causing your slowdown but that might
take time away from debugging your program.
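
For anyone who wants the "timing functions" route without installing anything, here is a minimal sketch using only `<chrono>`. `StageTimer` and its members are hypothetical names invented for this example:

```cpp
#include <chrono>
#include <string>
#include <utility>

// Accumulates wall-clock time spent in one named stage of the program.
struct StageTimer {
    std::string name;
    std::chrono::steady_clock::duration total =
        std::chrono::steady_clock::duration::zero();
    long calls = 0;

    // Runs `fn`, adds its elapsed time to `total`, and counts the call.
    template <typename Fn>
    void measure(Fn&& fn)
    {
        const auto start = std::chrono::steady_clock::now();
        std::forward<Fn>(fn)();
        total += std::chrono::steady_clock::now() - start;
        ++calls;
    }

    // Total accumulated time in seconds.
    double seconds() const
    {
        return std::chrono::duration<double>(total).count();
    }
};
```

Wrap each stage (parse, convert, validate, store) in its own timer's `measure` call and print `name`, `calls` and `seconds()` at the end of the run. The stage that accounts for most of the total time is where to look first.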
 
Öö Tiib

I've found the problem: it was in some of the data validation
processing, where I had a large vector array of support data that I was
adding to for many input records' processing. I was doing a linear
search of this vector, and I was also sorting it by various fields and
using that data for other functions.

Ok, problem solved by erasing unneeded functionality. That should be
preferred. Always think: do you actually need that large container of
objects in memory to be sorted and indexed in various different ways?

Each such index slows everything down somewhat, so how often it is
needed must also be considered. When ordering is needed only rarely and
temporarily, the usual choice is a 'std::priority_queue' of iterators to
that container.
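
A rough sketch of that idea, assuming the temporary ordering wanted is "the k best entries by some field". `Entry`, `score` and `top_k` are hypothetical names chosen for the example:

```cpp
#include <cstddef>
#include <queue>
#include <string>
#include <vector>

struct Entry {            // hypothetical record
    std::string name;
    int score;
};

using EntryIt = std::vector<Entry>::const_iterator;

// Returns iterators to the `k` highest-scoring entries, best first,
// without reordering or copying the entries themselves.
std::vector<EntryIt> top_k(const std::vector<Entry>& entries, std::size_t k)
{
    // Min-heap on score: the weakest of the current candidates is on top,
    // so it can be evicted when a better entry arrives.
    auto worse = [](EntryIt a, EntryIt b) { return a->score > b->score; };
    std::priority_queue<EntryIt, std::vector<EntryIt>, decltype(worse)> heap(worse);

    for (EntryIt it = entries.begin(); it != entries.end(); ++it) {
        heap.push(it);
        if (heap.size() > k)
            heap.pop();
    }

    std::vector<EntryIt> result(heap.size());
    for (std::size_t i = result.size(); i-- > 0; ) {
        result[i] = heap.top();   // heap yields the lowest first; fill from the back
        heap.pop();
    }
    return result;
}
```

The container itself stays unsorted, and the heap never holds more than k + 1 iterators. The returned iterators are only valid until the vector is next modified, which fits the "rarely and temporarily" case.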

If permanent indexes are *really* needed, then things get interesting.
One of the fastest-performing options for that is to keep the objects
unsorted in a 'std::array' and also in 'boost::intrusive::set's (or
'multiset's or something else from the intrusive library), one per
indexing scheme. If the objects are inserted and erased only at the ends
under one indexing scheme, then a 'boost::circular_buffer' can be used
instead of 'std::array'.

If even more sophisticated (relational-database-like) functionality is
needed, then there is Boost.MultiIndex for that. If there is too much
data (a few megabytes is not too much on most modern hardware), then
there are actual DBMSes.
Suffice to say that when I blocked this processing the performance of
the whole program improved a great deal, and the "counter display" is
consistent. 8<}}

Thanks to all who contributed to helping my plight!

Note that it took us two days; with a profiler, such major bottlenecks are
typically found within minutes. So next time it is more fruitful to invest
those two days in learning to use a profiler. ;) A few megabytes of data
taking minutes to process indicates a quadratic (or worse) solution, and
it is hard to find problems that actually need one.
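
To put numbers on the quadratic point, here is a small sketch. It assumes the worst case, where each new record is compared against every record stored before it; `pairwise_comparisons` is a made-up helper, not part of anyone's posted code:

```cpp
#include <cstdint>

// Worst-case comparisons when each of n records is checked by linear search
// against all records stored before it: 0 + 1 + ... + (n - 1).
constexpr std::uint64_t pairwise_comparisons(std::uint64_t n)
{
    return n * (n - 1) / 2;
}
```

For the 2,000-record file this gives 1,999,000 comparisons. For the 30,000-record file it gives 449,985,000, about 225 times as many for only 15 times the input, which matches a problem that went unnoticed on the small file and became obvious on the large one.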
 
