Mike Copeland
My current application has 2 large data sets that are combined into a
single data set that I must access by (part of) a string value.
Currently I have the structure declared as a map object, but after
populating the basic information I am adding information from another
database that's much larger - in a many-to-one situation.
Here's the fundamental information I use:
struct Res_Struct               // Individual Event Finisher data
{
    int  resEvtNum;             // link to Events table
    int  resYear;               // Event Year
    int  resOAll;               // OverAll Finish position
    int  resD_P;                // Division Place
    long resTime;               // Finish Time
} resWork;

struct Hist_Fins                // individual Finisher's results
{
    int        evtNum;          // Result's Event # link
    string     PRF;             // P/R indicator
    Res_Struct histInfo;        // Finisher's result(s) info
} histWork;
vector<Hist_Fins>::iterator hIter;

struct Fin_Struct               // Individual Finisher data
{
    long   finLink;             // unique Finisher (link)
    char   finGender;           // gender
    int    finCount;            // # Finishes by this participant
    string finName;             // Finisher Name (Last, First M.)
    string finDoB;              // (derived) DoB from event Age/Year
    vector<Hist_Fins> histVect;
} finWork;
map<int, Fin_Struct> finMap;
map<int, Fin_Struct>::iterator fIter;
Yes, this seems a bit convoluted, but the application has been
growing in size and complexity, and I've not had time to redesign...
The important issue here is that I have ~160,000 records that
construct the basic information in the Fin_Struct. My other data
(~400,000 records) comprise the information that populates the "histVect"
object: 1 to 200 vector items in each map entry. The input data files
are flat text data files (referencing some earlier posts about file I/O
efficiency).
Note that the map has an integer key value, and values range from 101
through ~160,000. I don't use the "name" as a key because I normally
scan the entire map object to look for objects that match some part of
the name value (e.g. I want to find all objects with names that start
with "WAL", etc.).
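To make the name scan concrete, here is a minimal sketch of the kind of prefix search described above. It is only an illustration: "Finisher" is a cut-down stand-in for Fin_Struct, "keysWithPrefix" is a hypothetical helper name, and the comparison is assumed to be case-sensitive.

```cpp
#include <map>
#include <string>
#include <vector>

// Cut-down stand-in for Fin_Struct (assumption: only the name matters here).
struct Finisher {
    std::string finName;   // Finisher Name (Last, First M.)
};

// Hypothetical helper: walk the whole map and collect the keys of every
// entry whose name begins with `prefix` (case-sensitive comparison).
std::vector<int> keysWithPrefix(const std::map<int, Finisher>& finMap,
                                const std::string& prefix) {
    std::vector<int> hits;
    for (const auto& entry : finMap) {
        // compare() on the first prefix.size() characters; a name shorter
        // than the prefix simply fails to match.
        if (entry.second.finName.compare(0, prefix.size(), prefix) == 0) {
            hits.push_back(entry.first);
        }
    }
    return hits;
}
```

The scan visits every entry regardless of the key type, so the integer key buys nothing for this particular operation.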
The use of an STL map doesn't seem best, because I don't use the map
in a traditional way, and the loading of the map takes a lot of time
<sigh>. Since the data objects are consecutive in an integer range, I
wonder if another container would be a better choice. I could use a
vector (and reserve a good amount of space "going in", rather than let
slow runtime growth occur), but I think I'd lose significant "load time"
by not referencing a map as I'd have to scan the vector 400,000 or more
times during the 2nd file population...
Both files contain the integer value that links them, as well as the
"name" string.
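Since both files carry the integer link, one possibility is that the vector would not need scanning at all: if the link values really are consecutive, the link minus the first key can serve directly as the vector index. The sketch below is untested against the real data and rests on assumptions: the first key is 101 ("kFirstLink"), the keys have no large gaps, and "Finisher", "Hist", "findByLink" and "addHistory" are hypothetical cut-down names, not the real structures.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Cut-down stand-ins for Hist_Fins and Fin_Struct (assumptions).
struct Hist {
    int evtNum;                  // Result's Event # link
};
struct Finisher {
    std::string finName;         // Finisher Name
    std::vector<Hist> histVect;  // results filled in from the 2nd file
};

const int kFirstLink = 101;      // assumed lowest link value

// Translate a link value into a vector slot in constant time.
// Returns nullptr when the link falls outside the table.
Finisher* findByLink(std::vector<Finisher>& table, int link) {
    if (link < kFirstLink) return nullptr;
    std::size_t idx = static_cast<std::size_t>(link - kFirstLink);
    return idx < table.size() ? &table[idx] : nullptr;
}

// Attach one history record from the 2nd file to its finisher.
// Returns false if the link does not correspond to any slot.
bool addHistory(std::vector<Finisher>& table, int link, const Hist& h) {
    Finisher* f = findByLink(table, link);
    if (f == nullptr) return false;
    f->histVect.push_back(h);
    return true;
}
```

With the table sized up front (for example, constructed with the expected record count), each of the ~400,000 history records would cost one index calculation rather than a search. Any missing link values would simply leave empty slots.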
Any thoughts? TIA