Fred Zwarts
I have an application in which data from one place needs to be copied to another place.
These places can be network sockets, files on disk, etc.
In all cases the data consists of unformatted unsigned 32-bit integers.
Each record starts with a 32-bit integer containing the record size.
The record data follows.
Then either the count of the next record follows, or the data ends.
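For example, a stream with two records, one of three data words and one of two,
would contain the seven integers
    3, a, b, c, 2, d, e
where 3 and 2 are the counts and a..e are the data values.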
The records usually contain up to a few thousand integers,
but occasionally a record can be much larger.
For efficiency, the records are read as a block.
Reading the data elements one by one would significantly reduce the I/O performance.
The application uses a std::vector<uint32_t> as a buffer for reading and writing the data.
First it reads the count.
Then, if the vector is too small, it can do one of two things:
either resize the vector with resize, or only increase its capacity with reserve.
Then it reads the record into the vector.
Then it writes the count and writes the data.
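A minimal sketch of that loop with the resize variant, assuming hypothetical
readBytes/writeBytes helpers that stand in for the real I/O calls (recv/send,
fread/fwrite, ...) and transfer exactly n bytes:

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Placeholder I/O primitives, not the application's real functions.
    void readBytes(void *dst, std::size_t n);
    void writeBytes(const void *src, std::size_t n);

    void copyOneRecord(std::vector<std::uint32_t> &buffer)
    {
        std::uint32_t count;
        readBytes(&count, sizeof count);          // read the record size
        if (buffer.size() < count)
            buffer.resize(count);                 // new elements are value-initialized
        readBytes(buffer.data(), count * sizeof(std::uint32_t));
        writeBytes(&count, sizeof count);         // write the count, then the data
        writeBytes(buffer.data(), count * sizeof(std::uint32_t));
    }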
The question is about resizing the buffer.
Using resize has the disadvantage that all new elements of the vector are value-initialized,
which is unnecessary work, because these values are immediately overwritten
by the subsequent read operation.
Alternatively, the vector size is kept at 1, and only the capacity of the vector is increased.
The address of the first element of the vector and the size of the record are supplied
first to the read function and afterwards to the write function.
This means that elements beyond the vector's defined size are used as buffer space.
However, the standard says that all elements of a vector are contiguous in memory,
so the reserved space must be contiguous as well.
Logically, then, there should be no problem with using this reserved space in this way.
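The reserve-only variant described here would look roughly like this (same
placeholder readBytes/writeBytes helpers as in the sketch above; whether
reading into the reserved-but-unused elements is valid is exactly what the
question is about):

    void copyOneRecordReserveOnly(std::vector<std::uint32_t> &buffer)
    {
        // buffer.size() is kept at 1; only the capacity is grown.
        std::uint32_t count;
        readBytes(&count, sizeof count);
        if (buffer.capacity() < count)
            buffer.reserve(count);                // no value-initialization
        // Elements [1, count) now live only in reserved capacity.
        readBytes(buffer.data(), count * sizeof(std::uint32_t));
        writeBytes(&count, sizeof count);
        writeBytes(buffer.data(), count * sizeof(std::uint32_t));
    }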
Is there a reason to think that there are environments where this does not work?
Is there another way to resize the buffer without initializing all new elements?