reading binary file into memory. Converting from char to uint32,float, double, ASCII strings etc (st

someone

What Pavel meant was that you have:

  char * memblock;
  // ...
  if (infile.is_open())
  {
    // ...
    memblock = new char [fsize];
    // ...
  }
  // ...
  delete[] memblock;
  return 0;

}

memblock is a pointer that is left uninitialized at the point where it
is defined. It is initialized only inside the if branch, but it is not
guaranteed that you always enter the if branch. If you don't, delete[]
gets a random value, which is UB. That might or might not crash your
application, or crash only when installed on the customer's machine, for
example.

Aah, thanks... You're both right... For now I just made a flag so in
the end I write:

if (delete_memblock==true) // true if allocated
delete[] memblock;

I suggest using std::vector<char> instead of a char* pointer and array
new; this avoids the potential for this kind of bug. There are very few
situations which warrant the array new construct, and most probably you
will never encounter one.
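
To make the std::vector<char> suggestion concrete, here is a minimal sketch of loading the whole file into a vector. The helper name readWholeFile and its error handling are assumptions for illustration, not code from the original program, and the size is still taken by opening the stream at its end.

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not from the thread): load a whole file into a vector.
// The vector frees its memory automatically on every path, including when an
// exception is thrown, so there is no delete[] to forget.
std::vector<char> readWholeFile(const std::string& path)
{
    // std::ios::ate opens the stream positioned at the end of the file.
    std::ifstream infile(path.c_str(), std::ios::binary | std::ios::ate);
    if (!infile.is_open())
        throw std::runtime_error("cannot open " + path);

    const std::streamoff fsize = infile.tellg();  // position at end == size
    if (fsize < 0)
        throw std::runtime_error("cannot determine size of " + path);

    std::vector<char> memblock(static_cast<std::size_t>(fsize));
    infile.seekg(0, std::ios::beg);
    if (fsize > 0 && !infile.read(memblock.data(), fsize))
        throw std::runtime_error("short read on " + path);
    return memblock;
}
```

With this, the caller writes something like std::vector<char> memblock = readWholeFile(name); and passes memblock.data() wherever the old char* was used.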

I'm actually a bit more confident in ordinary C than in C++ but I want
to improve my object-oriented skills. I think I understand the idea of
using std::vector<char> but I don't understand how to use "array
new" ?

In any case, for now I think the delete_memblock==true flag should
work. I have a more important issue about vectors and will write a new
post explaining it within a few minutes... Once this issue is done,
the program version 1.0 would work, I think :)
 
someone

  »seekg« seems to be defined in ISO/IEC 14882:2011
  (in 27.7.2.3p40) using »pubseekoff« , which is defined
  (in 27.6.3.2.2p2) in terms of »seekoff« , which is defined
  (in 27.6.3.4.2p3) based on »::std::fseek« for file-base streams
  (in 27.9.1.5p13); and ISO/IEC 9899:1990 explains about »fseek«:

      »A binary stream need not meaningfully support
      fseek calls with a whence value of SEEK_END.«

What does this mean?
 
Jorgen Grahn

On 10/15/11 6:43 PM, Martin Jørgensen wrote: ....
Your data in the file is little endian. You have a problem if your
processor is not little endian.

The usual interpretation is that this means he *does* have a problem.
Good code is supposed to have the same behavior no matter what
endianness the processor happens to have.

/Jorgen
 
someone

On 10/15/11 7:32 PM, someone wrote: ....
Last question for today and I shall soon close the thread: Can I ask
you which of the following you think is the best to use (or suggest
alternatives)?
   // my way of switching between alternate "solutions" when I want both
   // in the code
   if (0)
      R1_begin = *(reinterpret_cast<unsigned char*>(&memblock[0])); //, 1, 'uint32',endianNess);
   else
      memcpy(&R1_begin, &memblock[0], sizeof(R1_begin));

The first doesn't do what you want. &memblock[0] is already a char* so
casting it to unsigned char* and dereferencing isn't going to make a big
change. If you did a cast to unsigned int*, that would be different.

Ok, it was probably a coincidence that it worked... It sounds right,
what you write. Thanks.
You also shouldn't use reinterpret_cast here, static_cast is good
enough, and is actually what you want.

I ended up doing this (which I find is quite good, I stole some part
of the idea from the internet):

//=================================
template <class T>
void moveBytes(T *destination, const char *source,
               const unsigned int multiplier = 1)
{
    // cout << "sizeof(*destination) = " << dec << sizeof(*destination) << endl;
    memcpy(destination, &source[memLocation],
           (multiplier * sizeof(*destination)));
    // memLocation = global variable/counter!
    memLocation += (multiplier * sizeof(*destination));
}
//=================================

This allows me to extract different data types from memory and do,
e.g.:

unsigned int R1_begin;
moveBytes(&R1_begin, memblock); // memLocation automatically increases for next time

unsigned int * R2_SensNum;
R2_SensNum = new unsigned int[R2_N];
moveBytes(R2_SensNum, memblock, R2_N);

etc... And I think it works for all data types: floats, doubles,
uint32, int32, whatever...

The memcpy is better, as the cast has the danger that some fields might
not line up on the right word boundaries in your data buffer. It also
means you can later switch to the network byte order calls to make
your program more portable.
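
To illustrate why the first variant only appeared to work, here is a small sketch contrasting the two reads. The function names are made up for this example. With the file's first bytes 48 00 00 00 both return 72 on a little-endian machine, which hides the difference.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// What the unsigned char* cast does: it reads only the first byte.
inline std::uint32_t firstByteOnly(const char* memblock)
{
    assert(memblock != nullptr);
    return *reinterpret_cast<const unsigned char*>(memblock);
}

// What memcpy does: it copies all four bytes, at any alignment.
// The resulting value depends on the byte order of the host.
inline std::uint32_t copyFourBytes(const char* memblock)
{
    assert(memblock != nullptr);
    std::uint32_t value;
    std::memcpy(&value, memblock, sizeof value);
    return value;
}
```

As soon as one of the upper three bytes is non-zero, firstByteOnly silently drops it while copyFourBytes keeps it.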

Ok, memcpy it is :)

Now, I have one last serious issue with my program before it works...
I successfully loaded all the "header" information by using the
moveBytes template (see above). After the header follows a long loop
of consecutive data records, so I do something like this (short
version, some irrelevant code parts are omitted):


//=================================
RTSensor_array = new float[R2_N]; // should be array of size R2_N!
vector<float> RTSensorReading[R2_N]; // R2_N = 10 (this is the sensor array length)

// fsize was the filesize, memLocation is also seen above (global)
while ( memLocation < fsize )
{
    curLoopNumber++; // keep track of number of records (for debugging)

    moveBytes(&myuint_begin, memblock); // this works, read 4 bytes!
    moveBytes(&float_time, memblock); // this works, read 4 bytes!
    moveBytes(&myuint_end, memblock); // this works, read 4 bytes!
    moveBytes(&RTSensor_array, memblock); // issue: much longer than 4 bytes, i.e. 40++ bytes

    RTbegin.push_back( myuint_begin ); // this works, save 4 bytes!
    RTtime.push_back( float_time ); // this works, save 4 bytes!
    RTend.push_back( myuint_end ); // this works, save 4 bytes!
    RTSensorReading.push_back( RTSensor_array ); // <==== ISSUE, not working, trying to save 40 bytes!
}
//=================================

The last line gives this compiler error:
error: request for member ‘push_back’ in ‘RTSensorReading’, which is
of non-class type ‘std::vector<float> [(((long unsigned int)(((long
int)R2_N) + -0x00000000000000001)) + 1)]’


So... In order to make it more clear: For each time-step (e.g.
time={0,1,2,3,4...} seconds), maybe 10 or 20 sensor values are stored.
Because float is 4 bytes, let's assume that I want to read 10 sensors,
e.g. RTSensor_array = 10x4 = 40 bytes. For each consecutive data
record, I have some temporary variables, i.e. the first 4 moveBytes-
lines: unsigned int myuint_begin, float float_time, unsigned int
myuint_end, vector<float> RTSensor_array.

My question: What is the most efficient way of storing this float
array using C++ ?

Maybe later I want to access sensor number 4 (index 4 in the
RTSensor_array) as a function of the time which is saved in
vector<float> RTtime;

I hope the question is understandable and else I'll try to explain it
better... The vector issue is quite essential to C++ so I hope I get
some helpful answers/hints which I can learn a great deal from. Thanks
so far!
 
someone

someone said:
For now I just made a flag so in
the end I write:
 if (delete_memblock==true) // true if allocated
     delete[] memblock;

This is better, but kind of convoluted; instead you can initialize your
pointer appropriately (this is now really C-style):

 char * memblock = NULL;

Calling delete[] on a null pointer is well-defined and legal (and does
nothing), so there is no need to introduce a new variable.

Ah, thanks. This is nicer...
This is still fragile in the sense that you have to remember to call
delete[] and if you later decide to throw some exception from inside the
function you might have a memory leak. std::vector is much better and
avoids all these issues. Refer to your C++ book for details.

Ok, got it.
I think I understand the idea of
using std::vector<char> but I don't understand how to use "array
new" ?

Then how come you used it in your original post:

memblock = new char [fsize];

Ah, ok. Understood... Thank you, sir! :)
 
Richard Damon

The usual interpretation is that this means he *does* have a problem.
Good code is supposed to have the same behavior no matter what
endianness the processor happens to have.

/Jorgen

Only so far as he needs to confirm that his machine is little endian.
Good code runs reliably on the machines it is targeted at, and minimizes
the possible issues in moving it to other machines it might possibly be
run on.

If the program is known to only need to run on a limited set of
machines (say, it is a Windows program), then assuming properties common
to the Windows platform isn't totally bad. Later versions of Windows are
very unlikely to switch endianness without other major issues coming up.

His move to having a template function move a given type of data is an
improvement, as it localizes the assumption. I would not at this point
rewrite those functions to be endian neutral.
 
Ian Collins

Only so far as he needs to confirm that his machine is little endian.
Good code runs reliably on the machines it is targeted at, and minimizes
the possible issues in moving it to other machines it might possibly be
run on.

If the program is known to only need to run on a limited set of
machines (say, it is a Windows program), then assuming properties common
to the Windows platform isn't totally bad. Later versions of Windows are
very unlikely to switch endianness without other major issues coming up.

That assumption caught out a lot of SunOS programmers who thought their
code would only run on 68K or Sparc. Then Sun started producing x86
machines...

His move to having a template function move a given type of data is an
improvement, as it localizes the assumption. I would not at this point
rewrite those functions to be endian neutral.

Fair enough.
 
someone

What does this mean?

Some googling produced:

"In addition, footnote 234 of Section 7.19.3 of [ISO/IEC 9899:1999] has
this to say:

Setting the file position indicator to end-of-file, as with fseek(file,
0, SEEK_END), has undefined behavior for a binary stream (because of
possible trailing null characters) or for any stream with state-dependent
encoding that does not assuredly end in the initial shift state."

I guess this has something to do with the implementations where files are
stored in fixed-size records. Probably seeking to SEEK_END may not be
able to find the actual end of the file in the middle of the record?

The POSIX manpage for fseek also contains an interesting section:

"If the stream is to be used with wide-character input/output functions,
the application shall ensure that offset is either 0 or a value returned
by an earlier call to ftell() on the same stream and whence is SEEK_SET."

Uh, I still don't get it...
On my system, fseek gives no problem. I get the correct file length...

Should I worry or change anything?
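
For what it's worth, one way to sidestep the SEEK_END question entirely is to read until end-of-file instead of asking for the size first. A minimal sketch, assuming the file fits comfortably in memory; readUntilEof is a made-up name, not something from the thread:

```cpp
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: collect bytes until end-of-file is reached, so no
// seek to the end of the stream is needed to learn the size.
// If the file cannot be opened, the result is simply an empty vector.
std::vector<char> readUntilEof(const std::string& path)
{
    std::ifstream infile(path.c_str(), std::ios::binary);
    return std::vector<char>(std::istreambuf_iterator<char>(infile),
                             std::istreambuf_iterator<char>());
}
```

The file length is then just the size() of the returned vector. This is likely slower than one big read() for very large files, so it is a trade-off rather than a replacement.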
 
someone

someone said:
I ended up doing this (which I find is quite good, I stole some part
of the idea from the internet):
//================================
template <class T>
void moveBytes(T *destination, const char *source,
               const unsigned int multiplier = 1)
{
    // cout << "sizeof(*destination) = " << dec << sizeof(*destination) << endl;
    memcpy(destination, &source[memLocation],
           (multiplier * sizeof(*destination)));

ok, this should work, assuming that the endiannesses match.

My attempts to verify the code have so far worked with the above
template :)
And memLocation is a global or static variable? In the long term this
will hurt you. In C++ if one wants to keep a state one creates a class
for that.
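
As a rough illustration of keeping that state in a class instead of a global, here is a sketch. The class name BufferReader and its interface are assumptions made for this example; only moveBytes and the idea of a running offset come from the thread.

```cpp
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical class: wraps the buffer together with the read position
// that used to live in the global memLocation.
class BufferReader
{
public:
    explicit BufferReader(const std::vector<char>& buffer)
        : buffer_(buffer), position_(0) {}

    // Copy `count` objects of type T from the current position and advance.
    template <class T>
    void moveBytes(T* destination, std::size_t count = 1)
    {
        const std::size_t nbytes = count * sizeof(T);
        if (nbytes == 0)
            return;
        if (nbytes > buffer_.size() - position_)
            throw std::out_of_range("read past end of buffer");
        std::memcpy(destination, buffer_.data() + position_, nbytes);
        position_ += nbytes;
    }

    std::size_t position() const { return position_; }
    bool atEnd() const { return position_ == buffer_.size(); }

private:
    const std::vector<char>& buffer_;  // not owned; must outlive the reader
    std::size_t position_;             // replaces the global memLocation
};
```

The record loop would then read while (!reader.atEnd()) and call reader.moveBytes(&value). Two readers can work on two buffers without interfering, which a single global counter cannot do.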

It is currently a global variable, declared in the top of my *.cpp
file (right below the #include's and just before any real code
begins).
I just finished the program and I think everything works (although it
is pretty ugly code). I'm currently transforming the code into a
class, because I totally agree with you (that this is a bit ugly)...

Now, I have one last serious issue with my program before it works...
//================================
RTSensor_array = new float[R2_N]; // should be array of size R2_N!
vector<float> RTSensorReading[R2_N]; // R2_N = 10 (this is the sensor array length)

This should be:

vector<float> RTSensorReading(R2_N);

However, notice that push_back() would then append the 11-th etc.
elements here, probably you want to start with an empty vector instead:

vector<float> RTSensorReading;

Ah, thanks... Because I had to use my memcpy-template I ended up with
the below code (nothing else worked, by googling it seemed to me like
I could not push an array of floats into the vector so now I push a
vector into another vector instead and it seems to work):

//------------------
vector<float> RTSensorReading; // create empty vector of floats
for (int a=0; a<R2_N; a++)
{
moveBytes(&float_read, memblock); // read
RTSensorReading.push_back( float_read ); // save
}
if (RTSensorReading.size() != R2_N)
cout << "... warning message" << endl;

....

RTSensor.push_back( RTSensorReading ); // save vector of floats
//------------------

Where the last line/variable is declared:

vector< vector<float> > RTSensor;
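
With this layout RTSensor[record][sensor] holds one value, so pulling out a single sensor as a function of time is a short loop. A minimal sketch follows; sensorSeries is a hypothetical helper name, and it assumes every record is meant to hold the same number of sensors.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: collect the readings of one sensor across all records.
// Element k of the result belongs to the same time step as RTtime[k].
std::vector<float> sensorSeries(const std::vector< std::vector<float> >& RTSensor,
                                std::size_t index)
{
    std::vector<float> series;
    series.reserve(RTSensor.size());
    for (std::size_t record = 0; record < RTSensor.size(); ++record)
        series.push_back(RTSensor[record].at(index));  // at() throws if a record is too short
    return series;
}
```

For example, sensorSeries(RTSensor, 4) gives the values to plot against RTtime for sensor number 4.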


Everything seems to work now! The binary reader program went through
15000 records without complaining about inconsistent record length and
that indicates that everything works... I've already begun to rewrite
the code into a class and in a few days, I'll take the read input data
and plot it (I successfully managed to use mathgl to plot some data a
few days ago, so that's the end of the story for now) :)

Thank you very much to everybody who helped!

If there are more suggestions/remarks, I'll still read them and maybe
reply :)
 
someone

That assumption caught out a lot of SunOS programmers who thought their
code would only run on 68K or Sparc.  Then Sun started producing x86
machines...


Fair enough.


But of course, *if* it's easy to change the template:
//------------------
template <class T>
void moveBytes(T *destination, const char *source,
               const unsigned int multiplier = 1)
{
    // cout << "sizeof(*destination) = " << dec << sizeof(*destination) << endl;
    memcpy(destination, &source[memLocation],
           (multiplier * sizeof(*destination)));
    memLocation += (multiplier * sizeof(*destination));
}
//------------------

to be endian neutral, then I would be interested in seeing/hearing
more about that.

However, I don't think I'll need it for this project (it would only be
good to know, for my own interest's sake)... Most Windows and Linux
PCs are little-endian, I guess? (I'm on Ubuntu Linux and I think the
binary file I'm reading came from a Windows PC).
 
Richard Damon

That assumption caught out a lot of SunOS programmers who thought their
code would only run on 68K or Sparc. Then Sun started producing x86
machines...


Fair enough.


But of course, *if* it's easy to change the template:
//------------------
template <class T>
void moveBytes(T *destination, const char *source,
               const unsigned int multiplier = 1)
{
    // cout << "sizeof(*destination) = " << dec << sizeof(*destination) << endl;
    memcpy(destination, &source[memLocation],
           (multiplier * sizeof(*destination)));
    memLocation += (multiplier * sizeof(*destination));
}
//------------------

to be endian neutral, then I would be interested in seeing/hearing
more about that.

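One endian-neutral way to write this for specific types would be:

// Explicit specialization; the default argument comes from the primary
// template and must not be repeated here.
template <>
void moveBytes<int32_t>(int32_t *destination, const char *source,
                        const unsigned int multiplier)
{
    // Start at the current read position (memLocation is the global offset).
    const unsigned char *sour =
        reinterpret_cast<const unsigned char*>(source + memLocation);

    for (unsigned int i = 0; i < multiplier; i++) {
        // Assemble the little-endian value in unsigned arithmetic.
        *destination++ = static_cast<int32_t>(
            (static_cast<uint32_t>(sour[3]) << 24) |
            (static_cast<uint32_t>(sour[2]) << 16) |
            (static_cast<uint32_t>(sour[1]) << 8) |
             static_cast<uint32_t>(sour[0]));
        sour += 4;
    }
    memLocation += multiplier * sizeof(*destination);
}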

However, I don't think I'll need it for this project (it would only be
good to know, for my own interest's sake)... Most Windows and Linux
PCs are little-endian, I guess? (I'm on Ubuntu Linux and I think the
binary file I'm reading came from a Windows PC).

The processors used by Windows and by most Linux installations are
"Intel x86 compatible" processors, which are little endian. Linux is
available on processors of both endiannesses, though. The fact that Unix
lived on both is one reason the network byte order functions exist: they
were helpful in writing portable libraries to handle network traffic.
 
Richard Damon

Thank you very much to everybody who helped!

If there are more suggestions/remarks, I'll still read them and maybe
reply :)

One comment, looking at the processing you are doing, there appears to
be no reason you needed to read the whole file into memory to begin
with. For each step where you are getting n bytes from your memory
buffer, you could just as easily read those n bytes from the file.

You could then encapsulate the file in a class, with members to extract
the various fundamental types, localizing the details of any format
conversions that might be needed, things like endianness and floating
point formats.
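
A rough sketch of what such a class could look like, reading straight from a stream instead of a memory buffer. The class name BinaryFileReader and its member names are assumptions for illustration. This version copies bytes in host order; any endianness or floating-point conversion would go into the one private function.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>   // std::istringstream, handy for trying the class without a file
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical class: pulls values directly from a stream, so the whole
// file never has to be held in memory.
class BinaryFileReader
{
public:
    explicit BinaryFileReader(std::istream& in) : in_(in) {}

    // Read one object of type T (host byte order in this sketch).
    template <class T>
    T read()
    {
        T value;
        readRaw(&value, sizeof value);
        return value;
    }

    // Read `count` consecutive objects of type T into a vector.
    template <class T>
    std::vector<T> readArray(std::size_t count)
    {
        std::vector<T> values(count);
        if (count > 0)
            readRaw(values.data(), count * sizeof(T));
        return values;
    }

private:
    // The only place that touches raw bytes; format conversions belong here.
    void readRaw(void* destination, std::size_t nbytes)
    {
        in_.read(static_cast<char*>(destination),
                 static_cast<std::streamsize>(nbytes));
        if (!in_)
            throw std::runtime_error("unexpected end of input");
    }

    std::istream& in_;  // not owned; must outlive the reader
};
```

Used with a std::ifstream opened in binary mode, a record becomes reader.read<unsigned int>(), reader.read<float>(), then reader.readArray<float>(R2_N).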
 
someone

But of course, *if* it's easy to change the template:
//------------------
template <class T>
void moveBytes(T *destination, const char *source,
               const unsigned int multiplier = 1)
{
    // cout << "sizeof(*destination) = " << dec << sizeof(*destination) << endl;
    memcpy(destination, &source[memLocation],
           (multiplier * sizeof(*destination)));
    memLocation += (multiplier * sizeof(*destination));
}
//------------------
to be endian neutral, then I would be interested in seeing/hearing
more about that.

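One endian-neutral way to write this for specific types would be:

// Explicit specialization; the default argument comes from the primary
// template and must not be repeated here.
template <>
void moveBytes<int32_t>(int32_t *destination, const char *source,
                        const unsigned int multiplier)
{
    // Start at the current read position (memLocation is the global offset).
    const unsigned char *sour =
        reinterpret_cast<const unsigned char*>(source + memLocation);

    for (unsigned int i = 0; i < multiplier; i++) {
        // Assemble the little-endian value in unsigned arithmetic.
        *destination++ = static_cast<int32_t>(
            (static_cast<uint32_t>(sour[3]) << 24) |
            (static_cast<uint32_t>(sour[2]) << 16) |
            (static_cast<uint32_t>(sour[1]) << 8) |
             static_cast<uint32_t>(sour[0]));
        sour += 4;
    }
    memLocation += multiplier * sizeof(*destination);
}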

Ah, ok... It's a shame that by saying that this is only valid for
int32_t then the "template-effect" of being able to work on many
different datatypes is lost. I had this sizeof(*destination) so the
template worked for 16bit, 32 and 64 bit data types. But if this is
how to do it, then I suppose that this is how to do it :)

The processors used by Windows and by most Linux installations are
"Intel x86 compatible" processors, which are little endian. Linux is
available on processors of both endiannesses, though. The fact that Unix
lived on both is one reason the network byte order functions exist: they
were helpful in writing portable libraries to handle network traffic.

Ok... Maybe I should have a look at that network stuff some day... I
have access to real unix-machines and then they're probably big-endian
and then I would probably get a problem... Most of the time, however I
use intel x86 processors...

Thanks! :)
 
someone

One comment, looking at the processing you are doing, there appears to
be no reason you needed to read the whole file into memory to begin
with. For each step where you are getting n bytes from your memory
buffer, you could just as easily read those n bytes from the file.

Yes, I found out that too late. There was an excellent piece of code I
could have stolen from google if I didn't read everything into memory
before processing the data :)
You could then encapsulate the file in a class, with members to extract
the various fundamental types, localizing the details of any format
conversions that might be needed, things like endianness and floating
point formats.

Sounds like a good idea... I've made a structure instead of a class
(without endianness stuff, currently). I haven't made a specific class
for the various fundamental types yet, but maybe I should try that
whenever I get some time... I'm currently moving the class (structure)
out to separate *.cpp and *.h files so I can include it in other
source codes... I did it some years ago, but then I stopped
programming for quite some time and now I have to refresh my memory...
But I think I'll succeed with this - google helps me a lot :)

Thanks. Within a few days, I'll plot the data and make sure it is like
I expect. If I get into new problems, maybe I should create a new
thread/post.

My main issue in this thread has been resolved now :)
 
Richard Damon

But of course, *if* it's easy to change the template:
//------------------
template <class T>
void moveBytes(T *destination, const char *source,
               const unsigned int multiplier = 1)
{
    // cout << "sizeof(*destination) = " << dec << sizeof(*destination) << endl;
    memcpy(destination, &source[memLocation],
           (multiplier * sizeof(*destination)));
    memLocation += (multiplier * sizeof(*destination));
}
//------------------
to be endian neutral, then I would be interested in seeing/hearing
more about that.

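One endian-neutral way to write this for specific types would be:

// Explicit specialization; the default argument comes from the primary
// template and must not be repeated here.
template <>
void moveBytes<int32_t>(int32_t *destination, const char *source,
                        const unsigned int multiplier)
{
    // Start at the current read position (memLocation is the global offset).
    const unsigned char *sour =
        reinterpret_cast<const unsigned char*>(source + memLocation);

    for (unsigned int i = 0; i < multiplier; i++) {
        // Assemble the little-endian value in unsigned arithmetic.
        *destination++ = static_cast<int32_t>(
            (static_cast<uint32_t>(sour[3]) << 24) |
            (static_cast<uint32_t>(sour[2]) << 16) |
            (static_cast<uint32_t>(sour[1]) << 8) |
             static_cast<uint32_t>(sour[0]));
        sour += 4;
    }
    memLocation += multiplier * sizeof(*destination);
}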

Ah, ok... It's a shame that by saying that this is only valid for
int32_t then the "template-effect" of being able to work on many
different datatypes is lost. I had this sizeof(*destination) so the
template worked for 16bit, 32 and 64 bit data types. But if this is
how to do it, then I suppose that this is how to do it :)

This isn't the only way, but it is a straightforward way. If you look
closely at the code, you may be able to convert the fetch into a
length-independent routine. Writing it out explicitly for a single case
does make it clearer to read, in my opinion.
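
One possible shape for such a length-independent fetch is sketched below. The name loadLittleEndian is an assumption, and the sketch is restricted to unsigned integer types so that the shifts are well defined; signed values can be obtained by converting the result afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical helper: assemble an unsigned integer of type T from
// sizeof(T) little-endian bytes, whatever the byte order of the host.
template <class T>
T loadLittleEndian(const char* source)
{
    static_assert(std::is_unsigned<T>::value,
                  "this sketch only handles unsigned integer types");
    assert(source != nullptr);
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(source);
    T value = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i)
        value |= static_cast<T>(static_cast<T>(bytes[i]) << (8 * i));  // byte i is worth 256^i
    return value;
}
```

The same template then covers the 16, 32 and 64 bit cases, for instance loadLittleEndian<std::uint32_t>(&memblock[memLocation]).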

One thing to note is that when reading floating-point numbers there is
more to be concerned about than endianness, as the exact placement of
bits is not specified by the C standard. Again, the fact that the two
machines are likely in the same family of processors lets you ignore
some of the thornier issues.
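
A sketch of how that assumption could at least be made explicit for floats. It assumes the file stores 32-bit IEEE 754 values in little-endian order and that the host uses the same byte order for floats as for integers; the helper name loadFloatLittleEndian is made up for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Refuse to compile on a platform where float is not a 32-bit IEEE 754 type.
static_assert(std::numeric_limits<float>::is_iec559 && sizeof(float) == 4,
              "this sketch assumes a 32-bit IEEE 754 float");

// Hypothetical helper: decode a little-endian IEEE 754 single from 4 bytes.
inline float loadFloatLittleEndian(const char* source)
{
    assert(source != nullptr);
    const unsigned char* b = reinterpret_cast<const unsigned char*>(source);
    // Build the 32-bit pattern independently of the host byte order.
    const std::uint32_t bits = static_cast<std::uint32_t>(b[0])
                             | (static_cast<std::uint32_t>(b[1]) << 8)
                             | (static_cast<std::uint32_t>(b[2]) << 16)
                             | (static_cast<std::uint32_t>(b[3]) << 24);
    float value;
    std::memcpy(&value, &bits, sizeof value);  // reinterpret the bit pattern
    return value;
}
```

If the static_assert ever fires, that is the signal that a real conversion routine is needed rather than a byte copy.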
 
Jorgen Grahn

Only so far as he needs to confirm that his machine is little endian.
Good code runs reliably on the machines it is targeted at, and minimizes
the possible issues in moving it to other machines it might possibly be
run on.

I disagree, although I won't insist.

For two reasons:

- Doing binary I/O by casting various types (int, structs) gets you
into trouble sooner or later. Learning to do it right is a good
investment.

- I've spent months of my life, recently, fixing subtle endian bugs
caused by reasoning like the above. And I cursed the original
developers all the way. This is one of those things that is much,
much easier to get right from the start compared to fixing it up
afterwards.

If the program is known to only need to run on a limited set of
machines (say, it is a Windows program), then assuming properties common
to the Windows platform isn't totally bad. Later versions of Windows are
very unlikely to switch endianness without other major issues coming up.

Yes, because people do things like this ... Unix software manages to
be portable to many different architectures without any major effort.

His move to having a template function move a given type of data is an
improvement, as it localizes the assumption. I would not at this point
rewrite those functions to be endian neutral.

Yes, isolating it to one place (and perhaps adding a TODO) is a good
start.

/Jorgen
 
Joshua Maurice

On 10/16/11 11:39 AM, Martin Jørgensen wrote:
On Oct 15, 11:35 pm, Richard Damon wrote:
On 10/15/11 3:00 PM, someone wrote:
==
I'm stuck...
And what about big endian / little endian? I'm confused. Looking
forward to hear from you!
First, what is the first byte in the file? you are saying that you are
expecting it to be 72 but you are getting 49. Have you looked at the
file..
Good point. The first four bytes of the input file has the values:
[048h 00h 00h 00h] (hexadecimal, I used a hex editor to see this). And
0x48H = 72 decimal... I don't even know if this is little endian or
not, but what I find very strange is the number 49 then... That number
makes no sense to me (why is it adding 1, if it's printing the number
in hexadecimal? What's it thinking?)..?
If you are in a 32 bit little endian (x86) machine, this will work:
#include <iostream>
int main()
{
char tmp[4] = {0x48,0,0,0};
std::cout << *reinterpret_cast<int*>(tmp) << std::endl;
return 0;
}
Formally isn't that invoking Undefined Behaviour?

3.10/10

"If a program attempts to access the stored value of an object through a
glvalue of other than one of the following types the behavior is
undefined:
— the dynamic type of the object,
— a cv-qualified version of the dynamic type of the object,
— a type similar (as defined in 4.4) to the dynamic type of the object,
— a type that is the signed or unsigned type corresponding to the
  dynamic type of the object,
— a type that is the signed or unsigned type corresponding to a
  cv-qualified version of the dynamic type of the object,
— an aggregate or union type that includes one of the aforementioned
  types among its elements or non-static data members (including,
  recursively, an element or non-static data member of a subaggregate
  or contained union),
— a type that is a (possibly cv-qualified) base class type of the
  dynamic type of the object,
— a char or unsigned char type."

Yep, I was about to mention this too. Let me reproduce the specific
code fragment:
char tmp[4] = {0x48,0,0,0};

std::cout << *reinterpret_cast<int*>(tmp) << std::endl;

You are reading a char object (or char array, whatever) through an int
lvalue. This is undefined behavior.

For example, under the default compiler options for gcc, this can blow
up in your face. In practice, gcc sometimes reorders or omits writes
and reads between improperly aliased objects, resulting in surprising
values.

In this case, I'd suggest simple bit shifting. Otherwise, if you /
really/ need to do this, then do your reads and writes through char*
or unsigned char*, or use memcpy, which all have a "get out of jail
free card" from the aliasing rules. Ex:
char tmp[4] = {0x48,0,0,0};
int x;
static_assert(sizeof(x) == sizeof(tmp), "size mismatch");
memcpy(&x, tmp, sizeof(tmp));
std::cout << x << std::endl;

Alternatively:

char tmp[4] = {0x48,0,0,0};
int x;
static_assert(sizeof(x) == sizeof(tmp), "size mismatch");
for (std::size_t i=0; i<sizeof(tmp); ++i)
  reinterpret_cast<char*>(&x)[i] = tmp[i];
std::cout << x << std::endl;
 
Joshua Maurice

On 16/10/2011 19:59, Leigh Johnston wrote:
On 16/10/2011 00:33, Ian Collins wrote:
On 10/16/11 11:39 AM, Martin Jørgensen wrote:
On Oct 15, 11:35 pm, Richard Damon wrote:
On 10/15/11 3:00 PM, someone wrote:
==
I'm stuck...
And what about big endian / little endian? I'm confused. Looking
forward to hear from you!
First, what is the first byte in the file? you are saying that you are
expecting it to be 72 but you are getting 49. Have you looked at the
file..
Good point. The first four bytes of the input file has the values:
[048h 00h 00h 00h] (hexadecimal, I used a hex editor to see this). And
0x48H = 72 decimal... I don't even know if this is little endian or
not, but what I find very strange is the number 49 then... That number
makes no sense to me (why is it adding 1, if it's printing the number
in hexadecimal? What's it thinking?)..?
If you are in a 32 bit little endian (x86) machine, this will work:
#include <iostream>
int main()
{
char tmp[4] = {0x48,0,0,0};
std::cout << *reinterpret_cast<int*>(tmp) << std::endl;
return 0;
}
Formally isn't that invoking Undefined Behaviour?

"If a program attempts to access the stored value of an object through a
glvalue of other than one of the following types the behavior is
undefined:
— the dynamic type of the object,
— a cv-qualified version of the dynamic type of the object,
— a type similar (as defined in 4.4) to the dynamic type of the object,
— a type that is the signed or unsigned type corresponding to the
  dynamic type of the object,
— a type that is the signed or unsigned type corresponding to a
  cv-qualified version of the dynamic type of the object,
— an aggregate or union type that includes one of the aforementioned
  types among its elements or non-static data members (including,
  recursively, an element or non-static data member of a subaggregate
  or contained union),
— a type that is a (possibly cv-qualified) base class type of the
  dynamic type of the object,
— a char or unsigned char type."

Yep, I was about to mention this too. Let me reproduce the specific
code fragment:
char tmp[4] = {0x48,0,0,0};
std::cout << *reinterpret_cast<int*>(tmp) << std::endl;

You are reading a char object (or char array, whatever) through an int
lvalue. This is undefined behavior.

For example, under the default compiler options for gcc, this can blow
up in your face. In practice, gcc sometimes reorders or omits writes
and reads between improperly aliased objects, resulting in surprising
values.

In this case, I'd suggest simple bit shifting. Otherwise, if you /
really/ need to do this, then do your reads and writes through char*
or unsigned char*, or use memcpy, which all have a "get out of jail
free card" from the aliasing rules. Ex:
char tmp[4] = {0x48,0,0,0};
int x;
static_assert(sizeof(x) == sizeof(tmp), "size mismatch");
memcpy(&x, tmp, sizeof(tmp));
std::cout << x << std::endl;

Alternatively:

char tmp[4] = {0x48,0,0,0};
int x;
static_assert(sizeof(x) == sizeof(tmp), "size mismatch");
for (std::size_t i=0; i<sizeof(tmp); ++i)
  reinterpret_cast<char*>(&x)[i] = tmp[i];
std::cout << x << std::endl;


Oh, of course this can still crash and burn if you write a trap
representation. char and unsigned char are guaranteed to not have trap
representations, but int is not.
 
