A salutary tale about alignment and undefined behaviour

K

Kevin Bracey

Having just upgraded a compiler, I found that the Internet stack in one of
our systems no longer functioned - it appeared that the compiler was not
compiling it correctly.

Closer inspection showed that it was actually a really nasty example of
undefined behaviour being invoked.

#define ETHER_ADDR_LEN 6

struct ether_header {
u_char ether_dhost[ETHER_ADDR_LEN];
u_char ether_shost[ETHER_ADDR_LEN];
u_short ether_type;
};

struct sockaddr {
u_short sa_family;
char sa_data[14];
};

{
u_char edst[ETHER_ADDR_LEN];
struct sockaddr *dst;
struct ether_header *eh;
//...

eh = (struct ether_header *)dst->sa_data;
memcpy(edst, eh->ether_dhost, sizeof (edst));
//...
}

This always used to work, despite there being an alignment problem - all
structures on this platform are word (32-bit) aligned, so eh is
incorrectly aligned, invoking undefined behaviour.

Looking at it, it's not clear how it could actually go wrong - you'd
instinctively assume that memcpy will work regardless of the alignment.

But the new compiler recognised memcpy, and decided it might like to inline
it. Further, it knew that the destination was a local array, and thus was
word-aligned on the stack with two bytes of padding after it. It further
"knew" that the source was at the start of a structure, and hence was
word-aligned. It thus decided that the most efficient code to output was an
in-line two-word (8-byte) copy from eh to edst. But eh was not actually
word-aligned, so the load-word instructions failed (silently reading the
wrong data).

The whole thing just goes to show that with today's modern optimising
compilers, you never know when undefined behaviour might bite you. Even
things that instinctively "should work anyway" can go horribly wrong.
 
C

Chris Croughton

eh = (struct ether_header *)dst->sa_data;

This is the bit which bit you, not:
memcpy(edst, eh->ether_dhost, sizeof (edst));

This always used to work, despite there being an alignment problem - all
structures on this platform are word (32-bit) aligned, so eh is
incorrectly aligned, invoking undefined behaviour.

Exactly. On some machines the assignment to eh might cause the pointer
to have bits truncated, or even cause a trap because it's an invalid
pointer. Or the dereferencing as a parameter to memcpy might cause a
trap. Or demons might fly out of your nose.

In spite of it being a very common thing to do in Internet code, which
was mostly written for 'safe' machines where all that was needed was
some endian conversions (that sort of code is in several books
describing how to use sockets, for instance).
The whole thing just goes to show that with today's modern optimising
compilers, you never know when undefined behaviour might bite you. Even
things that instinctively "should work anyway" can go horribly wrong.

Optimisation can cause a lot of problems. Worse than yours is when the
compiler gets its optimisation wrong, because that can be very hard to
track down or reproduce in a small code segment to show to the
developers, especially if your code is correct and similar code works
fine in other places.

Incidentally, which compiler (and version) was it which bit you? It may
well be worth letting the distributors of it know so that they can put
out a warning, you're probably not the only user who is going to get
bitten with network stack code...

Chris C
 
S

Servé La

Kevin Bracey said:
{
u_char edst[ETHER_ADDR_LEN];
struct sockaddr *dst;
struct ether_header *eh;
//...

eh = (struct ether_header *)dst->sa_data;
memcpy(edst, eh->ether_dhost, sizeof (edst));
//...
}

This always used to work, despite there being an alignment problem - all
structures on this platform are word (32-bit) aligned, so eh is
incorrectly aligned, invoking undefined behaviour.

What exactly is wrong with the code then? Because ETHER_ADDR_LEN is not of
the correct size dst and eh get incorrectly aligned or something?
 
M

Mathew Hendry

Kevin Bracey said:
{
u_char edst[ETHER_ADDR_LEN];
struct sockaddr *dst;
struct ether_header *eh;
//...

eh = (struct ether_header *)dst->sa_data;
memcpy(edst, eh->ether_dhost, sizeof (edst));
//...
}

This always used to work, despite there being an alignment problem - all
structures on this platform are word (32-bit) aligned, so eh is
incorrectly aligned, invoking undefined behaviour.

What exactly is wrong with the code then? Because ETHER_ADDR_LEN is not of
the correct size dst and eh get incorrectly aligned or something?

The problem is the cast: it lies about the type of sa_data. I saw a similar
problem recently, in code that extracts float 4-vectors (16 bytes, 16 byte
alignment on this platform) from a data packet. The code inappropriately
casted a pointer into the packet buffer to a pointer to the vector type,
fooling the compiler into assuming 16-byte alignment, and "over-optimising"
a subsequent memcpy.

[I posted a similar reply earlier but it seems to have been lost in the
ether]

-- Mat.
 
O

Old Wolf

Kevin said:
Having just upgraded a compiler, I found that the Internet stack in one of
our systems no longer functioned - it appeared that the compiler was not
compiling it correctly.

Closer inspection showed that it was actually a really nasty example of
undefined behaviour being invoked.
struct sockaddr {
u_short sa_family;
char sa_data[14];
};
eh = (struct ether_header *)dst->sa_data;

This always used to work, despite there being an alignment problem - all
structures on this platform are word (32-bit) aligned, so eh is
incorrectly aligned, invoking undefined behaviour.

Yes, you should have alarm bells ringing in your head whenever
you see a cast to a pointer to non-char.. and fire sirens
if the cast is from a pointer to char.

Still, I would have expected a compiler to warn about that.
Investigate the available compiler warnings to see if you
can turn something on that warns about alignment problems.
 
M

Mark L Pappin

Old Wolf said:
Kevin Bracey wrote:
[snip alignment problem, masked by cast]
Still, I would have expected a compiler to warn about that.

I wouldn't. The cast says to the compiler "Don't annoy me with
warnings about this - I know exactly what I'm doing."

My ideal compiler, however, would warn about any cast (other than
members of a select group, including _some_ arguments to varadic
functions) not accompanied by a suitable explanatory comment.

mlp
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,982
Messages
2,570,190
Members
46,740
Latest member
AdolphBig6

Latest Threads

Top