Tim said:
Ben Bacarisse said:
Marcin Grzegorczyk said:
Tim Rentsch wrote:
<snip>
/* get the two bytes into unsigned int ux and then... */
if (ux >= 0x8000u) {
int x = 0xffffu - ux; /* 0<= x<= 32767 */
return -x - 1;
}
else return ux;
<snip>
I believe you're right, by why not just this:
/* get the two bytes into unsigned int ux and then... */
return (int)(ux & 32767) - (long)(ux & 32768);
I'd prefer not to rely on a longer type being available, especially
given the context.
Easily done --
return (int)(ux & 32767) - (int)(ux/2 & 16384) - (int)(ux/2 & 16384);
Well, here's one I've thought up (incidentally, just a few days before
the stuff above was posted):
if (ux & 0x8000) return (int)(ux - 0x8000) - 0x7FFF - 1;
else return (int)ux;
Or, if you prefer a single-expression version:
return ux & 0x8000 ? (int)(ux - 0x8000) - 0x7FFF - 1 : (int)ux;
[snip]
I has not thought of optimisation and it's a good point. Interestingly
gcc will optimise both mine, yours and Tim's first version to a register
move and a return. I thought mine would be the worst in this respect,
but it seems not. Tim's second version (the one that does not need a
wider signed type) was too clever for gcc.
Out of curiosity, I've checked all four variants with the IAR C/C++
Compiler for the National Semiconductor CR16C (icccr16, v.2.20.1) at
work (as you may guess, I work in embedded software development), with
the maximum optimization level (optimizing for size). Only my version
was optimized into a single register move instruction. A bit
disappointing, but not really surprising; I've already noticed that
icccr16 isn't very good at optimizing bitwise operations. A bit more
surprising was that Ben's version turned out to be least size-efficient:
the compiler generated not only the branch, but also code to calculate
the expression (-1 - (0xFFFF - ux)). All that despite the fact that
CR16C is a regular two's complement architecture.
I acknowledge that these results are significant. However....
some additional points to consider:
1. These optimizations occur (I assume) on a two's complement
implementation. The code I wrote was chosen partly because
it likely behaves well on ones' complement and sign/magnitude
implementations also. How will the other methods do on
such systems?
Well, without being familiar with any such architecture, I can only
speculate what instructions a compiler could use; but note that in order
to preserve the full range of possible values, the conversion would have
to be to a wider type anyway. In that case I guess (and it's a pretty
wild guess) your first version /could/ be a bit more efficient.
[...]
3. The field being converted may not be an exact width field (eg
short or int). If the field being converted is other than
a full-width regular integer type, then (a) the shorter version
of mine is usable, and (b) the other two, both having a test in
them, will likely have a conditional branch in the generated
code, and that's likely to mean they will be slower than the
non-branching approach.
Actually I did check what GCC does when the field is 16-bit but the
types are 32-bit (IOW, the expressions exactly as written above, with a
32-bit int). On the i386 target, your first version is then a register
copy, two ANDs and a subtraction; mine is a test, branch and a
subtraction. (However, when the target is i686 or later, the branch is
replaced by a conditional move.)
(By the way, branching is slower only on deeply pipelined machines,
especially ones with branch prediction hardware. In embedded systems,
where the CPUs tend to be much simpler, branches are often cheap.)
Sure, if you're writing for portability (which this whole thread is
about, after all), you're better off picking what you find most
readable, and not worrying too much about how compilers may optimize it.
In that case, I think each of the four versions can do.
4. Again if the field being converted is not a full-width field
(of some regular integer type), then it may be important to
ensure that only the bits of interest participate in the result.
My code does that already with its AND operations; the other
two would need an extra masking AND to suppress extra bits.
True. OTOH, when the field width does match the return type, the AND
becomes a no-op...
Having said all that, I think all three ideas have something to be
said for them, and thank you both for presenting your cases. By
the way, related to that I have a comment for Ben's method, which
if I may be allowed I will re-write just slightly (32 bit int
version):
return ux < 0x80000000 ? (int) ux : -(int)(-ux-1) - 1;
What I find nice about this way of writing it is that the
second part (ignoring the conversion to int) is
(ux + 1) - 1
or just 'ux', which gives me a way of seeing how/why it works,
and also makes it easier to see why gcc can optimize it.
Yes, I think you're right. I may try this one with icccr16, but I doubt
it would come out any better than the original.