Julie said:
Talk to Ed Nisley -- he regularly talks about bit counting in embedded systems
in his "Embedded Space" forum in DDJ.
Are you sure there's even a difference in storage requirements?
Low-level optimizations like this are usually best left to the
compiler.
Here are two inline swap functions: swap1 which uses a temporary,
and swap2 which uses the XOR trick.
inline void swap1(int& a, int &b)
{
int temp = a;
a = b;
b = temp;
}
inline void swap2(int& a, int &b)
{
b = a ^ b;
a = b ^ a;
b = a ^ b;
}
For comparison, let's define a couple of test functions which
use swap1 and swap2, respectively, but are otherwise identical.
The conditional is there to ensure that swap can't be optimized
away entirely.
int Test1(int a, int b)
{
if (a > b) swap1(a, b);
return (a + 1) * b;
}
int Test2(int a, int b)
{
if (a > b) swap2(a, b);
return (a + 1) * b;
}
Here is the x86 assembly language output my compiler produces
for these two functions:
?Test1@@YAHHH@Z PROC NEAR
; Line 17
mov eax, DWORD PTR _a$[esp-4]
mov ecx, DWORD PTR _b$[esp-4]
cmp eax, ecx
jle SHORT Test$1
mov edx, eax
mov eax, ecx
mov ecx, edx
Test$1:
; Line 18
add eax, 1
imul eax, ecx
; Line 19
ret 0
?Test1@@YAHHH@Z ENDP
?Test2@@YAHHH@Z PROC NEAR
; Line 23
mov eax, DWORD PTR _a$[esp-4]
mov ecx, DWORD PTR _b$[esp-4]
cmp eax, ecx
jle SHORT Test1$1
xor ecx, eax
xor eax, ecx
xor ecx, eax
Test1$1:
; Line 24
add eax, 1
imul eax, ecx
; Line 25
ret 0
?Test2@@YAHHH@Z ENDP
They are identical except that the first uses three register MOV
and the second three register XOR instructions to do the swap.
Neither uses any memory, although the first uses an additional
register (which is already available).
Now let's consider another variation in which we remove the
condition:
int Trivial1(int a, int b)
{
swap1(a, b);
return (a + 1) * b;
}
int Trivial2(int a, int b)
{
swap2(a, b);
return (a + 1) * b;
}
And here's the assembly output:
?Trivial1@@YAHHH@Z PROC NEAR
; Line 30
mov eax, DWORD PTR _b$[esp-4]
add eax, 1
imul eax, DWORD PTR _a$[esp-4]
; Line 31
ret 0
?Trivial2@@YAHHH@Z PROC NEAR
; Line 35
mov ecx, DWORD PTR _a$[esp-4]
mov edx, DWORD PTR _b$[esp-4]
xor edx, ecx
xor ecx, edx
mov eax, ecx
xor eax, edx
; Line 36
add ecx, 1
imul eax, ecx
; Line 37
ret 0
?Trivial2@@YAHHH@Z ENDP
What's going on here? Well, in the first case, the compiler
was able to optimize away the swap entirely and just do the
equivalent of return (b + 1) * a.
In the second case, the use of the XOR trick obscured the
intent of the code so the compiler just had to follow the
instructions literally. This is a common problem with using
tricky, low-level optimizations in source code. Compilers
can optimize better using the "as-if" rule if the effect of
the code is straightforward.
Guarding against self-swapping could be an (assumed)
precondition and not included in the code.
This places a burden on the user of the function to understand
the restruction and perform the test his/herself if necessary.
It is an error waiting to happen.
Just trying to think outside of the box here, and not limiting
it to a 'there is no good reason to do it' mentality.
There's a natural tendency to be seduced by clever tricks that
seem to promise a tiny gain in efficiency. Given that, it seems
to me that "thinking outside the box" involves taking into
account other factors that may not be so obvious. Let's weigh
the pros and cons in this case, shall we?
Pros:
* May save one int of memory in rare cases (i.e., where
there are no available registers)
Cons:
* Obfuscates the source code.
* Is less general (e.g., doesn't work for UDTs).
* Intruduces potential obscure bugs due to self-swap.
* Undermines compiler optimizations.
It seems to me that in this case there really is no good
reason to do it. That's not a "mentality" but a considered
judgment.