Nelu
2. I wouldn't do this. I'd just iterate through, swapping the bytes
individually.
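A minimal sketch of what I mean, assuming the two objects have the same size and do not overlap. The name swap_bytes is just made up for illustration:

```c
#include <stddef.h>

/* Hypothetical helper: swap n bytes between two non-overlapping objects,
   one byte at a time. The only temporary storage is a single byte. */
void swap_bytes(void *a, void *b, size_t n)
{
    unsigned char *pa = a;
    unsigned char *pb = b;

    for (size_t i = 0; i < n; i++) {
        unsigned char tmp = pa[i];
        pa[i] = pb[i];
        pb[i] = tmp;
    }
}
```

Call it as swap_bytes(&x, &y, sizeof x) for any two objects of the same type.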
Slow? Maybe. But:
It could probably be made faster by using a static array of unsigned chars as a buffer to swap whole blocks, if there is no need for thread safety.
It can be made thread safe with a thread-local variable, but that works best (avoiding a malloc for each operation) if the threads are reused. That, however, depends on the threading library and is off topic here.
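Roughly, the static array version could look like the sketch below. The name swap_blocks and the chunk size of 256 are my own assumptions, picked only for illustration, and the regions must not overlap:

```c
#include <stddef.h>
#include <string.h>

#define SWAP_CHUNK 256  /* arbitrary buffer size, an assumption for this sketch */

/* Hypothetical helper: swap n bytes between two non-overlapping regions,
   moving up to SWAP_CHUNK bytes at a time through a static buffer.
   Not thread safe, because every caller shares the same buffer. */
void swap_blocks(void *a, void *b, size_t n)
{
    static unsigned char buf[SWAP_CHUNK];
    unsigned char *pa = a;
    unsigned char *pb = b;

    while (n > 0) {
        size_t k = n < SWAP_CHUNK ? n : SWAP_CHUNK;

        memcpy(buf, pa, k);  /* save a chunk of a */
        memcpy(pa, pb, k);   /* overwrite it with the chunk from b */
        memcpy(pb, buf, k);  /* put the saved chunk into b */

        pa += k;
        pb += k;
        n -= k;
    }
}
```

With a C11 compiler, declaring the buffer as static _Thread_local should give each thread its own copy. Whether that is actually faster than the byte loop would have to be measured.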