htons, htonl, ntohs, ntohl


Stephen Sprunk

On 27-Aug-13 08:54, Jorgen Grahn wrote:
Well, other things can be extremely fast too. Compilers can
optimize the p[0] + 256*p[1] example I gave in one of the
earlier threads, and I wouldn't be surprised if someone told me
they already do.

I've seen GCC on x86 recognize and replace the shift/or idiom with
a load (and byteswap, if applicable). It might also
strength-reduce the multiply/add version and then recognize the
idiom, but the code would be clearer to human readers if you used
the idiom in the first place; that's the point of idioms.

I'm puzzled as to why Jorgen's expression isn't idiomatic as it
stands.

p[0] + 256 * p[1]
p[0] + (p[1] << 8)
p[0] | p[1] << 8

Is any one of these more or less idiomatic than the others?

The third one is idiomatic; the others are not, even if they likely
produce the same result.
That aside, the latter two should be faster if the compiler does
nothing clever.

That's probably part of why it became the idiomatic form: it dates from
back before compilers had optimizations like strength reduction.

Beyond that, byte shuffling is obviously a bitwise operation, not an
arithmetic one, so bitwise operators seem more appropriate anyway.

S
 

Jorgen Grahn


You mean yourself. Well, it /looked/ like that to me. IMO you should
have continued in the original thread (possibly changing the subject
line).

Even though you did change the topic somewhat, it was still in the
same area ("foreign" binary data versus C objects), and
the responses tended to overlap.

/Jorgen
 

James Kuyper

You mean yourself. Well, it /looked/ like that to me. IMO you should
have continued in the original thread (possibly changing the subject
line).

Thunderbird shows me only a single thread on this topic; Google Groups
agrees. Your message headers indicate that you're using slrn, while
James Harris' indicate that he's using Microsoft Outlook Express.
Perhaps it's a difference between the different newsreaders?
 

James Harris

James Kuyper said:
Thunderbird shows me only a single thread on this topic; Google Groups
agrees. Your message headers indicate that you're using slrn, while
James Harris' indicate that he's using Microsoft Outlook Express.
Perhaps it's a difference between the different newsreaders?

No, they were separate threads, and deliberately so. One was about alignment
and two about different endianness issues. I posted that way because they
are separate topics. In particular, alignment is not the same as endianness.
Maybe it's a preference thing. I don't like to see a single thread change
topic over time. It makes it hard to refer back to. I take Jorgen's point,
though, that some of the replies could end up overlapping. Though I don't
agree with his characterisation that one thread was restarted three times!

James
 

Jorgen Grahn

Stephen Sprunk said:
On 27-Aug-13 08:54, Jorgen Grahn wrote:
Well, other things can be extremely fast too. Compilers can
optimize the p[0] + 256*p[1] example I gave in one of the
earlier threads, and I wouldn't be surprised if someone told me
they already do.

I've seen GCC on x86 recognize and replace the shift/or idiom with
a load (and byteswap, if applicable). It might also
strength-reduce the multiply/add version and then recognize the
idiom, but the code would be clearer to human readers if you used
the idiom in the first place; that's the point of idioms.

I'm puzzled as to why Jorgen's expression isn't idiomatic as it
stands.

p[0] + 256 * p[1]
p[0] + (p[1] << 8)
p[0] | p[1] << 8

Is any one of these more or less idiomatic than the others?

The third one is idiomatic; the others are not, even if they likely
produce the same result.
That aside, the latter two should be faster if the compiler does
nothing clever.

That's probably part of why it became the idiomatic form: it dates from
back before compilers had optimizations like strength reduction.

Yes ... I remember when I used to write x<<3 instead of x*8 when doing
pure arithmetic, because people said it was faster.
Beyond that, byte shuffling is obviously a bitwise operation, not an
arithmetic one, so bitwise operators seem more appropriate anyway.

It can be seen as something other than byte shuffling. You can say
"a big-endian unsigned 16 bit value is formed by a + 256*b, where a
and b are in [0..256)" and a lot of people would be comfortable seeing
it that way.

Of course a lot of people (including me) are slightly more comfortable
with shifts ...

/Jorgen
 

Stephen Sprunk

That aside, the [shift/or form] should be faster if the compiler
does nothing clever.

That's probably part of why it became the idiomatic form: it dates
from back before compilers had optimizations like strength
reduction.

Yes ... I remember when I used to write x<<3 instead of x*8 when
doing pure arithmetic, because people said it was faster.

Back then, it probably was faster. Unfortunately, it didn't always give
the correct result when applied to negative values, which was a problem.

Today, I would fail code if bitwise operators are used for arithmetic
computations; the compiler can be assumed to apply strength reduction
when/if safe, so it is more important for code to be easily understood
by its _human_ audience, i.e. the guy who will have to maintain your
code years in the future.
Beyond that, byte shuffling is obviously a bitwise operation, not
an arithmetic one, so bitwise operators seem more appropriate
anyway.

It can be seen as something other than byte shuffling. You can say "a
big-endian unsigned 16 bit value is formed by a + 256*b, where a and
b are in [0..256)" and a lot of people would be comfortable seeing it
that way.

Yes, that is technically valid, but I've never seen it presented that
way before. Shift/or is the accepted idiom, and outside of obfuscated
code contests, idioms should be preferred.
Of course a lot of people (including me) are slightly more
comfortable with shifts ...

Perhaps it's just how I was taught, but bitwise operators are applied to
strings of bits while arithmetic operators are applied to numbers. Even
though I know numbers are _represented_ by strings of bits, they are
completely different conceptual domains to me.

When I think of moving something from bits 7..0 to bits 15..8, that is a
bitwise operation, so I use a bitwise operator. Even just reading your
example with arithmetic operators, it took several seconds for me to
figure out what you were doing--and it gave me a headache.

S
 

Joe Pfeiffer

Not _really_. The man page on my computer says: "These routines
convert 16 and 32 bit quantities between network byte order and host
byte order.". But a value doesn't have a byte order. Only its
representation has a byte order. Both ntohl and htonl convert values
to values. Now on every computer I have ever used, ntohl and htonl
perform exactly the same operation, so using the wrong one is no
problem. But what if a computer has a byte order where that isn't the
case, for example where 0x01020304 is stored as four bytes 2, 3, 4,
1?

OK, your quibble about the difference between a value and its
representation is noted and you are correct, but I'd be very surprised
if anybody out there has ever been confused by it.

If the host computer uses some weird order like you suggest, then the
routines should do the right thing; in this case, ntohl and htonl
wouldn't do the same thing any more, and a program that had a
long-standing bug of using the wrong one would start messing up.
It's unfortunate when a bug exists but never gets triggered in the
current environment, but it happens.

A story I've told on myself many times is that for many years I thought
an empty string could be represented by a null pointer -- because I was
doing all my development on a VAX, and the byte at address 0 was
user-readable, and happened to contain a 0. When I moved to Sun
workstations, I had a *lot* of code start getting segfaults. The bugs
had always been there, they just hadn't been triggered.
These functions should really have been defined in terms of a 32 bit
value vs. an array of four bytes, not in terms of two 32 bit values.

Why? They convert a value on your host into something that can be
passed to the various functions that need network order, and back.
Really, once you've got something in network order you should treat it
as opaque, and convert it back if you want to look at it. I don't see
any particular advantage to converting to an array of four bytes, and of
course there is the potential problem of hosts whose bytes aren't eight
bits.
 

Jorgen Grahn

anything). Note that this was in 1987. Today the only effect of
"register" is that you can't use the address operator anymore.

Yes -- if you see 'register' today you assume the code is very old,
or if it's not, that the programmer is.

/Jorgen
 

Jorgen Grahn

No, they were separate threads, and deliberately so. One was about alignment
and two about different endianness issues. I posted that way because they
are separate topics. In particular, alignment is not the same as endianness.
Maybe it's a preference thing. I don't like to see a single thread change
topic over time. It makes it hard to refer back to. I take Jorgen's point,
though, that some of the replies could end up overlapping.

Yes: alignment and endianness problems both have the same solution
IMO, so I (and others, I think) ended up repeating myself in at least
two of the threads.
Though I don't
agree with his characterisation that one thread was restarted three times!

Right, that was too strongly stated by me. You aimed for clarity.

/Jorgen
 

Jorgen Grahn

(e-mail address removed) writes: ....

Why? They convert a value on your host into something that can be
passed to the various functions that need network order, and back.
Really, once you've got something in network order you should treat it
as opaque, and convert it back if you want to look at it.

I suppose he's saying that if the "various functions that need network
order" didn't, there would be no need for htons & friends. (And no
need to remember what's opaque and what's not).

There's a trap hidden in htons & co: you /do/ need them to handle
endianness in the BSD socket API ... but they are not suitable for
handling general endianness issues, because they tend to come together
with alignment issues. E.g.

ntohl(*(int*)(buf+n));

/Jorgen
 

glen herrmannsfeldt

(snip)
I suppose he's saying that if the "various functions that need network
order" didn't, there would be no need for htons & friends. (And no
need to remember what's opaque and what's not).
There's a trap hidden in htons & co: you /do/ need them to handle
endianness in the BSD socket API ... but they are not suitable for
handling general endianness issues, because they tend to come together
with alignment issues. E.g.
ntohl(*(int*)(buf+n));

Yes, don't do that. If there is question about the alignment,
memcpy() to an appropriately aligned place and ntohl() that.

But that takes two statements instead of one.

-- glen
 

Joe Pfeiffer

Jorgen Grahn said:
I suppose he's saying that if the "various functions that need network
order" didn't, there would be no need for htons & friends. (And no
need to remember what's opaque and what's not).

You snipped some context from the post I responded to: I thought it was
pretty clear he was still talking about htons et al.

I said early in the thread that I regard the fact network order is
exposed to the user program in the networking API to be a flaw in the
API. But that's a different question from how htons and friends work.
 

Jorgen Grahn

(snip)

Yes, don't do that. If there is question about the alignment,
memcpy() to an appropriately aligned place and ntohl() that.

But that takes two statements instead of one.

Yes -- and that's when I prefer not to use ntohl at all.

/Jorgen
 
