Ping - Other Misleading Non-Bugs

published: [nandalism home] (dark light)

Network Code and Byte Order

skip this section if you know what htonl() and ntohl() are for.

If you've ever done low-level, network programming, you will be aware of endianness (or the different order in which different machines store the bytes of their integers). Since networks send data from one machine to another, these differences become important.

A common beginner's mistake is to read an integer from a network-read byte buffer like this:

uint32_t u;
byte *p = ...

u = *(uint32_t*)(p); // (1) this will cause a bus error on CPU's which don't allow misaligned memory reads (but works on intel)
memcpy(&u, p, sizeof(u)); // (2) this solves the alignment problem. All is well?

Both of these solutions may work fine on your machine. It is only when you start testing your network code on machines with another CPU (alignment errors) or endianness that things break.

With endianness/byte-order differences you will put in a uint32_t on one end, say 16909060, and it will come out on the other machine as 67305985. This seems apparently random until you look at these numbers in hex. 0x1020304 and 0x4030201. See the difference? The bytes are reversed. This is endianness. It depends on the CPU running on the machine you are using.

The solution is to use the network/host conversion functions which are written on each machine to convert between network order (always big endian, for historic reasons) and native order (which may be big or little endian). These functions may be identity functions (on big endian machines) or may actually reverse the byte order (on little endian machines like intel). They are alway just synonyms but the 2 names document what we are doing, host->network or network->host.

// from network to host/machine
memcpy(&u, p, sizeof(u)); // this solves the alignment problem. All is well?
u = ntohl(u);

// from host/machine to network
u = htonl(u);
memcpy(p, &u, sizeof(u)); p+=sizeof(u);

There is another, better solution, than having htonl() and ntohl(), which is portable and agnostic of CPU architecture (see later).

That was all a preamble to this bug I suspected in linux iputils::ping source code.

Linux IPUtils Bug?

On linux the iputils package contains the source code for the network tool ping. In this code is what appears to be the naieve memcpy bug. The uint32 (ipv4 address) is read from a network packet and then used, without being converted using ntohl().

$ cat ping/ping.c
// function pr_options
  byte* cp = ...;
  uint32_t address;
  memcpy(&address, cp, 4); // <---- typical BUG!
  struct sockaddr_in sin = {
    .sin_family = AF_INET,
    .sin_addr = { address }

Could this really be a bug? Surely the code is being used all the time. Can you see why it's not? The address is out into a sockaddr_in structure and that is one of the few places in your code which requires the address in network byte order. So this is perfectly correct, the integer is read in network byte order, and then later put into a socket-related struct, which requires network byte order. All is well. The integer is never used on the host.

In fact, it turns out that this is a clever fix for another non-bug, in the openbsd ping sources (see below), which I actually "spotted" before this one.

Don't Mislead

The memcpy mistake is so common in network code that I think it should have been noted. Not with a comment but with a varname, like this:
  uint32_t netorder_addr;
  memcpy(&netorder_addr, cp, 4);
  struct sockaddr_in sin = {
    .sin_family = AF_INET,
    .sin_addr = { netorder_addr }

This shows that the author knows about network order and hints why this is actually correct.

OpenBSD Ping Bug?

This "bug" is more egregious than the previous one. It does a lot of complicated-looking work do achieve a nop. It does however show a neat piece of portable code for reading network integers into host machines which is guaranteed to give them in host order regardless of the CPU. This code doesn't need to be changed for different machines (it's portable) so there is no need for ntohl/htonl native implementations for every platform.
// function pr_ipopt (equivalent to iputils pr_options)
  l = *++cp;
  l = (l<<8) + *++cp;
  l = (l<<8) + *++cp;
  l = (l<<8) + *++cp; // <---- (1) read as host order
  if (l == 0)
  else {
    s_in.sin_addr.s_addr = ntohl(l); // <---- (2) change it back to network order
      pr_addr((struct sockaddr*)
      &s_in, sizeof(s_in)));

This code reads a uint32_t from a network-packet (byte buffer). Integers on the network are always big-endian i.e. most significant byte first. The first byte read ends up being shifted all the way to the top (first byte => most significant), and similarly for the other bytes. No matter the CPU, this code will always read the network integer 0x01020304 as 0x01020304 into l.

uint32_t l;
l = *++cp;
l = (l<<8) + *++cp;
l = (l<<8) + *++cp;
l = (l<<8) + *++cp;

We have read 'l' properly and it is now host-order. So, why are we then calling network-to-host-long (ntohl) to convert it to host order again?

Since ntohl == htonl == reverse_bytes we are now converting the number back to network order. So, why does it work? Again, the reason is that the use-site is a socket struct address, which requires network order integers.

The problem is that we already had the integer in network byte order in the network packet. We didn't need to convert to host and then convert it back. The integer is not used elsewhere, only in the socket struct.

Also ntohl should have been htonl for clarity reasons (they are synonyms so it doesn't matter). I suspect 2 different people wrote these 2 pieces of code. The first using the sophisticated, portable host-order read trick would not have made the mistake of using ntohl instead of htonl. I suspect that was a reflex fix when coder-2 noticed the number was the wrong way around.

All this code can be replaced with memcpy(&l, cp, 4); cp+=4;, and that's exactly what the linux version of ping does. Maybe if the original code used hostorder_addr and properly htonl the nop-ness of the code would have become obvious. At least it wouldn't look like a double-convert bug.

site built using mf technology