Alan Curry
Malcolm McLean wrote something similar upthread. Do you have any
references for this?
Ancient Unix source code is available. And it's not too big to grep. Let's go
mythbusting.
First observation: strncpy is present only in V7, not any earlier versions.
The item most commonly accused of being the original reason for strncpy is
the directory entry, which is defined like this:
#ifndef DIRSIZ
#define DIRSIZ 14
#endif
struct direct
{
    ino_t d_ino;
    char d_name[DIRSIZ];
};
That definition appears in 2 different header files, one which is used in the
kernel and one which is used in userspace. The headers are identical. In
fact, the above 8 lines are the entire contents.
This struct represents the format of a directory entry. There is no
distinction between the external format on permanent storage and the internal
format used by the system. There was no need for such a distinction because
there was only one supported filesystem type.
If the strncpy hypothesis is true, the original use of strncpy should have
been to copy from a \0-terminated string into a d_name field. There are 10
uses of strncpy in V7. (11 if you also include lpr from the "Addenda" tape.
But let's not, because it came later.)
Results of grepping for strncpy, after removing instances that were not
actually calls to strncpy:
usr/src/cmd/atrun.c: strncpy(file, dirent.d_name, DIRSIZ);
usr/src/cmd/crypt.c: strncpy(buf, pw, 8);
usr/src/cmd/ed.c: strncpy(buf, keyp, 8);
usr/src/cmd/expr.y: strncpy(Mstring[0], p, num);
usr/src/cmd/login.c:#define SCPYN(a, b) strncpy(a, b, sizeof(a))
usr/src/cmd/login.c: SCPYN(utmp.ut_name, "");
usr/src/cmd/login.c: SCPYN(utmp.ut_name, argv[1]);
usr/src/cmd/login.c: SCPYN(utmp.ut_line, index(ttyn+1, '/')+1);
usr/src/cmd/mkdir.c: strncpy(pname, d, slash);
usr/src/cmd/ranlib.c: strncpy(firstname, arp.ar_name, 14);
usr/src/cmd/xsend/lib.c: strncpy(buf, s, 10);
Some of the calls were through the SCPYN macro so I also included that as a
grep target.
First notice that all the matches are in usr/src/cmd, not in usr/sys where
the kernel source is. I expected the primeval strncpy to be in the kernel,
perhaps in the creat call where a \0-terminated string from userspace must be
used to populate a new directory entry. Nope.
Well, at least the first match (atrun.c) is working on a d_name. Yay! A
confirmation! Wait a minute, what's the argument order for strncpy again?
Destination first. Crap! A non-confirmation. With additional context, this
looks like exactly the kind of sloppy usage of strncpy that we now try to
avoid. The destination, "file", is declared like this:
char file[DIRSIZ+1];
And after the strncpy we find the usual fixup:
strncpy(file, dirent.d_name, DIRSIZ);
file[DIRSIZ] = '\0';
And then "file" is used as a \0-terminated string. It didn't need the extra
padding. strncpy is doing something useful here though, protecting against
an unterminated source buffer. That's probably as close as we're going to get
to confirming the strncpy hypothesis, since none of the rest of the uses
involve d_name.
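To make the atrun.c pattern concrete, here is a minimal sketch of it in isolation. It is not the V7 code: name_from_dirent is a hypothetical helper, and DIRSIZ is assumed to be 14 as in the header quoted above.

```c
#include <assert.h>
#include <string.h>

#define DIRSIZ 14

/* Hypothetical helper: copy a d_name-style field (which has no
 * terminator when the name is exactly DIRSIZ bytes long) into a
 * buffer of DIRSIZ+1 bytes and make it a proper C string. */
void name_from_dirent(char file[DIRSIZ + 1], const char d_name[DIRSIZ])
{
    assert(file != NULL && d_name != NULL);
    strncpy(file, d_name, DIRSIZ);  /* stops at a \0 or after DIRSIZ bytes */
    file[DIRSIZ] = '\0';            /* the fixup: guarantees termination */
}
```

The only strncpy feature this relies on is that it never reads past DIRSIZ bytes of the source. The padding it writes for short names is harmless but unused.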
In crypt.c we have
char buf[13];
and
strncpy(buf, pw, 8);
In this case, the source string is a user-supplied password, which is
\0-terminated, so strncpy is not protecting against an unterminated source.
It is, however, truncating the source if it is longer than 8 characters. And
if the password is less than 8 bytes long, the padding will actually be
relevant. The buffer is required to contain an 8-byte "key". This use of
strncpy actually needs all of its features.
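A small sketch of the truncate-and-pad behaviour described here. make_key and KEYLEN are hypothetical names, and the destination is assumed to be exactly 8 bytes for simplicity (the real buf in crypt.c is larger).

```c
#include <assert.h>
#include <string.h>

#define KEYLEN 8

/* Hypothetical helper: build a fixed-width key field from a
 * \0-terminated password. Long passwords are truncated to KEYLEN
 * bytes; short ones are padded with \0 up to KEYLEN bytes. */
void make_key(char key[KEYLEN], const char *pw)
{
    assert(key != NULL && pw != NULL);
    strncpy(key, pw, KEYLEN);
}
```

With "abc" the field ends up as 'a', 'b', 'c' followed by five \0 bytes. With a 10-character password only the first 8 characters survive and the field holds no terminator at all, which is fine for a fixed-width key but not for a C string.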
In ed.c there is basically a copy of the same function from crypt.c, for
editing encrypted files.
In expr.y we have what looks like another sloppy strncpy. It has the fixup:
strncpy(Mstring[0], p, num);
Mstring[0][num] = '\0';
But the length "num" is not related to the size of the destination buffer.
It's the length of the portion of the source buffer that matched the \(...\)
subexpression of a regexp, which will be saved into Mstring[0]. Mstring is
declared like this:
char Mstring[1][128];
so there's a potential buffer overflow here if you use \(...\) to match a
string of 128 bytes or more (at exactly 128 the copy fits but the terminator
lands one past the end). Maybe it's impossible to pass a string that long to
the program; I don't know. Moving on...
login.c looks good. The SCPYN macro ties the strncpy length limit to the
destination buffer size, and the destinations are ut_name and ut_line, both
of which are fixed-size buffers that need to be padded. This might actually
be a better case than the atrun.c usage. It fits everything we expected to
find except that it's utmp, not a directory.
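Here is a rough sketch of how a macro like SCPYN behaves on fixed-width record fields. struct rec, its field widths, and fill_rec are assumptions made for illustration; they are not the V7 struct utmp.

```c
#include <assert.h>
#include <string.h>

/* The macro as quoted from login.c. It only works when a is an
 * array, not a pointer, because sizeof(a) must be the field size. */
#define SCPYN(a, b) strncpy(a, b, sizeof(a))

/* Hypothetical record; the field widths here are assumptions. */
struct rec {
    char line[8];
    char name[8];
};

/* Fill both fields: each is truncated or \0-padded to its own width. */
void fill_rec(struct rec *r, const char *line, const char *name)
{
    assert(r != NULL);
    SCPYN(r->line, line);
    SCPYN(r->name, name);
}
```

Because every byte of each field is written, two records built from the same strings compare equal with memcmp, which is the property you want for a fixed-format file.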
mkdir.c is next, and it's an exciting candidate, isn't it? Especially if you
remember that mkdir wasn't a syscall yet, so the userspace mkdir program was
actually setuid root and worked at a low level. Not quite low enough to
operate on a struct direct though. What's happening here
strncpy(pname, d, slash);
is, like the expr.y usage, copying a substring of the source string, which is
taken directly from main's argv, so it's \0-terminated. strncpy is not
protecting against an unterminated source, and it's not truncating a long
source, so its only possibly useful feature is padding. Nope. After the
strncpy, pname is treated as a \0-terminated string. The fixup is hidden in a
strcpy this time, but it's still there:
if(slash)
    strncpy(pname, d, slash);
strcpy(pname+slash, ".");
Again there's a potential buffer overflow if the source string is longer than
128 bytes.
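For comparison, here is a sketch of the same prefix-plus-"." construction with an explicit bounds check. prefix_dot is a hypothetical helper, not anything in V7, and it swaps strncpy for memcpy since neither padding nor truncation is wanted here.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: write the first `slash` bytes of d followed by
 * "." into pname. Returns 0 on success, -1 if pname is too small. */
int prefix_dot(char *pname, size_t size, const char *d, size_t slash)
{
    assert(pname != NULL && d != NULL && slash <= strlen(d));
    if (slash + 2 > size)           /* prefix + "." + terminator */
        return -1;
    memcpy(pname, d, slash);        /* no padding or truncation is needed */
    strcpy(pname + slash, ".");     /* also supplies the terminator */
    return 0;
}
```

Called with d = "a/b" and slash = 2 it produces "a/."; with slash = 0 it produces just ".".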
ranlib.c looks strange. I don't know exactly what it's doing, since I don't
know anything about the ar format. But the strncpy destination is
char firstname[17];
and the strncpy is
strncpy(firstname, arp.ar_name, 14);
The mismatched sizes (17 and 14) seem a bit weird. Later, firstname is used
as a \0-terminated string. So if
the source string arp.ar_name was shorter than 14 bytes, the first padding
byte added by strncpy will be the terminator and the rest will be
unnecessary. If the source string was 14 bytes, the terminator will be the \0
found at firstname[14]... not courtesy of strncpy or any post-strncpy fixup,
but just because it's in the bss and it never gets modified.
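The reliance on zero-initialized storage can be shown with a small sketch. save_first and NAMELEN are hypothetical names; only the sizes 17 and 14 come from the ranlib.c excerpts above.

```c
#include <assert.h>
#include <string.h>

#define NAMELEN 14

/* File-scope arrays without an initializer start out as all zero bytes. */
static char firstname[17];

/* Hypothetical helper: copy a 14-byte name field that may lack a
 * terminator. Nothing here writes firstname[14]; it is still \0 only
 * because the array was zero-initialized and never modified there. */
const char *save_first(const char name[NAMELEN])
{
    assert(name != NULL);
    strncpy(firstname, name, NAMELEN);
    return firstname;
}
```

If firstname were an automatic (stack) array instead, the 14-byte case would leave it unterminated, so this only works by accident of where the buffer lives.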
xsend/lib.c looks like another copy of the encryption key setup code found in
crypt.c and ed.c using a slightly different key generation method that uses
up to 10 characters of the user-supplied password.
I've looked in the V6 source for the code corresponding to each strncpy call
in V7. Most of them (at, expr, ranlib, xsend, and the encryption ability of
ed) don't exist in V6. mkdir was pure assembly in V6. The crypt program looks
like a total rewrite. login is the only one that had a direct equivalent.
struct utmp in V6 was different, using only a single char to identify the
tty, but ut_name was there (called simply "name") and it appears plausible
that strncpy and SCPYN were added specifically to simplify the existing code,
which copied and padded the array with manual loops.
And one last thing... if strncpy wasn't used to populate the d_name field,
how was it done? Well, the kernel creates directory entries in response to
user requests like creat and mknod and link. All of those eventually end up
calling wdir() in usr/sys/sys/iget.c which does this:
bcopy((caddr_t)u.u_dbuf, (caddr_t)u.u_dent.d_name, DIRSIZ);
So a d_name is created as an exact copy of a u_dbuf, which is declared like
this:
char u_dbuf[DIRSIZ]; /* current pathname component */
and must already be properly padded. That was done before the wdir() call by
namei() in usr/sys/sys/nami.c and here's the answer:
(at this point, c is either the first character of a pathname or a non-slash
character that was found after a slash)
cp = &u.u_dbuf[0];
while (c != '/' && c != '\0' && u.u_error == 0 ) {
    if (mpxip!=NULL && c=='!')
        break;
    if(cp < &u.u_dbuf[DIRSIZ])
        *cp++ = c;
    c = (*func)();
}
while(cp < &u.u_dbuf[DIRSIZ])
    *cp++ = '\0';
Characters are read one at a time from the source string by calling a
callback function (*func). It does that because the source string may or may
not be in userspace, and userspace strings can't be directly addressed.
Afterward, the padding is done with a loop.
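A standalone sketch of that read-through-a-callback-then-pad loop follows. It is a simplification under stated assumptions, not the kernel code: cursor, next_char and read_component are hypothetical names, and the mpxip and u.u_error checks are left out.

```c
#include <assert.h>
#include <string.h>

#define DIRSIZ 14

/* Hypothetical character source standing in for (*func)(): returns
 * the next byte of a string tracked by a static cursor, and keeps
 * returning \0 once the end is reached. */
static const char *cursor;
static int next_char(void) { return *cursor ? *cursor++ : '\0'; }

/* Read one pathname component through the callback into a fixed
 * DIRSIZ-byte field: excess characters are consumed but dropped,
 * and a short name is padded with \0. Returns the character that
 * ended the component ('/' or '\0'). */
int read_component(char dbuf[DIRSIZ], int (*func)(void))
{
    char *cp = dbuf;
    int c;

    assert(dbuf != NULL && func != NULL);
    c = (*func)();
    while (c != '/' && c != '\0') {
        if (cp < dbuf + DIRSIZ)
            *cp++ = (char)c;
        c = (*func)();
    }
    while (cp < dbuf + DIRSIZ)
        *cp++ = '\0';
    return c;
}
```

The end result is the same truncate-and-pad behaviour strncpy gives, but fed one character at a time, which is why a plain strncpy call could not have been used at this point.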