Remote-Url: https://utcc.utoronto.ca/~cks/space/blog/unix/Argv0IsEasy
Retrieved-at: 2022-01-30 12:45:46.103484+00:00

Famously, Unix passes arguments to programs in the argv[] array,
and the first entry in the array is the 'name' of the program being
run (the 0th element, since C arrays index starting from 0, hence
'argv[0]'). Recently, a whole bunch of people have found out that
argv[0] doesn't even have to be there due to CVE-2021-4034.
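To make that concrete, here is a minimal sketch of the calling side. The helper name `exec_with_name` is made up for illustration; `execv()` is the standard POSIX call. Whatever the caller puts in the first slot of the array is what the new program sees as argv[0], and passing a NULL name builds an empty argument list. How the system treats the empty list varies: some newer kernels substitute an empty string rather than hand the program an argc of 0, so treat that part as an assumption about your platform.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: run `path` with `name` as its argv[0] and no other
 * arguments. If name is NULL the argument list is empty, so the new
 * program may start with argc == 0 (some newer kernels substitute an
 * empty string instead).
 * Returns -1 if execv() fails; on success it does not return. */
int exec_with_name(const char *path, const char *name)
{
    /* The kernel does not fill in argv[0] from the path; the caller
     * chooses it. execv() takes char *const[], hence the cast. */
    char *argv[2] = { (char *)name, NULL };
    return execv(path, argv);
}
```

Nothing here ties `name` to `path`, which is exactly why a program cannot assume argv[0] is trustworthy or even present.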
One thing you could reasonably ask in the wake of this security issue
is why this is even the case. Why doesn't Unix force argv[0] to always
have a value? Does Unix have some deep reason why this API is the way
it is?

The unsatisfying answer is that the argv[0] API exists because
it's easy (well, almost certainly), like a great many things in
Unix. It's not necessarily a good idea, but once you decide that
on the one hand the shell (and other programs that run other programs)
should pass the argument array to the kernel and on the other hand
a program should be able to get at its name (or at least the name
it's being run under), putting the program name as argv[0] and
making the program invoking exec() supply it is the simple approach.
The kernel simply copies a few C
arrays of strings into the new program's memory space. It doesn't
have to compose together some kernel information (the program being
run) and some user level information (the argument list and the
environment), or provide an additional API to provide the program's
name.

(And then once you could manipulate the name that programs were run
under, people took advantage of this as an API. For example, the
traditional way that a Unix shell knew it was being run as a login
shell and so should source your .profile was that its argv[0]
started with a '-' (a dash). All of this was a user level convention
that the kernel didn't have to get involved in; it was purely between
login and the shell.)

Research Unix was a small and simple system,
which often took the easy approach both in implementation and (often)
in APIs. Many of the APIs are there not necessarily because they
are great but because they were simple and easy, and some of them
have wound up with problems over time (one example is errno, which
is now quite complicated behind the scenes). So while
there are certainly good things you can say about the Unix argv API,
those good things are probably not the reason it exists in
this form. The most likely reason it exists is that it's a simple,
easy way to get that combination of features with little effort and
kernel code.

(The kernel API is not really the API that C programs see, either,
but the core elements are more or less the same.)

This is not to say that we should keep the full details of the
argv API today. My personal view is that argv[0] should always
exist. Because the whole argv API is partially implemented at user
level in early program startup, this wouldn't actually take any
kernel changes. On a modern Unix system, you could make the
dynamic loader look for argc being 0 and pass main() a tiny little
one-element argv with an empty string as the program name.

(This wouldn't necessarily protect programs written in languages
with their own runtimes, like Go, but you can only do so much the
easy way.)

PS: The use of argv[0] for the name of the program goes back at
least as far as V3's exec(2). I
don't think we have manpages for V2 and V1 (if we do, they're not
on www.tuhs.org), and I'm not energetic
enough to dig through their shell source code (if we have that).
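
As a footnote to the dynamic loader idea above, here is a rough sketch of what such a fixup might look like if written as ordinary C. The function name `ensure_argv0` is hypothetical, and a real loader or C runtime would do this in its own startup code before main() runs; this only illustrates the logic of swapping in a one-element argv with an empty program name.

```c
#include <stddef.h>

/* Hypothetical fixup: if the caller supplied no argv[0], substitute a
 * one-element argv whose program name is the empty string. A normal
 * argument list is left untouched. */
void ensure_argv0(int *argc, char ***argv)
{
    static char empty_name[] = "";
    static char *fallback[] = { empty_name, NULL };

    if (*argc < 1 || *argv == NULL || (*argv)[0] == NULL) {
        *argc = 1;
        *argv = fallback;
    }
}
```

A program could call `ensure_argv0(&argc, &argv)` as the first statement of main() to get the same guarantee today, at the cost of every program having to remember to do it.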