argv and argc, and just how to get them

Monday, October 12th, 2009

argv and argc. They are vital parts of many programs. For the uninitiated, in many programming languages, they are the variables that hold the values typed into the command line by the user. This article is about a particular problem we ran into trying to use them in a way that wasn’t really foreseen. It really only applies to C and C++.

Now most C/C++ programmers are probably thinking ‘what is so hard, you get them when you start main, right’…

int main(int argc,char **argv,char **envp)

That’s true, and in virtually all cases, you can store these values, either in a class, a global variable, a function, whatever your programming style uses. And from then on, you can simply access them from anywhere in the program. It’s really simple, and something that most programmers know how to do in their sleep.
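As a minimal sketch of that usual approach in C: main hands the pointer to a setter once, and everything else reads it back through a getter. The names set_argv and get_argv match the ones used in the examples in this post, but the bodies here are an assumption for illustration only.

```c
#include <stddef.h>

/* File-scope storage for the pointer main() was given.
   It stays NULL until set_argv() has been called. */
static char **stored_argv = NULL;

/* Call once, at the top of main(). */
void set_argv(char **argv)
{
    stored_argv = argv;
}

/* Callable from anywhere; returns NULL if main() has not stored it yet. */
char **get_argv(void)
{
    return stored_argv;
}
```

In main this would just be set_argv(argv); as the first statement.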

But what do you do before main happens? Or what about if you don’t have a main()?

I know, you always have a main and nothing runs before it.

But that is very much not true. Let’s take two examples.

Firstly, in C++

class gamedata
{
public:
  gamedata()
  {
    commandline=get_argv();
  }
  char **commandline;
};

static gamedata datastore;

int main(int argc,char **argv)
{
   set_argv(argv);
   ...
}

To start, this looks OK. Sure, I’ve missed out a whole bunch of things, and assumed what the user functions set_argv and get_argv do. But it looks fine. Until you look again. That static class instance will run its constructor before main runs. How on earth can you possibly get argv at this stage?

Let’s look at an example in C now.

If you are creating a shared object, you will very often need to initialise it to ensure that its state is known for the first call into the object’s functions.

static char **commandline;

void __attribute__ ((constructor)) localinit()
{
  commandline=get_argv();
}

Again, you try this, and you get a big nothing. This function is called before main runs, before you ever have access to the values.
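To see the ordering problem for yourself, here is a small sketch that records what the constructor saw. It assumes GCC or Clang (for the constructor attribute) and the simple store-and-fetch versions of set_argv and get_argv; the two inspection helpers at the bottom are hypothetical names added only for this demonstration.

```c
#include <stddef.h>

static char **stored_argv = NULL;          /* filled in by main(), much later */
static int constructor_ran = 0;
static char **seen_by_constructor = NULL;

void set_argv(char **argv) { stored_argv = argv; }
char **get_argv(void)      { return stored_argv; }

/* GCC/Clang extension: runs during startup, before main() is entered. */
void __attribute__ ((constructor)) localinit(void)
{
    constructor_ran = 1;
    /* main() has not had a chance to call set_argv() yet */
    seen_by_constructor = get_argv();
}

/* Hypothetical helpers so the result can be inspected after startup. */
int constructor_has_run(void)         { return constructor_ran; }
char **argv_seen_by_constructor(void) { return seen_by_constructor; }
```

By the time main starts, the constructor has already run, and all it got back was NULL.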

So, what is the answer?

The answer is, unfortunately, ugly. There is, in Linux, no good way of doing this. There are two perfectly good symbols inside libc which do the job, __libc_argv and __libc_argc, and they are set well before the program gets to any user-created code. Unfortunately they are private to libc and you as a user are not permitted to see them. So, another way is needed.

We came up with two ways to make it work. Neither is portable, and one of them, while it works just fine, does make me go ewww. I’ll leave it to your imagination which one makes me go ewww the most.

char **get_argv()
{
  static char **preset_argv=NULL;

  if (!preset_argv)
  {
    FILE *fp;
    fp=fopen("/proc/self/cmdline","r");
    if (fp)
    {
      //Your implementation goes here: read the file and turn it into
      //argv. The arguments in the file are separated by NUL bytes.
      fclose(fp);
    }
  }
  return preset_argv;
}

char **get_argv()
{
  static char **preset_argv=NULL;

  if (!preset_argv)
  {
    extern char **environ;
    char ***argvp;
    void *p;
    //Read the stack pointer. As written this is 32-bit x86 only.
    asm ("mov %%esp, %0" : "=r" (p));
    argvp = p;
    //Walk up the stack until we reach the slot holding environ.
    //There is no bounds check, so this crashes if it is never found.
    while (*argvp != environ)
      argvp++;
    //The slot just before it holds argv
    argvp--;
    preset_argv = *argvp;
  }
  return preset_argv;
}

So, a little explanation.

The first example relies on /proc, the part of the filesystem that you can get all sorts of interesting information from. /proc/self/cmdline holds the arguments the program was started with, whether they came from a command typed at a shell or from the menu option you clicked to get it working. It is not the raw line as typed: the arguments are stored one after another, each terminated by a NUL byte, after the shell has already dealt with quotes and other fun stuff. I haven’t bothered to include the bit of code that separates the contents out into their components, because it is fairly straightforward and a little dull; just remember to split on NUL bytes, not on spaces. This is not portable beyond Linux, and people keep telling me that not every Linux system has /proc available either.
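For completeness, here is one way the missing piece might look. It is a sketch, not the code we actually used: the function name read_proc_cmdline and its argc_out parameter are made up for this example, and it assumes /proc is mounted. It reads the whole file into one buffer and builds an argv-style array of pointers into that buffer.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a NULL-terminated argv-style array built
   from /proc/self/cmdline, or NULL on failure. If argc_out is not NULL
   it receives the number of arguments. The memory is never freed,
   because the result is meant to live as long as the process. */
char **read_proc_cmdline(int *argc_out)
{
    FILE *fp = fopen("/proc/self/cmdline", "r");
    if (!fp)
        return NULL;

    size_t cap = 4096, len = 0, n;
    char *buf = malloc(cap);
    if (!buf) { fclose(fp); return NULL; }

    /* Read until EOF, doubling the buffer whenever it fills up.
       One byte is always kept spare for a final terminator. */
    while ((n = fread(buf + len, 1, cap - len - 1, fp)) > 0) {
        len += n;
        if (len == cap - 1) {
            char *tmp = realloc(buf, cap * 2);
            if (!tmp) { free(buf); fclose(fp); return NULL; }
            buf = tmp;
            cap *= 2;
        }
    }
    fclose(fp);
    buf[len] = '\0';   /* guard in case the last argument is unterminated */

    /* Each NUL byte ends one argument (an empty argument is just a NUL). */
    size_t count = 0;
    for (size_t i = 0; i < len; i++)
        if (buf[i] == '\0')
            count++;
    if (len > 0 && buf[len - 1] != '\0')
        count++;

    char **argv = malloc((count + 1) * sizeof *argv);
    if (!argv) { free(buf); return NULL; }

    /* Point each argv entry at the start of its string in the buffer. */
    size_t idx = 0;
    char *p = buf;
    while (p < buf + len && idx < count) {
        argv[idx++] = p;
        p += strlen(p) + 1;
    }
    argv[idx] = NULL;

    if (argc_out)
        *argc_out = (int)idx;
    return argv;
}
```

Inside get_argv this would replace the fopen block: call it once and keep the result in preset_argv.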

The second example requires a tiny bit of assembler knowledge. It is portable across most Unix flavours, although as written the inline assembler reads esp, so it only builds for 32-bit x86. I expect trying it outside of Unix will cause you much pain. It relies on the Unix convention of pushing the argc and argv values right onto the top of the stack when a program starts. The example reads up from the current stack pointer until it finds the environ value (which is globally available at all times), and then steps back one value to get the pointer to argv. The values you get this way are the real argv, already split into separate arguments. It makes the assumption that environ is the next thing on the stack after argv. I have seen many reports claiming this is always true, but I cannot find any authoritative documentation saying that it is specified and will not change in future.
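One further option, separate from the two methods above and just as non-portable, deserves a hedged mention. glibc calls each constructor function with the same argc, argv and envp that main will later receive, so a constructor can simply declare those parameters. Treat this as glibc behaviour rather than a guarantee: other C libraries may call constructors with no arguments at all, in which case the parameters hold garbage. The names capture_args, get_early_argc and get_early_argv are hypothetical.

```c
#include <stddef.h>

static int early_argc = 0;
static char **early_argv = NULL;

/* ASSUMPTION: glibc only. glibc passes (argc, argv, envp) to every
   constructor; do not rely on this with any other C library. */
void __attribute__ ((constructor)) capture_args(int argc, char **argv, char **envp)
{
    (void)envp;            /* not needed here */
    early_argc = argc;
    early_argv = argv;
}

/* Hypothetical accessors for the values captured before main(). */
int get_early_argc(void)    { return early_argc; }
char **get_early_argv(void) { return early_argv; }
```

If it works on your system, the values are already there by the time any later constructor or main itself asks for them.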

It isn’t often that you will need to do this. Most of the time, argv and argc are perfectly usable in the way you will probably have been using them for years. Even the examples above can be ‘worked around’ using initialisers called immediately after main() starts. But if one day, you come up against a problem where you need your argv where you usually don’t have access, I hope you find this post useful.
