Tuesday, May 23, 2006

Fun with global constructors

(Note: for the purpose of this discussion, "global" objects means:

int a;
static int b;
class foo {
static int c;
};
int foo:c;
void func()
{
static int q;
}

For our discussion, a, b and c are "globals" but "q" is not. While all of these will have static storage allocated for them, a b and c will be initialized during program startup; q will be iniitalized the first time func() runs - possibly never! I will have to rant on how the word static has 3 syntactical meanings and at least that many language meanings some other time.)

The rules for the construction of C++ global objects go something like this:

  • Plain old data (read: int = 0) get initialized before dynamic data (int = some_func(), map). Basically things that can be inited just by splatting their memory are initialized before any code is run.
  • Within a translation unit, dynamic initialization goes in order of the file.
  • Between translation units, dynamic initialization can happen in any order. (Essentially the compiler has no idea which file comes "first".)

That’s enough to make us miserable right there: because the order of static initialization is variable between files, it means that if we have an "API" in a translation unit that requires global data to function, we can’t use it before main() is called because our static initialization code might be running before the API’s. We have no way to control this.

But this is C++ - three rules can’t be everything when it comes to static initialization, right?

  • Dynamic initialization does not have to happen before "main". But it does have to happen before non-initialization code in that translation unit gets called. So going back to our "API" - if we have a global, it will be initialized before the API is used, but it may or may not be initialized before main.
  • Dynamic initialization can be replaced with static initialization (splatting memory) if:
    1. That initialization doesn’t have any side effects on other initialization and
    2. The compiler can figure out what values thet dynamic initialization would have produced under some cirumstances.
It is worth noting that in this case the compiler can initialize our object to the results of the dynamic initialization or some static value that would be legal too since this is happening before we are required to have an initialized object. Most compilers I have played with tend to fill such objects with zero, but it doesn’t look to me like thet spec requires this.

Okay now we’ve got something confusing enough to really do some damange. Not only will C++ call our globals’ constructors in a basically random order between files, but: it may call them in a random order within files by deciding that what we thought was dynamic was really static (poof - that constructor goes to the front of the line), and this may or may not be happening before main is called.

(For what it’s worth, at least CodeWarrior always initializes everything before main - it’s easier for them to make a big linked list of globals and run through it, translation unit by translatoin unit. And frankly since our global will be built before the translation unit is called, initialization after main is the least of our problems in practice.)

It’s pretty easy to get yourself in trouble with these limitations:

//header
class foo {
public:
foo();
~foo();
static void debug_all_foo();
private:
static set all;
};
// CPP implementation
set foo:all; // this is global
foo::foo()
{
all.insert(this);
}
foo::~foo()
{
all.erase(this);
}
void foo::debug_all_foo()
{
for (set::iterator i = all.begin(); i != all.end(); ++i)
(*i)->do_something();
}
// Usage - in a separate CPP file
static foo my_obj; // also global

The idea here is very simple: foo objs maintain a global set of their own ptrs - so we can do something to all foo() if we need. Set was chosen here because on many STL implementations it can’t be zero initialized without the entire world exploding, which is not true of vector. (Vector will however leak memory you use it while zero-initialized, but I digress.)

The problem is this: what gets constructed first? my_obj or foo.all? The answer is: we cannot know. If my_obj is inited first, foo_all contains, well, I don’t know what, but certainly not the necessary dynamically constructed parts to make a set. Thus my_obj will cause a crash before main or do something else not like what we want. If my_obj is initalized second, we go home happy. Only your C++ compiler knows for sure.

(In the case of vector, the fail case is: the global vector gets zeroed if your compiler is into that thing, then the client code puts an object into the vector, since a zero vector is legitimate in a lot of STL implementations, then the real constructor zeros it out again, leaking memory and "losing" your object mysteriously.)

I just went through this fire drill with X-Plane when making a stats-counter class; the stat object tends to be static to a clien’ts code so that it is "just there and ready" and some internal book-keeping keeps a global map of them around so we can zero all counters by catagory. Since static initialization was important, my solution was: use an intrinsically linked list to chain the objects together for tracking. Because the head of the list sis just a dumb pointer initialized to zero, it’s guaranteed to be correct before any code runs. Each constructor simply updates the head pointer and we end up with a linked list with on coflicts.

Generally I can recommend a few techniques to avoid such constructor chaos, but no one technique will fit all:

  • If you have a translation unit that forms an "API", don’t use static objects to initialize your internal state if you depend on an external API. If you can’t avoid this (because for example you have global STL variables and some kind of real initialization) consider breaking the initialization up and doing the initialization later.
  • Dynamically allocate global API-related stuff using operator-new, either in an explicit initialization fuction (called after main) or upon first use of the API.
  • If you can avoid using globals in an API implementation (and instead requiire some kind of "handle") you can push this problem off to client code.
  • Use explicit initialization of sub-systems. It’s simple, debuggable, and you never get into static-constructor trouble.

One comment on that last point: if you build up a table of static constructors to build object factories, you’re going to have to explicitly initialize that table anyway. That’ll have to be another blog entry too.

No comments:

Post a Comment