Switch on Type Construction

Many years ago I had a non-technical manager who was amazing. If you’ve been in the field long enough you’ll know that this is a rare and wonderful thing. One of the things that made this guy special was how he handled the yearly review process. Maybe I should tell you how every other company’s review process works so you can compare and contrast. Here are the steps for every other company:

  1. Fill out a humiliating form boasting about everything you did during the year because it’s apparent that your manager barely remembers who you are. Make sure you add a section about your failures so everyone will know you’re humble.
  2. Meet with your manager so he can tell you how amazing you are for a while.
  3. Then learn that despite how amazing you are and how awesome the company is, there’s no money for raises because of convoluted financial magic.
  4. Receive disappointing raise with vague talk about how it might possibly be larger next year.

So compare that with my amazing manager who absolutely knew who you were and had clearly done a lot of work to create a review process that he thought would be useful. No forms to fill out. He’d ask you to come up with a handful of technical goals you wanted to achieve, and a handful of non-technical goals. That’s right, not only did he want to help you become a better programmer, he’d also play coach for anything else you wanted to do. Some people came to him with lists of things like “learn to cook” or “get motorcycle license” and he’d absolutely help you set milestones to accomplish those goals too.

So one year when it was my turn for the yearly review, I told him I wanted to learn how to use Design Patterns. Like most every programmer, I’d read the Design Patterns book, but I never could figure out why it was useful. I basically memorize them all when it’s time to interview for a new job, and promptly forget about them. So my manager gave me the task of learning a new design pattern that wasn’t in the book. This sounded interesting enough and after a little research, I came across what I call the Switch on Type Construction pattern, and this (finally) brings me to the point of this post.

Warning: I was recently horrified to learn that this solution does not always work, depending on compilers, optimizers, environments, etc. But it’s still awesome so I’m keeping the post. But be careful with it.

Often in C++ you end up in a situation where you want your code to do something different based on the type of the class. This is called “Switch on Type” and it is a “bad thing.” This phenomenon is so well known it’s a classic case of why we have classes in the first place. Here’s some code to demonstrate, let’s say you have something like this to handle images:

void render( Image* obj )
{
    if (get_extension(obj->file_name()) == "jpg") {
        obj->render_jpg();
    }
    else if (get_extension(obj->file_name()) == "png") {
        obj->render_png();
    }
}

This is bad because as the list of formats grows, you’re constantly back in here cutting and pasting to the list, and probably introducing bugs, etc. This situation was solved with virtual functions: a base image type, and then a subclass of Jpg and Png image types. So the previous code just turns into something obvious like:

    obj->render_image();

where the render_image function is virtual in the base type, and then overridden to do the correct things in the subclass.
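In case the virtual function version isn’t obvious, here’s a minimal sketch of it. The class names match the ones used later in this post, but the string return value is just my assumption so there’s something to look at; a real renderer would draw pixels instead.

```cpp
#include <string>

class Image {
public:
    virtual ~Image() {}
    // Each subclass supplies its own rendering.
    virtual std::string render_image() const = 0;
};

class Jpg_image : public Image {
public:
    std::string render_image() const { return "rendered as jpg"; }
};

class Png_image : public Image {
public:
    std::string render_image() const { return "rendered as png"; }
};

// No switch on type: the call is dispatched on the object's real type.
std::string render( const Image& obj )
{
    return obj.render_image();
}
```

Adding a third format means writing a third subclass; render itself never changes.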

This is all well and good, but what do we do with object creation? You can’t use virtual functions in a class that hasn’t been constructed, and there’s no such thing as a virtual constructor.  Our task for this post will be to create an appropriate image object depending on the file type (and we’ll assume that the type is correct and ignore errors).

So let’s take our starting point as this:

class Image { /* blah blah constructors, functions etc */ };
class Png_image : public Image { /* blah blah */ };
class Jpg_image : public Image { /* blah */ };

std::shared_ptr<Image> create_image( const std::string& file_name )
{
    if (get_extension(file_name) == "jpg") {
        return std::shared_ptr<Image>( new Jpg_image( file_name ) );
    }
    if (get_extension(file_name ) == "png") {
        return std::shared_ptr<Image>( new Png_image( file_name ) );
    }
    throw std::runtime_error( "Unknown image type" );
}

Does that seem like some code you may have written in the past? I know it’s definitely something I have written. And why not? It’s short, concise, solves the problem, etc etc. Who cares that every time we need to add a new image type we have to update this function? Well what if I told you there’s a pattern that lets you add new image types without recompiling existing code? “Impossible” you’d say, and you’d be wrong.

The first thing we’ll need to create for our awesome solution is an object creation map. This is a map that uses our image type (either “png” or “jpg”) as a key and a creation function as a value. With this we can simply use the image type to find the correct creation function. Of course this map will need to be a singleton as we only want one of them in the whole program. Let’s expand our Image class to include:

  • Typedefs for complex types
  • The creation map
  • Accessors to this map

class Image {
public:
    // Public so that helpers outside the class can name the type.
    typedef std::function<std::shared_ptr<Image>(const std::string&)>
            Creation_func;

    // "register" is a reserved word in C++, hence the longer name.
    static void register_type( const std::string& ext, const Creation_func& func )
    {
        function_map()[ext] = func;
    }
    static std::shared_ptr<Image> create( const std::string& file_name )
    {
        return function_map()[get_extension(file_name)]( file_name );
    }
private:
    typedef std::map<std::string, Creation_func> Function_map;
    static Function_map& function_map()
    {
        static Function_map singleton_map;
        return singleton_map;
    }
};

First consider the Creation_func typedef. It defines a function signature for a function that creates shared_ptrs of Image. Such a function might look like std::shared_ptr<Image> create( const std::string& file_name ). This Creation_func is combined with a file extension (i.e. “png” or “jpg”) to make a Function_map. Then there’s a function_map() accessor, a function to add entries to the map (called register_type because register is a reserved word in C++), and a function to create an actual image. Let’s see how our code could now use this to get rid of the “switch on type” issue:

// functions defined elsewhere
std::shared_ptr<Image> create_jpg( const std::string& file_name );
std::shared_ptr<Image> create_png( const std::string& file_name );

// register them in the map (somewhere during startup, e.g. early in main)
Image::register_type( "jpg", create_jpg );
Image::register_type( "png", create_png );

// and use them in your code
std::shared_ptr<Image> jpg_image = Image::create( "file_name.jpg" );

So that’s a pretty cool start. We no longer have the if (extension_check) { return Image; } mess that we had before. But it’s still not terribly easy to use. Now if we wanted to add support for a third image type we’d still have to edit our code and add a call to register_type. Still, we could stop right here and have a much better solution… or we could add templates to make the whole thing better.

Enter Templates

What we want to do to make this solution better is to create a system wherein new image types can be added to the code without recompiling anything. That’s sort of the holy grail of system extension. That means no testing of any existing code because the existing code won’t change. Heck you could take your existing object files, compile in the new image classes, link the whole thing together and magically have support for new types. Sounds like a trick to strive for. So to help us out we want to add a template class that acts sort of like a factory. Here’s how it looks:

template <class T>
struct Registration_object {
    Registration_object( const std::string& ext )
    {
        Image::Creation_func f = Registration_object<T>::create;
        Image::register_type( ext, f );
    }
    static std::shared_ptr<Image> create( const std::string& image_name )
    {
        return std::shared_ptr<Image>( new T( image_name ) );
    }
};

So what’s going on with this one? Well it has a static create function that simply creates a new Image of the templated type T. It also has a constructor that will add a pointer to that create function into the singleton function map.

So how do we use this new registration object? All we have to do is create exactly one of these for each image type, and it will automatically register that new type in the map. So let’s change our usage code to use these instead:

Registration_object<Jpg_image> jpg_registration( "jpg" );
Registration_object<Png_image> png_registration( "png" );

std::shared_ptr<Image> jpg_image = Image::create( "file_name.jpg" );

That’s starting to look pretty awesome. Now we just have to create one global variable that we never directly use, and our image type is magically added to the map. So how can we use this trick to add an entirely new image type without recompiling existing code? Hold on to your hats, cause this is the amazing bit. I’m not going to touch the code above (ie I will not recompile it) but I can add Gif support by simply adding the following in a separate .cpp file:

class Gif_image : public Image {
public:
    Gif_image( const std::string& file_name ) { /* Gif image stuff*/ }
};
Registration_object<Gif_image> gif_registration( "gif" );

Did you see it? Now when this new .cpp file is compiled and linked with the existing object files, your program will suddenly be able to deal with gif images. The reason this works is during static object creation time (before main starts) an instance of this new Registration_object<Gif_image> will be constructed, and during that construction it will register its create function with the static function map. If that doesn’t impress you then stop reading this blog and fuck you.
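For anyone who wants to try this out, here’s a minimal single-file sketch of the pieces above, under a few assumptions of mine: get_extension is a hypothetical helper that returns whatever follows the last dot, the registration function is named register_type (register is a reserved word in C++), and format() exists only so the result can be inspected. Unlike the one-line lookup earlier, create() in this sketch throws on an unknown extension.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical helper: the text after the last '.', or "" if there is none.
std::string get_extension( const std::string& file_name )
{
    const std::string::size_type dot = file_name.rfind( '.' );
    return dot == std::string::npos ? "" : file_name.substr( dot + 1 );
}

class Image {
public:
    typedef std::function<std::shared_ptr<Image>(const std::string&)>
            Creation_func;

    virtual ~Image() {}
    virtual std::string format() const = 0;   // hypothetical, for inspection

    static void register_type( const std::string& ext, const Creation_func& func )
    {
        function_map()[ext] = func;
    }
    static std::shared_ptr<Image> create( const std::string& file_name )
    {
        const Function_map& map = function_map();
        const Function_map::const_iterator it =
            map.find( get_extension( file_name ) );
        if (it == map.end()) {
            throw std::runtime_error( "Unknown image type: " + file_name );
        }
        return it->second( file_name );
    }
private:
    typedef std::map<std::string, Creation_func> Function_map;
    static Function_map& function_map()
    {
        static Function_map singleton_map;   // built on first use
        return singleton_map;
    }
};

template <class T>
struct Registration_object {
    explicit Registration_object( const std::string& ext )
    {
        Image::register_type( ext, &Registration_object<T>::create );
    }
    static std::shared_ptr<Image> create( const std::string& image_name )
    {
        return std::shared_ptr<Image>( new T( image_name ) );
    }
};

// The "new" image type: nothing above this line knows it exists.
class Gif_image : public Image {
public:
    explicit Gif_image( const std::string& ) {}
    std::string format() const { return "gif"; }
};
Registration_object<Gif_image> gif_registration( "gif" );
```

Once main is running, Image::create( "cat.gif" ) hands back a Gif_image. One caveat that may be behind the warning at the top of this post: if the registration object lives in a static library and nothing else in its object file is referenced, linkers typically leave that object file out entirely, and the type never gets registered.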

Oh, and what about that manager I had? He helped me reach a number of goals, both technical and non. Eventually the company merged with another one and he was replaced with the HR lady from the other company. The first thing she did was bring in those damned yearly evaluation forms like every other company has. I was more than happy to quit that job.

Static Creation Functions

When I wrote about Class Construction, I noted that of all the incorrect arguments about constructors not being sufficient, there were a handful of cases that are actually correct. While not an exhaustive list, here are some cases that spring to mind:

  1. If you are creating a class hierarchy wherein a base class defines a number of virtual functions that subclasses are meant to implement (like on_create), it’s not unreasonable to try calling a virtual function in the base class constructor and expect a subclass function to run. Unfortunately this won’t work. While the base class constructor is running, the subclass part of the object hasn’t been constructed yet and the object is still treated as an instance of the base class, so calling a virtual function runs the base class’ implementation.
  2. If you want to enforce HOW your objects are constructed. While it’s not possible to force all your classes to only be constructed on the stack, it is possible to force clients to only create objects as smart pointers (or raw pointers if you’re a big fan of resource leaks).
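Case 1 is easy to see for yourself. Here’s a small sketch; Base, Derived and call_log are names I made up for the demonstration, and the log exists only to record which function actually ran.

```cpp
#include <string>
#include <vector>

std::vector<std::string> call_log;   // hypothetical, records which function ran

class Base {
public:
    Base() { on_create(); }   // runs Base::on_create, even for a Derived
    virtual ~Base() {}
    virtual void on_create() { call_log.push_back( "Base::on_create" ); }
};

class Derived : public Base {
public:
    virtual void on_create() { call_log.push_back( "Derived::on_create" ); }
};
```

Constructing a Derived logs Base::on_create. Calling on_create through a Base reference after construction has finished logs Derived::on_create, as you’d expect.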

In both of these cases it won’t work to simply expect your clients to use the regular constructor. For case 2 you could solve this by adding documentation that says “please don’t create these on the stack” but if your code depends on developers actually reading the comments then it’s already doomed. You could solve case 1 by using an initialization function, but then you’d be totally ignoring all the great advice in the post that told you why that’s a horrible idea.  So what’s left? This is a perfect situation for a static creation function, which looks like this:

class Some_class {
public:
    static std::shared_ptr<Some_class> create();
};

So what’s going on here? Well, the first thing to notice is that it’s a static function, which is a pretty important aspect to this technique as it means you can call the function without having created an instance of the object (because if you need the object to exist before you can create it, it’s probably not gonna work). The next thing to notice is that I have control over the return type, which in this case is a shared_ptr. You can imagine other return types (or even void if you need to enforce that the object must be created into some global store that you access in some other way).

At this point you may be pointing out that while this is all fine and dandy, there’s still nothing to stop anyone from just creating an instance of Some_class directly and avoiding the create function.  You’d be right. The second half to this trick is that you have to make your constructor private. It will still be accessible inside the create function, but it won’t allow users to create your object any other way. So here’s a more complete example:

class Some_class {
public:
    ~Some_class();
    static std::shared_ptr<Some_class> create()
    {
        std::shared_ptr<Some_class> ptr( new Some_class );
        ptr->some_kind_of_initialization();
        ptr->on_create();
        return ptr;
    }
private:
    Some_class();
    Some_class( const Some_class& );            // never implemented
    Some_class& operator=( const Some_class& ); // never implemented
    void some_kind_of_initialization();
    void on_create();
};

I don’t generally put the code in the actual header, but it makes the example easier to read. Notice that along with the constructor I also made the copy constructor and operator= all private. There is now no way to create one of these things without using the static create function. Here’s how you’d have to make one:

    std::shared_ptr<Some_class> ptr = Some_class::create(); // OK
    Some_class stack_instance;           // won't compile
    Some_class copy_construct( *ptr );   // won't compile
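If you’d rather have the compiler confirm those “won’t compile” claims than take my word for it, the C++11 type traits can check them. This is a sketch under my own assumptions: the initialized_ flag is a hypothetical stand-in for whatever post-construction work your create function does.

```cpp
#include <memory>
#include <type_traits>

class Some_class {
public:
    static std::shared_ptr<Some_class> create()
    {
        std::shared_ptr<Some_class> ptr( new Some_class );
        ptr->initialized_ = true;   // stand-in for post-construction setup
        return ptr;
    }
    bool initialized() const { return initialized_; }
private:
    Some_class() : initialized_( false ) {}
    Some_class( const Some_class& );            // never implemented
    Some_class& operator=( const Some_class& ); // never implemented
    bool initialized_;
};

// Outside the class, none of the usual ways to make one are available.
static_assert( !std::is_default_constructible<Some_class>::value,
               "default construction should be private" );
static_assert( !std::is_copy_constructible<Some_class>::value,
               "copy construction should be private" );
static_assert( !std::is_copy_assignable<Some_class>::value,
               "copy assignment should be private" );
```

Note that create uses new rather than std::make_shared: make_shared constructs the object from library code, which can’t reach a private constructor.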

Pretty sporty, huh? Take some time to soak this in, controlling an object’s construction is awesome and definitely needs to be a part of your toolkit.

Now you may also have heard about factories. A factory is a lot like a static creation function, but it exists as a separate class, adding a level of abstraction. I even had a co-worker who used to go on about how factories should actually be interfaces, so you could have entire hierarchies of factories that you could swap out to get different kinds of creation functions. This all sounds well and good and impressive and fancy, but in practice I’ve never been in a situation where a static create function wasn’t good enough. Class factories always seem overly complex to me. But as I’ve stated before, I’m not terribly clever, so if you really want to create a factory (or an entire inheritance tree of factories) then go right ahead. Just make sure you make your object’s constructors private to ensure people don’t just bypass the whole thing.

Some bullet points:

  • Occasionally (rarely) a constructor is not sufficient
  • Create a static function called “create” that does exactly that
  • Make your object’s other construction functions private
  • You can use class factories if you want to be a smarty pants

Exceptions are Awesome

Welcome to the year 2015. Obamacare is the law of the land, gay folks can get married, C++ has lambdas, and (this will surprise some folks) exceptions are the correct way to report errors. The fact that I still have to argue this point is, frankly, shocking. I don’t care that the Google style guide doesn’t allow them, and for the love of God don’t tell me they’re inefficient. Simply put, they are the way to report errors, you must know how to use them.

Now to some extent I actually sort of sympathize on this one. When I learned C++ (back in 1995 or so) exceptions weren’t widely used. And even up until around 2005 I was pretty convinced that they were a bad idea. But I’m old and that was 10 years ago… what’s your excuse? Some programmers today seem to have the same misconception I had in 2005, so let’s walk through the faulty logic. They argue that using exceptions makes code bloat substantially. They start with something like this:

return_code = blah();
if (return_code != SUCCESS) {
    return return_code;
}
return_code = something_else();
if (return_code != SUCCESS) {
    return return_code;
}

And apply exceptions by turning it into this:

try {
    blah();
}
catch (const std::exception& e) {
    throw std::runtime_error( "calling blah failed" );
}
try {
    something_else();
}
catch (const std::exception& e) {
    throw std::runtime_error( "calling something_else failed" );
}

And then argue that using exceptions has turned an 8 line function into a 12 line one. But they’re missing the point. So I’ll write down the point in bold letters: throw an exception when an error occurs, and catch an exception when you can do something about it. Armed with this knowledge, most junior programmers will head back to their keyboards and come back to me with something like this:

try {
    blah();
    something_else();
}
catch (const std::exception& e) {
    throw std::runtime_error( "calling blah or something_else failed" );
}

Well now we’re down to just 7 lines, so this is a step in the right direction, but it’s still wrong. Let’s take a look at the correct answer and then discuss why it’s correct:

    blah();
    something_else();

Ah, much better. Now we’re just down to 2 lines, and better yet, we don’t have to worry about errors at all. We have two short, succinct lines that just assume the best case scenario and ignore errors completely, what could be better? “But wait, this can’t be correct, there’s no error handling at all!!!” And that’s the point. Take a look at the second part of the bold statement above, catch an exception when you can do something about it. In all of these examples there was nothing sensible we could do in the case of an error but just return an error up to the caller. Since an exception just naturally bubbles up the call stack, and since there’s nothing I can do about it here, just let it bubble up the call stack.

Do you see the beauty in that? Are you reading this and having a warm and tingly sensation rubbing your chest? If not, think about it some more. You are now free to write code where you don’t have to nit pick every single error case. By way of example, let’s say you’re writing some kind of web server. A request comes in for a web page and in order to respond you’ve got to hit a database for some content, the disk for some assets, and then maybe execute some business logic to tie it all together (God help you if it’s Ruby on Rails). Somewhere deep down in the bowels of your db connection something might go wrong. Do you really want to have to watch error codes at every function call and have to keep translating them as they work their way up the stack through multiple layers of return codes? Of course not, none of the intermediate layers can do anything anyway, the db is dead, they can’t fix that. Imagine a world where the db fails and the highest layer is simply notified so it can return a bland 500 error to the client. That’s the joy of exceptions. The db layer throws an exception, every layer in between ignores it, and the top layer catches it and returns. But wait, it gets even better. The system unwinds the stack for you, correctly destroying all the objects you created along the way. Sound awesome? That’s because it is awesome.
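Here’s the web server story boiled down to a sketch. Everything in it is made up for illustration: Scoped_resource stands in for anything with a destructor (a connection, a file handle), and event_log only exists to record the order things happen in.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

std::vector<std::string> event_log;   // hypothetical, records what happened

// Stand-in for any object that cleans up in its destructor.
struct Scoped_resource {
    explicit Scoped_resource( const std::string& name ) : name_( name ) {}
    ~Scoped_resource() { event_log.push_back( "released " + name_ ); }
    std::string name_;
};

// Bottom layer: the only place that knows something went wrong.
void query_database()
{
    Scoped_resource connection( "db connection" );
    throw std::runtime_error( "database is down" );
}

// Middle layer: no error handling at all, it can't fix a dead db anyway.
std::string build_page()
{
    Scoped_resource assets( "asset cache" );
    query_database();
    return "<html>...</html>";
}

// Top layer: the one place that can do something sensible.
int handle_request()
{
    try {
        build_page();
        return 200;
    }
    catch (const std::exception& e) {
        event_log.push_back( std::string( "caught: " ) + e.what() );
        return 500;
    }
}
```

Calling handle_request() returns 500, and the log shows both resources being released, innermost first, before the top layer ever sees the exception.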

Oh but wait, isn’t it expensive? Well yes, throwing one is, but that doesn’t matter. Why doesn’t it matter? Because exceptions are for exceptional cases. Take the case of our http server above. How often do you think a database blows up? Okay okay, I get the joke, almost constantly, but seriously, in the normal day of operations, how often? It’s an unusual thing, so who cares if it runs just a tiny bit slower? Users aren’t gonna be sitting on their web browser complaining that their 500 error page took an extra picosecond to return. In the normal run of things, exceptions don’t happen, so you pay little or no performance penalty.

Of course this is only true if you don’t abuse exceptions. You need to make sure that you’re not using exceptions to report back something that is expected behavior. Let’s say you’re writing some function bool lookup_something_in_db( int id, Thing& thing_copy ). This function looks up a Thing in the db with the given id. It seems to me that the thing in the db might not exist. Or to say it in a more obvious way, it’s not an exceptional case that the id doesn’t find something in the db. So using an exception to return information that the thing doesn’t exist would be a really bad idea, you’d be throwing exceptions for cases that are expected. In this case returning a bool to notify the caller that the thing they were looking for doesn’t exist is a good idea.
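A sketch of that split, with a std::map standing in for the database. Thing’s contents, fake_db and the db_is_up flag are all hypothetical; the point is only that “not found” comes back as false while “the db is dead” is thrown.

```cpp
#include <map>
#include <stdexcept>
#include <string>

struct Thing {
    std::string name;
};

// Hypothetical stand-ins for a real database and its health.
std::map<int, Thing>& fake_db()
{
    static std::map<int, Thing> db;
    return db;
}
bool db_is_up = true;

bool lookup_something_in_db( int id, Thing& thing_copy )
{
    if (!db_is_up) {
        // Exceptional: the caller asked a reasonable question and we can't answer.
        throw std::runtime_error( "database unavailable" );
    }
    const std::map<int, Thing>::const_iterator it = fake_db().find( id );
    if (it == fake_db().end()) {
        return false;   // expected: plenty of ids simply don't exist
    }
    thing_copy = it->second;
    return true;
}
```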

So here’s the bullet points:

  • Exceptions are awesome, use them
  • Throw when an error happens, catch when you can do something about it
  • Don’t use exceptions for expected, normal processing

Class Construction

I’ll start with a story. Years ago I was starting out at a little software shop, trying to learn my way around the existing code base. As is usual in these cases, everything I encountered looked bad to me. As I’ve grown a bit as a programmer I’ve learned to realize that some of this feeling comes from simply not understanding the constraints on a given system, and some of this comes from actually encountering bad software. So of course many spirited discussions followed wherein the head of software would defend the code and I would try to tear it down. I think one of the primary goals of a library is that it should be “easy to use correctly and hard to use incorrectly” – Scott Meyers (see below). While I admit that this is not always achievable, it should at least be a goal. The head of software stated emphatically that his library had done a good job of this. So imagine my revulsion when I stumbled upon something like this:

class Some_class {
public:
    Some_class();
    bool initialize();
    bool startup();
    void set_some_member( const std::vector<int>& member );
};

Have a longer look at this one and ask yourself if this is easy to use correctly. If you think it is then look again and ask yourself how to create an instance of Some_class. I asked this exact question and he told me it was obvious and I should look for some examples. So I did. I learned that the correct way to create an instance of Some_class is to first call the constructor (obviously), then call set_some_member (huh?), and then call startup. Also, for the love of God, whatever you do, never call initialize because that leads to undefined behavior.

So this leads me to the point of this post, which is that of all the things your object needs to do, construction is really fundamental. As a corollary to this, initialization functions are awful. Now I can already hear the more senior types pointing out that there are cases where a constructor alone simply cannot do the job. You’re correct, that is true, and in time we’ll be looking at some of those. However there are a whole other set of incorrect cases that junior developers mistakenly think justify initialization functions (and other horrors). We need to end this misconception.

The most pervasive incorrect case is when someone thinks that because there’s no way to return a status code from a constructor, you need an initializer function to truly bring your object to life. In this anti-pattern the constructor does the basic member initialization, which cannot fail, and then the initializer does the dangerous work, returning a success code if everything went okay. This leads to two major problems. The first is for the users of this class. They will skim your documentation just enough to find some function they like, and the constructor, and immediately set about creating it and using it. Maybe this will kind of work without calling the initializer, or maybe it won’t. When something eventually fails, they’ll be annoyed. But more importantly, this object will be hard for you to write because you’ll invariably have to hold some internal flag to check whether or not initialize has been called. Then you’ll need to worry about what happens if someone calls initialize twice, and blah blah blah it’s just not worth it.

In order to stay out of this quagmire you need to know one simple thing: it is correct to throw exceptions from constructors. Many of the junior programmers I speak with don’t know this simple fact. Often they’ll hedge their bets by telling me the code will compile, but it’s not the correct thing to do. They’re wrong. Throwing exceptions from a failed constructor is exactly the correct thing to do. In fact it’s the only way to safely let the calling function know about failed construction. But then what’s the state of an object that throws an exception halfway through construction? Simple, it doesn’t exist, it never did. Objects don’t exist until construction is finished, so if construction doesn’t finish, the object doesn’t exist. Will the destructor be called? No, it most certainly will not. How can you destruct an object that was never constructed? (Any members and base classes that did finish constructing are destroyed for you, though.) Another question I often get is, doesn’t this lead to memory leaks? The answer is: only if you’ve written a constructor that isn’t exception safe. But this has nothing to do with constructors and is simply a fact of writing any function that isn’t exception safe. You should never have a function that allocates anything and just assumes that no exception will ever be thrown.
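Here’s a small sketch that shows all of this happening. Connection, Member and event_log are names I invented for the demonstration; the log just records which constructors and destructors ran.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

std::vector<std::string> event_log;   // hypothetical, records what happened

struct Member {
    Member() { event_log.push_back( "Member constructed" ); }
    ~Member() { event_log.push_back( "Member destroyed" ); }
};

class Connection {
public:
    explicit Connection( bool server_reachable )
    {
        // member_ is already fully constructed by the time we get here.
        if (!server_reachable) {
            throw std::runtime_error( "cannot connect" );
        }
        event_log.push_back( "Connection constructed" );
    }
    ~Connection() { event_log.push_back( "Connection destroyed" ); }
private:
    Member member_;
};

// Returns false if construction failed; the caller finds out via the exception.
bool try_connect( bool server_reachable )
{
    try {
        Connection c( server_reachable );
        return true;
    }
    catch (const std::runtime_error&) {
        return false;
    }
}
```

After try_connect( false ) the log contains the Member being constructed and then destroyed, and nothing at all about Connection: it never existed, so its destructor never ran.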

C++11 Note: In C++11 we now have delegating constructors. This means a constructor can call another constructor. This makes it a little less clear about when an object is constructed. The rule is that as soon as the first constructor to finish (the one being delegated to) is done, the object has been created. So now if an exception is thrown from the calling constructor (after the called constructor is finished) the destructor will be called.
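A quick sketch of the delegating case, again with made-up names (Widget, event_log) purely for demonstration:

```cpp
#include <stdexcept>
#include <string>
#include <vector>

std::vector<std::string> event_log;   // hypothetical, records what happened

class Widget {
public:
    Widget() { event_log.push_back( "target constructor finished" ); }

    // Delegates to Widget() first, then does its own checking.
    explicit Widget( int size ) : Widget()
    {
        if (size < 0) {
            throw std::invalid_argument( "negative size" );
        }
    }
    ~Widget() { event_log.push_back( "destructor ran" ); }
};

bool try_make_widget( int size )
{
    try {
        Widget w( size );
        return true;
    }
    catch (const std::invalid_argument&) {
        return false;
    }
}
```

After try_make_widget( -1 ) the log shows the target constructor finishing and then the destructor running, even though the constructor you actually called threw.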

So there you have it, end of lesson. Here’s the summary in simple bullet points:

  • Never ever ever ever write an initialization function
  • If something goes wrong during construction, throw an exception
  • If an exception is thrown during construction, the object never existed (the destructor won’t be called)
  • Don’t write functions that aren’t exception safe (constructors or otherwise)

Further reading: