Xcode/Clang gamedev tips

Reassembly was written primarily on OSX using Xcode for compiling/debugging and emacs for text editing. Xcode is clearly designed primarily for developing iOS “apps” and there are several non-obvious settings that vastly improve the desktop game development experience.

My general experience with Xcode has been fairly positive. I think Visual Studio is a slightly better debugger overall but not enough to make me abandon OSX. As a solo developer I need to spend a fair amount of time in cafes and other public places to avoid going insane and as far as I can tell Apple makes the best laptops.

My OSX builds are all 64 bit and have a deployment target of OSX 10.7. The following was written with reference to Xcode 6.1.1.


The .lldbinit config file is read by lldb, the LLVM debugger, which is the backend for the Xcode debugger. Settings affect the Xcode visual debugger window in addition to the console.

Print 64 bit unsigned values in hex and show a summary for vector types.

type format add -f hex uint64 "unsigned long long"
type summary add --inline-children --omit-names float2 double2 cpVect int2 float3

Print a summary for std::pair types (particularly useful for browsing std::map and std::unordered_map).

type summary add --inline-children --omit-names -x "^std::pair<.+>(( )?&)?$"

Print the address, “name” and “blocks” fields of custom type BlockCluster.

type summary add --summary-string "{addr=${var%V} name=${var.name} ${var.blocks}}" BlockCluster

One warning – invalid summary strings often crash Xcode. Restarting the debugging session is enough to reload .lldbinit after changes.

Develop No-resource build

For rapid iteration with good performance, I created a “Develop NoResource” target. This build has optimizations enabled (-O3), but no link-time codegen (which takes a long time). While iterating on code, remove the main data directory from the “Copy Bundle Resources” build phase to prevent it from being copied every time. As long as you don’t clean, previously copied resources will remain in the app directory.

In general lldb is “OK” at debugging optimized builds. It frequently loses the “this” pointer and stack variables but is nonetheless usually useful enough to get the job done. This is one of the areas where Visual Studio really shines, particularly with the magic option /d2Zi+ – see this Random ASCII post. Unfortunately C++ Debug builds, which the debugger readily understands, are usually too slow to be useful.

I have not been able to get the lldb debugger to call inlined stl symbols (e.g. std::vector::size) from the command line in optimized builds. If anyone knows how to do this please tell me.

Platform layers

I haven’t been able to completely abstract OS details into a third party platform independent layer like SDL. Reassembly uses SDL for window and event handling on windows and linux but is pure Cocoa on OSX. The Cocoa programming model is different enough from SDL that some level of native-ness would have been lost, and the amount of code required is not prohibitively large.

In general the Cocoa/Mac programming model is vastly nicer than win32 or POSIX/Linux. It does not carry 30 years of baggage, never refers to 16 bit OSs in the docs, and provides easy to use APIs for common operations. Using the native API does not take more than a day or two to set up and forever after allows exact control and the ability to debug platform-specific bugs.

One exception is the OSX Gamepad APIs, which are unforgivably convoluted. Reassembly uses SDL to handle gamepads on OSX.

Command Line Builds

Xcode allows building projects from the command line. This is frequently convenient from e.g. emacs or as part of a release script.

xcodebuild -scheme "Reassembly Develop NoResources" \
    -workspace Reassembly.xcodeproj/project.xcworkspace \
    -jobs 4 -parallelizeTargets ONLY_ACTIVE_ARCH=YES

Note that, unlike with Visual Studio, Xcode command line builds do not build the same target as the IDE, so you can’t do a command line build and then switch to the IDE for debugging without doing an IDE build. Set the command line config for the non-root project in Project Settings -> Info.

For better interoperability with non-Xcode tools you can set the output directory of the compiled .app in Xcode settings (command-,) under the Locations tab (see Derived Data).

Nested Projects

A little-known feature of Xcode is the ability to add .xcodeproj files as subprojects to your main project file. This is analogous to adding various Visual Studio projects to the solution file. I generally find it easier to integrate 3rd party code as source vs. using static libs, dlls, dylibs, .so, etc. Creating or adding a separate .xcodeproj per library allows me to specify different compile time options for this code.

Crash Handling

OSX is a POSIX compatible OS which means we can use unix signals to handle Segmentation Faults and other joyous occurrences. Automatically handling game crashes and uploading stack traces to my webserver has been a powerful tool in increasing Reassembly stability, and probably a topic for another article in and of itself. Most of my OSX crash handling code was shamelessly copied from this article on Atomic Objects. Make sure to disable the signal handlers in debug/develop builds or the debugger won’t break on crashes.

The standard backtrace and backtrace_symbols calls work from signal handlers on OSX as expected. backtrace_symbols works even on release binaries and unlike Linux does not require the -rdynamic linker option.
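As a rough illustration of the signal-based approach described above, here is a minimal sketch of installing a handler that dumps a backtrace. It is not the Reassembly code: the names crash_handler and install_crash_handler are hypothetical, the list of signals is an assumption, and it uses backtrace_symbols_fd instead of backtrace_symbols so that the handler does not need to allocate memory.

```cpp
#include <cassert>
#include <signal.h>
#include <string.h>
#include <execinfo.h>   // backtrace, backtrace_symbols_fd (glibc and OSX)
#include <unistd.h>     // write, STDERR_FILENO, _exit

// Hypothetical handler: dumps the call stack to stderr, then exits.
static void crash_handler(int sig, siginfo_t *info, void *context)
{
    (void)info;
    (void)context;
    static const char header[] = "caught fatal signal, backtrace:\n";
    ssize_t ignored = write(STDERR_FILENO, header, sizeof(header) - 1);
    (void)ignored;

    void *frames[64];
    const int count = backtrace(frames, 64);
    // the _fd variant writes straight to a file descriptor and avoids malloc
    backtrace_symbols_fd(frames, count, STDERR_FILENO);
    _exit(128 + sig);
}

// Install the handler for the usual fatal signals. Returns false on failure.
// Skip this call in debug/develop builds so the debugger still breaks on crashes.
bool install_crash_handler()
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = crash_handler;
    sa.sa_flags = SA_SIGINFO;   // pass siginfo_t and the ucontext to the handler
    sigemptyset(&sa.sa_mask);

    const int signals[] = { SIGSEGV, SIGBUS, SIGILL, SIGFPE, SIGABRT };
    for (int sig : signals) {
        if (sigaction(sig, &sa, NULL) != 0)
            return false;
    }
    return true;
}
```

Calling install_crash_handler() once at startup is enough. A real handler would also record the faulting address and registers and hand the report to an uploader.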

Here is code for printing out some relevant registers on OSX or Linux after catching a SIGSEGV or similar:

    const ucontext_t *ctx = (ucontext_t*)context;
    const mcontext_t &mcontext = ctx->uc_mcontext;

    // x86 page fault error code: bit 1 (0x2) is set for writes,
    // bit 4 (0x10) for instruction fetches
#if __APPLE__
    const uint ecode = mcontext->__es.__err;
#else
    const greg_t ecode = mcontext.gregs[REG_ERR];
#endif
    string msg0 = str_format("Invalid %s to %p", (ecode&16) ? "Exec" : (ecode&2) ? "Write" : "Read", siginfo->si_addr);
    message += msg0 + "\n";

    string mmsg;
#if __APPLE__
#ifdef __LP64__
    mmsg = str_format("PC/RIP: %#llx SP/RSP: %#llx, FP/RBP: %#llx",
                      mcontext->__ss.__rip, mcontext->__ss.__rsp, mcontext->__ss.__rbp);
#else
    mmsg = str_format("PC/EIP: %#x SP/ESP: %#x, FP/EBP: %#x",
                      mcontext->__ss.__eip, mcontext->__ss.__esp, mcontext->__ss.__ebp);
#endif
#else // linux
#ifdef __LP64__
    mmsg = str_format("PC/RIP: %#llx SP/RSP: %#llx, FP/RBP: %#llx",
                      mcontext.gregs[REG_RIP], mcontext.gregs[REG_RSP], mcontext.gregs[REG_RBP]);
#else
    mmsg = str_format("PC/EIP: %#x SP/ESP: %#x, FP/EBP: %#x",
                      mcontext.gregs[REG_EIP], mcontext.gregs[REG_ESP], mcontext.gregs[REG_EBP]);
#endif
#endif


By default C++11 std::threads created under OSX have a ridiculously tiny amount of stack space (512 KB for secondary threads, versus 8 MB for the main thread). I use pthreads directly to create threads to overcome this – std::mutex and friends work fine with pthread threads (std::thread wraps pthreads anyway under OSX).

You can also set the current thread name such that it will show up in the Xcode thread view via a non-standard pthread API.

Creating Pthread with increased stack size

#if __APPLE__

typedef pthread_t OL_Thread;
#define THREAD_IS_SELF(B) (pthread_self() == (B))
#define THREAD_ALIVE(B) (B)

OL_Thread thread_create(void *(*start_routine)(void *), void *arg)
{
    int            err = 0;
    pthread_attr_t attr;
    pthread_t      thread;

    err = pthread_attr_init(&attr);
    if (err)
        ReportMessagef("pthread_attr_init error: %s", strerror(err));
    // 8 MB stack, the same as the main thread
    err = pthread_attr_setstacksize(&attr, 8 * 1024 * 1024);
    if (err)
        ReportMessagef("pthread_attr_setstacksize error: %s", strerror(err));
    err = pthread_create(&thread, &attr, start_routine, arg);
    if (err)
        ReportMessagef("pthread_create error: %s", strerror(err));
    return thread;
}

void thread_join(OL_Thread &thread)
{
    if (!thread)
        return;
    int status = pthread_join(thread, NULL);
    ASSERTF(status == 0, "pthread_join: %s", strerror(status));
}

#else

typedef std::thread OL_Thread;
#define THREAD_IS_SELF(B) (std::this_thread::get_id() == (B).get_id())
#define THREAD_ALIVE(B) ((B).joinable())

OL_Thread thread_create(void *(*start_routine)(void *), void *arg)
{
    return std::thread(start_routine, arg);
}

void thread_join(OL_Thread& thread)
{
    if (!thread.joinable())
        return;
    try {
        thread.join();
    } catch (std::exception &e) {
        ASSERT_FAILED("std::thread::join()", "%s", e.what());
    }
}

#endif

Naming threads

You can use the pthreads API to set the current thread name, and this name will show up in the Xcode debugger, which is extremely helpful. Unfortunately the non-standard pthreads API for this is different across OSX and Linux (and of course Windows) – code for all platforms follows. The same names also show up in GDB and Visual Studio.

#if _WIN32
// Usage: SetThreadName (-1, "MainThread");
const DWORD MS_VC_EXCEPTION = 0x406D1388;

#pragma pack(push,8)
typedef struct tagTHREADNAME_INFO
{
    DWORD dwType; // Must be 0x1000.
    LPCSTR szName; // Pointer to name (in user addr space).
    DWORD dwThreadID; // Thread ID (-1=caller thread).
    DWORD dwFlags; // Reserved for future use, must be zero.
} THREADNAME_INFO;
#pragma pack(pop)

void SetThreadName(DWORD dwThreadID, const char* threadName)
{
    THREADNAME_INFO info;
    info.dwType = 0x1000;
    info.szName = threadName;
    info.dwThreadID = dwThreadID;
    info.dwFlags = 0;

    // the debugger catches this exception and records the name
    __try
    {
        RaiseException(MS_VC_EXCEPTION, 0, sizeof(info) / sizeof(ULONG_PTR), (ULONG_PTR*)&info);
    }
    __except(EXCEPTION_EXECUTE_HANDLER)
    {
    }
}
#endif

void thread_setup(const char* name)
{
    uint64 tid = 0;
#if _WIN32
    tid = GetCurrentThreadId();
    SetThreadName(tid, name);
#elif __APPLE__
    pthread_threadid_np(pthread_self(), &tid);
    // on OSX the call takes only the name and applies to the current thread
    pthread_setname_np(name);
#else // linux
    tid = pthread_self();
    int status = 0;
    // 16 character maximum!
    if ((status = pthread_setname_np(pthread_self(), name)))
        ReportMessagef("pthread_setname_np(pthread_t, const char*) failed: %s", strerror(status));
#endif
}

All of the code above is part of Reassembly and is available in context on my Outlaws core github project – see the os/osx directory for mac specific code.

This one macro trick for easy data definitions

Commonly cited advantages of dynamic languages like Javascript or Python over C/C++ are the ability to easily mix data definitions with code in source files and to introspect classes and other code structures as if they were data. Standard C++ syntax also often requires duplicate and otherwise annoyingly verbose syntax when declaring many similar objects.

Thankfully, as with many other problems, we can overcome these limitations by creatively using the C preprocessor. Contrary to prevailing C++ doctrine, I believe that cpp macros can be the most concise, easy to understand, and high performance way to solve certain problems. With a little functional macro programming we can define reflective structs and enums with zero overhead.

Let’s say we are writing a game about building spaceships out of blocks. We have a class to represent the persistent data for each block, and we want to be able to serialize and deserialize that class, but we also read the fields of this class a gazillion times each frame and need them to be normal struct members – we can’t use a hash table or something instead. We also add or remove members frequently and don’t want to have to change the code in six places every time we do this. Here is a simple way to accomplish this.

(Note: This is simplified code from Reassembly – I hate iostreams and am not actually using them in Reassembly but for the sake of exposition it was the simplest way.)

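#define SERIAL_BLOCK_FIELDS(F)                                   \
    F(uint,              ident,                   0)             \
    F(float2,            offset,                  float2(0.f))   \
    F(float,             angle,                   0.f)           \
    F(uchar,             blockshape,              0)             \
    F(uchar,             blockscale,              1)             \
    F(FeatureEnum,       features,                0)             \

// NOTE: the two helper macro names below are assumed, the originals are missing here
// declare one struct member per field, initialized to its default
#define SERIAL_TO_STRUCT_FIELD(TYPE, NAME, DEFAULT) TYPE NAME = DEFAULT;
// write one field, skipping fields that still hold their default value
#define SERIAL_WRITE_STRUCT_FIELD(TYPE, NAME, DEFAULT) \
    if ((NAME) != (DEFAULT)) os << #NAME << "=" << NAME << ",";

struct SerialBlock {
    SERIAL_BLOCK_FIELDS(SERIAL_TO_STRUCT_FIELD)

    std::ostream& operator<<(std::ostream &os)
    {
        os << "{";
        SERIAL_BLOCK_FIELDS(SERIAL_WRITE_STRUCT_FIELD)
        os << "}";
        return os;
    }
};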

That’s it. Now we can add and remove block fields without having to update the serialization routine. We can manipulate the struct fields any way we want by writing new macros and passing them into the field declaration macro. Useful examples include parsing, resetting fields to default, operator==, or listing struct fields for purposes of tab completion.
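To make the other uses mentioned above concrete, here is a small stand-alone sketch that applies the same trick to resetting fields and to operator==. Everything prefixed with DEMO_ or Demo is hypothetical and exists only for this example. It uses built-in types so it compiles without the Reassembly headers.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical field list in the same shape as SERIAL_BLOCK_FIELDS.
#define DEMO_FIELDS(F)              \
    F(unsigned, ident,      0u)     \
    F(float,    angle,      0.f)    \
    F(int,      blockscale, 1)

// One small macro per operation; each one is applied to every field.
#define DEMO_DECLARE(TYPE, NAME, DEFAULT) TYPE NAME = DEFAULT;
#define DEMO_RESET(TYPE, NAME, DEFAULT)   NAME = DEFAULT;
#define DEMO_EQUAL(TYPE, NAME, DEFAULT)   (NAME == o.NAME) &&
#define DEMO_WRITE(TYPE, NAME, DEFAULT) \
    if ((NAME) != (DEFAULT)) os << #NAME << "=" << NAME << ",";

struct DemoBlock {
    DEMO_FIELDS(DEMO_DECLARE)

    // set every field back to its declared default
    void reset() { DEMO_FIELDS(DEMO_RESET) }

    // expands to (ident == o.ident) && (angle == o.angle) && ... && true
    bool operator==(const DemoBlock &o) const
    {
        return DEMO_FIELDS(DEMO_EQUAL) true;
    }

    // write only the fields that differ from their defaults
    std::string toString() const
    {
        std::ostringstream os;
        os << "{";
        DEMO_FIELDS(DEMO_WRITE)
        os << "}";
        return os.str();
    }
};
```

Adding a field to DEMO_FIELDS updates the declaration, reset(), operator== and toString() at once, which is the whole point of the technique.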

Another useful application is reflective enums/bitfields.

#define BLOCK_FEATURES(F)                         \
    F(COMMAND,        uint64(1)<<0)             \
    F(THRUSTER,       uint64(1)<<1)             \
    F(GENERATOR,      uint64(1)<<2)             \
    F(TURRET,         uint64(1)<<3)             \

#define SERIAL_TO_ENUM(X, V) X=V,
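As a sketch of what "reflective" can mean here, the same list can generate both the enum itself and conversions between bits and names. The DEMO_ macros and the two helper functions are hypothetical names for illustration, not the ones used in Reassembly.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical feature list in the same shape as BLOCK_FEATURES.
#define DEMO_FEATURES(F)                 \
    F(COMMAND,   std::uint64_t(1) << 0)  \
    F(THRUSTER,  std::uint64_t(1) << 1)  \
    F(GENERATOR, std::uint64_t(1) << 2)  \
    F(TURRET,    std::uint64_t(1) << 3)

#define DEMO_TO_ENUM(X, V)   X = V,
#define DEMO_TO_NAME(X, V)   if (bits & (V)) { if (!s.empty()) s += "|"; s += #X; }
#define DEMO_FROM_NAME(X, V) if (name == #X) return (V);

// the enum is generated from the list, so names and values cannot drift apart
enum DemoFeature : std::uint64_t { DEMO_FEATURES(DEMO_TO_ENUM) };

// "COMMAND|TURRET" style description of a bitfield
std::string featuresToString(std::uint64_t bits)
{
    std::string s;
    DEMO_FEATURES(DEMO_TO_NAME)
    return s;
}

// look a single feature up by name, returns 0 if the name is unknown
std::uint64_t featureFromString(const std::string &name)
{
    DEMO_FEATURES(DEMO_FROM_NAME)
    return 0;
}
```

The string conversions cost nothing unless they are called, and a new feature only has to be added in one place.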

At the risk of pissing off both the Effective C++ crowd and the hardcore C crowd I will introduce an elaboration using templates and the visitor design pattern for better generality. We define a function called getField(object, name) that returns the value of a reflective struct field given by name. We can use the same technique to parse/serialize arbitrary structs, list members, generate UI for editing structs, etc.

The actual game code for this is on my github repo, together with some convenient macros for defining reflective structs and enums.

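// NOTE: this macro name is assumed, the original is missing here
#define SERIAL_VISIT_FIELD_AND(TYPE, NAME, DEFAULT) \
    vis.visit(#NAME, (NAME), TYPE(DEFAULT)) &&

struct SerialBlock {
    SERIAL_BLOCK_FIELDS(SERIAL_TO_STRUCT_FIELD)

    // visits fields in order, stops as soon as a visit returns false
    template <typename V>
    bool accept(V& vis)
    {
        return SERIAL_BLOCK_FIELDS(SERIAL_VISIT_FIELD_AND) true;
    }
};

template <typename T>
struct GetFieldVisitor {
    T*           value;
    const string field;

    GetFieldVisitor(const char* name) : value(), field(name) {}

    // exact type match: remember the field if the name matches too
    bool visit(const char* name, T& val, const T& def=T())
    {
        if (name != field)
            return true;
        value = &val;
        return false;
    }

    // any other type: skip the field
    template <typename U>
    bool visit(const char* name, U& val, const U& def=U())
    {
        return true;
    }
};

// get a reference to a field in OBJ named FIELD (the same as getattr in Python).
// will crash if field does not exist or type is slightly wrong
// use like getField<float>(block, "angle")
template <typename U, typename T>
U& getField(T& obj, const char* field)
{
    GetFieldVisitor<U> vs(field);
    obj.accept(vs);
    return *vs.value;
}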

How to fix color banding with dithering

Update: I have since found a much more in-depth presentation on this topic, titled Banding in Games, by Mikkel Gjøl.

Color Banding

Computers represent colors using finite precision – 24 bits per pixel (bpp) is standard today, with 8 bits for each of red, green, and blue. This gives us 16 million total colors, but only 256 shades for any single hue. This is typically not a problem with photographs, but the discontinuities between representable colors can become jarringly visible on gradients of a single color. This artifact is called color banding.
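To put a number on this, the sketch below counts how many distinct 8 bit values a linear gradient actually produces. The scenario is hypothetical: a dark gradient covering the bottom eighth of the intensity range, stretched across a 1920 pixel wide screen, with round-to-nearest quantization assumed.

```cpp
#include <cassert>
#include <cmath>
#include <set>

// Quantize a [0,1] intensity to 8 bits, rounding to nearest.
int quantize8(double v)
{
    return (int)std::lround(v * 255.0);
}

// Count the distinct 8 bit values along a linear gradient from lo to hi
// sampled at `pixels` evenly spaced points.
int distinctShades(double lo, double hi, int pixels)
{
    std::set<int> shades;
    for (int x = 0; x < pixels; x++) {
        const double t = (double)x / (pixels - 1);
        shades.insert(quantize8(lo + (hi - lo) * t));
    }
    return (int)shades.size();
}
```

For the hypothetical gradient, distinctShades(0.0, 0.125, 1920) gives 33, so each band is roughly 58 pixels wide, which is easily visible on a smooth background.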

Space games often have dark gradient backgrounds and thus suffer from visible color banding. Games following in the tradition of Homeworld 2’s gorgeous vertex color skyboxes are particularly afflicted compared to games with texture art because the gradient is mathematically perfect and there is no noise to obscure the color bands.

Here are some screenshots from a few games showing the effect. Make sure to click through to the full size image and verify that the color bands are indeed typically only one bit apart (DigitalColor Meter on OSX is great for this). The color banding is easier to see in a dark room.

Homeworld 2 (R.E.A.R.M.)


Obviously these games are still incredibly beautiful! I just wanted to point out that many popular games exhibit visible color banding despite the existence of well understood solutions. Color banding in Reassembly bothered me enough to fix, and I thought that the solution was simple and effective enough that it should be more widely known.


As mentioned above, color banding is caused by 24 bit color being unable to perfectly represent a gradient – the limit of color resolution. We can increase color resolution at the expense of spatial resolution via a process called dithering. Since we are just trying to draw a smooth gradient and don’t care about spatial resolution, this is great. Dithering takes advantage of the fact that a grid of alternating black and white pixels looks grey. Please read the Wikipedia article for a full explanation, and see also Pointillism for an early application.


Bayer Matrix

There are a lot of fancy dithering algorithms but I chose to implement Ordered Dithering via a Bayer Matrix because it can be done efficiently in the fragment shader. The basic idea is to add a small value to every pixel right before it is quantized (i.e. converted from the floating point representation used in the shader to 8 bits per channel in the framebuffer). The idea is that the least significant bits of the color that would ordinarily get thrown out are combined with this added value and cause the pixel to have a chance of rounding differently than nearby pixels. Bayer Dithering takes these values from an 8×8 matrix which is tiled across the image.

I store the Bayer Matrix in a texture which I sample at the end of my fragment shader. Here is the code to generate the texture. Note that we enable texture wrapping and nearest neighbor sampling and are using a one channel texture.

static const char pattern[] = {
    0, 32,  8, 40,  2, 34, 10, 42,   /* 8x8 Bayer ordered dithering  */
    48, 16, 56, 24, 50, 18, 58, 26,  /* pattern.  Each input pixel   */
    12, 44,  4, 36, 14, 46,  6, 38,  /* is scaled to the 0..63 range */
    60, 28, 52, 20, 62, 30, 54, 22,  /* before looking in this table */
    3, 35, 11, 43,  1, 33,  9, 41,   /* to determine the action.     */
    51, 19, 59, 27, 49, 17, 57, 25,
    15, 47,  7, 39, 13, 45,  5, 37,
    63, 31, 55, 23, 61, 29, 53, 21 };

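GLuint tex_name = 0;
glGenTextures(1, &tex_name);
glBindTexture(GL_TEXTURE_2D, tex_name);
// nearest neighbor sampling, tiled in both directions
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_REPEAT);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_REPEAT);
// one channel 8x8 texture (GL_LUMINANCE is assumed here; use GL_RED on core profiles)
glTexImage2D(GL_TEXTURE_2D, 0, GL_LUMINANCE, 8, 8, 0, GL_LUMINANCE,
             GL_UNSIGNED_BYTE, pattern);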

Then at the end of the fragment shader add the scaled dither texture to the fragment color. I don’t fully understand the 32.0 divisor here – I think 64 is the correct value but 32 (or even 16) looks much better.

gl_FragColor += vec4(texture2D(dither_sampler, gl_FragCoord.xy / 8.0).r / 32.0 - (1.0 / 128.0));

That’s it.

It’s important that this happens in a shader where the full gradient precision is available – if you do it in a post processing shader reading from a 24 bit color buffer it won’t work. In Reassembly I actually do it in two different places – in the tonemapping shader which reads from a floating point render texture and in the shader that draws the Worley background.
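The arithmetic of the shader line can be checked on the CPU. The sketch below mirrors it under two assumptions: the texture is sampled as normalized bytes (so a pattern value p reads back as p/255), and the framebuffer rounds to nearest when converting to 8 bits. Under those assumptions the /32 divisor spreads the offsets over about two 8 bit steps (63/32 ≈ 1.97), /64 would spread them over about one step, and the -1/128 term shifts every offset slightly below zero. All function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

static const int kBayer[64] = {
     0, 32,  8, 40,  2, 34, 10, 42,
    48, 16, 56, 24, 50, 18, 58, 26,
    12, 44,  4, 36, 14, 46,  6, 38,
    60, 28, 52, 20, 62, 30, 54, 22,
     3, 35, 11, 43,  1, 33,  9, 41,
    51, 19, 59, 27, 49, 17, 57, 25,
    15, 47,  7, 39, 13, 45,  5, 37,
    63, 31, 55, 23, 61, 29, 53, 21 };

// Offset added to the color at pixel (x, y), mirroring the shader expression.
double ditherOffset(int x, int y, double divisor)
{
    const double sample = kBayer[(y & 7) * 8 + (x & 7)] / 255.0;
    return sample / divisor - 1.0 / 128.0;
}

// Quantize to 8 bits with round-to-nearest and clamping.
int quantize8(double v)
{
    const long q = std::lround(v * 255.0);
    return (int)(q < 0 ? 0 : q > 255 ? 255 : q);
}

// Average displayed value of a constant intensity over one 8x8 tile.
double averageDisplayed(double intensity, double divisor, bool dither)
{
    double sum = 0.0;
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++)
            sum += quantize8(intensity + (dither ? ditherOffset(x, y, divisor) : 0.0));
    return sum / 64.0 / 255.0;
}
```

An intensity of 10.4/255 displays as a flat 10/255 without dithering. With it, neighboring pixels round to different values and the tile averages to something in between two representable shades, which is what hides the band edges. In this model the average also comes out darker than the input because every offset is negative.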


Background halo with color banding


background halo with dithering


shield with color banding


shield with dithering

How to prevent dangling pointers to deleted game objects in C++

Early versions of Reassembly were plagued with crashes due to game object lifetime problems. For example, the object representing the AI for a first spaceship would have a pointer to the enemy spaceship it was targeting. If the targeted spaceship was destroyed (and the object was deleted) the first spaceship could cause a crash the next time its AI ran. Alternatively, various parts of the user interface reference the player’s spaceship object and then crash when the player spaceship is destroyed and its object deleted.

There are a lot of potential solutions to this problem. We can delay deletion of doomed objects for a few frames and any code that references these objects can check if they are still alive before using them. We can traverse any objects that might reference a game object every time a game object is deleted and NULL any pointers to the deleting object. When there are many types of game objects and they all reference each other in erratic and complicated ways this can quickly become burdensome.

My eventual solution was inspired by an article on Coding Wisdom recommending the use of “Watchers” and arguing against reference counting smart pointers. To quote from the article:

  1. Create a base class “Watchable” that you derive from on objects that should broadcast when they’re being deleted.  The Watchable object keeps track of other objects pointing at it.
  2. Create a “Watcher” smart pointer that, when assigned to, adds itself to the list of objects to be informed when its target goes away.

This is the sort of thing that is probably common knowledge among AAA game programmers but was not obvious to me. I implemented this suggestion and it has worked out really well.


My version is available on my GitHub in stl_ext.h.

  1. Game objects that will be pointed to should inherit from Watchable.
  2. Objects that store a reference to game objects should declare the pointer as watch_ptr<GameObject> m_ptr;. m_ptr will automatically become NULL when the pointee is deleted. watch_ptr is a smart pointer, so use is the same as GameObject* m_ptr.
  3. watch_ptrs are only used when the pointed object may be deleted while the pointer is stored.

It works by adding each watch_ptr to a doubly linked list, with the list pointers stored in the watch_ptr itself. When the watch_ptr destructs, it removes itself from the list. When the Watchable object destructs, it traverses the list and NULLs all the pointers.
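The mechanism can be sketched in well under a hundred lines. This is a simplified illustration written for this article, not the code in stl_ext.h: only the names Watchable, watch_ptr and nullReferencesTo come from the text, everything else (including DemoShip) is hypothetical, and it is single-threaded only.

```cpp
#include <cassert>

struct watch_ptr_base;

// Base class for objects that may be deleted while others still point at them.
class Watchable {
    mutable watch_ptr_base *watch_list = nullptr;   // head of the watcher list
    friend struct watch_ptr_base;
public:
    Watchable() {}
    Watchable(const Watchable &) {}                 // watchers are never copied
    Watchable &operator=(const Watchable &) { return *this; }
    void nullReferencesTo();                        // NULL every watcher
    virtual ~Watchable() { nullReferencesTo(); }
};

// Untyped part of the smart pointer: the target plus the list links.
struct watch_ptr_base {
    const Watchable *target = nullptr;
    watch_ptr_base  *prev = nullptr;
    watch_ptr_base  *next = nullptr;

    void link(const Watchable *t)       // push this watcher on the front
    {
        target = t;
        if (!t) return;
        prev = nullptr;
        next = t->watch_list;
        if (next) next->prev = this;
        t->watch_list = this;
    }

    void unlink()                       // remove this watcher from the list
    {
        if (!target) return;
        if (prev) prev->next = next;
        else      target->watch_list = next;
        if (next) next->prev = prev;
        target = nullptr;
        prev = next = nullptr;
    }
};

inline void Watchable::nullReferencesTo()
{
    for (watch_ptr_base *w = watch_list; w; ) {
        watch_ptr_base *n = w->next;
        w->target = nullptr;
        w->prev = w->next = nullptr;
        w = n;
    }
    watch_list = nullptr;
}

template <typename T>
class watch_ptr : private watch_ptr_base {
public:
    watch_ptr() {}
    watch_ptr(T *p) { link(p); }
    watch_ptr(const watch_ptr &o) { link(o.target); }
    watch_ptr &operator=(const watch_ptr &o)
    {
        if (this != &o) { unlink(); link(o.target); }
        return *this;
    }
    ~watch_ptr() { unlink(); }

    T *get() const { return const_cast<T*>(static_cast<const T*>(target)); }
    T *operator->() const { return get(); }
    T &operator*() const { return *get(); }
    explicit operator bool() const { return target != nullptr; }
};

// Hypothetical game object used for the usage example.
struct DemoShip : Watchable { int hp = 3; };
```

After `watch_ptr<DemoShip> target(ship);`, deleting ship makes `target` test false, so AI code can write `if (target) ...` instead of dereferencing freed memory. Both linking and unlinking are constant time because the list nodes live inside the pointers themselves.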


Obviously the doubly linked list traversal is not thread safe. Reassembly uses two main threads, a render thread and a simulation/event thread, with a few more workers. There are a few rules for safe usage.

  1. When the update thread is done with the object, NULL all references to it (call nullReferencesTo()). This will prevent any further references to the object from that thread.
  2. Only delete game objects that need to be referenced by the render thread from the render thread. This is often necessary anyway because destroying these objects can also delete OpenGL buffers, which must be done from the render thread. I push ready-to-delete objects in the update thread to a queue, then delete at the end of the render thread frame.
  3. Always copy the watch_ptr to a normal pointer before checking NULLness when using from the render thread. This will prevent the update thread from NULLing the pointer after the NULL check but before the render thread is done with it. Since the render thread will not delete the object until the end of the frame, we don’t have to worry about referencing free’d memory.


I published some of the utility code for Gamma Void on github under the MIT License. This is not intended to be a packaged library, but rather a collection of useful snippets that would have saved me a lot of time if I had found them in a github repository somewhere. The repository contains about 10k lines of code, about a fifth of the total Gamma Void source, and pretty much all the code that is not game-specific.



  • polygon intersections, interpolation, vector helpers, ternary digits, and more in Geometry.h
  • fast 2d spatial hash in SpacialHash.h
  • a class ‘lstring’ for fast symbol manipulation and easy to use string utils in Str.h
  • ‘copy_ptr’ and ‘watch_ptr’ smart pointers for managing sparse structs and automatically nulling object references on deletion, respectively, in stl_ext.h
  • C++ wrappers for OpenGL buffers and polygon drawing code in Graphics.h
  • Fast shader powered particle system in Particles.h
  • Color transformation utilities in RGB.h
  • mac/win/linux platform layer in Outlaws.h (implementations in ‘os’ directory)


HDR effects

This week I’ve been working on making explosions and weapon effects cooler. I ended up using some HDR lighting techniques in an interesting non-traditional way so I thought I would write up some of the things I learned. The main message is (1) particles rock (2) use premultiplied alpha and (3) make the brightest, most fully saturated parts of additively blended particle effects white to make them appear brighter. There is an easy way to do this in a post-processing “tonemap” shader.



Particle systems are an easy and versatile way to add special effects to your game. Lutz Latta has a good technology overview over on Gamasutra. Relatively high performance particle systems do not have to be complex: My particle system supports a quarter million particles in about 300 lines of code. It is stateless (position is a function of time) and moves the particles in the vertex shader. Particles (vertex attributes) are stored and allocated in a single large circular buffer so I only need to do a single glBufferSubData to send new particles to the GPU every frame and a single draw call to draw it.
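Here is a CPU-side sketch of the two ideas in that paragraph: a stateless particle whose position is computed from time alone, and a fixed-size circular buffer for allocation. The struct layout and the linear motion are assumptions made for illustration; the real system evaluates position in the vertex shader, and the GPU upload step is omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec2 { float x, y; };

// Immutable per-particle attributes, written once at spawn time.
struct Particle {
    Vec2  startPos;
    Vec2  velocity;
    float startTime;
    float endTime;
};

// What a vertex shader would evaluate each frame: position from time alone,
// so nothing has to be updated or re-uploaded while the particle lives.
Vec2 particlePosition(const Particle &p, float now)
{
    const float dt = now - p.startTime;
    return Vec2{ p.startPos.x + p.velocity.x * dt,
                 p.startPos.y + p.velocity.y * dt };
}

bool particleAlive(const Particle &p, float now)
{
    return now >= p.startTime && now < p.endTime;
}

// Fixed-size ring: new particles overwrite the oldest slots.
class ParticleRing {
    std::vector<Particle> buffer;
    std::size_t head = 0;           // next slot to write
public:
    explicit ParticleRing(std::size_t capacity) : buffer(capacity) {}

    // Store a particle and return the slot index it was written to.
    std::size_t spawn(const Particle &p)
    {
        const std::size_t slot = head;
        buffer[slot] = p;
        head = (head + 1) % buffer.size();
        return slot;
    }

    const Particle &at(std::size_t slot) const { return buffer[slot]; }
};
```

Because slots are handed out in order, the particles spawned during a frame occupy a contiguous range of the buffer (apart from wrap-around), which is what makes a single upload per frame practical.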

Smoke and Fire

Both smoke and fire contribute to that explodey look. Dark smoke in the background provides contrast to make the sparks really stand out. We want areas with a lot of fire particles to appear brighter than areas with only a few fire particles, whereas adding more smoke to a smokey area does not increase brightness.

In graphics parlance the fire particles are “additively blended”, which means that we just directly add the fire color to whatever was underneath the fire. In OpenGL this translates to glBlendFunc(GL_ONE, GL_ONE) (i.e. the final color is 1 * particle color + 1 * original color).

Smoke on the other hand can be modeled as a more typical semi-transparent material. Smoke particles do obscure underneath pixels. In OpenGL this is implemented with glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA) (i.e. the final color is alpha * particle color + (1-alpha) * the original color). What if we want to support both types of particles without switching the blending function?

The solution is to use glBlendFunc(GL_ONE, GL_ONE_MINUS_SRC_ALPHA). To support additive blending (fire), we set the alpha to zero. To support transparency (smoke), we set alpha to the transparency we want and then premultiply the RGB components by the alpha value. This gives the same image as the above blending functions but is more flexible. Scaling the RGB components like this is called “premultiplied alpha” and Tom Forsyth has a lucid explanation of its many uses on his blog. With premultiplied alpha you can think of the color components (rgb) as adding light to the image and the alpha component (a) as blocking light.
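The equivalence is easy to verify by writing the three blend equations out on the CPU. The sketch below is only a model of what the blending hardware computes; the struct and function names are hypothetical, and only the RGB result is compared because the alpha channel differs between the modes.

```cpp
#include <cassert>
#include <cmath>

struct RGBA { float r, g, b, a; };

// glBlendFunc(GL_ONE, GL_ONE): result = src + dst
RGBA blendAdditive(const RGBA &s, const RGBA &d)
{
    return RGBA{ s.r + d.r, s.g + d.g, s.b + d.b, s.a + d.a };
}

// glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA): result = a*src + (1-a)*dst
RGBA blendAlpha(const RGBA &s, const RGBA &d)
{
    const float k = 1.f - s.a;
    return RGBA{ s.a * s.r + k * d.r, s.a * s.g + k * d.g,
                 s.a * s.b + k * d.b, s.a * s.a + k * d.a };
}

// glBlendFunc(GL_ONE, GL_ONE_MINUS_SRC_ALPHA): result = src + (1-a)*dst
RGBA blendPremultiplied(const RGBA &s, const RGBA &d)
{
    const float k = 1.f - s.a;
    return RGBA{ s.r + k * d.r, s.g + k * d.g, s.b + k * d.b, s.a + k * d.a };
}

// Convert a straight-alpha color to premultiplied form.
RGBA premultiply(const RGBA &c)
{
    return RGBA{ c.r * c.a, c.g * c.a, c.b * c.a, c.a };
}

// Compare color channels only, with a small tolerance.
bool sameRGB(const RGBA &x, const RGBA &y)
{
    return std::fabs(x.r - y.r) < 1e-6f &&
           std::fabs(x.g - y.g) < 1e-6f &&
           std::fabs(x.b - y.b) < 1e-6f;
}
```

A fire color with alpha 0 blended with blendPremultiplied matches blendAdditive, and a premultiplied smoke color matches blendAlpha on the straight-alpha original, so one blend function covers both kinds of particle.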

High Dynamic Range

From wikipedia:

Tone mapping is a technique used in image processing and computer graphics to map one set of colors to another in order to approximate the appearance of high dynamic range images in a medium that has a more limited dynamic range.

What happens when you have a bunch of fire particles on top of each other? Since they are additively blended, if you are using a typical framebuffer (8 bits each of red, green, and blue), the color components are just added together and then clamped. This quickly results in fully saturated colors as used color components (i.e. red and green for orange fire) are clamped to 255 and unused components (blue) remain at zero.

The basic idea in HDR lighting is to first draw everything to a floating point framebuffer (e.g. GL_RGBA16F_ARB, where colors may be brighter than 1.0) to avoid losing information to clamping, then apply a tonemapping operator to squish the dynamic range back down to 8 bits per channel for display.

Typically in photographic or photorealistic applications this means making the darkest parts of the image brighter and the brightest parts darker so that detail is preserved everywhere. In my case I just wanted to make the brightest parts of the image bright white. This is my tonemapping operator (in GLSL). It just checks if any components are too bright to directly display, and if so blends the other components towards white (remember that each component is clamped to 1.0 when written to a non-floating point framebuffer).

float mx = max(color.r, max(color.g, color.b));
if (mx > 1.0) {
    color.rgb += vec3(mx - 1.0);
}
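A C++ port of the operator makes it easy to try concrete numbers. The port and the clamp01 helper are written for this example (the names are hypothetical), and the clamp models what happens when the result is written to an 8 bit framebuffer.

```cpp
#include <algorithm>
#include <cassert>

struct RGB { float r, g, b; };

// The desaturating operator from the GLSL above, ported to C++.
RGB desaturateTonemap(RGB c)
{
    const float mx = std::max(c.r, std::max(c.g, c.b));
    if (mx > 1.f) {
        // push every channel up by the amount the brightest one overflows
        const float over = mx - 1.f;
        c.r += over;
        c.g += over;
        c.b += over;
    }
    return c;
}

// Model of writing to a non-floating point framebuffer: clamp to [0, 1].
RGB clamp01(RGB c)
{
    c.r = std::min(std::max(c.r, 0.f), 1.f);
    c.g = std::min(std::max(c.g, 0.f), 1.f);
    c.b = std::min(std::max(c.b, 0.f), 1.f);
    return c;
}
```

Take an accumulated orange of (4, 2, 0), as several stacked fire particles might produce. Clamping alone gives (1, 1, 0), which is pure yellow. Running the operator first gives (7, 5, 3), which clamps to (1, 1, 1), white. Colors that never exceed 1.0 pass through unchanged.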

In the top left image, you can see that the brightest part of the explosion is pure yellow. This is because the orange and red explosion particles contain only red and green channels with no blue channel, so no amount of adding and clamping will result in white. The right images are using my desaturating tone-mapping operator to make the brightest parts white. It looks brighter, right?

explosion with clamping explosion with desaturate tonemapping
shooting with clamping shooting with desaturate tonemapping