Xcode/Clang gamedev tips

Reassembly was written primarily on OSX, using Xcode for compiling and debugging and Emacs for text editing. Xcode is clearly designed first and foremost for developing iOS “apps”, and several non-obvious settings vastly improve the desktop game development experience.

My general experience with Xcode has been fairly positive. I think Visual Studio is a slightly better debugger overall, but not by enough to make me abandon OSX. As a solo developer I need to spend a fair amount of time in cafes and other public places to avoid going insane, and as far as I can tell Apple makes the best laptops.

My OSX builds are all 64-bit and have a deployment target of OSX 10.7. The following was written with reference to Xcode 6.1.1.

~/.lldbinit

This config file is read by lldb, the LLVM debugger, which is the backend for the Xcode debugger. Settings affect the Xcode visual debugger window in addition to the console.

Print 64-bit unsigned values in hex and show a summary for vector types.

type format add -f hex uint64 "unsigned long long"
type summary add --inline-children --omit-names float2 double2 cpVect int2 float3

Print a summary for std::pair types (particularly useful for browsing std::map and std::unordered_map).

type summary add --inline-children --omit-names -x "^std::pair<.+>(( )?&)?$"

Print the address, “name” and “blocks” fields of custom type BlockCluster.

type summary add --summary-string "{addr=${var%V} name=${var.name} ${var.blocks}}" BlockCluster

One warning: invalid summary strings often crash Xcode. Restarting the debugging session is enough to reload .lldbinit after changes.

Develop No-resource build

For rapid iteration with good performance, I created a “Develop NoResource” target. This build has optimizations enabled (-O3) but no link-time code generation (which takes a long time). While iterating on code, remove the main data directory from the “Copy Bundle Resources” build phase to prevent it from being copied on every build. As long as you don’t clean, previously copied resources will remain in the app directory.

In general lldb is “OK” at debugging optimized builds. It frequently loses the “this” pointer and stack variables, but is nonetheless usually useful enough to get the job done. This is one of the areas where Visual Studio really shines, particularly with the magic option /d2Zi+ (exposed as /Zo since Visual Studio 2013 Update 3); see this Random ASCII post. Unfortunately C++ Debug builds, which the debugger readily understands, are usually too slow to be useful.

I have not been able to get the lldb debugger to call inlined STL functions (e.g. std::vector::size) from the command line in optimized builds. If anyone knows how to do this, please tell me.

Platform layers

I haven’t been able to completely abstract OS details into a third-party, platform-independent layer like SDL. Reassembly uses SDL for window and event handling on Windows and Linux but is pure Cocoa on OSX. The Cocoa programming model is different enough from SDL that some level of native-ness would have been lost, and the amount of code required is not prohibitively large.

In general the Cocoa/Mac programming model is vastly nicer than Win32 or POSIX/Linux. It does not carry 30 years of baggage, never refers to 16-bit OSs in the docs, and provides easy-to-use APIs for common operations. Using the native API does not take more than a day or two to set up, and forever after allows exact control and the ability to debug platform-specific bugs.

One exception is the OSX gamepad APIs, which are unforgivably convoluted. Reassembly uses SDL to handle gamepads on OSX.

Command Line Builds

Xcode allows building projects from the command line. This is frequently convenient from e.g. Emacs or as part of a release script.

xcodebuild -scheme "Reassembly Develop NoResources" \
    -workspace Reassembly.xcodeproj/project.xcworkspace \
    -jobs 4 -parallelizeTargets ONLY_ACTIVE_ARCH=YES

Note that, unlike with Visual Studio, Xcode command line builds do not build the same target as the IDE, so you can’t do a command line build and then switch to the IDE for debugging without doing an IDE build. Set the command line config for the non-root project in Project Settings -> Info.

For better interoperability with non-Xcode tools, you can set the output directory of the compiled .app in Xcode settings (Command-,) under the Locations tab (see Derived Data).

Nested Projects

A little-known feature of Xcode is the ability to add .xcodeproj files as subprojects to your main project file. This is analogous to adding various Visual Studio projects to a solution file. I generally find it easier to integrate 3rd party code as source vs. using static libs, DLLs, dylibs, .so files, etc. Creating or adding a separate .xcodeproj per library allows me to specify different compile-time options for this code.

Crash Handling

OSX is a POSIX-compatible OS, which means we can use Unix signals to handle segmentation faults and other joyous occurrences. Automatically handling game crashes and uploading stack traces to my web server has been a powerful tool for increasing Reassembly’s stability, and is probably a topic for another article in and of itself. Most of my OSX crash handling code was shamelessly copied from this article on Atomic Objects. Make sure to disable the signal handlers in debug/develop builds, or the debugger won’t break on crashes.

The standard backtrace and backtrace_symbols calls work from signal handlers on OSX as expected. backtrace_symbols works even on release binaries and, unlike on Linux, does not require the -rdynamic linker option.
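As a minimal sketch of the moving parts (not the Atomic Objects code, and without the upload step), the following installs a handler with sigaction and SA_SIGINFO, which is what provides the siginfo and context arguments used by the register-dump code below, and prints a backtrace. The names crashHandler and installCrashHandlers are hypothetical.

```cpp
#include <cassert>
#include <cstring>
#include <execinfo.h>
#include <signal.h>
#include <unistd.h>

// Hypothetical handler. With SA_SIGINFO the kernel passes a siginfo_t
// (fault address in si_addr) and a ucontext_t* as the third argument.
static void crashHandler(int sig, siginfo_t *siginfo, void *context)
{
    (void)siginfo;  // see the register-dump code for how to use these
    (void)context;

    // write() and backtrace_symbols_fd() write straight to a file descriptor
    // instead of building a malloc'd string array like backtrace_symbols().
    static const char header[] = "Fatal signal caught, backtrace:\n";
    ssize_t written = write(STDERR_FILENO, header, sizeof(header) - 1);
    (void)written;

    void *frames[64];
    const int count = backtrace(frames, 64);
    backtrace_symbols_fd(frames, count, STDERR_FILENO);

    // Restore the default action and re-raise so the process still terminates
    // with the original signal once this handler returns.
    signal(sig, SIG_DFL);
    raise(sig);
}

// Installs crashHandler for common fatal signals; returns false if any
// sigaction call fails. Skip this in debug/develop builds so the debugger
// breaks on the crash instead.
bool installCrashHandlers()
{
    struct sigaction sa;
    std::memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = crashHandler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);

    const int fatalSignals[] = { SIGSEGV, SIGBUS, SIGILL, SIGFPE, SIGABRT };
    for (int s : fatalSignals) {
        if (sigaction(s, &sa, nullptr) != 0)
            return false;
    }
    return true;
}
```

Call installCrashHandlers() once at startup in release builds only.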

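Here is code for printing out some relevant registers on OSX or Linux (x86/x86-64) after catching a SIGSEGV or similar in a handler installed with SA_SIGINFO; siginfo and context are the handler’s second and third arguments: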

    const ucontext_t *ctx = (ucontext_t*)context;
    const mcontext_t &mcontext = ctx->uc_mcontext;

#if __APPLE__
    const uint ecode = mcontext->__es.__err;
#else
    const greg_t ecode = mcontext.gregs[REG_ERR];
#endif
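    // x86 page-fault error code: bit 1 (0x2) is set for writes and bit 4 (0x10)
    // for instruction fetches; bit 2 (0x4) only means the fault came from user mode
    string msg0 = str_format("Invalid %s to %p", (ecode&0x10) ? "Exec" : (ecode&2) ? "Write" : "Read", siginfo->si_addr);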
    ReportPOSIX(msg0);
    message += msg0 + "\n";

    string mmsg;
#if __APPLE__
#ifdef __LP64__
    mmsg = str_format("PC/RIP: %#llx SP/RSP: %#llx, FP/RBP: %#llx",
                      mcontext->__ss.__rip, mcontext->__ss.__rsp, mcontext->__ss.__rbp);
#else
    mmsg = str_format("PC/EIP: %#x SP/ESP: %#x, FP/EBP: %#x",
                      mcontext->__ss.__eip, mcontext->__ss.__esp, mcontext->__ss.__ebp);
#endif
#else
#ifdef __LP64__
    mmsg = str_format("PC/RIP: %#llx SP/RSP: %#llx, FP/RBP: %#llx",
                      mcontext.gregs[REG_RIP], mcontext.gregs[REG_RSP], mcontext.gregs[REG_RBP]);
#else
    mmsg = str_format("PC/EIP: %#x SP/ESP: %#x, FP/EBP: %#x",
                      mcontext.gregs[REG_EIP], mcontext.gregs[REG_ESP], mcontext.gregs[REG_EBP]);
#endif
#endif
    ReportPOSIX(mmsg);

Threading

By default, C++11 std::threads created under OSX get a ridiculously tiny stack (512 KB for secondary threads, versus 8 MB for the main thread), and std::thread offers no way to change it. I use pthreads directly to create threads to overcome this. std::mutex and friends work fine with pthread threads (std::thread wraps pthreads under OSX anyway).
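To check the default on your own system, a small sketch like this asks pthreads what stack size a thread created with default attributes would get. The function name is hypothetical; on OSX this typically reports 512 KB, while glibc on Linux typically derives it from the RLIMIT_STACK soft limit (often 8 MB).

```cpp
#include <cassert>
#include <cstddef>
#include <pthread.h>

// Returns the stack size pthread_create would use for a thread created with
// default attributes, or 0 if the query fails.
std::size_t defaultThreadStackSize()
{
    pthread_attr_t attr;
    if (pthread_attr_init(&attr) != 0)
        return 0;
    std::size_t size = 0;
    if (pthread_attr_getstacksize(&attr, &size) != 0)
        size = 0;
    pthread_attr_destroy(&attr);
    return size;
}
```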

You can also set the current thread name such that it will show up in the Xcode thread view via a non-standard pthread API.

Creating a pthread with an increased stack size

#if __APPLE__

typedef pthread_t OL_Thread;
#define THREAD_IS_SELF(B) (pthread_self() == (B))
#define THREAD_ALIVE(B) (B)

OL_Thread thread_create(void *(*start_routine)(void *), void *arg)
{
    int            err = 0;
    pthread_attr_t attr;
    pthread_t      thread = 0;

    err = pthread_attr_init(&attr);
    if (err)
        ReportMessagef("pthread_attr_init error: %s", strerror(err));
    // 8 MB, the same as the main thread's default stack on OSX
    err = pthread_attr_setstacksize(&attr, 8 * 1024 * 1024);
    if (err)
        ReportMessagef("pthread_attr_setstacksize error: %s", strerror(err));
    err = pthread_create(&thread, &attr, start_routine, arg);
    if (err) {
        ReportMessagef("pthread_create error: %s", strerror(err));
        thread = 0;  // contents are unspecified after a failed pthread_create
    }
    pthread_attr_destroy(&attr);
    return thread;
}


void thread_join(OL_Thread &thread)
{
    if (!thread)
        return;
    int status = pthread_join(thread, NULL);
    ASSERTF(status == 0, "pthread_join: %s", strerror(status));
    thread = 0; // mark as joined, like std::thread::joinable() becoming false
}

#else

typedef std::thread OL_Thread;
#define THREAD_IS_SELF(B) (std::this_thread::get_id() == (B).get_id())
#define THREAD_ALIVE(B) ((B).joinable()) 

OL_Thread thread_create(void *(*start_routine)(void *), void *arg)
{
    return std::thread(start_routine, arg);
}

void thread_join(OL_Thread& thread)
{
    if (!thread.joinable())
        return;
    try {
        thread.join();
    } catch (std::exception &e) {
        ASSERT_FAILED("std::thread::join()", "%s", e.what());
    }
}

#endif

Naming threads

You can use the pthreads API to set the current thread name, and this name will show up in the Xcode debugger, which is extremely helpful. Unfortunately the non-standard pthreads API for this differs between OSX and Linux (and of course Windows). Code for all platforms follows; it also works in GDB and Visual Studio.

#if _WIN32
//
// Usage: SetThreadName (-1, "MainThread");
//
const DWORD MS_VC_EXCEPTION = 0x406D1388;

#pragma pack(push,8)
typedef struct tagTHREADNAME_INFO
{
    DWORD dwType; // Must be 0x1000.
    LPCSTR szName; // Pointer to name (in user addr space).
    DWORD dwThreadID; // Thread ID (-1=caller thread).
    DWORD dwFlags; // Reserved for future use, must be zero.
} THREADNAME_INFO;
#pragma pack(pop)

void SetThreadName(DWORD dwThreadID, const char* threadName)
{
    THREADNAME_INFO info;
    info.dwType = 0x1000;
    info.szName = threadName;
    info.dwThreadID = dwThreadID;
    info.dwFlags = 0;

    __try
    {
        RaiseException(MS_VC_EXCEPTION, 0, sizeof(info) / sizeof(ULONG_PTR), (ULONG_PTR*)&info);
    }
    __except (EXCEPTION_EXECUTE_HANDLER)
    {
    }
}

#endif

void thread_setup(const char* name)
{
    uint64 tid = 0;
#if _WIN32
    tid = GetCurrentThreadId();
    SetThreadName(tid, name);
#elif __APPLE__
    pthread_threadid_np(pthread_self(), &tid);
    pthread_setname_np(name);
#else // linux
    tid = pthread_self();
    int status = 0;
    // 16 bytes maximum including the terminating NUL (15 characters);
    // longer names make pthread_setname_np fail with ERANGE
    if ((status = pthread_setname_np(pthread_self(), name)))
        ReportMessagef("pthread_setname_np(pthread_t, const char*) failed: %s", strerror(status));
#endif
}
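Because the Linux limit is easy to trip over, one option is a helper that truncates to 15 characters up front and reads the name back with pthread_getname_np, which has the same (pthread_t, char*, size_t) signature on OSX and glibc. This is a sketch with hypothetical helper names, and Windows is left out.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // glibc only declares pthread_setname_np/getname_np with this
#endif
#include <cassert>
#include <cstring>
#include <pthread.h>
#include <string>

// Sets the calling thread's name, truncated to 15 characters so it fits the
// Linux limit (16 bytes including the NUL). Returns 0 or an errno-style code.
int setCurrentThreadName(const char *name)
{
    char truncated[16];
    std::strncpy(truncated, name, sizeof(truncated) - 1);
    truncated[sizeof(truncated) - 1] = '\0';
#if __APPLE__
    return pthread_setname_np(truncated);                  // OSX: calling thread only
#else
    return pthread_setname_np(pthread_self(), truncated);  // Linux/glibc
#endif
}

// Reads the calling thread's name back; returns an empty string on failure.
std::string currentThreadName()
{
    char buf[64] = {0};
    if (pthread_getname_np(pthread_self(), buf, sizeof(buf)) != 0)
        return std::string();
    return std::string(buf);
}
```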

Github

All of the code above is part of Reassembly and is available in context in my Outlaws core GitHub project; see the os/osx directory for Mac-specific code.

This one macro trick for easy data definitions

Commonly cited advantages of dynamic languages like JavaScript or Python over C/C++ are the ability to easily mix data definitions with code in source files and to introspect classes and other code structures as if they were data. Standard C++ also often requires repetitive, annoyingly verbose syntax when declaring many similar objects.

Thankfully, as with many other problems, we can overcome these limitations by creatively using the C preprocessor. Contrary to prevailing C++ doctrine, I believe that cpp macros can be the most concise, easy to understand, and high performance way to solve certain problems. With a little functional macro programming (a technique often called “X macros”) we can define reflective structs and enums with zero runtime overhead.

Let’s say we are writing a game about building spaceships out of blocks. We have a class to represent the persistent data for each block, and we want to be able to serialize and deserialize that class, but we also read the fields of this class a gazillion times each frame and need them to be normal struct members – we can’t use a hash table or something instead. We also add or remove members frequently and don’t want to have to change the code in six places every time we do this. Here is a simple way to accomplish this.

(Note: This is simplified code from Reassembly – I hate iostreams and am not actually using them in Reassembly but for the sake of exposition it was the simplest way.)

#define SERIAL_BLOCK_FIELDS(F)                                   \
    F(uint,              ident,                   0)             \
    F(float2,            offset,                  float2(0.f))   \
    F(float,             angle,                   0.f)           \
    F(uchar,             blockshape,              0)             \
    F(uchar,             blockscale,              1)             \
    F(FeatureEnum,       features,                FeatureEnum(0)) \
    ...

#define SERIAL_TO_STRUCT_FIELD(TYPE, NAME, DEFAULT) \
    TYPE NAME = DEFAULT;
// expects an ostream `os` and a SerialBlock `b` in scope
#define SERIAL_WRITE_STRUCT_MEMBER(TYPE, NAME, DEFAULT) \
    if ((b.NAME) != (DEFAULT)) os << #NAME << "=" << b.NAME << ",";

struct SerialBlock {
    SERIAL_BLOCK_FIELDS(SERIAL_TO_STRUCT_FIELD)
    // a non-member friend, so that `os << block` works as expected
    friend std::ostream& operator<<(std::ostream &os, const SerialBlock &b)
    {
        os << "{";
        SERIAL_BLOCK_FIELDS(SERIAL_WRITE_STRUCT_MEMBER)
        os << "}";
        return os;
    }
    ...
};

That’s it. Now we can add and remove block fields without having to update the serialization routine. We can manipulate the struct fields any way we want by writing new macros and passing them into the field declaration macro. Useful examples include parsing, resetting fields to default, operator==, or listing struct fields for purposes of tab completion.
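To make this concrete, here is a small self-contained sketch that reuses one field list to generate the members, a serializer, reset(), and operator==. ExampleBlock and its three fields are hypothetical stand-ins, not the real Reassembly block fields.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical field list: TYPE, NAME, DEFAULT
#define EXAMPLE_FIELDS(F)  \
    F(int,   ident, 0)     \
    F(float, angle, 0.f)   \
    F(int,   scale, 1)

#define DECLARE_FIELD(TYPE, NAME, DEFAULT) TYPE NAME = DEFAULT;
#define RESET_FIELD(TYPE, NAME, DEFAULT)   NAME = DEFAULT;
#define EQUAL_FIELD(TYPE, NAME, DEFAULT)   && a.NAME == b.NAME
// expects an ostream `os` and an ExampleBlock `b` in scope
#define WRITE_FIELD(TYPE, NAME, DEFAULT) \
    if (b.NAME != (DEFAULT)) os << #NAME << "=" << b.NAME << ",";

struct ExampleBlock {
    EXAMPLE_FIELDS(DECLARE_FIELD)

    void reset() { EXAMPLE_FIELDS(RESET_FIELD) }

    friend bool operator==(const ExampleBlock &a, const ExampleBlock &b)
    {
        return true EXAMPLE_FIELDS(EQUAL_FIELD);
    }
};

// Writes only the fields that differ from their defaults.
std::string toString(const ExampleBlock &b)
{
    std::ostringstream os;
    os << "{";
    EXAMPLE_FIELDS(WRITE_FIELD)
    os << "}";
    return os.str();
}
```

Adding a fourth field to EXAMPLE_FIELDS updates the struct, toString, reset, and operator== at once.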

Another useful application is reflective enums/bitfields.

#define BLOCK_FEATURES(F)                         \
    F(COMMAND,        uint64(1)<<0)             \
    F(THRUSTER,       uint64(1)<<1)             \
    F(GENERATOR,      uint64(1)<<2)             \
    F(TURRET,         uint64(1)<<3)             \

#define SERIAL_TO_ENUM(X, V) X=V,
enum FeatureEnum { BLOCK_FEATURES(SERIAL_TO_ENUM) };
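The same list can also generate the reflective half: turning a bitfield into names and parsing a name back into a flag. This is a sketch; featureNames and parseFeature are hypothetical helpers, and uint64 is assumed to be a 64-bit unsigned typedef.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

typedef uint64_t uint64;  // assumption: matches the article's uint64

#define BLOCK_FEATURES(F)               \
    F(COMMAND,        uint64(1)<<0)     \
    F(THRUSTER,       uint64(1)<<1)     \
    F(GENERATOR,      uint64(1)<<2)     \
    F(TURRET,         uint64(1)<<3)

#define SERIAL_TO_ENUM(X, V) X=V,
enum FeatureEnum { BLOCK_FEATURES(SERIAL_TO_ENUM) };

// Formats a bitfield as e.g. "COMMAND|TURRET".
#define APPEND_FEATURE_NAME(X, V)       \
    if (features & (V)) {               \
        if (!out.empty()) out += "|";   \
        out += #X;                      \
    }

std::string featureNames(uint64 features)
{
    std::string out;
    BLOCK_FEATURES(APPEND_FEATURE_NAME)
    return out;
}

// Looks up a single feature by name; returns 0 for unknown names.
#define MATCH_FEATURE_NAME(X, V) if (std::strcmp(name, #X) == 0) return (V);

uint64 parseFeature(const char *name)
{
    BLOCK_FEATURES(MATCH_FEATURE_NAME)
    return 0;
}
```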

At the risk of pissing off both the Effective C++ crowd and the hardcore C crowd, I will introduce an elaboration using templates and the visitor design pattern for better generality. We define a function called getField(object, name) that returns a reference to the reflective struct field with the given name. We can use the same technique to parse/serialize arbitrary structs, list members, generate UI for editing structs, etc.

The actual game code for this is on my GitHub repo, together with some convenient macros for defining reflective structs and enums.

#define SERIAL_VISIT_FIELD_AND(TYPE, NAME, DEFAULT) \
    vis.visit(#NAME, (NAME), TYPE(DEFAULT)) &&

struct SerialBlock {
    ...
    template <typename V>
    bool accept(V& vis)
    {
        return SERIAL_BLOCK_FIELDS(SERIAL_VISIT_FIELD_AND) true;
    }
};

template <typename T>
struct GetFieldVisitor {
    T*           value;
    const string field;

    GetFieldVisitor(const char* name) : value(), field(name) {}

    bool visit(const char* name, T& val, const T& def=T())
    {
        if (name != field)
            return true;
        value = &val;
        return false;
    }

    template <typename U>
    bool visit(const char* name, U& val, const U& def=U())
    {
        return true;
    }
};

// Get a reference to the field in OBJ named FIELD (like getattr in Python).
// Will crash if the field does not exist or U is not exactly the field's type.
// Use like getField<float>(block, "angle")
template <typename U, typename T>
U& getField(T& obj, const char* field)
{
    GetFieldVisitor<U> vs(field);
    obj.accept(vs);
    return *vs.value;
}
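Listing members or checking which fields changed only needs a different visitor. Below is a self-contained sketch with a hypothetical two-field DemoBlock and two hypothetical visitors; each visit returns true so the && chain in accept() keeps going.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical field list: TYPE, NAME, DEFAULT
#define DEMO_FIELDS(F)    \
    F(int,   ident, 0)    \
    F(float, angle, 0.f)

#define DEMO_DECLARE_FIELD(TYPE, NAME, DEFAULT) TYPE NAME = DEFAULT;
#define DEMO_VISIT_FIELD_AND(TYPE, NAME, DEFAULT) \
    vis.visit(#NAME, (NAME), TYPE(DEFAULT)) &&

struct DemoBlock {
    DEMO_FIELDS(DEMO_DECLARE_FIELD)

    template <typename V>
    bool accept(V& vis)
    {
        return DEMO_FIELDS(DEMO_VISIT_FIELD_AND) true;
    }
};

// Collects every field name, e.g. for tab completion.
struct ListFieldsVisitor {
    std::vector<std::string> names;

    template <typename U>
    bool visit(const char* name, U&, const U&)
    {
        names.push_back(name);
        return true;
    }
};

// Counts fields whose value differs from the declared default.
struct CountChangedVisitor {
    int changed = 0;

    template <typename U>
    bool visit(const char*, U& val, const U& def)
    {
        if (!(val == def))
            changed++;
        return true;
    }
};
```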

How to fix color banding with dithering

Update: I have since found a much more in-depth presentation on this topic, titled “Banding in Games”, by Mikkel Gjøl.

Color Banding

Computers represent colors using finite precision – 24 bits per pixel (bpp) is standard today, with 8 bits for each of red, green, and blue. This gives us 16 million total colors, but only 256 shades for any single hue. This is typically not a problem with photographs, but the discontinuities between representable colors can become jarringly visible on gradients of a single color. This artifact is called color banding.

Space games often have dark gradient backgrounds and thus suffer from visible color banding. Games following in the tradition of Homeworld 2’s gorgeous vertex color skyboxes are particularly afflicted compared to games with texture art because the gradient is mathematically perfect and there is no noise to obscure the color bands.

Here are some screenshots from a few games showing the effect. Make sure to click through to the full-size image and verify that adjacent color bands typically differ by only one step in a single channel (Digital Color Meter on OSX is great for this). The color banding is easier to see in a dark room.

Homeworld 2 (R.E.A.R.M.)


Obviously these games are still incredibly beautiful! I just wanted to point out that many popular games exhibit visible color banding despite the existence of well understood solutions. Color banding in Reassembly bothered me enough to fix, and I thought that the solution was simple and effective enough that it should be more widely known.

Dithering

As mentioned above, color banding is caused by 24-bit color being unable to perfectly represent a gradient: the limit of color resolution. We can increase color resolution at the expense of spatial resolution via a process called dithering. Since we are just trying to draw a smooth gradient and don’t care about spatial resolution, this is great. Dithering takes advantage of the fact that a grid of alternating black and white pixels looks grey. Please read the Wikipedia article for a full explanation, and see also Pointillism for an early application.


Bayer Matrix

There are a lot of fancy dithering algorithms, but I chose to implement ordered dithering via a Bayer matrix because it can be done efficiently in the fragment shader. The basic idea is to add a small value to every pixel right before it is quantized (i.e. converted from the floating point representation used in the shader to 8 bits per channel in the framebuffer). The least significant bits of the color, which would ordinarily be thrown out, are combined with this added value, so each pixel has a chance of rounding differently than nearby pixels. Bayer dithering takes these values from an 8×8 matrix which is tiled across the image.
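To see why a tiled offset removes bands, here is a CPU-side sketch in plain C++ (not the shader used in this post). bayer8() computes the matrix entry from the bits of the pixel coordinates, a standard construction that yields the same values as the pattern[] table in the texture code; quantize() and its half-step-centered offset are illustrative assumptions, not Reassembly code.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// 8x8 Bayer threshold (0..63) for pixel (x, y), tiled every 8 pixels.
// Interleaves the bits of (x ^ y) and y; the lowest coordinate bits become
// the most significant bits of the result.
int bayer8(int x, int y)
{
    int value = 0;
    for (int k = 0; k < 3; k++) {
        const int a = ((x ^ y) >> k) & 1;
        const int b = (y >> k) & 1;
        value |= (2 * a + b) << (2 * (2 - k));
    }
    return value;
}

// Quantize an intensity in [0, 1] to 8 bits. Without dithering every pixel
// rounds to nearest (offset 0.5). With dithering the offset added before
// truncation varies per pixel over (0, 1), so a value between two levels
// becomes a pattern of both levels whose average matches the input.
uint8_t quantize(double value, int x, int y, bool dither)
{
    const double scaled = value * 255.0;
    const double offset = dither ? (bayer8(x, y) + 0.5) / 64.0 : 0.5;
    double q = std::floor(scaled + offset);
    if (q < 0.0) q = 0.0;
    if (q > 255.0) q = 255.0;
    return static_cast<uint8_t>(q);
}
```

For a value a quarter of the way between levels 100 and 101, plain rounding maps the whole 8×8 tile to 100, while the dithered version sets exactly 16 of the 64 pixels to 101, so the tile averages out to 100.25.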

I store the Bayer Matrix in a texture which I sample at the end of my fragment shader. Here is the code to generate the texture. Note that we enable texture wrapping and nearest neighbor sampling and are using a one channel texture.

static const unsigned char pattern[] = {
    0, 32,  8, 40,  2, 34, 10, 42,   /* 8x8 Bayer ordered dithering  */
    48, 16, 56, 24, 50, 18, 58, 26,  /* pattern.  Each input pixel   */
    12, 44,  4, 36, 14, 46,  6, 38,  /* is scaled to the 0..63 range */
    60, 28, 52, 20, 62, 30, 54, 22,  /* before looking in this table */
    3, 35, 11, 43,  1, 33,  9, 41,   /* to determine the action.     */
    51, 19, 59, 27, 49, 17, 57, 25,
    15, 47,  7, 39, 13, 45,  5, 37,
    63, 31, 55, 23, 61, 29, 53, 21 };

GLuint tex_name = 0;
glGenTextures(1, &tex_name);
glBindTexture(GL_TEXTURE_2D, tex_name);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_REPEAT);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_REPEAT);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexImage2D(GL_TEXTURE_2D, 0, GL_LUMINANCE, 8, 8, 0, GL_LUMINANCE,
             GL_UNSIGNED_BYTE, pattern);

Then at the end of the fragment shader, add the scaled dither texture to the fragment color. I don’t fully understand the 32.0 divisor here; I think 64 is the correct value, but 32 (or even 16) looks much better.

gl_FragColor += vec4(texture2D(dither_sampler, gl_FragCoord.xy / 8.0).r / 32.0 - (1.0 / 128.0));

That’s it.
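One possible reading of the 32 vs. 64 question, offered as a hedged aside: a GL_UNSIGNED_BYTE texture is sampled as normalized values, so .r returns p/255 for a table entry p, not p/64. Measured in 8-bit output steps (multiplying by 255), and assuming the framebuffer conversion rounds to nearest, the offsets work out as:

```latex
r = \frac{p}{255}, \qquad p \in \{0, 1, \dots, 63\}

255\left(\frac{r}{32} - \frac{1}{128}\right) = \frac{p}{32} - \frac{255}{128} \in [-1.99,\ -0.02]

255\left(\frac{r}{64} - \frac{1}{510}\right) = \frac{p}{64} - \frac{1}{2} \in [-0.5,\ 0.48]
```

With the 32.0 divisor the noise spans about two output steps and is biased down by about one step on average, whereas r/64 minus half a step spans exactly one step and is roughly zero-mean. That supports 64 as the “correct” value; the wider 32.0 noise may simply hide the bands more aggressively.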

It’s important that this happens in a shader where the full gradient precision is available; if you do it in a post-processing shader reading from a 24-bit color buffer, it won’t work. In Reassembly I actually do it in two different places: in the tonemapping shader, which reads from a floating point render texture, and in the shader that draws the Worley background.


Background halo with color banding


background halo with dithering


shield with color banding


shield with dithering

How to prevent dangling pointers to deleted game objects in C++

Early versions of Reassembly were plagued with crashes due to game object lifetime problems. For example, the AI object for one spaceship would hold a pointer to the enemy spaceship it was targeting. If the targeted spaceship was destroyed (and its object deleted), the first spaceship’s AI could crash the next time it ran. Similarly, various parts of the user interface reference the player’s spaceship object and then crash when the player spaceship is destroyed and its object deleted.

There are a lot of potential solutions to this problem. We can delay deletion of doomed objects for a few frames, and any code that references these objects can check whether they are still alive before using them. We can traverse every object that might reference a game object each time a game object is deleted, and NULL any pointers to the deleted object. When there are many types of game objects and they all reference each other in erratic and complicated ways, either approach can quickly become burdensome.

My eventual solution was inspired by an article on Coding Wisdom recommending the use of “Watchers” and arguing against reference counting smart pointers. To quote from the article:

  1. Create a base class “Watchable” that you derive from on objects that should broadcast when they’re being deleted.  The Watchable object keeps track of other objects pointing at it.
  2. Create a “Watcher” smart pointer that, when assigned to, adds itself to the list of objects to be informed when its target goes away.

This is the sort of thing that is probably common knowledge among AAA game programmers but was not obvious to me. I implemented this suggestion and it has worked out really well.

Implementation

My version is available on my GitHub in stl_ext.h.

  1. Game objects that will be pointed to should inherit from Watchable.
  2. Objects that store a reference to game objects should declare the pointer as watch_ptr<GameObject> m_ptr;. m_ptr will automatically become NULL when the pointee is deleted. watch_ptr is a smart pointer, so it is used the same way as a plain GameObject* m_ptr.
  3. Use watch_ptrs only when the pointed-to object may be deleted while the pointer is stored.

It works by adding each watch_ptr to a doubly linked list, with the list pointers stored in the watch_ptr itself. When the watch_ptr destructs, it removes itself from the list. When the Watchable object destructs, it traverses the list and NULLs all the pointers.
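The scheme described above can be sketched in a self-contained way as follows. This is an illustrative reconstruction, not the stl_ext.h code: WatchNode is a hypothetical name, nullReferencesTo is named after the call mentioned in the threading rules below, and like the original it is not thread safe.

```cpp
#include <cassert>

struct Watchable;

// One link in the intrusive list of watchers kept by a Watchable.
struct WatchNode {
    Watchable *target = nullptr;
    WatchNode *prev   = nullptr;
    WatchNode *next   = nullptr;
};

struct Watchable {
    WatchNode *head = nullptr;  // most recently added watcher

    Watchable() = default;
    Watchable(const Watchable &) {}  // a copy starts with no watchers
    Watchable &operator=(const Watchable &) { return *this; }

    // NULL every watch_ptr currently pointing at this object.
    void nullReferencesTo()
    {
        while (head) {
            WatchNode *node = head;
            head = node->next;
            node->target = nullptr;
            node->prev = node->next = nullptr;
        }
    }

    virtual ~Watchable() { nullReferencesTo(); }
};

// T must derive (non-virtually) from Watchable.
template <typename T>
class watch_ptr {
    WatchNode m_node;

    void link(T *obj)
    {
        if (!obj)
            return;
        Watchable *w = obj;
        m_node.target = w;
        m_node.prev = nullptr;
        m_node.next = w->head;
        if (w->head)
            w->head->prev = &m_node;
        w->head = &m_node;
    }

    void unlink()
    {
        if (!m_node.target)
            return;
        if (m_node.prev)
            m_node.prev->next = m_node.next;
        else
            m_node.target->head = m_node.next;
        if (m_node.next)
            m_node.next->prev = m_node.prev;
        m_node.target = nullptr;
        m_node.prev = m_node.next = nullptr;
    }

public:
    watch_ptr() = default;
    watch_ptr(T *obj) { link(obj); }
    watch_ptr(const watch_ptr &other) { link(other.get()); }
    ~watch_ptr() { unlink(); }

    watch_ptr &operator=(const watch_ptr &other)
    {
        if (this != &other) { unlink(); link(other.get()); }
        return *this;
    }
    watch_ptr &operator=(T *obj) { unlink(); link(obj); return *this; }

    T *get() const { return static_cast<T *>(m_node.target); }
    T *operator->() const { return get(); }
    T &operator*() const { return *get(); }
    explicit operator bool() const { return m_node.target != nullptr; }
};
```

Because the list nodes live inside the watch_ptrs themselves, linking and unlinking never allocate; the cost is that copying a watch_ptr must relink rather than copy the node.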

Threading

Obviously the doubly linked list traversal is not thread safe. Reassembly uses two main threads, a render thread and a simulation/event thread, with a few more workers. There are a few rules for safe usage.

  1. When the update thread is done with the object, NULL all references to it (call nullReferencesTo()). This will prevent any further references to the object from that thread.
  2. Only delete game objects that need to be referenced by the render thread from the render thread. This is often necessary anyway because destroying these objects can also delete OpenGL buffers, which must be done from the render thread. I push ready-to-delete objects in the update thread to a queue, then delete at the end of the render thread frame.
  3. Always copy the watch_ptr to a normal pointer before checking it for NULL when using it from the render thread. This prevents the update thread from NULLing the pointer after the NULL check but before the render thread is done with it. Since the render thread will not delete the object until the end of the frame, we don’t have to worry about referencing freed memory.
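Rule 2 can be sketched as a small mutex-protected queue: the update thread pushes objects it is finished with, and the render thread deletes them at the end of its frame. GameObject and DeferredDeleter here are hypothetical stand-ins, not Reassembly classes.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical base class standing in for the game's object hierarchy.
struct GameObject {
    virtual ~GameObject() = default;
};

class DeferredDeleter {
public:
    // Update thread: call after NULLing its own references to obj.
    void push(GameObject *obj)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_pending.push_back(obj);
    }

    // Render thread: call at the end of the frame, once no raw pointers copied
    // from watch_ptrs during the frame are still in use. Returns the number
    // of objects deleted.
    std::size_t flush()
    {
        std::vector<GameObject *> doomed;
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            doomed.swap(m_pending);
        }
        for (GameObject *obj : doomed)
            delete obj;  // may free GL buffers, hence the render thread
        return doomed.size();
    }

    ~DeferredDeleter() { flush(); }

private:
    std::mutex m_mutex;
    std::vector<GameObject *> m_pending;
};
```

Swapping the pending list out under the lock keeps the critical section short, so the update thread is never blocked while destructors run.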

Github

I published some of the utility code for Gamma Void on GitHub under the MIT License. This is not intended to be a packaged library, but rather a collection of useful snippets that would have saved me a lot of time if I had found them in a GitHub repository somewhere. The repository contains about 10k lines of code, about a fifth of the total Gamma Void source, and pretty much all the code that is not game-specific.

https://github.com/manylegged/outlaws-core

Highlights:

  • polygon intersections, interpolation, vector helpers, ternary digits, and more in Geometry.h
  • fast 2D spatial hash in SpacialHash.h
  • a class ‘lstring’ for fast symbol manipulation and easy-to-use string utils in Str.h
  • ‘copy_ptr’ and ‘watch_ptr’ smart pointers for managing sparse structs and automatically nulling object references on deletion, respectively, in stl_ext.h
  • C++ wrappers for OpenGL buffers and polygon drawing code in Graphics.h
  • Fast shader powered particle system in Particles.h
  • Color transformation utilities in RGB.h
  • mac/win/linux platform layer in Outlaws.h (implementations in ‘os’ directory)