Marbling It Up

@MarbleItUp has gone public. We will be launching soon on Switch. I wanted to talk a little bit about the game and how it came to be… and share some sweet GIFs (in high framerate black and white for download size reasons).

riding_dat_spline.gif
An extremely early baby shot of the game. Almost everything here – including the code – was later replaced. And that new stuff runs at 60hz on Switch in handheld mode.

Just over a year ago, Mark Frohnmayer came to me with a proposal – let’s build a sweet marble game and put it on Switch. The Switch at that time was a brand new platform, but the potential was immediately obvious to both of us. I agreed to partner with Mark and all of a sudden – after several years away – I was back in the game development world.

I didn’t know it on day one, but making Marble It Up! was going to be a rare privilege. I am incredibly proud of this game – the team, the levels, the visuals and the gameplay are all amazing, and I count myself lucky to be involved. This is really a great game.

As we near release, I feel confident it will be a hit. I am just hoping it will be a blockbuster Michael Bay hit, not an indie niche Kevin Smith hit. Anyway, on with the story…

The Team

Like Danny Ocean, we had to get the old crew together. I already had Miha, an awesome coder, on my team from past projects, so he and I got rolling (hah) right away. Meanwhile, we immediately reached out to Alex, the original level designer for Marble Blast, and Todd, one of the most talented (and hardest working) artists I know. We also pulled in the incredible Solovox to make an awesome soundtrack and Mike Jones to do the rest of the audio.

Along the way we ran into long-time fans of our work on the original Marble Blast series. We brought in the team behind QuiVR, Jonathan and Fletcher, as well as long-time community members, speedrunners, and developers Matan and Conor.

With the team thus completed, we went into full production. Development slowed a little when, a few months in, my Loom technology was acquired and I had to devote my focus to a roll (ho ho ho) with the acquiring company. This ended up being short-lived, and I was able to refocus my attention after about 10 months of distraction (so to speak).

All of our development was self funded. This gave us total control over direction, but also required all of us to really commit to the game. We’re using a studio model for the game itself – it has its own entity that holds the IP and pays out royalties. Maybe I’ll talk about the basic structure we used for that some day down the road.

Development Path

For this project, we used Unity 2017. It comes free with Switch developer access, and it looked like the best path forward on short notice. I briefly considered using Torque, the first commercial game engine I worked on. Because it was long in the tooth, it would require a solid porting effort before development could begin in earnest. We just didn’t have that luxury.

Although older versions of Unity were shaky, it has really matured in recent years and 2017 was easy to work with. Very quickly we had something up, and I started taking little progress GIFs (in black and white so they’d show up in Slack quickly). We got a ball rolling around in very short order:

InertialBall.gif
Very early rolling behavior.

The movement took a while to get right (more on that later). But ProBuilder made it super easy to throw a level together, and with a little work we got elevators going.

marble_on_elevator.gif
One of the first elevators.
collapsing_platform.gif
And collapsing platforms!

 

Being able to easily tie any geometry into a mover was a huge benefit for level design. It did require some discipline with prefabs to create consistent reusable elevators, but one-off showpieces like those found in Cog Valley or Staying Alive were super easy to make.

The physics simulation was very detailed, and one of the elements needed for physics networking was a capacity for rewind. Instead of only allowing for a tick or two of rewind, I generalized it to allow for arbitrary rewind. This made level testing MUCH easier:

rewind.gif
Check out the hook while my DJ revolves it.
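The core idea behind the rewind system is nothing exotic: snapshot the full simulation state every physics tick into a ring buffer, and restore any snapshot on demand. Here’s a minimal sketch of the concept – not our actual engine code, and the state struct and buffer handling are simplified for illustration:

    #include <cstddef>
    #include <vector>

    // Hypothetical per-tick snapshot; the real marble state carries much more.
    struct MarbleState {
        float position[3];
        float velocity[3];
        float angularVelocity[3];
    };

    // Fixed-capacity ring buffer of snapshots, supporting rewind by an
    // arbitrary number of ticks (up to the buffer capacity).
    class RewindBuffer {
    public:
        explicit RewindBuffer(std::size_t capacity) : states(capacity), head(0), count(0) {}

        void record(const MarbleState& s) {
            states[head] = s;
            head = (head + 1) % states.size();
            if (count < states.size()) ++count;
        }

        // Rewind 'ticks' steps into the past and return the snapshot to restore.
        // Assumes at least one snapshot has been recorded; clamps to the oldest.
        const MarbleState& rewind(std::size_t ticks) {
            if (ticks >= count) ticks = count - 1;
            std::size_t index = (head + states.size() - 1 - ticks) % states.size();
            head = (index + 1) % states.size();  // future ticks overwrite from here
            count -= ticks;
            return states[index];
        }

    private:
        std::vector<MarbleState> states;
        std::size_t head;   // next write slot
        std::size_t count;  // valid snapshots stored
    };

Once every tick is being captured anyway, jumping back an arbitrary distance is just a matter of picking an index.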

And it also set the stage for two other features, ghost racing and replay. (I love ghost races way more than anyone else on the team, but I’m the lead programmer so they had to ship it.) Being able to watch and race against other players’ routes, as well as compete against yourself, adds loads of replay value to the game.

ghost_race.gif
What better challenger for you to face… than YOURSELF!

Not to say everything was wine and roses. We quickly found that Unity’s physics were completely inadequate for our needs. In the end, about half of the game code was for our custom marble physics engine. I feel comfortable claiming that no other marble game comes close to our level of fidelity. Part of the reason we have our own physics is so we can make them unrealistic, too – sometimes bending the laws of physics in juuust the right way makes things way more fun.

GravitySurfaceRoll.gif
An extreme case of gravity surfaces.
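I won’t spell out the whole physics engine, but the core trick behind a gravity surface is conceptually simple: treat “down” as a per-marble property and smoothly reorient it toward the surface being rolled on, rather than using one global gravity vector. A rough sketch of that idea – the function names and blend rate here are invented, not our actual code:

    #include <cmath>

    struct Vec3 { float x, y, z; };

    static Vec3 normalize(const Vec3& v) {
        float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
        return { v.x / len, v.y / len, v.z / len };
    }

    static Vec3 lerp(const Vec3& a, const Vec3& b, float t) {
        return { a.x + (b.x - a.x) * t,
                 a.y + (b.y - a.y) * t,
                 a.z + (b.z - a.z) * t };
    }

    // Each tick, ease the marble's personal gravity direction toward the
    // inverted contact normal of the gravity surface it is touching.
    Vec3 updateGravityDirection(const Vec3& currentDown,
                                const Vec3& surfaceNormal,
                                float dt)
    {
        const float blendRate = 10.0f;  // invented tuning value
        Vec3 targetDown = { -surfaceNormal.x, -surfaceNormal.y, -surfaceNormal.z };
        float t = 1.0f - std::exp(-blendRate * dt);  // framerate-independent ease
        return normalize(lerp(currentDown, targetDown, t));
    }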

We also found that the default Unity renderer was really not appropriate for the game. Shadows provide important depth cues, but Unity’s dynamic shadows were fuzzy and required very careful level design and lighting to be present in all gameplay situations. In the end, a simple projected blob shadow gave us a crisp look and perfect visibility to support the gameplay. It also saved us many man hours tuning levels.
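For anyone who hasn’t built one, a blob shadow is about as simple as shadows get: raycast straight down from the marble, project a dark quad or decal where the ray hits, and scale and fade it with distance so it still reads as a depth cue. Roughly along these lines – a sketch only, with invented tuning numbers rather than our shipping values:

    #include <algorithm>

    struct BlobShadow {
        float radius;   // world-space radius of the projected quad
        float opacity;  // 0 = invisible, 1 = fully dark
    };

    // Given the marble's radius and its height above the surface below it
    // (from a downward raycast), decide how big and dark the blob should be.
    BlobShadow computeBlobShadow(float marbleRadius, float heightAboveGround)
    {
        const float maxHeight = 8.0f;  // invented: above this, no shadow at all
        float t = std::clamp(heightAboveGround / maxHeight, 0.0f, 1.0f);

        BlobShadow shadow;
        shadow.radius  = marbleRadius * (1.0f + 0.5f * t);  // spreads as the marble rises
        shadow.opacity = 0.8f * (1.0f - t);                 // fades with height
        return shadow;
    }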

Particles were also a huge pain – especially mesh particles. Unity does horribly slow things to these, and they cost us massive amounts of performance on the CPU-light Switch hardware. In the end, we had to make small systems to handle the costliest cases in order to meet our performance goal of 60hz in handheld mode.

rainbowMeshAnimation.gif
Without custom rendering, this effect could bring the Switch to its knees.

We also provide a full reflection on the marble. When I implemented this for Marble Blast Ultra, it took me a week or so to modify the scene graph to re-render the level in all six directions. Performance took some care, but ultimately it ran fine using more-or-less the default engine code. This was on the Xbox 360, which has way less RAM, a slower GPU and a weaker CPU. Under Unity, on the more powerful Switch, we ended up having to write our own custom CommandBuffer based renderer and our own set of lightweight shaders in order to hit 60hz. This was an unwelcome surprise late in development.
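For context, the reflection itself is a classic dynamic cubemap: re-render the scene from the marble’s position along each of the six axis directions into the faces of a cube texture, then sample that texture in the marble’s shader. The face setup looks roughly like this – an illustrative sketch with an invented renderFace callback, not our CommandBuffer code:

    #include <functional>

    struct Vec3 { float x, y, z; };

    // Invented callback: render the scene from 'eye' looking along 'forward'
    // with the given 'up' vector into cubemap face 'faceIndex'.
    using RenderFaceFn = std::function<void(int faceIndex,
                                            const Vec3& eye,
                                            const Vec3& forward,
                                            const Vec3& up)>;

    // Render all six faces of a dynamic reflection cubemap centered on the marble.
    void renderReflectionCubemap(const Vec3& marblePos, const RenderFaceFn& renderFace)
    {
        // Face order +X, -X, +Y, -Y, +Z, -Z; exact up-vector conventions vary by API.
        const Vec3 forwards[6] = {
            {  1, 0, 0 }, { -1, 0, 0 },
            {  0, 1, 0 }, {  0,-1, 0 },
            {  0, 0, 1 }, {  0, 0,-1 },
        };
        const Vec3 ups[6] = {
            { 0, 1, 0 }, { 0, 1, 0 },
            { 0, 0,-1 }, { 0, 0, 1 },
            { 0, 1, 0 }, { 0, 1, 0 },
        };

        for (int face = 0; face < 6; ++face)
            renderFace(face, marblePos, forwards[face], ups[face]);
    }

Six extra scene passes per frame is exactly why the lightweight shader set mattered so much – every draw in those passes is paid for six times over.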

In general, Unity was excellent for prototyping and able to be coaxed into viability for production.

Conclusion

Man, I am stoked to see this game go out into the market. It has been a year of my life, some great challenges, some great fun, and hopefully, a game that brings a little bit of joy into the world. Let’s marble it up!

Code Caving

Lately, we’ve been doing a lot of code caving at The Engine Co. Expert code explorers, like Fabien Sanglard, have an amazing ability to take very complex codebases and understand their core concepts. They can extend them and make them turn all kinds of tricks.

While you shouldn’t miss Fabien’s analysis of Quake 3, you don’t have to be an expert game developer working with major engines to gain huge benefits from a little code literacy. You are far more likely to work within the confines of a big codebase than you are to be working on a blue sky project with no dependencies or baggage.

Being able to dig through layers of code and understand complex existing systems is a huge benefit to your capabilities as a coder. So let me lay out my guidelines for becoming an awesome code caver able to dig through the muddiest of codebases to get things done. Starting with…

Look Deeper Than The Surface

“But wait,” you say, “I’m a JavaScript coder. My language is so simple and elegant! All I have to know is jQuery and I’m set.”

You poor, poor fool. What you aren’t seeing when you declare your very clever closure-based MVC microframework is the 5 million lines of C++ and 1500 man-years that went into the browser that is running it. The JIT that converts your template parser to machine code so it runs fast. The highly tuned GC that lets you kind of forget about memory management. And the vast cross platform rendering codebase that actually lets you draw “Hello World” and funny mustache pictures on screen.

And that browser sits on top of the operating system, which is ten times bigger in lines of code and years of effort. Now hopefully you don’t have to worry about the operating system. But suppose you have a neat little node.js server, and you run a neat little JS powered web service with it, and you get a neat little denial of service attack from some jerk in Eastern Europe. Now all of a sudden those unimportant details like how the operating system handles sockets become vastly important.

Tools of the Trade

How do you cut through a huge codebase like butter? How can you home in on the insights that make you look like a genius? How can you do all this without breaking a sweat?

grep to grok

The first essential tool is grep. grep looks for a string in a set of files. There are many equivalents to grep (WinGrep, find-in-files in a favorite IDE, find on Windows/DOS), but the key is you can give it a folder or set of files to search and a regular expression to look for, and get back a list of results and the files that contained them. I like to use find-in-files in Sublime Text 3 for this because I can click search results and jump to them.

Grepping is super powerful. Suppose you get some weird error message – “Floorblop invalid.” – and nothing else from your app or service. You might be left with some questions. What’s a floorblop? Why is it invalid? Well, it might not matter. Just grep the error string and look at the surrounding code to determine a cause. Sometimes errors are generated dynamically, so you might have to look for just part of the string – like “ invalid.” – if Floorblop was determined at runtime. With a little bit of cleverness, you can almost always reduce the number of potential sites for the error in the codebase to a manageable number – then just inspect the search results manually to find the place triggering the error.

Now you don’t care about the text of the error, but instead can start looking for functions that might be failing and causing the error to be emitted. In effect you can think of the error string as a unique identifier for some situation rather than a human-relevant string. (And in many codebases, error messages are cryptic at best.)

In the course of investigating your error message, you might encounter some variables or functions that seem to be related. Grep comes to the rescue once again: grep for them! Generally symbols are unique in a project, so you can quickly find all the code relating to a variable or function. You can also take advantage of the language’s syntax to help narrow down your results – for instance, in C++ function implementations in a class or namespace often look like this:

void Foo::bar()

So if you want to find the definition of bar, you can grep for “::bar”, or if it’s a global function, use a return type, e.g., “void bar”, and go right to the definition. Think about fragments that would match lines you want to see – you don’t have to get an exact match, just one that’s close enough that you can find what you want quickly.

It should be obvious that these techniques can apply to any language, not just C/C++ – in Ruby you might use “def” in your search strings, in JS you might use “function”, in assembly, colons. The goal is to just filter down the number of results you have to consider, not take them to zero.

Binary Spelunking with strings

It’s easy to think of a binary as an opaque file. So it can be frustrating when you have a codebase with binaries in it. But fear not: DLLs, EXEs, .SOs, and a.out files have a ton of useful information in them. They have to – otherwise the operating system wouldn’t be able to run them!

One of the first tools to pull out is strings. strings simply scans any binary file looking for bytes that look like they might be a string (displayable ASCII characters ending with \0). When it finds a match, it prints it out. You can store the results in a file, or run them through grep or another tool, to help narrow the results.
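If you’ve never looked at how strings works, it’s worth seeing how little magic there is. Conceptually it’s just this – a simplified sketch of the idea, not the real GNU implementation (which uses a minimum run length rather than requiring a trailing \0):

    #include <cctype>
    #include <cstddef>
    #include <cstdio>
    #include <string>

    // Scan a binary file and print every run of printable characters that is
    // at least 'minLength' bytes long -- essentially what strings does.
    void dumpStrings(const char* path, std::size_t minLength = 4)
    {
        std::FILE* f = std::fopen(path, "rb");
        if (!f) return;

        std::string current;
        int c;
        while ((c = std::fgetc(f)) != EOF) {
            if (std::isprint(c)) {
                current.push_back(static_cast<char>(c));
            } else {
                if (current.size() >= minLength)
                    std::printf("%s\n", current.c_str());
                current.clear();
            }
        }
        if (current.size() >= minLength)
            std::printf("%s\n", current.c_str());
        std::fclose(f);
    }

Pipe the output through grep and you can confirm in seconds whether a symbol or error message lives inside a binary you don’t have source for.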

So suppose you have a function for which you cannot find the implementation – but you notice that the codebase includes a binary .so or DLL. You can check if the function is prepackaged in the binary by running strings and grepping for the name. If you get some hits there and nowhere else, you now have a good hint that you might not have all the source code.

Pulling Back the Hood: strace and wireshark

Maybe you’re getting a weird failure when interacting with your app. The first thing to do is establish exactly what’s happening! strace is a super powerful tool on Linux that dumps every system call a program makes. With this, you can see when and what is being read from disk, what network activity is happening, if it’s requesting memory from the system, and so on. All of a sudden a cryptic dynamic linker error will be easy to diagnose – you can see if libraries are being loaded and if symbols are being found or not. Or you can see what OS call is hanging the process. Super useful, if a bit tedious.

On Windows, the Sysinternals suite (Process Monitor in particular) offers similar tools for inspecting process activity.

The cross-platform Wireshark is a packet sniffer that can really help you understand network activity. Suppose you are getting a cryptic error relating to some HTTP service. You might not be sure if the service is up, if you have a bad header, if the connection is being dropped… Wireshark will let you tell right away if data is making it to the server and what it is exactly, as well as what the server is sending back to you. It will identify things like corrupt TCP packets that are very rare failures but can totally break your app.

Once you’ve established the lowest levels are working properly, it’s easy to move up the stack to HTTP inspection tools like Charles or Telerik Fiddler. These can inspect HTTPS traffic and give you a higher level view of your app’s behavior. Chrome and Firefox also have built in tools that are similar.

Abusing Your Profiler and Debugger

You can also abuse profiling tools like Instruments to find out what an app is doing. On Windows, Windows Performance Analyzer and xperfview are the tools to try first. These tools allow you to attach to a running process on your system and see the callstacks and where time is spent in them.

In other words, they show you HOW the code is running in various situations and which are most important (due to being at top of call stack or being called often). It’s like a self-guided tour through an unknown codebase – how convenient!

You can also attach a debugger and use the time-honored technique of hitting pause a lot to see where time is spent during execution – or stepping through function calls to see what calls what. This is a bit more tedious but also very useful for understanding a large codebase.

Study the Fundamentals

The best way to learn to understand big codebases is to… wait for it… study big codebases. While every project is unique, after a while you start to internalize common layouts and development philosophies. As you learn to recognize patterns and see how bigger systems fit together, you gain building blocks that let you map out unfamiliar systems and build your own systems to higher standards.

A few books I’ve found educational in this area:

  • Operating Systems: Design and Implementation – A walkthrough of Minix, a small, self-contained POSIX environment. This goes through every piece of a “pocket” operating system and explains it in detail. Great introduction for the next two books…
  • Mac OS X Internals – A great guide to OS X’s structure, with detailed examples and code walkthroughs. Understand the fundamentals of OS X while building valuable code analysis skills.
  • Windows Internals, Part 1 (6th Edition) – The same idea, but for Windows. A completely different take on OS design with a different set of constraints on development. In depth and fascinating.

Conclusions: Fighting Dirty & Getting to the Bottom

At the highest level of mastery, you should be prepared to reason about the behavior of any program from its highest levels of UI down to the behavior of the CPU. Every abstraction is leaky, and even though you want to preserve abstractions to keep development manageable, you shouldn’t let that limit your understanding. Get to the bottom of the stack first, then work your way back.

Don’t be afraid to fight dirty. Do you think something is important? Do you have an assumption about how some code works? Try breaking it to see how it fails. You’d be amazed how many problems can be solved by asking “Is it plugged in?”. Use version control or backups to make sure you can safely make dangerous changes and roll back to a safe state.

Finally – optimize for your sanity and development time. Don’t spend 12 hours when you can spend 1. Think about how you can narrow down your problem and check your assumptions quickly so you can focus on gaining knowledge and making the system do what you need. There is a universe of tools out there to make it possible to wrangle complex systems. Use them!

Flatten Your Conditionals!

Deep nesting is a pet peeve of mine. I’m going to show you what deeply nested code is and discuss some strategies for keeping things tidy. It’s my opinion that deep nesting is a sign of sloppy code.

You know, code like this (with my condolences to the author):

    if (productId != nil) {

        NSLog(@"EBPurchase requestProduct: %@", productId);

        if ([SKPaymentQueue canMakePayments]) {
            // Yes, In-App Purchase is enabled on this device.
            // Proceed to fetch available In-App Purchase items.

            // Initiate a product request of the Product ID.
            SKProductsRequest *prodRequest = [[SKProductsRequest alloc] initWithProductIdentifiers:[NSSet setWithObject:productId]];
            prodRequest.delegate = self;
            [prodRequest start];
            [prodRequest release];

            return YES;

        } else {
            // Notify user that In-App Purchase is Disabled.

            NSLog(@"EBPurchase requestProduct: IAP Disabled");

            return NO;
        }

    } else {

        NSLog(@"EBPurchase requestProduct: productId = NIL");

        return NO;
    }

This code is hard to understand. It’s hard to understand because error handling is distant from the error checks (for instance, the check for nil is at the beginning but the error and return are at the end!). It’s hard to understand because the important parts are deeply indented, giving you less headroom. If you want to add additional checks, it’s hard to know where to add them – and you have to touch lots of unrelated lines to change indent level. And there are many exit points scattered throughout. GROSS.

Whenever I see code like this I cringe. When I get the chance, I like to untangle it (or even catch it in code review). It’s soothing, simple work. To be sure, the functionality of the code is fine – it’s purely how it is written that annoys me.

There’s a key thing to be aware of in the structure of this code – it has a bunch of early outs related to error handling. This is a common pattern so it’s worth walking through the cleanup process. Let’s pull the first block out:

    if(productId == nil)
    {
        NSLog(@"EBPurchase requestProduct: productId = NIL");
        return NO;
    }

    NSLog(@"EBPurchase requestProduct: %@", productId);

    if ([SKPaymentQueue canMakePayments] == YES)
    {
        // Initiate a product request of the Product ID.
        SKProductsRequest *prodRequest = [[SKProductsRequest alloc] initWithProductIdentifiers:[NSSet setWithObject:productId]];
        prodRequest.delegate = self;
        [prodRequest start];
        [prodRequest release];

        return YES;
    }
    else
    {
        // Notify user that In-App Purchase is Disabled.
        NSLog(@"EBPurchase requestProduct: IAP Disabled");
        return NO;
    }

    // Never get here.
    return NO;

It’s a LOT better, but now we have a return that can never be run. Some error handling code is still far from the error detecting code. So still a little messy. Let’s do the same cleanup again on the second block:

    if(productId == nil)
    {
        NSLog(@"EBPurchase requestProduct: productId = NIL");
        return NO;
    }

    NSLog(@"EBPurchase requestProduct: %@", productId);

    if ([SKPaymentQueue canMakePayments] == NO)
    {
        // Notify user that In-App Purchase is Disabled.
        NSLog(@"EBPurchase requestProduct: IAP Disabled");
        return NO;
    }

    // Initiate a product request of the Product ID.
    SKProductsRequest *prodRequest = [[SKProductsRequest alloc] initWithProductIdentifiers:[NSSet setWithObject:productId]];
    prodRequest.delegate = self;
    [prodRequest start];
    [prodRequest release];

    return YES;

See how much cleaner that is? Beyond saving indents, it also exposes the structure of the algorithm a great deal more clearly – check it out:

  1. Check for nil productId; bail if absent.
  2. Log productId if it is present.
  3. Check if we can make payments/IAP is active; bail if not.
  4. Submit the product info request.
  5. Return success!

The code and its “flowchart” now match up nicely, and if you modify one, it’s easy to identify the change in the other. This might seem like a little thing, but I find it shows that the purpose + structure of the function is well set up. And if you can’t write the function without violating this rule, it’s often a very solid clue you need to introduce some more abstraction – tactics such as breaking stuff up into helper methods, reorganizing your data structures a little bit, centralizing lookups/checks, and so on.

Something to keep in mind next time you find yourself hitting tab more than a couple times – flatten your conditionals!

Some Thoughts on Build Systems

Note: You might also want to read Some Thoughts on Build Servers, which discusses software packages for running automated builds on a shared server.

The hardest part of software development is often the road from code in a repo to an artifact in the user’s hands.

There are a million ways you can ruin yourself along that long and winding path. You miss a DLL or dependency. You mark a flag wrong and it won’t run on a non-developer system. A setting gets changed on the system doing the compile and builds mysteriously fail. On multiple platforms (and who isn’t on at least a couple?), you forget to test on all 5 of your platforms and find out the build is broken – obviously or subtly – on one of them.

One building block that helps cut down on this pain is a build tool – a tool to manage what files are built in what order and with what settings. If you can fire off your build with a single command, it dramatically reduces the risk of breakage due to outside factors – and helps a lot with setting up build boxes. At this point I’ve worked with nearly every option: make, msbuild, xcodebuild, rake, Maven, Ant, CMake, premake, qmake, and even a couple of home brew systems. Here are my thoughts on each of them:

GNU Make. The granddaddy of all build tools. Cryptic syntax, most widely used on POSIX-compatible environments like Mac or Linux. It can do anything you want it to, if you’re willing to dive deep enough into it. Provides very little hand holding. Tools like automake and autoconf expand its capabilities quite a bit, but they are anything but intuitive, and if your goal isn’t a UNIX command line tool, they may be frustrating to work with. Makefiles are generally shippable if you are willing to put enough smarts in them (since they are fundamentally built on top of the shell). Makefiles are also easy to generate, and many tools exist to do so programmatically (more on those later).

MSBuild. The successor to nmake (with a brief detour to devenv.exe), it owes a lot of its legacy to make. However, it’s fully integrated with Visual Studio, so if you have a Visual Studio project, it’s easy to drive. In general, vcprojs are pretty easy to generate programmatically, and also easy to ship to other Windows developers, which is a big bonus. No viability for cross-platform sharing, except possibly in the context of Mono development.

xcodebuild. The command line tool for compiling Xcode projects. It works just like Xcode does, minus the goofy UI. Great for doing OSX/iOS builds, nothing doing for any other platforms. Xcode project files are relatively easy to ship to people, although there can sometimes be little subtleties that screw you up. One nice thing about Xcode’s build model is that it’s fairly easy to call your own scripts at various points in the build process. The downside is that xcodeprojs are finicky and hard to generate.

Rake. Ruby is pretty sweet, and Rake builds on it in the Ruby way – that is, with a domain specific language tailored to the task at hand. The downside is that the docs are inscrutable – you pretty much need to be prepared to look at a lot of examples and dive into the code to understand it well. But it responds well to hacking and generally gets the job done. Since Rake just sequences commands, it works great for non-Ruby projects – it’s basically a much better Make.

Maven. I have seen Maven used very well in real Java projects, and abused heavily in non-Java scenarios. If you grok the Maven way and are prepared to conform to its view of the world, you can get good results. But in general I think it is much more trouble than it’s worth in anything but enterprise Java contexts.

Ant. I’ve used Ant several times on non-Java projects, to good results. Ant is powerful and has some nice capabilities for filtering/selecting actions. However, it also has an obtuse XML syntax that becomes cumbersome in complex build scenarios, and it can be finicky to set up all your Ant tasks properly.

CMake. CMake is ugly, but it’s an effective kind of ugly. The CMake language is gross and its codebase is complex, with important features often being driven by subtle combinations of settings. But the docs are pretty decent, the community is large, and it has good update velocity. It also generates pretty good project files for most IDEs. And it is pretty easy to hook arbitrary commands into key points in the build process, which is a big win. CMake is bad if you want to do a lot of file processing or complex logic, but good for making and running project files that work across many platforms – including iOS and OSX.

premake. Of all these technologies, I most want premake to rock. It uses Lua, which is an easy and low-dependency language, and it has a pretty good set of modules for emitting different projects. Most of the time, projects can be shipped, which is big, too. However, the core generators are finicky, and we had compatibility issues. And development velocity isn’t as high as we’d like. So we ultimately had to drop it. However, I think it’s worth a look again in the future.

QMake. qmake is mostly associated with Qt development, and exists to facilitate the preprocessing that Qt requires to generate all of its binding + convenience features. It takes a simple configuration language, and can be effective. However, its support for mobile platforms appears to be rudimentary, and it does not produce project files – it just sequences build commands.

Homebrew. My main experience here was a custom project generation tool I developed at GarageGames. (Ultimately, many others have touched it, and I believe that it is still in use as of this writing.) We decided to go the homebrew route because we needed to ship high quality project files to our customers. None of the existing tools could produce these (premake is now probably the closest). And our existing process of hand-tweaking projects resulted in a lot of broken releases. We ended up using PHP to process hand-written project file templates. It worked because we had a large enough team to be able to spend a few man-months refining it until it was good enough. The main takeaway from that experience was that it’s not as hard to do as you’d think – it’s just a matter of patience and groveling through exemplar build files to learn the format. The real cost is maintaining ongoing compatibility with all the different versions of all the IDEs. I hope that someday GarageGames releases this tool as open source.

So, with all those out there to consider – what am I using today? Well, we are using a hybrid of Rake and CMake. We use CMake for all the project generation + compilation, while Rake deals with sequencing calls to CMake and make or xcodebuild or what have you – mostly for the build box’s benefit. Our project targets iOS, Android, Mac, and Windows, and so far this combination has worked out well.

Ultimately, you want a tool that builds from a single command and doesn’t require user intervention to produce final build artifacts. Otherwise, you will be constantly chasing your tail as you move from developer to developer or platform to platform. Any of these tools can achieve this, so it’s a question of choosing the tool or combination of tools that fit your situation the best. Good luck!

PushButton Labs Profiler

Do you want your Flash content to perform well?

Have you struggled with the tools built into Flash Builder?

Well, if all you want is a basic functioning profiler, let me introduce the PBLabs Profiler! It shows you where your time is going.

It’s available under the GPL, with source and binaries on GitHub. You can also post on the PBLabs Profiler forums; there is more information on the profiler in the forums and on the GitHub site.

I Wrote A Book: Video Game Optimization

More precisely, Eric Preisz and I wrote a book!

The book is called Video Game Optimization, and it covers everything you need to know to get maximum performance from any software project – but especially games. If you’re struggling with getting a great framerate out of your game, I highly recommend checking it out. 😉

Video Game Optimization goes all the way from high-level concepts like planning for performance in your project’s timeline, to determining which broad area of your system is a bottleneck, down to specific tips and tricks for optimizing for space, time, and interactivity. Based on the course that Eric Preisz taught at Full Sail University on optimization, it isn’t the only book you’d ever want to read on the subject, but we think it is a great introduction!

The journey from that initial conversation where our mutual friend Jay Moore introduced us and suggested I would be a good co-author, to the day when we finished and shipped the book was a long but rewarding trek. Eric moved across the country from Florida to Nevada, as he moved from teaching at Full Sail University to running the Tech and Tools group at InstantAction. He also became a father with the arrival of his son, Grant. I left after 5 years at GarageGames and helped build a new company, PushButton Labs.

A lot has changed while we wrote it, but it still felt really good to arrive at GDC, visit the book store outside the exhibition hall, and find a big stack of Video Game Optimization sitting front and center. 🙂

3D in Flash 10 & Git


I spent a little time with Flash 10’s 3d features recently. Since Flash 10.1 is imminent and FP10 has been at 90%+ penetration for a while now, it’s probably safe to start looking at using FP10 stuff in my projects. 🙂

I also used this as an opportunity to try out git. It was easy to get git installed on OSX (I used the command line version, installed from git-osx-installer) and put my code up on Github. You can browse my test code at http://github.com/bengarney/garney-experiments/tree/master/exploringFlash3D/.

My main concern was the transformation pipeline – I think there might be some benefits to using 3d positions for the rendering pipeline in PBE. So I wanted to do a brief survey of the built-in 3d capabilities, then look more closely at the transformation routines.

My first test was making a DisplayObject rotate in 3d (minimal source code). It runs well, and if you turn on redraw region display, you can see that it’s properly calculating the parts of the screen that need to be modified.

This was easy to write, but it revealed the primary flaws with the built-in Flash 3d capabilities. First, look closely – the edges of the shape are sharp, but the interior detail is aliased. This is because whenever the 3D rendering path is engaged, it turns on cacheAsBitmap. This is fine for simple scenarios (say, taking a single element in a UI and giving it a nice transition) but not for more complex situations.

Which brings us to the second and bigger flaw. I added a thousand simple particle sprites at different 3d positions (source code). This runs extremely slowly because of an issue described by Keith Peters involving nested 3d transforms. Nested 3d objects cause excessive bitmap caching, dramatically reducing performance. You might end up with bitmap-caching in action on every 3d object and every DisplayObject containing 3d objects.

In addition, because it’s cached, moving objects further/closer from the camera results in an upsampled/downsampled image. So you tend to get mediocre visual results if your objects move much.

My next step was to stop using the 3d capabilities of DisplayObjects, and just position them in x/y based on their 3D position (source code, notice it is two files now). This gave a massive performance gain. At low quality, 1000 particles runs at 1440×700 at acceptable FPS. Most of the overhead is in the Flash renderer, not the code to update DisplayObject positions, but it still takes a while to do all the transformation, and it causes a lot of pressure on the garbage collector from all the 1000 temporary Vector3D instances that are created every frame. (600kb/second or so – not insignificant.)

Next I figured it would be helpful to make my camera move around (sample code).

This required that I understand the coordinate space all this operated in. What are the coordinate spaces? According to the official docs, screen XY maps to world XY. So forward is Z+, up is Y-, right is X+. Once I figured that out, I had to prepare a worldMatrix with the transform of the camera, then append the projectionMatrix. The PerspectiveProjection class always seems to assume screen coordinate (0,0) is the center of the projection so you will have to manually offset. Maybe I was not using the projection right, since the docs imply otherwise.

There were two other details to sort out. First, I had to reject objects behind the camera, and second, I had to scale objects correctly so they appeared to have perspective. The solution revolved around the same information – the pre-projection Z. By hiding all objects with Z < 0 and scaling by focalLength / preZ, I was able to get it to behave properly.
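Concretely, once a point is in camera space the whole projection step boils down to a perspective divide: throw away anything with z behind the camera, then scale x and y (and the sprite itself) by focalLength over z. Sketched here in C++ for brevity – the AS3 version does the same math and feeds the results into DisplayObject x, y, and scale:

    struct ScreenPoint {
        bool  visible;  // false if the point is behind the camera
        float x, y;     // screen-space position in pixels
        float scale;    // sprite scale factor for perspective
    };

    // Project a camera-space point to the screen. focalLength plays the role of
    // PerspectiveProjection.focalLength; centerX/centerY re-apply the screen-center
    // offset that Flash's projection assumes is at (0,0).
    ScreenPoint project(float x, float y, float z,
                        float focalLength, float centerX, float centerY)
    {
        ScreenPoint p;
        if (z <= 0.0f) {               // behind the camera: hide the object
            p.visible = false;
            p.x = p.y = p.scale = 0.0f;
            return p;
        }
        float s   = focalLength / z;   // the perspective term
        p.visible = true;
        p.x       = centerX + x * s;
        p.y       = centerY + y * s;
        p.scale   = s;
        return p;
    }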

Next up is Matrix3D.transformVector… which is slow. Transforming 1000 vectors eats 3ms in release build! This is really slow in absolute terms (Ralph Hauwert has a good example of the same functionality running much much faster). I didn’t really want to introduce Alchemy for this project. But we can use AS3 code that avoids the allocations, saving us GC overhead and getting us an incremental improvement in performance.

Andre Michelle has some interesting thoughts on the problem of temporary objects related to transformations (see http://blog.andre-michelle.com/2008/too-late-very-simple-but-striking-feature-request-for-flash10-3d/). I did notice that Utils3D.projectVectors had some options for avoiding allocations, but it did not seem to run significantly faster (even in release build). (sample code for using projectVectors)

In the end, I settled on my own implementation of transformVectors, as it seemed to give the best balance between performance and ease of use. There’s a final version of the sample app where you can toggle between transformVector and the AS3 version by commenting out line 105/106 up on github, so you can test it for yourself. The transform function took some effort to get right, so here it is to save you the pain of implementing it yourself. It transforms i by m and stores the result in o.

        final public function transformVec(m:Matrix3D, i:Vector3D, o:Vector3D):void
        {
            const x:Number = i.x, y:Number = i.y, z:Number = i.z;
            const d:Vector.<Number> = m.rawData;

            o.x = x * d[0] + y * d[4] + z * d[8] + d[12];
            o.y = x * d[1] + y * d[5] + z * d[9] + d[13];
            o.z = x * d[2] + y * d[6] + z * d[10] + d[14];
            o.w = x * d[3] + y * d[7] + z * d[11] + d[15];
        }

Time for some conclusions. I think that the 3D capabilities built into DisplayObject are OK, but very focused on light-weight graphic design use. Building a significant 3D application requires you to write your own rendering code built on Flash’s 2D capabilities (either DisplayObjects or drawTriangles and friends). The 3d math classes are ok, but immature. Some things are very handy (like the prepend/append versions of all the methods on Matrix3D), but the tendency for Flash APIs to implicitly allocate temporary objects limits the usefulness of some of the most central API calls. In addition, important assumptions like the order of the values in Matrix3D.rawData were not documented, leading to frustrating trial and error. I am excited to see Flash’s 3d capabilities mature.

Adobe, Please Buy HaXe.

If you’re in the Flash space, you’ve probably seen a lot of coverage of HaXe lately. HaXe targets Flash 9 and 10, as well as JavaScript, PHP, NekoVM, and most recently C++. You can write code once and easily retarget it to different platforms, which is very exciting. This has been used to good effect by Hugh Sanderson, who is developing an iPhone path.

HaXe is cool not only for platform compatibility with things like the iPhone (and who knows – maybe other platforms like Android or Windows Mobile), but also for its language features. They deployed support for the new Flash 10 high-performance memory intrinsics before the official Adobe compiler did. HaXe has generics/templates, decent conditional compilation, inlining, advanced typing, and a lot more. The biggest claim to fame HaXe has is a big performance margin over AS3, as well as better memory management options.

And performance is a big deal when you’re an interpreted language trying to run on Netbooks and televisions.

Since my day (and sometimes nights, too) job is working on a Flash game framework, HaXe looms large on my radar. AS3 and HaXe can already interoperate so if you want to use a HaXe physics library from your ActionScript 3 code, there’s no problem. But should we switch the core engine over? Having a seamless way to target iPhone would be big. Having an easy way to run your server code without requiring an integration with Tamarin would also be nice.

Ralph Hauwert over at unitzeroone summarizes the HaXe situation very well:

The language is great and has many features ActionScript lacks. It baffles me how one man army nicolas can single handedly write a language and a multi target compiler, and in many respects stay on top of the ActionScript compiler.

My first problem with Haxe is just that. The one man army. Should nicolas decide he doesn’t feel like it anymore, the platform is dead. Sure, there’s more people working on it, but in it’s core it seems to be that nicolas is the driving force. My second problem is that although there is a huge pool of ActionScript developers, which can work together within the industry standard called ActionScript. There’s not even near that many people who’ve even touched haxe, let alone the number of developers who are able to call themselves “Haxe Professionals”.

HaXe is very cool. But it doesn’t have the ecosystem and infrastructure to justify targeting it exclusively. As a small startup, we only get to target one platform for our technology, and AS3, although it leaves performance on the table, wins in terms of the pool of potential developers and the accessibility of the toolchain. How many people out there are using PHP or Java vs. x86 assembly?

If the basic technology is good enough, ease of use always wins out over performance. So for now we are going to stick with ActionScript 3 for PushButton Engine. You can write compelling content on it. If you really need HaXe for something, the option is there to link it in.

But I want to put the plea out to Adobe. I love you guys and all you have done in the Flash space. But please update your compiler technology. At least give us the compiler optimization wins like inlining and generics. It’s simply embarrassing that one guy’s work can outperform you on your own VM!

Adobe, you have smart people working for you. You have smart people in your community. You have pretty decent tools. None of us want to jump over to a third-party technology, but if HaXe keeps beating you at your own game, it might be necessary. Buy HaXe if you must, or else match it with your in-house tech, but either way – I know you can solve this problem.

The web is catching up with you on the player front. You could dominate on the tools front and make the world better for everyone. Please. As a middleware developer, I really don’t want to have to support anything other than a single mainstream language and platform. 🙂

Technical Notes on O3D

Google released O3D, a web plugin for 3d rendering, today. It’s a pretty sweet piece of work. Definitely check it out if you have any interest in 3d rendering, the web, or JavaScript.

There are a couple of cool pieces to this puzzle, and I wanted to call them out to other people who might be reviewing this technology. The two sentence review: I am blown away. They got this right.

(FYI: This post isn’t meant to be a full walkthrough, just a quick indication of the cool jumping off points in the codebase.)

Stop number one is the graphics API abstraction. This is rooted in o3d/core/cross/renderer.h, which provides an abstract interface for performing render operations. This is sensibly designed, oriented for SM2.0 through SM4.0 level hardware. It will also deal with DX11 class hardware, although it won’t expose all of the niceties DX11 gets you. It wraps GL (o3d/core/cross/gl/) and D3D9 (o3d/core/win/d3d9). The purpose of all this is to provide a common ground for all the higher level code to issue draw commands from. It is not directly accessible from JavaScript (although some of the related classes like Sampler and State are.)
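If you haven’t run into this pattern before, such an abstraction is usually a thin virtual interface that all higher-level drawing goes through, with one concrete backend per graphics API. Very roughly – this is an illustrative sketch of the pattern, not O3D’s actual classes:

    // Opaque resource handles owned by the renderer.
    class VertexBuffer;
    class Texture;
    class Shader;

    // Higher-level code only ever talks to this interface.
    class Renderer {
    public:
        virtual ~Renderer() {}

        virtual void beginFrame() = 0;
        virtual void endFrame() = 0;

        virtual void setShader(Shader* shader) = 0;
        virtual void setTexture(int slot, Texture* texture) = 0;
        virtual void drawTriangles(VertexBuffer* vertices, int triangleCount) = 0;
    };

    // One subclass per API; O3D ships GL and D3D9 backends in the same spirit.
    class GLRenderer : public Renderer {
    public:
        void beginFrame() override { /* glClear, set viewport, ... */ }
        void endFrame() override { /* swap buffers */ }
        void setShader(Shader*) override { /* glUseProgram */ }
        void setTexture(int, Texture*) override { /* glActiveTexture + glBindTexture */ }
        void drawTriangles(VertexBuffer*, int) override { /* glDrawArrays / glDrawElements */ }
    };

A command-buffer implementation is then just another subclass that records these calls instead of executing them immediately.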

As an aside, writing a Pixomatic or Larrabee backend would not be hard. At GarageGames, we had a rendering API similar to this (unimaginatively called GFX), and it was fantastically useful. Last time I checked, there were DX8, DX9, DX10/11, GL, and Pixomatic backends for it in varying states of usefulness. We even had one guy write a backend that would stream draw commands over a socket, which was pretty cool. In the context of O3D, they have an implementation of Renderer that queues everything into command_buffer, then streams it to a command buffer server running in a separate thread, which issues the actual draw commands. It’s unclear whether this is used in the current version of the plugin based on casual inspection, but it’s a great example of what good design can get you.

The next piece is the DrawList, which is where most of the heavy lifting for drawing happens. JavaScript, of course, is not really desirable to have in your inner rendering loops, so you queue everything you want to draw (DrawElements) into a DrawList. This is wrapped by higher levels, of course, but it represents the lowest level API that’s available to JS code. All the sorting and state management to get stuff on screen happens in this area.

Alongside the DrawList stuff, you find the material system, which is pretty slick. You write your shaders in their shading language, and it converts to HLSL SM2.0 and Cg (arbvp1/arbfp1). This is enough to do almost anything you might want to (as fantastic as SM3.0+ is it’s not really necessary for most rendering). There’s a full SAS system so you can interface programmatically with your shaders.

Above this, you get into the scenegraph. Now, I subscribe to Tom Forsyth’s views on scenegraphs, which is that they are basically a bad idea, but I think the Google guys were smart and set up their API so intelligent developers can avoid being screwed by the scenegraph. The scenegraph-esque stuff they do have (they call it a rendergraph) is powerful enough you can do most rendering without going nuts, and lets a lot of the heavy lifting stay in native code, where it will be fast. You end up doing retained mode-ish things, but since JS is slow, and most JS developers come from a DOM background, it works better than you might expect.

There are a lot of cool miscellaneous features, too. They embed the fast V8 JS engine right in the plugin so you can have consistently fast JS execution. They support falling back to an error texture (via Client.SetErrorTexture) when you fail to bind a sampler. You can group objects via Pack objects for easier management and stream things from archives, too. There are debugging aids and libraries for math and quaternions.

You should check out the samples. There’s a lot of impressive stuff. There’s the beach demo in the video at the top of this post, but they also do some cool stuff with animation, HUDs, text rendering, picking, and stenciled teapots. And everything is done in JavaScript right in the web page. Even the shaders are embedded in <script> tags!

O3D is a great piece of technology, and I hope it thrives. I’m excited to see what people build on this (and I’ve got a few ideas myself ;).

Tweaking your game with Google Spreadsheets

Our latest game, Grunts: Skirmish, has 200 tweakable parameters. There are 9 player units with three levels of upgrade, and another 9 enemy units. Each unit has between three and ten parameters that can be altered.

We tried a few approaches – hand-editing a large XML file (but it was too large and spread out) and an in-game tweaking UI (but it was too much work to get the UI to be friendly to use). The old standby of having the designer and artist file bug reports to have the programmer update the game wasn’t getting us very far, either.

“Well,” says I, “We’re some sort of Web 2.0 startup, right? And we’re developing a Flash game aren’t we? And Flash can talk to websites, can’t it? And don’t we use Google Docs for everything?”

It turns out there is an XML feed from public Google spreadsheets. And ActionScript 3 supports E4X, so you can directly manipulate XML without any extra work. Now we tweak our game using a shared spreadsheet up on Google Docs.

I wrote a parser for their format:

// Extract the entries. It's namespaced, so deal with that.
var xmlns:Namespace = new Namespace("xmlns", "http://www.w3.org/2005/Atom");
tweakXML.addNamespace(xmlns);

// Parse into a dictionary.
var cellDictionary:Dictionary = new Dictionary();
for each(var entryXML:XML in tweakXML.xmlns::entry)
{
   cellDictionary[entryXML.xmlns::title.toString()] = entryXML.xmlns::content.toString();
}

And wrote a quick component that would fetch the spreadsheet feed, parse it, and stuff it into the right places on named objects or template data. Now I have a little entry in our level file that looks like:

<object id="googleTweaker">
   <component class="com.pblabs.debug.GoogleSpreadsheetTweaker">
     <SpreadsheetUrl>http://spreadsheets.google.com/feeds/cells/0d8somekey848x/od6/public/basic</SpreadsheetUrl>
     <Config>
        <!--  Grunt Level 1 tweaks -->
        <_><Cell>B3</Cell><Property>#TheGruntProxy.creator.WarPointCost</Property></_>
        <_><Cell>C3</Cell><Property>!Grunt.health.mMaxHealth</Property></_>
        <_><Cell>D3</Cell><Property>!Grunt.ai.AttackSearchRadius</Property></_>
     </Config>
  </component>
</object>

Each line maps a cell in the spreadsheet to a property on a template or active game object. Some properties have to be set several places, which the system deals with automatically.

The biggest wrinkle was Google’s crossdomain.xml policy. Basically they do not allow random Flash apps to access their site. So I had to write a small proxy script, which sits on our development server next to the game and fetches the data for it. Figuring out I had to do this took more time than any other step.

The main difference between the snippet and the full code is that the version in our repository is 220 lines long. I only have around 150 of the full set of 200 parameters hooked up, but after a hard afternoon’s work, the process for tweaking the game has become:

  1. Open Google Docs.
  2. Edit a clearly labeled value – like level 1 grunt health.
  3. Restart the game, which is running in another tab in your browser.

This takes you about a minute between trials. Not too bad. Before this, the process was:

  1. Get the game source from SVN.
  2. Find the right XML file – there are several.
  3. Find the right section in the XML – altogether we have 200kb of the stuff for Grunts!
  4. Change the value.
  5. Commit the change.
  6. Wait 5-15 minutes for the build system to refresh the live version of the game.

Ten minutes per tweak is not a good way to develop.

I’ve heard about developers using Excel spreadsheets for tweaking, but can’t find anything about using Google Docs to do it. But Google Spreadsheet is obviously a better choice. It has built-in revision tracking. You can edit it simultaneously with someone else. You can access live data in XML either publicly (like we did) or privately via their authentication API. It’s absolutely worth the half-day of your time it will take to add Google Spreadsheet-based tweaking to your game – even if it’s a non-Flash game, downloading and parsing XML is pretty easy with the right libraries.

I strongly suspect this feature will find its way into the next beta of the PushButton Engine. Which, by the way, you should sign up for if you are interested in developing Flash games. We’re bringing people in from the signup form starting this week. If you want more information, or just like looking at cool websites, click below to check out the new version of the PBEngine site, which has a bunch of information on the tech. Tim did an awesome job on the site design.


Edit: Patrick over on the GG forums asked about the proxy script. It’s actually ludicrously simple. Not very secure either so I wouldn’t recommend deploying it on a public server. I got my script from a post on Aden Forshaw’s blog. In the real world you would want to have some security token to limit access to your proxy script… but since this is for tweaking a game that is in development I didn’t sweat it.
