Thursday, September 8, 2011

Make like Voltron!

Because as awesome as it is to have a lion robot, if you want to swing the really big sword, you've got to tack a bunch of them together.

What I actually want to talk about this time is combination, specifically file combination. That is, taking a large number of GR2 files that represent your character, animation set, level data, etc., and turning them into a single file.

The Granny file format has a lot of powerful features that even clients who have been using the library for years don't use all that often. If I had to make a list of "Most Underused Features of Granny", the ability to create custom file formats would be running away at #1. I'm going to give you an example of how you can squeeze a tremendous amount of space and time out of your existing assets. Bear with me for a bit, because there's going to be quite a bit of foundational "how" material before we get to "why".

Let's talk a bit about the way Granny represents data, since that will become important later on. Granny files are implemented with a two-layer system. The first layer is the underlying file engine which handles compression, endian and pointer-size conversions, and data-graph flattening. It also provides a mechanism for describing the data to be persisted. The second layer is the actual format (such as GR2/granny_file_info), which is described in terms of the first layer's data description language. The second layer is where we specify what we're working with, as well as manipulate the file format over time. (Versioning happens at this level.)
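To make the two-layer idea concrete, here's a toy sketch of layer one's core trick: data stored as a flat blob with a member-by-member type description attached. None of this is real Granny code (the names are invented for illustration), but it's the same principle that lets one generic engine flatten, convert, and fix up any described type.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>

// Toy member tags, loosely in the spirit of granny_data_type_definition
enum member_type { EndMember, Int32Member, Float32Member };

struct type_definition {
    member_type Type;
    char const* Name;
};

// A "file" in this toy model: a type table plus a flat blob of member data
// laid out in the same order
struct typed_blob {
    type_definition const* Type;
    void const*            Data;
};

// The "layer one" engine never knows what the object means; it just walks
// the type description. Returns the number of members visited.
int DumpObject(typed_blob const& Blob)
{
    uint8_t const* At = static_cast<uint8_t const*>(Blob.Data);
    int MemberCount = 0;
    for (type_definition const* Def = Blob.Type; Def->Type != EndMember; ++Def)
    {
        if (Def->Type == Int32Member) {
            int32_t Value; std::memcpy(&Value, At, sizeof(Value)); At += sizeof(Value);
            std::printf("%s = %d\n", Def->Name, Value);
        } else {
            float Value; std::memcpy(&Value, At, sizeof(Value)); At += sizeof(Value);
            std::printf("%s = %g\n", Def->Name, Value);
        }
        ++MemberCount;
    }
    return MemberCount;
}
```

The same walker could just as easily swap byte order or patch pointers instead of printing, which is exactly why layer two (the actual GR2 format) can evolve without touching the file engine.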

If you're already a Granny user, you might have wondered why it takes two steps to load a Granny file from disk.
  granny_file* TheFile = GrannyReadEntireFile("my_anim.gr2");
  granny_file_info* Info = GrannyGetFileInfo(TheFile);
The two-layer system is why. By the time you have a granny_file object in hand, all of the pointer fixing, endian-marshalling, and pointer-size conversions have been done, but as far as your application is concerned, a granny_file is just a collection of raw bytes with a type description attached. GrannyGetFileInfo does the work of comparing the type stored in the file to the datatype expected by your game. Let's take a quick look into GrannyGetFileInfo in pseudocode. The function essentially boils down to:
  if (File->Header.TypeTag == GrannyCurrentGRNStandardTag)
  {
      return (granny_file_info*)File->RootObject;
  }
  else
  {
      if (File->ConvertedObject == 0)
      {
          // Log Warning about conversion operation
  
          File->ConvertedObject =
               GrannyConvertTree(File->RootType,
                                 GrannyFileInfoType,
                                 File->RootObject);
      }
      return (granny_file_info*)File->ConvertedObject;
  }
Again, if you're a Granny user, and you have a granny_log_callback registered, then you've probably seen logs like: "File has run-time type tag of 0xabcdef, which doesn't match this version of Granny (0xabcdf0)." Here's where they come from! GrannyConvertTree is fairly efficient, but it's still doing a walk of the type tree for the file and the library. We'd much rather be in that first branch of the conditional where we just cast a pointer and return it.

Stepping back to the original topic of combining Granny files, you may have noticed that everything GrannyGetFileInfo uses is available to you in the Granny API. We have GrannyConvertTree, and that's basically it. GrannyFileInfoType is just an array of granny_data_type_definition structures, which are also user-creatable. So why not just make our own format, and call GetMyFileDatabase() instead of GrannyGetFileInfo?
  my_file_database*
  GetMyFileDatabase(granny_file* File)
  {
      if (File->Header.TypeTag == MyCurrentTag)
          return (my_file_database*)File->RootObject;
      else
      {
          if (File->ConvertedObject == 0)
          {
              // Log Warning about conversion operation
      
              File->ConvertedObject =
                   GrannyConvertTree(File->RootType,
                                     MyFileDatabaseType,
                                     File->RootObject);
          }
          return (my_file_database*)File->ConvertedObject;
      }
  }
Once we've figured out what we're going to put into my_file_database, that's pretty much all it takes! Let's design a containing structure that we can stuff a bunch of granny_file_infos into, along with some lookup information so we can pull them out in our game later. Here are our structure declarations:
  struct my_filedb_gr2_entry
  {
      char const*       Filename;
      granny_file_info* FileInfo;
  };

  struct my_file_database
  {
      granny_int32             GR2FileCount;
      my_filedb_gr2_entry*     GR2Files;
  };
  extern granny_data_type_definition MyFileDatabaseType[];
So we have my_file_database, which contains an array of my_filedb_gr2_entry structures, each of which is just a filename and a pointer to a granny_file_info. Next we have to describe those two structures to Granny. A full tutorial on granny_data_type_definitions isn't really in scope for this post, so I'm just going to give you the code; you can check out the tutorials in the SDK documentation for the complete system.
  granny_data_type_definition GR2EntryType[] =
  {
    { GrannyStringMember,    "Filename" },
    { GrannyReferenceMember, "FileInfo", GrannyFileInfoType },
    { GrannyEndMember }
  };
  
  granny_data_type_definition MyFileDatabaseType[] =
  {
    { GrannyReferenceToArrayMember, "GR2Files", GR2EntryType },
    { GrannyEndMember }
  };
That's all. Notice that GrannyReferenceMember allows you to use other structures described with granny_data_type_definition arrays, so including the granny_file_info type in our structure just requires that one entry. In fact, in the latest version of Granny, this exact structure (plus some extra bells and whistles) is provided to you. The preprocessor can create large collections of GR2 files at tool-time, so loading a character or even a whole level is one simple function call.

See? I promised we'd get to the "why" eventually. Not only is loading a character with
  // Loading:
  granny_file* TheFile = GrannyReadEntireFile("my_character.fdb");
  my_file_database* CharacterDB = GetMyFileDatabase(TheFile);
  
  // Freeing:
  CharacterDB = 0;
  GrannyFreeFile(TheFile);
much simpler than:
  // Loading:
  for (int i = 0; i < NumCharFiles; ++i)
  {
      granny_file* TheFile = GrannyReadEntireFile(CharFiles[i]);
      // anim01.gr2, anim02.gr2, etc.
  
      granny_file_info* CharInfo = GrannyGetFileInfo(TheFile);
  
      // Do something with CharInfo
      // Store TheFile in FilesArray for deletion later
  }
  
  // Freeing:
  for (int i = 0; i < NumCharFiles; ++i)
  {
      GrannyFreeFile(FilesArray[i]);
  }
it's also vastly more efficient, both in the space used on disk and in memory, and in the time it takes to load the data at runtime.

Let's work out where we're saving space first of all. Roughly speaking, a Granny file contains the following elements:
  • The data itself (granny_file_info, my_file_database, etc.)
  • The type information describing the data (granny_data_type_definition arrays, basically)
  • Housekeeping information (pointer fixups, etc.)
When you have 50 GR2s on disk, they are unable to refer directly to each other, so you are guaranteed to have 50 copies of the same granny_data_type_definition arrays lying around. These can be jettisoned at runtime with GrannyFreeFileSection, but they must be present in each file so that every Granny file remains a distinct, versionable, self-contained unit. Combining all 50 granny_file_infos into one my_file_database means that we keep only 1 copy of the type information around. Win.

The second major space savings comes from string pooling. Again, because Granny files must be self-contained units, and cannot contain a priori references to other files, granny_track_group animation tracks are matched against granny_skeleton bones by name at runtime. Granny does things this way so you can create new models for old animations, or vice versa. Those bone-name strings appear in every granny_animation, every granny_model, and every granny_mesh, so when those entities live in separate files, you have 50 copies of them as well. (Unless you are using a granny_string_table, in which case, well done!) That doesn't sound like much, but let's do some math. Say your character has about 100 bones, and typical names like "Bip01 Spine1", "Bip01 Head", etc., run about 10 bytes each. That's roughly 1k of bone-name string data per GR2 file, which across our 50 files adds up to 50k worth of duplicated strings. That waste goes away in the combined scenario.
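The arithmetic above can be written out directly (these are the post's rough estimates, not measurements):

```cpp
// Back-of-the-envelope estimate of duplicated bone-name string data
int const BoneCount    = 100;   // bones in the character
int const BytesPerName = 10;    // "Bip01 Spine1", "Bip01 Head", etc.
int const FileCount    = 50;    // separate GR2s that all name the same bones

int const StringsPerFile  = BoneCount * BytesPerName;    // ~1,000 bytes per GR2
int const TotalDuplicated = StringsPerFile * FileCount;  // ~50,000 bytes on disk
int const CombinedCost    = StringsPerFile;              // one shared copy
```

The per-file kilobyte looks harmless; it's the multiplication by file count that quietly turns it into tens of kilobytes of pure duplication.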

Finally, we come to the big win, runtime speed. For me, this is huge, because game loading times are a personal pet peeve. I am deeply ticked off by loading screens. With modern hardware, there is almost never a technical reason your game cannot load nearly instantly. Combining 50 files into 1 gives us the following effects on runtime cost:
                         N files       1 Combined File

 fopen (or equivalent):  N             1
                 fread:  O(N)          O(1)
      total bytes read:  X bytes       < X bytes
On every single dimension, we're doing much better. Let's look at some real numbers. The following numbers are from a client's data for a single game level. They were working on the Nintendo 3DS, a platform that has relatively low bandwidth to storage and high latency for operations like fopen(). I've done the tests on both the 360 and 3DS, just to get a look at the numbers across two very different platforms. The results are much more dramatic on the 3DS, which uses flash storage instead of reading from a hard drive, but even on the 360, it's a fairly impressive speed up.

Here is our test data set.
                  #Files    Bytes (compressed)  Bytes (raw)

 Separate    GR2: 487       8,000,624
             GSF:  33         126,992
                  520       8,127,616

 Combined           1       4,138,564 (51%)     10,541,056 (131%)
Notice that the space savings from string sharing, type info sharing, and elimination of redundant housekeeping data mean that the combined file is only a little bit larger than the original files, even after turning off file compression! With compression on, the combined file is nearly half the size of the original data. Frankly, that's a testament to how much effort the client had put into optimizing the size of their files in the first place. All of the data was already squeezed as tight as possible before we came through with the combination pass, which addresses things that aren't sensitive to the normal space-saving techniques (changing mesh formats, altering animation tolerances, etc.). Normally the size of the win would not be so large in relative terms.

Here is the time it takes to load this data set under 3 scenarios on the Nintendo 3DS and the Xbox 360.
                          Nintendo 3DS           Xbox 360

        Separate Files:   70.84s (1.0x)          3.57s (1.0x)
        Combined (raw):    3.03s (!! 23.3x !!)   0.89s (4.0x)
 Combined (compressed):   10.08s (7.0x)          1.84s (1.9x)
It's not often that you get to describe optimization wins with numbers like "23 times faster" in a game that's already been tuned. The Xbox does, frankly, a fairly astonishing job of keeping up in the separate file case, but even there, we cut the time required to load the dataset by more than 75%.

Now we're swinging the big sword! Dropping this into your existing Granny game should be a piece of cake. Because we've kept the filename in the combined structure, you simply replace the bit of your code that goes out to disk via GrannyReadEntireFile or GrannyReadEntireFileFromMemory with a lookup in the shared structure. You get back the same granny_file_info structure, just a lot faster.
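That lookup can be as simple as a linear scan over the database. A sketch, using the structures declared earlier (FindFileInfo is an invented helper name, and granny_file_info is only forward-declared since the lookup never looks inside it):

```cpp
#include <cstring>

struct granny_file_info;  // opaque here; defined by the Granny SDK

struct my_filedb_gr2_entry {
    char const*       Filename;
    granny_file_info* FileInfo;
};

struct my_file_database {
    int                  GR2FileCount;   // granny_int32 in the real declaration
    my_filedb_gr2_entry* GR2Files;
};

// Drop-in replacement for the old "go out to disk" path: find the
// granny_file_info that used to live in its own GR2, by filename.
granny_file_info* FindFileInfo(my_file_database const* DB, char const* Name)
{
    for (int i = 0; i < DB->GR2FileCount; ++i)
        if (std::strcmp(DB->GR2Files[i].Filename, Name) == 0)
            return DB->GR2Files[i].FileInfo;
    return 0;  // not found; the caller could fall back to a disk load
}
```

If the database gets large, sorting the entries by filename at tool-time and binary-searching here is the obvious next step, but for a few hundred files the linear scan is already far cheaper than a single fopen.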

Tuesday, June 21, 2011

Memory

One of the odd things about doing support for a large number of clients at the same time is that you get strange bursts of similar requests. This month, I had two requests for an "intention" parameter for memory allocations, after receiving zero requests over the previous five years. So hey, let's talk about a couple of changes to the memory subsystem in Granny that went into the 2.8.50.0 release!

First, the API as it existed until this month. Granny provides a method for taking over all allocations made by the library: GrannySetAllocator. You pass this function two function pointers, one for the allocation:

  GRANNY_CALLBACK(void *)
  granny_allocate_callback(char const * File,
                           granny_int32x Line,
                           granny_uintaddrx Alignment,
                           granny_uintaddrx Size);

and one for the deallocation:

  GRANNY_CALLBACK(void)
  granny_deallocate_callback(char const * File,
                             granny_int32x Line,
                             void * Memory);

You get to see exactly where in Granny each allocation is coming from, so when a memory leak occurs, you can walk it back to the source. "OK, the leaked allocation comes from granny_file.cpp; I bet I forgot a GrannyFreeFile somewhere."

What you don't get is information on what each allocation is for, or an indication of its expected lifetime. For instance, allocations from granny_file.cpp might be temporary buffers used for decompression, or more permanent allocations representing file sections. So now granny_allocate_callback looks like:

  GRANNY_CALLBACK(void *) 
  granny_allocate_callback(char const * File,
                           granny_int32x Line,
                           granny_uintaddrx Alignment,
                           granny_uintaddrx Size, 
                           granny_int32x AllocationIntent);


Where AllocationIntent will be one of the values from the enum granny_allocation_intent. In our granny_file.cpp example, the section data will be allocated with an intent of GrannyAllocationFileData, while temporary buffers send GrannyAllocationTemporary, 'natch. So if you're paying extra close attention to memory fragmentation, you can route the temporary allocs to a heap that won't put holes in your main memory layout, while the section data can be sent to a heap for allocs that will be around for a while. Handy, and pretty simple on your side and mine!
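Here's a sketch of what that routing might look like on the application side. The typedefs and intent values below are stand-ins so the example is self-contained; in a real integration they come from granny.h, and the two "heaps" are just byte counters standing in for a scratch allocator and a long-lived allocator:

```cpp
#include <cstdlib>

// Stand-ins for the Granny typedefs and intent enum (values invented here)
typedef int    granny_int32x;
typedef size_t granny_uintaddrx;
enum { GrannyAllocationTemporary = 1, GrannyAllocationFileData = 2 };

// Mock heaps: tally bytes so the routing decision is observable
static size_t ScratchBytes = 0;
static size_t MainBytes    = 0;

static void* MyGrannyAlloc(char const* /*File*/, granny_int32x /*Line*/,
                           granny_uintaddrx /*Alignment*/,
                           granny_uintaddrx Size,
                           granny_int32x Intent)
{
    // Route short-lived buffers away from the main heap so they can't
    // punch holes in the long-lived memory layout
    if (Intent == GrannyAllocationTemporary)
        ScratchBytes += Size;
    else
        MainBytes += Size;
    return std::malloc(Size);  // a real version would honor Alignment, too
}
```

The matching deallocate callback would return the block to whichever heap it came from; a common trick is to tag the allocation with its heap in a small header, since the intent isn't passed back at free time.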

While I was in there, I made a couple of other changes. Granny now has an analogue to the _CrtSetBreakAlloc API. In debug mode, the library allows you to set a breakpoint that will trigger when a specific allocation occurs: GrannySetBreakAllocation. The debug memory logging (check your docs for information on the GrannyBeginAllocationCheck API) now prints an allocation number with each leaked block of memory. If your game or application is deterministic, just plug this into the break alloc API in the next run, and presto! You've got your leak in the debugger.
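The mechanism behind break-on-allocation-number is simple enough to sketch in a few lines. This is a toy version with invented names, not the Granny implementation; the real entry point is GrannySetBreakAllocation:

```cpp
#include <cstdlib>

// Toy sketch of the _CrtSetBreakAlloc-style mechanism described above
static int  AllocationCounter = 0;
static int  BreakAllocation   = -1;   // -1 means "never break"
static bool BreakHit          = false;

void SetBreakAllocation(int Number) { BreakAllocation = Number; }

void* CountedAlloc(size_t Size)
{
    ++AllocationCounter;
    if (AllocationCounter == BreakAllocation)
        BreakHit = true;  // a debug build would trap into the debugger here
    return std::malloc(Size);
}
```

If last run's leak report says allocation #2 leaked, you call the set-break function with 2 before the next deterministic run, and you stop in the debugger at the exact moment the leaked block is created.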

Granny is getting some interesting Telemetry integration that will add even more debugging features, but we'll get into those in an upcoming post.

Thursday, April 28, 2011

Pushing Files Around

In the latest release, I've reorganized the way we ship source code to our clients. The "what" is covered in the documentation, but I figured the "why" would make for a perfect inaugural blog post. Shipping Granny presents some interesting challenges.  As of this moment, Granny ships for 9 different platforms, and we have legacy support for several more. (Xbox, GameCube, etc.)  In addition, every version of Granny ships with out-of-the-box support for 28 different versions of Max, Maya, and XSI.

None of that would be too onerous, but one of the selling points of Granny is that we give you full access to the source code, allowing you to customize our code at any level. So we have to ship all of this in a way that gives our clients everything they need to build each platform.

For most products, we could simply give you a copy of our own source tree, along with a Visual Studio project, and call it done. That won't work for Granny. Since we have to build for Linux, MacOSX, and some console platforms that don't have native VS integration, we use a custom build system in order to support all of our platforms from a single build environment.  The list of files included for each platform/processor combination we support is determined automatically (so we can't screw it up!), and routed to the appropriate build tool from the command-line. So we can't just hand you our VCPROJ files; we don't have any!

Up until very recently, the solution to this problem was simply to take that dynamically determined list of files, and dump it into the "source" directory of each platform.  This had a couple of very nice points.  First, you don't have any files you don't need for that version of Granny.  Second, you can simply drag all of those files into your own projects, and you're ready to go!  You have everything you require, and nothing you don't.

That's worked well, but in the last few years, there's been a shift in the way most of our clients ship. It's become far more common for Granny customers to support 2 or 3 platforms, and we've been hearing comments that the "one source directory per platform" layout leaves customers with a bad choice.  Either they can combine platform source directories, in which case they no longer have a clean list of files required for each platform, or they can leave them separate, which makes it difficult to manage modifications across platforms.

So: new layout! I've modified the build system to create source trees rather than flat source directories now.  There will still be a "source" directory in each distribution, but any platform or application specific files will now be contained in subdirectories. Now when you copy source distributions on top of each other, file collisions are only possible between identical files, and specific platform files will be kept separate in their own subdirectory.  This lets you share modifications across platforms, while still making it possible to determine exactly which files you need to build a specific version of the runtime, or the exporter.

Tuesday, April 26, 2011

Welcome!

Following Brian's lead at the Telemetry blog, I'll be posting updates about Granny3D here. If you've stumbled on this blog by accident, here's a brief introduction to what we're all about.

Granny3D is a middleware product from RAD Game Tools. Granny handles data export from Max, Maya, and XSI, and runtime animation playback on every gaming platform with a floating point unit and a C compiler. The Granny Animation Studio makes it easy to assemble complicated character behavior graphically, without requiring a round-trip through the programming team.  Granny also contains a large array of tools for conditioning your data to make it ready for shipping, and many, many other utilities. Most of our clients are in the games and entertainment fields, but Granny has been used to help develop everything from education software, to training simulations, to slot machines.

Your humble author is Dave Moore; I handle the development and support of Granny here at RAD. The idea behind the RAD Granny blog is to provide a place to talk about tips, tricks, and updates that don't quite belong in the Granny documentation and are too long for @granny3d. Thanks for reading!