Thursday, September 8, 2011

Make like Voltron!

Because as awesome as it is to have a lion robot, if you want to swing the really big sword, you've got to tack a bunch of them together.

What I actually want to talk about this time is combination, specifically file combination. That is, taking a large number of GR2 files that represent your character, animation set, level data, etc., and turning them into a single file.

The Granny file format has a lot of powerful features that even clients who have been using the library for years don't use all that often. If I had to make a list of "Most Underused Features of Granny", the ability to create custom file formats would be running away at #1. I'm going to give you an example of how you can squeeze a tremendous amount of space and time out of your existing assets. Bear with me for a bit, because there's going to be quite a bit of foundational "how" material before we get to "why".

Let's talk a bit about the way Granny represents data, since that will become important later on. Granny files are implemented with a two-layer system. The first layer is the underlying file engine which handles compression, endian and pointer-size conversions, and data-graph flattening. It also provides a mechanism for describing the data to be persisted. The second layer is the actual format (such as GR2/granny_file_info), which is described in terms of the first layer's data description language. The second layer is where we specify what we're working with, as well as manipulate the file format over time. (Versioning happens at this level.)

If you're already a Granny user, you might have wondered why it takes two steps to load a Granny file from disk.
  granny_file* TheFile = GrannyReadEntireFile("my_anim.gr2");
  granny_file_info* Info = GrannyGetFileInfo(TheFile);
The two-layer system is why. By the time you have a granny_file object in hand, all of the pointer fixing, endian-marshalling, and pointer-size conversions have been done, but as far as your application is concerned, a granny_file is just a collection of raw bytes with a type description attached. GrannyGetFileInfo does the work of comparing the type stored in the file to the datatype expected by your game. Let's take a quick look into GrannyGetFileInfo in pseudocode. The function essentially boils down to:
  if (File->Header.TypeTag == GrannyCurrentGRNStandardTag)
  {
      return (granny_file_info*)File->RootObject;
  }
  else
  {
      if (File->ConvertedObject == 0)
      {
          // Log Warning about conversion operation
  
          File->ConvertedObject =
               GrannyConvertTree(File->RootType,
                                 GrannyFileInfoType,
                                 File->RootObject);
      }
      return (granny_file_info*)File->ConvertedObject;
  }
Again, if you're a Granny user, and you have a granny_log_callback registered, then you've probably seen logs like: "File has run-time type tag of 0xabcdef, which doesn't match this version of Granny (0xabcdf0)." Here's where they come from! GrannyConvertTree is fairly efficient, but it's still doing a walk of the type tree for the file and the library. We'd much rather be in that first branch of the conditional where we just cast a pointer and return it.

Stepping back to the original topic of combining Granny files, you probably have noticed that all of the functions in GrannyGetFileInfo are available to you in the Granny API. We have GrannyConvertTree, and that's basically it. GrannyFileInfoType is just an array of granny_data_type_definition structures, which are also user-creatable. So why not just make our own format, and call GetMySpecialInfo() instead of GrannyGetFileInfo?
  my_file_database*
  GetMyFileDatabase(granny_file* File)
  {
      if (File->Header.TypeTag == MyCurrentTag)
          return (my_file_database*)File->RootObject;
      else
      {
          if (File->ConvertedObject == 0)
          {
              // Log Warning about conversion operation
      
              File->ConvertedObject =
                   GrannyConvertTree(File->RootType,
                                     MyFileDatabaseType,
                                     File->RootObject);
          }
          return (my_file_database*)File->ConvertedObject;
      }
  }
Once we've figured out what we're going to put into my_file_database, that's pretty much all it takes! Let's design a containing structure that we can stuff a bunch of granny_file_infos into, along with some lookup information so we can pull them out in our game later. Here are our structure declarations:
  struct my_filedb_gr2_entry
  {
      char const*       Filename;
      granny_file_info* FileInfo;
  };

  struct my_file_database
  {
      granny_int32             GR2FileCount;
      granny_filedb_gr2_entry* GR2Files;
  };
  extern granny_data_type_definition MyFileDatabaseType[];
So we have my_file_database, which contains an array of my_filedb_gr2_entry structures, each of which is just a filename and a pointer to a granny_file_info. Next we have to describe those two structures to Granny. A full tutorial on granny_data_type_definitions isn't really in scope for this post, so I'm just going to give you the code, and you can check out the tutorials in the SDK documentation on the complete system.
  granny_data_type_definition GR2EntryType[] =
  {
    { GrannyStringMember,    "Filename" },
    { GrannyReferenceMember, "FileInfo", GrannyFileInfoType },
    { GrannyEndMember }
  };
  
  granny_data_type_definition MyFileDatabaseType[] =
  {
    { GrannyReferenceToArrayMember, "GR2Files", GR2EntryType },
    { GrannyEndMember }
  };
That's all. Notice that GrannyReferenceMember allows you to use other structures described with granny_data_type_definition arrays, so including the granny_file_info type in our structure just requires that one entry. In fact, in the latest version of Granny, this exact structure (plus some extra bells and whistles) is provided to you. The preprocessor can create large collections of GR2 files at tool-time, so loading a character or even a whole level is one simple function call.

See? I promised we'd get to the "why" eventually. Not only is loading a character with
  // Loading:
  granny_file* TheFile = GrannyReadEntireFile("my_character.fdb");
  my_file_database* CharacterDB = GetMyFileDatabase(TheFile);
  
  // Freeing:
  CharacterDB = 0;
  GrannyFreeFile(TheFile);
much simpler than:
  // Loading:
  for (int i = 0; i < NumCharFiles; ++i)
  {
      granny_file* TheFile = GrannyReadEntireFile(CharFiles[i]);
      // anim01.gr2, anim02.gr2, etc.
  
      granny_file_info* CharInfo = GrannyGetFileInfo(TheFile);
  
      // Do something with CharInfo
      // Store TheFile in FilesArray for deletion later
  }
  
  // Freeing:
  for (int i = 0; i < NumCharFiles; ++i)
  {
      GrannyFreeFile(FilesArray[i]);
  }
it's vastly more efficient as well in both space used on disk and in memory, as well as in the time it takes to load the file at runtime.

Let's work out where we're saving space first of all. Roughly speaking, a Granny file contains the following elements:
  • The data itself (granny_file_info, my_file_database, etc.)
  • The type information describing the data (granny_data_type_definition arrays, basically)
  • Housekeeping information (pointer fixups, etc.)
When you have 50 GR2s on disk, they are unable to refer directly to each other, and so you are guaranteed to have 50 copies of the same granny_data_type_definition arrays lying around. These can be jettisoned at runtime with GrannyFreeFileSection, but they are required to satisfy the requirement that each individual Granny file be a distinct, versionable, self-contained unit. Combining all 50 granny_file_infos into one my_file_database means that we have only 1 copy of the type information around. Win.

The second major space savings comes from the string pooling. Again, because Granny files must be self-contained units, and cannot contain a priori references to other files, granny_track_group animation tracks are matched against granny_skeleton bones by name at runtime. Granny does things this way so you can create new models for old animations, or vice versa. Those strings are referred to in every granny_animation, every granny_model, and every granny_mesh. When those entities are in separate files, you have 50 copies of those as well. (Unless you are using a granny_string_table, in which case, well done!) That doesn't sound like much, but let's do some math. Say your character has about 100 bones. Typical names like "Bip01 Spine1", "Bip01 Head", etc., run about 10 bytes per name, let's say. Each GR2 file in our example contains about 1k of string data for the bone names, in this imagined example that gives us 50k worth of duplicated strings. That waste goes away in the combined scenario.

Finally, we come to the big win, runtime speed. For me, this is huge, because game loading times are a personal pet peeve. I am deeply ticked off by loading screens. With modern hardware, there is almost never a technical reason your game cannot load nearly instantly. Combining 50 files into 1 gives us the following effects on runtime cost:
                         N files       1 Combined File

 fopen (or equivalent):  N             1
                 fread:  O(N)          O(1)
      total bytes read:  X bytes       < X bytes
On every single dimension, we're doing much better. Let's look at some real numbers. The following numbers are from a client's data for a single game level. They were working on the Nintendo 3DS, a platform that has relatively low bandwidth to storage and high latency for operations like fopen(). I've done the tests on both the 360 and 3DS, just to get a look at the numbers across two very different platforms. The results are much more dramatic on the 3DS, which uses flash storage instead of reading from a hard drive, but even on the 360, it's a fairly impressive speed up.

Here is our test data set.
                  #Files    Bytes (compressed)  Bytes (raw)

 Separate    GR2: 487       8,000,624
             GSF:  33         126,992
                  520       8,127,616

 Combined           1       4,138,564 (51%)     10,541,056 (131%)
Notice that the space savings from string sharing, type info sharing, and elimination of redundant housekeeping data mean that the combined file is just a little bit larger than the original files, even after turning off file compression! The combined file is nearly half the size of the original data. Frankly, that's a testament to how much effort that the client had put into optimizing the size of their files in the first place. All of the data was already squeezed as tight as possible before we came through with the combination pass, which addressed things that aren't sensitive to the normal space-saving techniques. (Changing mesh formats, altering animation tolerances, etc.). Normally the size of the win would not be so large in relative terms.

Here is the time it takes to load this data set under 3 scenarios on the Nintendo 3DS and the Xbox 360.
                          Nintendo 3DS           Xbox 360

        Separate Files:   70.84s (1.0x)          3.57s (1.0x)
        Combined (raw):    3.03s (!! 23.3x !!)   0.89s (4.0x)
 Combined (compressed):   10.08s (7.0x)          1.84s (1.9x)
It's not often that you get to describe optimization wins with numbers like "23 times faster" in a game that's already been tuned. The Xbox does, frankly, a fairly astonishing job of keeping up in the separate file case, but even there, we cut the time required to load the dataset by more than 75%.

Now we're swinging the big sword! Dropping this into your existing Granny game should be a piece of cake. Because we've maintained the filename in the combined structure, you simply have to replace the bit in your code that goes out to disk with GrannyReadEntireFile or GrannyReadEntireFileFromMemory with a lookup in the shared structure. You get back the same granny_file_info structure, but a lot faster.