Virtual File System – Part 1

Download Source Code

Let’s say you’re working on a commercial game to be released for the Xbox 360. Your artists are happily making hundreds or thousands of high-quality assets (textures, models, audio, etc.) to be used in your game. Everything’s going well until one day an artist rushes into your office, panicking because the game has reached its 4.7GB limit (a DVD’s storage capacity) and is still missing a thousand more assets that need to be included.

You have 2 options (that I know of) to solve this problem:

  1. Cut features and functionality by simplifying levels or removing parts of your game. This means removing the associated or least important assets to save storage space, which could make your game a little bit less fun.
  2. OR you could come up with a system that compresses your resource files to reduce storage consumption so that you can fit more resources into your game. By doing this, you can at least keep option 1 as a last resort for when your uber compression has reached its limits.

Compressing raw data, depending on the compression algorithm, can potentially take less storage space

This topic will extensively cover option number 2, which we’ll implement with what we call a Virtual File System. We’ll also be building our own simple virtual file system as we discuss the details. Now that you have some idea of what this topic is all about, let’s get started!

What is a Virtual File System (or VFS)?
A Virtual File System is a system that creates an abstraction layer between your application and the concrete files your application uses, or what we call our resource files.

Basically, a VFS is similar to the file systems that operating systems use, like NTFS or FAT32 on Windows and other formats on other operating systems. The only difference is that a VFS is built on top of those concrete file systems.

* Before we move on, for consistency of terms and definitions, we’ll be referring to a Virtual File System simply as VFS and Packed Files as PAK

So what does it do exactly?
With the definitions above, a VFS gives us our own virtual storage in a format THAT only our application can understand. If this sounds familiar, it is because archive files such as zip or rar are another good example of a VFS. The key to this is archiving: we pack our resource files by wrapping them into ONE HUGE FILE.
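
To make the "one huge file" idea concrete, here is a minimal sketch that packs a few files into a single in-memory blob and looks them up again by path. It is not the format we build in this series: TinyPak, PackFiles, and ReadPacked are hypothetical names made up for this illustration, and a std::string stands in for the file on disk.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory stand-in for a PAK file: one contiguous blob plus
// a table that records where each packed file lives inside it.
struct TinyPak {
    struct Entry {
        std::size_t offset; // where the file's bytes start in `blob`
        std::size_t size;   // how many bytes belong to the file
    };

    std::string blob;                   // the "one huge file"
    std::map<std::string, Entry> table; // internal path -> location
};

// Appends every (path, contents) pair to the blob and records its location.
TinyPak PackFiles(const std::vector<std::pair<std::string, std::string>>& files) {
    TinyPak pak;
    for (const auto& file : files) {
        pak.table[file.first] = TinyPak::Entry{pak.blob.size(), file.second.size()};
        pak.blob += file.second;
    }
    return pak;
}

// Copies a packed file out of the blob; returns false if the path is unknown.
bool ReadPacked(const TinyPak& pak, const std::string& path, std::string& out) {
    const auto it = pak.table.find(path);
    if (it == pak.table.end()) {
        return false;
    }
    assert(it->second.offset + it->second.size <= pak.blob.size());
    out = pak.blob.substr(it->second.offset, it->second.size);
    return true;
}
```

The application never touches the original files again. It only asks for a path and gets bytes back, which is the abstraction layer described earlier.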

If you haven’t noticed yet, take a look at the directory of one of your favorite triple-A games and see if you can find any of the resources used in the game itself. Chances are you won’t find any *.tga, *.ogg, *.mp3, or mesh files. But you may see some insanely huge files with weird file extensions like .pak or .WAD (used in Doom). That game’s resources are actually inside those files.

Awesome! So what are the benefits? –  VFS advantages
Here are some of the advantages of using a VFS:

  • Security : Our "proprietary" format makes it harder for amateur hackers to parse, vandalize, or steal our assets.
  • Fast access time : Repeatedly opening and closing many small resource files takes quite some time. We can make loading faster by using a few huge files, contained in a VFS, instead of many smaller ones.
  • Less system resource consumption : Fewer files to manage means fewer file handles, which also means using less system resources.
  • Automatic Filter type handling : With a proper pluggable architecture, a VFS can automatically handle compression and decompression behind the scenes. Be it zip, rar, tar, gar, meow, woof, or any other compressed file format, the VFS will automatically decompress or compress it for you.


The cake was a lie? – VFS disadvantages
Unfortunately, nothing comes for free. We have to sacrifice some functionality in order to gain some.

  • Very slow to add/edit files : Because of the way we structure and pack our files inside a VFS, it is harder to edit, add, and erase files. The overhead comes from the necessary re-compaction, re-arrangement, and updating of bookkeeping information. But these disadvantages are carefully weighed against our needs in games (or other applications). Editing the resources inside the VFS at run-time is strongly inadvisable, but there’s no real reason to do so in the first place. Packing SHOULD only be done with an offline tool.
  • Hard to debug : Since files inside a VFS are no longer recognized by the OS, we have no easy way of checking a file for errors. We have to either extract the file, save it as a separate file, and then repack the entire VFS, or come up with a utility that supports our VFS format and dynamically displays the assets for easy viewing/editing, which can be costly and take a considerable amount of time to develop. One good example of such a tool is Epic’s Generic Browser.

Unreal Editor’s Generic Browser
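
To see why adding or erasing files is costly, consider what erasing a single file from a packed blob involves. The sketch below is only an illustration under assumptions: PackedEntry and ErasePacked are hypothetical names, and a std::string plays the role of the PAK file. Every byte after the erased file has to move, and every later table entry has to be patched.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical table entry: where a packed file lives inside the blob.
struct PackedEntry {
    std::size_t offset;
    std::size_t size;
};

// Removes one file from an in-memory archive. The bytes after the removed
// file slide forward, so every later entry needs its offset patched.
bool ErasePacked(std::string& blob,
                 std::map<std::string, PackedEntry>& table,
                 const std::string& path) {
    const auto it = table.find(path);
    if (it == table.end()) {
        return false;
    }
    const PackedEntry removed = it->second;
    assert(removed.offset + removed.size <= blob.size());

    blob.erase(removed.offset, removed.size); // re-compaction: moves the tail
    table.erase(it);

    for (auto& item : table) {                // table update: visits every entry
        if (item.second.offset > removed.offset) {
            item.second.offset -= removed.size;
        }
    }
    return true;
}
```

On a multi-gigabyte file on disk, that tail move means rewriting most of the archive, which is why this kind of editing belongs in an offline tool.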

Features for our VFS
Before we start discussing the intricacies of a VFS, we’ll go through the basic features that our VFS will have.

  • Pack an entire directory : Our VFS can parse a directory path then pack all the files and folders inside that directory producing a PAK file.
  • Filter callback : Our VFS provides a callback functionality so that we can specify a filter to use when it is about to process a file.
  • Unpack VFS contents : We can extract all of the packed file’s contents and save it in a specified output path.
  • zlib support : As a demonstration of our pluggable filter architecture, we’ll be using zlib’s well-known deflate and inflate algorithms for compressing and decompressing our resource files.
  • Load in memory : Once a PAK file is loaded, we can load a specified file in memory and use it normally as if we load an external file outside the PAK file.
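
One way to picture a pluggable filter architecture is a small interface that every filter implements, so the packing code never needs to know which algorithm it is running. The sketch below is an assumption about the general shape, not the design used later in this series: Filter, CopyFilter, RunLengthFilter, and MakeFilter are hypothetical names, and a toy run-length encoder stands in for zlib so that the example stays self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Hypothetical filter interface: every filter can encode and decode a buffer.
class Filter {
public:
    virtual ~Filter() = default;
    virtual Bytes Encode(const Bytes& raw) const = 0;
    virtual Bytes Decode(const Bytes& packed) const = 0;
};

// Pass-through filter: stores the data unchanged.
class CopyFilter : public Filter {
public:
    Bytes Encode(const Bytes& raw) const override { return raw; }
    Bytes Decode(const Bytes& packed) const override { return packed; }
};

// Toy run-length filter (a stand-in for zlib): output is (count, value) pairs.
class RunLengthFilter : public Filter {
public:
    Bytes Encode(const Bytes& raw) const override {
        Bytes out;
        for (std::size_t i = 0; i < raw.size();) {
            std::size_t run = 1;
            while (i + run < raw.size() && raw[i + run] == raw[i] && run < 255) {
                ++run;
            }
            out.push_back(static_cast<std::uint8_t>(run));
            out.push_back(raw[i]);
            i += run;
        }
        return out;
    }

    Bytes Decode(const Bytes& packed) const override {
        assert(packed.size() % 2 == 0); // must be whole (count, value) pairs
        Bytes out;
        for (std::size_t i = 0; i + 1 < packed.size(); i += 2) {
            out.insert(out.end(), static_cast<std::size_t>(packed[i]), packed[i + 1]);
        }
        return out;
    }
};

// Hypothetical factory: the caller only asks for a filter by id.
std::unique_ptr<Filter> MakeFilter(int filterId) {
    if (filterId == 1) {
        return std::unique_ptr<Filter>(new RunLengthFilter());
    }
    return std::unique_ptr<Filter>(new CopyFilter());
}
```

Adding a new compression scheme then means adding one more class and one more id, without touching the code that packs or reads files.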

The VFS class
Most VFS samples or tutorials on the internet provide very basic, very limited, and often mostly hacked implementations of the system. I found another good article, but it was written in pure C (see references at the very bottom). The problem with that approach is that many unnecessary methods and internal data are exposed to the user, which may cause errors if they are not used properly. To avoid this problem, we’ll be implementing a simple object-oriented design that wraps all the methods and data and only exposes what the user needs.

Here is the public interface of the file system class, called FileSystem, followed by a brief overview of its methods:

FileSystem Interface
    class FileSystem
    {
    public:

        enum FILTER_TYPE {
            FILTER_TYPE_DEFAULT = 0x00000000,
            FILTER_TYPE_ZLIB_TEST = 0x00000001,
            // NOTE: Add additional filters here.
            //    Each filter takes one bit of space. 0, 1, 2, 4, 8…
            FILTER_TYPE_FORCE_DWORD = 0x7fffffff,
        };

        typedef FILTER_TYPE (*OnProcessFileCallback)( const CHAR *pFileName, const CHAR *pFilePath );
        typedef DWORD                FilterFlag;
        typedef Optional<PTR_T>        OHFile;

        FileSystem( void );
        ~FileSystem( void );

        BOOL Open( const CHAR *pFileName, BOOL deleteTableEntries = TRUE );
        void Close( void );
        BOOL IsOpen( void ) const;

        OHFile        FindFile( const CHAR *pPath ) const;
        SIZE_T        GetFileSize( const OHFile *pOHFile ) const;
        SIZE_T64    GetFileSize64( const OHFile *pOHFile ) const;
        BOOL        ReadFile( const OHFile *pOHFile, BYTE *pBuff );

        BOOL Create( const CHAR *pDirectory, const CHAR *pOutputPath, OnProcessFileCallback pOnProcessFileCallback = NULL, FilterFlag fBitFlag = FILTER_TYPE_DEFAULT );
        BOOL Unpack( const CHAR *pPAKPath, const CHAR *pOutputPath );

    };
  • BOOL Open( const CHAR *pFileName, BOOL deleteTableEntries = TRUE ) : Opens the PAK file specified in *pFileName and loads its internal directory and file tables. After the tables have been loaded, it generates a File Map out of them for fast file searching when calling FindFile(). The second parameter, deleteTableEntries, is a Boolean that controls whether the file and directory tables are deleted afterwards to save a little bit of memory. If you’re not planning on extracting and saving all of the contents outside the PAK file, then the tables aren’t needed and can be safely deleted, which is the default behavior if you don’t pass a value for the 2nd parameter. This method returns true if loading was successful and false otherwise. This is one of the three methods that do most of the high-level heavy lifting.
  • void Close( void ) : Releases all internal resources generated by the Open() method and closes the File Stream handle associated with the PAK file.
  • BOOL IsOpen( void ) : Returns a Boolean whether a PAK file is opened or not.
  • OHFile FindFile( const CHAR *pPath ) : *pPath specifies the internal path including the filename of the file we’re searching for inside a PAK file. The return type takes a little bit of an explanation. Basically OHFile is a handle to the file found in the loaded PAK file but there’s a little bit more to it than just a handle. So you can just safely think of it as a handle for now.
  • SIZE_T GetFileSize( const OHFile *pOHFile ) : This method returns the size of the file specified by the file handle, *pOHFile, returned by the FindFile() method. The return type, SIZE_T, is a 32-bit unsigned integer. That means that the maximum size it can return is 4,294,967,295 bytes, or approximately 4GB. If the size of the file is greater than this value, it returns 0 instead. To work around this limit, another method, GetFileSize64(), returns a 64-bit unsigned integer, or SIZE_T64. See the next method below.
  • SIZE_T64 GetFileSize64( const OHFile *pOHFile ) : Same as GetFileSize() but returns a 64-bit unsigned integer.
  • BOOL ReadFile( const OHFile *pOHFile, BYTE *pBuff ) : This method extracts the file specified by *pOHFile and loads it into the memory pointed to by *pBuff. If the file is compressed, the VFS automatically handles decompression, running the data through the required filters before it writes the final raw data to *pBuff. Note that it is really important that the size of *pBuff SHOULD BE AT LEAST the size of the file we’re loading, as given by the GetFileSize() or GetFileSize64() methods. If the method successfully loads the file into memory, it returns true. Otherwise, it returns false.
  • BOOL Create( const CHAR *pDirectory, const CHAR *pOutputPath, OnProcessFileCallback pOnProcessFileCallback = NULL,
    FilterFlag fBitFlag = FILTER_TYPE_DEFAULT ) :
    This method packs the entire directory specified by *pDirectory. The algorithm recursively goes through each folder inside the directory and processes each file it finds. The output file is specified in *pOutputPath. The last two parameters, pOnProcessFileCallback and fBitFlag, specify the filter options that you want to apply to all or selected files as they get processed. pOnProcessFileCallback is a function pointer typedef'd as:
        typedef FILTER_TYPE (*OnProcessFileCallback)( const CHAR *pFileName, const CHAR *pFilePath );

    If the user has specified a callback, the VFS calls it once for each file it processes. *pFileName is the name of the file to be processed and *pFilePath is its path relative to *pDirectory as the root path. For example, if we’re packing a directory in “C:\User\Downloads\” and the VFS finds a file called “funkymusic.mp3” in “C:\User\Downloads\Music\” then *pFileName will be “funkymusic.mp3” and *pFilePath will be “Downloads\Music\”.
    The last parameter in Create() is fBitFlag. It tells our VFS which filters we’ll be using so that it can initialize only those filters, saving memory that would otherwise go to filters that end up unused. The default is FILTER_TYPE_DEFAULT, a very basic filter that does nothing but copy each file bit by bit. This is the second of the three methods that do most of the high-level heavy lifting.

  • BOOL Unpack( const CHAR *pPAKPath, const CHAR *pOutputPath ) : Extracts the entire contents of a PAK file specified in *pPAKPath and saves it in *pOutputPath directory.
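
As an illustration of how the callback and the bit flags could work together, here is a small standalone sketch. It repeats the FILTER_TYPE enum and the callback typedef from the interface so that it compiles on its own, and it assumes CHAR and DWORD are the usual Windows-style typedefs. ChooseFilter, HasExtension, and IsFilterEnabled are hypothetical helpers written for this example, not part of the FileSystem class.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Mirrors the enum from the FileSystem interface.
enum FILTER_TYPE {
    FILTER_TYPE_DEFAULT     = 0x00000000,
    FILTER_TYPE_ZLIB_TEST   = 0x00000001,
    FILTER_TYPE_FORCE_DWORD = 0x7fffffff
};

// Assumption: stand-ins for the Windows-style typedefs used by the interface.
typedef char CHAR;
typedef unsigned long DWORD;
typedef DWORD FilterFlag;
typedef FILTER_TYPE (*OnProcessFileCallback)(const CHAR* pFileName, const CHAR* pFilePath);

// Hypothetical helper: true if pFileName ends with pExt (case-sensitive).
static bool HasExtension(const CHAR* pFileName, const CHAR* pExt) {
    assert(pFileName != NULL && pExt != NULL);
    const std::size_t nameLen = std::strlen(pFileName);
    const std::size_t extLen = std::strlen(pExt);
    return nameLen >= extLen && std::strcmp(pFileName + (nameLen - extLen), pExt) == 0;
}

// Example callback: audio that is already compressed is copied as-is,
// everything else is sent through the zlib filter.
FILTER_TYPE ChooseFilter(const CHAR* pFileName, const CHAR* /*pFilePath*/) {
    if (HasExtension(pFileName, ".mp3") || HasExtension(pFileName, ".ogg")) {
        return FILTER_TYPE_DEFAULT;
    }
    return FILTER_TYPE_ZLIB_TEST;
}

// Hypothetical check: was `filter` requested in the flags given to Create()?
bool IsFilterEnabled(FilterFlag flags, FILTER_TYPE filter) {
    if (filter == FILTER_TYPE_DEFAULT) {
        return true; // the plain copy filter needs no flag bit
    }
    return (flags & static_cast<FilterFlag>(filter)) != 0;
}
```

With the interface shown earlier, such a callback would presumably be passed as the third argument of Create(), with FILTER_TYPE_ZLIB_TEST set in fBitFlag so that the zlib filter gets initialized.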

These public methods use a number of internal (or private) methods to do their work. But before we look at those methods, let us first look at the internal data structures and typedefs to get some idea of how we will structure our data.

See Virtual File Systems – Part 2 for the continuation of this article.

