Virtual File System – Part 4

Alright! From here on, we’ll be covering the FileSystem’s public and private methods, which require a rather lengthy explanation. But as I said before, don’t worry: most of the code is trivial thanks to how we abstracted out most of the work through delegation (private utility methods) and composition (internal data structures).

    FileSystem::FileSystem( void )
        : m_pHeader( NULL ), m_pGenDirs( NULL ), m_pGenFiles( NULL ),
        m_pGenFileList( NULL ), m_pFileMap( NULL ), m_pPulseFile( NULL ),
        m_bLoaded( FALSE )
    {

    }

    FileSystem::~FileSystem( void )
    {
        ReleaseResources();
    }

The FileSystem’s constructor simply sets every pointer member to NULL and m_bLoaded to FALSE, while the destructor calls ReleaseResources() to clean up and release the allocated memory.

FileSystem method – Create()

Next, we’ll take a look at the Create() method. By looking at this first, we’ll have a better understanding of how our internal PAK file structure is organized.

First we check that the directory path specified by the user is valid. Then, before we start allocating memory, we call ReleaseResources() to release anything already allocated. If we skipped this and the user called Open() and then Create(), we would leak memory. Next we initialize the filters specified in fBitFlag so that we can use them later when we encode each file. The next few lines allocate and set up the data members needed for creating our PAK file: m_pHeader, m_pGenDirs, and m_pGenFiles. Once those are allocated, we fill in the header with the right signature, ID, version, and filter bit field.

    BOOL FileSystem::Create( const CHAR *pDirectory, const CHAR *pOutputPath, OnProcessFileCallback pOnProcessFileCallback /* = NULL */, FilterFlag fBitFlag /*= FILTER_TYPE_DEFAULT*/ )
    {
        String    directory    = pDirectory;
        BOOL      bReturn      = TRUE;

        // Make sure pDirectory isn't empty and the path exists
        if ( !pDirectory || !System::IsDirectoryExist( pDirectory ) )
            return FALSE;

        ReleaseResources();

        InitializeFilters( fBitFlag );

        m_pOnProcessFile = pOnProcessFileCallback;

        m_pHeader      = new FileHeader;
        m_pGenDirs     = new PAKGenDirList;
        m_pGenFiles    = new PAKGenFilePairList;

        PSX_Assert( m_pHeader && m_pGenDirs && m_pGenFiles, "Failed to allocate memory." );

        // Setup header information
        m_pHeader->m_signature = FileSystem::SIGNATURE;
        PSX_StrCpy( m_pHeader->m_ID, "pfs", 4 );
        m_pHeader->m_version = 0x000100; // TODO:  Create macro to generate pfs version
        m_pHeader->m_filterBitField = fBitFlag;

Error checking and initialization code.
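The TODO in the header setup asks for a macro that generates the version number. Here is a minimal sketch of one way to do it, assuming the value 0x000100 packs one byte each for major, minor, and patch. The article does not say how the field is laid out, so both that layout and the name MakePfsVersion are hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper: packs major.minor.patch into one 24-bit value,
// one byte per component (assumed layout, not confirmed by the article).
constexpr std::uint32_t MakePfsVersion( std::uint8_t major, std::uint8_t minor, std::uint8_t patch )
{
    return ( static_cast< std::uint32_t >( major ) << 16 ) |
           ( static_cast< std::uint32_t >( minor ) << 8 )  |
             static_cast< std::uint32_t >( patch );
}

// Under that assumption, version 0.1.0 is the 0x000100 used in Create().
static_assert( MakePfsVersion( 0, 1, 0 ) == 0x000100, "0.1.0 should pack to 0x000100" );
```

A constexpr function is used here instead of a macro so the compiler can type-check the arguments; a macro with the same shifts would work just as well.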

Next, note that every file packed into the PAK file loses its absolute path. What matters is its path relative to pDirectory, which acts as the root. Say pDirectory is “c:\music\” and we are currently in “c:\music\michael jackson\”. Then pPAKPath is “music\michael jackson\”.

    // Get the relative root directory ( chosen directory )
    INT len = PSX_StrLen( directory.GetCString() );
    CHAR *pPtr = const_cast< CHAR* >( directory.GetCString() + len );
    CHAR *pRootSeparator = const_cast< CHAR* >( directory.GetCString() + 2 ); // get the first separator ( i.e. c:\ )
    String pPAKPath;

    // Look for a separator that's not the root
    while ( pPtr != pRootSeparator && *pPtr != PSX_String('\\') )
        --pPtr;

    // If not the root separator then we need to remove the last separator in the string.
    if ( pPtr != pRootSeparator && PSX_StrLen( pPtr + 1 ) == 0 )
    {
        directory[ pPtr - directory.GetCString() ] = PSX_String();

        // Look again for the next separator that's not the root
        while ( pPtr != pRootSeparator && *pPtr != PSX_String('\\') )
            --pPtr;
    }
    // Move one char forward starting either to the first char of dir name or null(empty).
    ++pPtr;

    // There could be no directory.
    // Then pPAKPath is empty
    if ( PSX_StrLen( pPtr ) )
        pPAKPath = pPAKPath + pPtr;

Extracting the relative path by making pDirectory as the root path.
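The pointer walk above boils down to "take the last directory name, ignoring one trailing separator, and return nothing for a bare drive root". Here is a rough equivalent using std::string, offered only as a sketch: LastDirectoryName is a hypothetical helper, and it assumes Windows-style paths with a drive prefix such as “c:\”.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the last directory name of a Windows-style path,
// or an empty string when the path is just a drive root such as "c:\".
std::string LastDirectoryName( std::string path )
{
    // Drop a single trailing separator unless it belongs to the drive root ("c:\").
    if ( path.size() > 3 && path.back() == '\\' )
        path.pop_back();

    const std::size_t pos = path.find_last_of( '\\' );
    if ( pos == std::string::npos )
        return path;                 // No separator at all: the whole string is the name

    return path.substr( pos + 1 );   // Empty for "c:\" because nothing follows the separator
}
```

With this sketch, “c:\music\” and “c:\music” both give “music”, while “c:\” gives an empty string, which corresponds to the empty pPAKPath case in the listing.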

Finally, after we have all the necessary information set up, we can now start generating the table entries then create the PAK file.

        // Generate PAK folder and file entries then create the Pulse PAK File
        if ( !_GenerateTableEntries( directory.GetCString(), pPAKPath.GetCString() ) )
        {
            bReturn = FALSE;
            goto EndCreatePAK;
        }

        if ( !_CreatePulseFile( pOutputPath ) )
        {
            bReturn = FALSE;
            goto EndCreatePAK;
        }

    EndCreatePAK:
        ReleaseFilters();
        PSX_SafeDelete( m_pGenFiles );
        PSX_SafeDelete( m_pGenDirs );
        PSX_SafeDelete( m_pHeader );

        return bReturn;
    }

We generate the table entries by calling _GenerateTableEntries() and then create the PAK file by calling _CreatePulseFile(). If either function fails, bReturn is set to FALSE and execution jumps straight to the EndCreatePAK: label, which makes sure we don’t leak any memory before we leave the function.
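The goto label gives Create() a single cleanup point. The same guarantee can also be sketched with standard C++ ownership types. Note that this swaps in std::unique_ptr for the raw pointers and PSX_SafeDelete used in the article; SketchTables, BuildPak, and g_liveTables are hypothetical names used only for illustration.

```cpp
#include <memory>

// Counts live SketchTables objects so that a leak would be observable.
static int g_liveTables = 0;

// Hypothetical stand-in for the temporary tables that Create() allocates.
struct SketchTables
{
    SketchTables()  { ++g_liveTables; }
    ~SketchTables() { --g_liveTables; }
};

// Hypothetical sketch: every return path releases the tables automatically.
bool BuildPak( bool generateOk, bool writeOk )
{
    std::unique_ptr< SketchTables > tables( new SketchTables );

    if ( !generateOk )
        return false;   // tables is destroyed here, no cleanup label needed

    if ( !writeOk )
        return false;   // same on the second failure path

    return true;        // and on success
}
```

Whichever style you prefer, the point is the same: no exit path from the function may skip the release step.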

We’ll take a look at _GenerateTableEntries() then _CreatePulseFile() next.

FileSystem method – _GenerateTableEntries()

_GenerateTableEntries() is the heart of our algorithm. This method recursively walks each directory and its files, generating the directory and file entries that _CreatePulseFile() will use.

As usual, we declare our needed temporary variables to use first.

    BOOL FileSystem::_GenerateTableEntries( const CHAR *pFolderPath, const CHAR *pPAKPath )
    {
        // Iterate through each file in the folder
        DirEntryPointer pNewDir;
        WIN32_FIND_DATA nextFile;
        HANDLE          hFind;
        DirList         tempFolderPaths;    // Temporarily store found folders here
        String          dirBuff;
        String          newDirPAKPath;
        String          newDirFolderPath;
        DirPathPointer  fileDirPath;        // File entries need this when we're about to parse each file later
        BOOL            bAddSeparator = TRUE;
        BOOL            bReturn = TRUE;

        fileDirPath =  new DirPath;
        *fileDirPath = pFolderPath;

  • DirEntryPointer pNewDir : Holds our generated directory entry.
  • WIN32_FIND_DATA nextFile : A Win32-specific data structure that holds information about the found file.
  • HANDLE hFind : The find handle Windows uses to keep track of which file or directory we’re currently traversing.
  • DirList tempFolderPaths : This may look confusing, but it is basically a list of smart pointers, each holding a string. It is used later as temporary storage for every directory we find.
  • String dirBuff : Temporary string buffer used for formatting the search path for files and directories.
  • String newDirPAKPath : Temporary string buffer used to build the relative path (with pDirectory as root) for the recursive call to _GenerateTableEntries().
  • String newDirFolderPath : Temporary string buffer used to build the absolute path for the recursive call to _GenerateTableEntries().
  • DirPathPointer fileDirPath : A string stored in a SmartPointer<> that holds pFolderPath. We use a SmartPointer so that there is only one copy of it per directory, and so that we don’t have to worry about deleting it later.
  • BOOL bAddSeparator : pFolderPath could end with a separator ( ‘\’ on Windows and ‘/’ on Linux ). This flag makes sure we don’t append the separator twice when generating our paths.
  • BOOL bReturn : Simply used as the return value to indicate success (true) or failure (false).

We first check whether pFolderPath ends with a separator. This can happen if, for example, the user passes in a root drive path like “c:\”. In that case we must not add another separator, because the result would look like “c:\\”, which is not valid. Remember that this function is responsible for generating the directory and file tables. Since pPAKPath is the path relative to the pDirectory passed to Create(), we store it in the directory list (m_pGenDirs).

    if ( *(pFolderPath + PSX_StrLen( pFolderPath ) - 1) == PSX_String( '\\' ) )
        bAddSeparator = FALSE;

    // Insert new directory
    pNewDir = new DirEntry;

    if ( pPAKPath && PSX_StrLen( pPAKPath ) )
    {
        pNewDir->m_PAKData.m_nameLen = PSX_StrLen( pPAKPath );
        pNewDir->m_name              = pPAKPath;
    }
    else
    {
        // Insert anyway since we're now using path index for file entries.
        pNewDir->m_PAKData.m_nameLen = 0;
        pNewDir->m_name              = PSX_String("");
    }

    // Put in the list then increment num directories
    m_pGenDirs->PushBack( pNewDir );
    ++m_pHeader->m_numDirs;

Checking for a separator and creating a new directory entry.
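The separator check exists so that joined paths never end up with a doubled backslash. That rule can be isolated into a small helper, sketched here with std::string. JoinPath is a hypothetical name, and returning the bare name for an empty folder is a choice made for this sketch; the article’s code derives its flag from pFolderPath only.

```cpp
#include <string>

// Hypothetical helper: joins a folder path and a child name with exactly one
// backslash between them, whether or not the folder already ends with one.
std::string JoinPath( const std::string &folder, const std::string &name )
{
    if ( folder.empty() )
        return name;    // Nothing to prefix (e.g. an empty PAK-relative root)

    const bool addSeparator = folder.back() != '\\';
    return addSeparator ? folder + '\\' + name : folder + name;
}
```

So joining “c:\” with “music” and joining “c:\music” with “michael jackson” both produce a single separator at the seam.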

Next we prepare to find each file and directory in pFolderPath. We do this by appending a * to the end of the path string, which tells Windows we’re searching for all items in the current directory. The search itself uses the Win32 API functions FindFirstFile() and FindNextFile(). We only need to call FindFirstFile() once to let Windows know we’re starting from the beginning. Then we enter a loop that calls FindNextFile() until it returns 0, indicating that the search has finished (or that a fatal error has occurred). Inside the loop, we check whether the found object should be skipped: it can’t be the current directory (.), the parent directory (..), a system file, or a hidden file. If it passes, it is either a directory or a file. If it is a directory, we simply add it to a temporary list for processing later. If it is a file, we generate a new file entry and insert it into the list (m_pGenFiles).

    // Prepare string path to allow for file searching
    dirBuff = String(pFolderPath) + PSX_String("\\*");
    PSX_ZeroMem( &nextFile, sizeof( WIN32_FIND_DATA ) );
    hFind = FindFirstFile( dirBuff.GetCString(), &nextFile );

    if ( hFind == INVALID_HANDLE_VALUE )
        return FALSE;
    do
    {
        // Don't include the following attributes and directories
        if ( PSX_StrCmp( nextFile.cFileName, PSX_String(".") ) == 0 ||
            PSX_StrCmp( nextFile.cFileName, PSX_String("..") ) == 0 ||
            (nextFile.dwFileAttributes & FILE_ATTRIBUTE_SYSTEM)     ||
            (nextFile.dwFileAttributes & FILE_ATTRIBUTE_HIDDEN) )
            continue;

        // If a directory...
        if ( nextFile.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY )
        {
            // Store folder name and traverse them later
            DirPointer newFolder = new Directory;

            *newFolder = nextFile.cFileName;
            tempFolderPaths.PushBack( newFolder );
        }
        else // If a file...
        {
            // New file entry
            FileEntryPointer    pNewFileEntry      = new DirFileEntry;
            DirPathPointer      pnewFileDirPath    = new DirPath;

            // Copy file entry info. The size is split across two DWORDs, so the high
            // part must be shifted up by 32 bits rather than added as-is
            // (assumes m_size is a 64-bit field; the high part is zero below 4 GB).
            pNewFileEntry->m_PAKData.m_size      = ( static_cast< SIZE_T64 >( nextFile.nFileSizeHigh ) << 32 ) | nextFile.nFileSizeLow;
            pNewFileEntry->m_name                = nextFile.cFileName;
            pNewFileEntry->m_PAKData.m_nameLen   = pNewFileEntry->m_name.GetLength();
            pNewFileEntry->m_PAKData.m_pathIndex = m_pHeader->m_numDirs - 1;    // Zero based index

            // Store pointer to the file's directory path
            pnewFileDirPath = fileDirPath;

            // Finally insert in the list
            m_pGenFiles->PushBack( FileEntryPair( pNewFileEntry, pnewFileDirPath ) );

            // Update header information
            ++m_pHeader->m_numFiles;
        }

    } while ( FindNextFile( hFind, &nextFile ) != 0 );

    // If not ERROR_NO_MORE_FILES then something bad happened.
    if ( GetLastError() != ERROR_NO_MORE_FILES )
    {
        FindClose( hFind );
        bReturn = FALSE;
        goto EndGenerateTable;
    }

    // This find handle isn't needed anymore.
    FindClose( hFind );

Looking for files and directories inside the specified directory path.

The remaining code walks the tempFolderPaths list and calls _GenerateTableEntries() on each entry, with proper path formatting, until it reaches the last item. Once it’s done traversing the list, it clears tempFolderPaths and returns the value of bReturn.

        // Now traverse through each of the folders
        DirPathList::Iterator iter       = tempFolderPaths.IteratorBegin();
        DirPathList::Iterator iterEnd    = tempFolderPaths.IteratorEnd();

        while ( iter != iterEnd )
        {
            // Prepare the included directory
            // Don't add separator if there's already one at the end.
            if ( bAddSeparator )
            {
                newDirFolderPath    = String( pFolderPath ) + PSX_String("\\") + (*iter)->GetCString();
                newDirPAKPath       = String( pPAKPath ) + PSX_String("\\") + (*iter)->GetCString();
            }
            else
            {
                newDirFolderPath    = String( pFolderPath ) + (*iter)->GetCString();
                newDirPAKPath       = String( pPAKPath ) + (*iter)->GetCString();
            }

            if ( !_GenerateTableEntries( newDirFolderPath.GetCString(), newDirPAKPath.GetCString() ) )
            {
                bReturn = FALSE;
                goto EndGenerateTable;
            }

            ++iter;
        }

    EndGenerateTable:
        tempFolderPaths.Clear();

        return bReturn;
    }
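To see the shape of the resulting tables without touching the disk, here is a small sketch that runs the same three steps (register the directory, record its files against that directory’s index, then recurse into subfolders) over an in-memory tree. Folder, FileRecord, PakTables, and GenerateTables are hypothetical names; the real method fills m_pGenDirs and m_pGenFiles from FindFirstFile() results instead.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for a folder on disk.
struct Folder
{
    std::string                name;
    std::vector< std::string > files;
    std::vector< Folder >      subfolders;
};

struct FileRecord
{
    std::string name;
    std::size_t pathIndex;   // Zero-based index into the directory table
};

struct PakTables
{
    std::vector< std::string > dirs;    // PAK-relative directory paths
    std::vector< FileRecord >  files;
};

// Mirrors the order used by _GenerateTableEntries(): the directory entry goes in
// first, its files point back at it by index, and subfolders are handled last.
void GenerateTables( const Folder &folder, const std::string &pakPath, PakTables &out )
{
    out.dirs.push_back( pakPath );
    const std::size_t dirIndex = out.dirs.size() - 1;

    for ( const std::string &file : folder.files )
        out.files.push_back( FileRecord{ file, dirIndex } );

    for ( const Folder &sub : folder.subfolders )
    {
        const std::string subPath = pakPath.empty() ? sub.name : pakPath + '\\' + sub.name;
        GenerateTables( sub, subPath, out );
    }
}
```

Because each file stores only an index into the directory table, a directory path is written once no matter how many files it contains.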

See Virtual File System – Part 5 for the continuation of this article.

Virtual File System – Part 3

Private Utility Methods

FileSystem Interface
    class FileSystem
    {
    public:

        /* Public method interface */

    private:

        /* Internal class declarations */

        /* Internal typedefs */

        // Internal functions used to manage Pulse file data
        BOOL InitializeFilters( FilterFlag fFlag );
        void ReleaseFilters( void );
        BOOL _GenerateTableEntries( const CHAR *pFolderPath, const CHAR *pPAKPath );
        BOOL _CreatePulseFile( const CHAR *pOutputPath );
        BOOL ReadPAKInfo( const CHAR *pPAKFilePath, PulseFileHeader *pPAKHeader, PAKGenDirList *pPAKDirList, PAKGenFileList *pPAKFileList );
        BOOL VerifyPulseHeader( struct PulseFileHeader *pHeader );
        BOOL VerifyHeaderSignature( const Signature *pSig );
        BOOL VerifyHeaderFormat( const CHAR *pFormat );
        void Encode( PulseDirFileEntry *pFileEntry, IReader *pReader, IWriter *pWriter );
        void Decode( PulseDirFileEntry *pFileEntry, IReader *pReader, IWriter *pWriter );
        void ReleaseResources( void );

    };

  • BOOL InitializeFilters( FilterFlag fFlag ) : Initializes the filters selected by fFlag. Methods like Create(), Unpack(), and ReadFile() call this function so that each file is filtered properly.
  • void ReleaseFilters( void ) : Releases the resources allocated for the filters.
  • BOOL _GenerateTableEntries( const CHAR *pFolderPath, const CHAR *pPAKPath ) : Used by Create() to recursively go through each file and directory under the directory path passed to Create(), generating the file and directory entry tables. Returns true on success; otherwise, false.
  • BOOL _CreatePulseFile( const CHAR *pOutputPath ) : Called inside Create() after _GenerateTableEntries() has generated the table entries. This method finally creates the PAK file, containing all the packed data together with the directory and file table entries and the header. Returns true on success; otherwise, false.
  • BOOL ReadPAKInfo( const CHAR *pPAKFilePath, PulseFileHeader *pPAKHeader, PAKGenDirList *pPAKDirList, PAKGenFileList *pPAKFileList ) : Used by the Open() and Unpack() methods. pPAKFilePath is the path where the PAK file is located. You also pass in the addresses of PulseFileHeader, PAKGenDirList, and PAKGenFileList instances, which receive the information about the PAK file. Returns true on success; otherwise, false.
  • BOOL VerifyPulseHeader( struct PulseFileHeader *pHeader ) : Called internally by ReadPAKInfo(). After the FileHeader has been read in, it checks the signature, the file format ID, and the version (check not implemented) to decide whether this is a valid PAK file. Returns true if the header is valid; otherwise, false.
  • BOOL VerifyHeaderSignature( const Signature *pSig ), BOOL VerifyHeaderFormat( const CHAR *pFormat ) : Used by VerifyPulseHeader() to verify the signature and the header format. Each returns true if its check passes; otherwise, false.
  • void Encode( PulseDirFileEntry *pFileEntry, IReader *pReader, IWriter *pWriter ), void Decode( PulseDirFileEntry *pFileEntry, IReader *pReader, IWriter *pWriter ) : Abstraction layer methods that pick the filters needed for encoding or decoding a file.
  • void ReleaseResources( void ) : Releases all the allocated resources.
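To make the header checks concrete, here is a minimal sketch of the verify chain. The struct layout, the signature value, and every name in it (SketchHeader, kSketchSignature, VerifySignature, VerifyFormat, VerifyHeader) are assumptions for illustration only; the article does not show the real FileHeader layout or the value of FileSystem::SIGNATURE.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical header layout; the real FileHeader has more fields.
struct SketchHeader
{
    std::uint32_t signature;
    char          id[4];      // "pfs" plus the terminating null
    std::uint32_t version;
};

// Placeholder value; the article does not show FileSystem::SIGNATURE.
const std::uint32_t kSketchSignature = 0x50465331u;

bool VerifySignature( const SketchHeader &h ) { return h.signature == kSketchSignature; }
bool VerifyFormat( const SketchHeader &h )    { return std::memcmp( h.id, "pfs", 4 ) == 0; }

// Both checks must pass. The version is deliberately not checked, matching the
// "check not implemented" note in the list above.
bool VerifyHeader( const SketchHeader &h )
{
    return VerifySignature( h ) && VerifyFormat( h );
}
```

Splitting the checks into two small functions keeps each failure reason separate, which is handy if you later want to report why a file was rejected.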

Great! Now that we’ve got the interface and internal data structures out of the way, let’s take a look at where the magic happens. Hold on to your seats, because this is going to be a relatively long discussion.

The Implementation

Instead of presenting the entire source code at once, I’ll split it into separate functions and talk about each in turn. This makes it easier for me, and for you the reader, to discuss a specific piece of the implementation without scrolling up and down a thousand times between code and explanation. Or, if you prefer, you can download the source and use it as a reference as you read along:
http://www.codaset.com/codesushi/pulse-tec/source/FileSystemArticle

The code below defines the ReadData() and WriteData() methods of FileHeader, DirEntry, and DirFileEntry.

    void FileSystem::FileHeader::WriteData( FileSystem::IWriter *pWriter )
    {
        pWriter->Write( (BYTE*)this, sizeof( FileHeader ) );
    }

    void FileSystem::FileHeader::ReadData( FileSystem::IReader *pReader )
    {
        pReader->Read( (BYTE*)this, sizeof( FileHeader ) );
    }

    void FileSystem::DirEntry::WriteData( FileSystem::IWriter *pWriter )
    {
        // Write PAKData info
        pWriter->Write( reinterpret_cast< BYTE * >(&m_PAKData), sizeof( PAKData ) );
        // Then the remaining other info like strings and stuff
        pWriter->Write( (BYTE *)m_name.GetCString(), PSX_StrSize( m_name.GetCString() ) );
    }

    void FileSystem::DirEntry::ReadData( FileSystem::IReader *pReader )
    {
        CHAR tempStr[ PSX_MAX_PATH ];

        // Read PAKData info
        pReader->Read( reinterpret_cast< BYTE * >(&m_PAKData), sizeof( PAKData ) );
        // Then remaining other info like strings
        pReader->Read( reinterpret_cast< BYTE * >(tempStr), m_PAKData.m_nameLen );
        tempStr[ m_PAKData.m_nameLen ] = PSX_String( );

        m_name = tempStr;
    }

    void FileSystem::DirFileEntry::WriteData( FileSystem::IWriter *pWriter )
    {
        pWriter->Write( reinterpret_cast< BYTE * >(&m_PAKData), sizeof( PAKData ) );
        pWriter->Write( (BYTE*)m_name.GetCString(), PSX_StrSize( m_name.GetCString() ) );
    }

    void FileSystem::DirFileEntry::ReadData( FileSystem::IReader *pReader )
    {
        CHAR tempStr[ PSX_MAX_PATH ];

        // Read PAKData info
        pReader->Read( reinterpret_cast< BYTE * >(&m_PAKData), sizeof( PAKData ) );
        // Then remaining other info like strings
        pReader->Read( reinterpret_cast< BYTE * >(tempStr), m_PAKData.m_nameLen );
        tempStr[ m_PAKData.m_nameLen ] = PSX_String( );

        m_name = tempStr;
    }

 

The WriteData() methods simply take pWriter, which could point to a memory writer or a file stream writer, and call IWriter::Write() to write the data. If you look at DirEntry’s or DirFileEntry’s WriteData(), you’ll see that I’m writing the entire m_PAKData in one call. The name string follows; for that we need GetCString(), which returns a pointer to the character array that actually holds the string. ReadData() is almost the same, except that it reads from a file or from memory and stores the data in the respective structure.
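The layout described here, a fixed-size block followed by a variable-length name, can be tried out against a plain byte buffer. The sketch below is not the article’s code: SketchPakData, WriteEntry, and ReadEntry are hypothetical, and the name is written with exactly nameLen bytes so that the reader and writer agree. Whether the article’s PSX_StrSize() matches m_nameLen in the same way depends on what that function returns, which isn’t shown.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical fixed-size part of an entry (the article's PAKData has more fields).
struct SketchPakData
{
    std::uint32_t nameLen;
};

// Writes the fixed-size block first, then exactly nameLen characters of the name.
void WriteEntry( std::vector< std::uint8_t > &out, const std::string &name )
{
    SketchPakData data;
    data.nameLen = static_cast< std::uint32_t >( name.size() );

    const std::uint8_t *raw = reinterpret_cast< const std::uint8_t * >( &data );
    out.insert( out.end(), raw, raw + sizeof( data ) );
    out.insert( out.end(), name.begin(), name.end() );
}

// Reads one entry starting at offset and advances offset past it.
// Returns false (leaving offset untouched) if the buffer is too short.
bool ReadEntry( const std::vector< std::uint8_t > &in, std::size_t &offset, std::string &name )
{
    std::size_t pos = offset;
    SketchPakData data;

    if ( pos > in.size() || in.size() - pos < sizeof( data ) )
        return false;
    std::memcpy( &data, in.data() + pos, sizeof( data ) );
    pos += sizeof( data );

    if ( in.size() - pos < data.nameLen )
        return false;
    name.assign( reinterpret_cast< const char * >( in.data() + pos ), data.nameLen );

    offset = pos + data.nameLen;
    return true;
}
```

Like the article’s version, this dumps the struct’s raw bytes, so the file is only portable between builds that share the same struct layout and byte order.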

Now we move on to filters. We’ll first take a look at the filter’s default behavior then the derived zlibtest filter.

    BOOL FileSystem::DataFilter::Encode( FileSystem::IReader *pReader, FileSystem::IWriter *pWriter )
    {
        PSX_Assert( pReader && pWriter, "Parameter is NULL." );
        BYTE byte;

        while( !pReader->IsDone() )
        {
            pReader->Read( &byte, 1 );
            pWriter->Write( &byte, 1 );
        }

        return TRUE;
    }

    BOOL FileSystem::DataFilter::Decode( FileSystem::IReader *pReader, FileSystem::IWriter *pWriter )
    {
        PSX_Assert( pReader && pWriter, "Parameter is NULL." );
        BYTE byte;

        while( !pReader->IsDone() )
        {
            pReader->Read( &byte, 1 );
            pWriter->Write( &byte, 1 );
        }

        return TRUE;
    }

The default filter doesn’t do much: it copies whatever is passed to it byte by byte. pReader->IsDone() returns true once we’ve hit EOF when reading from a file, or once a set read size limit has been reached. Inside the while loop we copy one byte of data by calling Read() and then write it to the destination by calling Write().
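Copying one byte per call is easy to follow but costs two virtual calls per byte. As a sketch of an alternative, the same pass-through can move a chunk at a time. BufferReader and CopyThrough below are hypothetical stand-ins with the same Read()/IsDone() shape as IReader, written against std::vector so the example is self-contained.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical in-memory reader with the same Read()/IsDone() shape as IReader.
class BufferReader
{
public:
    explicit BufferReader( const std::vector< unsigned char > &data ) : m_data( data ), m_pos( 0 ) {}

    // Copies up to size bytes into pBuffer and returns how many were copied.
    std::size_t Read( unsigned char *pBuffer, std::size_t size )
    {
        const std::size_t count = std::min( size, m_data.size() - m_pos );
        if ( count > 0 )
            std::memcpy( pBuffer, m_data.data() + m_pos, count );
        m_pos += count;
        return count;
    }

    bool IsDone() const { return m_pos >= m_data.size(); }

private:
    const std::vector< unsigned char > &m_data;
    std::size_t                         m_pos;
};

// Pass-through "filter": same result as the byte-by-byte loop, but moves up to
// chunkSize bytes per Read() call.
std::vector< unsigned char > CopyThrough( BufferReader &reader, std::size_t chunkSize )
{
    std::vector< unsigned char > out;
    std::vector< unsigned char > chunk( chunkSize );

    while ( !reader.IsDone() )
    {
        const std::size_t got = reader.Read( chunk.data(), chunk.size() );
        if ( got == 0 )
            break;   // Defensive: avoid spinning if the reader makes no progress
        out.insert( out.end(), chunk.begin(), chunk.begin() + got );
    }
    return out;
}
```

The important detail is that the loop writes only the number of bytes Read() actually returned, since the last chunk is usually shorter than the buffer.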

The zlibtest encode and decode code is based on the “sample usage” page of the zlib site. Please check www.zlib.net for more information about zlib.

    FileSystem::ZLibTest::ZLibTest( void )
    {
        m_pIn  = new BYTE[ MEMORY_CHUNK ];
        m_pOut = new BYTE[ MEMORY_CHUNK ];

        // Both buffers are required, so check both
        PSX_Assert( m_pIn && m_pOut, "Out of memory." );
    }

    FileSystem::ZLibTest::~ZLibTest( void )
    {
        delete [] m_pOut;
        delete [] m_pIn;
    }

    BOOL FileSystem::ZLibTest::Encode( FileSystem::IReader *pReader, FileSystem::IWriter *pWriter )
    {
        INT ret, flush;
        unsigned have;

        // Allocate deflate state
        m_strm.zalloc = MemoryManager::zlibAlloc;
        m_strm.zfree  = MemoryManager::zlibFree;
        m_strm.opaque = Z_NULL;

        ret = deflateInit( &m_strm, LEVEL );
        PSX_Assert( ret == Z_OK, "Failed to initialize ZlibTest." );

        // Compress
        do
        {
            m_strm.avail_in = pReader->Read( m_pIn, MEMORY_CHUNK );
            // TODO: error check here...

            flush = pReader->IsDone() ? Z_FINISH : Z_NO_FLUSH;
            m_strm.next_in = m_pIn;

            // Run deflate on input until output buffer not full,
            // finish compression if all of source has been read in
            do
            {
                m_strm.avail_out = MEMORY_CHUNK;
                m_strm.next_out = m_pOut;
                ret = deflate( &m_strm, flush );    // no bad return value
                PSX_Assert( ret != Z_STREAM_ERROR, "Error executing deflate compression." );
                have = MEMORY_CHUNK - m_strm.avail_out;
                if ( pWriter->Write( m_pOut, have ) != have /* || pWriter->IsError()(not implemented) */ )
                    goto zlibEncodeFail;

            } while ( m_strm.avail_out == 0 );
            PSX_Assert( m_strm.avail_in == 0, "Compression failed." ); // All input should be used

            // Done when the last data in file is processed
        } while ( flush != Z_FINISH );
        PSX_Assert( ret == Z_STREAM_END, "Error executing deflate compression." );    // Stream should be complete

        deflateEnd( &m_strm );
        return TRUE;

    zlibEncodeFail:

        deflateEnd( &m_strm );
        return FALSE;
    }

    BOOL FileSystem::ZLibTest::Decode( class IReader *pReader, class IWriter *pWriter )    // Inflate
    {
        INT  ret;
        UINT have;

        m_strm.zalloc = MemoryManager::zlibAlloc;
        m_strm.zfree  = MemoryManager::zlibFree;
        m_strm.opaque = Z_NULL;
        m_strm.avail_in = 0;
        m_strm.next_in = Z_NULL;

        ret = inflateInit( &m_strm );
        PSX_Assert( ret == Z_OK, "Failed to initialize ZlibTest." );

        // Decompress until stream is done or EOF
        do
        {
            m_strm.avail_in = pReader->Read( m_pIn, MEMORY_CHUNK );
            // TODO: Do error check here
            if ( m_strm.avail_in == 0 )
                break;

            m_strm.next_in = m_pIn;

            // Run inflate() on input until output buffer not full
            do
            {
                m_strm.avail_out = MEMORY_CHUNK;
                m_strm.next_out = m_pOut;

                ret = inflate( &m_strm, Z_NO_FLUSH );
                PSX_Assert( ret != Z_STREAM_ERROR, "Error executing inflate()." );

                switch( ret )
                {
                case Z_NEED_DICT:
                    ret = Z_DATA_ERROR;    // And fall through
                case Z_DATA_ERROR:
                case Z_MEM_ERROR:
                    goto zlibDecodeFail;
                }

                have = MEMORY_CHUNK - m_strm.avail_out;

                if ( pWriter->Write( m_pOut, have ) != have /* || pWriter->IsError() */ )
                    goto zlibDecodeFail;

            } while ( m_strm.avail_out == 0 );

        } while ( ret != Z_STREAM_END );

        inflateEnd( &m_strm );
        return ret == Z_STREAM_END ? TRUE : FALSE;

    zlibDecodeFail:

        inflateEnd( &m_strm );
        return FALSE;
    }

Setting aside the esoteric nature of the encode and decode code, note that we allocate the memory chunks used by the deflate and inflate algorithms in the constructor, and deallocate them in the destructor once we’re done with the zlibtest filter. Ideally, though, it is not advisable to do this kind of initialization in the constructor, because you have no way of knowing whether it succeeded. I recommend placing initialization code in a separate Initialize() method that returns an error or status code indicating whether it succeeded.
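Here is a minimal sketch of that recommendation. ChunkBuffers is a hypothetical class, not part of the article’s source, and it uses new (std::nothrow) because in standard C++ a plain new reports failure by throwing std::bad_alloc rather than returning NULL (the article’s engine may configure allocation differently; that isn’t shown).

```cpp
#include <cstddef>
#include <new>

// Hypothetical filter skeleton showing two-phase initialization.
class ChunkBuffers
{
public:
    ChunkBuffers() : m_pIn( nullptr ), m_pOut( nullptr ) {}
    ~ChunkBuffers() { Release(); }

    // Non-copyable: the object owns raw buffers.
    ChunkBuffers( const ChunkBuffers & ) = delete;
    ChunkBuffers &operator=( const ChunkBuffers & ) = delete;

    // Returns false instead of asserting when either allocation fails.
    bool Initialize( std::size_t chunkSize )
    {
        Release();
        m_pIn  = new ( std::nothrow ) unsigned char[ chunkSize ];
        m_pOut = new ( std::nothrow ) unsigned char[ chunkSize ];

        if ( !m_pIn || !m_pOut )
        {
            Release();
            return false;
        }
        return true;
    }

    bool IsReady() const { return m_pIn && m_pOut; }

private:
    void Release()
    {
        delete [] m_pOut;
        delete [] m_pIn;
        m_pOut = nullptr;
        m_pIn  = nullptr;
    }

    unsigned char *m_pIn;
    unsigned char *m_pOut;
};
```

The constructor now does nothing that can fail, and the caller decides what to do when Initialize() returns false.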

The last of the internal data structures are the derived Reader and Writer classes. To refresh your memory, the IReader and IWriter interface classes are included below.

  1. class FileSystem::IReader
  2. {
  3. public:
  4.     virtual ~IReader( void ) { }
  5.     virtual SIZE_T Read( BYTE *pBuffer, SIZE_T size ) = 0;
  6.     virtual BOOL IsDone( void ) = 0;
  7.     virtual SIZE_T64 BytesLeft( void ) = 0;
  8. };
  9.  
  10. class FileSystem::IWriter
  11. {
  12. public:
  13.     virtual ~IWriter( void ) { }
  14.     virtual SIZE_T Write( BYTE *pBuffer, SIZE_T size ) = 0;
  15. };
  16.  
  17. class FileSystem::FileReader : public IReader
  18. {
  19. public:
  20.  
  21.     FileReader( void );
  22.     virtual SIZE_T Read( BYTE *pBuffer, SIZE_T size );
  23.     virtual BOOL IsDone( void );
  24.     void SetFileStream( FileIO *pFile );
  25.     void SetReadLimit( SIZE_T64 byteSize );
  26.     virtual SIZE_T64 BytesLeft( void );
  27.  
  28. private:
  29.  
  30.     SIZE_T64 m_fileSize;
  31.     SIZE_T64 m_bytesRead;
  32.     BOOL    m_bLimitRead;
  33.     FileIO    *m_pFile;
  34.  
  35. };
  36.  
  37. class FileSystem::MemoryReader : public IReader
  38. {
  39. public:
  40.  
  41.     virtual SIZE_T Read( BYTE *pBuffer, SIZE_T size );
  42.     void SetBuffer( BYTE *pBuffer ) { m_pBuffer = pBuffer; }
  43.  
  44.     // TODO: Not implemented. Fix This!!!
  45.     virtual SIZE_T64 BytesLeft( void ) { return 0; }
  46.  
  47. private:
  48.  
  49.     BYTE    *m_pBuffer;
  50.     // TODO: Implement methods to keep track of buffer; counter and size…
  51.     //SIZE_T
  52. };
  53.  
  54. class FileSystem::FileWriter : public IWriter
  55. {
  56. public:
  57.  
  58.     virtual SIZE_T Write( BYTE *pBuffer, SIZE_T size ) { return m_pFile->Write( pBuffer, size ); }
  59.     void SetFileStream( FileIO *pFile ) { m_pFile = pFile; }
  60.  
  61. private:
  62.  
  63.     FileIO    *m_pFile;   
  64. };
  65.  
  66. class FileSystem::MemoryWriter : public IWriter
  67. {
  68. public:
  69.  
  70.     void SetBuffer( BYTE *pBuffer ) { m_pBuffer = pBuffer; m_curPos = 0; }
  71.     virtual SIZE_T Write( BYTE *pBuffer, SIZE_T size ) { PSX_MemCopy( m_pBuffer + m_curPos, pBuffer, size ); m_curPos += size; return size; }
  72.  
  73. private:
  74.  
  75.     BYTE    *m_pBuffer;
  76.     POS_T   m_curPos;
  77. };
  78.  
  79. FileSystem::FileReader::FileReader( void )
  80. : m_fileSize( 0 ), m_bytesRead( 0 ), m_bLimitRead( FALSE ), m_pFile( 0 )
  81. {
  82.  
  83. }
  84.  
  85. SIZE_T FileSystem::FileReader::Read( BYTE *pBuffer, SIZE_T size )
  86. {
  87.     if ( m_bLimitRead && m_bytesRead >= m_fileSize )
  88.         return 0;
  89.  
  90.     // Fix size if it is over the limit
  91.     if ( m_bLimitRead && size > (m_fileSize – m_bytesRead) )
  92.         size = static_cast<SIZE_T>(m_fileSize – m_bytesRead);
  93.  
  94.     m_bytesRead += size;
  95.     return m_pFile->Read( pBuffer, size );
  96. }
  97.  
  98. BOOL FileSystem::FileReader::IsDone( void )
  99. {
  100.     if ( m_bLimitRead && m_bytesRead >= m_fileSize )
  101.         return TRUE;
  102.  
  103.     return m_pFile->IsEOF();
  104. }
  105.  
  106. void FileSystem::FileReader::SetFileStream( FileIO *pFile )
  107. {
  108.     m_pFile = pFile;
  109.     m_bLimitRead = FALSE;
  110. }
  111.  
  112. void FileSystem::FileReader::SetReadLimit( SIZE_T64 byteSize )
  113. {
  114.     m_fileSize = byteSize;
  115.     m_bytesRead = 0;
  116.     m_bLimitRead = TRUE;
  117. }
  118.  
  119. SIZE_T64 FileSystem::FileReader::BytesLeft( void )
  120. {
  121.     return m_bLimitRead ? m_fileSize - m_bytesRead : 0;
  122. }

A rather lengthy but trivial block of code. You may have noticed that the MemoryReader class is not fully implemented. That’s because it is not currently used by the provided public interface. Consider it homework for you to implement in your own VFS. :) As you can see, the derived Reader and Writer classes simply act as an abstraction over their source or destination, whether that is a file or memory. The interface is exactly the same, and the methods simply forward to the read or write functions of their data members. For the File and Memory Readers we need one simple additional feature, SetReadLimit(), which only FileReader implements so far. We need it because when we extract a single file from a PAK file, we have to keep track of how many bytes we’ve read so that we stop at the end of that file. Checking for End-Of-File is not going to work, because the reader would keep on reading until the end of the entire PAK file.
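As a starting point for that homework, here is a minimal sketch of a MemoryReader that tracks its own position. It is not part of the provided source: the two-argument SetBuffer() and the m_size and m_pos members are assumptions, and standard types stand in for BYTE, SIZE_T, SIZE_T64 and BOOL so that the snippet compiles on its own.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Stand-in for FileSystem::IReader so the sketch compiles on its own.
class IReader
{
public:
    virtual ~IReader() { }
    virtual std::size_t Read( std::uint8_t *pBuffer, std::size_t size ) = 0;
    virtual bool IsDone() = 0;
    virtual std::uint64_t BytesLeft() = 0;
};

// Hypothetical MemoryReader that remembers the buffer size and read position.
class MemoryReader : public IReader
{
public:
    MemoryReader() : m_pBuffer( nullptr ), m_size( 0 ), m_pos( 0 ) { }

    // Assumed overload: the original SetBuffer() only takes the pointer.
    void SetBuffer( const std::uint8_t *pBuffer, std::size_t size )
    {
        m_pBuffer = pBuffer;
        m_size = size;
        m_pos = 0;
    }

    std::size_t Read( std::uint8_t *pBuffer, std::size_t size ) override
    {
        const std::size_t left = m_size - m_pos;
        if ( size > left )
            size = left;    // Clamp the request to what is left in the buffer
        if ( size > 0 )
            std::memcpy( pBuffer, m_pBuffer + m_pos, size );
        m_pos += size;
        return size;
    }

    bool IsDone() override { return m_pos >= m_size; }
    std::uint64_t BytesLeft() override { return m_size - m_pos; }

private:
    const std::uint8_t *m_pBuffer;
    std::size_t m_size;
    std::size_t m_pos;
};
```

Because the buffer size is known up front, this version doesn't need a separate read limit: the end of the buffer plays the role that SetReadLimit() plays for FileReader.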

See Virtual File System – Part 4 for the continuation of this article.

Virtual File System – Part 2

Internal Data Structures

FileSystem Interface
class FileSystem
{
public:

    /* Public method interface */

private:

    // Internal file system data types.
    struct FileHeader;
    struct DirEntry;
    struct DirFileEntry;
    class DataFilter;
    class ZLibTest;    // Test filter using zlib
    class IReader;
    class IWriter;
    class FileReader;
    class MemoryReader;
    class FileWriter;
    class MemoryWriter;

    // Internal bookkeeping typedefs for generating a PAK file.
    typedef SmartPointer< File >                            FilePointer;
    typedef SmartPointer< DataFilter >                      FilterPointer;
    typedef String                                          FilePath;
    typedef String                                          DirPath;
    typedef String                                          Directory;
    typedef SmartPointer< FilePath >                        FilePathPointer;
    typedef SmartPointer< DirPath >                         DirPathPointer;
    typedef SmartPointer< Directory >                       DirPointer;
    typedef List< FilePathPointer >                         FilePathList;
    typedef List< DirPathPointer >                          DirPathList;
    typedef List< DirPointer >                              DirList;
    typedef SmartPointer< DirFileEntry >                    FileEntryPointer;
    typedef PSX_Pair< FileEntryPointer, DirPathPointer >    FileEntryPair;
    typedef SmartPointer< DirEntry >                        DirEntryPointer;
    typedef List< DirEntryPointer >                         PAKGenDirList;
    typedef List< FileEntryPair >                           PAKGenFilePairList;
    typedef List< FileEntryPointer >                        PAKGenFileList;
    typedef PSX_Pair< FILTER_TYPE, FilterPointer >          FilterPair;
    typedef Map< FILTER_TYPE, FilterPointer >               FilterMap;
    typedef PSX_Pair< String, FileEntryPointer >            FileMapPair;
    typedef Map< String, FileEntryPointer >                 FileEntryMap;

    // Internal functions used to manage Pulse file data
};

Okay, I think I can hear your voice screaming now! Calm down! Take a deep breath and just relax… because most of these internal data structures are just small containers with one or two methods consisting of a few, straight-forward lines of code. First, let’s concentrate on the internal data structures.

  • FileHeader : This structure contains all the important high-level information of a PAK file.
// NOTE: This is exactly 56 bytes in size. We can get away with this w/o
// doing a #pragma pack(1).
struct FileSystem::FileHeader
{
    Signature   m_signature;        // 16-byte GUID signature check.
    Char        m_ID[4];            // 3 letter file format for format check.
    DWORD       m_version;          // Version of this Pulse File.
    WORD        m_numDirs;          // Number of directories.
    WORD        m_numFiles;         // Number of files.
    I32         m_filterBitField;   // Max of 32 possible filter algorithms to choose from.
    SIZE_T64    m_size;             // Size of the file.
    POS_T64     m_dirDiskStart;
    POS_T64     m_fileDiskStart;

    FileHeader( void )
    {
        PSX_ZeroMem( this, sizeof( FileHeader ) );
    }

    void WriteData( FileSystem::IWriter *pWriter );
    void ReadData( FileSystem::IReader *pReader );
};

Most of the data members are self-explanatory. Signature is a 16-byte data type: just a structure that stores a GUID, or Globally Unique Identifier. It is used as a signature check to make sure that the file being opened is truly our version of a PAK file. You can read more about Microsoft’s GUID here:
http://msdn.microsoft.com/en-us/library/aa373931(VS.85).aspx
http://en.wikipedia.org/wiki/Globally_Unique_Identifier
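The 56-byte claim in the header comment can be checked at compile time. Below is a minimal sketch, assuming fixed-width standard types in place of the engine's DWORD, WORD, I32, SIZE_T64 and POS_T64, and a Signature laid out like Microsoft's GUID. The real Signature type isn't shown in this article, so that layout is an assumption.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed layout, modelled on Microsoft's GUID: 4 + 2 + 2 + 8 = 16 bytes.
struct Signature
{
    std::uint32_t m_data1;
    std::uint16_t m_data2;
    std::uint16_t m_data3;
    std::uint8_t  m_data4[8];
};

// Same member order as FileSystem::FileHeader, with stand-in types.
struct FileHeader
{
    Signature     m_signature;       // offset 0
    char          m_ID[4];           // offset 16
    std::uint32_t m_version;         // offset 20
    std::uint16_t m_numDirs;         // offset 24
    std::uint16_t m_numFiles;        // offset 26
    std::int32_t  m_filterBitField;  // offset 28
    std::uint64_t m_size;            // offset 32
    std::uint64_t m_dirDiskStart;    // offset 40
    std::uint64_t m_fileDiskStart;   // offset 48
};

// Every member already sits on its natural alignment, so the compiler has
// no reason to insert padding and #pragma pack(1) isn't needed.
static_assert( sizeof( Signature ) == 16, "Signature should be 16 bytes" );
static_assert( offsetof( FileHeader, m_size ) == 32, "unexpected padding before m_size" );
static_assert( sizeof( FileHeader ) == 56, "FileHeader should be 56 bytes" );
```

If someone later adds or reorders a member in a way that introduces padding, the build fails instead of silently producing PAK files with a different header size.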

  • DirEntry : The PAK file internal format contains two tables. The first one stores the directory entries, while the second one stores the file entries. DirEntry simply stores the directory name for now. I can’t think of anything else that we need to add in here right now.
struct FileSystem::DirEntry
{
    struct PAKData
    {
        WORD m_nameLen;
    };

    String m_name;
    PAKData m_PAKData;

    void WriteData( FileSystem::IWriter *pWriter );
    void ReadData( FileSystem::IReader *pReader );
};

There is one odd thing with this, though. We have m_name, which stores the name, and m_nameLen, which stores the length of the directory name and WHICH is interestingly encapsulated inside a structure called PAKData. The reason we encapsulate it inside a separate structure has to do with how we read and write our data from and into a PAK file. We want to minimize read/write calls by reading or writing all of the bits in one go where possible. m_name isn’t included because of its internal data structure: the String class dynamically allocates memory for storing and manipulating its string. If we simply read or wrote it directly, we would be saving and restoring a raw pointer, which would point to some random memory after loading and could possibly crash your application. Although it’s kind of redundant here since struct PAKData contains only one member, it is still helpful in case we want to add additional info in the future. The next structure shows how struct PAKData is used more effectively in DirFileEntry.

  • DirFileEntry : This structure contains all the important information about a file stored in a PAK file.
struct FileSystem::DirFileEntry
{
    //#pragma pack( 1 ) // Needed to pack this for direct read and write
    // I am trusting the data alignment and size of this struct in the hands of the compiler.
    // This struct should have a size of 40 bytes (3 x 8 + 3 x 4 + 4 bytes of padding),
    // so that we won’t suffer any performance penalty from unaligned reads in memory.
    struct PAKData
    {
        SIZE_T64    m_size;
        SIZE_T64    m_compressedSize;
        SIZE_T64    m_diskStart;
        DWORD       m_filterBit;
        DWORD       m_pathIndex;        // Points to the position of the dirpath entries
        DWORD       m_nameLen;
        BYTE        _padd[4];

        PAKData( void ) { PSX_ZeroMem( this, sizeof( PAKData ) ); }
    };
    //#pragma pack()

    String  m_name;
    PAKData m_PAKData;

    void WriteData( FileSystem::IWriter *pWriter );
    void ReadData( FileSystem::IReader *pReader );
};

Notice that all members but m_name are stored inside struct PAKData. Instead of writing each data member individually, we can simply do something like this:
fstream.write( reinterpret_cast<const char *>( &m_PAKData ), sizeof( PAKData ) );
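To see how the fixed-size block and the variable-length name fit together, here is a minimal round-trip sketch. It assumes std::string in place of the engine's String and a std::vector of bytes in place of the IWriter/IReader pair. WriteEntry() and ReadEntry() are hypothetical helpers for illustration, not the article's WriteData()/ReadData().

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Stand-in for DirFileEntry::PAKData with fixed-width standard types.
struct PAKData
{
    std::uint64_t m_size;
    std::uint64_t m_compressedSize;
    std::uint64_t m_diskStart;
    std::uint32_t m_filterBit;
    std::uint32_t m_pathIndex;
    std::uint32_t m_nameLen;
    std::uint8_t  _padd[4];
};

struct DirFileEntry
{
    std::string m_name;     // Stand-in for the engine's String
    PAKData     m_PAKData;
};

// Hypothetical helper: one write for the fixed-size block, one for the name.
void WriteEntry( std::vector<std::uint8_t> &out, DirFileEntry &entry )
{
    entry.m_PAKData.m_nameLen = static_cast<std::uint32_t>( entry.m_name.size() );

    const std::uint8_t *pRaw = reinterpret_cast<const std::uint8_t *>( &entry.m_PAKData );
    out.insert( out.end(), pRaw, pRaw + sizeof( PAKData ) );
    out.insert( out.end(), entry.m_name.begin(), entry.m_name.end() );
}

// Hypothetical helper: reads an entry starting at 'pos' and returns the
// position just past it. m_nameLen tells us how many name bytes follow.
std::size_t ReadEntry( const std::vector<std::uint8_t> &in, std::size_t pos, DirFileEntry &entry )
{
    std::memcpy( &entry.m_PAKData, in.data() + pos, sizeof( PAKData ) );
    pos += sizeof( PAKData );

    entry.m_name.assign( reinterpret_cast<const char *>( in.data() + pos ),
                         entry.m_PAKData.m_nameLen );
    return pos + entry.m_PAKData.m_nameLen;
}
```

Keep in mind that dumping a struct like this ties the file format to the compiler's layout and the machine's byte order, which is the trade-off the "trusting the compiler" comment above is accepting.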

Here is a diagram showing how a PAK file is composed of these important data structures:
Figure: Internal structure of a PAK file

  • DataFilter, ZLibTest : DataFilter is a base class for derived (or concrete) filter classes. As an example, we have a test filter called ZLibTest. This filter uses the well-known deflate and inflate algorithms to compress and decompress data when needed.
  • IReader, IWriter : These interface classes serve as an abstraction layer for reading from and writing to sources. This makes our reads and writes easier, because we don’t have to care whether we’re reading from or writing to a file or memory. FileReader, MemoryReader, FileWriter, and MemoryWriter are derived classes designed to handle reads and writes from/to either memory or a file. Below are the interfaces for the IReader and IWriter classes.
class FileSystem::IReader
{
public:
    virtual ~IReader( void ) { }
    virtual SIZE_T Read( BYTE *pBuffer, SIZE_T size ) = 0;
    virtual BOOL IsDone( void ) = 0;
    virtual SIZE_T64 BytesLeft( void ) = 0;
};

class FileSystem::IWriter
{
public:
    virtual ~IWriter( void ) { }
    virtual SIZE_T Write( BYTE *pBuffer, SIZE_T size ) = 0;
};
  • A bunch of typedefs : After the class declarations, billions of typedefs follow. I really apologize for the unnecessary confusion. But while I was still developing this system, I was experimenting with which helper classes and containers to use. The typedefs made it easier for me to quickly change from one data type to another with minimal changes. One thing you may notice aside from the normal container classes is the SmartPointer<> class. The SmartPointer<> class acts like Boost’s shared_ptr for C++. Basically, this class keeps track of everything that uses the object it holds, then automatically deletes the object when no one is using it anymore. It is able to do this by using a reference count. We’ll be using it to store our data structures so that we don’t have to worry about manually deleting the allocated resources held in our containers. Here is a quick example:
int main( void )
{
    // Store a dynamically allocated int in a SmartPointer.
    // The template argument is the pointed-to type, as in SmartPointer< File >.
    SmartPointer< int > pInt1( new int );  // Internal ref is set to 1

    {
        SmartPointer< int > pInt2( pInt1 ); // Internal ref is now set to 2.
        // When pInt2 gets destroyed, ref is automatically decremented by 1.
    }

    // pInt1 gets destroyed when it falls out of scope here… ref drops to 0 and the int
    // gets automatically deleted w/o requiring us to do anything… :)
}

You can learn more about shared_ptr or SmartPointers in this link http://www.boost.org/doc/libs/1_41_0/libs/smart_ptr/shared_ptr.htm
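Since SmartPointer<> itself isn't listed in this article, the sketch below uses std::shared_ptr (the standard library's version of boost::shared_ptr) as a stand-in to make the reference counting visible. The RefCounts struct and CountReferences() function are made up for this illustration.

```cpp
#include <memory>

// Reference counts observed at three points in the object's life.
struct RefCounts
{
    long first;
    long inner;
    long after;
};

RefCounts CountReferences()
{
    RefCounts counts{};

    std::shared_ptr<int> pInt1 = std::make_shared<int>( 42 );
    counts.first = pInt1.use_count();       // Only pInt1 owns the int

    {
        std::shared_ptr<int> pInt2( pInt1 );
        counts.inner = pInt1.use_count();   // pInt1 and pInt2 share it
    }

    counts.after = pInt1.use_count();       // pInt2 is gone again
    return counts;                          // The int is freed when pInt1 goes out of scope
}
```

Calling CountReferences() yields the counts 1, 2 and 1, matching the comments in the SmartPointer example above.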

  • VFS Data Members : Now that we know what data types we’ll be using, here are the data members our VFS will store. The class declaration and typedefs are included again as a reference.
FileSystem Interface
class FileSystem
{
public:

    /* public interface */

private:

    // Internal file system data types.
    struct FileHeader;
    struct DirEntry;
    struct DirFileEntry;
    class DataFilter;
    class ZLibTest;    // Test filter using zlib
    class IReader;
    class IWriter;
    class FileReader;
    class MemoryReader;
    class FileWriter;
    class MemoryWriter;

    // Internal bookkeeping typedefs for generating a PAK file.
    typedef SmartPointer< File >                            FilePointer;
    typedef SmartPointer< DataFilter >                      FilterPointer;
    typedef String                                          FilePath;
    typedef String                                          DirPath;
    typedef String                                          Directory;
    typedef SmartPointer< FilePath >                        FilePathPointer;
    typedef SmartPointer< DirPath >                         DirPathPointer;
    typedef SmartPointer< Directory >                       DirPointer;
    typedef List< FilePathPointer >                         FilePathList;
    typedef List< DirPathPointer >                          DirPathList;
    typedef List< DirPointer >                              DirList;
    typedef SmartPointer< DirFileEntry >                    FileEntryPointer;
    typedef PSX_Pair< FileEntryPointer, DirPathPointer >    FileEntryPair;
    typedef SmartPointer< DirEntry >                        DirEntryPointer;
    typedef List< DirEntryPointer >                         PAKGenDirList;
    typedef List< FileEntryPair >                           PAKGenFilePairList;
    typedef List< FileEntryPointer >                        PAKGenFileList;
    typedef PSX_Pair< FILTER_TYPE, FilterPointer >          FilterPair;
    typedef Map< FILTER_TYPE, FilterPointer >               FilterMap;
    typedef PSX_Pair< String, FileEntryPointer >            FileMapPair;
    typedef Map< String, FileEntryPointer >                 FileEntryMap;

    /* Internal functions used to manage Pulse file data */

private:

    FileHeader              *m_pHeader;
    PAKGenDirList           *m_pGenDirs;
    PAKGenFilePairList      *m_pGenFiles;       // Used in creating Pulse File
    PAKGenFileList          *m_pGenFileList;    // Used in opening Pulse File
    FileEntryMap            *m_pFileMap;        // Used in loading Pulse File
    FileIO                  *m_pPulseFile;      // Used in reading loaded Pulse File
    BOOL                    m_bLoaded;
    FilterMap               m_filters;
    OnProcessFileCallback   m_pOnProcessFile;   // Callback for selecting a filter when a file is about to be processed

};

 

Before we move on to the internal utility methods, I suggest taking the time to get familiar with the insane amount of typedefs so you won’t get confused when we get to the actual implementations. :) Most of them are just simple data types stored in SmartPointers and then contained in a map or list.
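If the typedef chain still feels abstract, here is a rough sketch of what FileEntryPointer and FileEntryMap boil down to, using std::shared_ptr, std::map and std::string as stand-ins for SmartPointer<>, Map<> and String. The trimmed-down DirFileEntry and the FindEntry() helper are assumptions made for illustration only.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <string>

// Trimmed-down stand-in for FileSystem::DirFileEntry.
struct DirFileEntry
{
    std::string   m_name;
    std::uint64_t m_size = 0;
};

// The same typedef pattern as in the class, built on standard containers.
typedef std::shared_ptr<DirFileEntry>           FileEntryPointer;
typedef std::map<std::string, FileEntryPointer> FileEntryMap;

// Hypothetical helper: look a file up by its internal path.
// Returns an empty pointer when the path isn't in the map.
FileEntryPointer FindEntry( const FileEntryMap &fileMap, const std::string &path )
{
    FileEntryMap::const_iterator it = fileMap.find( path );
    return it != fileMap.end() ? it->second : FileEntryPointer();
}
```

Read this way, a FileEntryMap is simply "path string to shared file entry", and the entry stays alive for as long as any map or list still points to it.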

For the continuation of this article, see Virtual File System – Part 3

Virtual File System – Part 1

Download Source Code

Introduction
Let’s say you’re working on a commercial game to be released for the Xbox 360. Your artists are happily making hundreds and thousands of high-quality assets (textures, models, audio, etc.) to be used in your game. Everything’s going well until one day an artist rushes into your office, panicking that your game has reached the 4.7GB limit (a DVD’s storage capacity) and is still lacking a thousand more assets that need to be included.

You have two options (that I know of) to solve this problem:

  1. Cut features and functionality by simplifying levels or removing parts of your game. This means removing the associated or least important assets to save storage space, which could make your game a little bit less fun.
  2. OR you could come up with a system that compresses your resource files to reduce storage consumption, so that you can push more resources into your game. That way, option 1 becomes a last resort for when your uber compression has reached its limits.

Figure 1.1: Compressing raw data, depending on the compression algorithm, can potentially take less storage space

This topic will extensively cover option number 2, or what we call Virtual File Systems. We’ll also be making our own simple virtual file system as we go along discussing the details. Now that you have some idea of what this topic is all about, let’s get started!

What is a Virtual File System(or VFS)?
A Virtual File System is a system that creates an abstraction layer between your application and the concrete files your application uses, or what we call our resource files.

Basically, a VFS is similar to the file systems that operating systems use, like NTFS or FAT32 on Windows and other formats on other operating systems. The only difference is that a VFS is built on top of a real file system.

* Before we move on, for consistency of terms and definitions, we’ll be referring to a Virtual File System simply as VFS and Packed Files as PAK.

So what does it do exactly?
With the definitions provided above, a VFS provides our own virtual storage for our application to use, in a format THAT only our application can understand. If this sounds familiar, it is because archive files like zip or rar are another good example of a VFS. The key to this is archiving. We archive or pack our resource files by wrapping them into ONE HUGE FILE.

If you haven’t noticed yet, take a look at the directory of one of your favorite triple-A games and see if you can find any of the resources used in the game itself. Chances are you won’t find any *.tga, *.ogg, *.mp3, or mesh files. But you may see some insanely huge files with weird file extensions like .pak or .WAD (used in Doom). That game’s resources are actually inside those files.

Awesome! So what are the benefits? -  VFS advantages
Here are some advantages of using a VFS:

  • Security : Our "proprietary" format makes it harder for amateur hackers to easily parse and vandalize or steal our assets.
  • Fast access time : Repeatedly opening and closing small resource files for loading takes quite some time. We can make this faster by using a few huge files, contained in a VFS, instead of many smaller ones.
  • Less system resource consumption : Fewer files to manage means fewer file handles, which also means using fewer system resources.
  • Automatic Filter type handling : With a proper pluggable architecture, a VFS can automatically handle compression and decompression behind the scenes. Be it zip, rar, tar, gar, meow, woof or any other compressed file format, the VFS will automatically decompress or compress it for you.

But….

The cake was a lie? – VFS disadvantages
Unfortunately, nothing comes for free. We have to sacrifice some functionality in order to gain some.

  • Very slow to add/edit files : Because of the way we structure and pack our files inside a VFS, it is harder to edit, add, and erase files. This is due to the overhead of the necessary re-compaction, re-arrangement, and updating of important information. But these disadvantages are carefully weighed against our needs in games (or other applications). Even though editing the resources inside the VFS at run-time is strongly discouraged, there’s no real reason to do so in the first place. Packing SHOULD only be done with an offline tool.
  • Hard to debug : Since files inside a VFS are no longer recognized by the OS, we have no way of easily checking a file for errors. We have to either extract the file, save it as a separate file, and then repack the entire VFS, or come up with a utility that supports our VFS format and dynamically loads the assets for easy viewing/editing, which can be costly and take a considerable amount of time to develop. One good example of such a tool is Epic’s Generic Browser.


Figure: Unreal Editor’s Generic Browser

Features for our VFS
Before we start discussing the intricacies of a VFS, we’ll go through some basic features that our VFS will have.

  • Pack an entire directory : Our VFS can parse a directory path, then pack all the files and folders inside that directory to produce a PAK file.
  • Filter callback : Our VFS provides a callback so that we can specify which filter to use when it is about to process a file.
  • Unpack VFS contents : We can extract all of the packed file’s contents and save them in a specified output path.
  • zlib support : As a demonstration of our pluggable filter architecture, we’ll be using the well-known deflate and inflate algorithms for compressing and decompressing our resource files.
  • Load in memory : Once a PAK file is opened, we can load a specified file into memory and use it normally, as if we had loaded an external file outside the PAK file.

The VFS class
Most VFS samples or tutorials on the internet provide very basic, very limited, and often hacked-together implementations of the system. I found another good article, but it was written in pure C (see references at the very bottom). The problem with this is that most of the internal methods and data are unnecessarily exposed to the user, which may cause errors if not used properly. To relieve this problem, we’ll be implementing a simple object-oriented design that wraps all the methods and data and only exposes what needs to be exposed to the user.

Here is the public interface for the file system class, called FileSystem. After that, we’ll go through a brief overview of the methods in this class:

FileSystem Interface
class FileSystem
{
public:

    enum FILTER_TYPE {
        FILTER_TYPE_DEFAULT = 0x00000000,
        FILTER_TYPE_ZLIB_TEST = 0x00000001,
        // NOTE: Add additional filters here.
        //    Each filter takes one bit of space. 0, 1, 2, 4, 8…
        FILTER_TYPE_FORCE_DWORD = 0x7fffffff,
    };

    typedef FILTER_TYPE (*OnProcessFileCallback)( const CHAR *pFileName, const CHAR *pFilePath );
    typedef DWORD               FilterFlag;
    typedef Optional<PTR_T>     OHFile;

    FileSystem( void );
    ~FileSystem( void );

    BOOL Open( const CHAR *pFileName, BOOL deleteTableEntries = TRUE );
    void Close( void );
    BOOL IsOpen( void ) const;

    OHFile      FindFile( const CHAR *pPath ) const;
    SIZE_T      GetFileSize( const OHFile *pOHFile ) const;
    SIZE_T64    GetFileSize64( const OHFile *pOHFile ) const;
    BOOL        ReadFile( const OHFile *pOHFile, BYTE *pBuff );

    BOOL Create( const CHAR *pDirectory, const CHAR *pOutputPath, OnProcessFileCallback pOnProcessFileCallback = NULL, FilterFlag fBitFlag = FILTER_TYPE_DEFAULT );
    BOOL Unpack( const CHAR *pPAKPath, const CHAR *pOutputPath );

};
  • BOOL Open( const CHAR *pFileName, BOOL deleteTableEntries = TRUE ) : Opens the PAK file specified in *pFileName and loads the internal directory and file tables. After the directory and file tables have been loaded, it generates a File Map out of those tables for fast file searching when calling FindFile(). The second parameter, deleteTableEntries, is a Boolean that controls whether the file and directory tables are deleted afterwards to save a little bit of memory. If you’re not planning on extracting and saving all of the contents outside the PAK file, the tables aren’t needed and can be safely deleted, which is the default behavior if you don’t pass a value for the second parameter. This method returns true if loading was successful, otherwise false. This is one of the three methods that do most of the high-level heavy lifting.
  • void Close( void ) : Cleans up all of the internal resources generated by the Open() method and closes the file stream handle associated with the PAK file.
  • BOOL IsOpen( void ) : Returns a Boolean indicating whether a PAK file is open or not.
  • OHFile FindFile( const CHAR *pPath ) : *pPath specifies the internal path, including the filename, of the file we’re searching for inside a PAK file. The return type takes a little bit of explanation. Basically, OHFile is a handle to the file found in the loaded PAK file, but there’s a little bit more to it than just a handle. You can safely think of it as a handle for now.
  • SIZE_T GetFileSize( const OHFile *pOHFile ) : This method returns the size of the file specified by the file handle, *pOHFile, returned by the FindFile() method. The return value, SIZE_T, is a 32-bit unsigned integer. That means that the maximum size it can return is 4,294,967,295 bytes, or approximately 4GB. If the size of the file is greater than this value, it returns 0 instead. To get around this limitation, another method, GetFileSize64(), returns a 64-bit unsigned integer, or SIZE_T64. See the next method below.
  • SIZE_T64 GetFileSize64( const OHFile *pOHFile ) : Same as GetFileSize() but returns a 64-bit unsigned integer.
  • BOOL ReadFile( const OHFile *pOHFile, BYTE *pBuff ) : This method extracts the file specified by *pOHFile and loads it into the memory pointed to by *pBuff. If the file is compressed, the VFS will automatically handle the decompression, as the data goes through the required filters before the final raw data is written to *pBuff. Note that it is really important that the size of *pBuff SHOULD BE AT LEAST the size of the file we’re loading, as given by the GetFileSize() or GetFileSize64() methods. If the method successfully loaded the file into memory, it returns true, otherwise false.
  • BOOL Create( const CHAR *pDirectory, const CHAR *pOutputPath, OnProcessFileCallback pOnProcessFileCallback = NULL,
    FilterFlag fBitFlag = FILTER_TYPE_DEFAULT ) :
    This method packs the entire directory specified by *pDirectory. The algorithm recursively goes through each folder inside the directory and processes each file it finds in each folder. The output file can be specified in *pOutputPath. The last two parameters, pOnProcessFileCallback and fBitFlag, specify the filter options that you want to apply to all or selected files when they get processed. pOnProcessFileCallback is a function pointer typedefed as:
        typedef FILTER_TYPE (*OnProcessFileCallback)( const CHAR *pFileName, const CHAR *pFilePath );

    The VFS will call this function once for each file it processes if the user has specified a callback. *pFileName is the name of the file to be processed and *pFilePath is its path relative to *pDirectory as the root path. For example, if we’re packing a directory in “C:\User\Downloads\” and the VFS finds a file called “funkymusic.mp3” in “C:\User\Downloads\Music\” then *pFileName will be “funkymusic.mp3” and *pFilePath will be “Downloads\Music\”.
    The last parameter in Create() is fBitFlag. We need to let our VFS know which filters we’ll be using so that it can initialize only those filters, and thus save memory instead of initializing filters that would end up not being used. The default is FILTER_TYPE_DEFAULT, which is a very basic filter that does nothing but copy each file bit by bit. This is the second of the three methods that do most of the high-level heavy lifting.

  • BOOL Unpack( const CHAR *pPAKPath, const CHAR *pOutputPath ) : Extracts the entire contents of the PAK file specified in *pPAKPath and saves them in the *pOutputPath directory.
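To make the callback and the fBitFlag parameter of Create() a little more concrete, here is a minimal sketch. The enum and typedefs are copied from the interface above with standard types swapped in, while ChooseFilter() and IsFilterEnabled() are hypothetical helpers written for illustration; how the real FileSystem tests the flags internally may differ.

```cpp
#include <cstdint>
#include <cstring>

enum FILTER_TYPE
{
    FILTER_TYPE_DEFAULT     = 0x00000000,
    FILTER_TYPE_ZLIB_TEST   = 0x00000001,
    FILTER_TYPE_FORCE_DWORD = 0x7fffffff
};

typedef std::uint32_t FilterFlag;
typedef FILTER_TYPE (*OnProcessFileCallback)( const char *pFileName, const char *pFilePath );

// Hypothetical callback: leave already-compressed audio alone and
// run everything else through the zlib test filter.
FILTER_TYPE ChooseFilter( const char *pFileName, const char * /*pFilePath*/ )
{
    const char *pExt = std::strrchr( pFileName, '.' );
    if ( pExt && ( std::strcmp( pExt, ".mp3" ) == 0 || std::strcmp( pExt, ".ogg" ) == 0 ) )
        return FILTER_TYPE_DEFAULT;

    return FILTER_TYPE_ZLIB_TEST;
}

// Hypothetical helper: is a given filter switched on in the bit flag?
bool IsFilterEnabled( FilterFlag flags, FILTER_TYPE filter )
{
    // FILTER_TYPE_DEFAULT has no bit of its own, so this sketch treats
    // the plain copy as always available.
    if ( filter == FILTER_TYPE_DEFAULT )
        return true;

    return ( flags & static_cast<FilterFlag>( filter ) ) != 0;
}
```

With a callback like this, a call along the lines of Create( pDirectory, pOutputPath, ChooseFilter, FILTER_TYPE_ZLIB_TEST ) would enable the zlib filter and let the callback decide per file whether to use it. Additional filters would be OR-ed into the last argument, one bit each.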

These public methods use a number of internal (or private) methods to do their work. But before we take a look at those, let us take this opportunity to look at the internal data structures and typedefs first, in order to have some idea of how we will structure our data.

See Virtual File Systems – Part 2 for the continuation of this article.

Galaxy Music Radio

I made an online radio station called Galaxy Music Radio using Play.it. This is a fan-made radio station expressing my love for and interest in the good music of the 20’s, 30’s and 40’s. We just don’t hear this kind of good, relaxing, and deep music anymore. My interest in this music genre started when I played Fallout 3. The music collection played on Galaxy News Radio was just amazing! Unfortunately, the selection of songs was very limited. So here’s a station where you can continue to listen to great music of the good old times!

Enjoy!

- CodeSushi
