Virtual File System – Part 6 (Final)

FileSystem method – ReadPAKInfo()

ReadPAKInfo is a straightforward method that simply reads the header and table information of a PAK file. As an added bonus, it also verifies that the file is a valid PAK file.

    BOOL FileSystem::ReadPAKInfo( const CHAR *pPAKFilePath, FileHeader *pPAKHeader, PAKGenDirList *pPAKDirList, PAKGenFileList *pPAKFileList )
    {
        FileIO              pakFile;
        DirEntryPointer     pNewDir;
        FileEntryPointer    pNewFile;
        FileReader          fileReader;

        pakFile.Open( pPAKFilePath, FileIO::FILEOP_READ | FileIO::FILEOP_BINARY );
        if ( pakFile.IsOpen() == FALSE )
            return FALSE;

        // The Pulse file header sits at the very end of the PAK file.
        // Read it first, then the table entries it points to.
        pakFile.Seek( 0 - sizeof( FileHeader ), FileIO::SEEKOP_END );
        fileReader.SetFileStream( &pakFile );
        pPAKHeader->ReadData( &fileReader );

        // Do verification checks
        if ( !VerifyPulseHeader( pPAKHeader ) )
            return FALSE;

        // Read directory entries
        pakFile.Seek64( pPAKHeader->m_dirDiskStart, FileIO::SEEKOP_BEGIN );
        for ( SIZE_T i = 0; i < pPAKHeader->m_numDirs; ++i )
        {
            pNewDir = new DirEntry;
            pNewDir->ReadData( &fileReader );
            pPAKDirList->PushBack( pNewDir );
        }

        // Read file entries. Use the header we just read (pPAKHeader),
        // not the member m_pHeader, which may belong to another PAK file.
        pakFile.Seek64( pPAKHeader->m_fileDiskStart, FileIO::SEEKOP_BEGIN );
        for ( SIZE_T i = 0; i < pPAKHeader->m_numFiles; ++i )
        {
            pNewFile = new DirFileEntry;
            pNewFile->ReadData( &fileReader );
            pPAKFileList->PushBack( pNewFile );
        }

        pakFile.Close();
        return TRUE;
    }

Opens a PAK file, reads the header and tables, and runs the verification check.

Here’s the code for the verification check:

    // A header is valid only if both the signature and the format ID match
    BOOL FileSystem::VerifyPulseHeader( FileHeader *pHeader )
    {
        return VerifyHeaderSignature( &pHeader->m_signature ) && VerifyHeaderFormat( pHeader->m_ID );
    }

    BOOL FileSystem::VerifyHeaderSignature( const Signature *pSig )
    {
        return pSig ? *pSig == SIGNATURE : FALSE;
    }

    BOOL FileSystem::VerifyHeaderFormat( const CHAR *pFormat )
    {
        return PSX_StrCmp( pFormat, PSX_String( "pfs" ) ) == 0;
    }
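
To make the header-at-the-end idea concrete, here is a minimal, self-contained sketch that uses standard C++ streams instead of the Pulse FileIO and FileReader classes. Everything in it is made up for illustration: DemoHeader, ReadTrailingHeader, MakeDemoArchive and the "PLSE" signature are assumptions, since the real FileHeader layout and SIGNATURE value are not shown in this part.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical fixed-size trailer, standing in for the article's FileHeader.
struct DemoHeader
{
    char          signature[4]; // assumed magic bytes, "PLSE" in this sketch
    char          formatId[4];  // "pfs", NUL-padded
    std::uint32_t numDirs;
    std::uint32_t numFiles;
};

// Seeks to the last sizeof(DemoHeader) bytes of the stream, reads them,
// and accepts the archive only if signature and format ID both match.
bool ReadTrailingHeader( std::istream &in, DemoHeader &out )
{
    in.seekg( 0, std::ios::end );
    const std::streamoff total = in.tellg();
    if ( total < static_cast< std::streamoff >( sizeof( DemoHeader ) ) )
        return false; // too small to contain a header at all

    // A signed, negative offset relative to the end of the stream.
    in.seekg( -static_cast< std::streamoff >( sizeof( DemoHeader ) ), std::ios::end );
    in.read( reinterpret_cast< char * >( &out ), sizeof( DemoHeader ) );
    if ( !in )
        return false;

    return std::memcmp( out.signature, "PLSE", 4 ) == 0
        && std::strncmp( out.formatId, "pfs", sizeof( out.formatId ) ) == 0;
}

// Builds an in-memory "archive": some payload bytes followed by the header.
std::string MakeDemoArchive( const char *pSignature )
{
    assert( std::strlen( pSignature ) >= 4 );

    DemoHeader header = {};
    std::memcpy( header.signature, pSignature, 4 );
    std::memcpy( header.formatId, "pfs", 3 );
    header.numDirs  = 1;
    header.numFiles = 2;

    std::string bytes = "payload-bytes";
    bytes.append( reinterpret_cast< const char * >( &header ), sizeof( header ) );
    return bytes;
}
```

Wrapping the result of MakeDemoArchive( "PLSE" ) in a std::istringstream and passing it to ReadTrailingHeader should succeed, while an archive built with any other signature should be rejected. Note that the sketch seeks with a signed negative offset, which sidesteps the unsigned arithmetic of 0 - sizeof( ... ).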

FileSystem method – Open()

We’re almost done with our VFS implementation. The only thing left to discuss is the run-time functionality. The Pack and Unpack methods are meant for offline tools. To use our PAK file at run time, we need methods that open it and extract only the data we need. We’ll load that data into memory instead of saving it as a file. This not only saves time but also helps with security: it’s harder for someone to steal our resources if they only ever exist in memory rather than as actual files on disk.

The block of code below is the initialization part of the Open() method.

    BOOL FileSystem::Open( const CHAR *pFileName, BOOL deleteTableEntries /*= TRUE */ )
    {
        DWORD    currPathIndex;
        String   stringID;
        BOOL     bAddSeparator;
        PAKGenDirList::Iterator     dirIter;
        PAKGenDirList::Iterator     dirIterEnd;
        PAKGenFileList::Iterator    fileIter;
        PAKGenFileList::Iterator    fileIterEnd;

        // Drop anything left over from a previously opened PAK file
        ReleaseResources();

        m_pHeader       = new FileHeader;
        m_pGenDirs      = new PAKGenDirList;
        m_pGenFileList  = new PAKGenFileList;
        m_pFileMap      = new FileEntryMap;
        m_pPulseFile    = new FileIO;

        PSX_Assert( m_pHeader && m_pGenDirs && m_pGenFileList && m_pFileMap && m_pPulseFile, "Failed to allocate memory." );

        if ( !ReadPAKInfo( pFileName, m_pHeader, m_pGenDirs, m_pGenFileList ) )
            goto FailLoad;

        if ( !InitializeFilters( m_pHeader->m_filterBitField ) )
            goto FailLoad;

Initialization

Once we have the PAK info, it might seem that we’re done and can return from this function. The problem is that our directory and file tables are stored in lists, so there is no way to find a file quickly! To fix this, we create a map that stores our DirFileEntries, keyed by a string ID made of the directory path with the DirFileEntry’s file name appended. This gives us O(log n) lookups instead of a linear search through a list. After generating the file map we simply clean up our resources and return.

        // Generate File Map info
        currPathIndex = 0;
        dirIter       = m_pGenDirs->IteratorBegin();
        dirIterEnd    = m_pGenDirs->IteratorEnd();
        fileIter      = m_pGenFileList->IteratorBegin();
        fileIterEnd   = m_pGenFileList->IteratorEnd();

        // Root files don't have directories prefixed
        bAddSeparator = FALSE;

        while ( fileIter != fileIterEnd )
        {
            // Advance to the directory that owns this file. Looping (rather
            // than taking a single step) also copes with directories that
            // contain no files at all.
            while ( currPathIndex != (*fileIter)->m_PAKData.m_pathIndex )
            {
                ++currPathIndex;
                ++dirIter;

                // Error check
                PSX_Assert( !(dirIter == dirIterEnd), "Error Pulse File." );

                bAddSeparator = (*dirIter)->m_name.GetLength() ? TRUE : FALSE;
            }

            // Concatenate directory and file name, then use the result as an ID
            if ( bAddSeparator )
                stringID = (*dirIter)->m_name + PSX_String( "\\" ) + (*fileIter)->m_name;
            else
                stringID = (*fileIter)->m_name;

            m_pFileMap->Insert( FileMapPair( stringID, *fileIter ) );

            ++fileIter;
        }

        // Keep the PAK file open for later ReadFile() calls
        if ( !m_pPulseFile->Open( pFileName, FileIO::FILEOP_READ | FileIO::FILEOP_BINARY ) )
            goto FailLoad;

        m_bLoaded = TRUE;

        // We can save memory by releasing table entries
        if ( deleteTableEntries )
        {
            //PSX_SafeDelete( m_pHeader );
            PSX_SafeDelete( m_pGenDirs );
            PSX_SafeDelete( m_pGenFileList );
        }

        return TRUE;

    FailLoad:

        ReleaseResources();

        return FALSE;
    }

Generating our file map.
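
If you want to play with the idea outside the engine, here is a small sketch of the same path-to-entry map built with std::map and std::vector. DemoDir, DemoFile and BuildFileMap are simplified stand-ins invented for this example (the real DirEntry and DirFileEntry carry more fields), and the sketch indexes the directory table directly instead of walking two iterators in step.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified table entries.
struct DemoDir  { std::string name; };
struct DemoFile { std::string name; std::size_t pathIndex; };

// The map stores pointers into the 'files' vector, so that vector must
// outlive the map.
typedef std::map< std::string, const DemoFile * > DemoFileMap;

// Joins each file name with its directory name and uses the result as the key.
DemoFileMap BuildFileMap( const std::vector< DemoDir > &dirs,
                          const std::vector< DemoFile > &files )
{
    DemoFileMap fileMap;

    for ( std::size_t i = 0; i < files.size(); ++i )
    {
        const DemoFile &file = files[i];
        assert( file.pathIndex < dirs.size() );

        const std::string &dirName = dirs[file.pathIndex].name;

        // Root files have an empty directory name and get no separator.
        const std::string key = dirName.empty() ? file.name
                                                : dirName + "\\" + file.name;
        fileMap.insert( std::make_pair( key, &file ) );
    }

    return fileMap;
}
```

Looking a file up is then a single fileMap.find( "Textures\\wall.png" ) call, which std::map answers in logarithmic time.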

There is a limitation in how we’ve set up our file map. We’ll discuss it at the end of this article.

Closing the opened PAK file is as simple as calling Close() which simply calls ReleaseResources() underneath.

FileSystem method – FindFile()

FindFile simply looks up the file in the file map and returns a handle to it. We never really discussed how this handle works, so I’d like to take the time to explain it now. The method returns an OHFile, which is a typedef of the Optional<> class. What makes Optional special is that it carries the return value together with a flag that tells the caller whether that value is valid. This avoids dedicated error return values and awkward output parameters. We can check whether the return value is valid by calling Optional::IsValid(), then access the value by prefixing the optional with ‘*’. You may also notice that the typedefed Optional<> class contains a pointer:

    typedef Optional<PTR_T>        OHFile;

PTR_T is simply a typedef of void *. We do this because we don’t want the user to mess around with the file entry we found in the file map. Handing out a void * strips the interface from it, so the return value behaves like an opaque ID. Here is the code:

    FileSystem::OHFile FileSystem::FindFile( const CHAR *pPath ) const
    {
        FileEntryMap::Iterator iter = m_pFileMap->Find( pPath );

        if ( iter == m_pFileMap->IteratorEnd() )
            return OptionalEmpty();

        // Hand out the raw entry address as an opaque PTR_T (void *)
        return reinterpret_cast<PTR_T>((&(*(*iter).second)));
    }

Casting the found file entry to a PTR_T (void *).
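
Since the Optional<> class itself isn’t shown in this part, here is a rough sketch of how an optional opaque handle could work. DemoOptional, DemoHandle, DemoEntry and DemoArchive are all hypothetical names for this example only, and the real Optional<> almost certainly has a richer interface.

```cpp
#include <cassert>
#include <map>
#include <string>

// Minimal stand-in for an Optional<> class: a value plus a validity flag.
template < typename T >
class DemoOptional
{
public:
    DemoOptional() : m_value(), m_valid( false ) {}
    DemoOptional( const T &value ) : m_value( value ), m_valid( true ) {}

    bool IsValid() const   { return m_valid; }
    bool IsInvalid() const { return !m_valid; }

    // Dereferencing an empty optional is a programming error.
    const T & operator*() const { assert( m_valid ); return m_value; }

private:
    T    m_value;
    bool m_valid;
};

typedef DemoOptional< void * > DemoHandle;

struct DemoEntry { unsigned long long size; };

class DemoArchive
{
public:
    void Add( const std::string &path, unsigned long long size )
    {
        m_entries[path].size = size;
    }

    // Returns an opaque handle; callers never see DemoEntry itself.
    DemoHandle FindFile( const std::string &path )
    {
        std::map< std::string, DemoEntry >::iterator iter = m_entries.find( path );
        if ( iter == m_entries.end() )
            return DemoHandle();
        return DemoHandle( static_cast< void * >( &iter->second ) );
    }

    // Only the archive knows how to turn the handle back into an entry.
    unsigned long long GetFileSize( const DemoHandle &handle ) const
    {
        if ( handle.IsInvalid() )
            return 0;
        return static_cast< const DemoEntry * >( *handle )->size;
    }

private:
    std::map< std::string, DemoEntry > m_entries;
};
```

The caller can test the handle and pass it back in, but has no way to reach the entry’s fields without casting on purpose, which is the whole point of handing out a void *.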

We’ll see an example later when we finish discussing the remaining run-time methods.

FileSystem method – GetFileSize()

This method takes a file handle returned by FindFile and returns the size of that file. The 64-bit version is simply called GetFileSize64(). We need it because a file inside the PAK can be larger than 4 GB, and where SIZE_T is 32 bits wide it can’t represent a value that large. In that case GetFileSize() returns 0 and you should call GetFileSize64() instead.

    SIZE_T FileSystem::GetFileSize( const OHFile *pOHFile ) const
    {
        SIZE_T64 size = GetFileSize64( pOHFile );

        // Too large for SIZE_T: the caller has to use GetFileSize64()
        if ( size > PSX_SIZE_T_MAX )
            return 0;

        return static_cast< SIZE_T >( size );
    }

    SIZE_T64 FileSystem::GetFileSize64( const OHFile *pOHFile ) const
    {
        if ( !m_bLoaded || !pOHFile || pOHFile->IsInvalid() )
            return 0;

        // Turn the opaque handle back into the file entry it points to
        DirFileEntry *pFileEntry = reinterpret_cast< DirFileEntry * >( **pOHFile );
        return pFileEntry->m_PAKData.m_size;
    }
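
One thing to keep in mind is that a return value of 0 from GetFileSize() can mean either "too large" or "empty or invalid". If that ambiguity bothers you, one possible alternative is to report success separately from the value. The sketch below is not part of the Pulse code; NarrowSize is a made-up helper, and it uses fixed-width standard types so that it behaves the same on every platform.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Narrows a 64-bit size to a 32-bit one. Returns false (and leaves 'out'
// untouched) when the value does not fit.
bool NarrowSize( std::uint64_t size64, std::uint32_t &out )
{
    if ( size64 > std::numeric_limits< std::uint32_t >::max() )
        return false;

    out = static_cast< std::uint32_t >( size64 );
    assert( out == size64 ); // the round trip must be lossless
    return true;
}
```

A caller would check the returned flag first and fall back to the 64-bit path when it is false.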

FileSystem method – ReadFile()

This method extracts a file from inside a PAK file and loads it into memory. Because of the way the OHFile handle works, we need to do some type casting to convert it into a DirFileEntry pointer. Once we have the DirFileEntry pointer, we can find out where the file is located in the PAK file and load it into the memory specified by pBuff. Take note that the data has to go through the Decode process to reverse the filters applied to it (zlib compression, for example).

    BOOL FileSystem::ReadFile( const OHFile *pOHFile, BYTE *pBuff )
    {
        // Bail out if no PAK file is open or the handle is unusable.
        // (This must be ||, not &&: either condition alone is fatal.)
        if ( !IsOpen() || !pOHFile || pOHFile->IsInvalid() || !pBuff )
            return FALSE;

        FileReader reader;
        MemoryWriter writer;

        // Just use a raw pointer to avoid additional processing by SmartPointer<>
        DirFileEntry *pFileEntry = reinterpret_cast< DirFileEntry * >( **pOHFile );

        // Limit the reader to this file's stored (compressed) bytes
        m_pPulseFile->Seek64( pFileEntry->m_PAKData.m_diskStart, FileIO::SEEKOP_BEGIN );
        reader.SetFileStream( m_pPulseFile );
        reader.SetReadLimit( pFileEntry->m_PAKData.m_compressedSize );
        writer.SetBuffer( pBuff );

        // Reverse the filters and write the original bytes into pBuff
        Decode( pFileEntry, &reader, &writer );

        return TRUE;
    }
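
The general shape of ReadFile (seek to the entry, read a bounded number of stored bytes, decode them into memory) can be sketched with standard streams as follows. This is only an illustration under loud assumptions: DemoEntry and ReadEntry are invented names, and a trivial run-length scheme of (count, value) byte pairs stands in for a real filter such as zlib.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical entry: where the stored bytes live and how large the data
// is before and after decoding.
struct DemoEntry
{
    std::streamoff diskStart;
    std::size_t    compressedSize;
    std::size_t    size;
};

// Reads exactly entry.compressedSize bytes starting at entry.diskStart and
// expands (count, value) run-length pairs into 'out'.
bool ReadEntry( std::istream &pak, const DemoEntry &entry, std::vector< unsigned char > &out )
{
    assert( entry.diskStart >= 0 );
    if ( entry.compressedSize % 2 != 0 )
        return false; // the stand-in format consists of pairs only

    std::vector< char > stored( entry.compressedSize );
    pak.clear();
    pak.seekg( entry.diskStart, std::ios::beg );
    if ( !stored.empty() )
        pak.read( &stored[0], static_cast< std::streamsize >( stored.size() ) );
    if ( !pak )
        return false;

    out.clear();
    out.reserve( entry.size );
    for ( std::size_t i = 0; i < stored.size(); i += 2 )
    {
        const std::size_t count = static_cast< unsigned char >( stored[i] );
        out.insert( out.end(), count, static_cast< unsigned char >( stored[i + 1] ) );
    }

    // The decoded length must match what the file table promised.
    return out.size() == entry.size;
}
```

Checking the decoded length against the size recorded in the table is a cheap way to notice a corrupt entry before the data reaches the rest of the program.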

The End Of Our Implementation

If you have managed to read this far, congratulations! I seriously did not anticipate that this series would get this long (about 10,500+ words). But if you’ve been following along, you now understand how a simple file system works! Note that it still has a number of limitations, but it already works for most of our needs, and it wouldn’t take much refactoring to improve the system further. I’ll show a quick example of how to use our file system, then we’ll quickly talk about some things we can do to improve it.

Using our VFS

To create a PAK file, we create an instance of our FileSystem and then call Pack().

    // NOTE: Callback that uses FILTER_TYPE_ZLIB_TEST compression/decompression on all files
    FileSystem::FILTER_TYPE SelectFilter( const Pulse::CHAR *pFileName, const Pulse::CHAR *pFilePath )
    {
        // Use deflate on all files
        return FileSystem::FILTER_TYPE_ZLIB_TEST;
    }

    Pulse::INT SampleConsole::Main( Pulse::INT argNum, Pulse::CHAR **ppArgs )
    {
        FileSystem pulseFile;

        #define PFS_ACTION 1

        #if (PFS_ACTION == 1) // NOTE: Pack a specified directory
            pulseFile.Create( "C:\\Program Files\\Adobe", "C:\\New Folder\\TestPAK.pfs",
                SelectFilter, FileSystem::FILTER_TYPE_ZLIB_TEST );
        #elif (PFS_ACTION == 2) // NOTE: Extract all the contents of a PAK file
            pulseFile.Unpack( "C:\\New Folder\\TestPAK.pfs", "C:\\New Folder\\Extracted Data\\" );
        #elif (PFS_ACTION == 3) // NOTE: Extract a single file from a PAK file and save it to disk
            if ( pulseFile.Open( PSX_String("C:\\New Folder\\TestPAK.pfs") ) )
            {
                cout << "Successfully opened pulse files." << endl;
                FileSystem::OHFile ohFile = pulseFile.FindFile( "Sample Pictures\\Sample Music\\Kalimba.mp3" );
                if ( ohFile.IsValid() )
                {
                    Pulse::SIZE_T size = pulseFile.GetFileSize( &ohFile );
                    Pulse::BYTE *pData = new Pulse::BYTE [ size ];
                    pulseFile.ReadFile( &ohFile, pData );
                    FileIO file( "C:\\music.mp3", FileIO::FILEOP_BINARY | FileIO::FILEOP_WRITE );
                    file.Write( pData, size );
                    file.Close();
                    delete [] pData; // release the buffer once it is on disk
                    cout << "File found." << endl;
                }
            }
        #endif
            return 0;
    }

Creating a PAK file

In this example we pack the entire contents of “C:\Program Files\Adobe” and save the PAK file in “C:\New Folder\” as “TestPAK.pfs”. We also pass in a callback called SelectFilter and indicate that we’ll be using the FileSystem::FILTER_TYPE_ZLIB_TEST filter.

[Screenshot: packing]

The screenshot shows that we have managed to compress the entire Adobe folder by 218%.

I’ll now show you how easy it is to unpack all of the contents of the PAK file. The single call under PFS_ACTION == 2 is all you need: call Unpack() with the path of the PAK file as the first parameter and the output directory as the second.

    Pulse::INT SampleConsole::Main( Pulse::INT argNum, Pulse::CHAR **ppArgs )
    {
        FileSystem pulseFile;

        #define PFS_ACTION 2

        #if (PFS_ACTION == 1) // NOTE: Pack a specified directory
            pulseFile.Create( "C:\\Program Files\\Adobe", "C:\\New Folder\\TestPAK.pfs",
                SelectFilter, FileSystem::FILTER_TYPE_ZLIB_TEST );
        #elif (PFS_ACTION == 2) // NOTE: Extract all the contents of a PAK file
            pulseFile.Unpack( "C:\\New Folder\\TestPAK.pfs", "C:\\New Folder\\Extracted Data\\" );
        #elif (PFS_ACTION == 3) // NOTE: Extract a single file from a PAK file and save it to disk
            if ( pulseFile.Open( PSX_String("C:\\New Folder\\TestPAK.pfs") ) )
            {
                cout << "Successfully opened pulse files." << endl;
                FileSystem::OHFile ohFile = pulseFile.FindFile( "Sample Pictures\\Sample Music\\Kalimba.mp3" );
                if ( ohFile.IsValid() )
                {
                    Pulse::SIZE_T size = pulseFile.GetFileSize( &ohFile );
                    Pulse::BYTE *pData = new Pulse::BYTE [ size ];
                    pulseFile.ReadFile( &ohFile, pData );
                    FileIO file( "C:\\music.mp3", FileIO::FILEOP_BINARY | FileIO::FILEOP_WRITE );
                    file.Write( pData, size );
                    file.Close();
                    delete [] pData; // release the buffer once it is on disk
                    cout << "File found." << endl;
                }
            }
        #endif
            return 0;
    }

Unpacking the entire contents of a PAK file

[Screenshot: UnpackPAK]

You may have noticed a small discrepancy between the total size of the original and the unpacked files. The reason is that there are some small system files inside the Adobe folder. Remember that we chose not to include system and hidden files in our packing process.

And lastly, to load a file from a PAK file into memory, call FindFile() to search for the file you are looking for. When it succeeds, it returns a valid optional file handle. Checking whether the handle is valid is as easy as calling Optional<>::IsValid(). Once you have a valid handle, make sure you allocate enough memory by calling GetFileSize(), or GetFileSize64() for files larger than 4 GB. Finally, call ReadFile() to read the entire file into memory.

    Pulse::INT SampleConsole::Main( Pulse::INT argNum, Pulse::CHAR **ppArgs )
    {
        FileSystem pulseFile;

        #define PFS_ACTION 3

        #if (PFS_ACTION == 1) // NOTE: Pack a specified directory
            pulseFile.Create( "C:\\Program Files\\Adobe", "C:\\New Folder\\TestPAK.pfs",
                SelectFilter, FileSystem::FILTER_TYPE_ZLIB_TEST );
        #elif (PFS_ACTION == 2) // NOTE: Extract all the contents of a PAK file
            pulseFile.Unpack( "C:\\New Folder\\TestPAK.pfs", "C:\\New Folder\\Extracted Data\\" );
        #elif (PFS_ACTION == 3) // NOTE: Extract a single file from a PAK file and save it to disk
            if ( pulseFile.Open( PSX_String("C:\\New Folder\\TestPAK.pfs") ) )
            {
                cout << "Successfully opened pulse files." << endl;
                FileSystem::OHFile ohFile = pulseFile.FindFile( "Sample Pictures\\Sample Music\\Kalimba.mp3" );
                if ( ohFile.IsValid() )
                {
                    Pulse::SIZE_T size = pulseFile.GetFileSize( &ohFile );
                    Pulse::BYTE *pData = new Pulse::BYTE [ size ];
                    pulseFile.ReadFile( &ohFile, pData );
                    FileIO file( "C:\\music.mp3", FileIO::FILEOP_BINARY | FileIO::FILEOP_WRITE );
                    file.Write( pData, size );
                    file.Close();
                    delete [] pData; // release the buffer once it is on disk
                    cout << "File found." << endl;
                }
            }
        #endif
            return 0;
    }

Last thoughts

Before we say goodbye, I would like to discuss a few more things regarding our system. First of all, as you may have noticed, this system is not 100% complete! It is pretty much a barebones file system structure. You can enhance it further by abstracting the platform-specific code, such as the way we search for files and directories, which uses Win32-specific methods. Another thing to note is that on a Unix platform (am I saying this right?) the path separator is a forward slash ‘/’ instead of a backslash. We can fix this with a macro that identifies whether we’re building for Windows or Unix. Finally, take note of how we handle the bookkeeping of our files, which are stored in a single map. This is a good way of finding a specific file, but it is very, very bad (if it works at all) for querying the contents of a directory inside the PAK file. One way to fix this is to create a directory map instead, where each directory holds another map for the files it contains.
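
To give you a starting point for both ideas, here is a rough sketch of a compile-time path separator and a directory map whose values are per-directory file maps. None of these names exist in the Pulse code: DEMO_SEPARATOR, DemoDirMap, AddFile, FindFile and ListDirectory are invented for this example, and the sketch stores only a file size per entry to stay short.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Path separator chosen at compile time. _WIN32 is predefined by Windows
// compilers; everything else is assumed to use '/'.
#if defined( _WIN32 )
    const char DEMO_SEPARATOR = '\\';
#else
    const char DEMO_SEPARATOR = '/';
#endif

// directory path -> ( file name -> file size )
typedef std::map< std::string, unsigned long long > DemoFilesInDir;
typedef std::map< std::string, DemoFilesInDir >     DemoDirMap;

void AddFile( DemoDirMap &dirs, const std::string &dir,
              const std::string &name, unsigned long long size )
{
    assert( !name.empty() );
    dirs[dir][name] = size;
}

// Splits "dir<sep>name" at the last separator and looks the file up.
// Returns false if either the directory or the file is unknown.
bool FindFile( const DemoDirMap &dirs, const std::string &path, unsigned long long &sizeOut )
{
    const std::string::size_type cut = path.rfind( DEMO_SEPARATOR );
    const std::string dir  = ( cut == std::string::npos ) ? "" : path.substr( 0, cut );
    const std::string name = ( cut == std::string::npos ) ? path : path.substr( cut + 1 );

    DemoDirMap::const_iterator dirIter = dirs.find( dir );
    if ( dirIter == dirs.end() )
        return false;

    DemoFilesInDir::const_iterator fileIter = dirIter->second.find( name );
    if ( fileIter == dirIter->second.end() )
        return false;

    sizeOut = fileIter->second;
    return true;
}

// Listing a directory is now one lookup plus an in-order walk of its files.
std::vector< std::string > ListDirectory( const DemoDirMap &dirs, const std::string &dir )
{
    std::vector< std::string > names;
    DemoDirMap::const_iterator dirIter = dirs.find( dir );
    if ( dirIter != dirs.end() )
    {
        for ( DemoFilesInDir::const_iterator it = dirIter->second.begin();
              it != dirIter->second.end(); ++it )
            names.push_back( it->first );
    }
    return names;
}
```

Finding a single file costs two map lookups instead of one, but asking for the contents of a directory no longer requires scanning every file in the PAK.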

—————————

If you have any questions, concerns, or pretty much just about anything you want to ask, feel free to email me.

I am also open to any feedback about this system, as I will be improving it further. So definitely email me if you have a better idea for this implementation.

 

Cheers!

– CodeSushi
