User:DavidJCobb/Sandboxes/Oblivion file handling
Oblivion packages most of its assets in BSA files; these are Bethesda Softworks Archives. However, Oblivion also has limited support for "loose files," or files that exist outside of BSAs.
Typically, if a loose file has the same filename and relative path as an archived file, the game only loads the loose file if its Date Modified timestamp is newer than the Date Modified timestamp of the conflicting archive. Otherwise, the archived file is what gets used. However, there is a bug (see #BSA redirection) that can cause the game to always prefer archived texture files from the first texture BSA to load.
The basics of file lookups
The game doesn't load the full contents of an archive into memory. Instead, it loads the archive's metadata: for each archived folder, the game loads a 64-bit hash of its name and the number of files it contains; for each archived file, the game loads a 64-bit hash of its filename, its offset within the BSA, and its size. The archive system has no concept of "subfolders;" given files "foo/test.dds" and "foo/bar/test.dds," an archive will view these files as existing in two folders named "foo" and "foo/bar" respectively.
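The metadata described above can be pictured with a small sketch. The class and function names here (FileRecord, FolderRecord, split_archive_path) are hypothetical and chosen for illustration only; they are not taken from the game or from the BSA format.

```python
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    name_hash: int  # 64-bit hash of the filename
    offset: int     # byte offset of the file data within the BSA
    size: int       # size of the file data in bytes


@dataclass
class FolderRecord:
    name_hash: int  # 64-bit hash of the folder's full path
    files: list = field(default_factory=list)  # FileRecord entries

    @property
    def file_count(self) -> int:
        return len(self.files)


def split_archive_path(path: str):
    """Split a path into (folder, filename).

    The folder part keeps its full path, so "foo" and "foo/bar" are two
    unrelated folders rather than a parent and a child.
    """
    folder, _, filename = path.rpartition("/")
    return folder, filename
```

Splitting "foo/test.dds" and "foo/bar/test.dds" this way yields the folder names "foo" and "foo/bar", matching the flat folder model described above.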
Folder metadata is stored in a list, sorted by the 64-bit folder name hashes; file metadata is stored in a list per-folder, sorted by the 64-bit filename hashes. This allows for fast file lookups: if the game wants to check for an archived file in a given BSA, it doesn't need to check every single folder and file; when it finds a hash that is higher than the hash of the desired folder or file, it knows that the desired folder or file isn't present.
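A minimal sketch of this sorted lookup follows. It assumes a stand-in hash (BLAKE2b truncated to 64 bits) because the real BSA hash algorithm isn't described here, and all function names are hypothetical. The scan stops as soon as it passes the position where the wanted hash would have to be.

```python
import hashlib


def stand_in_hash(name: str) -> int:
    # Assumption: NOT the real BSA hash, just any deterministic 64-bit hash.
    digest = hashlib.blake2b(name.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def find_sorted(hashes, wanted):
    """Return the index of `wanted` in an ascending list, or None."""
    for index, value in enumerate(hashes):
        if value == wanted:
            return index
        if value > wanted:
            return None  # passed where it would be, so it isn't present
    return None


def build_index(paths):
    """Build (sorted folder hashes, per-folder sorted file hashes)."""
    folders = {}
    for path in paths:
        folder, _, filename = path.rpartition("/")
        folders.setdefault(stand_in_hash(folder), []).append(stand_in_hash(filename))
    folder_hashes = sorted(folders)
    file_lists = [sorted(folders[h]) for h in folder_hashes]
    return folder_hashes, file_lists


def contains(index, path):
    folder_hashes, file_lists = index
    folder, _, filename = path.rpartition("/")
    slot = find_sorted(folder_hashes, stand_in_hash(folder))
    if slot is None:
        return False
    return find_sorted(file_lists[slot], stand_in_hash(filename)) is not None
```

Because both lists are sorted, a binary search (for example, Python's bisect module) would also work; the linear scan is used here only because it mirrors the description above.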
When the game wishes to find a file, it first searches through all loaded BSAs, in order from the earliest-loaded archive to the latest-loaded archive; this means that assets in vanilla archives are preferred over conflicting assets in mod archives. If no archive supplies the file, then the game checks for a loose file. There's a catch, however: the game remembers the last archive that successfully provided a file, and the next search will check that archive first, out of order. This means that while the game will almost always prefer vanilla archives over mod archives, it can occasionally and unpredictably prefer a mod archive over a vanilla archive.
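The effect of the "last archive" shortcut is easier to see in a small simulation. This is a sketch only: the ArchiveList class is hypothetical, archives are modeled as plain sets of path strings rather than hash tables, and loose-file handling is left to the caller.

```python
class ArchiveList:
    def __init__(self, archives):
        # archives: list of (name, set_of_paths), earliest-loaded first
        self.archives = list(archives)
        self.last_hit = None  # the archive that most recently supplied a file

    def find(self, path):
        """Return the name of the archive that supplies `path`, or None."""
        # The remembered archive is checked first, out of load order.
        if self.last_hit is not None:
            name, contents = self.last_hit
            if path in contents:
                return name
        for entry in self.archives:
            name, contents = entry
            if path in contents:
                self.last_hit = entry
                return name
        return None  # the caller would now look for a loose file
```

With a vanilla archive containing "a.dds" and a later mod archive containing both "a.dds" and "b.dds", looking up "a.dds" on a fresh list returns the vanilla archive. Looking up "b.dds" first and then "a.dds" returns the mod archive for both, which is the out-of-order behavior described above.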
When the game wishes to check an archive for a given file, it does so by generating 64-bit hashes of the folder name and file name. It then provides these hashes and the original file path to the archive. If the archive contains the desired file, it uses the original file path (when one is provided) to check whether the file is overridden by a newer loose file; if so, the archive pretends not to have the desired file, and the search for the file continues. Archives remember which archived files have had a loose file check, so redundant checks won't be run the next time that archived file is accessed.
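The per-archive check can be sketched as follows. The Archive class, the opaque key standing in for the pair of hashes, and the injected loose_mtime callback are all hypothetical. The source doesn't say whether a remembered check result also applies to lookups made without a path; this sketch assumes it doesn't.

```python
class Archive:
    def __init__(self, modified, keys, loose_mtime):
        self.modified = modified        # the archive's Date Modified, as a timestamp
        self.keys = set(keys)           # stand-in for (folder hash, file hash) pairs
        self.loose_mtime = loose_mtime  # callback: path -> timestamp, or None if absent
        self.overridden = {}            # key -> result of an earlier loose file check

    def provides(self, key, path=None):
        """Return True if the archive should supply the file."""
        if key not in self.keys:
            return False
        if path is None:
            # Without a path there is nothing to compare against on disk.
            return True
        if key not in self.overridden:
            mtime = self.loose_mtime(path)
            # A loose file wins only if it is newer than the archive itself.
            self.overridden[key] = mtime is not None and mtime > self.modified
        # If overridden, the archive pretends not to have the file.
        return not self.overridden[key]
```

Passing the timestamp lookup in as a callback keeps the sketch independent of the real filesystem; in practice it might wrap something like os.path.getmtime.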
As such, mods that replace vanilla assets cannot package the replacement assets in a BSA, and the replacement assets must be given a Date Modified that is newer than the Date Modified of the vanilla archives. The latter is an especially notable obstacle, because Steam sets the Date Modified of the vanilla archives to the date the game was installed.
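One possible way to meet the timestamp requirement is to bump the loose file's modification time past the archive's. The sketch below is an illustration rather than a recommended tool: the function name ensure_newer is hypothetical, and the one-second margin is an arbitrary assumption.

```python
import os


def ensure_newer(loose_path: str, archive_path: str) -> bool:
    """Make the loose file's Date Modified newer than the archive's.

    Returns True if the timestamp was changed, False if it was already newer.
    """
    archive_mtime = os.stat(archive_path).st_mtime
    loose_stat = os.stat(loose_path)
    if loose_stat.st_mtime > archive_mtime:
        return False
    # Assumption: one second past the archive is enough of a margin.
    os.utime(loose_path, (loose_stat.st_atime, archive_mtime + 1.0))
    return True
```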
Archive invalidation
"Invalidation" means that an archive pretends that a given archived file doesn't exist.
On startup, the game checks for archive invalidation information in a text file (ArchiveInvalidation.txt by default). If this file exists, the game parses it line by line. Lines that contain backslashes are treated as folder paths: the game creates 64-bit hashes of these paths and stores them in a list of folder hashes to invalidate. Lines that do not contain backslashes are treated as filenames: the game creates 64-bit hashes of these filenames and stores them in a list of file hashes to invalidate.
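The parsing rule can be sketched like this. The hash is a stand-in (BLAKE2b truncated to 64 bits), not the game's real hash, and trimming whitespace and skipping blank lines are assumptions that the description above doesn't confirm.

```python
import hashlib


def stand_in_hash(name: str) -> int:
    # Assumption: NOT the real BSA hash, just any deterministic 64-bit hash.
    digest = hashlib.blake2b(name.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def parse_invalidation(text: str):
    """Return (folder hashes, file hashes) parsed from the file's text."""
    folder_hashes, file_hashes = [], []
    for line in text.splitlines():
        entry = line.strip()  # assumption: surrounding whitespace is ignored
        if not entry:
            continue          # assumption: blank lines are skipped
        if "\\" in entry:
            folder_hashes.append(stand_in_hash(entry))  # treated as a folder path
        else:
            file_hashes.append(stand_in_hash(entry))    # treated as a filename
    return folder_hashes, file_hashes
```

Note that under this rule a line such as "textures\armor\helmet.dds" is hashed as a folder path, because it contains backslashes.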
Archives perform invalidation when they load. If an archive is flagged as retaining folder and file names in memory, then it checks all of its archived folders. If a folder's hash matches anything in the list of folder hashes to invalidate, then all files in the folder are invalidated. If any file's hash matches anything in the list of file hashes to invalidate, then that file is invalidated.
If an archive isn't flagged as retaining folder and file names in memory, then it does not perform ArchiveInvalidation. Instead, it loops over every loose folder and file, and invalidates any archived files that would be overridden by a loose file.
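The two load-time branches can be sketched together. All names are hypothetical: an archive is modeled as a dict from folder hash to a set of file hashes, and loose_overrides is assumed to be a precomputed set of (folder hash, file hash) pairs for loose files that would override an archived file.

```python
def invalidate_on_load(folders, retains_names, folder_hashes, file_hashes, loose_overrides):
    """Return the set of (folder hash, file hash) pairs that get invalidated."""
    invalidated = set()
    if retains_names:
        # Apply the hash lists built from the invalidation text file.
        for folder_hash, files in folders.items():
            whole_folder = folder_hash in folder_hashes
            for file_hash in files:
                if whole_folder or file_hash in file_hashes:
                    invalidated.add((folder_hash, file_hash))
    else:
        # Ignore the hash lists; only loose files on disk matter.
        for folder_hash, file_hash in loose_overrides:
            if file_hash in folders.get(folder_hash, ()):
                invalidated.add((folder_hash, file_hash))
    return invalidated
```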
BSA redirection
The community has long believed that BSA redirection works around bugs in ArchiveInvalidation. This is incorrect; ArchiveInvalidation is a totally separate system.
When the game is loading archives, it keeps track of the first archive to load for each filetype that archives are expected to contain — the first mesh archive, the first texture archive, the first sound archive, and so on. Typically these will be the game's vanilla BSAs. There are a number of file lookups in the game's code that check these archives exclusively; these lookups will always prefer vanilla archived files over modded archived files. If these lookups fail, the game falls back to a normal file lookup.
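The bookkeeping described above amounts to recording the first archive seen for each filetype. A minimal sketch, with a hypothetical function name and archives modeled as (name, set of filetypes) pairs in load order:

```python
def first_archive_per_type(archives):
    """Map each filetype to the first-loaded archive flagged as containing it."""
    first = {}
    for name, filetypes in archives:
        for filetype in filetypes:
            # setdefault keeps the earliest archive and ignores later ones.
            first.setdefault(filetype, name)
    return first
```

For a load order of a meshes archive, then a textures archive, then a mod archive flagged for both, the mod archive is never recorded as the first archive for either type.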
Forms that have the MODT, MO2T, MO3T, MO4T, or NIFT subrecords are a worse situation, however. These subrecords define 64-bit folder and file name hashes, and are apparently used to pre-load textures for in-game objects. The subrecords don't contain file paths, which means that when the game uses them to search for texture files, it can't check for loose files. As a result, loose texture files very frequently fail to override archived texture files, regardless of their Date Modified.
BSA redirection works by slipping an empty BSA file ahead of all other BSAs that the game loads, and flagging this BSA file as containing textures. This means that the first texture BSA to load is empty, and so all lookups that target the first loaded texture BSA exclusively (including lookups triggered by MODT and friends) will fail and fall back to normal file lookups. This stops the first real texture BSA from taking priority over all other BSAs (to no practical effect, since the order in which Oblivion searches BSAs already gives it that priority), and it fixes the issue of loose texture files failing to override archived texture files even when the loose files have a more recent Date Modified.
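The effect of redirection can be reproduced in a small simulation. Everything here is a simplified sketch: archives are (name, is_texture, set of paths) tuples in load order, newer_loose is assumed to be the set of loose files that are newer than every archive, and the function names are hypothetical.

```python
def first_texture_archive(archives):
    """Return the first-loaded archive flagged as containing textures."""
    for archive in archives:
        if archive[1]:
            return archive
    return None


def normal_lookup(path, archives, newer_loose):
    """Ordinary lookup: archives in load order, skipping overridden files."""
    for name, _is_texture, files in archives:
        if path in files and path not in newer_loose:
            return name
    return "loose" if path in newer_loose else None


def hash_only_lookup(path, archives, newer_loose):
    """Lookup as triggered by MODT and friends: no loose file check at first."""
    first = first_texture_archive(archives)
    if first is not None and path in first[2]:
        return first[0]  # archived copy wins even if a newer loose file exists
    return normal_lookup(path, archives, newer_loose)
```

With only a vanilla texture archive loaded, hash_only_lookup returns the archived copy even when a newer loose file exists. Placing an empty archive flagged as textures at the front of the list makes the exclusive lookup fail, so the same call falls through to normal_lookup and the loose file is used.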