Reverse Engineering Adventures: Honkai Impact 3rd (IDA Decompiler Techniques) (Part 2)

January 19, 2021

This is a continuation of the Reverse Engineering Adventures: Honkai Impact 3rd mini-series – read part 1 first! In this article, we’ll look at comparative data deobfuscation and how to work with the IDA decompiler.

Recap

When we left off our previous exploits, we had peeled off the first layer of encryption from global-metadata.dat and found the call site which calls the decryption function. This turned out to correspond to il2cpp::vm::MetadataLoader::LoadMetadataFile from the IL2CPP source code, with an added line of code to invoke the decryption.

We can’t load the metadata file into Il2CppInspector yet though, because the header does not conform to the expected format. Extra – potentially still encrypted – data is present, and the header length is 0x158 rather than 0x110 bytes, which means that the locations of some or all of the header fields have been changed. Additionally, while most of the rest of the file looks normal, there are no string literals – which are normally present in global-metadata.dat – and there is a large block of presumably encrypted data right after the header.

global-metadata.dat starts with a header which is a struct called Il2CppGlobalMetadataHeader that contains a list of offsets and lengths for all of the tables in the file. The standard header corresponding to Honkai Impact’s Unity version can be found in the IL2CPP source code at libil2cpp/il2cpp-metadata.h and looks like this (comments taken directly from the source code):

typedef struct Il2CppGlobalMetadataHeader
{
    int32_t sanity;
    int32_t version;
    int32_t stringLiteralOffset; // string data for managed code
    int32_t stringLiteralCount;
    int32_t stringLiteralDataOffset;
    int32_t stringLiteralDataCount;
    int32_t stringOffset; // string data for metadata
    int32_t stringCount;
    int32_t eventsOffset; // Il2CppEventDefinition
    int32_t eventsCount;
    int32_t propertiesOffset; // Il2CppPropertyDefinition
    int32_t propertiesCount;
    int32_t methodsOffset; // Il2CppMethodDefinition
    int32_t methodsCount;
    int32_t parameterDefaultValuesOffset; // Il2CppParameterDefaultValue
    int32_t parameterDefaultValuesCount;
    int32_t fieldDefaultValuesOffset; // Il2CppFieldDefaultValue
    int32_t fieldDefaultValuesCount;
    int32_t fieldAndParameterDefaultValueDataOffset; // uint8_t
    int32_t fieldAndParameterDefaultValueDataCount;
    int32_t fieldMarshaledSizesOffset; // Il2CppFieldMarshaledSize
    int32_t fieldMarshaledSizesCount;
    int32_t parametersOffset; // Il2CppParameterDefinition
    int32_t parametersCount;
    int32_t fieldsOffset; // Il2CppFieldDefinition
    int32_t fieldsCount;
    int32_t genericParametersOffset; // Il2CppGenericParameter
    int32_t genericParametersCount;
    int32_t genericParameterConstraintsOffset; // TypeIndex
    int32_t genericParameterConstraintsCount;
    int32_t genericContainersOffset; // Il2CppGenericContainer
    int32_t genericContainersCount;
    int32_t nestedTypesOffset; // TypeDefinitionIndex
    int32_t nestedTypesCount;
    int32_t interfacesOffset; // TypeIndex
    int32_t interfacesCount;
    int32_t vtableMethodsOffset; // EncodedMethodIndex
    int32_t vtableMethodsCount;
    int32_t interfaceOffsetsOffset; // Il2CppInterfaceOffsetPair
    int32_t interfaceOffsetsCount;
    int32_t typeDefinitionsOffset; // Il2CppTypeDefinition
    int32_t typeDefinitionsCount;
    int32_t rgctxEntriesOffset; // Il2CppRGCTXDefinition
    int32_t rgctxEntriesCount;
    int32_t imagesOffset; // Il2CppImageDefinition
    int32_t imagesCount;
    int32_t assembliesOffset; // Il2CppAssemblyDefinition
    int32_t assembliesCount;
    int32_t metadataUsageListsOffset; // Il2CppMetadataUsageList
    int32_t metadataUsageListsCount;
    int32_t metadataUsagePairsOffset; // Il2CppMetadataUsagePair
    int32_t metadataUsagePairsCount;
    int32_t fieldRefsOffset; // Il2CppFieldRef
    int32_t fieldRefsCount;
    int32_t referencedAssembliesOffset; // int32_t
    int32_t referencedAssembliesCount;
    int32_t attributesInfoOffset; // Il2CppCustomAttributeTypeRange
    int32_t attributesInfoCount;
    int32_t attributeTypesOffset; // TypeIndex
    int32_t attributeTypesCount;
    int32_t unresolvedVirtualCallParameterTypesOffset; // TypeIndex
    int32_t unresolvedVirtualCallParameterTypesCount;
    int32_t unresolvedVirtualCallParameterRangesOffset; // Il2CppRange
    int32_t unresolvedVirtualCallParameterRangesCount;
    int32_t windowsRuntimeTypeNamesOffset; // Il2CppWindowsRuntimeTypeNamePair
    int32_t windowsRuntimeTypeNamesSize;
    int32_t exportedTypeDefinitionsOffset; // TypeDefinitionIndex
    int32_t exportedTypeDefinitionsCount;
} Il2CppGlobalMetadataHeader;

The precise meanings of all these tables don’t matter for our purposes, but to enable the metadata file to be loaded by Il2CppInspector, we either need to construct a new Il2CppGlobalMetadataHeader struct whose layout matches that of the file, or rewrite the file’s header to match the original header layout. Each method has pros and cons, but in this case it is much easier to just edit the struct and leave the file alone, and you should generally prefer non-destructive techniques where possible. We don’t know what the extra 0x48 bytes of data are yet and we might need them later.

How do we determine the correct ordering? There are two main ways, and they both suck:

  1. Compare the tables in our metadata file with those in the one we created for the empty project: work through each table in the obfuscated file, look for clustered patterns of similar data in the empty project metadata file, correlate the file location against the table list in the empty project metadata header to see which table it is, and add it to the struct. The number of cross-comparisons can be cut down by also referring to the IL2CPP metadata struct definitions (see below)
  2. Reverse engineer every IL2CPP function in the game assembly that uses a previously unread part of the metadata file to determine what file offsets it uses, and correlate it with the publicly available IL2CPP source code to see which table it is (if necessary)

Yikes. Luckily, having the IL2CPP library source code available plus the ability to generate arbitrary metadata files on demand with Unity makes our task much easier, but either approach will be time-consuming and error-prone.

In this article I’m going to focus on the second approach, but let’s first find one table using the first technique by way of example.

Tip: If you are analyzing workloads in Il2CppInspector with customized struct layouts, there is no need to edit the source code. You can create a plugin that defines the custom struct and call BinaryObjectStream.AddObjectMapping(typeof(Il2CppGlobalMetadataHeader), typeof(MyCustomizedIl2CppGlobalMetadataHeader)) to replace all uses of the original struct with your customized version. This can be done for any IL2CPP struct – not just the metadata header – in both global-metadata.dat and the application binary. See the miHoYo loader plugin source code for an example. The customized structs are defined in MappedTypes.cs.

Here is some toothpaste, put it back in the tube

In part 1 I listed all of the discovered table offsets. Let’s pick one at random to explore this technique – I’ll take the table beginning at 0x15C0D8C. It starts like this:

and ends like this:

First we need to make some observations. Each table entry is likely to be 16 (0x10) bytes; we can see this because the layout seems to repeat every 16 bytes. If you struggle to see this, try to imagine in your mind’s eye that each entry is four 4-byte integers named a, b, c and d, then – remembering that the data is stored little-endian – you can see that table[0].b == 0x08000001, table[1].b == 0x08000002 etc., while table[0].d == table[1].d == table[2].d == 0x0B etc. That is not to say the data is actually 4-byte integers – b here might be an 8-byte number for example – but we’re not trying to understand the data format per se. The point is that each “b” seems to contain a sequential value (related data), and each “d” is often 0x0B (again related data), which gives us confidence that 16 bytes is the size of each entry. Also, the actual table size is divisible by 16, and obviously the total size must be exactly divisible by the size of one entry.
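The "repeats every 16 bytes" observation can also be checked mechanically rather than by eye. The following is a minimal sketch, assuming the table bytes have already been read into memory; read_le32 and looks_sequential are hypothetical helper names of my own, not part of any tool, and a real table may only be mostly sequential rather than strictly so.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 32-bit value from a byte buffer. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 if the 32-bit value at field_off inside each entry goes up by
   exactly 1 from one entry to the next, for a guessed entry size (stride).
   Returns 0 if the guess does not fit the table at all. */
static int looks_sequential(const uint8_t *table, size_t len,
                            size_t stride, size_t field_off)
{
    assert(table != NULL);
    if (stride == 0 || field_off + 4 > stride ||
        len % stride != 0 || len < 2 * stride)
        return 0;
    for (size_t off = stride; off < len; off += stride) {
        uint32_t prev = read_le32(table + off - stride + field_off);
        uint32_t cur  = read_le32(table + off + field_off);
        if (cur != prev + 1)
            return 0;
    }
    return 1;
}
```

Calling looks_sequential with a stride of 16 and a field offset of 4 tests the "b" column described above; trying a few other strides is a cheap way to rule out wrong entry sizes.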

Let’s consult Il2CppGlobalMetadataHeader and see which referenced structs are 16 bytes long. Unity Technologies very kindly commented the header struct with the names of all the structs used in each table, as you saw above. All of these structs are in the same file il2cpp-metadata.h, and virtually every item in each struct is either an int32_t or typedef’d to one, so given that int32_t is four bytes long, we just need to pick out any struct that contains four fields. Here is what we find:

typedef struct Il2CppFieldDefinition
{
    StringIndex nameIndex;
    TypeIndex typeIndex;
    CustomAttributeIndex customAttributeIndex;
    uint32_t token;
} Il2CppFieldDefinition;

typedef struct Il2CppParameterDefinition
{
    StringIndex nameIndex;
    uint32_t token;
    CustomAttributeIndex customAttributeIndex;
    TypeIndex typeIndex;
} Il2CppParameterDefinition;

Some knowledge of .NET IL metadata goes a long way here, because the token field is a dead giveaway: every IL item (assembly, type, property, method etc.) is given a metadata token when compiled that uniquely identifies it within its scope. The bottom 24 bits are an ID and the top 8 bits identify the token type, which you can see in this abbreviated definition of CorTokenType from the .NET Metadata Unmanaged API Reference:

typedef enum CorTokenType {

    mdtModule                       = 0x00000000,
    mdtTypeRef                      = 0x01000000,
    mdtTypeDef                      = 0x02000000,
    mdtFieldDef                     = 0x04000000,
    mdtMethodDef                    = 0x06000000,
    mdtParamDef                     = 0x08000000,
...

Hold the phone: parameter tokens are always 0x08xxxxxx. The second item in each table entry above also started with 0x08! This perfectly fits the layout of Il2CppParameterDefinition where token is the second item, so we’ve identified this table as the parameter definition table!
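To make the token check concrete, here is a small sketch of splitting a token into its type and ID. The three type values are copied from the CorTokenType excerpt above; the helper names (token_type, token_id, make_token, is_param_token) are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Token type values, as listed in the CorTokenType excerpt. */
enum {
    MDT_FIELD_DEF  = 0x04000000,
    MDT_METHOD_DEF = 0x06000000,
    MDT_PARAM_DEF  = 0x08000000
};

/* Top 8 bits: the token type. */
static uint32_t token_type(uint32_t token) { return token & 0xFF000000u; }

/* Bottom 24 bits: the ID within that type. */
static uint32_t token_id(uint32_t token) { return token & 0x00FFFFFFu; }

/* Combine a type and an ID back into a token. */
static uint32_t make_token(uint32_t type, uint32_t id)
{
    assert((type & 0x00FFFFFFu) == 0 && id <= 0x00FFFFFFu);
    return type | id;
}

/* Returns 1 if the value has the parameter token type in its top byte. */
static int is_param_token(uint32_t token)
{
    return token_type(token) == (uint32_t)MDT_PARAM_DEF;
}
```

Running is_param_token over the second 4-byte value of every 16-byte entry is a quick way to test the hypothesis across the whole table instead of just the first few rows.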

Its file offset should normally be located in Il2CppGlobalMetadataHeader.parametersOffset – which is 0x58 bytes into Il2CppGlobalMetadataHeader – and the actual offset to this table (0x15C0D8C) is found at offset 0xE0 from the start of global-metadata.dat. Since every item in Il2CppGlobalMetadataHeader is a 4-byte integer, we can deduce that for Honkai Impact, parametersOffset should be the (0xE0 / 4) + 1 = 57th item in the header, and fill it in in our new struct:

typedef struct Il2CppGlobalMetadataHeader
{
	int32_t unknown1;
	int32_t unknown2;
	int32_t unknown3;
	int32_t unknown4;
	int32_t unknown5;
	// ....
	int32_t unknown56;
	int32_t parametersOffset;
	int32_t parametersCount;
	int32_t unknown59;
	// ....
	int32_t unknown84;
	int32_t unknown85;
	int32_t unknown86;
} Il2CppGlobalMetadataHeader;

(the header is 0x158 bytes long which is 86 entries)
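The arithmetic behind these positions is simple enough to sketch in a few lines. This assumes, as stated above, that every header field is a 4-byte integer; field_index_1based and field_count are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert a byte offset inside the header into a 1-based field position. */
static size_t field_index_1based(size_t byte_offset)
{
    assert(byte_offset % sizeof(int32_t) == 0);
    return byte_offset / sizeof(int32_t) + 1;
}

/* Number of 4-byte fields in a header of the given size. */
static size_t field_count(size_t header_size)
{
    assert(header_size % sizeof(int32_t) == 0);
    return header_size / sizeof(int32_t);
}
```

With the figures from this article, field_index_1based(0xE0) is 57 and field_count(0x158) is 86, while the standard 0x110-byte header gives field_count(0x110) = 68.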

I’ve also added in parametersCount here, which is quickly verified by taking the length specified in the header after the table offset – 0x3B4AA0 in this case – adding it to the offset and verifying that the table does in fact end at that location.
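That check is just an addition, and the same two header values also give the number of entries. A minimal sketch, assuming the 16-byte entry size deduced earlier; table_end, entry_count and is_valid_index are illustrative names only.

```c
#include <assert.h>
#include <stdint.h>

/* File offset of the first byte after the table. */
static uint32_t table_end(uint32_t offset, uint32_t length)
{
    assert(length <= UINT32_MAX - offset); /* no wrap-around */
    return offset + length;
}

/* Number of entries, given the table length and the size of one entry. */
static uint32_t entry_count(uint32_t length, uint32_t entry_size)
{
    assert(entry_size != 0 && length % entry_size == 0);
    return length / entry_size;
}

/* An index is usable only if it is below the entry count. */
static int is_valid_index(uint32_t index, uint32_t count)
{
    return index < count;
}
```

For the parameter table, table_end(0x15C0D8C, 0x3B4AA0) is 0x197582C and entry_count(0x3B4AA0, 0x10) is 0x3B4AA, so that is where the table should stop in the hex view.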

Excellent! Two down, 84 to go! Now just repeat this for all of the other tables and you’ve reconstructed the entire header 🤮 Cheer up, it could be worse: you could be playing RAID: Shadow Legends.

Tip: If you find yourself in a situation where you have to perform this kind of analysis, note that each table you resolve helps you glean information that can be used to simplify solving the remaining tables. In the example above, we now have a ton of valid indexes for the string table (nameIndex) and the type definition table (typeIndex), plus we know the number of parameters: 0x3B4AA0 / 0x10 = 0x3B4AA, so valid parameter indexes run from 0 to 0x3B4A9 – and all of the valid parameter token values. When we come across other tables that might reference these values, we can cross-check to confirm our hypotheses.

Note: This technique has some important caveats. There may be tables with similar looking data and structs which cannot be easily differentiated. Handle these by coming back to them later when you have resolved more tables, then examine some of the values individually to see if you can cross-reference them with other tables in a way that makes sense.

Another very important caveat is that this technique also assumes the tables have not been obfuscated. We already know that the header struct fields in this application have been reordered, and such reordering – known as data structure layout randomization – is a common form of data obfuscation. We even saw it in our exploration of League of Legends Wild Rift! If you are the victim of this, you won’t be able to reconstruct the tables by correlating the metadata file contents and the IL2CPP source code.

Get busy, child

Enough foreplay, crack open your IDAs.

The natural first place to look for accesses to Il2CppGlobalMetadataHeader is right after the file has been loaded. We already looked at the loader – il2cpp::vm::MetadataLoader::LoadMetadataFile – in part 1, so if we click on the function name and search for cross-references we’ll find the call site, il2cpp::vm::MetadataCache::Initialize. We also navigate directly to this in our empty project (which has symbols already).

The test project:

void il2cpp::vm::MetadataCache::Initialize(void)
{
  // [COLLAPSED LOCAL DECLARATIONS. PRESS KEYPAD CTRL-"+" TO EXPAND]

  v0 = (const Il2CppGlobalMetadataHeader *)il2cpp::vm::MetadataLoader::LoadMetadataFile("global-metadata.dat");
  s_GlobalMetadata = (void *)v0;
  s_GlobalMetadataHeader = v0;
  s_TypeInfoTable = (Il2CppClass **)il2cpp::utils::Memory::Calloc(s_Il2CppMetadataRegistration->typesCount, 8ui64);
  s_TypeInfoDefinitionTable = (Il2CppClass **)il2cpp::utils::Memory::Calloc(
                                                s_GlobalMetadataHeader->typeDefinitionsCount / 0x68ui64,
                                                8ui64);
  s_MethodInfoDefinitionTable = (MethodInfo **)il2cpp::utils::Memory::Calloc(
                                                 s_GlobalMetadataHeader->methodsCount / 0x38ui64,
                                                 8ui64);
  s_GenericMethodTable = (const Il2CppGenericMethod **)il2cpp::utils::Memory::Calloc(
                                                         s_Il2CppMetadataRegistration->methodSpecsCount,
                                                         8ui64);
  s_ImagesCount = (unsigned __int64)s_GlobalMetadataHeader->imagesCount >> 5;
  s_ImagesTable = (Il2CppImage *)il2cpp::utils::Memory::Calloc(s_ImagesCount, 0x40ui64);
  s_AssembliesCount = s_GlobalMetadataHeader->assembliesCount / 0x44ui64;
  s_AssembliesTable = (Il2CppAssembly *)il2cpp::utils::Memory::Calloc(s_AssembliesCount, 0x60ui64);
  v1 = s_GlobalMetadataHeader;
  v2 = (char *)s_GlobalMetadata;
  v3 = (int *)((char *)s_GlobalMetadata + s_GlobalMetadataHeader->imagesOffset);
  v4 = 0;
  v5 = 0;
  v6 = s_ImagesCount;
  if ( s_ImagesCount > 0 )
// ...

Honkai Impact:

void sub_7FFF41E81660()
{
  // [COLLAPSED LOCAL DECLARATIONS. PRESS KEYPAD CTRL-"+" TO EXPAND]

  qword_7FFF43D74AD0 = il2cpp::vm::MetadataLoader::LoadMetadataFromFile();
  qword_7FFF43D74AD8 = qword_7FFF43D74AD0;
  v0 = (qword_7FFF43D74AD0 + *(qword_7FFF43D74AD0 + 0x78i64));
  v46 = (qword_7FFF43D74AD0 + *(qword_7FFF43D74AD0 + 0x78i64));
  v1 = 0;
  v2 = 0;
  if ( *(qword_7FFF43D74AD0 + 0x7Ci64) / 68ui64 )
  {
    v3 = 0i64;
    do
    {
      sub_7FFF41ECDEF0(&v0[17 * v3]);
      v3 = ++v2;
    }
    while ( v2 < *(qword_7FFF43D74AD8 + 124) / 68ui64 );
  }
  qword_7FFF43D747E8 = sub_7FFF41EE0390(*(qword_7FFF43D74A38 + 48), 8i64);
  qword_7FFF43D747F0 = sub_7FFF41EE0390(*(qword_7FFF43D74AD8 + 84) / 0x68ui64, 8i64);
  qword_7FFF43D747F8 = sub_7FFF41EE0390(*(qword_7FFF43D74AD8 + 300) >> 6, 8i64);
  qword_7FFF43D74808 = sub_7FFF41EE0390(*(qword_7FFF43D74A38 + 64), 8i64);
  dword_7FFF43D74810 = *(qword_7FFF43D74AD8 + 0x74) >> 5;
  v5 = sub_7FFF41EE0390(dword_7FFF43D74810, 0x38i64);
  qword_7FFF43D74818 = v5;
  v8 = qword_7FFF43D74AD0;
  v9 = (qword_7FFF43D74AD0 + *(qword_7FFF43D74AD8 + 0x70));
  v10 = 0;
  if ( dword_7FFF43D74810 > 0 )
// ...

Clearly, by comparing these we can quickly pick off some low-hanging fruit and rename some obvious symbols, like this:

void il2cpp::vm::MetadataCache::Initialize()
{
  // [COLLAPSED LOCAL DECLARATIONS. PRESS KEYPAD CTRL-"+" TO EXPAND]

  s_GlobalMetadata = il2cpp::vm::MetadataLoader::LoadMetadataFromFile();
  s_GlobalMetadataHeader = s_GlobalMetadata;
  v0 = (s_GlobalMetadata + *(s_GlobalMetadata + 0x78i64));
  v46 = (s_GlobalMetadata + *(s_GlobalMetadata + 0x78i64));
  v1 = 0;
  v2 = 0;
  if ( *(s_GlobalMetadata + 0x7Ci64) / 68ui64 )
  {
    v3 = 0i64;
    do
    {
      sub_7FFF41ECDEF0(&v0[17 * v3]);
      v3 = ++v2;
    }
    while ( v2 < *(s_GlobalMetadataHeader + 124) / 68ui64 );
  }
  qword_7FFF43D747E8 = il2cpp::utils::Memory::Calloc(*(qword_7FFF43D74A38 + 48), 8i64);
  qword_7FFF43D747F0 = il2cpp::utils::Memory::Calloc(*(s_GlobalMetadataHeader + 84) / 0x68ui64, 8i64);
  qword_7FFF43D747F8 = il2cpp::utils::Memory::Calloc(*(s_GlobalMetadataHeader + 300) >> 6, 8i64);
  qword_7FFF43D74808 = il2cpp::utils::Memory::Calloc(*(qword_7FFF43D74A38 + 64), 8i64);
  s_ImagesCount = *(s_GlobalMetadataHeader + 0x74) >> 5;
  local_ImagesTable = il2cpp::utils::Memory::Calloc(s_ImagesCount, 0x38i64);
  s_ImagesTable = local_ImagesTable;
  v8 = s_GlobalMetadata;
  v9 = (s_GlobalMetadata + *(s_GlobalMetadataHeader + 0x70));
  v10 = 0;
  if ( s_ImagesCount > 0 )

Right now all of the fields in s_GlobalMetadataHeader (the static location of Il2CppGlobalMetadataHeader) are referenced as pointer offsets. To progress further we should create an Il2CppGlobalMetadataHeader struct of our own and assign it as s_GlobalMetadataHeader’s type.

You can use IDA’s struct editor to do this but it’s fiddly. An easier way is to just paste in a C type declaration in the Local Types window. We create an initial struct with the following definition:

typedef struct Il2CppGlobalMetadataHeader
{
	int32_t unknown00;
	int32_t unknown04;
	int32_t unknown08;
	int32_t unknown0C;
        // ...
	int32_t unknown14C;
	int32_t unknown150;
	int32_t unknown154;
} Il2CppGlobalMetadataHeader;

We then return to the decompilation and assign the type Il2CppGlobalMetadataHeader * to both s_GlobalMetadata and s_GlobalMetadataHeader. Make sure you include the “*” pointer declarator!

Now our code starts to look more readable:

void il2cpp::vm::MetadataCache::Initialize()
{
  // [COLLAPSED LOCAL DECLARATIONS. PRESS KEYPAD CTRL-"+" TO EXPAND]

  s_GlobalMetadata = il2cpp::vm::MetadataLoader::LoadMetadataFromFile();
  s_GlobalMetadataHeader = s_GlobalMetadata;
  v0 = (&s_GlobalMetadata->unknown00 + s_GlobalMetadata->unknown78);
  v46 = s_GlobalMetadata + s_GlobalMetadata->unknown78;
  v1 = 0;
  v2 = 0;
  if ( s_GlobalMetadata->unknown7C / 68ui64 )
  {
    v3 = 0i64;
    do
    {
      sub_7FFF41ECDEF0(&v0[17 * v3]);
      v3 = ++v2;
    }
    while ( v2 < s_GlobalMetadataHeader->unknown7C / 68ui64 );
  }
  qword_7FFF43D747E8 = il2cpp::utils::Memory::Calloc(*(qword_7FFF43D74A38 + 48), 8i64);
  qword_7FFF43D747F0 = il2cpp::utils::Memory::Calloc(s_GlobalMetadataHeader->unknown54 / 0x68ui64, 8i64);
  qword_7FFF43D747F8 = il2cpp::utils::Memory::Calloc(s_GlobalMetadataHeader->unknown12C >> 6, 8i64);
  qword_7FFF43D74808 = il2cpp::utils::Memory::Calloc(*(qword_7FFF43D74A38 + 64), 8i64);
  s_ImagesCount = s_GlobalMetadataHeader->unknown74 >> 5;
  local_ImagesTable = il2cpp::utils::Memory::Calloc(s_ImagesCount, 0x38i64);
  s_ImagesTable = local_ImagesTable;
  v8 = s_GlobalMetadata;
  v9 = (s_GlobalMetadata + s_GlobalMetadataHeader->unknown70);
  v10 = 0;
  if ( s_ImagesCount > 0 )

We can now start renaming the struct fields one by one as we discover their meanings. In the code above, for example:

s_ImagesCount = s_GlobalMetadataHeader->unknown74 >> 5;

We can see both by looking at the name of the assigned variable and at the empty project code that unknown74 is really imagesCount. We rename this struct field – which we can do directly from the decompilation by clicking on the field symbol – and continue working our way through the function. There is no need to understand every line, and things that don’t match the empty project code can be ignored for now. Look for lines of code that are identical in both DLLs, or places where the table lengths are divided or right-shifted to get the item counts, where the code in both DLLs divides by the same amount (indicating a matching table entry size).

Sometimes you might want to redefine an item as an array, for example to change this:

*(v11 + local_ImagesTable) = v12;

into this:

local_ImagesTable[v11] = v12;

To do so, just change the variable type (local_ImagesTable in this case) to a pointer.
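In plain C the two spellings are the same operation once the variable really is a pointer to the element type. The sketch below uses a void * table as a stand-in, with made-up function names. Note that while IDA still types the variable as a plain integer, the addition is in bytes, which is why retyping it changes what the decompiler prints.

```c
#include <assert.h>
#include <stddef.h>

/* The form the decompiler showed: pointer addition, then dereference. */
static void store_by_addition(void **table, size_t index, void *value)
{
    assert(table != NULL);
    *(index + table) = value;
}

/* The same store written as an array subscript. */
static void store_by_subscript(void **table, size_t index, void *value)
{
    assert(table != NULL);
    table[index] = value;
}
```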

Don’t try to analyze the code line by line in order. It’s often easier to skip around picking off obvious things one by one, and then when you come back to earlier code the renamed symbols will make it easier to understand.

When it comes to divisions, remember that the compiler will often replace a division with a right shift when the divisor is a power of 2. For unsigned values, shifting right by 5 (>> 5) is the same as dividing by 32 (0x20), shifting right by 6 (>> 6) is the same as dividing by 64 (0x40) and so on. The compiler performs this optimization because bit-shift instructions execute much faster than divide instructions at the CPU level.
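If you want to convince yourself of the equivalence, here is a tiny sketch comparing the two forms on unsigned values. The function names are illustrative. Signed division is a different story: the compiler adds a correction for negative numbers, so a bare shift is only a drop-in replacement in the unsigned case.

```c
#include <assert.h>
#include <stdint.h>

/* Entry count written as a division: length / entry size. */
static uint32_t count_by_division(uint32_t length_bytes, uint32_t entry_size)
{
    assert(entry_size != 0);
    return length_bytes / entry_size;
}

/* Entry count as an optimizing compiler may emit it when the entry size
   is 1 << shift: a logical right shift. */
static uint32_t count_by_shift(uint32_t length_bytes, unsigned shift)
{
    assert(shift < 32);
    return length_bytes >> shift;
}
```

So a `>> 6` in one binary and a `/ 0x40ui64` in the other describe the same entry size, and can be matched up just like two identical divisors.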

An address-of operator applied to the first field of a struct, e.g. &s_GlobalMetadata->unknown00, is equivalent to s_GlobalMetadata (it takes the address of the first member of the struct, which is the address of the struct).
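A quick way to see this is with a toy struct. ExampleHeader below is a hypothetical two-field stand-in for the real header, and the function name is made up. C places the first member at offset 0, so the two addresses always compare equal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical two-field stand-in for the real header. */
typedef struct ExampleHeader {
    int32_t unknown00;
    int32_t unknown04;
} ExampleHeader;

/* Returns 1 when the first member sits at offset 0 and its address is
   the same as the address of the struct itself. */
static int first_field_is_struct_address(const ExampleHeader *h)
{
    assert(h != NULL);
    return offsetof(ExampleHeader, unknown00) == 0 &&
           (const void *)&h->unknown00 == (const void *)h;
}
```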

Not everything will be in the same place, and this is highly dependent on compiler optimizations such as inlining and holding temporary placeholder variables. Consider this code on line 16 of the test project decompilation:

s_AssembliesCount = s_GlobalMetadataHeader->assembliesCount / 0x44ui64;

This does not exist in Honkai Impact’s decompilation. However, scrolling down a hundred lines or so in the empty project reveals:

if ( v6 > 0 )
  {
    v19 = 0i64;
    v20 = &v2[v1->assembliesOffset + 24];
    while ( 1 )
    {
      v21 = &s_AssembliesTable[v19];
      v22 = *(v20 - 6);
      v23 = v22 == -1 ? 0i64 : &s_ImagesTable[v22];
      v21->image = v23;
// ...

In Honkai Impact:

for ( i = v38; v21 < v18->unknown7C / 0x44ui64; v46 += 68 )
  {
    v23 = *v0;
    if ( v23 == -1 )
      v24 = 0i64;
    else
      v24 = s_ImagesTable + 56 * v23;
    for ( j = 0i64; j < *(v24 + 24); ++j )
    {

Although this code looks very different, the key is in the division by 0x44. In the test code, the dividend field refers to assembliesCount, and so that is likely to be the case for unknown7C in Honkai Impact.

If you’re not sure about the meaning of a field, give it a name anyway, but prefix it with something like maybe. This will help you with more named symbols as you look at other code, and you may be able to find out that it’s wrong further down the line, rather than just staring at v150 and not remembering where you last saw it.

Bear in mind that data obfuscation may periodically lead to a situation where a table entry in the test code has a different size to that in the target code (meaning that the divisor will be different), including potentially the same size as an entry in a different table. Be mindful of this if a particular table doesn’t seem to make sense.

If you find a for or while loop that iterates over a set of items (like the entries in a table or array), name the loop counter with an index-style suffix and find the symbol used in conjunction with it to retrieve an item (as an array index or pointer addition) – this is the collection that is being iterated. Oftentimes the decompiler will incorrectly produce a for, while or do loop in place of one of the other loop types. Look for counters initialized immediately before a while or do block – they will usually look like this:

v31 = 0i64;
do
{

The counter – v31 here – will then typically be referenced in the loop block (although not always).
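As an illustration of how the same loop can surface in two shapes, the sketch below sums an array once with a for loop and once with the guarded do/while pattern that decompilers tend to print. Both functions are invented for this example and are intended to behave identically.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Source-level shape: a plain for loop over n entries. */
static uint32_t sum_for(const uint32_t *items, size_t n)
{
    uint32_t total = 0;
    assert(items != NULL || n == 0);
    for (size_t i = 0; i < n; ++i)
        total += items[i];
    return total;
}

/* Decompiler-style shape: a guard, a counter set just before the block,
   then a do/while that tests the counter at the bottom. */
static uint32_t sum_guarded_do(const uint32_t *items, size_t n)
{
    uint32_t total = 0;
    assert(items != NULL || n == 0);
    if (n > 0) {
        size_t i = 0;
        do {
            total += items[i];
            ++i;
        } while (i < n);
    }
    return total;
}
```

When you spot the second shape, it is usually safe to read it as the first: the counter is the index, and whatever it indexes is the collection.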

Sometimes, you might just need to roll the dice and take a guess. Consider this code in Honkai Impact:

  qword_7FFF43D747E8 = il2cpp::utils::Memory::Calloc(*(qword_7FFF43D74A38 + 0x30), 8i64);
  s_TypeInfoDefinitionTable = il2cpp::utils::Memory::Calloc(
                                s_GlobalMetadataHeader->typeDefinitionsCount / 0x68ui64,
                                8i64);
  s_MethodInfoDefinitionTable = il2cpp::utils::Memory::Calloc(s_GlobalMetadataHeader->methodsCount >> 6, 8i64);
  qword_7FFF43D74808 = il2cpp::utils::Memory::Calloc(*(qword_7FFF43D74A38 + 64), 8i64);

We have deduced some of the symbols here but not all of them. The corresponding code in the test project is:

  s_TypeInfoTable = il2cpp::utils::Memory::Calloc(s_Il2CppMetadataRegistration->typesCount, 8ui64);
  s_TypeInfoDefinitionTable = il2cpp::utils::Memory::Calloc(
                                s_GlobalMetadataHeader->typeDefinitionsCount / 0x68ui64,
                                8ui64);
  s_MethodInfoDefinitionTable = il2cpp::utils::Memory::Calloc(s_GlobalMetadataHeader->methodsCount / 0x38ui64, 8ui64);
  s_GenericMethodTable = il2cpp::utils::Memory::Calloc(s_Il2CppMetadataRegistration->methodSpecsCount, 8ui64);

The memory allocations in the first and last lines of Honkai Impact don’t make any sense to us, but since the order of the known allocations is equivalent, we might be able to guess that the two unlabeled qwords are s_TypeInfoTable and s_GenericMethodTable and name them, even though we can’t name anything in Il2CppGlobalMetadataHeader here and can’t be entirely sure. This is a good place to put some maybe-prefixed variable names.

When we are presented with code such as this:

v12 = qword_7FFF43D74F88(v8, *v9);
local_ImagesTable[imageIndex] = v12;

where a function in another DLL or via a function pointer is being called (you can tell this because it calls into a qword value), but the resulting assignment is obvious, we just rename the function without worrying about what it does, like this:

pImage = pGetImage(v8, *v9);
local_ImagesTable[imageIndex] = pImage;

If we need to, we can come back to it later.

It is very unlikely you’ll be able to derive every symbol in the function, nor should you try. Pick off what’s easy and move on: there is plenty of code left in the application to analyze! Eventually you’ll come unstuck and won’t be able to rename any more fields. At this point, we have to start delving into other parts of the application – which means we need to find them. How?

Tip: Not every field is used by automated IL2CPP tools. At the time of writing, Il2CppInspector ignores unresolvedVirtualCallParameterTypesOffset and various other fields, so there is no need to figure out their offsets. You will of course need to refer to the source code of your preferred tool to find out what you can skip. Don’t do unnecessary work!

I’ll give you a good reference

We can find all of the functions that use the metadata header by simply navigating to the static variables:

.data:00007FFF43D74AD0 ; Il2CppGlobalMetadataHeader *s_GlobalMetadata
.data:00007FFF43D74AD0 s_GlobalMetadata dq ?
.data:00007FFF43D74AD8 ; Il2CppGlobalMetadataHeader *s_GlobalMetadataHeader
.data:00007FFF43D74AD8 s_GlobalMetadataHeader Il2CppGlobalMetadataHeader <?>

and using List cross references to produce a list of every function that references these addresses. There will be a lot of results since the metadata is used by many functions in the IL2CPP source code:

Some of these functions will be trivial to reverse engineer. Others will be horrifying. Start with the simple ones. Here is a nice example:

char *__fastcall sub_7FFF41E80080(int a1)
{
  return s_GlobalMetadata + 28 * a1 + s_GlobalMetadataHeader->unknown118;
}

This tells us that unknown118 is an offset into an array, of which each entry uses 28 (0x1C) bytes – or a size of 7 32-bit integers. Referring to il2cpp-metadata.h again, there is only one struct that fits the bill: Il2CppEventDefinition.

Tip: Two other good places to look are the API exports beginning with “il2cpp_” which are often used to fetch metadata, and the IL2CPP source code itself. If you find yourself flailing around clicking randomly, decide which field you want to locate, find all references to it in the IL2CPP source code, then work up through each function’s call hierarchy until you find an API call or some other function that is trivial to find (either standalone or by comparing with the test project), then in the decompiler, work back down the call hierarchy to where the metadata is accessed.

We can improve the decompilation further by importing all of the types from il2cpp-metadata.h so that we can assign them to the header fields. Use File -> Load file -> Parse C header file to do this. You will likely get errors, which you can resolve by temporarily removing the #includes from the top of the file so that only the struct definitions are present. Once you’ve done this, the Local Types window will fill with these structs at the bottom:

Be careful not to import Il2CppGlobalMetadataHeader as it will overwrite your work! (remove it from the file before importing, remembering to keep a backup)

To apply this to the function we just found, you are now able to redefine the function signature as:

Il2CppEventDefinition* GetEvent(int eventIndex)

which will produce the following updated decompilation:

Il2CppEventDefinition *__stdcall GetEvent(int eventIndex)
{
  return (s_GlobalMetadata + 28 * eventIndex + s_GlobalMetadataHeader->eventsOffset);
}

In addition – and this is crucial – any function you decompile which calls this one will now treat the return type as a pointer to an Il2CppEventDefinition. As it happens, this particular function is only called once. Let’s take a look at a snippet of it:

        v10 = GetEvent(v7);
        *(v9 - 8) = sub_7FFF41E806C0(v10->typeIndex);
        *(v9 - 16) = sub_7FFF41E81110(v10->nameIndex);
        *v9 = v2;
        if ( v10->add != -1 )
          *(v9 + 8) = *(*(v2 + 128) + 8i64 * v10->add);
        if ( v10->remove != -1 )
          *(v9 + 16) = *(*(v2 + 128) + 8i64 * v10->remove);
        if ( v10->raise != -1 )
          *(v9 + 24) = *(*(v2 + 128) + 8i64 * v10->raise);
        ++v7;
        *(v9 + 32) = v10->customAttributeIndex;
        *(v9 + 36) = v10->token;

As you can see, all of the fields of the event definition returned from GetEvent are used in the caller’s decompilation.
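To tie the 28-byte entry size to those seven fields, here is a sketch of the same lookup in standalone C. The field names are the ones visible in the snippet above. Their order here is my assumption about the IL2CPP header, and EventDefinitionSketch and read_event are hypothetical names. Only the total size matters for the offset calculation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Seven 4-byte fields = 28 (0x1C) bytes. Field order is assumed. */
typedef struct EventDefinitionSketch {
    int32_t  nameIndex;
    int32_t  typeIndex;
    int32_t  add;
    int32_t  remove;
    int32_t  raise;
    int32_t  customAttributeIndex;
    uint32_t token;
} EventDefinitionSketch;

/* Copy entry event_index out of the table that starts events_offset
   bytes into the metadata blob (base + offset + 28 * index). */
static EventDefinitionSketch read_event(const uint8_t *metadata,
                                        uint32_t events_offset,
                                        uint32_t event_index)
{
    EventDefinitionSketch entry;
    assert(metadata != NULL);
    assert(sizeof entry == 28);
    memcpy(&entry,
           metadata + events_offset + (size_t)event_index * sizeof entry,
           sizeof entry);
    return entry;
}
```

Copying with memcpy instead of casting the pointer is a deliberate choice in this sketch: it sidesteps alignment questions when the blob is just a byte buffer read from disk.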

Cast aside

When an IL2CPP application loads, it takes the majority of the definitions in global-metadata.dat and turns them into runtime objects. These objects provide runtime type information to the application, and are also used to track things like whether a class’s static constructor has been called as well as many other properties.

Let’s take a look at a snippet of the first decompilation of il2cpp::vm::MetadataCache::Initialize() from the test project again:

s_TypeInfoDefinitionTable = (Il2CppClass **)il2cpp::utils::Memory::Calloc(
                                                s_GlobalMetadataHeader->typeDefinitionsCount / 0x68ui64,
                                                8ui64);
s_MethodInfoDefinitionTable = (MethodInfo **)il2cpp::utils::Memory::Calloc(
                                                 s_GlobalMetadataHeader->methodsCount / 0x38ui64,
                                                 8ui64);

Notice that for each Il2CppTypeDefinition in the metadata, one pointer to Il2CppClass is allocated (the header’s count fields hold lengths in bytes, hence the division by the entry size). For each Il2CppMethodDefinition, one pointer to MethodInfo is allocated, and so on. We can also import these runtime types from libil2cpp/il2cpp-class-internals.h, but it is quite a hassle because of all the macros, method definitions, #includes and #defines, all of which IDA hates.
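
To make the allocation pattern concrete, here is a rough C sketch of what those two Calloc calls compute. It assumes, as the divisions by 0x68 and 0x38 suggest, that the header’s count fields are lengths in bytes. The function names are hypothetical and not part of IL2CPP:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: converts a header "count" field (a length in
   bytes) into a number of table entries. */
size_t table_entry_count(int32_t byteLength, size_t entrySize)
{
    assert(byteLength >= 0 && entrySize > 0);
    return (size_t)byteLength / entrySize;
}

/* Allocates one zeroed pointer slot per definition, mirroring the
   Calloc calls above. The caller releases the result with free(). */
void **alloc_runtime_table(int32_t byteLength, size_t entrySize)
{
    return calloc(table_entry_count(byteLength, entrySize), sizeof(void *));
}
```

The slots start out empty and are filled in lazily as the runtime creates each Il2CppClass or MethodInfo.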

Note: Once you have imported types via C headers, you must synchronize them to the idb database before they will be recognized by the IDA decompiler. Right-click the desired types in the Local Types window (you can select more than one at a time; Ctrl+A selects every type) and select Synchronize to idb to do this.

Tip: We have a handy trick for you in Il2CppInspector to help with IL2CPP header imports: in Il2CppInspector/Il2CppInspector.Common/Cpp/UnityHeaders, you can find IDA and Ghidra-compatible header files for every version of Unity, not just the two files we have mentioned so far but also numerous others. We use a script to auto-generate this (source code in the link).

Technically, we don’t need these additional headers, but there is an excellent reason to include them: it prevents us from making really dumb mistakes. Remember earlier on when I said we’d name this function because its meaning was obvious:

pImage = pGetImage(v8, *v9);
local_ImagesTable[imageIndex] = pImage;

After applying the runtime object types, this decompiles as:

pImage = pGetImage(v12, *v13);
*(&local_ImagesTable->name + imageIndex_1) = pImage;

(the last line is equivalent to local_ImagesTable[imageIndex_1].name = pImage)

Why would we be storing an Il2CppImage* in a name field? Let’s re-enable casting (which is actually the default):

pImage = (char *)pGetImage(v12, *v13);
*(const char **)((char *)&local_ImagesTable->name + imageIndex_1) = pImage;

Well this is alarming. Il2CppImage::name is a char * (as you might expect from a string). There are two possibilities: either our assumption about the purpose of pGetImage is incorrect, or there is some data obfuscation going on. Given that pGetImage is an external call, and that we previously detected custom decryption code that the game assembly called into UnityPlayer.dll to access, we may have stumbled across more shenanigans. What does the corresponding test project decompilation look like?

v9 = &v2[local_GlobalMetadataHeader->stringOffset + *v3];
s_ImagesTable[imageindex_1].name = v9;

(v3 in the test project corresponds to v13 in the real application)

Sneaky! The string table has been cunningly replaced with a function call. We’ll come back to this later, but note that we only realize this at this point because we imported the IL2CPP runtime object definitions and saw that an assignment didn’t make sense. The moral of the story is to use as much information as possible to aid in your analysis, even if it doesn’t seem particularly directly relevant.
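
For comparison, the stock lookup shown in the test project decompilation is nothing more than pointer arithmetic into the string table. A minimal C sketch, assuming null-terminated strings addressed by byte offset (get_string_from_index is an illustrative name, not the IL2CPP function itself):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of the stock behaviour: nameIndex is a byte offset from the
   start of the string table, which itself sits stringOffset bytes
   into the metadata file. */
const char *get_string_from_index(const uint8_t *metadata,
                                  int32_t stringOffset,
                                  int32_t nameIndex)
{
    assert(metadata != NULL && stringOffset >= 0 && nameIndex >= 0);
    return (const char *)metadata + stringOffset + nameIndex;
}
```

In Honkai Impact this arithmetic has been replaced by the external call, which is why the typed decompilation suddenly looked wrong.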

Tip: Here are some useful IDA shortcuts used during this exercise:

Go to symbol in disassembly: G, type symbol name, Enter
Rename symbol: N, variable name, Enter
Change symbol type or function signature: Y, type signature, Enter
Convert variable to struct *: right-click variable, select Convert to struct *…
List cross references to symbol or function: place cursor on symbol or function signature, X
Navigate to static data from decompiler: double-click data symbol
Toggle casts: \ (backslash key)
Open local types window: Shift+F1
Create struct from C type definition: Shift+F1, Ins, type in definition, Ctrl+Enter
Parse C header file: Ctrl+F9
Add all imported types to project database: Shift+F1, Ctrl+A, right-click, select Synchronize to idb
Search for text (in decompiler or disassembly): Alt+T
Find next text match: Ctrl+T

I guess I didn’t know

This ball of string takes a couple of days to unravel even when working fairly efficiently, especially when you’re blogging it at the same time 😎 We dart back and forth all over the application, decompiling a vast swathe of functions and comparing constantly to the test project decompilation and the IL2CPP source code for clues. What I have described above is basically a snippet of the entire process, but this text encapsulates most of the techniques at hand and should be all the information needed for the enterprising analyst to reproduce the results.

After at least two days of blood, swearing and tears, our masterpiece is ready:

typedef struct Il2CppGlobalMetadataHeader
{
	int32_t unknown00;
	int32_t unknown04;
	int32_t unknown08;
	int32_t unknown0C;
	int32_t unknown10;
	int32_t unknown14;
	int32_t unknown18;
	int32_t unknown1C;
	int32_t unknown20;
	int32_t unknown24;

	int32_t genericContainersOffset; // Il2CppGenericContainer
	int32_t genericContainersCount;

	int32_t nestedTypesOffset; // TypeDefinitionIndex
	int32_t nestedTypesCount;
	int32_t interfacesOffset; // TypeIndex
	int32_t interfacesCount;

	int32_t vtableMethodsOffset; // EncodedMethodIndex
	int32_t vtableMethodsCount;
	int32_t interfaceOffsetsOffset; // Il2CppInterfaceOffsetPair
	int32_t interfaceOffsetsCount;

	int32_t typeDefinitionsOffset; // Il2CppTypeDefinition
	int32_t typeDefinitionsCount;
	int32_t rgctxEntriesOffset; // Il2CppRGCTXDefinition
	int32_t rgctxEntriesCount;

	int32_t unknown60;
	int32_t unknown64;
	int32_t unknown68;
	int32_t unknown6C;

	int32_t imagesOffset; // Il2CppImageDefinition
	int32_t imagesCount;
	int32_t assembliesOffset; // Il2CppAssemblyDefinition
	int32_t assembliesCount;

	int32_t fieldsOffset; // Il2CppFieldDefinition
	int32_t fieldsCount;
	int32_t genericParametersOffset; // Il2CppGenericParameter
	int32_t genericParametersCount;

	int32_t fieldAndParameterDefaultValueDataOffset; // uint8_t
	int32_t fieldAndParameterDefaultValueDataCount;
	int32_t fieldMarshaledSizesOffset; // Il2CppFieldMarshaledSize
	int32_t fieldMarshaledSizesCount;

	int32_t referencedAssembliesOffset; // int32_t
	int32_t referencedAssembliesCount;
	int32_t attributesInfoOffset; // Il2CppCustomAttributeTypeRange
	int32_t attributesInfoCount;

	int32_t attributeTypesOffset; // TypeIndex
	int32_t attributeTypesCount;
	int32_t unresolvedVirtualCallParameterTypesOffset; // TypeIndex
	int32_t unresolvedVirtualCallParameterTypesCount;

	int32_t unresolvedVirtualCallParameterRangesOffset; // Il2CppRange
	int32_t unresolvedVirtualCallParameterRangesCount;
	int32_t windowsRuntimeTypeNamesOffset; // Il2CppWindowsRuntimeTypeNamePair
	int32_t windowsRuntimeTypeNamesSize;

	int32_t exportedTypeDefinitionsOffset; // TypeDefinitionIndex
	int32_t exportedTypeDefinitionsCount;
	int32_t unknownD8;
	int32_t unknownDC;

	int32_t parametersOffset; // Il2CppParameterDefinition
	int32_t parametersCount;
	int32_t genericParameterConstraintsOffset; // TypeIndex
	int32_t genericParameterConstraintsCount;

	int32_t unknownF0;
	int32_t unknownF4;

	int32_t metadataUsagePairsOffset; // Il2CppMetadataUsagePair
	int32_t metadataUsagePairsCount;

	int32_t unknown100;
	int32_t unknown104;
	int32_t unknown108;
	int32_t unknown10C;

	int32_t fieldRefsOffset; // Il2CppFieldRef
	int32_t fieldRefsCount;
	int32_t eventsOffset; // Il2CppEventDefinition
	int32_t eventsCount;

	int32_t propertiesOffset; // Il2CppPropertyDefinition
	int32_t propertiesCount;
	int32_t methodsOffset; // Il2CppMethodDefinition
	int32_t methodsCount;

	int32_t parameterDefaultValuesOffset; // Il2CppParameterDefaultValue
	int32_t parameterDefaultValuesCount;
	int32_t fieldDefaultValuesOffset; // Il2CppFieldDefaultValue
	int32_t fieldDefaultValuesCount;

	int32_t unknown140;
	int32_t unknown144;
	int32_t unknown148;
	int32_t unknown14C;

	int32_t metadataUsageListsOffset; // Il2CppMetadataUsageList
	int32_t metadataUsageListsCount;
} Il2CppGlobalMetadataHeader;

We note that the string literal, string literal index and .NET symbol tables are conspicuously absent, accesses to them having been replaced by external function calls. Most everything else seems to be present and correct, although substantially re-arranged in blocks: it looks like something a human has done rather than true randomization, because several groups of fields have been clumped and moved together while retaining their order within the group. This is indicative of some feisty Ctrl+C Ctrl+V action.
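
One cheap way to sanity-check a reconstructed header like this is to confirm that each offset/count pair describes a byte range that begins after the 0x158-byte header and ends inside the file. The sketch below is only an illustration of that idea: the function names are hypothetical, a little-endian host is assumed, and the unknown fields may legitimately fail the check because they might not be offset/count pairs at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HEADER_SIZE 0x158u /* header length observed in this file */

/* Reads the 32-bit header field at the given byte position. memcpy
   avoids alignment problems; a little-endian host is assumed. */
int32_t header_field(const uint8_t *header, size_t bytePos)
{
    int32_t value;
    assert(header != NULL && bytePos + sizeof value <= HEADER_SIZE);
    memcpy(&value, header + bytePos, sizeof value);
    return value;
}

/* Returns 1 if the (offset, length) pair stored at bytePos describes
   a range that starts after the header and fits inside a file of
   fileSize bytes, otherwise 0. */
int pair_is_plausible(const uint8_t *header, size_t bytePos, size_t fileSize)
{
    int32_t offset = header_field(header, bytePos);
    int32_t length = header_field(header, bytePos + 4);
    if (offset < (int32_t)HEADER_SIZE || length < 0)
        return 0;
    return (uint64_t)offset + (uint64_t)length <= fileSize;
}
```

For example, pair_is_plausible(header, 0x118, fileSize) would test the eventsOffset/eventsCount pair from the struct above.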

We are getting close to something we can drop into an Il2CppInspector plugin now, but there’s a problem. During our caffeine-fueled header-deobfuscating rampage, we noticed some disturbing patterns in a few of the functions. By this point we’ve named many functions; let’s have a look at a snippet of il2cpp::vm::SetupFieldsLocked. First the test project:

fieldDef = il2cpp::vm::MetadataCache::GetFieldDefinitionFromIndex(v10);
*(v11 - 8) = il2cpp::vm::MetadataCache::GetIl2CppTypeFromIndex(fieldDef->typeIndex);
*(v11 - 16) = il2cpp::vm::MetadataCache::GetStringFromIndex(fieldDef->nameIndex);
*v11 = v3;
v13 = il2cpp::vm::MetadataCache::GetIndexForTypeDefinition(v3);
*(v11 + 8) = il2cpp::vm::MetadataCache::GetFieldOffsetFromIndexLocked(v13, v10 - v9, (v11 - 16), v2);
v11 += 40i64;
++v10;
*(v11 - 28) = fieldDef->customAttributeIndex;
*(v11 - 24) = fieldDef->token;

Now Honkai Impact:

fieldDef = il2cpp::vm::MetadataCache::GetFieldDefinitionFromIndex(v12);
*(v13 - 8) = il2cpp::vm::MetadataCache::GetIl2CppTypeFromIndex(fieldDef->typeIndex);
*(v13 - 16) = il2cpp::vm::MetadataCache::GetStringFromIndex(fieldDef->customAttributeIndex);
*v13 = v3;
v15 = il2cpp::vm::MetadataCache::GetIndexForTypeDefinition(v3);
*(v13 + 8) = il2cpp::vm::MetadataCache::GetFieldOffsetFromIndexLocked(v15, (v12 - v10));
v13 += 40i64;
++v12;
*(v13 - 28) = fieldDef->nameIndex;
*(v13 - 24) = fieldDef->token;

Take a few moments to look over it carefully. Can you spot the difference?

nameIndex and customAttributeIndex have been switched. Shenanigans! We create a new Il2CppFieldDefinition struct and swap the two fields around to match.
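
As a sketch of what that swap looks like in C: the stock field order below is an assumption based on the IL2CPP headers for this metadata version, and HonkaiFieldDefinition and to_stock_field are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed stock layout for this metadata version. */
typedef struct Il2CppFieldDefinition
{
    int32_t nameIndex;
    int32_t typeIndex;
    int32_t customAttributeIndex;
    uint32_t token;
} Il2CppFieldDefinition;

/* Hypothetical name for the reordered layout: the first and third
   fields have changed places, everything else is untouched. */
typedef struct HonkaiFieldDefinition
{
    int32_t customAttributeIndex;
    int32_t typeIndex;
    int32_t nameIndex;
    uint32_t token;
} HonkaiFieldDefinition;

/* Copies field by field, by name, so the result is in stock order
   regardless of how the source struct is laid out. */
Il2CppFieldDefinition to_stock_field(const HonkaiFieldDefinition *src)
{
    Il2CppFieldDefinition dst;
    assert(src != NULL);
    dst.nameIndex = src->nameIndex;
    dst.typeIndex = src->typeIndex;
    dst.customAttributeIndex = src->customAttributeIndex;
    dst.token = src->token;
    return dst;
}
```

Importing the reordered struct into IDA in place of the stock one makes the two decompilations above line up again.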

This, of course, means that not just the header has been modified. If Il2CppFieldDefinition has also been reordered, it’s not beyond the realm of possibility that others have, too. There is good news and bad news here. The bad news is that – upon making this discovery – any metadata table in the entire file is now fair game as a candidate for having been obfuscated. The good news is that once we have the completed header in place, we have several options for deobfuscation:

  1. We can just run Il2CppInspector in a debugger, and every time it crashes, find which struct contains the problem value (generally a crash will occur when an index in one table entry that points to another table is out of bounds, so then you know the wrong field is being selected for the index and can assume the entire table layout is obfuscated)
  2. We can run Il2CppInspector in a debugger, setting a breakpoint at the end of the metadata loader then use the Autos debugger window to look at every field in each table to see if they make sense. If unsure, we can spin up a second instance with the same breakpoint and load the test project into it to perform a side-by-side comparison. We can also create a debugger plugin that dumps the first few items of each table in a human-readable format (implement the PostProcessMetadata hook if you want to do this)
  3. We know the offsets of every table now, so we can compare the tables with the ones in the test project’s global-metadata.dat and check if any items are out of place
  4. We can retrace our steps in IDA and check every table access. This is the most time-consuming but also the most accurate method, however at this point we will have located and named many functions so finding them again will be easy
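
Options 1 and 3 both rest on the same observation: a field that is supposed to index another table has to stay within that table’s bounds. A rough C sketch of such a scan follows. It assumes tables made up entirely of 32-bit fields, a little-endian host, and -1 as the "no entry" value (as in the add/remove/raise checks earlier); count_bad_indices is a hypothetical helper, not part of Il2CppInspector.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Counts entries whose chosen field cannot be a valid index into a
   target table holding targetCount entries. The table is viewed as
   rows of fieldsPerEntry 32-bit values; -1 is accepted as "none". */
size_t count_bad_indices(const int32_t *table, size_t entryCount,
                         size_t fieldsPerEntry, size_t fieldPos,
                         int32_t targetCount)
{
    size_t bad = 0;
    assert(table != NULL && fieldPos < fieldsPerEntry);
    for (size_t i = 0; i < entryCount; i++) {
        int32_t value = table[i * fieldsPerEntry + fieldPos];
        if (value < -1 || value >= targetCount)
            bad++;
    }
    return bad;
}
```

A field position that yields many out-of-range values is a strong hint that the struct has been reordered, and trying each position in turn can suggest where the real index field has moved to.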

For what it’s worth, I used a combination of options 1-3. Ultimately, the modified tables were Il2CppTypeDefinition, Il2CppMethodDefinition, Il2CppFieldDefinition and Il2CppPropertyDefinition. The changes were not extremely drastic – essentially some fields were reordered – but this is enough to defeat automated tools.

But what about the ball of strings, Katy?

We almost have everything we need to decrypt and read all of the metadata automatically, but we’re still missing all of the strings, and that will be the subject of part 3 of this mini-series, where we’ll gobble down the rest of this obfuscation layer cake, then take a nice nap before we contemplate eating another one. Until next time…

  1. Carel
    January 19, 2021 at 22:21

    I started long ago with reverse engineering the first operating systems for pc as well as for example a pascal compiler. I learned a lot from that. The knowledge gained has served me well in the decennia that followed.
