Reverse Engineering Adventures: Honkai Impact 3rd (Part 3)
This is a continuation of the Reverse Engineering Adventures: Honkai Impact 3rd mini-series – read part 1 and part 2 first!
Recap
So far, we have decrypted global-metadata.dat, and identified and resolved the data obfuscation of Il2CppGlobalMetadataHeader and the four obfuscated metadata tables. We have observed that the string data is still out of our reach, and we need this to be able to load Honkai Impact into Il2CppInspector. Today we’ll find out how to access this information and create a final working plugin that will enable us to fully deobfuscate the game and analyze it.
Two balls of string
IL2CPP metadata includes two distinct kinds of string data:
- .NET symbol identifiers – these are identifiers used in the source code such as class, method, field and property names. This information is included in the metadata to enable reflection, which is a major design pattern in .NET applications. IL2CPP and this article refer to these as “strings”.
- Fixed application strings – these are strings used by the application itself, such as error messages, network hostnames, logging output and any other const or static string values used by the source code which don’t change over the lifetime of the application. IL2CPP and this article refer to these as “string literals”.
Strings are stored in a single table in global-metadata.dat, located at the offset named stringOffset in the header. The strings are null-terminated and indexed by their byte offset from the start of the string table. Each string immediately follows the previous with no alignment padding, i.e. the first character of string n can be found at the byte following the null terminator of string n-1.
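As a minimal sketch of the lookup just described, the following reads a string by its byte offset from a synthetic blob. The blob contents, the STRING_OFFSET value and the UTF-8 decoding are assumptions for illustration, not values taken from the game.

```python
def get_string(metadata: bytes, string_offset: int, index: int) -> str:
    """Read the null-terminated string at byte offset `index` within the string table."""
    start = string_offset + index
    end = metadata.index(b"\x00", start)  # position of the null terminator
    return metadata[start:end].decode("utf-8")


# Synthetic stand-in for a metadata file: 4 unrelated bytes, then the string table.
blob = b"\xAA\xBB\xCC\xDD" + b"mscorlib.dll\x00System\x00Object\x00"
STRING_OFFSET = 4  # hypothetical value of stringOffset from the header

first = get_string(blob, STRING_OFFSET, 0)
# The next string starts one byte past the previous terminator.
second = get_string(blob, STRING_OFFSET, len(first) + 1)
```

Note that the "index" is a byte offset, so valid indexes are not consecutive integers.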
String literals are managed by two tables in global-metadata.dat. The first is located at the offset named stringLiteralDataOffset in the header. This is a pure blob of string data. The strings are not null-terminated, and are zero-indexed as a single-dimension array, i.e. 0, 1, 2… Each string immediately follows the previous with no alignment padding. The table located at the offset named stringLiteralOffset in the header consists of two 32-bit integers per entry, each entry corresponding to a single string, specifying the string’s offset from the start of stringLiteralDataOffset and the string’s length. To find a string literal of index n, you look up entry n from this table, then read length bytes starting at offset from the data table.
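The two-table lookup can be sketched as follows on synthetic data. The order of the two fields within an entry (offset first, then length), the little-endian encoding and the UTF-8 decoding are assumptions made for this sketch; check them against the file you are examining.

```python
import struct


def get_string_literal(metadata: bytes, table_offset: int, data_offset: int, n: int) -> str:
    """Look up entry n in the offset/length table, then slice the data blob."""
    # Assumed entry layout: uint32 offset, then uint32 length (little-endian).
    offset, length = struct.unpack_from("<II", metadata, table_offset + 8 * n)
    start = data_offset + offset
    return metadata[start:start + length].decode("utf-8")


# Synthetic file: the offset/length table first, then the packed string data.
entries = [(0, 5), (5, 5), (10, 1)]
table = b"".join(struct.pack("<II", off, length) for off, length in entries)
data = b"HelloWorld!"
blob = table + data

TABLE_OFFSET = 0          # hypothetical stringLiteralOffset
DATA_OFFSET = len(table)  # hypothetical stringLiteralDataOffset

literals = [get_string_literal(blob, TABLE_OFFSET, DATA_OFFSET, n) for n in range(len(entries))]
```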
IL2CPP provides two functions, const char *il2cpp::vm::MetadataCache::GetStringFromIndex(int index) and Il2CppString *il2cpp::vm::MetadataCache::GetStringLiteralFromIndex(int index), to retrieve strings and string literals respectively (Il2CppString * is a type with a small header indicating the string length, followed by each character encoded as UTF-16).
In order to find out how Honkai Impact retrieves its string data, we need to find and investigate these two functions in the binary. One of several ways to do this is to scan the IL2CPP source code to find all of the call sites for these two functions and try to trace a path up each call site’s call stack until we find a known function that we’ve already discovered in Honkai Impact’s disassembly. Starting from this point, we can then work our way back down the same call stack – this time in the disassembly – until we reach the desired function.
Better to be lucky than good
As it happens, we already stumbled across GetStringFromIndex by accident when we decompiled il2cpp::vm::MetadataCache::Initialize:
imageIndex = 0;
if ( s_ImagesCount > 0 )
{
  imageIndex_1 = 0i64;
  while ( 1 )
  {
    pString = pGetStringFromIndex(v9, *v10);
    *(&local_ImagesTable->name + imageIndex_1) = pString;
Here is the equivalent code from the IL2CPP source code:
for (int32_t imageIndex = 0; imageIndex < s_ImagesCount; imageIndex++)
{
    const Il2CppImageDefinition* imageDefinition = imagesDefinitions + imageIndex;
    Il2CppImage* image = s_ImagesTable + imageIndex;
    image->name = GetStringFromIndex(imageDefinition->nameIndex);
You may recall from the end of part 2 that the call was replaced with a call to a function in a different DLL.
What are v9 and v10?
v9 = s_GlobalMetadata;
v10 = (s_GlobalMetadata + s_GlobalMetadataHeader->imagesOffset);
v9 is very simple: it points to the start of global-metadata.dat in memory. v10 is currently pointing to the start of the images table, which at first seems a little strange. At the end of the loop, v10 is incremented by 8:
v10 += 8;
The image table consists of a list of Il2CppImageDefinition structures. Let’s look at this type:
typedef struct Il2CppImageDefinition
{
    StringIndex nameIndex;
    AssemblyIndex assemblyIndex;
    TypeDefinitionIndex typeStart;
    uint32_t typeCount;
    TypeDefinitionIndex exportedTypeStart;
    uint32_t exportedTypeCount;
    MethodIndex entryPointIndex;
    uint32_t token;
} Il2CppImageDefinition;
(all of the types ending in Index here are uint32_t)
Now the situation becomes more clear. The first field in an Il2CppImageDefinition is a string index (offset), and there are 8 items in each image definition, each 4 bytes wide. v10 is an unsigned int *, so as a result of C pointer arithmetic magic, adding 8 to it advances the pointer by 8 elements (8 × 4 = 32 bytes), not 8 bytes! Therefore, v10 + 8 points to the start of the next image definition, and therefore the next image definition’s string index. From this, we can understand that the loop iterates over all of the image definitions and fetches a string corresponding to each, which for an image definition is the name of the image, e.g. mscorlib.dll, UnityEngine.dll and so on.
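To make the stride concrete, here is a small sketch that walks a synthetic images table in 32-byte steps and collects the first field of each definition. The field values in the two sample definitions are invented for illustration.

```python
import struct

# Eight little-endian 32-bit fields, matching the struct shown above.
IMAGE_DEFINITION = struct.Struct("<8I")


def image_name_indexes(metadata: bytes, images_offset: int, images_count: int) -> list:
    """Collect nameIndex from every image definition in the table."""
    indexes = []
    for i in range(images_count):
        # Stride is one whole definition (32 bytes), like `v10 += 8` on an unsigned int *.
        fields = IMAGE_DEFINITION.unpack_from(metadata, images_offset + i * IMAGE_DEFINITION.size)
        indexes.append(fields[0])  # nameIndex is the first field
    return indexes


# Two synthetic image definitions with name indexes 0 and 13.
blob = IMAGE_DEFINITION.pack(0, 0, 0, 10, 0, 0, 0, 1) + IMAGE_DEFINITION.pack(13, 1, 10, 5, 0, 0, 0, 1)
name_indexes = image_name_indexes(blob, 0, 2)
```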
So, the external call to pGetStringFromIndex passes in two arguments: a pointer to global-metadata.dat and the desired string index, and returns a pointer to the fetched string.
How do we find which DLL the function is in, and its address within that DLL? Normally, we would fire up a debugger and set a breakpoint at the call site, then step forward one instruction to find out where we land. Unfortunately, Honkai Impact is protected by VMProtect and is full of debugger traps which will cause any attempt to attach a debugger to crash the process. We’re going to have to get creative.
What’s in a (thread) name?
We navigate to pGetStringFromIndex in IDA and bring up the list of cross-references to find out where it is set:

We find just one location, in the IL2CPP API il2cpp_thread_get_name. What? What does fetching a thread name have to do with setting an import address? And why is it setting it from an MMX/SSE register?
Smelling some hijinks, we disassemble this function:
.text:00007FFF41EA3740 il2cpp_thread_get_name proc near
.text:00007FFF41EA3740 sub rsp, 48h
.text:00007FFF41EA3744 cmp dword ptr [rdx], 5F5E0EBh
.text:00007FFF41EA374A jnz short loc_7FFF41EA3782
.text:00007FFF41EA374C movups xmm1, xmmword ptr [rcx]
.text:00007FFF41EA374F movsd xmm0, qword ptr [rcx+10h]
.text:00007FFF41EA3754 movsd [rsp+48h+var_18], xmm0
.text:00007FFF41EA375A mov rax, [rsp+48h+var_18]
.text:00007FFF41EA375F movq cs:unityplayer_DecryptMetadata, xmm1
.text:00007FFF41EA3767 psrldq xmm1, 8
.text:00007FFF41EA376C mov cs:qword_7FFF43D74F90, rax
.text:00007FFF41EA3773 xor eax, eax
.text:00007FFF41EA3775 movq cs:pGetStringFromIndex, xmm1
.text:00007FFF41EA377D add rsp, 48h
.text:00007FFF41EA3781 retn

Experienced analysts looking at this code right now probably just burst out laughing. If you didn’t, don’t worry: this is really obscure and the following explanation should put a smile on your face.
Let’s decompile this function:
__int64 __fastcall il2cpp_thread_get_name(__m128i *a1, _DWORD *a2)
{
  __m128i v2; // xmm1
  __int64 result; // rax
  __int64 (__fastcall *v4)(_QWORD, _QWORD, _QWORD); // [rsp+30h] [rbp-18h]

  if ( *a2 != 99999979 )
    return sub_7FFF41EA8890(a2);
  v2 = *a1;
  v4 = a1[1].m128i_i64[0];
  unityplayer_DecryptMetadata = *a1;
  qword_7FFF43D74F90 = v4;
  result = 0i64;
  pGetStringFromIndex = *&_mm_srli_si128(v2, 8);
  return result;
}
The second argument – a2 – is not used at all except to check whether its dereferenced value is the very suspicious 99999979 (0x5F5E0EB in the disassembly). If it’s not, the function called by the early return, sub_7FFF41EA8890, simply sets *a2 to zero and returns zero. Why on earth do we need this? We don’t: this serves no purpose except to refuse to run the function unless *a2 is set to the tested value; it’s like a primitive form of authentication. The stench of shenanigans is rising.
The rest of the code uses something called SSE intrinsics, which are generally used for SIMD operations to accelerate functions like video decoding and other multimedia applications. The data type __m128i as defined by Intel is a 16-byte (128-bit) integer vector. The member m128i_i64 of this type is a two-item array such that index 0 contains the lower 64 bits (8 bytes) of the value, and index 1 contains the upper 64 bits.
The access to a1[1] in the assignment of v4 tells us that a1 is a two-item array of __m128i. The two statements which read *a1 can be considered equivalent to accessing a1[0] (these are semantically equivalent in C).
Three function imports are stored via this function – one to unityplayer.DecryptMetadata, one to qword_7FFF43D74F90 and one to pGetStringFromIndex. We resolved unityplayer.DecryptMetadata in part 1 of this series, so it’s reasonable to assume that the call to il2cpp_thread_get_name is being made by UnityPlayer.dll and that the other two imports are from the same DLL. We will investigate this more later.
We know that virtual memory address pointers are 64 bits (8 bytes). The assignment of v4 takes the bottom 8 bytes of a1[1] and discards the rest. This is then assigned to the import at qword_7FFF43D74F90, which we currently do not know anything about.
The assignment of *a1 (a1[0]) to unityplayer.DecryptMetadata causes an implicit cast of an __m128i to a QWORD. The latter is 64 bits wide so the top 64 bits of a1[0] get discarded.
This leaves the slightly more tricky final assignment. The SSE intrinsic _mm_srli_si128 (which I freely admit I had to look up in the Intel Intrinsics Guide – 90% of hacking is research!) shifts the first operand right by the number of bytes (not bits) specified in the second operand. The code calls _mm_srli_si128(v2, 8), so we are essentially taking the top 64 bits of v2 and discarding the bottom 64 bits (by shifting v2 right 64 bits, the top 64 bits get filled with zeroes). The low 64 bits of the resulting value are then assigned to the import pGetStringFromIndex.
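Since this can be a little difficult to understand, here is how the 24 bytes at a1 are laid out:
- a1[0], lower 64 bits (bytes 0-7): stored in unityplayer.DecryptMetadata
- a1[0], upper 64 bits (bytes 8-15): stored in pGetStringFromIndex
- a1[1], lower 64 bits (bytes 16-23): stored in qword_7FFF43D74F90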
Ultimately, this function takes 24 bytes, interprets them as three 64-bit address pointers and stores them for later use.
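The same unpacking can be mimicked in a few lines as a sanity check of the reading above. This is a sketch that models the XMM register as a 128-bit Python integer; the pointer values 0x1111, 0x2222 and 0x3333 are placeholders, not real addresses.

```python
import struct

MASK64 = (1 << 64) - 1


def split_pointer_block(block: bytes):
    """Mimic the loads and stores in il2cpp_thread_get_name on a 24-byte block."""
    xmm1 = int.from_bytes(block[0:16], "little")        # movups xmm1, [rcx]
    third = int.from_bytes(block[16:24], "little")      # movsd xmm0, [rcx+10h]
    decrypt_metadata = xmm1 & MASK64                     # movq: low 64 bits of xmm1
    get_string_from_index = (xmm1 >> (8 * 8)) & MASK64   # psrldq xmm1, 8 then movq
    return decrypt_metadata, get_string_from_index, third


# Three placeholder 64-bit pointers stored back to back, as the caller would lay them out.
block = struct.pack("<3Q", 0x1111, 0x2222, 0x3333)
pointers = split_pointer_block(block)
```

The byte shift and the final 64-bit store together amount to reading the second pointer in the block, which is all the SSE detour achieves.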
This is some really sneaky stuff and I found it pretty amusing. This is probably a hand-coded assembly function, and it is quite strange because it’s clearly not obfuscated enough to have any meaningful effect, but it is obfuscated in an obscure enough way to raise a smile, knowing that whoever wrote this assembly was having a good time.
There is one final cherry on top to this bizarre excursion: il2cpp_thread_get_name was removed in the Unity version immediately before the one used in Honkai Impact. In other words, this is not a real IL2CPP API export. It’s a decoy export designed to conceal where the import addresses are set. Hilarious!

Flipping the coin
It takes two to tango. Let’s load up UnityPlayer.dll and find out where it calls this silly function.
Needless to say, the DLL doesn’t conveniently define this as an import, but the function name must be referenced somewhere, so we perform a string search for il2cpp_thread_get_name. This is a few clicks of work and IDA has already labelled the string address as aIl2cppThreadGe for us (a stands for ASCII).
There is only one reference to this:
v179 = 0i64;
v181 = 0i64;
v182 = 68;
LOBYTE(v180) = 0;
sub_7FFF4E399380(&v179, "il2cpp_thread_get_name", 0x16ui64);
v152 = sub_7FFF4E7B30C0(qword_7FFF4F629F20, &v179, 0);
qword_7FFF4F629E48 = v152;
if ( v179 && v180 > 0 )
{
sub_7FFF4E51A4C0(v179, v182);
v152 = qword_7FFF4F629E48;
}
if ( !v152 )
{
v2 = 0;
sub_7FFF4E6E0A50("il2cpp: function il2cpp_thread_get_name not found\n");
}
This is part of a much longer function where this same pattern repeats dozens of times, once for each IL2CPP API. You don’t really need to screw around with all these function calls to get the gist of what is going on here: the string name is copied to v179 (line 5), the symbol is resolved to a code pointer and stored in v152 (line 6), then stored at qword_7FFF4F629E48 (line 7). We infer the meaning of v152 by looking at lines 13-17 which throw an error if v152 is zero (null). We infer the meaning of v179 by noting that the function in line 5 receives it by reference, the string literal pointer and the length of the literal – meaning it is probably a memcpy-type function – then the result is passed again by reference to the symbol resolver function in line 6.
We rename qword_7FFF4F629E48 to pil2cpp_thread_get_name and search for references to it. This time we are looking for where it is called. Again, there is only one location:
char __fastcall sub_7FFF4E816170(__int64 a1, __int64 a2, unsigned int a3, __int64 a4)
{
v10 = 99999979;
v7 = sub_7FFF4EB67110;
v8 = sub_7FFF4EB67130;
v4 = a4;
v9 = sub_7FFF4EB67120;
v5 = a3;
pil2cpp_thread_get_name(&v7, &v10);
sub_7FFF4E7E3A30();
qword_7FFF4F629E00(0i64);
qword_7FFF4F6299D0(v5, v4, 0i64);
qword_7FFF4F6299B8();
qword_7FFF4F6299C0();
qword_7FFF4F6299A0("IL2CPP Root Domain");
qword_7FFF4F6299E8("unused_application_configuration");
sub_7FFF4E81A540();
return 1;
}
We have a little giggle as we note that v10 is set to 99999979 to “authenticate” the call to il2cpp_thread_get_name, but it’s not actually clear from the decompilation how the three function pointers at v7, v8 and v9 are ordered in memory, or even if anything besides v7 is passed to the export, so we switch to the disassembly view:
.text:00007FFF4E816185 lea rax, sub_7FFF4EB67110
.text:00007FFF4E81618C mov [rsp+48h+arg_0], 5F5E0EBh
.text:00007FFF4E816194 mov [r11-28h], rax
.text:00007FFF4E816198 mov rsi, rdx
.text:00007FFF4E81619B lea rax, sub_7FFF4EB67130
.text:00007FFF4E8161A2 mov r14, rcx
.text:00007FFF4E8161A5 mov [r11-20h], rax
.text:00007FFF4E8161A9 lea rdx, [r11+8] ; _QWORD
.text:00007FFF4E8161AD lea rax, sub_7FFF4EB67120
.text:00007FFF4E8161B4 mov rbx, r9
.text:00007FFF4E8161B7 lea rcx, [r11-28h] ; _QWORD
.text:00007FFF4E8161BB mov [r11-18h], rax
.text:00007FFF4E8161BF mov edi, r8d
.text:00007FFF4E8161C2 call cs:pil2cpp_thread_get_name
The x64 calling convention dictates that the first argument is passed in rcx, which we can see in line 11 is set to the address r11-28h. Lines 3, 7 and 12 store the three function pointers loaded in lines 1, 5 and 9 at r11-28h, r11-20h and r11-18h respectively. These are 64-bit pointers each 64 bits apart, so we fill in 24 bytes of consecutive pointer data, as expected.
This means the pointers are ordered in memory thus: sub_7FFF4EB67110, sub_7FFF4EB67130, sub_7FFF4EB67120. By correlating these with our reverse engineered il2cpp_thread_get_name function, we can discern their purposes: unityplayer.DecryptMetadata, pGetStringFromIndex, qword_7FFF43D74F90. If we actually click on the first function, we get:
.text:00007FFF4EB67110 sub_7FFF4EB67110
.text:00007FFF4EB67110 jmp DecryptMetadata
.text:00007FFF4EB67110 sub_7FFF4EB67110 endp
where we defined DecryptMetadata in part 1 of this series. Excellent! We now know where GetStringFromIndex is in UnityPlayer.dll, and we can construct a script to call it in isolation similarly to how we called DecryptMetadata. In C#, this looks as follows:
[UnmanagedFunctionPointer(CallingConvention.Cdecl)]
private delegate IntPtr GetStringFromIndex(byte[] bytes, uint index);
// ...
// Bind a managed delegate to the function at its offset from the module base
var pGetStringFromIndex = (GetStringFromIndex) Marshal.GetDelegateForFunctionPointer(ModuleBase + 0x8E7130, typeof(GetStringFromIndex));
var stringIndex = 1234;
// Call the native function and copy the returned null-terminated string
var decryptedString = Marshal.PtrToStringAnsi(pGetStringFromIndex(decryptedMetadata, (uint) stringIndex));
(ModuleBase and decryptedMetadata are defined in the script from part 1)
We calculate the offset of the string function from the module base by simply subtracting the base – 0x7FFF4E280000 – from the address of the function – 0x7FFF4EB67130 – to get 0x8E7130. The function Marshal.PtrToStringAnsi copies a null-terminated string from an unmanaged pointer address into a managed string object.
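The offset arithmetic is trivial, but it is easy to get the operands the wrong way round, so here it is spelled out as a sketch. The helper names and the 0x10000000 load address are made up for illustration; only the two addresses come from the disassembly above.

```python
def to_module_offset(address: int, module_base: int) -> int:
    """Convert an absolute address seen in the disassembler to an offset from the module base."""
    if address < module_base:
        raise ValueError("address lies below the module base")
    return address - module_base


def rebase(address: int, old_base: int, new_base: int) -> int:
    """Translate an address to wherever the module is loaded in another process."""
    return new_base + to_module_offset(address, old_base)


IDA_BASE = 0x7FFF4E280000               # module base in the disassembly
GET_STRING_FROM_INDEX = 0x7FFF4EB67130  # address of the function in the disassembly

offset = to_module_offset(GET_STRING_FROM_INDEX, IDA_BASE)
```

Adding the offset to the base reported at runtime, as the C# code does with ModuleBase, gives the callable address regardless of where the DLL ends up in memory.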
Info: Ultimately, the entire string table can be found unencrypted starting at 0xD5F888 in this global-metadata.dat file. The GetStringFromIndex function is heavily obfuscated with control flow flattening, but appears to be a “do nothing” decoy function besides knowing the start offset of the string table, which is not actually present in the metadata header in plaintext form. I dumped every string using this function and ran a binary diff against the string table cut and pasted from global-metadata.dat, and the results were identical. This may not necessarily be the case for future versions, of course.
Take me literally, not seriously
We can now repeat this process methodically to find GetStringLiteralFromIndex, but let’s not do unnecessary work. Only the string literals are M.I.A. now, and we have one imported function that we don’t know anything about.
Returning to the game binary, we search for references to this import, and once again find only one call to it. If the function really is GetStringLiteralFromIndex, this would make good sense because the IL2CPP source code only calls it in one place: il2cpp::vm::MetadataCache::InitializeMethodMetadataRange. We pull up the source code for this and decompile the function which calls the import to compare.
Snippet of the source code:
void MetadataCache::IntializeMethodMetadataRange(uint32_t start, uint32_t count, const utils::dynamic_array<Il2CppMetadataUsage>& expectedUsages)
{
for (uint32_t i = 0; i < count; i++)
{
uint32_t offset = start + i;
IL2CPP_ASSERT(s_GlobalMetadataHeader->metadataUsagePairsCount >= 0 && offset <= static_cast<uint32_t>(s_GlobalMetadataHeader->metadataUsagePairsCount));
const Il2CppMetadataUsagePair* metadataUsagePairs = MetadataOffset<const Il2CppMetadataUsagePair*>(s_GlobalMetadata, s_GlobalMetadataHeader->metadataUsagePairsOffset, offset);
// ...
switch (usage)
{
case kIl2CppMetadataUsageFieldInfo:
*s_Il2CppMetadataRegistration->metadataUsages[destinationIndex] = GetFieldInfoFromIndex(decodedIndex);
break;
case kIl2CppMetadataUsageStringLiteral:
*s_Il2CppMetadataRegistration->metadataUsages[destinationIndex] = GetStringLiteralFromIndex(decodedIndex);
break;
// ...
(kIl2CppMetadataUsageFieldInfo is defined as 4 and kIl2CppMetadataUsageStringLiteral is defined as 5)
Snippet of the decompilation:
Il2CppGlobalMetadataHeader *__fastcall sub_7FFF41E81E30(unsigned int a1)
{
v22 = -2i64;
result = s_GlobalMetadataHeader;
v2 = a1;
v3 = s_GlobalMetadata + s_GlobalMetadataHeader->metadataUsageListsOffset;
// ...
{
v9 = v6 + v4;
v10 = s_GlobalMetadata + s_GlobalMetadataHeader->metadataUsagePairsOffset;
v11 = *&v10[8 * v9];
// ...
switch ( v13 )
{
case 4:
v18 = (s_GlobalMetadata + 8 * v14 + s_GlobalMetadataHeader->fieldRefsOffset);
v19 = *(sub_7FFF41E81200(*v18) + 104) + 40i64 * v18[1];
v8 = qword_7FFF43D74A38;
result = *(qword_7FFF43D74A38 + 120);
**(&result->unknown00 + v11) = v19;
break;
case 5:
v20 = *(8i64 * v14 + qword_7FFF43D74800);
if ( !v20 )
{
if ( dword_7FFF43D74AE8 > *(*v7 + 4i64) )
{
Init_thread_header(&dword_7FFF43D74AE8, v13, v2, v8, v22);
if ( dword_7FFF43D74AE8 == -1 )
{
sub_7FFF41EE15A0(&unk_7FFF43D74AE0);
atexit(sub_7FFF420969A0);
Init_thread_footer(&dword_7FFF43D74AE8);
}
}
v24 = &unk_7FFF43D74AE0;
sub_7FFF41EE16C0(&unk_7FFF43D74AE0);
v23 = 0;
v21 = qword_7FFF43D74F90(s_GlobalMetadata, v14, &v23);
v20 = il2cpp_string_new_len_0(v21, v23);
*(8i64 * v14 + qword_7FFF43D74800) = v20;
sub_7FFF41EE16F0(&unk_7FFF43D74AE0);
v8 = qword_7FFF43D74A38;
}
result = *(*(v8 + 120) + 8 * v11);
*&result->unknown00 = v20;
break;
That looks pretty good! GetFieldInfoFromIndex and GetStringLiteralFromIndex (the original one) have been inlined by the compiler here – probably because they are only ever called once, from this function – so that is why the code in the case blocks is not a direct match, but you can examine the IL2CPP source code to see they are similar.
The call to qword_7FFF43D74F90 in case 5 invokes the import, passing in the metadata file pointer, the string index (v14) and an address (&v23). You may recall that the original function returns an Il2CppString *, and the API il2cpp_string_new_len on the following line takes a string and a length and creates an Il2CppString *. We can deduce from this that v23 contains the string length, and – since it is initialized to zero then passed by reference to the import – that the import sets the length. Renaming and retyping the function and variables leads to this much more readable code:
LODWORD(stringLength) = 0;
string = unityplayer_GetStringLiteralFromIndex(s_GlobalMetadata, index, &stringLength);
il2cppString = il2cpp_string_new_len_0(string, stringLength);
s_StringLiteralTable[index] = il2cppString;
Once again, we can augment our C# program to handle this as follows to fetch every string literal:
[UnmanagedFunctionPointer(CallingConvention.Cdecl)]
private delegate IntPtr GetStringLiteralFromIndex(byte[] bytes, uint index, ref int length);
// ...
var pGetStringLiteralFromIndex = (GetStringLiteralFromIndex)
    Marshal.GetDelegateForFunctionPointer(ModuleBase + 0x2CFA0, typeof(GetStringLiteralFromIndex));
var stringLiterals = new List<string>();
var length = 0;
for (uint index = 0; index < stringLiteralCount; index++) {
    // The native function returns a pointer to the bytes and writes their count to length
    var decryptedBytesUnmanaged = pGetStringLiteralFromIndex(decryptedMetadata, index, ref length);
    var str = new byte[length];
    Marshal.Copy(decryptedBytesUnmanaged, str, 0, length);
    stringLiterals.Add(Encoding.UTF8.GetString(str));
}
Once again, we subtract the module base from the address of the function to get the pointer offset 0x2CFA0. Notice that we use a ref parameter this time to enable us to receive the returned string length from unmanaged code. Since the returned string is not null-terminated, we have to use Marshal.Copy to copy the correct number of bytes from unmanaged memory into an array, then use Encoding.UTF8.GetString to convert the byte array to a .NET string object.
How do we find the maximum string literal index? There are a couple of ways, and I’ll discuss a version that will work with any IL2CPP binary below, but first let’s return to the metadata file that we examined in part 1. You may recall there were some blocks of data that were unaccounted for. It turns out that the encrypted block between 0x158-0x146480 contains the actual string literal data (stringLiteralDataOffset is 0x158). The offset/length table normally found at stringLiteralOffset starts immediately after this – at 0x146480. The beginning looks like this:

This can be interpreted as follows: the first string literal starts at offset 0x0 and is 0x21 bytes long. The second string literal starts at 0x21 and is 0x1 byte long, and so on. The offsets are in sorted order so we can find the end of the table simply by searching for an inversion (an entry where the value is lower than the previous value):

The next table begins at 0x1A7238. Since each entry is 8 bytes, we can trivially calculate the maximum index:
(0x1A7238 - 0x146480) / 8 - 1 == 0xC1B6
(the minus one term takes into account that the table is 0-indexed, so the index range is 0x0 – 0xC1B6 for a total of 0xC1B7 string literals)
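The inversion search done by eye above can be sketched in code as follows. This is a heuristic under stated assumptions: the entry layout (offset first, then length) is assumed, the sample entries mirror the first two described above plus invented ones, and the search only works if the data following the table happens to begin with a smaller value.

```python
import struct


def count_string_literals(metadata: bytes, table_offset: int) -> int:
    """Count 8-byte entries until the offset field stops increasing."""
    count = 0
    previous_offset = -1
    position = table_offset
    while position + 8 <= len(metadata):
        offset, _length = struct.unpack_from("<II", metadata, position)
        if offset < previous_offset:
            break  # inversion: we have run into the next table
        previous_offset = offset
        count += 1
        position += 8
    return count


# Synthetic table of three entries followed by the start of some other table.
entries = [(0x0, 0x21), (0x21, 0x1), (0x22, 0x4)]
table = b"".join(struct.pack("<II", off, length) for off, length in entries)
next_table = struct.pack("<II", 0x5, 0x7)  # made-up start of the following table
blob = table + next_table

literal_count = count_string_literals(blob, 0)
max_index = literal_count - 1
```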
Are we nearly there yet? Are we nearly there yet? Are we nearly there yet?
We now know how to access all of the required metadata and can begin coding an Il2CppInspector plugin (wiki documentation) to process all of this. If we had to modify the main tool’s source code it would be quite fiddly, but we have provided some APIs that enable you to make arbitrary changes to the load pipeline.
Along the way, we’re going to have to tidy up some loose ends. The full source code for the miHoYo loader plugin can be found here and it’s well-commented so I’m not going to go over every line of code here, but let’s cover the key points so you have a flavour of what’s possible.
Loader plugins are created by implementing events that are broadcast to all plugins at various stages in the load pipeline. First, we deal with loading and unloading UnityPlayer.dll when the user triggers a new load task:
// This executes when the client begins to load a new IL2CPP application
public void LoadPipelineStarting(PluginLoadPipelineStartingEventInfo info) {
    // Try to load UnityPlayer.dll
    hModule = LoadLibrary(unityPath.Value);
    if (hModule == IntPtr.Zero)
        throw new FileLoadException("Could not load UnityPlayer DLL", unityPath.Value);
    // Get the base address of the loaded DLL in memory
    ModuleBase = Process.GetCurrentProcess().Modules.Cast<ProcessModule>().First(m => m.ModuleName == Path.GetFileName(unityPath.Value)).BaseAddress;
}
// This executes when the client finishes loading an IL2CPP application
public void LoadPipelineEnding(List<Il2CppInspector.Il2CppInspector> packages, PluginLoadPipelineEndingEventInfo info) {
    // Release memory lock on UnityPlayer.dll
    FreeLibrary(hModule);
}
This code is hopefully fairly self-explanatory.
The initial decryption of global-metadata.dat described in part 1 is processed by PreProcessMetadata, using exactly the same code shown in that part. We store a copy of the decrypted metadata byte array in metadataBlob so we can pass it to GetStringFromIndex and GetStringLiteralFromIndex later.
More interesting is how we deal with the table data obfuscation described in part 2. For this, we engage the API IFileFormatStream.AddObjectMapping:
stream.AddObjectMapping(typeof(Il2CppInspector.Il2CppGlobalMetadataHeader), typeof(Il2CppGlobalMetadataHeader));
stream.AddObjectMapping(typeof(Il2CppInspector.Il2CppTypeDefinition), typeof(Il2CppTypeDefinition));
stream.AddObjectMapping(typeof(Il2CppInspector.Il2CppMethodDefinition), typeof(Il2CppMethodDefinition));
stream.AddObjectMapping(typeof(Il2CppInspector.Il2CppFieldDefinition), typeof(Il2CppFieldDefinition));
stream.AddObjectMapping(typeof(Il2CppInspector.Il2CppPropertyDefinition), typeof(Il2CppPropertyDefinition));
This instructs Il2CppInspector to replace all reads of the standard IL2CPP objects specified in the 1st set of arguments with customized versions in the 2nd set of arguments. When Il2CppInspector encounters one of these objects, it reads the customized version, generates the original version and copies any fields with matching names over. Field names that don’t match in either object are ignored. This allows you to reorder fields and skip unknown data.
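The copy-by-matching-name behaviour described above can be illustrated with a toy sketch. This is not Il2CppInspector’s implementation; the two classes and the map_object helper are hypothetical and exist only to show the idea of reordering fields and dropping unknown ones.

```python
from dataclasses import dataclass, fields


@dataclass
class StandardHeader:       # stands in for the original IL2CPP layout
    signature: int = 0
    version: int = 0
    stringOffset: int = 0


@dataclass
class ObfuscatedHeader:     # stands in for the customized layout: reordered, with junk
    unk: int = 0
    stringOffset: int = 0
    version: int = 24


def map_object(custom, target_type):
    """Create the standard object and copy over every field whose name matches."""
    target = target_type()
    target_names = {f.name for f in fields(target_type)}
    for f in fields(custom):
        if f.name in target_names:
            setattr(target, f.name, getattr(custom, f.name))
    return target


mapped = map_object(ObfuscatedHeader(unk=0xDEAD, stringOffset=0x100), StandardHeader)
```

Here unk has no counterpart and is dropped, while signature has no source and keeps its default.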
We also provide some attributes to help out. Let’s look at how we deal with the header:
public class Il2CppGlobalMetadataHeader
{
[SkipWhenReading]
public uint signature = Il2CppConstants.MetadataSignature;
[SkipWhenReading]
public int version = 24;
[ArrayLength(FixedSize = 0x28)]
public byte[] unk;
public int genericContainersOffset; // Il2CppGenericContainer
public int genericContainersCount;
public int nestedTypesOffset; // TypeDefinitionIndex
public int nestedTypesCount;
// ...
Recall that the magic bytes and IL2CPP version number are absent even from the decrypted metadata. By using the [SkipWhenReading] attribute, we can tell Il2CppInspector not to read a value from the file, but to still copy the field to the deobfuscated object. We use this here to set the correct magic bytes and IL2CPP version statically without destructively editing the file stream.
If you have a large block of data you want to ignore, you can use the [ArrayLength] attribute to tell Il2CppInspector to read a specified number of bytes (you can also use an argument to tell it to read the number of bytes specified by another field in the struct). If the field name doesn’t match a field in the original object, the read data will simply be discarded as in the case with unk above.
For the encrypted/junk data in the header we don’t use, we just interpose dummy variables:
// ...
public int windowsRuntimeTypeNamesOffset; // Il2CppWindowsRuntimeTypeNamePair
public int windowsRuntimeTypeNamesSize;
public int exportedTypeDefinitionsOffset; // TypeDefinitionIndex
public int exportedTypeDefinitionsCount;
public int unk5;
public int unk6;
public int parametersOffset; // Il2CppParameterDefinition
public int parametersCount;
public int genericParameterConstraintsOffset; // TypeIndex
public int genericParameterConstraintsCount;
public int unk7;
public int unk8;
public int metadataUsagePairsOffset; // Il2CppMetadataUsagePair
public int metadataUsagePairsCount;
// ...
When it comes to fetching .NET string identifiers, we implement the GetStrings event. There is a slight kink here, because we need to determine every string index used by the binary. Recall that they are actually byte offsets from the start of the table rather than sequential values.
String indices are stored in a variety of fields across the metadata, most but not all of which are called nameIndex. We use LINQ to iterate over every image, assembly, event, field, method, parameter, property, type and generic parameter in the metadata and build a list of every index as follows:
var stringIndexes =
metadata.Images.Select(x => x.nameIndex)
.Concat(metadata.Assemblies.Select(x => x.aname.nameIndex))
.Concat(metadata.Assemblies.Select(x => x.aname.cultureIndex))
.Concat(metadata.Assemblies.Select(x => x.aname.hashValueIndex))
.Concat(metadata.Assemblies.Select(x => x.aname.publicKeyIndex))
.Concat(metadata.Events.Select(x => x.nameIndex))
.Concat(metadata.Fields.Select(x => x.nameIndex))
.Concat(metadata.Methods.Select(x => x.nameIndex))
.Concat(metadata.Params.Select(x => x.nameIndex))
.Concat(metadata.Properties.Select(x => x.nameIndex))
.Concat(metadata.Types.Select(x => x.nameIndex))
.Concat(metadata.Types.Select(x => x.namespaceIndex))
.Concat(metadata.GenericParameters.Select(x => x.nameIndex))
.OrderBy(x => x)
.Distinct()
.ToList();
We can then call GetStringFromIndex repeatedly, iterating over stringIndexes to fetch every string.
For fetching string literals, we have another problem – determining the maximum index. Although we were able to find this by looking at global-metadata.dat by eye, the table offset and length will surely change with every version so we’d like a more general solution.
We can look at the metadata usages table (see the Metadata Usages section of this article for an explanation of this table) to determine the maximum string literal index, however we cannot do this until Il2CppInspector has finished analyzing the binary.
We start by creating a dummy implementation of GetStringLiterals:
public void GetStringLiterals(Metadata metadata, PluginGetStringLiteralsEventInfo data) {
    // We need to prevent Il2CppInspector from attempting to read string literals from the metadata file until we can calculate how many there are
    data.FullyProcessed = true;
}
We then implement PostProcessPackage – which executes after both the metadata and binary have been analyzed and the relationships between the data in the two files have been correlated with each other – and scan the metadata usages table to find the maximum string literal index used:
public void PostProcessPackage(Il2CppInspector.Il2CppInspector package, PluginPostProcessPackageEventInfo data) {
var stringLiteralCount = package.MetadataUsages.Where(u => u.Type == MetadataUsageType.StringLiteral).Max(u => u.SourceIndex) + 1;
Now we can just implement a for loop calling GetStringLiteralFromIndex for each index.
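For readers who want to see what the scan amounts to on raw data, here is a rough sketch that works directly on a table of usage pairs instead of Il2CppInspector’s MetadataUsages collection. It assumes each pair is two little-endian 32-bit values (destination index, encoded source index) and that the encoded index keeps the usage type in its top 3 bits and the index in the low 29 bits, as in the metadata version 24 source; other versions may encode this differently. The sample pairs are invented.

```python
import struct

USAGE_STRING_LITERAL = 5  # kIl2CppMetadataUsageStringLiteral, as noted earlier


def string_literal_count(usage_pairs: bytes) -> int:
    """Return the highest string literal index referenced by the usage pairs, plus one."""
    max_index = -1
    for _destination_index, encoded in struct.iter_unpack("<II", usage_pairs):
        usage = (encoded & 0xE0000000) >> 29  # assumed: top 3 bits hold the usage type
        decoded_index = encoded & 0x1FFFFFFF  # assumed: low 29 bits hold the index
        if usage == USAGE_STRING_LITERAL:
            max_index = max(max_index, decoded_index)
    return max_index + 1


def encode(usage: int, index: int) -> int:
    """Build an encoded source index for the synthetic table below."""
    return (usage << 29) | index


# Three synthetic pairs: two string literal usages (indexes 3 and 7) and one field usage.
pairs = struct.pack("<6I", 0, encode(5, 3), 1, encode(4, 100), 2, encode(5, 7))
count = string_literal_count(pairs)
```

A literal that is never referenced by any usage would not be counted this way, which is acceptable here because unreferenced literals are of no use to the analysis.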
Et voilà!

…and that’s it! At last, the application loads into Il2CppInspector and we can enjoy the fruits of our labour. We may now finally sleep – after we reflect on what we’ve learned.

Takeaways for budding analysts
Don’t be intimidated. The obfuscation of an application may seem overwhelming at first, but anything that is obfuscated can be deobfuscated with enough effort. Spend time gathering a superficial overview of the moving parts before delving into any one problem area, and break the problem up into manageable blocks.
Don’t do more work than necessary. While we could certainly sit and reverse engineer every function in the application, there is really no need to do this. Focus on key functions, don’t try to understand every line of code, and work at the decompiler level rather than the assembly level where it makes sense to do so. Observe that we don’t need to understand deobfuscation functions to be able to call them: just get the application to do the hard work for us. Observe too that we can often discern what a function does merely by looking at its input and output parameters.
Strong pattern recognition skills are essential. So many of the techniques we used in this series were essentially lo-fi attacks where we just used our eyes to find clustered data, related code and areas of high and low entropy. Humans are much better at finding patterns in seemingly random data than computers: leverage this and hone your pattern-finding skills! You can determine vast amounts of information about how an application works just by looking at its data structures.
Make educated guesses based on logical thought processes. During this exercise, we made many assumptions about the purpose of code and data based purely on context or on known common design patterns. Most of the time, our assumptions were correct. The ability to make educated guesses is a skill that takes time to develop; the more code you reverse engineer, the more you will see the same design patterns over and over again, and gradually this will become easier and easier.
Do your research. A large percentage of reverse engineering is external research. Without knowledge of IL2CPP or access to the freely available public source code, reverse engineering this application would have been much more difficult. Use any and all available information. Refer to the Microsoft API documentation. Refer to the Intel and ARM instruction set references. Google how to perform tasks in IDA or other tools. Don’t re-invent the wheel – there is probably a tool out there that does what you need already! Learn about common encryption and obfuscation techniques so that you’re aware of them and know their strengths and weaknesses. Don’t be afraid to ask questions or make mistakes – that is a normal part of the learning process, and there is always more to learn.
Don’t blog while you’re reverse engineering. It reduces your productivity tenfold 😎
Thoughts on Honkai Impact
I expect miHoYo and others creating obfuscation schemes to read this article – this section is for you.
The obfuscation in Honkai Impact is interesting. It demonstrates an awareness of the IL2CPP reverse engineering tools available and explicitly targets them using several layers of protection. In addition to what I’ve covered in this series, the game is also obfuscated with VMProtect and – as a last line of defence if all else fails – Beebyte has been applied to the .NET identifiers. This is quite the layer cake.
Unfortunately, the layers are disjoint. The three primary decryption functions are obfuscated via control flow flattening, but there is nothing to stop an attacker from just loading the DLL in isolation and calling them without having to care how they work. No assembly-level obfuscation is applied to defeat a decompiler, which makes the code quite easy to compare to an arbitrary IL2CPP project. Anti-debugging is applied when BH3.exe loads bh3base.dll, but doesn’t prevent UnityPlayer.dll from being debugged on its own.
Tricks like the il2cpp_thread_get_name decoy are a cute Easter egg, but such security by obscurity doesn’t really add anything to the reverse engineering complexity.
I examined versions 3.8 to 4.3 of the game. This series covered version 4.3, but the obfuscation and encryption algorithms are identical from version to version. This is a mistake. Additionally, hackers will likely take the path of least resistance, which is probably the mobile versions rather than the PC version. I’m just a sucker for punishment, but you cannot apply products like VMProtect or heavy obfuscation that impacts performance to an Android app. Versions for all platforms need to be bolstered, and this is likely to be a pain point. What’s more, the Android metadata can be fed into the PC UnityPlayer.dll without issues. A tiny tweak to the encryption algorithm between the two platform builds could have mitigated this.
Overall, the obfuscation and protection feel bolted on as an afterthought, and that results in the various loopholes we exploited as we navigated around a hodgepodge of disparate forms of obfuscation. Obviously it requires significant expenditure of resources to design a product with security in mind from the outset and I for one am glad companies don’t waste too much time on this. Ultimately, obfuscation is only a delay tactic, so whether it matters is really a question of what you want to accomplish by applying it. For a paid game that typically generates most of its sales revenue in the first few weeks, slapping Denuvo on it to delay cracks for a few months may be a viable solution.
Honkai Impact is free-to-play, so I suppose that the obfuscation is there to prevent cheating. In my opinion, relying on the client to enforce anti-cheat is a mistake. You can never trust the behaviour of a client or the data it sends to a server. It is better to deal with cheaters via backend design: always make sure the game server is the single source of truth, analyze incoming network data for suspicious patterns like super-human aiming ability and flag those accounts for review, create server-side honeypots that matchmake cheaters with each other and so on. Can client-side anti-cheat be a useful tool as part of a vertical solution? I believe the effect is extremely limited. Riot Games trotted out Vanguard with Valorant last year, which employs a kernel mode driver reminiscent of the notorious StarForce DRM, yet cheating in Valorant is rampant. If your product is considered a high value target, you can expect it to be reverse engineered.
For companies that really want to apply client-side obfuscation despite it being essentially pointless, in-house obfuscation designs are contraindicated unless the developers have prior expertise in the field. While miHoYo was smart to target IL2CPP tools, and certainly achieved more than anyone else by far, the result was still lackluster because of the weak linkage between the different elements of obfuscation. Experienced obfuscation authors will not make this mistake. Nobody can battle-harden your application like determined hackers. The talent is out there: leverage their expertise to produce hardened applications and have analysts try to find exploits before release.
One really interesting phenomenon is that, once the cat is out of the bag, you can’t put it back in again. We’ll see this when I examine how to break Genshin Impact with PowerShell: the protection was improved from Honkai Impact, but it doesn’t matter because so much knowledge was gained by reverse engineering Honkai Impact that we could mostly just skip over all of it. Once your protection is broken, iterating on it incrementally doesn’t really help. You pretty much have to start again with an entirely new scheme.
A+ for effort, though.
I’m sorry, I couldn’t help it…
So we’re finally done with this game, right? We can sleep soundly tonight, yes? Well, not quite.
Although we’ve achieved our goals, we still don’t actually know how the encryption algorithm works. Join me next time for a bonus part where I delve into the murky underworld of VMProtect’s control flow obfuscation and show how to reverse engineer the string literal decryption function, which runs to 1800 decompiled lines, to discover how it really works – and all because I had insomnia the other night. You won’t want to miss it, it’s going to be a doozy. Until next time…

Really cool work! I honestly don’t understand much of it, but the level of detail and helpful tips in the blog are really nice. I can tell a lot of care goes into this.
I think https://blog.palug.cn/2460.html was doing some work in a similar realm. Was searching around since some stuff seems to have changed in the most recent version and I stumbled across these excellent blog posts.