Reverse Engineering Adventures: League of Legends Wild Rift (IL2CPP)

January 15, 2021

The most common issue I receive on the tracker for Il2CppInspector is “this file won’t load”. Oftentimes this is due to a bug in the tool, but sometimes it leads me down a reverse engineering rabbit hole. From the end-user’s perspective, there is no difference: they post the issue and wait, and some time later I post some commits and tell them it’s fixed. I am usually unaware of what the target workload (application) actually is, and it doesn’t really concern me. As a tool author, my job is to try to ensure a broad range of compatibility with different workloads.

I don’t like to add workload-specific customizations to Il2CppInspector as it’s meant to be a generic tool intended to be fed with “clean” IL2CPP files – although you can now code your own plugins to add workload-specific functionality – but from time to time I see files with the same obfuscation over and over again, and then I take a look. So it was with a series of files over the last few months, which after deobfuscation turned out to be Riot Games’ new mobile magnum opus League of Legends: Wild Rift.

In this article, I’m going to guide you through (a highly abridged version of) the steps I took to deobfuscate Wild Rift. It’s not too complicated, but it will highlight the importance of developing intuition and leveraging the variety of tools at the analyst’s disposal to achieve a quick turnaround. While Wild Rift’s obfuscation is not complex, it is enough to defeat the majority of casual hackers, so the idea here is to encourage newcomers to move beyond only using automated tooling and develop the skills and intuition needed to be able to progress when the automated tools break. If you don’t know what to do when your favourite tool (whatever it may be) doesn’t work on a particular workload, this article is for you!

Info: All of the steps described below are performed by Il2CppInspector automatically, and current versions can analyse Wild Rift without any problems. Nevertheless, the curious hacker should always want to learn more, so keep reading for an adventure!

Oh no, this doesn’t look good

The first thing I do when confronted with an unknown binary is run it through Il2CppInspector to reproduce the error. What does Wild Rift say?

No matches in symbol table
No matches via code heuristics
No matches via data heuristics
Could not process IL2CPP image. This may mean the binary file is packed, encrypted or obfuscated in a way Il2CppInspector cannot process, that the file is not an IL2CPP image or that Il2CppInspector was not able to automatically find the required data.

One of the first things Il2CppInspector does is try to locate the Il2CppCodeRegistration and Il2CppMetadataRegistration structures as a starting point for its analysis. Alarm bells start ringing when this step fails, and my first thoughts are always “corrupted binary”, “obfuscation” or “encryption”. The first port of call is to load up the binary into both IDA Pro and a hex editor (I use HxD) and see what’s up.

After a short eternity waiting for IDA to do something, we navigate to the .text section where the code is and immediately there’s a problem:

.text:00000000016DDD70 ; Segment type: Pure code
.text:00000000016DDD70                 AREA .text, CODE, ALIGN=4
.text:00000000016DDD70                 ; ORG 0x16DDD70
.text:00000000016DDD70                 CODE64
.text:00000000016DDD70 ; __unwind {
.text:00000000016DDD70                 EXPORT start
.text:00000000016DDD70 start           DCQ 0x5AE303E113A159C1, 0x32A3E3A017A3A341, 0xF2A3A7E72BFC5FC1
.text:00000000016DDD70                                         ; DATA XREF: LOAD:off_18↑o
.text:00000000016DDD70                                         ; LOAD:0000000000003A18↑o
.text:00000000016DDD70                 DCQ 0x97A3A3062BA65FC7, 0x1AE3B3A1B45C5C5F, 0x1AA3B3A0F2A3A7E0
.text:00000000016DDD70                 DCQ 0xF7A3A3EFC8BCA3FC, 0x75FCA063B7ACCF97
.text:00000000016DDD70 ; } // starts at 16DDD70
.text:00000000016DDDB0 qword_16DDDB0   DCQ 0x3290A3A353A15883  ; DATA XREF: .fini_array:0000000006EFE170↓o
.text:00000000016DDDB8                 DCB 0x51, 0x59, 0x5C, 0xB4
.text:00000000016DDDBC ; __unwind {
.text:00000000016DDDBC dword_16DDDBC   DCD 0xA1CD85E           ; DATA XREF: .init_array:0000000007BE4C50↓o
.text:00000000016DDDC0                 DCQ 0x73A0B5A332A3A05E, 0x32A8A3A373A3A3A2, 0x37ABFBC932AA4382
.text:00000000016DDDC0                 DCQ 0x73A3A3A273A0B5A3, 0x32A9A38232A8A7A3, 0xB7ABFB2F0B62D85E
.text:00000000016DDDC0 ; } // starts at 16DDDBC
.text:00000000016DDDF0 ; __unwind {
.text:00000000016DDDF0 qword_16DDDF0   DCQ 0xAA2EC5772A3605C, 0x32A3205E0AA1D85E, 0x13A0B56253A15890
.text:00000000016DDDF0                                         ; DATA XREF: .init_array:0000000007BE4C58↓o
.text:00000000016DDDF0                 DCQ 0x3290A1D073A3AEA3, 0x328FC3A332950382, 0xAA35F9C09B0A041
.text:00000000016DDDF0                 DCQ 0x345C59E05AA3A39C, 0x3294A13713A0B577, 0x32A3A04032A38041

The code doesn’t disassemble at all. Uh oh, that probably means it’s encrypted somehow; but as we scroll further, we find something curious: there are blocks of code interspersed in the section that disassemble just fine:

.text:00000000016DEFB4 dword_16DEFB4   DCD 0xA1DD85E           ; DATA XREF: .init_array:0000000007BE4D98↓o
.text:00000000016DEFB8                 DCQ 0x32A3A05E33A159C3, 0x13A0B4D70AA2F050, 0x329341305AE303A3
.text:00000000016DEFB8                 DCQ 0x17A3A2A39AA3B1DC, 0x9B0A04313A3BE22, 0x345C5A7C32AED382
.text:00000000016DEFB8                 DCQ 0xB93CB443C8BCA3BC, 0x329341229AA3B1C3, 0xAE2F05013A3BE23
.text:00000000016DEFB8                 DCQ 0x910D0000B002FB22, 0x91330042A8C27BFD
.text:00000000016DF010 ; ---------------------------------------------------------------------------
.text:00000000016DF010                 B               .__cxa_atexit
.text:00000000016DF014                 STP             X29, X30, [SP,#var_20]!
.text:00000000016DF018                 ADRP            X1, #off_78E47B0@PAGE
.text:00000000016DF01C                 ADRP            X0, #nullsub_8@PAGE
.text:00000000016DF020                 MOV             X29, SP
.text:00000000016DF024                 STR             X19, [SP,#0x20+var_10]
.text:00000000016DF028                 ADRP            X19, #__data_start@PAGE
...
.text:00000000016DFFE0                 LDR             W9, [X9]
.text:00000000016DFFE4                 LDR             W3, [X10]
.text:00000000016DFFE8                 MOV             X0, X2
.text:00000000016DFFEC                 MOV             X4, X8
.text:00000000016DFFF0                 MOV             W2, W9
.text:00000000016DFFF4                 BR              X5
.text:00000000016DFFF4 ; } // starts at 16DFFD0
.text:00000000016DFFF4 ; ---------------------------------------------------------------------------
.text:00000000016DFFF8 ; __unwind {
.text:00000000016DFFF8                 DCQ 0xAA0103E8A940A869, 0x5AE3AFC75AE3A3C2, 0x1AE3A2E01AE3A28A
.text:00000000016DFFF8                 DCQ 0x9A1A04309A3A045, 0x9ABA04689AAA041, 0x5BBDAC5075BCA363
.text:00000000016DFFF8                 DCQ 0x32A3E05E0AA2D85E, 0x9A2A0500AE387CB, 0x1AE3A2A209B0A040
.text:00000000016DFFF8                 DCQ 0x9A3A04A1AE3A28B, 0x89ABA04109A1A043, 0x1BBC6003759CA283

This pattern repeats throughout the code section. IDA uses a hybrid algorithm of linear sweep and recursive descent to perform its analysis, so it won’t find every single valid instruction without a little help. The first instruction above is B .__cxa_atexit – an unconditional branch in ARM code. It seems unlikely this would be preceded by garbage, so we try to disassemble a few prior instructions:

.text:00000000016DEFF4                 DCB 0x22 ; "
.text:00000000016DEFF5                 DCB 0x41 ; A
.text:00000000016DEFF6                 DCB 0x93
.text:00000000016DEFF7                 DCB 0x32 ; 2
.text:00000000016DEFF8                 DCB 0x23 ; #
.text:00000000016DEFF9                 DCB 0xBE
.text:00000000016DEFFA                 DCB 0xA3
.text:00000000016DEFFB                 DCB 0x13
.text:00000000016DEFFC ; ---------------------------------------------------------------------------
.text:00000000016DEFFC                 BIC             W16, W2, W2,ROR#60
.text:00000000016DF000                 ADRP            X2, #__data_start@PAGE
.text:00000000016DF004                 ADD             X0, X0, #0x340
.text:00000000016DF008                 LDP             X29, X30, [SP],#0x20
.text:00000000016DF00C                 ADD             X2, X2, #__data_start@PAGEOFF
.text:00000000016DF010                 B               .__cxa_atexit

BIC is not widely used by compilers, but the other instructions look normal, so it seems we have an unencrypted block starting at 0x16DF000. After some skipping around in the code and seeing what disassembles and what doesn’t, we determine that the section consists of alternating 0x1000-byte blocks: one of regular code, then one that is presumably encrypted. The first block in the section is encrypted and longer than 0x1000 bytes, since the encryption seems to operate on virtual address boundaries of 0x1000 and the section does not begin precisely on such a boundary.
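The boundary arithmetic is easy to check with a short sketch. It assumes the layout observed so far: the first encrypted block runs from the section start to the next 0x1000 boundary plus one full stripe, and after that plain and encrypted stripes alternate. The helper name encrypted_ranges and the end address 0x16E3000 are made up for illustration.

```python
def encrypted_ranges(start, stop, stripe=0x1000):
    """Yield (begin, end) virtual-address ranges that appear to be encrypted."""
    first_len = stripe
    if start % stripe:
        # Unaligned section start: the first block also covers the partial stripe.
        first_len += stripe - start % stripe
    yield start, min(start + first_len, stop)
    # Skip one plain stripe, then take every second stripe.
    for begin in range(start + first_len + stripe, stop, 2 * stripe):
        yield begin, min(begin + stripe, stop)

# 0x16DDD70 is the real start of .text; 0x16E3000 is an arbitrary end for the demo.
ranges = list(encrypted_ranges(0x16DDD70, 0x16E3000))
for begin, end in ranges:
    print(f"{begin:#x} .. {end:#x}")
```

The first range ends at 0x16DF000, which is where the readable code began in the listing above.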

We turn our attention to other sections in the file, in particular .rodata – the read-only data section – which by pure nature of how computer programs and data are represented, we would expect to contain quite a lot of zeroes (this is one place where intuition and experience come in handy). What do we find?

00000000057D7D30  A3 A3 A3 A3 A2 A3 A3 A3  A1 A3 A3 A3 A0 A3 A3 A3
00000000057D7D40  A7 A3 A3 A3 A6 A3 A3 A3  A5 A3 A3 A3 A4 A3 A3 A3
00000000057D7D50  AB A3 A3 A3 AA A3 A3 A3  A9 A3 A3 A3 A8 A3 A3 A3
00000000057D7D60  AF A3 A3 A3 AE A3 A3 A3  AD A3 A3 A3 AC A3 A3 A3
00000000057D7D70  B3 A3 A3 A3 B2 A3 A3 A3  B1 A3 A3 A3 B0 A3 A3 A3
00000000057D7D80  B7 A3 A3 A3 B6 A3 A3 A3  B5 A3 A3 A3 B4 A3 A3 A3
00000000057D7D90  BB A3 A3 A3 BA A3 A3 A3  B9 A3 A3 A3 B8 A3 A3 A3
00000000057D7DA0  BF A3 A3 A3 BE A3 A3 A3  BD A3 A3 A3 BC A3 A3 A3
00000000057D7DB0  83 A3 A3 A3 82 A3 A3 A3  81 A3 A3 A3 80 A3 A3 A3
00000000057D7DC0  87 A3 A3 A3 86 A3 A3 A3  85 A3 A3 A3 84 A3 A3 A3
00000000057D7DD0  8B A3 A3 A3 8A A3 A3 A3  89 A3 A3 A3 88 A3 A3 A3
00000000057D7DE0  8F A3 A3 A3 8E A3 A3 A3  8D A3 A3 A3 8C A3 A3 A3
00000000057D7DF0  93 A3 A3 A3 92 A3 A3 A3  91 A3 A3 A3 90 A3 A3 A3
00000000057D7E00  97 A3 A3 A3 96 A3 A3 A3  95 A3 A3 A3 94 A3 A3 A3
00000000057D7E10  9B A3 A3 A3 9A A3 A3 A3  99 A3 A3 A3 98 A3 A3 A3

That’s pretty weird. What data structure could consist of a lot of 0xA3s? Notice also that some of the bytes have similar values like 0xA0, 0xA2 and 0xA4, and this provides our first clue that we may be looking at single-byte XOR encryption. Let’s scroll down some more, specifically to a 0x1000 byte boundary:

00000000057D8FB0  03 A7 A3 A3 02 A7 A3 A3  01 A7 A3 A3 00 A7 A3 A3 
00000000057D8FC0  07 A7 A3 A3 06 A7 A3 A3  05 A7 A3 A3 04 A7 A3 A3 
00000000057D8FD0  0B A7 A3 A3 0A A7 A3 A3  09 A7 A3 A3 08 A7 A3 A3 
00000000057D8FE0  0F A7 A3 A3 0E A7 A3 A3  0D A7 A3 A3 0C A7 A3 A3 
00000000057D8FF0  13 A7 A3 A3 12 A7 A3 A3  11 A7 A3 A3 10 A7 A3 A3 
00000000057D9000  B4 04 00 00 B5 04 00 00  B6 04 00 00 B7 04 00 00 
00000000057D9010  B8 04 00 00 B9 04 00 00  BA 04 00 00 BB 04 00 00 
00000000057D9020  BC 04 00 00 BD 04 00 00  BE 04 00 00 BF 04 00 00 
00000000057D9030  C0 04 00 00 C1 04 00 00  C2 04 00 00 C3 04 00 00 
00000000057D9040  C4 04 00 00 C5 04 00 00  C6 04 00 00 C7 04 00 00

As if by magic, the data starting at 0x57D9000 looks a lot more like normal data, and as we scroll down we find a similar repeating pattern of alternating blocks every 0x1000 bytes, just as in the code section.
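We can test the single-byte XOR hunch directly on the two rows either side of the 0x57D9000 boundary. This is only a sketch: the key 0xA3 is our guess from the hex dump, and dwords is a throwaway helper.

```python
KEY = 0xA3  # guessed from the runs of A3 bytes where we expected zeroes

# Last row before the boundary (suspected encrypted) and first row after it (plain).
encrypted_row = bytes.fromhex("13 A7 A3 A3 12 A7 A3 A3 11 A7 A3 A3 10 A7 A3 A3")
plain_row = bytes.fromhex("B4 04 00 00 B5 04 00 00 B6 04 00 00 B7 04 00 00")

decoded = bytes(b ^ KEY for b in encrypted_row)

def dwords(data):
    """Split a byte string into little-endian 32-bit integers."""
    return [int.from_bytes(data[i:i + 4], "little") for i in range(0, len(data), 4)]

print([hex(v) for v in dwords(decoded)])    # ['0x4b0', '0x4b1', '0x4b2', '0x4b3']
print([hex(v) for v in dwords(plain_row)])  # ['0x4b4', '0x4b5', '0x4b6', '0x4b7']
```

The decoded values run straight into the plain ones as a single ascending sequence, which is strong evidence that the key is right.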

Time to cook up a script. This encryption is trivially defeated with the following Python script which we run in IDA:

import idc
import idautils
import ida_bytes

xorKey = 0xA3A3A3A3A3A3A3A3  # the single-byte key 0xA3 repeated 8 times
stripeSize = 0x1000  # size of each alternating encrypted/plain block

segments = {idc.get_segm_name(x): (idc.get_segm_start(x), idc.get_segm_end(x)) for x in idautils.Segments()}

def xorRange(start, stop, stripeSize):
    firstBlockLength = stripeSize
    if start % stripeSize != 0:
        firstBlockLength += stripeSize - (start % stripeSize)  # add the partial stripe

    for addr in range(start, start + firstBlockLength, 8):
        ida_bytes.put_qword(addr, ida_bytes.get_qword(addr) ^ xorKey)

    for addr in range(start + firstBlockLength + stripeSize, stop, stripeSize * 2):
        for innerAddr in range(addr, addr + stripeSize, 8):
            ida_bytes.put_qword(innerAddr, ida_bytes.get_qword(innerAddr) ^ xorKey)

print("Decrypting .text")
start, stop = segments[".text"]
xorRange(start, stop, stripeSize)

print("Decrypting .rodata")
start, stop = segments[".rodata"]
xorRange(start, stop, stripeSize)

Lines 5 and 6 set our desired XOR key and stripe size (block size). Even though it’s a single-byte key, we repeat it 8 times and rewrite the data 8 bytes at a time using qwords. Both IDA and Python are staggeringly slow, so this increases the execution speed by roughly a factor of 8.

Line 8 fetches all of the segments in the program.

Our decryption code is defined in the xorRange function. Because the first block in each section is larger than the stripe size, we calculate the size of this (lines 11-13) and decrypt it separately (lines 15-16). The remaining code in lines 18-20 moves through the desired range, first decrypting a block, then skipping a block via the stepping factor stripeSize * 2.

Finally, lines 22-28 select the sections we want to decrypt and call the decryption function.
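If the trick of repeating the key eight times looks suspicious, here is a quick check outside IDA that XORing a qword with the repeated key gives the same bytes as XORing one byte at a time. The sample bytes come from the .rodata dump above. The variable names are just for this sketch.

```python
KEY_BYTE = 0xA3
KEY_QWORD = int.from_bytes(bytes([KEY_BYTE]) * 8, "little")

sample = bytes.fromhex("A3 A3 A3 A3 A2 A3 A3 A3")

# One byte at a time.
bytewise = bytes(b ^ KEY_BYTE for b in sample)

# All eight bytes at once, as a little-endian qword.
qwordwise = (int.from_bytes(sample, "little") ^ KEY_QWORD).to_bytes(8, "little")

print(hex(KEY_QWORD))         # 0xa3a3a3a3a3a3a3a3
print(bytewise == qwordwise)  # True
```

Because every byte of the key is identical, byte order does not matter here. A multi-byte key would need more care.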

The resulting code and data look completely normal and match in form with the non-encrypted blocks. Excellent! Let’s run Il2CppInspector again with the decrypted file and see what happens.

Info (Warning: highly technical!): Doing this kind of work by eye is fairly trivial but codifying it into an automated tool that can work with arbitrary files is surprisingly challenging. We can’t assume the data section is mostly zero, but we can assume that the code section contains mostly valid instructions. Additionally, the instructions used will generally be a small subset of the total number of instructions supported by the processor, as architectures like ARM and x86 have many instructions only used in niche cases or in kernel mode (ring 0 privileged mode).

Il2CppInspector generates an RLE sequence of 0 and 1-bits from the code section indicating whether or not each address contains a commonly used instruction, cross-referenced from the ARM Architecture Reference Manual ARMv7-A and ARM Architecture Reference Manual ARMv8-A. The sequence generation has a tolerance to allow small groups of other instructions, then generates a frequency distribution of valid instruction counts, buckets them into a histogram and looks for substantial changes between buckets to determine the stripe size. The details of this are beyond the scope of this article, but you can view the source code for this solution here.
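To give a flavour of the idea, here is a heavily simplified sketch. It is not Il2CppInspector’s implementation: it takes the per-instruction 0/1 flags as given, has no tolerance for stray instructions, and simply picks the most common run length of 1-flags rather than comparing histogram buckets. All names are hypothetical.

```python
from collections import Counter
from itertools import groupby

def run_lengths(flags):
    """Run-length encode a sequence of 0/1 flags into (flag, length) pairs."""
    return [(flag, sum(1 for _ in group)) for flag, group in groupby(flags)]

def guess_stripe_size(flags, unit=4):
    """Return the most common length, in bytes, of runs of 'common' instructions.

    unit is the size of one instruction (4 bytes on ARM64).
    """
    histogram = Counter(
        length * unit for flag, length in run_lengths(flags) if flag == 1
    )
    return histogram.most_common(1)[0][0] if histogram else None

# Synthetic input: 0x400 plausible instructions, then 0x400 implausible ones, repeated.
flags = ([1] * 0x400 + [0] * 0x400) * 8
print(hex(guess_stripe_size(flags)))  # 0x1000
```

Real code is much messier than this synthetic input, which is where the tolerance and bucketing in the real tool come in.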

Not so fast

No matches in symbol table
Required structures acquired from code heuristics. Initialization function: 0x00000000016DEE4C
CodeRegistration struct found at 0x000000000727B688 (file offset 0x0726B688)
MetadataRegistration struct found at 0x000000000727B6F8 (file offset 0x0726B6F8)

System.ArgumentException : An item with the same key has already been added. Key: 18446744069414584320

The data structures are now found as we wanted! But there’s still a problem, and as we’re about to find out, this one’s a doozy.

These kinds of errors are often caused by bugs in the tool or by a new version of IL2CPP that has changed some data structures, but that’s not the case here as we’re using a known version of IL2CPP. To explore this issue further, we use the debugger in Visual Studio to find out what’s going on:

The exception occurs in the highlighted line. As you can see from the comments, typeRefPointers (which is referenced in the line that throws an exception) is meant to be an array of pointers. Looking at the start of the array, we can see this is definitely not an array of pointers. We’re loading the wrong data. What is going on here? Let’s return to IDA and look at the corresponding structure, at the address reported by Il2CppInspector. We give it a name and redefine the data to match the expected structure definition:

    public class Il2CppMetadataRegistration
    {
        public long genericClassesCount;
        public ulong genericClasses;
        public long genericInstsCount;
        public ulong genericInsts;
        public long genericMethodTableCount;
        public ulong genericMethodTable;
        public long typesCount;
        public ulong ptypes;
        public long methodSpecsCount;
        public ulong methodSpecs;
        public long fieldOffsetsCount;
        public ulong pfieldOffsets;
        public long typeDefinitionsSizesCount;
        public ulong typeDefinitionsSizes;
        public ulong metadataUsagesCount;
        public ulong metadataUsages;
    }

IDA’s disassembly now looks like this:

.data.rel.ro:000000000727B6F8 metadataRegistration DCQ 0xED55
.data.rel.ro:000000000727B700                 DCQ unk_70697E0
.data.rel.ro:000000000727B708                 DCQ 0x5BEA
.data.rel.ro:000000000727B710                 DCQ unk_76627F8
.data.rel.ro:000000000727B718                 DCQ 0x5BEA
.data.rel.ro:000000000727B720                 DCQ unk_7690748
.data.rel.ro:000000000727B728                 DCQ 0x11146
.data.rel.ro:000000000727B730                 DCQ unk_581322C
.data.rel.ro:000000000727B738                 DCQ 0x21509
.data.rel.ro:000000000727B740                 DCQ unk_739B748
.data.rel.ro:000000000727B748                 DCQ 0x10714
.data.rel.ro:000000000727B750                 DCQ unk_58E0174
.data.rel.ro:000000000727B758                 DCQ 0x2E31
.data.rel.ro:000000000727B760                 DCQ unk_725A3C8
.data.rel.ro:000000000727B768                 DCQ 0x106B3
.data.rel.ro:000000000727B770                 DCQ unk_71A8B20

Il2CppMetadataRegistration consists of an alternating set of list counts and pointers to said lists. The disassembly certainly seems to resemble a set of counts and pointers, but when we navigate to the pointer causing problems (ptypes – the 4th pointer, which is meant to be a list of pointers to Il2CppType structs), we find something else:

.rodata:000000000581322C qword_581322C   DCQ 0
.rodata:0000000005813234                 DCQ 0x1FFFFFFFF
.rodata:000000000581323C                 DCQ 0xFFFFFFFF00000000
.rodata:0000000005813244                 DCQ 2
.rodata:000000000581324C                 DCQ 0x3FFFFFFFF
.rodata:0000000005813254                 DCQ 0xFFFFFFFF00000000
.rodata:000000000581325C                 DCQ 0xFFFFFFFF00000381

Well that’s not great. It doesn’t look encrypted – it looks like data that should probably be 32-bit DWORDs – so we return to the definition of Il2CppMetadataRegistration and start clicking through every pointer to see if we recognize anything.
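A quick way to test the DWORD hunch is to reinterpret the same bytes as signed 32-bit integers. This sketch uses only the seven qwords shown in the listing. It also converts the key from the exception message to hex, on the assumption (mine, not confirmed by the debugger output) that the failing dictionary was keyed on these bogus "pointers".

```python
import struct

# The qwords exactly as IDA displayed them at 0x581322C.
qwords = [0x0, 0x1FFFFFFFF, 0xFFFFFFFF00000000, 0x2,
          0x3FFFFFFFF, 0xFFFFFFFF00000000, 0xFFFFFFFF00000381]

raw = struct.pack("<7Q", *qwords)
dwords = struct.unpack("<14i", raw)
print(dwords)
# (0, 0, -1, 1, 0, -1, 2, 0, -1, 3, 0, -1, 897, -1)

duplicate_key = 18446744069414584320  # from the ArgumentException message
print(hex(duplicate_key))             # 0xffffffff00000000
print(qwords.count(duplicate_key))    # 2
```

Viewed as DWORDs the data is a tidy series of small integers and -1 values, and the "pointer" 0xFFFFFFFF00000000 turns up twice, which would fit the duplicate key error.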

The definition of Il2CppType is:

    public class Il2CppType
    {
        public ulong datapoint;
        public ulong bits;
    }

datapoint is either a virtual address pointer or a metadata token, and bits is a collection of flags which mostly has all zeroes for the bottom 16 bits. By clicking through each list pointer in Il2CppMetadataRegistration and then clicking through the first few pointers in each list, we eventually come across this list:

.data.rel.ro:000000000739B748 off_739B748     DCQ off_727B778
.data.rel.ro:000000000739B750                 DCQ qword_5BABA68
.data.rel.ro:000000000739B758                 DCQ off_727B788
.data.rel.ro:000000000739B760                 DCQ off_727B798
.data.rel.ro:000000000739B768                 DCD off_727B7B8

The contents of the addresses pointed to look like this:

.data.rel.ro:000000000727B778 off_727B778     DCQ unk_76D70B8
.data.rel.ro:000000000727B780                 DCD 0x150000

.rodata:0000000005BABA68 qword_5BABA68   DCQ 0x192
.rodata:0000000005BABA70                 DCQ 0x1C0000

.data.rel.ro:000000000727B788 off_727B788     DCQ unk_76D70D8
.data.rel.ro:000000000727B790                 DCQ 0x150000

.data.rel.ro:000000000727B798 off_727B798     DCQ unk_76D70F8
.data.rel.ro:000000000727B7A0                 DCQ 0x150000

.data.rel.ro:000000000727B7B8 off_727B7B8     DCQ unk_76D7118
.data.rel.ro:000000000727B7C0                 DCQ 0x150000

That sure looks like a list of Il2CppTypes to me. Obviously, the issue here for the casual analyst is you need a somewhat detailed knowledge of IL2CPP data structures to determine this, and it’s basically impossible if you’re not familiar with it. I was only able to discern this due to the fact I have spent way too many hours of my life poring over IL2CPP application disassemblies for no good reason.
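The eyeball test can be written down as a rough plausibility check. This is a sketch under loud assumptions: the bottom 16 bits of bits are treated as always zero (the real data only mostly is), and IMAGE_END is a guessed upper bound on the image’s virtual addresses. Neither the function nor the constant comes from Il2CppInspector.

```python
def looks_like_il2cpp_type(datapoint, bits, image_end):
    """Rough plausibility test for one candidate (datapoint, bits) pair."""
    low_flags_clear = (bits & 0xFFFF) == 0
    # A metadata token is a small number and a pointer lies inside the image,
    # so in both cases the value sits below the end of the image.
    plausible_datapoint = datapoint < image_end
    return low_flags_clear and plausible_datapoint

IMAGE_END = 0x8000000  # assumed: just above the highest address seen in the listings

# Pairs from the list we just found, and pairs from the bogus data at 0x581322C.
good = [(0x76D70B8, 0x150000), (0x192, 0x1C0000), (0x76D70D8, 0x150000)]
bad = [(0x0, 0x1FFFFFFFF), (0xFFFFFFFF00000000, 0x2)]

print([looks_like_il2cpp_type(d, b, IMAGE_END) for d, b in good])  # [True, True, True]
print([looks_like_il2cpp_type(d, b, IMAGE_END) for d, b in bad])   # [False, False]
```

A single pair proves nothing, but a whole list where nearly every entry passes is a good sign you are looking at Il2CppTypes.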

So what is going on here? After some more investigating, we reach the conclusion that the field order of Il2CppMetadataRegistration (and also Il2CppCodeRegistration) has been scrambled. In this case, the order has been straight up reversed.
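Here is how the reversal plays out on the dump above. One assumption to flag: since each count still precedes its pointer in the dump, I am reading "reversed" as reversing the eight (count, pointer) pairs rather than all sixteen fields. The pair names are shortened from the struct definition.

```python
# Pair names in the normal field order of Il2CppMetadataRegistration.
FIELD_PAIRS = ["genericClasses", "genericInsts", "genericMethodTable", "types",
               "methodSpecs", "fieldOffsets", "typeDefinitionsSizes", "metadataUsages"]

# (count, pointer) pairs in the order they appear in the IDA dump.
dumped = [(0xED55, 0x70697E0), (0x5BEA, 0x76627F8), (0x5BEA, 0x7690748),
          (0x11146, 0x581322C), (0x21509, 0x739B748), (0x10714, 0x58E0174),
          (0x2E31, 0x725A3C8), (0x106B3, 0x71A8B20)]

registration = dict(zip(FIELD_PAIRS, reversed(dumped)))

count, pointer = registration["types"]
print(hex(count), hex(pointer))  # 0x21509 0x739b748
```

The types pointer now lands on 0x739B748, the list we identified by hand as Il2CppTypes.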

I love this obfuscation, because it has no effect on someone reading the disassembly code, but it will defeat automated IL2CPP tools. It demonstrates that the obfuscation authors have an awareness of how IL2CPP apps are reverse engineered and have created a simple yet effective scheme specifically designed to target the automated tools. I take this as a badge of honour – thank you Riot.

Alright, so after a couple of hours of messing around we rearrange the fields of both structures into the correct order and fire up Il2CppInspector once more.

Info: As with the XOR decryption, codifying a generalized solution to reordered fields is difficult. Il2CppInspector attempts to determine the field order for arbitrary files by walking every single pointer, attempting to cast what it finds to every single possible type we might expect to find, then checking the values in each type’s fields for validity by cross-referencing it with other metadata. This is substantially tricky; you can view the source code here.

Oh give me a break

System.Collections.Generic.KeyNotFoundException : The given key '66' was not present in the dictionary.

Full disclosure, the actual words I used were a little more industrial in nature, starting with “for” and ending in “sake”. Yes, the life of an analyst is rarely a simple one.

Delving once again into the debugger, we see that the analysis has progressed much further this time. In fact, it would appear on the surface that the load process actually completed successfully, but we’re having a problem generating the type model that Il2CppInspector uses as the basis for all of its outputs, as the cunningly-organized elements in this screenshot demonstrate:

Package.Strings is supposed to contain all of the strings stored in IL2CPP’s metadata file global-metadata.dat which is shipped with every application. These strings are indexed by their offset from the start of the string table, and Il2CppInspector reads them sequentially as a series of null-terminated strings, storing the current offset and string value for each one in a dictionary. Here the Assembly class is trying to find its assembly name at offset 0x42, but no such index exists in the string table. It is also notable that the error occurs here because assembly generation occurs very early on in the type model creation workflow. This is an indication that potentially most or all of the strings are incorrect.

The contents of the first few items in the string table do not look promising. It’s a bunch of garbled junk. Encrypted strings perhaps? One indication of this is that the string indices are quite far apart compared to what you would expect for the length of the average text string. This means that the null terminators are not present and Il2CppInspector has read too many characters.
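To see why missing terminators produce a KeyNotFoundException, here is a minimal sketch of the sequential read. It is an illustration rather than Il2CppInspector’s code, and the three assembly names and the XOR with 0x24 are only stand-ins for "something mangled the terminators".

```python
def read_string_table(table):
    """Map each string's starting offset to its text, reading null-terminated strings in order."""
    strings = {}
    offset = 0
    while offset < len(table):
        end = table.find(b"\x00", offset)
        if end == -1:
            end = len(table)  # no terminator left: swallow the rest of the table
        strings[offset] = table[offset:end].decode("utf-8", errors="replace")
        offset = end + 1
    return strings

plain = b"Assembly-CSharp.dll\x00mscorlib.dll\x00System.dll\x00"
garbled = bytes(b ^ 0x24 for b in plain)  # terminators are no longer zero bytes

print(sorted(read_string_table(plain)))    # [0, 20, 33]
print(sorted(read_string_table(garbled)))  # [0]
```

With the terminators gone, the reader produces one giant string at offset 0, and a lookup at any other offset fails.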

The first assembly in an IL2CPP application is usually mscorlib.dll, and we are expecting to find this at offset 0x42 in the string table. The Il2CppGlobalMetadataHeader structure – which is stored at the very start of global-metadata.dat – contains the offset to the start of the string table, and we look at it in the debugger for simplicity:

Time to fire up a hex editor and see what we find at offset 0xefdcc + 0x42 = 0xefe0e:

mscorlib.dll is 12 characters + 1 null terminator for a total of 13 bytes, which I’ve highlighted above. Notably, the character l which is 0x6C in ASCII appears three times in this string. What are the corresponding bytes in the metadata? 0x48, 0x48 and… 0x48. Another single-byte encryption? The null terminator is replaced with 0x24. On a hunch – and given what we saw in the application binary – perhaps we have another single-byte XOR on our hands? When you XOR a number with zero, you get the same number, i.e. x XOR 0 == x, so the byte sitting where the null terminator should be (0x24) would be the key itself. So what is 0x24 XOR 0x48? No prizes for guessing the answer: 0x6C. This looks sus, let’s vote it out. Sorry.. wrong game.

So the XOR key is 0x24. What happens when we XOR the entire string table with 0x24? Besides the mscorlib.dll string we get garbage, so we haven’t found the full solution yet. We need to look at more strings to find out what is going on. Conveniently, the IL2CPP metadata provides us with all of the string indexes used by the application, so we can easily discover the start and end offsets of each string. Alternatively – since it’s a bit of a hassle to grep through every single structure and find adjacent strings – we can take a bit of a shortcut. We know that all of the assembly names end with .dll, so we can just get the assembly name indices and see what we find. If they also use XOR encryption, we might expect the last two bytes of each one before the null terminator to be the same.

Visual Studio’s debugger has a handy feature which lets you pin a property of a class as a favourite, then when the types are rolled up into a list, you can see just the favourite property. By setting the string offset property as the favourite, we can quickly get all of the assembly name string offsets from our list of Il2CppImageDefinition:

Let’s walk through a couple and see what we get. We navigate to 0xefdcc + 0x402ae = 0x13007a:

Here I looked at the bytes by eye and highlighted them until I found two that were the same – 0xA4 in this case. If the null terminator is the next character and the string is single-byte XORed, the XOR key will be 0xC8. I bet you will never be able to guess what 0xC8 XOR 0xA4 is. Okay… you got it, it’s 0x6C – the ASCII code for l – as in .dll.
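Both observations boil down to the same two lines of arithmetic, sketched here with the byte values read from the hex editor. The function name is made up.

```python
def key_from_terminator(encrypted_terminator):
    """x XOR 0 == x, so an encrypted null terminator is the key itself."""
    return encrypted_terminator ^ 0x00

# (byte found where an 'l' should be, byte found where the terminator should be)
observations = [(0x48, 0x24), (0xA4, 0xC8)]

for encrypted_l, encrypted_terminator in observations:
    key = key_from_terminator(encrypted_terminator)
    print(hex(key), chr(encrypted_l ^ key))
# 0x24 l
# 0xc8 l
```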

So it seems we have our answer. Each string has its own XOR key, and we can get the key from the null terminator. To know which byte is the null terminator, we need to know every string index used by the application. Here then is the strategy:

Iterate over every assembly, type, method, field, event, property and so on to find the element name’s string index.
Sort the indices into ascending order
Subtract each index from the next index to get each string length (including its null terminator)
For each string, read the string length’s number of bytes
Store the final (null terminator) byte as the XOR key for the string
XOR every character in the string with the XOR key
Rewrite the string to the metadata

The code to do this is a bit convoluted but you can check it out here if you’re interested.
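For a rough idea of the shape of it, here is a minimal sketch of the decryption step, assuming we have already collected every string index. It also assumes the last string runs to the end of the table and that no string goes unreferenced (otherwise two strings would be merged and decrypted with the wrong key). The encrypt helper and its keys exist only to build test data.

```python
def decrypt_string_table(table, indices):
    """Decrypt a string table in which every string is XORed with its own one-byte key."""
    table = bytearray(table)
    starts = sorted(set(indices))
    # Each string ends where the next one starts; the last ends with the table.
    for start, end in zip(starts, starts[1:] + [len(table)]):
        key = table[end - 1]  # the encrypted null terminator is the key
        for i in range(start, end):
            table[i] ^= key
    return bytes(table)

def encrypt(strings_with_keys):
    """Test helper: build an encrypted table from (text, key) pairs."""
    out = bytearray()
    for text, key in strings_with_keys:
        out += bytes(b ^ key for b in text.encode() + b"\x00")
    return bytes(out)

table = encrypt([("Assembly-CSharp.dll", 0x11),
                 ("mscorlib.dll", 0x24),
                 ("System.dll", 0xC8)])
decrypted = decrypt_string_table(table, [33, 0, 20])
print(decrypted)
```

The indices are passed unsorted on purpose, since in practice they arrive in whatever order the metadata structures are walked.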

Once we’ve applied this strategy to global-metadata.dat, we load it back up into a hex editor and navigate to the start of the string table to admire the fruits of our labour:

Running Il2CppInspector yet again, the app now loads and we see all of the namespace strings nicely in the GUI:

So, we’re done right?

Well, maybe. Before we go pretending to our friends that we’re l33t h4x0rs now because we were able to defeat a one-byte XOR encryption, we should probably check the output.

Everything looks pretty good except for one small remaining problem. Only four IL2CPP API exports are found:

DO_API(void, il2cpp_shutdown, ());
DO_API(const Il2CppAssembly*, il2cpp_domain_assembly_open, (Il2CppDomain * domain, const char* name));
DO_API(void, il2cpp_gchandle_free, (uint32_t gchandle));
DO_API(void, il2cpp_monitor_exit, (Il2CppObject * obj));

It’s often the case that API exports are stripped from applications, but we take a quick skim over all the declared exports in the file to be sure:

The list of exports starting with nq2huu_ extends far beyond the screenshot, and seeing this should cause you to raise an eyebrow. The layout of these export names looks suspiciously like the il2cpp_ exports, with the ‘2’ and the underscores (‘_’) present, and repeating low-entropy strings like fwwfd. In addition, there are many of them, and the IL2CPP API has many functions. Knowing the standard IL2CPP export names helps here of course, but they are easy to find in the IL2CPP source code (libil2cpp/il2cpp-api-functions.h). They have names like il2cpp_method_get_return_type and il2cpp_gchandle_new.

The main point I’m making here is that experience in pattern recognition by eye is a great skill to have in your toolbox. On that theme, the trained eye will be able to spot fairly quickly that these exports have been encrypted with a basic ROT cipher. Specifically, the characters have been rotated forwards by 5 places, so that a becomes f, b becomes g and so on. The characters wrap at the end, so v becomes a and z becomes e.

If we take the first encrypted export – nq2huu_afqzj_gtc – and rotate each character backwards by 5 places, we get il2cpp_value_box. That looks a lot like an API name!
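The rotation takes only a few lines. This sketch rotates lowercase letters only and leaves digits and underscores alone, which is all the export names we have seen require. Whether uppercase letters are rotated too is not something this file tells us.

```python
import string

def rotate(name, shift):
    """Rotate lowercase ASCII letters by shift places, wrapping around the alphabet."""
    out = []
    for ch in name:
        if ch in string.ascii_lowercase:
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        else:
            out.append(ch)  # digits, underscores and anything else pass through
    return "".join(out)

print(rotate("nq2huu_afqzj_gtc", -5))  # il2cpp_value_box
print(rotate("fwwfd", -5))             # array
```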

Taking a final deep breath, we decrypt all of the export symbols and run the binary through Il2CppInspector again. Joy, all of the APIs are discovered!

DO_API(void, il2cpp_init, (const char* domain_name));
DO_API(void, il2cpp_init_utf16, (const Il2CppChar * domain_name));
DO_API(void, il2cpp_shutdown, ());
DO_API(void, il2cpp_set_config_dir, (const char *config_path));
DO_API(void, il2cpp_set_data_dir, (const char *data_path));
DO_API(void, il2cpp_set_temp_dir, (const char *temp_path));
// ...

For the generalized solution that works with any ROT key or combination of keys, you can view the source code here.
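If you don’t know the key in advance, a brute-force search is cheap since there are only 25 candidates. The sketch below is a simplification and not the approach Il2CppInspector takes: it handles a single key only, and it relies on at least one export decrypting to a name starting with il2cpp_. The export list is illustrative.

```python
def rotate(name, shift):
    """Rotate lowercase ASCII letters by shift places; leave other characters alone."""
    return "".join(
        chr((ord(ch) - ord("a") + shift) % 26 + ord("a")) if "a" <= ch <= "z" else ch
        for ch in name
    )

def find_rot_shift(exports, known_prefix="il2cpp_"):
    """Return the backwards shift that turns the most exports into known_prefix names."""
    best_shift, best_hits = None, 0
    for shift in range(1, 26):
        hits = sum(rotate(name, -shift).startswith(known_prefix) for name in exports)
        if hits > best_hits:
            best_shift, best_hits = shift, hits
    return best_shift

# A mix of encrypted, unencrypted and unrelated export names.
exports = ["nq2huu_afqzj_gtc", "il2cpp_shutdown", "JNI_OnLoad"]
print(find_rot_shift(exports))  # 5
```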

Now we can go pretend to our friends that we are l33t h4x0rs!

Post-analysis postmortem

Speaking holistically, what have we learned from this exercise?

From the analyst’s point of view, the lesson here is that just because you might imagine something is highly protected doesn’t mean that it is. However, intuition and lateral thinking are key. All of the techniques used to obfuscate this particular piece of code are simple to reverse engineer, if you know where to look and what to look for. If you don’t, you’re stuck. The only way you can gain this intuition is by studying, researching and practising to gain first-hand experience. As you look at more and more code, you will see the same patterns and techniques used over and over again, and you’ll come to recognize common strategies as if it’s second nature.

Sometimes there is no call for strong encryption. Developers love to encrypt strings in their code, and with good reason: you can learn a lot about an application’s behaviour by looking at its strings. When we perform malware analysis, it’s one of the first things we look at. The error messages generated by malware can often indicate whether it attempts to download malicious payloads, send emails, encrypt files, phone home and more. By searching for the locations where strings are used in disassembly, you can work back and often quickly determine the purpose of a particular function. Crucially though, there is generally no need to apply strong encryption to strings, because you can either see them or you can’t. A resourceful analyst will decrypt the strings either way, whereas a casual hacker will be defeated whether the encryption is strong or weak.

Not all obfuscation is aimed at human beings. Reordering the struct fields described above is a precise, narrowly targeted countermeasure against automated tools. Once again, well-equipped analysts won’t have a problem with this, but it will cut out the majority of casual hackers, at least for a time. This particular obfuscation is actually genius in its simplicity: it takes 30 seconds of copy-pasting to implement, and not only does it break the tooling, but creating a generalized automated solution is very time-consuming. The products using this particular obfuscation have enjoyed immunity from automated tools for many months, for the sole reason that the only two people who maintain these tools couldn’t be bothered to do anything about it (which, by the way, is a really good reason to know how to do these things yourself – we won’t be here forever!).

What of the XOR-encrypted binary? What is the point of applying such weak encryption? The developers of this strategy are not stupid; they know it can be reversed easily. One thing to consider is the constraints of the hardware on which the app runs, namely mobile devices. Sure, we can slam down some customized AES encryption, virtualize the code and fetch the key from the network over a secure connection (although it will still wind up on the device), but cellphones don’t have unlimited processing power and the startup time of such an application would be horrendous. If you virtualize the code, runtime performance will be severely impacted and battery drain will be massive. What we need is an encryption scheme that is sufficient to deter would-be hackers, but lightweight enough to incur a minimal performance penalty.
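To make the trade-off concrete, here is a minimal sketch of why single-byte XOR is both cheap to apply and cheap to undo. It assumes a block is encrypted with one key byte and that the analyst knows a few bytes of the expected plaintext (a file magic, for example). The key value `0x5A`, the sample data and the function names are all hypothetical.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with a single key byte; the same call encrypts and decrypts."""
    return bytes(b ^ key for b in data)


def recover_key(ciphertext: bytes, known_prefix: bytes):
    """Derive the key from the first byte, then confirm it against the whole prefix."""
    if not known_prefix or len(ciphertext) < len(known_prefix):
        return None
    key = ciphertext[0] ^ known_prefix[0]
    if xor_bytes(ciphertext[: len(known_prefix)], key) == known_prefix:
        return key
    return None


# Hypothetical block: plaintext starting with the ELF magic, encrypted with 0x5A.
plaintext = b"\x7fELF example block"
blob = xor_bytes(plaintext, 0x5A)

key = recover_key(blob, b"\x7fELF")
if key is not None:
    print(hex(key), xor_bytes(blob, key))
```

Decryption is a single pass over the data with no key schedule, which is what keeps startup cost low on a phone; the same property means one known plaintext byte is enough to recover the key.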

I would argue that they can do much better than single-byte XOR’ed blocks, but there is another possibility: I have seen the triple whammy of XOR’ed blocks, XOR-encrypted metadata strings and ROT 5’d API exports in a significant range of titles from different publishers; it’s possible that a company has advertised a security product that can “defeat automated IL2CPP reverse engineering tools” and the buyers simply lack sufficient expertise in IL2CPP to perform due diligence on the product. But hey, if the product is cheap and it works for a couple of months, why not?

The authors of this protection will undoubtedly read this article with interest when it comes to their attention, and that’s a good thing: in order to defeat a hacker, you need to think like a hacker. I’m of the opinion that users should be able to do what they want with the software on their storage devices as long as it doesn’t negatively impact others. You can mod Wild Rift to add a colorblind mode, or you can mod it to grief other players. Tools can be used for good or evil, and game protection is generally an exercise in futility that serves only to inconvenience the end user. That being said, in the dangerous modern online world full of security threats and poorly designed security products, there is a strong argument to be made for teaching vanilla software developers to think more like hackers. By highlighting flaws and exploit techniques they may not have thought of, we can help them design more security-savvy products that will hopefully – one day – protect us all, rather than an inconsequential video game.

Comments (3)

lht

August 1, 2021 at 13:37

Thanks for the amazing write up.

Riot Games updated their app to version v2.4.0.4727 and your approach has been patched. Can you write a follow-up post on the new version? The new APK can be downloaded from here: https://apkpure.com/league-of-legends-wild-rift/com.riotgames.league.wildrift/variant/2.4.0.4727-XAPK

firefly

May 8, 2021 at 22:46

it’s possible that a company has advertised a security product that can “defeat automated IL2CPP reverse engineering tools” and the buyers simply lack sufficient expertise in IL2CPP to perform due diligence on the product.

I believe you are talking about Tencent’s TersafeSDK

https://intl.cloud.tencent.com/product/mtp

boysshitinlife

February 3, 2021 at 19:39

Lol, this article is so useful and interesting. Could you help me with running the OBB patch on a test game using an emulator? I am a bit lost.