Home > IL2CPP, Reverse Engineering > Reverse Engineering Adventures: VMProtect Control Flow Obfuscation (Case study: string algorithm cryptanalysis in Honkai Impact 3rd)

Reverse Engineering Adventures: VMProtect Control Flow Obfuscation (Case study: string algorithm cryptanalysis in Honkai Impact 3rd)

January 23, 2021 Leave a comment Go to comments

Recently, while reverse engineering Honkai Impact 3rd I came across some string decryption code that was obfuscated by VMProtect‘s implementation of control flow flattening. It takes an encrypted table of string data plus an index as an input, retrieves the encrypted string at the specified index in the table, decrypts it and returns it. IDA decompiles this function to almost 2000 lines of spaghetti code. In this article, I’ll show you how – in one caffeine-fueled all-nighter – I reverse engineered this function, and present a reimplementation of the algorithm in C#. Afterwards, we’ll briefly examine the algorithm’s cryptographic properties, showing alternate starting points for cryptanalysis without reverse engineering any code.

Code obfuscation

The purpose of code obfuscation is to make it more difficult to determine the behaviour of a function via static analysis in a disassembler or decompiler. There are many forms of obfuscation, but some of the most common – and ones we will encounter today – include:

  • Opaque predicates – these are conditions whose results are known at compile-time and never change; for example 1 == 1 will always evaluate to true. The conditions are used to determine which code paths to take and may be difficult to evaluate statically

  • Dead code insertion – code that is never executed is inserted into the compiled code to confuse the analyst

  • Do-nothing code – code that executes but has no semantic effect is inserted into the compiled code, again to confuse the analyst

  • Control flow obfuscation – spaghettification of the execution order of instructions to reduce the amount of sequential code blocks and make tracing the flow of execution in a function harder

Such code can be introduced by the compiler, operating at the source code level, or by a post-processing tool which introduces obfuscation at the assembly instruction level.

Control flow flattening

Control flow flattening is a specialization of control flow obfuscation implemented by obfuscators such as ConfuserEx, Obfuscator-LLVM and VMProtect, among others.

The key idea is that all of the basic blocks in a function are merged into a single, top-level control block such as a switch statement, and a state variable is used to track which block will execute next.

For example, consider this trivial C function which prints Hello World! 10 times:

int i;
for (i = 0; i < 10; i++)
  printf("Hello World!\n");

There are four basic blocks in this function, executed in this order:

  1. The loop initialization (i = 0)
  2. The loop conditional (i < 10)
  3. The loop body (prints Hello World!)
  4. The loop update (i++)

After block 4 executes, control is transferred to block 2 and the cycle repeats until the conditional evaluates to false.

We can flatten this code by introducing a state variable and a switch statement as follows:

int i;
int state = 123;

while (true) {
  switch (state) {
    case 123:
      i = 0;
      state = 456;
      break;

    case 456:
      if (i >= 10)
        return;
      state = 789;
      break;

    case 789:
      printf("Hello World!\n");
      state = 1337;
      break;

    case 1337:
      i++;
      state = 456;
      break;
  }
}

The control flow graph (CFG) of the original function looks like this:

However, after flattening, the CFG looks like this:

Obviously this is a simple example, but if you add in some more code you can imagine how this can get quickly out of hand. In the real world, the blocks are not in a nice sequential order in the switch block, and the state variables are long random values. More advanced obfuscators will also insert dead code for states which are never executed – often very similar to real code with minor differences – as well as states which execute code that does nothing, states that simply switch to another state and so on.

What IF….

A more advanced variation on this obfuscation replaces the switch block with multiple layers of nested if statements, each of which may contain other groups of if statements. Consider the following reformulation of the for loop above:

int i;
int state = 123;

while (state > 100) {
	if (state > 430) {
		if (state > 750) {
			if (state > 1234) {
				i++;
				state = 456;
			}
			else if (state < 900) {
				printf("Hello World!\n");
				state = 1337;
			}
		}

		if (state < 700) {
			state = 90;
			if (i < 10)
				state = 789;
		}
	}
	if (state < 150) {
		i = 0;
		state = 456;
	}
}

Here we introduce some new elements:

  • nested if statements provide a form of binary search for the next basic block to execute and further convolute the ability of the analyst to follow the hot path
  • using greater-than/less-than comparisons rather than equality comparisons makes the code harder to read
  • the loop condition changes the state variable, then changes it again if the loop should continue. This is a common strategy for flattening conditions whereby a different state is selected based on a true or false outcome

Control flow flattening in VMProtect

The purpose of control flow flattening in VMProtect is to make software analysts vomit on their Herman Millers, a task at which it succeeds admirably. The future of expensive chairs is in peril.

Our target function today makes glorious use of all of the techniques described above. Here are some snippets (I’ve already renamed the state variable – this is the easy part since it’s the subject of all the if statements):

             // ...
             else if ( state == 126285762 )
              {
                state = 1700651401;
              }
              else if ( state == 137398678 )
              {
                v182 = a6;
                v195 = v208;
                state = -1626753307;
                if ( !a6 )
                  state = 2123597406;
              }
              else
              {
                sub_7FFF4E517D70(v185);
                state = -347256218;
              }
            }
            else if ( state <= -1383677363 )
            {
              if ( state > -1773960176 )
              {
                if ( state > -1626753308 )
                {
                  if ( state == -1626753307 )
                  {
                    LODWORD(v169) = v195;
                    state = 1711227048;
                    // ...

There are also some splendid for loops such as:

                  for ( j = 1811064616; ; j = v26 )
                  {
                    while ( j > 1226835988 )
                    {
                      if ( j == 1226835989 )
                      {
                        j = 291744032;
                      }
                      else
                      {
                        j = 1226835989;
                        if ( v228 )
                          j = 291744032;
                        if ( !v229 )
                          j = 1226835989;
                        if ( v228 != v229 )
                          j = 291744032;
                      }
                    }
                    if ( j != 291744032 )
                      break;
                    v233 = v25;
                  }
What the hell is even that - 9GAG

Of course, when people see this kind of code, they suddenly don’t want to reverse engineer the product anymore, which is exactly what the authors are hoping for. It’s not that it can’t be reverse engineered, it’s just that the effort and resources required to do so is so great that it outweighs the benefit of actually succeeding.

Statisically, Ted Cruz is more likely to develop a sense of morality than our chances of reverse engineering this function by decompilation alone (just kidding, I checked: the Universe is infinite and he’s a douchebag in all of them). How are we going to tackle this?

Break it down

Control flow flattening occurs at the basic block level, so we need to think at the basic block level. If you don’t know what the target function is meant to be doing – at least roughly – you’re going to have a bad time no matter what. Oftentimes, you will at least have some idea what the function is meant to do, especially since you will tend to avoid deciphering code like this unless you have narrowed it down as absolutely crucial to understand.

As stated at the start of the article, we will reverse engineer a string decryption function which takes a table of encrypted strings and the index of the desired string to retrieve. Unless the design is very special, we can use our knowledge of common programming paradigms to make some assumptions about how such a function should work at the basic block level:

  1. Convert the string index into the memory address of the string via table lookup
  2. Acquire the length of the string
  3. Initialize a loop counter x representing our position within the string
  4. Decrypt one or more characters at string position x
  5. Increment x by the number of decrypted characters
  6. If x is less than the length of the string, go to step 4

This is very approximate. Not everything may be executed in this exact order, and there may be other steps before, during and after the loop. Depending on the encryption algorithm, there may also be a decryption key acquisition step. The string may be decrypted in-place, or copied to a new location in memory. The string may be decrypted one character at a time or in blocks of several characters. The string may be decrypted forwards or backwards. There is a lot of room for variation, but we can at least have a rough idea what we are looking for.

Speaking about this specific case study – Honkai Impact – we determined from earlier analysis how to get the encrypted string’s address and length (and therefore the encrypted string itself) so we really only care about steps 3-6.

Here is the control flow graph of the decryption function we shall examine:

Info: For those who would like to follow along with the actual code, it can be found at file offset 0x2C5A0 in the UnityPlayer.dll supplied with Honkai Impact 4.3 (PC version), or offset 0x2CFA0 from the DLL’s image base virtual address. We call this function GetStringLiteralFromIndex.

The top two levels mainly consist of binary search-style if statements, with three layers of flattened basic blocks underneath.

Can’t touch this

Our first step is to spin up the excellent open-source debugger x64dbg and set a breakpoint at the start of the function, but immediately we run into a problem. The game is littered with anti-debug code and just crashes when we try to launch it with the debugger.

Luckily, the DLL with the actual function of interest is not protected, so instead we create a test harness that just lets us call GetStringLiteralFromIndex repeatedly with a chosen index value as follows (C#):

class Program
{
    [DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Ansi)]
    private static extern IntPtr LoadLibrary(string path);

    [DllImport("kernel32.dll")]
    private static extern bool FreeLibrary(IntPtr hModule);

    [UnmanagedFunctionPointer(CallingConvention.Cdecl)]
    private delegate IntPtr DecryptMetadata(byte[] bytes, int length);

    [UnmanagedFunctionPointer(CallingConvention.Cdecl)]
    private delegate IntPtr GetStringLiteralFromIndex(byte[] bytes, uint index, ref int length);

    static void Main(string[] args) {
        var metadata = File.ReadAllBytes("global-metadata.dat"));

        var hModule = LoadLibrary("UnityPlayer.dll");
        var moduleBase = Process.GetCurrentProcess().Modules.Cast<ProcessModule>().First(m => m.ModuleName == "UnityPlayer.dll").BaseAddress;

        var pDecryptMetadata = (DecryptMetadata) Marshal.GetDelegateForFunctionPointer(moduleBase + 0x42110, typeof(DecryptMetadata));

        var decryptedBytesUnmanaged = pDecryptMetadata(metadata, metadata.Length);

        var decryptedMetadata = new byte[metadata.Length];
        Marshal.Copy(decryptedBytesUnmanaged, decryptedMetadata, 0, metadata.Length);

        var pGetStringLiteralFromIndex = (GetStringLiteralFromIndex)
            Marshal.GetDelegateForFunctionPointer(moduleBase + 0x2CFA0, typeof(GetStringLiteralFromIndex));

        while (true) {
            Console.Write("Enter string literal index (-1 to exit): ");
            var stringLiteralIndex = int.Parse(Console.ReadLine());

            if (stringLiteralIndex == -1)
                break;

            var length = 0;
            var decryptedStringBytes = pGetStringLiteralFromIndex(decryptedMetadata, (uint) stringLiteralIndex, ref length);
            var str = new byte[length];
            Marshal.Copy(decryptedStringBytes, str, 0, length);

            Console.WriteLine("String literal value: " + Encoding.UTF8.GetString(str));
        }

        FreeLibrary(hModule);
    }
}

As described in part 1 and part 3 of our investigation of Honkai Impact, the encrypted string literals and lookup table are stored in a file called global-metadata.dat, which itself has an additional layer of encryption. This layer must be decrypted before it can be passed to GetStringLiteralFromIndex. The details are not important for this exercise, but in any case we need to prepare the encrypted string blob for this particular game, which is the purpose of lines 21-26.

Line 18 loads the DLL of interest and line 19 locates it in memory. Lines 28-29 create a managed delegate to the function of interest so that we can call it from managed code.

The important part is the loop from line 31-44 which allows us to choose an index, calls the function and outputs the result. Example output:

Enter string literal index (-1 to exit): 0
String literal value: JNI: Unknown generic array type '
Enter string literal index (-1 to exit): 100
String literal value: Actor_ElementCriticalChanceDelta
Enter string literal index (-1 to exit): 1000
String literal value: NickName/Text
Enter string literal index (-1 to exit): 10000
String literal value: OWEndless_UIMgr_AvatarPromote

Great! Let’s get into the debugger.

Hammer time

We start by disabling every default breakpoint in the options dialog except the main entry breakpoint, which we keep enabled to allow us to break as soon as the application loads so that we can set more breakpoints:

We then launch our test exe from the debugger, which hits the entry breakpoint immediately. We can now navigate to the Breakpoints tab, right-click and select Add DLL breakpoint, then enter UnityPlayer.dll:

This will cause the debugger to break as soon as UnityPlayer.dll loads so that we can set a breakpoint within it. Hit F9 to continue execution until this occurs:

By right-clicking the address and selecting Follow in Memory map we can get the base address of the DLL:

We know that the offset to our function is 0x2CFA0, but we don’t need to do any math. Simply click the module name and hit Ctrl+G to bring up the expression editor. The module address will be pre-filled because we clicked on it, and we can just add “+2cfa0” to the end and hit Enter:

This lands us right at the start of GetStringLiteralFromIndex, and we tap F2 to set a breakpoint:

We can now press F9 twice to continue execution (the first tap returns the cursor to where the last breakpoint was triggered), and this new breakpoint will be hit every time we enter a new index and press Enter in our test app. We are ready to rock.

Taking out the trash

The first thing we want to do is identify which register contains the state variable. This is really easy because we’ll regularly find large blocks of code like this:

jmp     unityplayer-hi3-4.3.7FFF5478ECE0
cmp     ecx, 0x13958F2F
je      unityplayer-hi3-4.3.7FFF5479036F
cmp     ecx, 0x1B3652F5
jne     unityplayer-hi3-4.3.7FFF5478ECE0
mov     ecx, 0x45E25333
jmp     unityplayer-hi3-4.3.7FFF5478ECE0
cmp     ecx, 0xE22DEF78
je      unityplayer-hi3-4.3.7FFF5478F86A
cmp     ecx, 0xE4217CED
jne     unityplayer-hi3-4.3.7FFF5478ECE0
mov     ecx, 0x5C67F499
jmp     unityplayer-hi3-4.3.7FFF5478ECE0
cmp     ecx, 0x5C253F32
je      unityplayer-hi3-4.3.7FFF547903B7
cmp     ecx, 0x5C67F499
jne     unityplayer-hi3-4.3.7FFF5478ECE0

Clearly, ecx (the bottom 32 bits of rcx) is the state variable, so we’ll keep an eye on this. We now use a super low-tech attack: simply hold down F8 to single-step through the code one instruction at a time and watch RCX in the register pane for a repeating sequence of states, which indicates the execution of a flattened loop.

The start of the run will have a lot of cruft which you can ignore, and you may have to hold down F8 for a minute or more before you get to the meat. RCX will sometimes be overwritten by temporary calculation results but overall it will be easy to identify the state values as random-looking 32-bit values. Many of the states will be garbage and should be ignored. It’s a good idea to pick a string that has a decent length so we have several opportunities to spot the sequence. We don’t want a string with a length of 1 because then it won’t be clear where the loop starts or ends. Here I’ll just use index 0 which as we saw above is the 33-character string JNI: Unknown generic array type ' (including the apostrophe).

After holding down F8 for a while, we start to see a recurring sequence form in RCX:

1E114D24
1D1C2C24
7AD33C6C
EF1E9046

We will then continue to hold down F8 until we find which state is switched to when the loop completes. After sipping our coffee for a couple of minutes, we finally reach it:

8C4E5185

Excellent. Let us now tap F8 more slowly to find precisely where each state is set (when RCX is changed to one of the states shown above), and once that happens, tap F8 slowly again through all of the jmp sequences until we get to a basic block. Typically a basic block will start right after a jump, and end right before another jump or some more state variable comparisons. For example, for 1D1C2C24:

00007FFF5478F49A  | 0F8F 49040000    | jg      unityplayer-hi3-4.3.7FFF5478F8E9       |
00007FFF5478F4A0  | 81F9 6113741B    | cmp     ecx, 0x1B741361                        |
00007FFF5478F4A6  | 0F84 B00C0000    | je      unityplayer-hi3-4.3.7FFF5479015C       |
00007FFF5478F4AC  | 81F9 242C1C1D    | cmp     ecx, 0x1D1C2C24                        |
00007FFF5478F4B2  | 0F85 28F8FFFF    | jne     unityplayer-hi3-4.3.7FFF5478ECE0       |
00007FFF5478F4B8  | 8B05 964A3F01    | mov     eax, ds:[0x7FFF55B83F54]               |
00007FFF5478F4BE  | 8B15 8C4A3F01    | mov     edx, ds:[0x7FFF55B83F50]               |
00007FFF5478F4C4  | 8D48 FF          | lea     ecx, ds:[rax-0x1]                      |
00007FFF5478F4C7  | 0FAFC8           | imul    ecx, eax                               |
00007FFF5478F4CA  | F7D1             | not     ecx                                    |
00007FFF5478F4CC  | 83C9 FE          | or      ecx, 0xFFFFFFFE                        |
00007FFF5478F4CF  | 83F9 FF          | cmp     ecx, 0xFFFFFFFF                        |

We’re going to repeat this for each of the four main states we’ve identified as part of the loop and locate them in IDA’s decompiler. You can rebase UnityPlayer.dll in IDA and jump straight to the corresponding addresses to do this, however in a real world scenario it’s likely you will have to reload the application over and over again and the base address will change every time. Rebasing in IDA is very time-consuming because – and I don’t think I’ve mentioned this before in previous articles – IDA is about as fast and efficient as a one-legged hunting tortoise, so instead we’ll be a little bit sneaky and take a note of the first bytes of each state’s basic block. In the example above it would be 8B 05 96 4A 3F 01 8B 15 .... We navigate to the start of GetStringLiteralFromIndex in the disassembly view in IDA, press Alt+B to bring up the binary search dialog and enter the byte sequence:

Be sure to untick Search Up and Find all occurrences when you do this. When we press Enter, we come immediately to the corresponding code we’re paused at in the debugger:

Excellent! Rename this location to something you can find again later like STATE_1D1C2C24 and repeat this process for each of the states identified earlier (you can press Ctrl+P, Enter in IDA to return to the start of the function for each new search).

The idea behind this strategy is to use the debugger to find all the basic blocks that are actually executed so we don’t have to trace the if tree in the decompiler, then once we have them all, we can switch to the decompiler and start stitching them together.

To begin, open the disassembly and the decompiler in the same tab and choose Synchronize With… -> Pseudocode in the right-click context menu. Now when you scroll through the assembly, the decompiler will scroll to the corresponding lines of decompiled code. In the disassembly pane, press G and enter each of the location names defined earlier, eg. STATE_1D1C2C24. The decompiler will move to the first line of code for the block:

Press Ins (the Insert key) in the decompiler pane and add a comment to denote the start of the block (press Ctrl+Enter to commit). We’re going to be using freeform text search to find these later so make your life easier by giving them all a prefix you can search on. For example:

Repeat this for each of the identified states until all of the blocks have a comment at the start in the decompiler. Once you’ve done this, a handy optional tip is to click on the if or else keyword of non-labelled blocks and press - (minus) on the numeric keypad to fold up (hide) them. This will clean up the decompilation at least somewhat, although you have to be careful not to fold a parent if of a block we’re actually analyzing as it will also fold all the child blocks, so you won’t be able to get rid of everything that’s useless. It can be beneficial to press \ (backslash) to hide all of the casts too, since they also add significantly to the clutter.

Say my name

We will now start naming obvious variables, and also some that aren’t so obvious but that do have obvious starting values. We know from prior analysis what the arguments and return value of the function should be – the encrypted blob, the literal index and a pointer to receive the length as the arguments, and a pointer to the decrypted string as the return value. We name the arguments, scroll to the return statement at the end of the function which in this case says return var_240, and name that pDecryptedString. The start of this garbage looks like this:

const char *__usercall GetStringLiteralFromIndex@<rax>(int index@<edx>, void *pMetadata@<rcx>, _DWORD *pReturnLength@<r8>, __int64 *a4@<r13>, __int64 *a5@<r14>, const char *a6@<r15>)
{
  // [COLLAPSED LOCAL DECLARATIONS. PRESS KEYPAD CTRL-"+" TO EXPAND]

  var_1A8 = *&index;
  v2 = pMetadata;
  v244 = &v187 ^ _security_cookie;
  dword_7FFF4F69502C = 1;
  dword_7FFF4F69500C = 1;
  LOBYTE(v233) = ((dword_7FFF4F6A3F94 * (dword_7FFF4F6A3F94 - 1)) & ((dword_7FFF4F6A3F94 * (dword_7FFF4F6A3F94 - 1)) ^ 0xFFFFFFFE)) == 0;
  v7 = 0x73FA68CC;
  if ( !((dword_7FFF4F6A3F94 * (dword_7FFF4F6A3F94 - 1)) & ((dword_7FFF4F6A3F94 * (dword_7FFF4F6A3F94 - 1)) ^ 0xFFFFFFFE)) )
    v7 = 0xBD2A2E8D;
  LOBYTE(v235) = dword_7FFF4F6A3F90 < 10;
  if ( dword_7FFF4F6A3F90 >= 10 )
    v7 = 0x73FA68CC;
  if ( (((dword_7FFF4F6A3F94 * (dword_7FFF4F6A3F94 - 1)) & ((dword_7FFF4F6A3F94 * (dword_7FFF4F6A3F94 - 1)) ^ 0xFFFFFFFE)) == 0) != dword_7FFF4F6A3F90 < 10 )
    v7 = 0xBD2A2E8D;
  for...
  var_238 = &pDecryptedString;
  dword_7FFF4F695030 = 1;
  v216 = ((dword_7FFF4F6A3F4C * (dword_7FFF4F6A3F4C - 1)) & ((dword_7FFF4F6A3F4C * (dword_7FFF4F6A3F4C - 1)) ^ 0xFFFFFFFE)) == 0;
  v215 = dword_7FFF4F6A3F48 < 10;
  var_220 = var_1A8 - 5 * ((var_1A8 / 5ui64) & 0xFFFFFFFFFFFFF800ui64);
  var_1F8 = 2 * var_1A8 + 1;
  v175 = pMetadata - 8;
  var_228 = var_1A8;
  var_210 = pMetadata + var_1A8;
  v10 = 0x7F5EEE6F;
  var_230 = pReturnLength;
  while ( 1 )
  {

There are some obvious candidates to pick off here: var_1A8 is the string index, v2 is a pointer to the encrypted blob, var_238 is a pointer to the pointer to what will be the decrypted string and so on. I like to name these with an l_ (local) prefix to denote that they are local variables that only exist in the scope of the function, as opposed to the addresses of global static variables.

We can also name some variables that we can understand the starting values of but don’t yet know what they’re used for – if anything. For example:

  var_220 = l_index - 5 * ((l_index / 5ui64) & 0xFFFFFFFFFFFFF800ui64);
  var_1F8 = 2 * l_index + 1;
  v175 = pMetadata - 8;
  var_228 = l_index;
  var_210 = pMetadata + l_index;
  v10 = 0x7F5EEE6F;
  l_pReturnLength = pReturnLength;

Here I will name var_1F8 as indexTimesTwoPlusOne. Whether this is of any use or not is still an open question, but we can see the starting value quite easily and given the huge number of locals, it doesn’t hurt to give it a name. Similarly we can name v175 as pMetadataMinus8 and var_210 as pMetadataPlusIndex. When a variable takes the same value as another, I tend to apply a suffix, so var_228 will be renamed l_index_1 here. Some or all of these may be junk variables, others may have a use. We may further rename them later as we discover their true purpose.

Quick maff

The first line is of particular note. This is a form of mathematical obfuscation which can be simplified. Let’s examine this equation more closely:

x - 5 * ((x / 5) & 0xFFFFFFFFFFFFF800)

First, x is divided by 5 and the bottom 11 bits are zeroed out via the AND term, meaning that the result of the inner bracketed term must always be a multiple of 2^11 = 2048 (0x800). This means that:

  • when 0*0x800 <= x <= 4*0x800, the result is zero,
  • when 5*0x800 <= x <= 9*0x800, the result is 0x800,
  • when 10*0x800 <= x <= 14*0x800, the result is 0x1000 and so on.

In other words, the net effect is to divide x by 5 * 0x800 == 0x2800, discard the remainder and multiply the result by 0x800. The outer term then multiplies this result by 5 and subtracts it from x, leading to a net effect of:

x - (x / 0x2800) * 0x2800

This means that:

  • when 0 <= x <= 0x27FF, x = x
  • when 0x2800 <= x <= 0x4FFF, x = x - 0x2800
  • when 0x5000 <= x <= 0x77FF, x = x - 0x5000

This forces x to remain in the range 0 - 0x27FF, which is equivalent to:

x = x % 0x2800

(where % is the modulo operator)

Hence, the equation:

x - y * ((x / y) & z)

can be simplified to:

x % (y * (~z + 1))

Finally then, we can rename var_220 as follows:

l_indexMod0x2800 = l_index - 5 * ((l_index / 5ui64) & 0xFFFFFFFFFFFFF800ui64);

Block party

The final result of our tranche of renaming looks like this:

  l_index = *&index;
  l_pMetadata = pMetadata;
  // ...
  l_ppDecryptedString = &pDecryptedString;
  // ...
  l_indexMod0x2800 = l_index - 5 * ((l_index / 5ui64) & 0xFFFFFFFFFFFFF800ui64);
  l_indexTimesTwoPlusOne = 2 * l_index + 1;
  l_pMetadataMinus8 = pMetadata - 8;
  l_index_1 = l_index;
  l_pMetadataPlusIndex = pMetadata + l_index;
  // ...
  l_pReturnLength = pReturnLength;

We also name the state variable to state, which is trivially discovered by looking at the condition inside all of the if blocks to see which variable is compared.

Be very careful when performing this initial bout of symbol naming: decoy code paths may be present which deliberately misset some variables. Ensure to only look at code which is actually executed. All of the code above appeared at the start of the function and not inside any if block, so we know for sure it will be executed.

We now start to look at each of the basic blocks we identified earlier. It is not necessary to look at them in order, and indeed sometimes it’s easier to look at them out-of-order to determine the meaning of variables. First we take a cursory glance over each one to see if the block’s behaviour can be guessed. This is where our block comments come in handy because we can just press Alt+T, type in the prefix we used (STATE in this example) and quickly use Ctrl+T to cycle through every block:

First block:

else if ( v1 <= 562449327 )
{
  if ( v1 > 460591968 )
  {
    if ( v1 > 504450339 )
    {
      if ( v1 == 0x1E114D24 )
      {
        // STATE 1E114D24
        var_E4 = var_11C;
        v111 = (~(dword_7FFF4F6A3F54 * (dword_7FFF4F6A3F54 - 1)) | 0xFFFFFFFE) == -1;
        v1 = 0x77D17CB0;
        v112 = 0x1D1C2C24;
        if ( (~(dword_7FFF4F6A3F54 * (dword_7FFF4F6A3F54 - 1)) | 0xFFFFFFFE) == -1 )
          v1 = 0x1D1C2C24;
        v114 = __OFSUB__(dword_7FFF4F6A3F50, 10);
        v113 = dword_7FFF4F6A3F50 - 10 < 0;
        v115 = dword_7FFF4F6A3F50 < 10;
        v116 = 0x77D17CB0;
        goto LABEL_307;
      }

// ...
LABEL_307:
  if ( !v115 )
    v1 = v116;
  if ( v111 ^ v113 ^ v114 )
    v1 = v112;
}

v1 is another state variable so we name that state_1. The block equates to an if selector since v1 is set to one of two values depending on the obfuscated condition (which may or may not be an opaque predicate depending on the values of the static qword addresses). Other than that, it looks like a mess with no obvious function. Moving on.

              else
              {
                // STATE 1D1C2C24
                v109 = ~(dword_7FFF4F6A3F54 * (dword_7FFF4F6A3F54 - 1)) | 0xFFFFFFFE;
                v110 = 0x77D17CB0;
                if ( (v109 == -1) != dword_7FFF4F6A3F50 < 10 )
                  v110 = 0x7AD33C6C;
                v16 = v109 == -1;
                state_1 = v110;
                if ( v16 )
                  state_1 = 0x7AD33C6C;
                var_128 = **(&Size + 4);
                if ( dword_7FFF4F6A3F50 >= 10 )
                  state_1 = v110;
                var_DE = var_E4 < var_128;
              }

Doesn’t really make a whole lot of sense either. Next.

            else if ( state == 0x7AD33C6C )
            {
              // STATE 7AD33C6C
              state = 0x8C4E5185;
              if ( var_DE )
                state = 0xEF1E9046;
            }

Now, we have discovered a crucial piece of information. Recalling that 0xEF1E9046 is part of the loop and 0x8C4E5185 is the state entered when the loop completes, we discover that state 0x7AD33C6C is the loop’s conditional expression, that 0xEF1E9046 is the main loop body and that var_DE is a flag indicating whether the loop should continue. We rename var_DE as loopNotFinished and continue on.

                      // STATE EF1E9046
                      v2 = *(qword_7FFF4F6955D0 + 24);
                      v46 = var_E4 - 5 * ((var_E4 / 5ui64) & 0xFFFFFFFFFFFFF800ui64);
                      byte_7FFF4F6955E0[var_E4] = (~((~*(v2
                                                       + var_E4
                                                       + 0x1400i64
                                                       - 5 * (((var_E4 + 0x1400i64) / 5ui64) & 0xFFFFFFFFFFFFF000ui64)) & 0xD6 | *(v2 + var_E4 + 0x1400i64 - 5 * (((var_E4 + 0x1400i64) / 5ui64) & 0xFFFFFFFFFFFFF000ui64)) & 0x29) ^ (~byte_7FFF4F6955E0[var_E4] & 0xD6 | byte_7FFF4F6955E0[var_E4] & 0x29)) & 0x54 | ((~*(v2 + var_E4 + 0x1400i64 - 5 * (((var_E4 + 0x1400i64) / 5ui64) & 0xFFFFFFFFFFFFF000ui64)) & 0xD6 | *(v2 + var_E4 + 0x1400i64 - 5 * (((var_E4 + 0x1400i64) / 5ui64) & 0xFFFFFFFFFFFFF000ui64)) & 0x29) ^ (~byte_7FFF4F6955E0[var_E4] & 0xD6 | byte_7FFF4F6955E0[var_E4] & 0x29)) & 0xAB) ^ (~(*(var_1E0 + v46) + var_E4) & 0x54 | (*(var_1E0 + v46) + var_E4) & 0xAB);
                      var_11C = var_E4 + 1;
                      state = 0x1E114D24;

Now we’re cooking. One element of a byte array is set to the result of some insanely convoluted expression on lines 4-7 and the state is changed on line 9. Although we cannot say with 100% certainty, this is very likely to be the code that decrypts a single character. We posit that byte_7FFF4F6955E0 is a pointer to the decrypted string, and var_E4 is the character being modified. We rename this l_pDecryptedString and l_charIndex. var_E4 is also incremented on line 8 but it is stored in a different variable. We rename this to l_charIndexPlusOne.

On line 3 we see the exact same obfuscated modulo expression we encountered earlier, operating on l_charIndex, so we rename v46 to l_charIndexMod0x2800.

We don’t try to fully reverse engineer the presumed obfuscated decryption algorithm yet, but now we’ve renamed some variables we take a closer look to see if we can find any other nuggets of information:

l_pDecryptedString[l_charIndex] = (~
       ((~*(v2 + l_charIndex + 0x1400i64 - 5 * (((l_charIndex + 0x1400i64) / 5ui64) & 0xFFFFFFFFFFFFF000ui64)) & 0xD6 |
       *(v2 + l_charIndex + 0x1400i64 - 5 * (((l_charIndex + 0x1400i64) / 5ui64) & 0xFFFFFFFFFFFFF000ui64)) & 0x29)
       ^ (~l_pDecryptedString[l_charIndex] & 0xD6 | l_pDecryptedString[l_charIndex] & 0x29)) & 0x54 |
       ((~*(v2 + l_charIndex + 0x1400i64 - 5 * (((l_charIndex + 0x1400i64) / 5ui64) & 0xFFFFFFFFFFFFF000ui64)) & 0xD6 |
       *(v2 + l_charIndex + 0x1400i64 - 5 * (((l_charIndex + 0x1400i64) / 5ui64) & 0xFFFFFFFFFFFFF000ui64)) & 0x29)
       ^ (~l_pDecryptedString[l_charIndex] & 0xD6 | l_pDecryptedString[l_charIndex] & 0x29)) & 0xAB)
       ^ (~(*(var_1E0 + l_charIndexMod0x2800) + l_charIndex) & 0x54
        | (*(var_1E0 + l_charIndexMod0x2800) + l_charIndex) & 0xAB);

Is there anything we can glean from this pile of goop (NSFW, probably)? Yes there is. l_pDecryptedString is read in the expression, then written to. Therefore it must already be initialized with some values which are almost certainly the encrypted string bytes – we rename it to s_pStringBuffer to clarify its use.

All of the values used by this expression are either literals or variables we have already accounted for, with the exceptions of v2 and var_1E0. The latter is suspiciously accessed as an array of size 0x2800 (we know this because the read byte is at offset l_charIndexMod0x2800 which cannot be larger than 0x27FF), and then XOR’d with the previous terms. This makes it a prime candidate to be a 0x2800-byte XOR decryption key block. We rename var_1E0 to p_maybeKeyBlock.

What about v2? This is set at the start of the block as follows:

v2 = *(qword_7FFF4F6955D0 + 24);

This is accessed four times in the algorithm, and each access is the same:

*(v2 + l_charIndex + 0x1400i64 - 5 * (((l_charIndex + 0x1400i64) / 5ui64) & 0xFFFFFFFFFFFFF000ui64))

If this looks familiar, you are not wrong. Let’s go back to our equation:

x % (y * (~z + 1))

and substitute the variables:

x = l_charIndex + 0x1400
y = 5
z = 0xFFFFFFFFFFFFF000  (~z + 1 == 0x1000)

The access is therefore:

    *(v2 + (l_charIndex + 0x1400) % (5 * 0x1000))
==  *(v2 + (l_charIndex + 0x1400) % 0x5000))

v2 is therefore likely a byte array of size 0x5000 and could be a second key block. We rename it to l_p_maybeKeyBlockSize0x5000 and the previous suspected key block to p_maybeKeyBlockSize0x2800 for clarity. v2 comes from *(qword_7FFF4F6955D0 + 24) which is a static address reference. The fact 24 bytes is added indicates that this might be a struct, so we right-click on the symbol in the decompiler and select Create new struct type… IDA analyzes the function and fills in the fields for us:

qword18 is 24 bytes into the struct so we change the definition of that field to a named byte array as follows:

_BYTE **p_maybeKeyBlockSize0x5000;

IDA now decompiles the function and we get:

l_p_maybeKeyBlockSize0x5000 = qword_7FFF4F6955D0->p_maybeKeyBlockSize0x5000;

Finally, we rename qword_7FFF4F6955D0 to s_pDecryptionInfo (where s means static) for later reference.

The world’s longest merry-go-round

We are making steady progress. Our job now is to cycle around all of the state blocks again and again until we have resolved their behaviour (handy tip: press Ctrl+PgUp to return to the top of the function, then Ctrl+T repeatedly to cycle through all of the blocks once more). Each go round, we hope to use the variables we have identified on the previous cycle to make sense of parts we couldn’t understand before. We may have to do this many times to reach a conclusion. We will hopefully still have the test app paused in the debugger so we can also step through any behaviour we don’t understand to see the real-time results.

We return to block 1E114D24:

                  // STATE 1E114D24
                  l_charIndex = l_charIndexPlusOne;
                  v111 = (~(dword_7FFF4F6A3F54 * (dword_7FFF4F6A3F54 - 1)) | 0xFFFFFFFE) == -1;
                  state_1 = 0x77D17CB0;
                  v112 = 0x1D1C2C24;
                  if ( (~(dword_7FFF4F6A3F54 * (dword_7FFF4F6A3F54 - 1)) | 0xFFFFFFFE) == -1 )
                    state_1 = 0x1D1C2C24;
                  v114 = __OFSUB__(dword_7FFF4F6A3F50, 10);
                  v113 = dword_7FFF4F6A3F50 - 10 < 0;
                  v115 = dword_7FFF4F6A3F50 < 10;
                  v116 = 0x77D17CB0;
                  goto LABEL_307;

Immediately on the first line we see that this block increments the current character being decoded. This example should clearly demonstrate the importance of naming variables like l_charIndexPlusOne even if we are not entirely sure what they’re for. The rest of the code is very obscure, but this is where the debugger comes to the rescue. We can request a few different string literals and observe that this block advances the state to 0x1D1C2C24 every time, which means we can largely assume the rest is junk code that does nothing useful and is just there for the purposes of increasing the difficulty of static analysis.

This is a great example of why using a blend of static analysis in the disassembler and dynamic analysis in the debugger is a potent combination that is greater than the sum of its parts. Working in the debugger alone, we would be stuck at the assembly level, but working in the decompiler alone, we would not be able to easily determine every branch predicate without a huge amount of work.

We conclude that 1E114D24 is the loop increment expression and continue.

                // STATE 1D1C2C24
                v109 = ~(dword_7FFF4F6A3F54 * (dword_7FFF4F6A3F54 - 1)) | 0xFFFFFFFE;
                v110 = 0x77D17CB0;
                if ( (v109 == -1) != dword_7FFF4F6A3F50 < 10 )
                  v110 = 0x7AD33C6C;
                v16 = v109 == -1;
                state_1 = v110;
                if ( v16 )
                  state_1 = 0x7AD33C6C;
                var_128 = **(&Size + 4);
                if ( dword_7FFF4F6A3F50 >= 10 )
                  state_1 = v110;
                loopNotFinished = l_charIndex < var_128;

Once again, we know the state always advances to 7AD33C6C so this is primarily a pile of junk, however the critical line is the final one which sets the result of the loop condition. We surely hope the loop executes one iteration per character of the string, so we can determine that var_128 is the string length and rename it l_stringLength.

Tip: In the last two snippets, we saw several convoluted expressions with a similar structure:

(~(dword_7FFF4F6A3F54 * (dword_7FFF4F6A3F54 - 1)) | 0xFFFFFFFE) == -1

If we put our thinking caps on, we can actually resolve these statically. In fact, this is an opaque predicate that will always evaluate to true regardless of the contents of the memory referenced via dword_7FFF4F6A3F54. Why?

Multiplying a number by an even number will always produce an even number. In the above expression, the inner term n * (n - 1) guarantees that exactly one side of the multiplication will be even, so the result will be even. This means that the bottom bit will always be zero. The result is then NOT’ed, flipping the bottom bit to a one. OR’ing this with 0xFFFFFFFE produces 0xFFFFFFFF, which in a signed integer is the equivalent binary representation of the value -1 . Therefore, the condition always evaluates to true.

Paint me a picture

We are starting to paint a picture of what is happening now and can assign meaning to the various states:

  • 1E114D24 – increments loop counter (sets the next character index)
  • 1D1C2C24 – executes the loop condition (l_charIndex < l_stringLength) and sets a flag if the loop should end
  • 7AD33C6C – checks the loop finished flag, switches state to 8C4E5185 if so, otherwise:
  • EF1E9046 – is the loop body which decrypts one character and switches state back to 1E114D24

Notice that while this is not exactly the same as our prediction of how the function would be laid out, it resembles it fairly closely: strings are decrypted one character at a time in a loop as expected.

Our ultimate goal is to create our own implementation of the function so that we can decrypt strings without needing the application DLL, but we’re going to need some more information before we can do that.

We identified two potential key blocks in the decryption algorithm that we don’t know anything about. We need to figure out some things:

  • what are the decryption keys?
  • are the keys the same for every call to the function or do they vary?
  • if they vary, how are the keys generated so that we can replicate it in our own code?

I’ve got the key, I’ve got the secret

Out first port of call is to click the static addresses in IDA, but we quickly discover they are uninitialized at compile-time, so the easiest way to find the keys is to just trace the decryption block – EF1E9046 – in the debugger. First, switch to the disassembly view in IDA to find out which register contains the key block address:

.text:0007FFF4E2ADE3E STATE_EF1E9046:
.text:00007FFF4E2ADE3E                 movsxd  rcx, [rbp+1C0h+l_charIndex]
.text:00007FFF4E2ADE45                 lea     r9, s_pStringBuffer
.text:00007FFF4E2ADE4C                 mov     bl, [rcx+r9]
.text:00007FFF4E2ADE50                 mov     rax, cs:s_pDecryptionInfo
.text:00007FFF4E2ADE57                 mov     rsi, [rax+18h]

Variable naming once again helps us tremendously here and we can easily see that rax is a pointer to the struct we defined and the field we named is loaded into rsi. This would be pretty difficult to identify in the debugger output:

movsxd  rcx, ss:[rbp+0xDC]
lea     r9, ds:[0x7FFF55B755E0]
mov     bl, ds:[rcx+r9]
mov     rax, ds:[0x7FFF55B755D0]
mov     rsi, ds:[rax+0x18]

Stepping to the line immediately after rsi is loaded:

We right-click the value of rsi in the register pane and click Follow in dump which reveals the data starting at 0x23B7F9C0000:

The fact the previous bytes contain obviously unrelated data is encouraging because it means we are almost certainly in the right place. Since we’re expecting this array to be 0x5000 bytes long, we can skip to the end to see what we find:

Perfect. The high entropy data ends precisely where we expect it to at 0x23B7F9C5000.

To determine whether this block of data is always the same or not, we can set a breakpoint at the line after rsi is loaded, unpause the debugger and use our test app to request a few different strings, perhaps also restarting the application a couple of times as well. After these anecdotal tests, the block appears to be the same every time, so we can just select the entire region of 0x5000 bytes, right-click and select Binary -> Save To A File and save the key. Smooth going down.

We now just need one more key block before we can start to reverse engineer the algorithm proper and reimplement it. We’re getting there!

By clicking on the symbol p_maybeKeyBlockSize0x2800 in the decompiler, the synchronized disassembly view shows where the value is accessed (the top line):

.text:00007FFF4E2ADED2                 mov     rax, [rbp+1C0h+p_maybeKeyBlockSize0x2800]
.text:00007FFF4E2ADED6                 mov     edi, [rbp+1C0h+l_charIndex]
.text:00007FFF4E2ADEDC                 add     dil, [rax+rdx]
.text:00007FFF4E2ADEE0                 mov     eax, ebx

We step through to the line immediately after rax is loaded and… now you just wait one darn minute… what’s this?!

rax has the same value as rsi! We are simply looking at two different offsets into the same key block! We re-run the function again with a few different string indexes just to be sure, and notice that the value of rax has the string literal index added to it, modulo 0x2800 – the value l_indexMod0x2800 that we set way back at the start of the decompilation. Again just to be sure, we search for p_KeyBlockSize0x2800 in the function to see where it is set. It appears twice and both definitions are the same so it doesn’t matter which execution path is taken:

p_KeyBlockSize0x2800 = s_pDecryptionInfo->p_maybeKeyBlockSize0x5000 + l_indexMod0x2800;

It now makes perfect sense. Both pointers are different offsets into the same key block. That’s awesome news because we can now proceed to creating our own implementation!

(In reality, I reverse engineered a bit more code to confirm all of this – state 3EDE0FDC fetches the string index and length from metadata and 751C481C copies the encrypted string from the metadata to s_pStringBuffer and sets l_charIndexPlusOne to zero so that the loop starts on the first character. I also set breakpoints on all of the locations in the decompilation where the key block values are initialized – not knowing which one was the real execution path – and waiting to see which one was triggered to confirm that both static values were set to the same pointer to the start of the key)

Yo dawg, I heard you like algebra

Our next play is to copy paste the obscene decryption algorithm into a text editor and reformat it with proper indentation so that we can start to simplify it. I began with:

// Where pKeyBlockSize0x5000 is 0x5000 bytes of high entropy data
// Where pKeyBlockSize0x2800 is 0x2800 bytes of pKeyBlockSize0x5000 at offset (stringLiteralIndex mod 0x2800)

s_pStringBuffer[l_charIndex] =
(
	~(
		(~l_p_KeyBlockSize0x5000[l_charIndex + 0x1400i64 - 5 * (((l_charIndex + 0x1400i64) / 5ui64) & 0xFFFFFFFFFFFFF000ui64)] & 0xD6
		| l_p_KeyBlockSize0x5000[l_charIndex + 0x1400i64 - 5 * (((l_charIndex + 0x1400i64) / 5ui64) & 0xFFFFFFFFFFFFF000ui64)] & 0x29)
		
		^ (~s_pStringBuffer[l_charIndex] & 0xD6 | s_pStringBuffer[l_charIndex] & 0x29)
	) & 0x54
	|
	(
		(~l_p_KeyBlockSize0x5000[l_charIndex + 0x1400i64 - 5 * (((l_charIndex + 0x1400i64) / 5ui64) & 0xFFFFFFFFFFFFF000ui64)] & 0xD6
		| l_p_KeyBlockSize0x5000[l_charIndex + 0x1400i64 - 5 * (((l_charIndex + 0x1400i64) / 5ui64) & 0xFFFFFFFFFFFFF000ui64)] & 0x29)
		
		^ (~s_pStringBuffer[l_charIndex] & 0xD6 | s_pStringBuffer[l_charIndex] & 0x29)
	) & 0xAB
)
^
(
	 ~(p_KeyBlockSize0x2800[l_charIndexMod0x2800] + l_charIndex) & 0x54
	| (p_KeyBlockSize0x2800[l_charIndexMod0x2800] + l_charIndex) & 0xAB
);

To fully simplify this expression, we are going to need to know some mathematical axioms as they pertain to bitwise logic operations. Let’s first deal with the obfuscated modulo expressions that we already covered and substitute in simplified operations:

s_pStringBuffer[l_charIndex] =
(
	~(
		(~l_p_KeyBlockSize0x5000[(l_charIndex + 0x1400) % 0x5000] & 0xD6
		| l_p_KeyBlockSize0x5000[(l_charIndex + 0x1400) % 0x5000] & 0x29)
		
		^ (~s_pStringBuffer[l_charIndex] & 0xD6 | s_pStringBuffer[l_charIndex] & 0x29)
	) & 0x54
	|
	(
		(~l_p_KeyBlockSize0x5000[(l_charIndex + 0x1400) % 0x5000] & 0xD6
		| l_p_KeyBlockSize0x5000[(l_charIndex + 0x1400) % 0x5000] & 0x29)
		
		^ (~s_pStringBuffer[l_charIndex] & 0xD6 | s_pStringBuffer[l_charIndex] & 0x29)
	) & 0xAB
)
^
(
	 ~(p_KeyBlockSize0x2800[l_charIndexMod0x2800] + l_charIndex) & 0x54
	| (p_KeyBlockSize0x2800[l_charIndexMod0x2800] + l_charIndex) & 0xAB
);

There are many occurrences of the expression:

~x & y | x & ~y

(note that ~0xD6 == 0x29 and ~0x54 == 0xAB)

The AND operator has higher precedence than the OR operator, so this has the effect of flipping the bits in x that are set in y and leaving the rest as they are. This is equivalent to x XOR y which flips every bit in x that is set in y (and vice versa – the result would be the same if x and y were swapped). We simplify the inner key expressions first:

s_pStringBuffer[l_charIndex] =
(
	~(
		l_p_KeyBlockSize0x5000[(l_charIndex + 0x1400) % 0x5000] ^ 0xD6
		
		^ (~s_pStringBuffer[l_charIndex] & 0xD6 | s_pStringBuffer[l_charIndex] & 0x29)
	) & 0x54
	|
	(
		l_p_KeyBlockSize0x5000[(l_charIndex + 0x1400) % 0x5000] ^ 0xD6
		
		^ (~s_pStringBuffer[l_charIndex] & 0xD6 | s_pStringBuffer[l_charIndex] & 0x29)
	) & 0xAB
)
^
(
	 ~(p_KeyBlockSize0x2800[l_charIndexMod0x2800] + l_charIndex) & 0x54
	| (p_KeyBlockSize0x2800[l_charIndexMod0x2800] + l_charIndex) & 0xAB
);

Then the string buffer terms:

s_pStringBuffer[l_charIndex] =
(
	~(
		l_p_KeyBlockSize0x5000[(l_charIndex + 0x1400) % 0x5000] ^ 0xD6
		^ s_pStringBuffer[l_charIndex] ^ 0xD6
	) & 0x54
	|
	(
		l_p_KeyBlockSize0x5000[(l_charIndex + 0x1400) % 0x5000] ^ 0xD6
		^ s_pStringBuffer[l_charIndex] ^ 0xD6
	) & 0xAB
)
^
(
	 ~(p_KeyBlockSize0x2800[l_charIndexMod0x2800] + l_charIndex) & 0x54
	| (p_KeyBlockSize0x2800[l_charIndexMod0x2800] + l_charIndex) & 0xAB
);

Then the second inner key block expression:

s_pStringBuffer[l_charIndex] =
(
	~(
		l_p_KeyBlockSize0x5000[(l_charIndex + 0x1400) % 0x5000] ^ 0xD6
		^ s_pStringBuffer[l_charIndex] ^ 0xD6
	) & 0x54
	|
	(
		l_p_KeyBlockSize0x5000[(l_charIndex + 0x1400) % 0x5000] ^ 0xD6
		^ s_pStringBuffer[l_charIndex] ^ 0xD6
	) & 0xAB
)
^
(
	 (p_KeyBlockSize0x2800[l_charIndexMod0x2800] + l_charIndex) ^ 0x54
);

Then the outer expression:

s_pStringBuffer[l_charIndex] =
(
	(
		l_p_KeyBlockSize0x5000[(l_charIndex + 0x1400) % 0x5000] ^ 0xD6
		^ s_pStringBuffer[l_charIndex] ^ 0xD6
	) ^ 0x54
)
^
(
	 (p_KeyBlockSize0x2800[l_charIndexMod0x2800] + l_charIndex) ^ 0x54
);

All the remaining operations are XORs so we can remove the brackets:

s_pStringBuffer[l_charIndex] =
	l_p_KeyBlockSize0x5000[(l_charIndex + 0x1400) % 0x5000] ^ 0xD6
	^ s_pStringBuffer[l_charIndex] ^ 0xD6
	^ 0x54
	^ (p_KeyBlockSize0x2800[l_charIndexMod0x2800] + l_charIndex) ^ 0x54;

XOR is a commutative, associative and distributive operation, and XORing a number by itself is equivalent to a no-op, so we remove the extraneous XORs that cancel each other out:

s_pStringBuffer[l_charIndex] =
	l_p_KeyBlockSize0x5000[(l_charIndex + 0x1400) % 0x5000]
	^ s_pStringBuffer[l_charIndex]
	^ (p_KeyBlockSize0x2800[l_charIndexMod0x2800] + l_charIndex);

and finally neaten it up with some syntactic sugar and remove unnecessary variables:

s_pStringBuffer[l_charIndex] ^=
	l_p_KeyBlockSize0x5000[(l_charIndex + 0x1400) % 0x5000]
	^ (p_KeyBlockSize0x2800[l_charIndex % 0x2800] + l_charIndex);

and replace the second key block reference with the primary offset:

s_pStringBuffer[l_charIndex] ^=
	l_p_KeyBlockSize0x5000[(l_charIndex + 0x1400) % 0x5000]
	^ (l_p_KeyBlockSize0x5000[(stringLiteralIndex % 0x2800) + (l_charIndex % 0x2800)] + l_charIndex);

And we’re done! Let’s look at the final implementation of this 1800-line function in C#:

var keyBlock = File.ReadAllBytes(Path.Combine(path, "key-block.dat"));
var metadata = new BinaryReader(File.Open("global-metadata-decrypted.dat", FileMode.Open, FileAccess.Read));

var startOfStringLiterals = 0x158;
var stringLiteralOffsetLengthTable = 0x146480;
var endOfStringLiteralOffsetTable = 0x1A7238;
            
var stringLiteralsCount = (endOfStringLiteralOffsetTable - stringLiteralOffsetLengthTable) / 8;

for (var stringLiteralIndex = 0; stringLiteralIndex < stringLiteralsCount; stringLiteralIndex++) {

    metadata.BaseStream.Position = stringLiteralOffsetLengthTable + stringLiteralIndex * 8;
    var stringOffset = metadata.ReadUInt32();
    var stringLength = metadata.ReadInt32();

    metadata.BaseStream.Position = startOfStringLiterals + stringOffset;
    var encryptedString = metadata.ReadBytes(stringLength);
    var decryptedString = new byte[stringLength];

    for (var charIndex = 0; charIndex < stringLength; charIndex++) {
        decryptedString[charIndex] = (byte) (encryptedString[charIndex]
            ^ keyBlock[(charIndex + 0x1400) % 0x5000]
            ^ (keyBlock[(stringLiteralIndex % 0x2800) + (charIndex % 0x2800)] + charIndex));
    }

    var decrypted = System.Text.Encoding.UTF8.GetString(decryptedString);
    Console.WriteLine($"{stringLiteralIndex:x4}: {decrypted}");
}

Line 1 loads the key block we saved earlier and line 2 loads the encrypted string literals blob.

Lines 3-5 define where in the blob the encrypted string literals start, and where the index reference table (which provides an offset into the blob and length for each literal – see here for details) starts and ends. Line 8 uses these values to calculate the total number of literals.

Line 10 begins a loop which iterates over every string literal, first finding each encrypted string’s offset and length in lines 12-14, reading the encrypted string and allocating space for the decrypted string in lines 16-18, then performing the decryption algorithm on lines 20-23, before finally converting the byte array to a .NET string and displaying the result on the console in lines 26-27.

Yes, that’s right.

The decryption function is 4 lines of code. Isn’t it beautiful?

Can you sue an anonymous hacker for a ransomware attack? This company is. |  BenefitsPRO

“I made a thing”

Let us now turn our attention to the actual algorithm itself.

The decryption algorithm used by Honkai Impact is quite strange. It is based on two sliding windows into a 0x5000 byte XOR key block. This in itself is not entirely unreasonable, however the distribution of key bit usage is poor:

  • The first sliding window offset extends from the string’s table index value, to this value plus the string length.
  • More egregiously, the second sliding window extends from byte 0x1400 to this value plus the string length
Distribution of key bit usage in Honkai Impact string literal decryption algorithm

The first issue means that overlapping key bytes are used in sequential strings, and if a later string X+n is smaller in length than previous string X by n characters or more, the entire set of used key bits for X+n will be a subset of those used for X.

The second issue is even more serious. What’s going to happen at string literal index 0x1400, 0x3C00 and all of the other strings per 0x2800 indices? You better believe the two XORs on the first character are going to cancel each other out and it will be available in plaintext in the string blob. String 0x1400 in this application is Unknown min/max value:. The corresponding encrypted metadata is:

Umiccxh9ebP aVj1f\u0012~XY%

Since charIndex is added to each character’s XOR value, the characters in this string have each essentially been XORed with their key byte and then XORed again with the same key byte added to the offset from the start of the string. The first character is therefore unencrypted, the second is XORed by key ^ (key + 1) , the third is XORed by key ^ (key + 2) and so on. Yikes, this is a major blunder, and while it might not matter in a video game, it nevertheless opens up interesting opportunities for cryptanalysis.

Suppose we know nothing about the algorithm, but we do know that strings should be human-readable. A simple search for sequences of characters in the printable character range (or nearby) can be cross-referenced with the currently unencrypted offset/length table to discern a pattern that repeats at regular index intervals.

To demonstrate just how trivial this is, consider this adaptation of our previous code:

for (var stringLiteralIndex = 0; stringLiteralIndex < stringLiteralsCount; stringLiteralIndex++) {
    // ... get the encrypted string ...

    var asciiCount = encryptedString.Count(c => c >= 0x20 && c <= 0x7F);
    if ((double) asciiCount / stringLength >= 0.9 && stringLength > 6)
                    Console.WriteLine($"{stringLiteralIndex:X4}");
}

This outputs all of the string indexes where at least 90% of the characters are ASCII. We elide strings of 6 characters or less to increase the signal-to-noise ratio. Both of these heuristic values can be adjusted if desired. The output is:

1400
2F98
5607
6400
6B68
7F94
8447
8822
8C00
97BB
9B18
B400

Although there is some noise and we do miss the string at index 0x3C00, there is an obvious pattern on the highlighted lines. We can then fetch the strings and further inspect them, as well as dialing the heuristics down to spot false negatives.

Maybe we should have thought this through some more

We can then perform a chosen ciphertext attack by modifying the metadata blob before it is passed to the decryption function. Consider the case where we change all of the string data to zeroes:

for (int p = startOfStringLiterals; p < endOfStringLiterals; p++)
    decryptedMetadata[p] = 0x00;

Having previously identified weakly encrypted sequences at index intervals of 0x2800 (plus 0x1400 initial offset), we can run the code against these specific string indexes with zeroed out string data and the results will show us exactly which bits were flipped during the decryption:

0x1400: 00 03 02 0D 0C 0F 06 19 08 0B 3E 0F 0C 37 12 11 10 73 12 2D 3C 1F
0x3C00: 00 03 02 0D 0C 0F 06 19 08 0B 3E 0F 0C 37 12 11 10 73 12 2D 3C 1F FE 19 18 3B 6E 2B 1C 2D 66 67 20 6F E2 5D 6C EB 26 EB 28 59 6A 57 74 57 32
0x6400: 00 03 02 0D 0C 0F 06 19 08 0B 3E 0F 0C 37 12 11 10 73 12
...

This is horrible, because:

  • we can immediately rule out strong encryption, which would never produce such low entropy data
  • it tells us there is a repeating key block
  • it tells us a factor of the key length
  • it gives clues about the existence of XOR-with-addition

This last point merits further explanation. If x is a pseudo-random key byte and y is a string offset where we assume that strings are of a typical average length for application string literals (say, 32 characters or less), the operation x ^ (x + y) only has a limited set of possible results. Specifically the result of the operation will be precisely the bits that are changed by the addition of x and y. For example, if x == 0x35 and y == 1:

x =            0b00110101 // 0x35
y =            0b00000001 // 0x01
x + y ==       0b00110110 // 0x36 - bottom 2 bits changed
x ^ (x + y) == 0b00000011 // 0x03 - result shows which bits changed

We can construct a partial rainbow table for these possibilities to prune the key space for a brute-force attack. I’ve just used the lower 4 bits here for simplicity:

var testBytes = new byte[] {
    0x00, 0x03, 0x02, 0x0D, 0x0C, 0x0F, 0x06, 0x19,
    0x08, 0x0B, 0x3E, 0x0F, 0x0C, 0x37, 0x12, 0x11
};

for (int p = 0; p < testBytes.Length; p++) {
    var xors = Enumerable.Range(0x00, 0xFF).Select(b => new { Byte = b, Xor = b ^ (b + p) })
                .Where(b => b.Xor == testBytes[p]).Select(b => b.Byte & 0x0F).Distinct().ToList();

    Console.WriteLine($"Candidate bottom 4 bits for {p:g2}/{testBytes[p]:X2}: " + string.Join(", ", xors.Select(x => string.Format($"{x:X1}"))));
}

The output:

Candidate bottom 4 bits for 0/00: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F
Candidate bottom 4 bits for 1/03: 1, 5, 9, D
Candidate bottom 4 bits for 2/02: 0, 1, 4, 5, 8, 9, C, D
Candidate bottom 4 bits for 3/0D: 5, 7
Candidate bottom 4 bits for 4/0C: 4, 5, 6, 7
Candidate bottom 4 bits for 5/0F: 5
Candidate bottom 4 bits for 6/06: 0, 1, 8, 9
Candidate bottom 4 bits for 7/19: 9, B, D, F
Candidate bottom 4 bits for 8/08: 0, 1, 2, 3, 4, 5, 6, 7
Candidate bottom 4 bits for 9/0B: 1, 5
Candidate bottom 4 bits for 10/3E: A, B
Candidate bottom 4 bits for 11/0F: 2
Candidate bottom 4 bits for 12/0C: 0, 1, 2, 3
Candidate bottom 4 bits for 13/37: 5, D
Candidate bottom 4 bits for 14/12: 2, 3, 6, 7, A, B, E, F
Candidate bottom 4 bits for 15/11: 1, 3, 5, 7, 9, B, D, F

These values represent possible options for the lower 4 bits of the key bytes used for this particular string (index 0x1400). Let us now look at the actual key data for the string:

B0 35 50 D7 96 55 90 0B 82 61 9B 32 51 15 07 C3

If you compare the bottom 4 bits of each byte with the rainbow table, you can see that we have successfully eliminated impossible key bits. A strong encryption algorithm would use an arbitrary selection of key bits for each operation, making it impossible (or at least extremely difficult) to prune the key space in this manner.

Extrapolating this to derive the entire key block is more convoluted and beyond the scope of this novella, but I believe the point is made: algorithms that are slapped together with minimal field testing will leave open easily exploitable attack vectors that don’t require the code to be reverse engineered.

Take the EZ legend way out

Perhaps you suspect a XOR blob but don’t care how the algorithm works or how the blob is derived from the key data. In that case, the course of action is simple: we iterate through all of the string literals – having zeroed out all the data as before – and write the output to the file. By eliminating the string data from the equation, the algorithm is now only performing arithmetic on key data. If we suspect a XOR algorithm, we can simply XOR the saved file with the original string literal blob:

var xorBlob = File.ReadAllBytes("zeroized-string-literals-output.dat");
for (int p = 0; p < xorBlob.Length; p++)
    xorBlob[p] ^= metadata[p + startOfStringLiterals];
File.WriteAllBytes("decrypted-string-literals.dat", xorBlob);

And as sure as the sun rises in the morning:

We produce the entire set of literals with no reverse engineering and almost no knowledge of the algorithm at all. We do not actually recover the 0x5000-byte key block here (nor do we know it exists), rather the combination of the two sliding windows XORed together to produce the final XOR key byte for each location. Still, this is the fastest and most effective way to decrypt the data if the algorithm itself is not needed.

Not great, not terrible

First of all I want to say this is a ridiculously over-engineered attempt at hiding an application’s strings, and I applaud the developers for such a valiant effort. It is certainly more than enough to defeat a significant portion of people who might dare to tackle it, well, I mean, until someone comes along and blogs the solution, if such a thing should happen. It was a lot of fun to crack and write about. I skipped over a lot of messing around in the article, overall I’d say it took a few hours to get to grips with (it took longer to write the article) but you really have to pool all the tools available and have knowledge in multiple areas to make headway.

Mistakes were made in this obfuscation. The list of string offsets and lengths is in plaintext in the metadata and both the index list and the encrypted blob in the metadata file are easily demarcated. Encrypting these as well would have increased the reverse engineering complexity somewhat.

The application’s anti-debugging implementation did not take into account the ability to load the DLL in an isolated sandbox, which made the task substantially easier. There was no assembly-level obfuscation to defeat IIDA’s decompiler or make reading the disassembly more difficult for a human.

The function acts in a predictable manner. We expected a for loop and that’s exactly what we got. Source-level obfuscation of this would have made reverse engineering considerably more confusing.

The decryption function itself was simple, which made it possible to just simplify the obfuscated equation in Notepad without too much trouble. Additionally, it contained major flaws which left it vulnerable to exploits via simplistic cryptanalysis with chosen ciphertext attacks.

Ultimately, the function doesn’t even need to be reverse engineered since you can just invoke the decryption code in the DLL in isolation anyway to extract all of the strings.

The main remaining question is: how is the key block generated? The block is not present in the DLL file, and indeed feeding in different global-metadata.dat files produces different blocks, so the metadata file is clearly used to derive the key data. For the curious, the key data is written in 8-byte blocks, and beyond that I shall leave determining how to generate it as an exercise for the reader 😎

I really hope you enjoyed this deep dive and picked up some useful tips and techniques. Until next time, happy hacking!

Further Reading

Rolf Rolles has written a fantastic article about using IDA’s Microcode API to heuristically unflatten code compiled with control flow flattening. His technique modifies the pseudo-language code produced by IDA before it is passed to the decompiler by detecting state machines and coalescing the basic blocks back together into normal for loops and other control statements. It’s a fantastic deep dive and well worth your time.

Vladislav Hrčka discusses new control flow and string obfuscation techniques found in cryptomining malware released on the Statinko botnet in 2020.

T. László and Á. Kiss wrote a paper on the techniques and pitfalls of implementing control flow flattening in C++.

Levwu examines control flow flattening found in the C# HawkEye malware which uses ConfuserEx for obfuscation.

Categories: IL2CPP, Reverse Engineering Tags:

Share your thoughts! Note: to post source code, enclose it in [code lang=...] [/code] tags. Valid values for 'lang' are cpp, csharp, xml, javascript, php etc. To post compiler errors or other text that is best read monospaced, use 'text' as the value for lang.

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: