
Basic PL Concepts

Hands-on analysis of basic Rust programming language concepts and their binary representations.

Overview

This sample project demonstrates fundamental Rust concepts and how they appear in compiled binaries:

  • Enums (algebraic data types)
  • Structs (product types)
  • Traits (interfaces)
  • Pattern matching
  • String handling

Project Location: docs/01-Rust-Binary-Analysis/01-basic_pl_concepts/

Table of Contents

Reverse Engineering Rust Codes

Analysis

Source Code Analysis

Enum Definition

enum Beatle {
    John,
    Paul,
    George,
    Ringo,
}

Binary Representation:

  • Enums are represented as integer discriminants
  • Simple enums (no data) use smallest integer type needed
  • Discriminant values: John=0, Paul=1, George=2, Ringo=3
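The discriminant values listed above can be checked from Rust itself. The following is a minimal sketch; the helper name `discriminant` is made up for illustration, and the one-byte size is what current compilers choose for a four-variant field-less enum rather than a documented guarantee.

```rust
use std::mem::size_of;

#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum Beatle {
    John,
    Paul,
    George,
    Ringo,
}

/// Hypothetical helper: cast a field-less enum variant to its discriminant.
fn discriminant(b: Beatle) -> u8 {
    b as u8
}

fn main() {
    for b in [Beatle::John, Beatle::Paul, Beatle::George, Beatle::Ringo] {
        println!("{:?} = {}", b, discriminant(b));
    }
    // With only four variants, one byte is enough to hold the discriminant.
    println!("size_of::<Beatle>() = {}", size_of::<Beatle>());
}
```

In a disassembler, this is why a Beatle value usually shows up as a single byte being loaded or compared.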

Struct Definition

struct Person {
    name: String,
    age: u32,
}

Memory Layout:

Person {
    name: String {           // 24 bytes on x64
        ptr: *const u8,      // 8 bytes
        len: usize,          // 8 bytes
        cap: usize,          // 8 bytes
    }
    age: u32,                // 4 bytes
}
Total: 32 bytes (with padding)
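The sizes in this layout can be confirmed with std::mem. This is a sketch under two assumptions: the struct is the two-field Person shown above, and String is three machine words, which holds for current standard library versions but is not guaranteed by the language. Rust may also reorder fields, so only the total size is checked. The helper name `person_layout` is made up for illustration.

```rust
use std::mem::{align_of, size_of};

#[allow(dead_code)]
struct Person {
    name: String,
    age: u32,
}

/// Hypothetical helper: returns (size of String, size of Person, alignment of Person).
fn person_layout() -> (usize, usize, usize) {
    (size_of::<String>(), size_of::<Person>(), align_of::<Person>())
}

fn main() {
    let (string_size, person_size, person_align) = person_layout();
    println!("String: {} bytes", string_size);
    println!("Person: {} bytes, aligned to {}", person_size, person_align);
}
```

On a 64-bit target this is expected to report 24 bytes for String and 32 bytes for Person, the last 4 bytes being padding.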

Output of this Rust code

In 6 years, Alice will be 33
In 7 years, Alice will be 34
In 8 years, Alice will be 35
In 9 years, Alice will be 36
Carol's favorite song is Yesterday

Building the Sample

Standard Release Build

cd docs/01-Rust-Binary-Analysis/01-basic_pl_concepts
cargo build --release

Output: target/release/basic_pl_concepts.exe

Cross-Platform Builds

x86-64 (64-bit) Windows

# MSVC toolchain
cargo build --release --target x86_64-pc-windows-msvc

# GNU toolchain
cargo build --release --target x86_64-pc-windows-gnu

x86 (32-bit) Windows

# MSVC toolchain
cargo build --release --target i686-pc-windows-msvc

# Install the target
rustup target add i686-pc-windows-gnu

# Compile with GNU toolchain
cargo build --release --target i686-pc-windows-gnu

Optimisation Levels

O3 Optimisation

[profile.release]
opt-level = 3

cargo build --release --target i686-pc-windows-msvc

Aggressive Optimisation

[profile.release]
opt-level = 3
lto = true
codegen-units = 1
strip = true
panic = "abort"

Results:

  • Default build: ~113 KB
  • O3 optimisation: ~113 KB
  • Aggressive optimisation: ~101 KB (11% reduction)

Binary Analysis

Available Samples

Located in datasets/Benign-Samples/01-basic-pl-concepts/:

Debug Builds

  1. basic_pl_concepts-x86_64-cargo-build-debug.exe
    • Architecture: x86-64 (64-bit)
    • Build Type: Debug
    • Toolchain: MSVC (default cargo build)
    • Size: 145 KB
  2. basic_pl_concepts-x86-i686-msvc-debug.exe
    • Architecture: x86 (32-bit)
    • Build Type: Debug
    • Toolchain: MSVC
    • Size: 125 KB
  3. basic_pl_concepts-x86-i686-debug-gnu.exe
    • Architecture: x86 (32-bit)
    • Build Type: Debug
    • Toolchain: GNU (MinGW-w64)
    • Size: 2.1 MB
  4. basic_pl_concepts-aarch64-debug
    • Architecture: ARM64 (Aarch64)
    • Build Type: Debug
    • Platform: macOS (Mach-O)
    • Size: 459 KB

Release Builds

  1. basic_pl_concepts-x86_64-cargo-build-release.exe
    • Architecture: x86-64 (64-bit)
    • Build Type: Release
    • Toolchain: MSVC (default cargo build --release)
    • Size: 131 KB
  2. basic_pl_concepts-x86-64-msvc-release.exe
    • Architecture: x86-64 (64-bit)
    • Build Type: Release
    • Toolchain: MSVC
    • Size: 131 KB
  3. basic_pl_concepts-x86-i686-msvc-release.exe
    • Architecture: x86 (32-bit)
    • Build Type: Release
    • Toolchain: MSVC
    • Size: 113 KB
  4. basic_pl_concepts-x86-i686-release-gnu-.exe
    • Architecture: x86 (32-bit)
    • Build Type: Release
    • Toolchain: GNU (MinGW-w64)
    • Size: 1.3 MB
  5. basic_pl_concepts-x86-release-O3.exe
    • Architecture: x86 (32-bit)
    • Build Type: Release
    • Optimisations: O3 (opt-level = 3)
    • Size: 113 KB
  6. basic_pl_concepts-x86-release-most-aggressive-optimisation.exe
    • Architecture: x86 (32-bit)
    • Build Type: Release
    • Optimisations: Most aggressive (LTO, strip, codegen-units=1, panic=abort)
    • Size: 101 KB
  7. basic_pl_concepts-aarch64-release
    • Architecture: ARM64 (Aarch64)
    • Build Type: Release
    • Platform: macOS (Mach-O)
    • Size: 398 KB

Static Analysis

String Extraction

# Extract all strings
strings basic_pl_concepts.exe

# Look for Rust-specific strings
strings basic_pl_concepts.exe | grep -E "(panic|rust|src)"

Expected Findings:

  • “panicked at” - Panic handler
  • Source file paths with .rs extension
  • Enum variant names (if not optimised out)

Symbol Analysis

# List all symbols (if not stripped)
nm basic_pl_concepts.exe

# Demangle Rust symbols
nm basic_pl_concepts.exe | rustfilt

# Find main function
nm basic_pl_concepts.exe | rustfilt | grep "main"

Key Symbols:

  • main - Entry point
  • std::rt::lang_start - Rust runtime initialization
  • core::panicking::panic - Panic handler
  • Type-specific implementations

File Type Detection

# Check file type
file basic_pl_concepts-x86-i686-msvc-release.exe
# Output: PE32 executable for MS Windows (console) Intel 80386

file basic_pl_concepts-x86-64-msvc-release.exe
# Output: PE32+ executable (console) x86-64, for MS Windows

Disassembly Analysis

Overview

  • Mach-O executables (ARM64/AArch64) are compiled on macOS (Apple M1)
  • PE (x86/i686) and PE (x86-64) executables are compiled on Windows
  • If an executable is cross-compiled for a different platform, its file name clearly says so.

release by Default

cargo build --release

Cargo.toml

[profile.release]
opt-level = 3

Emphasise O3

Cargo.toml

[profile.release]
opt-level = 3

Most aggressive optimisation

Cargo.toml

[profile.release]
opt-level = 3
lto = true
codegen-units = 1
strip = true
panic = "abort"

What each flag does:

  • opt-level = 3: Maximum LLVM optimisations
  • lto = true: Enables cross-crate inlining and better dead code elimination
  • codegen-units = 1: Forces all code through a single optimisation pipeline (increases compile time but improves optimisation)
  • strip = true: Removes debug symbols, reducing binary size
  • panic = "abort": Uses a simpler panic handler that terminates immediately instead of unwinding the stack

Reverse Engineering Rust Codes

Binaries

PE files used in this analysis are listed below, located in /datasets/Benign-Samples/01-basic-pl-concepts:

1. basic_pl_concepts-x86_64-cargo-build-debug.exe

  • Architecture: x86-64
  • Compilation: Debug build
  • Optimisation: None
  • Strip: No debug symbols removed

2. basic_pl_concepts-x86-i686-msvc-debug.exe

  • Architecture: x86 (32-bit)
  • Compilation: Debug build
  • Optimisation: None
  • Strip: No debug symbols removed

They are all compiled with the default cargo build command:

cargo build

Tools

  • Binary Ninja v5

Methodology

Start with debug builds (both 32-bit and 64-bit), then release build (32-bit).

  • Import binaries into Binary Ninja
  • Identify function boundaries and demangle Rust symbol names for readability.
  • Analyse panic, error handling, and unwinding patterns unique to Rust.
  • Locate and interpret vtables, trait objects, and fat pointers.
  • Examine memory safety constructs (ownership, borrowing, lifetimes) as reflected in code/data.
  • Recognise common Rust standard library patterns and idioms.
  • Reconstruct high-level types, enums, and structures from decompiled output.
  • Rename functions, variables, and types based on their usage.
  • Add comments explaining complex logic, especially around pattern matching and generics.
  • Fix decompilation issues related to Rust’s code generation (inlining, monomorphisation).
  • Document findings and improve readability for further analysis.

To be clear, for this analysis, the whole point is to recognise the trait pattern. To find Rust trait objects and vtables, a good start might be searching for:

  • Data cross-references and pointers to read-only memory (often vtables)
  • Function pointers grouped together (potential vtables)
  • Function signatures that take or return fat pointers (structs with data pointer + vtable pointer)

Analysis

32-bit GNU Debug Build (PE File)

GNU Binary (GCC/MinGW)

  • File: basic_pl_concepts-x86-i686-debug-gnu.exe
  • Compiler: GCC (MinGW-w64 toolchain)
  • Entry Point: 0x401410 - mainCRTStartup
  • Architecture: x86 (32-bit)
  • Build Type: Debug

0. Execution Summary

  • The GNU binary is larger than the MSVC binary
  • More symbols in the binary to reverse engineer with (e.g., main function name)

1. Entry Point

GNU (GCC/MinGW) Startup Chain

Check xref first.

Windows Loader
    ↓
mainCRTStartup (0x401410)  →  STARTS HERE
    ↓                      →  (__mingw_app_type = 0)
__tmainCRTStartup (0x40101c)
    ↓                      →  (runtime initialisation)
_main (0x401b30)
    ↓
__main (0x4a4ee0)          →  __do_global_ctors
    ↓
std::rt::lang_start (Rust runtime)
    ↓
basic_pl_concepts::main (actual user code)

In this x86 GNU debug build, both functions exist, but only ONE actually runs (see figure below):

  • WinMainCRTStartup (0x401400): Present but UNUSED ❌ (dead code)
  • mainCRTStartup (0x401410): ACTUAL ENTRY POINT ✅ (runs first)

According to the PE32 Optional Header, the AddressOfEntryPoint field contains the RVA (Relative Virtual Address) of 0x1410, which translates to absolute address 0x401410 (with image base 0x400000).

This means mainCRTStartup at 0x401410 is the function that Windows loader calls when the process starts

The key insight:

  • WinMainCRTStartup never executes, it’s just compiled in as an alternative that the linker didn’t select.
  • Address order ≠ Execution order.
  • The PE header’s entry point field determines what runs first, not the function addresses!
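The address arithmetic above can be sketched in a few lines. The function names and the small lookup table are made up for illustration; the addresses and the image base are the ones reported in this analysis, and no real PE parsing is performed here.

```rust
/// The two candidate entry points and their addresses, as reported in this analysis.
const CANDIDATES: [(u32, &str); 2] = [
    (0x401400, "WinMainCRTStartup"),
    (0x401410, "mainCRTStartup"),
];

/// Hypothetical helper: absolute address = image base + AddressOfEntryPoint (RVA).
fn entry_point_va(image_base: u32, rva: u32) -> u32 {
    image_base + rva
}

/// Hypothetical helper: name the function that sits at the computed entry address.
fn resolve_entry(image_base: u32, rva: u32) -> Option<&'static str> {
    let va = entry_point_va(image_base, rva);
    CANDIDATES
        .iter()
        .find(|(addr, _)| *addr == va)
        .map(|(_, name)| *name)
}

fn main() {
    let va = entry_point_va(0x400000, 0x1410);
    println!("entry point VA = {:#x}", va);
    println!("function at entry = {:?}", resolve_entry(0x400000, 0x1410));
}
```

The lookup only depends on the header field, which is the point made above: the lower address of WinMainCRTStartup plays no role in deciding what runs first.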

Side-by-Side Comparison

WinMainCRTStartup (0x401400) - NOT USED:

WinMainCRTStartup:
    __mingw_app_type = 1    // Set app type to GUI (1)
    return __tmainCRTStartup() __tailcall

mainCRTStartup (0x401410) - ACTUAL ENTRY POINT:

mainCRTStartup:
    __mingw_app_type = 0    // Set app type to Console (0)
    return __tmainCRTStartup() __tailcall

The Pattern

Both functions:

  1. Set the __mingw_app_type global variable
  2. Tail-call the same __tmainCRTStartup function

The ONLY difference is the app type:

  • WinMainCRTStartup sets __mingw_app_type = 1 (GUI application)
  • mainCRTStartup sets __mingw_app_type = 0 (Console application)

Why Both Exist?

This is a MinGW/GCC linker pattern that provides two entry points in every executable:

  1. Console applications (like this one):
    • Linker sets entry point to mainCRTStartup
    • User’s main function signature: int main(int argc, char *argv[])
  2. GUI applications (Windows apps):
    • Linker would set entry point to WinMainCRTStartup
    • User’s main function signature: int WinMain(HINSTANCE, HINSTANCE, LPSTR, int)

Later Impact

This __mingw_app_type value affects initialisation behaviour later in __tmainCRTStartup:

int __mingw_app_type_1 = __mingw_app_type
if (__mingw_app_type_1 != 0)
    __set_app_type(_crt_gui_app)      // GUI: No console allocation
else
    __set_app_type(_crt_console_app)  // Console: Attach to console

How This Differs from MSVC

A detailed analysis of the x86 MSVC build follows later, in the section 32-bit MSVC Debug Build (PE File).

MSVC has completely separate entry point functions with different names:

  • Console: mainCRTStartup calls main()
  • GUI: WinMainCRTStartup calls WinMain()
  • Unicode Console: wmainCRTStartup calls wmain()
  • Unicode GUI: wWinMainCRTStartup calls wWinMain()

MinGW/GCC uses a unified approach where both entry points exist but share the same initialisation code (__tmainCRTStartup), differing only in the app type flag.

Cross-References Check

# No code references to either entry point - they're only called by Windows loader!
xrefs to WinMainCRTStartup (0x401400): (none)
xrefs to mainCRTStartup (0x401410): (none)

This confirms these are true entry points. They are called by the OS, not by any code within the binary.

Summary Table

| Entry Point | Address | App Type Set | PE Entry? | Purpose |
|-------------|---------|--------------|-----------|---------|
| WinMainCRTStartup | 0x401400 | 1 (GUI) | ❌ No | Windows GUI applications |
| mainCRTStartup | 0x401410 | 0 (Console) | ✅ Yes | Console applications (this binary) |
| __tmainCRTStartup | 0x40101c | N/A | N/A | Unified startup logic for both |

Both entry points are compiled into every MinGW executable, but only one is referenced in the PE header based on the subsystem (CONSOLE vs WINDOWS) specified during linking!

How to Determine Which Entry Point is Used

Method 1: Check PE Header

# Using Binary Ninja or PE viewer
PE Optional Header → AddressOfEntryPoint → 0x1410
With Image Base 0x400000 → Absolute: 0x401410
Function at 0x401410 → mainCRTStartup ✓

Method 2: Check Subsystem

PE Optional Header → Subsystem
  - IMAGE_SUBSYSTEM_WINDOWS_GUI (2) → WinMainCRTStartup
  - IMAGE_SUBSYSTEM_WINDOWS_CUI (3) → mainCRTStartup ✓

Method 3: Linker Configuration

# GCC/MinGW linker flags:
-mconsole     → Sets entry to mainCRTStartup
-mwindows     → Sets entry to WinMainCRTStartup
(default)     → Depends on main vs WinMain function

Entry Point Architecture

This flowchart summarises the entry point identification discussed above.

2. main - Unique Code for GNU

| Aspect | GNU (GCC/MinGW) | MSVC |
|--------|-----------------|------|
| Multi-threading Check | Stack base comparison with sleep loop | Not present in entry |
| TLS Initialization | Three separate force flags set | Single _register_thread_local_exe_atexit_callback |
| Stack Detection | Uses fsbase->NtTib.StackBase | Uses saved base pointer tracking |

This is a unique GNU/MinGW pattern to detect if the process is being debugged or has multiple threads racing during startup!

3. Initialisation Tables

| Aspect | GNU (GCC/MinGW) | MSVC |
|--------|-----------------|------|
| Error-checking Init | _initterm_e(&__xi_a, &__xi_z) | _initterm_e(0x419168, 0x419174) |
| Regular Init | _initterm(&__xc_a, &__xc_z) | _initterm(0x41915c, 0x419164) |
| Initialization State | ___native_startup_state tracking | crtInitializationStateGlobal enum |
| Startup Lock | Not present | ___scrt_acquire_startup_lock() |

GNU Code:

int eax_10 = _initterm_e(&__xi_a, &__xi_z)
if (eax_10 != 0)
    return 0xff

_initterm(&__xc_a, &__xc_z)

4. Command-Line & Environment Setup

| Aspect | GNU (GCC/MinGW) | MSVC |
|--------|-----------------|------|
| Argv Parsing | __getmainargs(&argc, &argv, &envp, _dowildcard, startup_info) | Not visible in main flow |
| Argv Copying | Manual deep copy of all argv strings | Not needed |
| Environment Init | *__p___initenv() = envp | _get_initial_narrow_environment() |
| Wildcard Expansion | _dowildcard flag | Handled internally |

GNU Code (unique argv deep copy!):

int32_t eax_13 = __getmainargs(&argc, &argv, &envp, _dowildcard, startup_info)

if (eax_13 >= 0) {
    argc_1 = argc
    eax_15 = malloc((argc_1 << 2) + 4)  // Allocate argv array
    
    if (eax_15 == 0)
        goto error
    
    // Deep copy each argument string
    for (ebx = 0; argc_1 != ebx; ebx++) {
        _Size = strlen(argv[ebx]) + 1
        int32_t eax_18 = malloc(_Size)
        eax_15[ebx] = eax_18
        
        if (eax_18 == 0)
            goto error
        
        memcpy(eax_18, argv[ebx], _Size)
    }
    
    eax_15[argc_1] = 0  // NULL terminate
    argv = eax_15
}

Why? GNU/MinGW creates a persistent copy of argv to prevent issues if the original environment is modified!

MSVC Code:

char** initialNarrowEnvironment = _get_initial_narrow_environment()
char** argv = *__p___argv()

4. Global Constructors (C++)

The Big Picture - What Are Global Constructors?

When you write C++ code with global or static objects that have constructors, those constructors need to run before main() starts. The __main() function is responsible for calling all these constructors. This is a fundamental part of C++ runtime initialisation.

| Aspect | GNU (GCC/MinGW) | MSVC |
|--------|-----------------|------|
| Constructor Mechanism | __main() → __do_global_ctors() | Handled via _initterm() tables |
| Constructor Table | __CTOR_LIST__ array | Not visible |
| Destructor Registration | atexit(__do_global_dtors) | _register_thread_local_exe_atexit_callback() |
| Initialisation Flag | initialized static variable | State enum |

GNU Code: This is the classic GCC global constructor pattern!

__main() {
    int initialized_1 = initialized
    if (initialized_1 != 0)
        return initialized_1
    
    initialized = 1
    return __do_global_ctors()
}

__do_global_ctors() {
    // Count constructors
    int32_t i_2 = 0
    do {
        i_1 = i_2
        i_2 += 1
    } while ((&__CTOR_LIST__)[i_2] != 0)
    
    // Call them in reverse order
    if (i_1 != 0) {
        do {
            (&__CTOR_LIST__)[i_1]()
            i = i_1
            i_1 -= 1
        } while (i != 1)
    }
    
    return atexit(__do_global_dtors)
}

MSVC Code:

// Already handled in initialisation tables
_initterm(0x41915c, 0x419164)

// Later, register cleanup
if (data_420200 != 0 && sub_416f74(&data_420200) != 0)
    _register_thread_local_exe_atexit_callback(data_420200)

Why They’re Special?

Unlike local objects (which are constructed when execution reaches their declaration), global/static objects must be initialised before the program starts, specifically before main() begins execution.

How GCC/MinGW Implements This: The __main() Function

GCC uses a special mechanism to track all constructors that need to be called:

  1. Constructor Lists: Arrays of function pointers
    • __CTOR_LIST__ - List of constructors
    • __DTOR_LIST__ - List of destructors
  2. The __main() Function: Calls all constructors
    • Defined in gccmain.c (part of MinGW CRT)
    • Called explicitly from startup code
    • Walks through __CTOR_LIST__ and calls each constructor
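To make the mechanism concrete, here is a small Rust model of the pattern: a guard flag, a list of function pointers with a header slot and a terminator, and a reverse walk. It is only a sketch of the decompiled logic shown earlier, not MinGW's actual implementation. All names (CTOR_LIST, run_main_once, ctor_a, ctor_b) are illustrative, and the atexit registration of destructors is left out.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;

/// Plays the role of the `initialized` static in __main().
static INITIALIZED: AtomicBool = AtomicBool::new(false);
/// Records the order in which the illustrative constructors run.
static LOG: Mutex<Vec<&'static str>> = Mutex::new(Vec::new());

fn ctor_a() {
    LOG.lock().unwrap().push("a");
}

fn ctor_b() {
    LOG.lock().unwrap().push("b");
}

/// Model of __CTOR_LIST__: slot 0 is a header that is never called,
/// and a `None` entry terminates the list.
static CTOR_LIST: [Option<fn()>; 4] =
    [None, Some(ctor_a as fn()), Some(ctor_b as fn()), None];

/// Model of __do_global_ctors(): collect the entries after the header,
/// then call them from the last one back to the first.
fn do_global_ctors(list: &[Option<fn()>]) {
    let ctors: Vec<fn()> = list.iter().skip(1).map_while(|entry| *entry).collect();
    for ctor in ctors.iter().rev() {
        ctor();
    }
}

/// Model of __main(): returns true only on the call that actually ran the constructors.
fn run_main_once() -> bool {
    if INITIALIZED.swap(true, Ordering::SeqCst) {
        return false;
    }
    do_global_ctors(&CTOR_LIST);
    true
}

fn main() {
    println!("first call ran ctors: {}", run_main_once());
    println!("second call ran ctors: {}", run_main_once());
    println!("call order: {:?}", LOG.lock().unwrap());
}
```

Calling run_main_once a second time does nothing, which is the same guard behaviour that makes a repeated __main() call harmless.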

5. Call main() / Rust Entry Point

| Aspect | GNU (GCC/MinGW) | MSVC |
|--------|-----------------|------|
| Main Wrapper | _main → std::rt::lang_start | sub_401e10 → sub_402900 |
| __main() Call | Explicitly called in _main | Not present |
| Arguments Passed | (basic_pl_concepts::main, argc, argv, 0) | (sub_401990, argc, argv, 0) |

  • There are two similarly named functions called before the Rust main: _main and __main (pay attention to the number of underscores in the names).

GNU Code: Check xref:

6. Cleanup & Exit

| Aspect | GNU (GCC/MinGW) | MSVC |
|--------|-----------------|------|
| Exit Decision | Based on managedapp and has_cctor flags | Based on sub_4171ca() |
| Quick Exit | Return directly | _cexit() then return |
| Full Exit | Not shown | exit(_Except) → noreturn |
| CRT Cleanup | Not shown | ___scrt_uninitialize_crt(1, 0) |

GNU Code:

_Except = _main(argc, argv)

if (managedapp != 0)
    exit(_Except)

if (has_cctor != 0)
    _cexit()

return _Except

MSVC Code:

int32_t _Except = sub_401e10(*__p___argc(), argv)

if (sub_4171ca() == 0)
    exit(_Except)  // Never returns

if (entry_initializationFlagCopy.b == 0)
    _cexit()

___scrt_uninitialize_crt(1, 0)
return _Except

GNU Code (_main, the wrapper from section 5):

_main(int32_t arg1, int32_t arg2) {
    __main()  // ← IMPORTANT: Call global constructors AGAIN!
    
    return std::rt::lang_start::h8aca30958a1bfdec(
        basic_pl_concepts::main::h0f63fd3b1e96e122,
        arg1, arg2, 0
    )
}

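Why call __main() twice?

  • First call in __tmainCRTStartup: runs the global constructors through __do_global_ctors
  • Second call in _main: emitted at the start of main on MinGW targets; by this point it has nothing left to do
  • The initialized flag prevents double-execution, so the second call returns immediately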

MSVC Code: It was difficult for me to analyse the MSVC x86 binary first. But after comparing it to the GNU x86 binary and identifying the patterns of these two compilers for building Windows PE files, it’s much clearer now.

sub_401e10(int32_t arg1, int32_t arg2) {
    return sub_402900(sub_401990, arg1, arg2, 0)
}

Now, if we click on sub_401990, we find the real Rust main function of the program.

See how much we have been through above! It’s time to analyse the real purpose of this program.

Finally!

  • It’s just the beginning!
  • We haven’t identified the patterns of Rust idioms yet!

7. main and Rust runtime startup routine

Rust Source Code - match

We can see that Binary Ninja lifts the Rust match into a switch in HLIL, which is a more readable form.

However!!

The contiguous strings can cause confusion. Rust string literals are referenced by pointer and length rather than terminated with a null byte, so the compiler packs them back to back in read-only data.
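A quick way to see why no terminator is needed is to look at what a &str actually is. The sketch below uses a made-up helper, `describe`, and checks that a string slice is two machine words (pointer and length) and that the literal's bytes contain no zero byte.

```rust
use std::mem::size_of;

/// Hypothetical helper: returns the length of `s` and whether its bytes contain a NUL.
fn describe(s: &str) -> (usize, bool) {
    (s.len(), s.as_bytes().contains(&0))
}

fn main() {
    let song = "Yesterday";
    let (len, has_nul) = describe(song);
    // A &str is a fat pointer: data pointer plus length.
    println!("size_of::<&str>() = {} bytes", size_of::<&str>());
    println!("ptr = {:p}, len = {}, contains NUL = {}", song.as_ptr(), len, has_nul);
}
```

Because the length travels with the pointer, a disassembler that expects C strings will happily run one literal into the next.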

Wrapper functions for Rust startup routine

The main function in a Rust binary compiled for Windows is typically a thin wrapper that calls the Rust runtime startup routine, specifically std::rt::lang_start (e.g., std::rt::lang_start::h8aca30958a1bfdec) and std::rt::lang_start_internal. These functions are responsible for initialising the Rust runtime, setting up stack guards, handling panics, and then invoking the actual user-defined main function.

  • std::rt::lang_start: This is the public entry point for Rust binaries. It sets up the runtime environment and calls std::rt::lang_start_internal.
  • std::rt::lang_start_internal: This function performs lower-level initialisation, including panic handling and catching unwinding, before calling the user main function.

These functions abstract away platform-specific initialisation and ensure that the Rust runtime is properly set up before the main logic runs.
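The role of these wrappers can be modelled in a few lines. This is a deliberately simplified sketch and not the standard library's implementation: the function name `lang_start_sketch`, the parameter names and the exit code 101 for a panicking main are assumptions made for illustration, and the real runtime does more setup than shown here. It also assumes the default panic = "unwind" strategy.

```rust
use std::panic;

/// Simplified model of the startup wrapper: run the user's main and
/// turn a panic into a non-zero exit code instead of unwinding further.
/// The unused parameters mirror the four-argument call seen in the binary.
fn lang_start_sketch(
    user_main: fn(),
    _argc: isize,
    _argv: *const *const u8,
    _sigpipe: u8,
) -> isize {
    match panic::catch_unwind(user_main) {
        Ok(()) => 0,
        Err(_) => 101, // assumed exit code for a panicking main
    }
}

fn user_main() {
    println!("hello from the user main");
}

fn failing_main() {
    panic!("boom");
}

fn main() {
    let ok = lang_start_sketch(user_main, 0, std::ptr::null(), 0);
    let failed = lang_start_sketch(failing_main, 0, std::ptr::null(), 0);
    println!("exit codes: {} and {}", ok, failed);
}
```

The shape matches what the decompiler shows: the wrapper receives the user main as a function pointer and never references the program logic directly.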

8. for loop

To identify the for loop in the decompiled or disassembled code, look for a pattern that represents the Rust for i in 0..10 construct. In Rust, this is typically compiled into a manual loop over a range using an index variable, with bounds checking and incrementing.

HLIL View: Pattern to look for in the binary:

  • Initialisation of a loop variable (i = 0)
  • A comparison against the upper bound (i < 10)
  • Conditional jump to exit the loop if the bound is reached
  • Loop body (the if i > 5 { … } and println! call)
  • Increment of the loop variable (i = i + 1)
  • Unconditional jump back to the comparison

The for loop will appear as a classic counted loop:

  • Set i = 0
  • Compare i < 10
  • If not, exit loop
  • If yes, execute body
  • Increment i
  • Jump back to compare
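In an unoptimised build the counted loop is usually still expressed through the iterator protocol. The sketch below writes the same loop twice, once with for and once in its roughly desugared form, so the into_iter and next calls that later appear in the disassembly are visible in source. The function names are made up for illustration.

```rust
/// The loop as written in the source, collecting the values that pass `i > 5`.
fn collect_with_for() -> Vec<i64> {
    let mut out = Vec::new();
    for i in 0..10 {
        if i > 5 {
            out.push(i);
        }
    }
    out
}

/// Roughly what `for` expands to: build the iterator, then call next() until None.
fn collect_desugared() -> Vec<i64> {
    let mut out = Vec::new();
    let mut iter = IntoIterator::into_iter(0..10i64);
    loop {
        match Iterator::next(&mut iter) {
            Some(i) => {
                if i > 5 {
                    out.push(i);
                }
            }
            None => break, // iterator exhausted: leave the loop
        }
    }
    out
}

fn main() {
    println!("for:       {:?}", collect_with_for());
    println!("desugared: {:?}", collect_desugared());
}
```

Each piece of the desugared version has a counterpart in the binary: the Range value on the stack, the call to next, and the branch on the returned Option.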

The for loop from the Rust source code (for i in 0..10 { if i>5 { … } }) is implemented in the function basic_pl_concepts::main::h0f63fd3b1e96e122 at address 0x4016d0. Be cautious with the lifted output. Still, I appreciate the effort of the Vector35 team, who provide several views (e.g. HLIL, LLIL and other IL forms) for researchers to identify patterns.

  • At 0x4018a8, the loop starts.
  • At 0x4018b4, the iterator’s next method is called.
  • At 0x4018c7, if the iterator is exhausted, the loop breaks.

Summary Table

| Step | HLIL/Decompilation Clue | Disassembly Clue |
|------|-------------------------|------------------|
| Iterator Setup | Range struct/init | mov/init two locals (start, end) |
| Next/Compare | call to next/break on empty | cmp/jge or call to next + test/je |
| Body | Loop body code | code block between cmp and inc/jmp |
| Increment | Implicit in next or manual | inc/add to start value |
| Loop | while/for/loop | jmp back to comparison |

Disassembly View

In Rust Binaries

  • The pattern may be wrapped in iterator calls, so look for calls to functions like core::iter::range::next and checks for the iterator being exhausted.
  • The loop variable is often stored in a struct (the range iterator), and the next method is called each iteration.
  • Look for a call to a function named like core::iter::range::next, followed by a conditional jump based on its return value.

Identify struct p

  • .data - Writable data
  • .rdata - Read-only data

Now, go back to main of the Rust code.

Constant 27

Rust source code

The Rust source code defines the global constant ALICE_AGE: i64 = 27. In the binary, it appears as an immediate value (27) used in the initialisation of the alice struct in main. There is no named global variable for ALICE_AGE; the compiler inlines this value wherever it is used. In hexadecimal, 0x1b equals decimal 27.


How exactly can we identify this most interesting area?

Rust source code:

for i in 0..10 {
        if i>5 {
            println!(
                "In {} years, Alice will be {}", i, age_in_future(&alice,i)
            );
        }
    }

How Rust’s for i in 0..10 Loop is Disassembled

1. Loop Initialisation (0x401836 - 0x401852)

mov     dword [esp+0x158], 0x0    ; Range start = 0
mov     dword [esp+0x15c], 0xa    ; Range end = 10 (0xa)
mov     dword [esp+0x160_3], 0x0  ; Iterator state
mov     dword [esp+0x164_3], 0x0  ; Iterator state

The 0..10 range is converted into a Range<i64> structure with:

  • start = 0
  • end = 10

Then it calls IntoIterator::into_iter() to create an iterator.

2. Main Loop Structure (0x40189f - 0x401b04)

The loop follows this pattern:

a) Iterator Next Call (0x4018a8 - 0x4018b4)

lea     ecx, [esp+0xa0]           ; Load iterator reference
mov     dword [eax+0x4], ecx      
lea     ecx, [esp+0xb0]           ; Output location
mov     dword [eax], ecx
call    Range<A>::next            ; Get next value

This calls core::iter::range::Range::next() which returns an Option<i64>.

b) Check if Iterator is Exhausted (0x4018bb - 0x4018c7)

mov     eax, dword [esp+0xb0]     ; Load Option discriminant
test    eax, 0x1                  ; Check if Some(value)
je      0x4018fc                  ; If None, exit loop

The iterator returns:

  • Some(i) - discriminant has bit 0 set → continue loop
  • None - discriminant is 0 → exit loop

3. Conditional Check: if i > 5 (0x4018e9 - 0x4018f4)

mov     esi, dword [esp+0xc0]     ; Load i (low 32 bits)
mov     ecx, dword [esp+0xc4]     ; Load i (high 32 bits)
xor     eax, eax
mov     edx, 0x5
sub     edx, esi                  ; Compare: 5 - i
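sbb     eax, ecx                  ; Subtract with borrow: 0 - i (high 32 bits) - CF
jl      0x401a16                  ; If i > 5, jump to println! block

This implements the comparison i > 5 by computing the 64-bit difference 5 - i in two 32-bit halves (sub for the low half, sbb for the high half) and taking the jl branch when the signed result is negative.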
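The two-instruction comparison can be replayed in Rust to check that it really is equivalent to i > 5. This is a sketch of the flag arithmetic, with a made-up function name; it models only the carry out of the low half and the sign and overflow flags of the high half, which is what jl consults.

```rust
/// Emulates `sub edx, esi` / `sbb eax, ecx` / `jl` for the 64-bit value 5 - i,
/// with `i` split into two 32-bit halves as on i686. Returns true when `jl` is taken.
fn jl_taken_for_five_minus(i: i64) -> bool {
    let lo = i as u32; // esi: low 32 bits of i
    let hi = (i >> 32) as u32; // ecx: high 32 bits of i

    // sub edx, esi with edx = 5: only the borrow (carry flag) matters afterwards.
    let (_low_result, borrow) = 5u32.overflowing_sub(lo);

    // sbb eax, ecx with eax = 0: compute 0 - hi - borrow in a wider type,
    // then derive the sign flag (SF) and overflow flag (OF) of the 32-bit result.
    let wide: i64 = 0 - i64::from(hi as i32) - i64::from(borrow);
    let truncated = wide as i32;
    let sf = truncated < 0;
    let of = i64::from(truncated) != wide;

    // jl jumps when SF != OF, i.e. when the signed 64-bit result is negative.
    sf != of
}

fn main() {
    for i in 0..10i64 {
        println!("i = {}: jl taken = {}", i, jl_taken_for_five_minus(i));
    }
}
```

For the loop values 0 through 9 the branch is taken only for 6, 7, 8 and 9, which matches the four lines the program prints.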

4. The println! Block (0x401a16 onwards)

When i > 5:

a) Call age_in_future(&alice, i) (0x401a32)

mov     ecx, dword [esp+0xc0]     ; i (low)
mov     edx, dword [esp+0xc4]     ; i (high)
mov     dword [eax+0x8], edx
mov     dword [eax+0x4], ecx
lea     ecx, [esp+0x28]           ; &alice
mov     dword [eax], ecx
call    basic_pl_concepts::age_in_future

b) Format Arguments (0x401a88 - 0x401aa4)

Creates formatting arguments for the two {} placeholders:

  • Argument 1: i value
  • Argument 2: Result from age_in_future()

c) Call Print Function (0x401afd)

call    std::io::stdio::_print

5. Loop Back (0x401b04)

jmp     0x40189f                  ; Jump back to loop start

Key Observations:

  1. Iterator Pattern: Rust’s for loop uses the Iterator trait, not a simple counter. The Range::next() method is called each iteration.

  2. Option Return: The iterator returns an Option, which is checked with a test instruction on the discriminant field.

  3. 64-bit Values on 32-bit: Since this is a 32-bit binary (i686), the i64 loop variable requires two registers (low/high 32 bits).

  4. Jump Table: There’s also a jump table at 0x401919 that handles a switch statement (likely for the favorite_beatle enum printing, which happens elsewhere in the code).

  5. No Simple Counter: Unlike C loops, there’s no visible inc instruction for a counter. Instead, the Range iterator internally manages the state.

This demonstrates how Rust’s high-level iterator abstraction compiles down to assembly that’s more complex than a traditional C-style for loop, but provides better type safety and abstraction guarantees.


Rust source code:

Carol’s favourite Beatle is Beatle::Paul, which is the second arm of the match in the Rust source code, so her favourite song is “Yesterday”.

let song = match carol.favorite_beatle {
        Beatle::John => "Imagine",
        Beatle::Paul => "Yesterday",
        Beatle::George => "Here Comes The Sun",
        Beatle::Ringo => "Don't Pass Me By"
    }; // should evaluate to "Yesterday"
    println!("Carol's favorite song is {}", song);

Hence, the compiler directly allocated 2 to the register.

eax.b = 2
p.favorite_beatle = eax.b  # 2

Identified Patterns

Discovered patterns:

  • for loop
  • struct
  • match

Result of Rust code

In 6 years, Alice will be 33
In 7 years, Alice will be 34
In 8 years, Alice will be 35
In 9 years, Alice will be 36
Carol's favorite song is Yesterday

Unused struct bob

    let bob = Person {
        name: String::from("Bob"),
        age: 71,
        favorite_beatle: Beatle::Ringo
    };

In the function age_in_future

The function basic_pl_concepts::age_in_future::hc7fa85f942b545c9 takes a pointer to a Person struct and a 64-bit integer years, and returns the sum of the person’s age and years.

  • It adds p->age and years.
  • If the addition would overflow, it triggers a panic (Rust’s checked addition semantics).
  • Otherwise, it returns the sum as the future age.

This function safely computes a person’s age in the future by adding years to their current age.
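A source-level sketch that matches this description is shown below. It is a reconstruction from the decompiled behaviour, not the project's actual source: the struct is reduced to the fields needed here, and the explicit checked_add stands in for the overflow check that a debug build inserts around a plain +.

```rust
const ALICE_AGE: i64 = 27;

#[allow(dead_code)]
struct Person {
    name: String,
    age: i64,
}

/// Adds `years` to the person's age, panicking on overflow
/// as a debug build does for ordinary integer addition.
fn age_in_future(p: &Person, years: i64) -> i64 {
    p.age
        .checked_add(years)
        .expect("attempt to add with overflow")
}

fn main() {
    let alice = Person {
        name: String::from("Alice"),
        age: ALICE_AGE,
    };
    for i in 6..10 {
        println!("In {} years, Alice will be {}", i, age_in_future(&alice, i));
    }
}
```

The overflow branch is what produces the extra compare and the call into the panic machinery that can be seen next to the addition in the disassembly.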

Types

Binary Ninja identifies the struct and the match, both Rust idioms.

Favourite songs

Each case extracts a different Beatles song title from the concatenated string by using different offsets, and the second parameter appears to be the length of that song title:

Analyse the result of each case

Case 0x0

lea eax, [data_4ab0a8[0xd]]  → "ImagineYesterdayHere Comes The SunDon't Pass Me ByCarol's favorite song is \n"
mov dword [esp+0x110 (var_58)], eax
mov dword [esp+0x114 (var_54)], 0x7
  • Loads pointer to: String starting at offset 0xd (13 bytes into the data)
  • String starts with: “Imagine…”
  • Second parameter: 0x7 (7)
  • Result: Skips “AliceBobCarol” and points to “Imagine…”

Case 0x1

lea eax, [data_4ab0a8[0x14]]  → "YesterdayHere Comes The SunDon't Pass Me ByCarol's favorite song is \n"
mov dword [esp+0x110 (var_58)], eax
mov dword [esp+0x114 (var_54_1)], 0x9
  • Loads pointer to: String starting at offset 0x14 (20 bytes)
  • String starts with: “Yesterday…”
  • Second parameter: 0x9 (9)
  • Result: Skips “AliceBobCarolImagine” and points to “Yesterday…”

Case 0x2

lea eax, [data_4ab0a8[0x1d]]  → "Here Comes The SunDon't Pass Me ByCarol's favorite song is \n"
mov dword [esp+0x110 (var_58)], eax
mov dword [esp+0x114 (var_54_2)], 0x12
  • Loads pointer to: String starting at offset 0x1d (29 bytes)
  • String starts with: “Here Comes The Sun…”
  • Second parameter: 0x12 (18)
  • Result: Points to “Here Comes The Sun…”

Case 0x3

lea eax, [data_4ab0a8[0x2f]]  → "Don't Pass Me ByCarol's favorite song is \n"
mov dword [esp+0x110 (var_58)], eax
mov dword [esp+0x114 (var_54_3)], 0x10
  • Loads pointer to: String starting at offset 0x2f (47 bytes)
  • String starts with: “Don’t Pass Me By…”
  • Second parameter: 0x10 (16)
  • Result: Points to “Don’t Pass Me By…”

Result

  • Case 0: “Imagine” (7 chars)
  • Case 1: “Yesterday” (9 chars)
  • Case 2: “Here Comes The Sun” (18 chars)
  • Case 3: “Don’t Pass Me By” (16 chars)
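The four cases can be reproduced by slicing one packed string with the offsets and lengths read from the disassembly. The constant BLOB below is typed in from the decompiler output above, and the function name `song_for` is made up for illustration.

```rust
/// The packed read-only data as shown by the decompiler (without the trailing zero byte).
const BLOB: &str =
    "AliceBobCarolImagineYesterdayHere Comes The SunDon't Pass Me ByCarol's favorite song is \n";

/// Maps an enum discriminant to the (offset, length) pair seen in each switch case
/// and returns the corresponding slice of the packed data.
fn song_for(discriminant: u8) -> Option<&'static str> {
    let (offset, len) = match discriminant {
        0 => (0x0d, 0x07),
        1 => (0x14, 0x09),
        2 => (0x1d, 0x12),
        3 => (0x2f, 0x10),
        _ => return None,
    };
    BLOB.get(offset..offset + len)
}

fn main() {
    for d in 0..4u8 {
        println!("case {:#x}: {:?}", d, song_for(d));
    }
}
```

The pair stored at var_58 and var_54 in each case is exactly such a pointer and length, which is how a &str is passed around.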

Trailing text:

  • Carol's favorite song is \n - The literal pieces of the println! format string, ending with a newline character

Null terminator:

  • , 0 - A zero byte follows the packed data in the array

Notable Characteristics

  1. No delimiters: The names and song titles run together without spaces or separators, because each one is referenced by a pointer and a length
  2. Mixed content: Personal names, song titles and format string pieces share one array
  3. Incomplete sentence: Ends with “Carol’s favorite song is \n” without naming a song, because the song is substituted for the {} placeholder at run time
  4. Size: The array is defined as [0x5a], which is 90 bytes

32-bit MSVC Debug Build (PE File)

MSVC Binary

  • File: basic_pl_concepts-x86-i686-msvc-debug.exe
  • Compiler: Microsoft Visual C++
  • Entry Point: 0x416d6a - _start
  • Architecture: x86 (32-bit)
  • Build Type: Debug

Compilation for this binary:

# MSVC toolchain
cargo build --target i686-pc-windows-msvc

0. Execution Summary

  • MSVC produced a smaller PE file compared to the GNU-compiled PE file.
  • The MSVC PE file contains fewer symbols; for example, there is no explicit main function.

Let’s start with the 32-bit build. Before diving deeper, you might notice that Rust symbol names are mangled, so you might see lots of names starting with _ZN or ?.

Once demangled, it’s clear and readable to see the Rust libraries used in the binary.

1. Entry Point

MSVC Startup Chain

MSVC has completely separate entry point functions with different names:

  • Console: mainCRTStartup calls main()
  • GUI: WinMainCRTStartup calls WinMain()
  • Unicode Console: wmainCRTStartup calls wmain()
  • Unicode GUI: wWinMainCRTStartup calls wWinMain()

MinGW/GCC uses a unified approach where both entry points exist but share the same initialization code (__tmainCRTStartup), differing only in the app type flag.

_start (0x416d6a)
    ↓
___security_init_cookie
    ↓
crt_startup (0x416be5)
    ↓
___scrt_initialize_crt
    ↓
_initterm_e / _initterm (initialisation tables)
    ↓
sub_401e10 (wrapper)
    ↓
sub_402900 (std::rt::lang_start equivalent)
    ↓
sub_401990 (actual user code)

Entry Point & Security

| Aspect | GNU (GCC/MinGW) | MSVC |
|---|---|---|
| Entry Function | mainCRTStartup | _start |
| Stack Cookie | Not visible in entry | ___security_init_cookie() called first |
| Exception Handling | _gnu_exception_handler registered | SEH (Structured Exception Handling) with __except_handler4 |
| Security Features | Later in init sequence | Immediate (highest priority) |

GNU Code:

mainCRTStartup:
    __mingw_app_type = 0
    return __tmainCRTStartup()

MSVC Code:

_start:
    ___security_init_cookie()
    return crt_startup(initialStackPointer, initialBasePointer)

This is the program entry point: the very first code that executes when the Windows executable runs.

int32_t _start()
{
    ___security_init_cookie()
    int32_t initialStackPointer
    int32_t initialBasePointer
    return crt_startup(
        processHandle: initialStackPointer,
        startupMode: initialBasePointer) __tailcall
}

___security_init_cookie() initialises the security cookie used for stack buffer overflow protection. This MSVC security feature (also known as a “stack canary”) helps detect stack corruption and buffer overflows.

int32_t initialStackPointer and int32_t initialBasePointer

These capture the initial stack and base pointer values. The values are passed to the C runtime initialisation.

return crt_startup(...) __tailcall

  • This calls the C Runtime (CRT) startup function.
  • __tailcall marks a tail call optimisation: the function jumps to crt_startup instead of calling it and returning. It passes the process handle and startup information to initialise the C runtime.

2. Program Initialisation

Analyse the code references to crtInitializationStateGlobal to determine its purpose, usage patterns, and typical values.

crtInitializationStateGlobal is a global int32_t variable at 0x4201b0 used exclusively in crt_startup to track the C runtime initialisation state:

  • 0 = uninitialised
  • 1 = initialising
  • 2 = initialised.

It is checked and set during startup to coordinate one-time CRT setup and prevent re-initialisation, supporting safe state transitions and error handling.

Let’s add some comments:

// Tracks CRT initialization state: 0=uninitialized, 1=initializing, 2=initialized
Proposed type: enum CRTInitState { Uninitialized=0, Initializing=1, Initialized=2 }; 

All code references to crtInitializationStateGlobal show it tracks CRT initialisation state (0=uninitialised, 1=initialising, 2=initialised) to coordinate safe, one-time startup; the variable is now renamed as CRTInitState and documented for clarity.

Apologies for the mixed American and British spelling; the resources I used were not consistent.

3. Identify the main function

For this 32-bit binary, I did not find data cross-references or pointers in read-only memory (typically vtables) at this point.

This suggests that vtables and trait objects may be obfuscated, inlined, or laid out atypically, so the cross-referenced read-only data and function signatures may need to be inspected manually. The open question is where to start: which specific function, address, or data region should be the first target for deeper analysis?


Review crtStartup

The operating system doesn’t call main() directly; it jumps to the program’s entry point (_start here), which hands control to crtStartup (crt_startup at 0x416be5). This abstraction allows the C runtime to set up everything the code expects to be available (malloc(), printf(), global variables, and so on) before the code runs.

The main function at 0x401990 is named main_logic, uses the cdecl calling convention, takes no parameters, and returns an int32_t; it processes composite Alice strings and structures, with no evidence of Rust-specific mangling or fat pointer usage. All findings and documentation have been applied for future type and control flow recovery.


64-bit MSVC Debug Build (PE File)

1. Entry Point

This is the High Level IL (HLIL) view, sometimes also called the High-Level Intermediate Language.

This is the disassembly.

Just for comparison, it helps me see clearly how the variables and function calls work.

In the function __scrt_common_main_seh, you can see many lines of CRT initialisation. The most interesting function call is… main (see figure below).

2. Comments on CRT and Runtime Support Routines in Startup Sequence

Try to enumerate all referenced CRT and runtime support routines that are part of or invoked during the startup sequence, starting from _start and including direct and indirect cross-references.

For each function, I tried to document its role in the initialisation process (e.g., memory setup, exception handling, environment setup, I/O configuration).

int64_t _start

__scrt_common_main_seh

Initialisation State Machine

if (rcx == 1)
    sub_140018270(7)
    noreturn

if (rcx != 0)
    rsi.b = 1
    char var_18_1 = 1
else
    data_140024268 = 1  // Mark as "initialising"

  • rcx == 0: First time initialisation needed → set to 1 (initialising)
  • rcx == 1: Already initialising (race condition) → abort
  • Any other non-zero rcx: Already initialised → skip initialisation
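
The three cases can be sketched as a small state machine. This is a minimal illustration, assuming the 0/1/2 encoding described for the 32-bit build also applies here; `CRTInitState` and `startup_step` are hypothetical names, not identifiers from the binary.

```python
from enum import IntEnum


class CRTInitState(IntEnum):
    UNINITIALIZED = 0
    INITIALIZING = 1
    INITIALIZED = 2


def startup_step(state):
    """Return (action, new_state) for one pass through the check shown above."""
    if state == CRTInitState.INITIALIZING:
        return "abort", state          # corresponds to sub_140018270(7), noreturn
    if state != CRTInitState.UNINITIALIZED:
        return "skip-init", state      # already initialised
    return "run-init", CRTInitState.INITIALIZING


for s in CRTInitState:
    print(s.name, startup_step(s))
```

Checking for the "initialising" value first mirrors the order of the decompiled code, where the abort path is tested before the generic non-zero case.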

Call main
_get_initial_narrow_environment()
*__p___argv()
int32_t _Except = main(*__p___argc())

Finally, the actual program runs! The code fetches the environment variables and the command-line arguments (argc, argv), calls main() with them, and stores the return value in _Except.

Comments on sub-functions

The Complete Startup Flow

_start()
  ↓
__scrt_common_main_seh()
  ↓
1. Initialise CRT (__scrt_initialize_crt)
2. Acquire startup lock
3. Check initialisation state
4. Run pre-main initialisers (_initterm_e, _initterm)
   - C++ global constructors
   - Static object initialisation
5. Release startup lock
6. Register exit callbacks
7. ★ Call main() ★  ← The CODE RUNS HERE
8. Cleanup and exit
  ↓
return exit code

3. Reconstruct the program’s logic from initialisation through to its core behaviour

To understand what the code is doing from the entry point (_start at 0x140017e50), I will follow the execution flow:

  1. Analyse sub_14001813c to see any early initialisation or setup it performs.
  2. Examine __scrt_common_main_seh, which is the C runtime’s main setup routine—this typically leads to the program’s main function.
  3. Trace how control passes from __scrt_common_main_seh to main and then analyse main and its callees (sub_140001270, sub_140002520, etc.).

While browsing the function calls, I also came across a vtable struct “by accident”; it will be useful later on.

3. Enumerate and Characterise _start Call Neighborhood

Callers of _start

None (entry point has no callers within the binary), because the operating system loader jumps directly to _start.

This section enumerates the functions in the immediate call neighborhood of _start, including both direct and indirect callees such as sub_14001813c and sub_1400183c4. For each function, it notes the likely role (CRT, system, or custom logic), summarises the main actions, and highlights anything that deviates from standard CRT startup patterns.

Entry Point

  • Address: 0x140017e50
  • Function: _start
  • Role: This is the program’s entry point (the first function executed)

_start makes two function calls:

1. sub_14001813c (Security Cookie Initialisation) at 0x140017e54
  • Purpose: Initialises the security cookie for stack buffer overflow protection
  • Key operations:
    - Checks whether __security_cookie still holds the default value (0x2b992ddfa232)
    - Generates a random cookie from GetSystemTimeAsFileTime() (current time), GetCurrentThreadId() (thread ID), GetCurrentProcessId() (process ID), QueryPerformanceCounter() (high-resolution counter) and a stack address (&var_18)
    - Stores the cookie in __security_cookie and its complement in data_140024100
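
The cookie logic can be sketched as follows. This is an illustration under stated assumptions: the listing only names the entropy sources, so the XOR mixing and the 48-bit mask are assumptions based on commonly described MSVC CRT behaviour and have not been checked against this binary's disassembly. `make_cookie` and its parameters are hypothetical names.

```python
DEFAULT_COOKIE = 0x2B992DDFA232      # default value checked by sub_14001813c
MASK64 = 0xFFFFFFFFFFFFFFFF


def make_cookie(filetime, thread_id, process_id, perf_counter, stack_addr,
                current=DEFAULT_COOKIE):
    """Return (cookie, complement) as 64-bit unsigned values."""
    if current != DEFAULT_COOKIE:
        # A cookie was already set (for example by the loader): keep it.
        return current, ~current & MASK64
    # ASSUMPTION: the entropy sources are simply XORed together.
    cookie = filetime ^ thread_id ^ process_id ^ perf_counter ^ stack_addr
    cookie &= 0x0000FFFFFFFFFFFF     # ASSUMPTION: keep the low 48 bits
    if cookie == DEFAULT_COOKIE:
        cookie = DEFAULT_COOKIE + 1  # never leave the well-known default in place
    return cookie, ~cookie & MASK64


cookie, complement = make_cookie(0x01DB0000ABCDEF12, 0x1A2C, 0x0F40,
                                 0x123456789A, 0x14FE30)
print(hex(cookie), hex(complement))
```

Storing the complement alongside the cookie matches the pair of globals seen in the binary (__security_cookie and data_140024100).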

2. __scrt_common_main_seh (Main CRT Initialisation) at 0x140017e5d (tail call)
  • Full name: __scrt_common_main_seh
  • Address: 0x140017cd4
  • Purpose: Standard C Runtime (CRT) initialisation and main program execution

This function orchestrates the entire program startup:

Initialisation Phase:

  • __scrt_initialize_crt(1): Initialise C runtime
  • __scrt_acquire_startup_lock(): Acquire startup synchronization lock
  • _initterm_e(&data_14001a2f8, &data_14001a310): Execute C initialisers (can return errors)
  • _initterm(&data_14001a2e0, &data_14001a2f0): Execute C++ initialisers
  • __scrt_release_startup_lock(): Release startup lock
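
The difference between the two table walkers can be sketched in a few lines. This is a simplified model of the behaviour documented by Microsoft (both skip null entries; _initterm_e stops at the first non-zero return value), with Python callables standing in for the function pointer tables. The function and variable names are illustrative.

```python
def initterm(table):
    """Call every non-null entry in order; return values are ignored."""
    for fn in table:
        if fn is not None:
            fn()


def initterm_e(table):
    """Call every non-null entry in order; stop at the first non-zero result."""
    for fn in table:
        if fn is None:
            continue
        rc = fn()
        if rc != 0:
            return rc
    return 0


log = []
ok_table = [None, lambda: log.append("a") or 0, lambda: log.append("b") or 0]
bad_table = [lambda: log.append("c") or 7, lambda: log.append("never") or 0]

print(initterm_e(ok_table), initterm_e(bad_table), log)
```

A non-zero result from `initterm_e` is what sends the startup code down its error path instead of reaching main.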

Reading:

  • Microsoft Learn - _initterm, _initterm_e : https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/initterm-initterm-e?view=msvc-170
  • GitHub Source (Microsoft Docs) URL: https://github.com/MicrosoftDocs/cpp-docs/blob/main/docs/c-runtime-library/reference/initterm-initterm-e.md

Pre-Main Setup:

  • __scrt_is_nonwritable_in_current_image(): Security checks
  • _register_thread_local_exe_atexit_callback(): Register cleanup handlers
  • _get_initial_narrow_environment(): Get environment variables
  • __p___argv(): Get command line arguments
  • __p___argc(): Get argument count

⭐️ Main Execution:

  • main(*__p___argc()) at 0x140001620: Execute user’s main function
  • This calls sub_140002520(sub_140001270): the actual Rust program logic

Cleanup Phase:

  • sub_140018284(): Check if cleanup needed
  • exit(_Except) or _cexit(): Normal termination
  • __scrt_uninitialize_crt(1, 0): Cleanup C runtime

One thing worth mentioning here:

  • WARP > Match Functions in Binary Ninja is a feature that uses the Workflow-Assisted Reverse-engineering Platform (WARP) system to identify and match functions in the binary against a large database of known functions from other binaries or libraries.
  • This automated matching helps you quickly recognise standard library functions, compiler-generated code, or reused code across different binaries, improving analysis efficiency and accuracy by automatically renaming and annotating matched functions.

Error Handling:

  • sub_140018270(7): Called on initialisation errors

Call Graph Summary

_start (0x140017e50) [ENTRY POINT]
├── sub_14001813c() [Security Cookie Init]
│   ├── GetSystemTimeAsFileTime()
│   ├── GetCurrentThreadId()
│   ├── GetCurrentProcessId()
│   └── QueryPerformanceCounter()
│
└── __scrt_common_main_seh() [TAIL CALL]
    ├── __scrt_initialize_crt(1)
    ├── __scrt_acquire_startup_lock()
    ├── _initterm_e()
    ├── _initterm()
    ├── __scrt_release_startup_lock()
    ├── _get_initial_narrow_environment()
    ├── main() [0x140001620] ← THE PROGRAM
    │   └── sub_140002520(sub_140001270)
    ├── _cexit() or exit()
    └── __scrt_uninitialize_crt(1, 0)

4. main function

In a Rust program, there are usually wrappers around the entry point functions and the main function (see figures below).

HIL View

Disassembly

A - sub_140002520 in x64 Rust Program

Normally, a pure C program looks like this (the x64 PE file can be found at ../../datasets/Benigh-Samples/01-basic-pl-concept/c-output/hello-x64.exe):

#include <stdio.h>

int main(int argc, char** argv) {
    printf("Hello World\n");
    return 0;
}

Disassembly of the simple C program:

main:
    sub    rsp, 0x28           ; Allocate stack
    lea    rcx, [string]       ; Load "Hello World\n"
    call   printf              ; Call printf directly
    xor    eax, eax            ; return 0
    add    rsp, 0x28           ; Cleanup
    ret

No wrapper needed - main directly contains the code logic.

main in C program

The __main() call at 0x14000145f in the hello-x64.exe is a GCC/MinGW-specific initialisation mechanism.

It guards against re-initialisation using a static flag and calls global C++ constructors by walking __CTOR_LIST__. __main also registers global destructors via atexit(__do_global_dtors), and it usually executes before any user code in main.
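
The guard-and-walk behaviour described above can be modelled as a short sketch. It is a simplification with hypothetical names (`make_gcc_main`, `exit_handlers`); the reverse iteration order is an assumption about how MinGW walks the table, and a plain list stands in for atexit.

```python
def make_gcc_main(ctor_list, exit_handlers):
    """Build a callable that behaves roughly like __main for a constructor table."""
    state = {"initialized": False}       # plays the role of the static flag

    def do_global_dtors():
        pass  # placeholder: the real routine walks __DTOR_LIST__

    def gcc_main():
        if state["initialized"]:         # second and later calls do nothing
            return False
        state["initialized"] = True
        for ctor in reversed(ctor_list):  # ASSUMPTION: last entry runs first
            ctor()
        exit_handlers.append(do_global_dtors)  # stands in for atexit(...)
        return True

    return gcc_main


ran, handlers = [], []
gcc_main = make_gcc_main([lambda: ran.append("ctor1"), lambda: ran.append("ctor2")], handlers)
first, second = gcc_main(), gcc_main()
print(first, second, ran, len(handlers))
```

The second call returning without side effects is the property that makes a repeated __main() call harmless.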

You can always check Cross References.

This is functionally equivalent to MSVC’s _initterm_e() mechanism but implemented differently. In a simple C program with no global objects, the constructor list will be nearly empty, making this call very fast. However, in C++ programs with global objects, this is critical for proper initialisation.

Comparison: MSVC vs GCC/MinGW

clang basically is similar to GCC/MinGW, so I didn’t include it in the table below.

| Aspect | MSVC | GCC/MinGW (this binary) |
|---|---|---|
| Constructor mechanism | .CRT$XC* sections | __CTOR_LIST__ array |
| When constructors run | In CRT startup (before main) | Via __main() call in main |
| Initialisation function | _initterm_e() | __do_global_ctors() |
| Destructor registration | _initterm() with .CRT$XP* | atexit(__do_global_dtors) |
| Explicit call required | No | Yes (__main() in main) |

Architecture & Design Philosophy

| Feature | GNU (GCC/MinGW) | MSVC |
|---|---|---|
| Modularity | Multiple discrete functions | Integrated into fewer functions |
| State Tracking | ___native_startup_state integer | crtInitializationStateGlobal enum |
| Thread Safety | Stack-based detection with sleep loop | Startup lock mechanism |
| Security First | Security features later in sequence | Security cookie initialised first |

Unique GNU/MinGW Features

  1. _pei386_runtime_relocator() - MinGW-specific runtime relocations for PE32
  2. Argv deep copy - Persistent copy of command-line arguments
  3. Triple TLS force flags - initltsdrot, initltsdyn, initltssuo
  4. Stack base detection loop - Multi-threading/debugging detection
  5. __main() double-call - Once for CRT C++, once for Rust
  6. __CTOR_LIST__ / __DTOR_LIST__ - Classic GCC constructor tables
  7. Manual COM/file mode setup - Explicit __p__fmode() / __p__commode()

Unique MSVC Features

  1. ___security_init_cookie() - Immediate stack canary setup
  2. SEH frames - Built-in exception handling infrastructure
  3. Startup lock mechanism - ___scrt_acquire_startup_lock()
  4. State enum - Uninitialized, Initializing, Initialized
  5. Integrated CRT init - Single ___scrt_initialize_crt() call
  6. Thread-local exit callbacks - _register_thread_local_exe_atexit_callback

32-bit GNU Release Build (PE File)

GNU Binary (GCC/MinGW)

  • File: basic_pl_concepts-x86-i686-release-gnu.exe
  • Compiler: GCC (MinGW-w64 toolchain)
  • Entry Point: 0x401410 - mainCRTStartup
  • Architecture: 32-bit x86 (i686)
  • Build Type: release

0. Execution Summary

The entry point function at 0x401410 is _mainCRTStartup, which sets ___mingw_app_type to 0 and then tail-calls ___tmainCRTStartup(), delegating further initialisation to the C runtime startup routine.

There is no std::rt::lang_start in release build, only std::rt::lang_start_internal (0x427850) to initialise Rust std library.

1. Entry point

As previously discussed, _WinMainCRTStartup can be ignored.

mainCRTStartup (0x401410) - PE Entry Point
  ↓
  • Initialise flag: dword[0x4e222c] = 0
  • Jump to __tmainCRTStartup
  ↓
__tmainCRTStartup (0x401010) - Main CRT Startup
  ↓
  [Complete CRT initialisation - all 7 phases]
  ↓
_main (0x4017f0) - C Main Wrapper (Setup Rust Runtime Entry)
  ↓
std::rt::lang_start_internal (0x427850) - Initialise Rust Standard Library  
  ↓
basic_pl_concepts::main::h524223c2eb0d038e (0x4015d0) ★ YOUR OPTIMISED RUST CODE (RELEASE BUILD) ★
  ↓
Return to std::rt::lang_start_internal
  ↓
Return to _main
  ↓
Return to __tmainCRTStartup
  ↓
__tmainCRTStartup cleanup:
  • _cexit() - Run exit handlers
  • Cleanup resources  
  • exit(exit_code) - Terminate process
  ↓
Process Terminates

  • Entry Point: mainCRTStartup (0x401410)
  • Purpose: The official entry point defined in the PE header

mainCRTStartup initialises a global variable at 0x4e222c to 0, then immediately jumps to __tmainCRTStartup at 0x401010

2. Compiler Optimisations Applied

  • No Person structs created
  • No String allocations
  • Loop completely unrolled
  • All values computed at compile time
  • Enum match resolved statically

3. main

The code in basic_pl_concepts::main::h524223c2eb0d038e hardcodes the print calls for i = 6..9 and corresponding ages, reusing the format string at 0x4aa0ac, and calls std::io::stdio::_print() for each; the final print uses a different string at 0x4aa05c and 0x4aa080, followed by standard function epilogue and return.

4. How to Recognise Patterns in Optimised Binaries

Carol’s favourite song

Iteration 1

hex_values = [0x21, 0x22, 0x23, 0x24]
for h in hex_values:
    print(f"0x{h:02x} = {h}")

Output:

0x21 = 33
0x22 = 34
0x23 = 35
0x24 = 36

Repeating Number Pattern

00401619  c7 44 24 18 06 00 00 00   mov dword [esp+0x18], 0x6
00401629  c7 04 24 21 00 00 00      mov dword [esp], 0x21      ; 33 decimal

00401691  c7 44 24 18 07 00 00 00   mov dword [esp+0x18], 0x7
004016a1  c7 04 24 22 00 00 00      mov dword [esp], 0x22      ; 34 decimal

004016fb  c7 44 24 18 08 00 00 00   mov dword [esp+0x18], 0x8
0040170b  c7 04 24 23 00 00 00      mov dword [esp], 0x23      ; 35 decimal

00401757  c7 44 24 18 09 00 00 00   mov dword [esp+0x18], 0x9
00401767  c7 04 24 24 00 00 00      mov dword [esp], 0x24      ; 36 decimal

Pattern Recognition:

  • Numbers increment by 1: 6, 7, 8, 9
  • Paired with: 0x21, 0x22, 0x23, 0x24 (33, 34, 35, 36)
  • Deduction: This is a loop! for i in 6..10
  • Relationship: 33 = 27 + 6 → Someone is 27 years old, calculating future age
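
The relationship can be checked mechanically rather than by eye. A minimal sketch, using the four constant pairs from the listing above; `constant_offset` is a hypothetical helper name.

```python
# (loop counter, paired constant) taken from the four mov pairs above
pairs = [(0x6, 0x21), (0x7, 0x22), (0x8, 0x23), (0x9, 0x24)]


def constant_offset(pairs):
    """Return k if every pair satisfies second == first + k, else None."""
    offsets = {second - first for first, second in pairs}
    return offsets.pop() if len(offsets) == 1 else None


base_age = constant_offset(pairs)
print(base_age)
```

A single shared offset of 27 across all four pairs supports the "age + years" reading; a result of None would mean the constants are not related by simple addition.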

Format String Analysis

Address: 0x4aa090
String: "In  years, Alice will be "
         ↑↑
Notice the TWO spaces! This is for formatting a number.

Pattern: “In {} years, Alice will be {}”

  • First {} → loop variable (6, 7, 8, 9)
  • Second {} → calculated age (33, 34, 35, 36)

The string at 0x4aa090 is “In  years, Alice will be ”. The doubled space after “In” marks where the first formatted number is inserted, and the second number follows the trailing space; this matches the pattern “In {} years, Alice will be {}”, where the first placeholder is the loop variable (i) and the second is the calculated age.

| Address | Instruction | Value (Hex) | Value (Dec) |
|---|---|---|---|
| 0x401619 | mov dword [esp+0x18], 0x6 | 0x6 | 6 |
| 0x401629 | mov dword [esp], 0x21 | 0x21 | 33 |
| 0x401691 | mov dword [esp+0x18], 0x7 | 0x7 | 7 |
| 0x4016a1 | mov dword [esp], 0x22 | 0x22 | 34 |
| 0x4016fb | mov dword [esp+0x18], 0x8 | 0x8 | 8 |
| 0x40170b | mov dword [esp], 0x23 | 0x23 | 35 |
| 0x401757 | mov dword [esp+0x18], 0x9 | 0x9 | 9 |
| 0x401767 | mov dword [esp], 0x24 | 0x24 | 36 |

In Binary Ninja Python console or external script

Organise Data into Pairs

Notice the pattern that values always come in pairs before each call _print:

Iteration 1:  [esp+0x18] = 6,  [esp] = 33
Iteration 2:  [esp+0x18] = 7,  [esp] = 34
Iteration 3:  [esp+0x18] = 8,  [esp] = 35
Iteration 4:  [esp+0x18] = 9,  [esp] = 36
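
The pairing can also be done with a short script over copied disassembly text. This is a sketch that works on plain text and deliberately does not use the Binary Ninja API; the listing format (address, two spaces, instruction) is an assumption about how the lines were pasted, and `pair_stores` is a hypothetical helper.

```python
import re

# Disassembly lines copied as text (addresses and operands from the table above).
LISTING = """\
00401619  mov dword [esp+0x18], 0x6
00401629  mov dword [esp], 0x21
00401691  mov dword [esp+0x18], 0x7
004016a1  mov dword [esp], 0x22
004016fb  mov dword [esp+0x18], 0x8
0040170b  mov dword [esp], 0x23
00401757  mov dword [esp+0x18], 0x9
00401767  mov dword [esp], 0x24
"""

STORE = re.compile(
    r"^([0-9a-f]{8})\s+mov dword \[(esp(?:\+0x[0-9a-f]+)?)\], (0x[0-9a-f]+)$"
)


def pair_stores(listing):
    """Group each [esp+0x18] store with the [esp] store that follows it."""
    pairs, pending = [], None
    for line in listing.splitlines():
        m = STORE.match(line.strip())
        if not m:
            continue
        slot, value = m.group(2), int(m.group(3), 16)
        if slot == "esp+0x18":
            pending = value
        elif slot == "esp" and pending is not None:
            pairs.append((pending, value))
            pending = None
    return pairs


for years, age in pair_stores(LISTING):
    print(f"[esp+0x18] = {years}, [esp] = {age}")
```

Working from text keeps the script usable outside the disassembler; inside a tool's console the same grouping logic would apply to whatever instruction objects it exposes.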

The values for each print are set up in the function basic_pl_concepts::main::h524223c2eb0d038e at 0x4015d0, specifically at the following HLIL code addresses:

  • For i=6, age=33: values are set up around 0x401619 (loop var) and 0x401629 (age), followed by the print call at 0x401655
  • For i=7, age=34: values are set up around 0x401691 (loop var) and 0x4016a1 (age), followed by the print call at 0x4016bd
  • For i=8, age=35: values are set up around 0x4016fb (loop var) and 0x40170b (age), followed by the print call at 0x401727
  • For i=9, age=36: values are set up around 0x401757 (loop var) and 0x401767 (age), followed by the print call at 0x40178f

What we recognise here

Each pair is loaded just before the corresponding call to std::io::stdio::_print.

Identifying “Age” at 0x4016a1

1. Understand Rust’s println! Format

Rust's `println!` macro call:

```rust
println!("In {} years, Alice will be {}", years, age)
//       └────────── string ───────────┘  └─1─┘  └2┘
```

This becomes:

```
Format string: "In {} years, Alice will be {}"
Arguments:     [years, age]
                └1st┘  └2nd┘
```

2. Find the Format String Structure

![](/The-State-of-Rust-in-Malware-Programming/images/01-basic-pl-concepts/84-format-string-structure.png)

At 0x4016a1 (mov [esp], 0x22), the value 34 (age) is placed as the second argument for the format string, while at 0x401691 (mov [esp+0x18], 0x7), the value 7 (years) is set as the first argument; these match the Rust println! macro’s argument order for the format string “In {} years, Alice will be {}”.

Let’s look at the disassembly around 0x4016a8:

; Second iteration (i=7, age=34)
00401671  c7 44 24 24 ac a0 4a 00   mov [esp+0x24], 0x4aa0ac  ; Format descriptor
00401679  c7 44 24 28 03 00 00 00   mov [esp+0x28], 0x3       ; 3 string fragments
00401681  c7 44 24 34 00 00 00 00   mov [esp+0x34], 0x0       
00401689  c7 44 24 1c 00 00 00 00   mov [esp+0x1c], 0x0       
00401691  c7 44 24 18 07 00 00 00   mov [esp+0x18], 0x7       ; ← FIRST value (7)
00401699  c7 44 24 04 00 00 00 00   mov [esp+0x4], 0x0        
004016a1  c7 04 24 22 00 00 00      mov [esp], 0x22           ; ← SECOND value (34)
                                                                ;   0x22 = 34 decimal
004016a8  c7 44 24 14 20 1f 49 00   mov [esp+0x14], 0x491f20  ; fmt function ptr
004016b0  89 44 24 2c                mov [esp+0x2c], eax       
004016b4  c7 44 24 30 02 00 00 00   mov [esp+0x30], 0x2       ; 2 arguments
004016bc  56                         push esi
004016bd  e8 fe 12 03 00             call _print               ; Call print!

3. Decode the Format String Table

At 0x4aa0ac, the format descriptor is a table of pointers and lengths that define the string fragments for formatting. Each entry pairs a pointer to a string segment with its length, allowing the print function to reconstruct the full output with the arguments inserted between the fragments.

The format descriptor at 0x4aa0ac:

Offset | Value      | Meaning
-------|------------|------------------------------------------
+0x00  | 0x4aa090   | → Pointer to "In "
+0x04  | 0x00000003 | → Length of "In " = 3 bytes
+0x08  | 0x4aa093   | → Pointer to " years, Alice will be "
+0x0c  | 0x00000016 | → Length = 22 bytes (0x16)
+0x10  | 0x4aa07e   | → Pointer to the final fragment
+0x14  | 0x00000001 | → Length = 1 byte (most likely the "\n" that println! appends)

This creates the template (the 1-byte final fragment follows the second argument):

"In {} years, Alice will be {}"
 └1┘  └──────────2─────────┘
    ↑                       ↑
  arg[0]                  arg[1]
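
Decoding such a table is mechanical: read (pointer, length) pairs, slice the fragments, then interleave them with the arguments. The sketch below does this on a small synthetic memory image rather than on the real binary. The addresses and the lengths 3, 0x16 and 1 come from the table above; the content of the 1-byte fragment ("\n") is an assumption, and `build_image`, `read_pieces` and `render` are hypothetical helpers.

```python
import struct

BASE = 0x4AA07E  # lowest address used in this synthetic image


def build_image() -> bytes:
    """Recreate a tiny synthetic slice of read-only data with the layout above."""
    image = bytearray(0x4AA0AC + 24 - BASE)

    def put(addr, data):
        image[addr - BASE:addr - BASE + len(data)] = data

    put(0x4AA07E, b"\n")                         # ASSUMPTION: 1-byte final piece
    put(0x4AA090, b"In  years, Alice will be ")  # pieces 1 and 2, back to back
    put(0x4AA0AC, struct.pack("<6I", 0x4AA090, 3, 0x4AA093, 0x16, 0x4AA07E, 1))
    return bytes(image)


def read_pieces(image, table_addr, count):
    """Read `count` little-endian (pointer, length) pairs and slice each piece."""
    pieces = []
    for i in range(count):
        ptr, length = struct.unpack_from("<II", image, table_addr - BASE + 8 * i)
        pieces.append(image[ptr - BASE:ptr - BASE + length].decode("utf-8"))
    return pieces


def render(pieces, args):
    """Interleave pieces with arguments: piece0, arg0, piece1, arg1, piece2, ..."""
    out = []
    for i, piece in enumerate(pieces):
        out.append(piece)
        if i < len(args):
            out.append(str(args[i]))
    return "".join(out)


pieces = read_pieces(build_image(), 0x4AA0AC, 3)
print(repr(render(pieces, [7, 34])))
```

Reading the length fields literally is what shows that the second fragment spans the full " years, Alice will be " text rather than stopping after "Alice".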

4. Map Stack Positions to Arguments

Looking at the stack layout before _print:

Stack Layout Analysis:
┌──────────────┬─────────────────────────────────────┐
│ [esp+0x24]   │ 0x4aa0ac (format string descriptor) │
│ [esp+0x28]   │ 0x3 (number of string pieces)       │
│ [esp+0x30]   │ 0x2 (number of arguments)           │
├──────────────┼─────────────────────────────────────┤
│ [esp+0x18]   │ 0x7 (First argument: YEARS)         │ ← arg[0]
│ [esp]        │ 0x22 = 34 (Second argument: AGE)    │ ← arg[1]
├──────────────┼─────────────────────────────────────┤
│ [esp+0x8]    │ Pointer to arg[0]                   │
│ [esp+0x10]   │ Pointer to arg[1]                   │
│ [esp+0xc]    │ 0x491f20 (Display::fmt for i64)     │
│ [esp+0x14]   │ 0x491f20 (Display::fmt for i64)     │
└──────────────┴─────────────────────────────────────┘

The order matters!

  • First {} in format string → Takes argument at [esp+0x18] = 7 (years)
  • Second {} in format string → Takes argument at [esp] = 34 (age)

5. Reconstructed Rust code

// Deduced from the binary:
fn main() {
    let alice_age = 27;  // Computed from 33 - 6 = 27
    
    // Loop unrolled to only i=6,7,8,9 in binary
    // Original probably: for i in 0..10 { if i > 5 { ... } }
    for years in 0..10 {
        if years > 5 {
            println!("In {} years, Alice will be {}", 
                     years, 
                     alice_age + years);
        }
    }
    
    // Carol's favorite song
    let carol_favorite = "Yesterday";  // Hardcoded in binary
    println!("Carol's favorite song is {}", carol_favorite);
}

6. Pattern Recognition:

  • [esp+0x18] increments: 6 → 7 → 8 → 9 (loop counter)
  • [esp] increments: 33 → 34 → 35 → 36 (calculated value)
  • Relationship: [esp] = [esp+0x18] + 27

7. Semantic Deduction:

  • Loop counter = “years in the future”
  • Calculated value = “future age”

🎓 Key Principles learnt

  1. Look for repetition → Suggests loops
  2. Extract all constants → Build data set
  3. Test arithmetic operations → Addition, subtraction, multiplication
  4. Verify consistency → Same formula across all data points
  5. Context from strings → “years” + “age” = time calculation

8. Runtime Argument Order Convention

Rust follows this calling convention for println!:

std::io::stdio::_print(&Arguments {
    pieces: &["In ", " years, Alice will be ", "\n"],  // 3 pieces (0x3 at [esp+0x28]); the "\n" is inferred
    args: &[
        Argument { value: &years, formatter: Display::fmt },  // ← arg[0]
        Argument { value: &age,   formatter: Display::fmt },  // ← arg[1]
    ]
})

Stack layout mirrors this:

Arguments Array:
  [0] → years (at [esp+0x18])
  [1] → age   (at [esp])

9. Trace Function Pointer Usage

The instruction at 0x4016a8 stores a function pointer (0x491f20) used to format the integer. It points to the Display::fmt implementation for i64, which confirms that the argument is an integer being formatted for display.

004016a8  c7 44 24 14 20 1f 49 00   mov [esp+0x14], 0x491f20

10. Summary Flowchart

Key Differences: Debug vs Release Build

Debug Build (0x4016d0):

  • Creates actual Person structs on stack
  • Allocates Strings on heap
  • Full loop with iterator
  • All conditional logic present
  • 352 bytes stack frame
  • Readable variable names
  • Pattern matching logic intact

Release Build (0x4015d0):

  • No structs created: they are completely eliminated
  • No heap allocations: everything lives on the stack
  • Loop unrolled: only 4 iterations (i=6,7,8,9)
  • Values precomputed: ages calculated at compile time
  • 64-byte stack frame: an 82% reduction from 352 bytes
  • Constant folding: “Yesterday” is hardcoded
  • Dead code eliminated: the i=0..5 iterations are removed

Finding!

This is a perfect example of Rust’s zero-cost abstractions!


Rust Runtime Initialisation

Why Rust Needs MORE

Rust has additional runtime requirements beyond C:

_start
└── __scrt_common_main_seh() [C Runtime]
    └── main() [C-compatible entry]
        └── sub_140002520() [Rust Runtime - std::rt::lang_start]
            ├── Initialise panic handler
            ├── Initialise allocator
            ├── Setup thread locals
            ├── Initialise backtrace support
            └── sub_140001270() [RUST CODE]

What Rust Initialises That C Doesn’t

| Feature | C | Rust |
|---|---|---|
| Stack canaries | ✅ (via CRT) | ✅ (via CRT) |
| Global constructors | ✅ (via _initterm) | ✅ (via _initterm) |
| Heap allocator | ✅ (malloc ready) | ✅ (custom allocator setup) |
| Panic handler | | ✅ (Rust-specific) |
| Unwinding support | ❌ (longjmp/SEH only) | ✅ (Rust panic unwinding) |
| Thread-local storage | Minimal | ✅ (Rust’s TLS model) |
| Backtrace initialisation | | ✅ (for panic messages) |
| Command-line encoding | Basic | ✅ (UTF-8 validation/conversion) |

Key Differences Illustrated

C Program Entry

OS Loader
  ↓
_start (CRT)
  ↓
__scrt_common_main_seh (CRT initialisation)
  ↓
main() ← THE C CODE DIRECTLY
  ↓
exit (CRT cleanup)

Rust Program Entry

OS Loader
  ↓
_start (CRT)
  ↓
__scrt_common_main_seh (C runtime initialisation)
  ↓
main() [Trampoline wrapper]
  ↓
std::rt::lang_start (Rust runtime initialisation)
  ↓
std::rt::lang_start_internal
  ↓
RUST main() ← THE RUST CODE
  ↓
Rust cleanup + CRT cleanup

Comparison with C program initialisation - Stage 1

C programs have runtime initialisation too (_start → __scrt_common_main_seh); Rust adds a language runtime layer on top of it:

| Aspect | C | Rust |
|---|---|---|
| CRT initialisation | ✅ Yes | ✅ Yes (inherits C’s) |
| Language runtime | ❌ No extra layer | ✅ Yes (std::rt::lang_start) |
| Main function | Direct entry | Wrapped/indirect entry |
| Complexity | Lower | Higher |

The wrapper function you see (main calling sub_140002520) is Rust-specific - it’s the Rust standard library’s runtime initialization that C doesn’t need.

A pure C program would have the code directly in main without this extra indirection.

C vs Rust Runtime Initialisation - Stage 2

| Feature | C | Rust | Key References |
|---|---|---|---|
| Stack canaries | ✅ (via CRT /GS) | ✅ (inherits CRT) | MS Learn /GS |
| Global constructors | ✅ (via _initterm) | ✅ (via _initterm) | MS Learn _initterm |
| Heap allocator | ✅ (malloc ready) | ✅ (custom setup) | Rust RFC 1974 |
| Panic handler | | ✅ (Rust-specific) | Rust Book Ch9 |
| Unwinding support | ❌ (longjmp/SEH) | ✅ (panic unwinding) | Rust std::rt |
| Thread-local storage | Minimal | ✅ (Rust TLS model) | Rust Reference |
| Backtrace initialisation | | ✅ (panic messages) | SO Backtrace |
| Command-line encoding | Basic | ✅ (UTF-8 validation) | Rust rt.rs |

Comparison Table: C VS C++ VS Rust

| Language | Layers | What Initialises |
|---|---|---|
| C | 2 layers | OS → CRT → main() |
| C++ | 2 layers | OS → CRT (+ constructors) → main() |
| Rust | 3 layers | OS → CRT → Rust runtime → main() |

Comparison: x86 vs x86-64

Key Differences

| Features | x86 (32-bit) | x86-64 (64-bit) |
|---|---|---|
| Address Size | 0x00416d6a | 0x140017e50 |
| Integer Types | int32_t | int64_t |
| Binary size | 127 KB | 148 KB |
| Calling convention | cdecl/stdcall | __fastcall (args in registers) |
| Registers | EAX, EBP, ESP | RAX, R10, GS |
| Entry Point | crt_startup() | __scrt_common_main_seh() |

Assembly Differences

x86 (32-bit):

push ebp
mov ebp, esp
sub esp, 0x20
; Use 32-bit registers

x86-64 (64-bit):

push rbp
mov rbp, rsp
sub rsp, 0x40
; Use 64-bit registers, more parameter passing in registers

Optimisation Impact Analysis

Code Size Comparison (x86 32-bit)

| Build Type | Size | Notes |
|---|---|---|
| Default release | 116 KB | opt-level=3 (default) |
| Explicit O3 | 103 KB | No change from default |
| Aggressive | 103 KB | LTO, strip, panic=abort |

Optimisation Effects

LTO (Link-Time Optimisation):

  • Cross-crate inlining
  • Better dead code elimination
  • ~5-10% size reduction

Strip:

  • Removes debug symbols
  • Smaller binary
  • Harder to reverse engineer

Panic = “abort”:

  • Simpler panic handler
  • No unwinding code
  • Smaller binary

Codegen-units = 1:

  • Better optimisation opportunities
  • Longer compile time
  • Slightly smaller/faster code

Learning Exercises

Common Patterns

Enum Discrimination

; Loading enum discriminant
mov eax, [rbp-8]       ; Load enum value
cmp eax, 0             ; Compare with variant 0
je .variant_john       ; Jump if John
cmp eax, 1             ; Compare with variant 1
je .variant_paul       ; Jump if Paul
; ... etc

String Construction

; String::from() call
lea rdi, [rip + str_data]  ; String data pointer
mov rsi, str_len           ; String length
call _ZN3std6string6String4from

Panic Handler

; Panic location structure
lea rdi, [rip + .Lpanic_loc]
lea rsi, [rip + .Lpanic_msg]
call _ZN4core9panicking9panic_fmt
