Shellcode

Shellcode 是一段设计用于利用计算机系统漏洞的机器代码。它通常是一系列二进制指令，以字节形式表示，并且通常编写为目标特定的机器码，以在目标系统上执行特定的操作。

实验内容

Overview
Shellcode is widely used in many attacks that involve code injection. Writing shellcode is quite challenging.
Although we can easily find existing shellcode from the Internet, there are situations where we have to write
a shellcode that satisfies certain specific requirements. Moreover, to be able to write our own shellcode from
scratch is always exciting. There are several interesting techniques involved in shellcode. The purpose of
this lab is to help students understand these techniques so they can write their own shellcode.

There are several challenges in writing shellcode, one is to ensure that there is no zero in the binary, and
the other is to find out the address of the data used in the command. The first challenge is not very difficult
to solve, and there are several ways to solve it. The solutions to the second challenge led to two typical
approaches to write shellcode. In one approach, data are pushed into the stack during the execution, so their
addresses can be obtained from the stack pointer. In the second approach, data are stored in the code region,
right after a call instruction. When the call instruction is executed, the address of the data is treated as
the return address, and is pushed into the stack. Both solutions are quite elegant, and we hope students can
learn these two techniques. This lab covers the following topics:
• Shellcode
• Assembly code
• Disassembling

本实验给出了一些简单的利用shellcode完成攻击的场景，并介绍了成功编写Shellcode的挑战的注意事项：如何保证二进制中不能出现0x00、如何找到指令中所用到的数据的地址。

实验资料网址：https://seedsecuritylabs.org/Labs_20.04/Software/Shellcode/

实验中各任务的要求参照实验指导PDF。

Task1. Writing shellcode

Task1.a The Entire Process

①使用nasm命令将示例的汇编代码编译成ELF二进制格式。

②使用ld命令将mysh.o编译成可执行二进制文件，-m elf_i386选项意味着生成32位的ELF二进制文件，这个步骤之后，会获得最终的可执行二进制代码文件mysh，运行它，会生成一个shell。

③运行mysh并检测效果。

两次打印的shell的PID不相同，证明mysh确实打开了另外一个shell，功能正常运行。

④在攻击中，起作用的是机器码，而不只是标准的可执行文件。所以我们需要从可执行文件中提取出需要的机器码。

可以使用objdump命令对mysh.o进行反汇编，–Mintel选项是指用Intel的语法模式（另外一种模式是AT&T）。

左边部分是机器码，右边是汇编。

我们再使用xxd命令打印整个二进制程序mysh.o的机器码，在其中找到我们需要的段。

-p 表示使用’plain’输出模式，将文本内容以纯十六机制形式显示，而不包含行号或ASCII文本。

-c 20 指定每行显示20个16进制字符的数量。

这一部分就是我们需要的机器码。

⑤使用实验给出的convert.py程序，将真正的shellcode转换并储存在python的数组中，以便实施攻击。

先将shellcode粘贴到convert.py文件中：

运行convert.py，生成存有shellcode的python数组：

Task1.b Eliminating Zeros(0x00) from the code

Shellcode 广泛用于缓冲区溢出攻击。但很多情况下，攻击会因为字符串复制而失败，例如strcpy()函数。对于这些字符串复制函数，零被视为字符串的结尾。因此，如果我们在 shellcode 中间有一个零，字符串复制将无法将零之后的任何内容从该 shellcode 复制到目标缓冲区，因此攻击将无法成功。尽管并非所有Shellcode都存在零问题，但 shellcode 要求机器码中不能有任何零；否则，shellcode的应用将会受到限制。

在mysh.s中有4个不同的地方都用到了零，找出mysh.s中所有用到零的地方，并解释代码是如何在不出现零的条件下使用零。

其中1、2、3都是通过xor操作将寄存器的值置为0，而不是直接使用mov指令，如果使用mov指令，那么机器码必然包含0x00。

4的作用是将0x0b赋值给al(eax的低8位)，这样eax的值就为0x0000000b，如果用mov eax,0x0b，那么实际的操作数其实是0x0000000b，在内存中有三个0x00。

还有一种避免出现0x00的方法就是“位移”。

如果要将0x007A7978赋值给ebx，如果直接使用mov指令，那么机器码中必然会出现0x00。但是可以先用一个字节的占位符#，即把0x237A7978值赋值给ebx，然后再让ebx左移8位然后又右移8位，这样最终ebx的值会从0x237A7978变成0x007A7978，但是整个指令的机器码中不会含有0x00。

本次任务的要求为：将执行/bin/sh改为执行bin/bash，但是不能通过加多余斜杠的方式来使得补齐push的四字节。

bin/bash一共九个字节，而push是以四个字节为单位进栈的。如果直接push ‘h’会导致最终的机器码中存在0x00。

那么我们可以采用位移的方式解决这个问题。

使用三个占位符#，将’h###’赋值给ecx，然后对让ecx先左移24位然后又右移24位，再push ecx即可。