angr入门

韩乔落

Angr简介

本文主要通过angr_ctf 入门 angr。

符号执行

符号执行 (Symbolic Execution)是一种程序分析技术,它可以通过分析程序来得到让特定代码区域执行的输入。顾名思义,使用符号执行分析一个程序时,该程序会使用符号值作为输入,而非一般执行程序时使用的具体值。在达到目标代码时,分析器可以得到相应的路径约束,然后通过约束求解器来得到可以触发目标代码的具体值。

Angr

Angr是加州大学圣芭芭拉分校基于Python设计的工具,它结合了静态分析技术与动态分析技术是当前符号化执行领域较为先进的工具,其挖掘漏洞效果好,在许多竞赛中表现卓越。Angr总的来说是一个多架构的二进制分析平台,具备对二进制文件的动态符号执行能力和多种静态分析能力。在逆向中,一般使用的其动态符号执行解出Flag,但其实Angr还在诸多领域存在应用,比如对程序脆弱性的分析中。

00_angr_find

IDA 静态分析:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
int __cdecl main(int argc, const char **argv, const char **envp)
{
int i; // [esp+1Ch] [ebp-1Ch]
char s1[9]; // [esp+23h] [ebp-15h] BYREF
unsigned int v6; // [esp+2Ch] [ebp-Ch]

v6 = __readgsdword(0x14u);
printf("Enter the password: ");
__isoc99_scanf("%8s", s1);
for ( i = 0; i <= 7; ++i )
s1[i] = complex_function(s1[i], i);
if ( !strcmp(s1, "JACEJGCS") )
puts("Good Job.");
else
puts("Try again.");
return 0;
}
// 处理函数
int __cdecl complex_function(int a1, int a2)
{
if ( a1 <= 64 || a1 > 90 )
{
puts("Try again.");
exit(1);
}
return (3 * a2 + a1 - 'A') % 26 + 'A';
}

这题基础题主要是熟悉一下Angr的基本使用步骤,一般来说使用Angr的步骤可以分为:

  • 创建 project
  • 设置 state
  • 新建符号量 : BVS (bitvector symbolic ) 或 BVV (bitvector value)
  • 把符号量设置到内存或者其他地方
  • 设置 Simulation Managers , 进行路径探索的对象
  • 运行,探索满足路径需要的值
  • 约束求解,获取执行结果

scaffold00.py

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
import angr
import sys

def main(argv):
# Create an Angr project.
# If you want to be able to point to the binary from the command line, you can
# use argv[1] as the parameter. Then, you can run the script from the command
# line as follows:
# python ./scaffold00.py [binary]
# (!)
path_to_binary = "./00_angr_find" # :string
project = angr.Project(path_to_binary)

# Tell Angr where to start executing (should it start from the main()
# function or somewhere else?) For now, use the entry_state function
# to instruct Angr to start from the main() function.
initial_state = project.factory.entry_state()

# Create a simulation manager initialized with the starting state. It provides
# a number of useful tools to search and execute the binary.
simulation = project.factory.simgr(initial_state)

# Explore the binary to attempt to find the address that prints "Good Job."
# You will have to find the address you want to find and insert it here.
# This function will keep executing until it either finds a solution or it
# has explored every possible path through the executable.
# (!)
print_good_address = 0x8048678 # :integer (probably in hexadecimal)
simulation.explore(find=print_good_address)

# Check that we have found a solution. The simulation.explore() method will
# set simulation.found to a list of the states that it could find that reach
# the instruction we asked it to search for. Remember, in Python, if a list
# is empty, it will be evaluated as false, otherwise true.
if simulation.found:
# The explore method stops after it finds a single state that arrives at the
# target address.
solution_state = simulation.found[0]

# Print the string that Angr wrote to stdin to follow solution_state. This
# is our solution.
print(solution_state.posix.dumps(sys.stdin.fileno()))
else:
# If Angr could not find a path that reaches print_good_address, throw an
# error. Perhaps you mistyped the print_good_address?
raise Exception('Could not find the solution')

if __name__ == '__main__':
main(sys.argv)
1
2
3
4
5
6
7
8
9
10
> python3 scaffold00.py
WARNING | 2024-07-15 15:50:18,672 | angr.storage.memory_mixins.default_filler_mixin | The program is accessing register with an unspecified value. This could indicate unwanted behavior.
WARNING | 2024-07-15 15:50:18,672 | angr.storage.memory_mixins.default_filler_mixin | angr will cope with this by generating an unconstrained symbolic variable and continuing. You can resolve this by:
WARNING | 2024-07-15 15:50:18,672 | angr.storage.memory_mixins.default_filler_mixin | 1) setting a value to the initial state
WARNING | 2024-07-15 15:50:18,672 | angr.storage.memory_mixins.default_filler_mixin | 2) adding the state option ZERO_FILL_UNCONSTRAINED_{MEMORY,REGISTERS}, to make unknown regions hold null
WARNING | 2024-07-15 15:50:18,673 | angr.storage.memory_mixins.default_filler_mixin | 3) adding the state option SYMBOL_FILL_UNCONSTRAINED_{MEMORY,REGISTERS}, to suppress these messages.
WARNING | 2024-07-15 15:50:18,674 | angr.storage.memory_mixins.default_filler_mixin | Filling register edi with 4 unconstrained bytes referenced from 0x80486b1 (__libc_csu_init+0x1 in 00_angr_find (0x80486b1))
WARNING | 2024-07-15 15:50:18,678 | angr.storage.memory_mixins.default_filler_mixin | Filling register ebx with 4 unconstrained bytes referenced from 0x80486b3 (__libc_csu_init+0x3 in 00_angr_find (0x80486b3))
WARNING | 2024-07-15 15:50:23,213 | angr.storage.memory_mixins.default_filler_mixin | Filling memory at 0x7ffeff60 with 4 unconstrained bytes referenced from 0x819fb90 (strcmp+0x0 in libc.so.6 (0x9fb90))
b'JXWVXRKX'

创建 project

1
2
3
# Create an Angr project.
path_to_binary = "./00_angr_find"
project = angr.Project(path_to_binary, auto_load_libs=False)

使用angr的首要步骤就是创建Project加载二进制文件。angr的二进制装载组件是CLE,它负责装载二进制对象(以及它依赖的任何库)和把这个对象以易于操作的方式交给angr的其他组件。angr将这些包含在Project类中。一个Project类是代表了你的二进制文件的实体。你与angr的大部分操作都会经过它。auto_load_libs 设置是否自动载入依赖的库。

  • 如果auto_load_libsTrue(默认值),真正的库函数会被执行。一些libc的函数分析起来过于复杂并且很有可能引起path对其的尝试执行过程中的state数量的爆炸增长。
  • 如果auto_load_libsFalse,且外部函数是无法找到的,并且Project会将它们引用到一个通用的叫做ReturnUnconstrainedSimProcedure上去,它就像它的名字所说的那样:它返回一个不受约束的值

设置 state

1
2
3
4
# Tell Angr where to start executing (should it start from the main()
# function or somewhere else?) For now, use the entry_state function
# to instruct Angr to start from the main() function.
initial_state = project.factory.entry_state()

state 代表程序的一个实例镜像,模拟执行某个时刻的状态,就类似于快照。保存运行状态的上下文信息,如内存/寄存器等,我们这里使用project.factory.entry_state()告诉符号执行引擎从程序的入口点开始符号执行,除了使用.entry_state() 创建 state 对象, 我们还可以根据需要使用其他构造函数创建 state。

设置 Simulation Managers

1
2
3
# Create a simulation manager initialized with the starting state. It provides
# a number of useful tools to search and execute the binary.
simulation = project.factory.simgr(initial_state)

Project 对象仅表示程序一开始的样子,而在执行时,我们实际上是对SimState对象进行操作,它代表程序的一个实例镜像,模拟执行某个时刻的状态。SimState 对象包含程序运行时信息,如内存/寄存器/文件系统数据等。SM(Simulation Managers)是angr中最重要的控制接口,它使你能够同时控制一组状态(state)的符号执行,应用搜索策略来探索程序的状态空间。

运行,探索满足路径需要的值

1
2
3
4
5
6
# Explore the binary to attempt to find the address that prints "Good Job."
# You will have to find the address you want to find and insert it here.
# This function will keep executing until it either finds a solution or it
# has explored every possible path through the executable.
print_good_address = 0x8048678
simulation.explore(find=print_good_address)

符号执行最普遍的操作是找到能够到达某个地址的状态,同时丢弃其他不能到达这个地址的状态。SM为使用这种执行模式提供了.explore()方法。当使用find参数启动.explore()方法时,程序将会一直执行,直到发现了一个和find参数指定的条件相匹配的状态。find参数的内容可以是想要执行到的某个地址、或者想要执行到的地址列表、或者一个获取state作为参数并判断这个state是否满足某些条件的函数。当activestash中的任意状态和find中的条件匹配的时候,它们就会被放到found stash中,执行随即停止。之后你可以探索找到的状态,或者决定丢弃它,转而探索其它状态。若能输出正确的字符串”Good Job”即代表我们的执行路径是正确的

1
2
3
4
.text:08048675                 sub     esp, 0Ch
.text:08048678 push offset aGoodJob ; "Good Job."
.text:0804867D call _puts
.text:08048682 add esp, 10h

获取执行结果

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
# Check that we have found a solution. The simulation.explore() method will
# set simulation.found to a list of the states that it could find that reach
# the instruction we asked it to search for. Remember, in Python, if a list
# is empty, it will be evaluated as false, otherwise true.
if simulation.found:
# The explore method stops after it finds a single state that arrives at the
# target address.
solution_state = simulation.found[0]
# Print the string that Angr wrote to stdin to follow solution_state. This
# is our solution.
print(solution_state.posix.dumps(sys.stdin.fileno()))
else:
# If Angr could not find a path that reaches print_good_address, throw an
# error. Perhaps you mistyped the print_good_address?
raise Exception('Could not find the solution')

此时相关的状态已经保存在了simgr当中,我们可以通过simgr.found来访问所有符合条件的分支,这里我们为了解题,就选择第一个符合条件的分支即可。

这里解释一下sys.stdin.fileno(),在UNIX中,按照惯例,三个文件描述符分别表示标准输入、标准输出和标准错误。

1
2
3
4
5
6
7
>>> import sys
>>> sys.stdin.fileno()
0
>>> sys.stdout.fileno()
1
>>> sys.stderr.fileno()
2

所以一般也可以写成:

1
solution = solution_state.posix.dumps(0)

01_angr_avoid

这题主要是引入了.explore()方法的另一个参数void,我们可以看看这个方法的原型:

1
def explore(self, stash='active', n=None, find=None, avoid=None, find_stash='found', avoid_stash='avoid', cfg=None,um_find=1, **kwargs):

只需要让执行流只进入maybe_good函数,而避免进入avoid_me函数即可,现在需要拿到这两个函数的地址,只要用r2分析即可。

exp

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
import angr
import sys

def main(argv):
path_to_binary = "./01_angr_avoid"
project = angr.Project(path_to_binary)
initial_state = project.factory.entry_state()
simulation = project.factory.simgr(initial_state)

# Explore the binary, but this time, instead of only looking for a state that
# reaches the print_good_address, also find a state that does not reach
# will_not_succeed_address. The binary is pretty large, to save you some time,
# everything you will need to look at is near the beginning of the address
# space.
# (!)
print_good_address = 0x080485E0
will_not_succeed_address = 0x080485A8
simulation.explore(find=print_good_address, avoid=will_not_succeed_address)

if simulation.found:
solution_state = simulation.found[0]
print(solution_state.posix.dumps(sys.stdin.fileno()))
else:
raise Exception('Could not find the solution')

if __name__ == '__main__':
main(sys.argv)

02_angr_find_condition

题目本意是教会我们如何根据程序本身的输出来告诉angr应避免或保留的内容。因为有时候打开二进制文件将看到有很多打印“ Good Job”的块,或“Try Again”的块。每次都记录下这些块的所有起始地址是一个麻烦的的问题,这时候我们可以直接根据打印到stdout的内容告诉angr保留或丢弃状态。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
int __cdecl main(int argc, const char **argv, const char **envp)
{
int i; // [esp+18h] [ebp-40h]
int j; // [esp+1Ch] [ebp-3Ch]
char s1[20]; // [esp+24h] [ebp-34h] BYREF
char s2[20]; // [esp+38h] [ebp-20h] BYREF
unsigned int v8; // [esp+4Ch] [ebp-Ch]

v8 = __readgsdword(0x14u);
for ( i = 0; i <= 19; ++i )
s2[i] = 0;
qmemcpy(s2, "VXRRJEUR", 8);
printf("Enter the password: ");
__isoc99_scanf("%8s", s1);
for ( j = 0; j <= 7; ++j )
s1[j] = complex_function(s1[j], j + 8);
if ( !strcmp(s1, s2) )
puts("Good Job.");
else
puts("Try again.");
return 0;
}

exp

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
# It is very useful to be able to search for a state that reaches a certain
# instruction. However, in some cases, you may not know the address of the
# specific instruction you want to reach (or perhaps there is no single
# instruction goal.) In this challenge, you don't know which instruction
# grants you success. Instead, you just know that you want to find a state where
# the binary prints "Good Job."
#
# Angr is powerful in that it allows you to search for a states that meets an
# arbitrary condition that you specify in Python, using a predicate you define
# as a function that takes a state and returns True if you have found what you
# are looking for, and False otherwise.

import angr
import sys

def main(argv):
path_to_binary = "./02_angr_find_condition"
project = angr.Project(path_to_binary)
initial_state = project.factory.entry_state()
simulation = project.factory.simgr(initial_state)

# Define a function that checks if you have found the state you are looking
# for.
def is_successful(state):
# Dump whatever has been printed out by the binary so far into a string.
stdout_output = state.posix.dumps(sys.stdout.fileno())

# Return whether 'Good Job.' has been printed yet.
# (!)
if b'Good Job.' in stdout_output:
return True
else:
return False

# Same as above, but this time check if the state should abort. If you return
# False, Angr will continue to step the state. In this specific challenge, the
# only time at which you will know you should abort is when the program prints
# "Try again."
def should_abort(state):
stdout_output = state.posix.dumps(sys.stdout.fileno())
if b'Try again.' in stdout_output:
return True
else:
return False

# Tell Angr to explore the binary and find any state that is_successful identfies
# as a successful state by returning True.
simulation.explore(find=is_successful, avoid=should_abort)

if simulation.found:
solution_state = simulation.found[0]
print(solution_state.posix.dumps(sys.stdin.fileno()))
else:
raise Exception('Could not find the solution')

if __name__ == '__main__':
main(sys.argv)

03_angr_simbolic_registers

这题主要是因为angr在处理复杂格式的字符串scanf()输入的时候不是很好,我们可以直接将符号注入寄存器,也就是主要学会符号化寄存器。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
int __cdecl main(int argc, const char **argv, const char **envp)
{
__int64 user_input; // rax
int v5; // [esp+4h] [ebp-14h]
int v6; // [esp+8h] [ebp-10h]
int v7; // [esp+Ch] [ebp-Ch]
int v8; // [esp+Ch] [ebp-Ch]

printf("Enter the password: ");
user_input = get_user_input();
v7 = HIDWORD(user_input);
v5 = complex_function_1(user_input);
v6 = complex_function_2();
v8 = complex_function_3(v7);
if ( v5 || v6 || v8 )
puts("Try again.");
else
puts("Good Job.");
return 0;
}
int get_user_input()
{
int v1; // [esp+0h] [ebp-18h] BYREF
int v2; // [esp+4h] [ebp-14h] BYREF
int v3[4]; // [esp+8h] [ebp-10h] BYREF

v3[1] = __readgsdword(0x14u);
__isoc99_scanf("%x %x %x", &v1, &v2, v3);
return v1;
}
.text:0804891E lea ecx, [ebp+var_10]
.text:08048921 push ecx
.text:08048922 lea ecx, [ebp+var_14]
.text:08048925 push ecx
.text:08048926 lea ecx, [ebp+var_18]
.text:08048929 push ecx
.text:0804892A push offset aXXX ; "%x %x %x"
.text:0804892F call ___isoc99_scanf
.text:08048934 add esp, 10h
.text:08048937 mov ecx, [ebp+var_18]
.text:0804893A mov eax, ecx ;
.text:0804893C mov ecx, [ebp+var_14]
.text:0804893F mov ebx, ecx ;
.text:08048941 mov ecx, [ebp+var_10]
.text:08048944 mov edx, ecx ;

exp

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
# Angr doesn't currently support reading multiple things with scanf (Ex: 
# scanf("%u %u).) You will have to tell the simulation engine to begin the
# program after scanf is called and manually inject the symbols into registers.

import angr
import claripy
import sys

def main(argv):
path_to_binary = "./03_angr_symbolic_registers"
project = angr.Project(path_to_binary)

# Sometimes, you want to specify where the program should start. The variable
# start_address will specify where the symbolic execution engine should begin.
# Note that we are using blank_state, not entry_state.
# (!)
start_address = 0x08048980 # :integer (probably hexadecimal)
initial_state = project.factory.blank_state(addr=start_address)

# Create a symbolic bitvector (the datatype Angr uses to inject symbolic
# values into the binary.) The first parameter is just a name Angr uses
# to reference it.
# You will have to construct multiple bitvectors. Copy the two lines below
# and change the variable names. To figure out how many (and of what size)
# you need, dissassemble the binary and determine the format parameter passed
# to scanf.
# (!)
password_size_in_bits = 32 # :integer
password0 = claripy.BVS('password0', password_size_in_bits)
password1 = claripy.BVS('password1', password_size_in_bits)
password2 = claripy.BVS('password2', password_size_in_bits)

# Set a register to a symbolic value. This is one way to inject symbols into
# the program.
# initial_state.regs stores a number of convenient attributes that reference
# registers by name. For example, to set eax to password0, use:
#
# initial_state.regs.eax = password0
#
# You will have to set multiple registers to distinct bitvectors. Copy and
# paste the line below and change the register. To determine which registers
# to inject which symbol, dissassemble the binary and look at the instructions
# immediately following the call to scanf.
# (!)
initial_state.regs.eax = password0
initial_state.regs.ebx = password1
initial_state.regs.edx = password2

simulation = project.factory.simgr(initial_state)

def is_successful(state):
stdout_output = state.posix.dumps(sys.stdout.fileno())
if b'Good Job.\n' in stdout_output:
return True
else:
return False

def should_abort(state):
stdout_output = state.posix.dumps(sys.stdout.fileno())
if b'Try again.\n' in stdout_output:
return True
else:
return False

simulation.explore(find=is_successful, avoid=should_abort)

if simulation.found:
solution_state = simulation.found[0]

# Solve for the symbolic values. If there are multiple solutions, we only
# care about one, so we can use eval, which returns any (but only one)
# solution. Pass eval the bitvector you want to solve for.
# (!)
solution0 = solution_state.se.eval(password0)
solution1 = solution_state.se.eval(password1)
solution2 = solution_state.se.eval(password2)

# Aggregate and format the solutions you computed above, and then print
# the full string. Pay attention to the order of the integers, and the
# expected base (decimal, octal, hexadecimal, etc).
solution = hex(solution0) + " " + hex(solution1) + " " + hex(solution2) # :string
print(solution)
else:
raise Exception('Could not find the solution')

if __name__ == '__main__':
main(sys.argv)

states

状态预设

除了使用.entry_state() 创建 state 对象, 我们还可以根据需要使用其他构造函数创建 state:

名称 描述
.entry_state() 构造一个已经准备好从函数入口点执行的状态
.blank_state 构造一个“空状态”,它的大多数数据都是未初始化的。当使用未初始化的的数据时,一个不受约束的符号值将会被返回
.call_state 构造一个已经准备好执行某个函数的状态
.full_init_state 构造一个已经执行过所有与需要执行的初始化函数,并准备从函数入口点执行的状态。比如,共享库构造函数(constructor)或预初始化器。当这些执行完之后,程序将会跳到入口点

位向量(bitvector)

符号位向量是angr用于将符号值注入程序的数据类型。这些将是angr将解决的方程式的“ x”,也就是约束求解时的自变量。可以通过 BVV(value,size)BVS( name, size) 接口创建位向量,也可以用 FPV 和 FPS 来创建浮点值和符号

在这里我们使用claripy通过BVS()方法生成三个位向量。此方法有两个参数:第一个是angr用来引用位向量的名称,第二个是位向量本身的大小(以位为单位)。由于符号值存储在寄存器中,并且寄存器的长度为32位,因此位向量的大小将为32位

访问寄存器

get_user_input()对输入进行了解析并将其放入三个寄存器中,我们可以通过 state.regs 对象的属性访问以及修改寄存器的数据

是时候把我们之前创建的符号位向量(bitvectors)放入属于他们的地方:寄存器EAXEBXEDX。我们将修改initial_state之前创建的内容并更新寄存器的内容。

1
2
3
initial_state.regs.eax = passwd0
initial_state.regs.ebx = passwd1
initial_state.regs.edx = passwd2

约束求解

可以通过使用state.solver.eval(symbol)对各个断言进行评测来求出一个合法的符号值(若有多个合法值,返回其中的一个),我们根据eval()之前注入的三个符号值调用求解器引擎的方法。

1
2
3
4
5
solution0 = format(solution_state.solver.eval(passwd0), 'x')
solution1 = format(solution_state.solver.eval(passwd1), 'x')
solution2 = format(solution_state.solver.eval(passwd2), 'x')
solution = solution0 + " " + solution1 + " " + solution2
print("[+] Success! Solution is: {}".format(solution))

04_angr_symbolic_stack

这题主要是学习如何符号化栈上的值。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
int __cdecl main(int argc, const char **argv, const char **envp)
{
printf("Enter the password: ");
handle_user();
return 0;
}
int handle_user()
{
int v1; // [esp+8h] [ebp-10h] BYREF
int v2[3]; // [esp+Ch] [ebp-Ch] BYREF

__isoc99_scanf("%u %u", v2, &v1);
v2[0] = complex_function0(v2[0]);
v1 = complex_function1(v1);
if ( v2[0] == 1999643857 && v1 == -1136455217 )
return puts("Good Job.");
else
return puts("Try again.");
}
.text:08048682 lea eax, [ebp+var_10]
.text:08048685 push eax
.text:08048686 lea eax, [ebp+var_C]
.text:08048689 push eax
.text:0804868A push offset aUU ; "%u %u"
.text:0804868F call ___isoc99_scanf
.text:08048694 add esp, 10h
.text:08048697 mov eax, [ebp+var_C]
.text:0804869A sub esp, 0Ch
.text:0804869D push eax
.text:0804869E call complex_function0

exp

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
# This challenge will be more challenging than the previous challenges that you
# have encountered thus far. Since the goal of this CTF is to teach symbolic
# execution and not how to construct stack frames, these comments will work you
# through understanding what is on the stack.
# ! ! !
# IMPORTANT: Any addresses in this script aren't necessarily right! Dissassemble
# the binary yourself to determine the correct addresses!
# ! ! !

import angr
import claripy
import sys

def main(argv):
path_to_binary = "./04_angr_symbolic_stack"
project = angr.Project(path_to_binary)

# For this challenge, we want to begin after the call to scanf. Note that this
# is in the middle of a function.
#
# This challenge requires dealing with the stack, so you have to pay extra
# careful attention to where you start, otherwise you will enter a condition
# where the stack is set up incorrectly. In order to determine where after
# scanf to start, we need to look at the dissassembly of the call and the
# instruction immediately following it:
# sub $0x4,%esp
# lea -0x10(%ebp),%eax
# push %eax
# lea -0xc(%ebp),%eax
# push %eax
# push $0x80489c3
# call 8048370 <__isoc99_scanf@plt>
# add $0x10,%esp
# Now, the question is: do we start on the instruction immediately following
# scanf (add $0x10,%esp), or the instruction following that (not shown)?
# Consider what the 'add $0x10,%esp' is doing. Hint: it has to do with the
# scanf parameters that are pushed to the stack before calling the function.
# Given that we are not calling scanf in our Angr simulation, where should we
# start?
# (!)
start_address = 0x08048697
initial_state = project.factory.blank_state(addr=start_address)

# We are jumping into the middle of a function! Therefore, we need to account
# for how the function constructs the stack. The second instruction of the
# function is:
# mov %esp,%ebp
# At which point it allocates the part of the stack frame we plan to target:
# sub $0x18,%esp
# Note the value of esp relative to ebp. The space between them is (usually)
# the stack space. Since esp was decreased by 0x18
#
# /-------- The stack --------\
# ebp -> | |
# |---------------------------|
# | |
# |---------------------------|
# . . . (total of 0x18 bytes)
# . . . Somewhere in here is
# . . . the data that stores
# . . . the result of scanf.
# esp -> | |
# \---------------------------/
#
# Since we are starting after scanf, we are skipping this stack construction
# step. To make up for this, we need to construct the stack ourselves. Let us
# start by initializing ebp in the exact same way the program does.
initial_state.regs.ebp = initial_state.regs.esp

# scanf("%u %u") needs to be replaced by injecting two bitvectors. The
# reason for this is that Angr does not (currently) automatically inject
# symbols if scanf has more than one input parameter. This means Angr can
# handle 'scanf("%u")', but not 'scanf("%u %u")'.
# You can either copy and paste the line below or use a Python list.
# (!)
password_size_in_bits = 32
password0 = claripy.BVS('password0', password_size_in_bits)
password1 = claripy.BVS('password1', password_size_in_bits)

# Here is the hard part. We need to figure out what the stack looks like, at
# least well enough to inject our symbols where we want them. In order to do
# that, let's figure out what the parameters of scanf are:
# sub $0x4,%esp
# lea -0x10(%ebp),%eax
# push %eax
# lea -0xc(%ebp),%eax
# push %eax
# push $0x80489c3
# call 8048370 <__isoc99_scanf@plt>
# add $0x10,%esp
# As you can see, the call to scanf looks like this:
# scanf( 0x80489c3, ebp - 0xc, ebp - 0x10 )
# format_string password0 password1
# From this, we can construct our new, more accurate stack diagram:
#
# /-------- The stack --------\
# ebp -> | padding |
# |---------------------------|
# ebp - 0x01 | more padding |
# |---------------------------|
# ebp - 0x02 | even more padding |
# |---------------------------|
# . . . <- How much padding? Hint: how
# |---------------------------| many bytes is password0?
# ebp - 0x0b | password0, second byte |
# |---------------------------|
# ebp - 0x0c | password0, first byte |
# |---------------------------|
# ebp - 0x0d | password1, last byte |
# |---------------------------|
# . . .
# |---------------------------|
# ebp - 0x10 | password1, first byte |
# |---------------------------|
# . . .
# |---------------------------|
# esp -> | |
# \---------------------------/
#
# Figure out how much space there is and allocate the necessary padding to
# the stack by decrementing esp before you push the password bitvectors.
padding_length_in_bytes = 0x8 # :integer
initial_state.regs.esp -= padding_length_in_bytes

# Push the variables to the stack. Make sure to push them in the right order!
# The syntax for the following function is:
#
# initial_state.stack_push(bitvector)
#
# This will push the bitvector on the stack, and increment esp the correct
# amount. You will need to push multiple bitvectors on the stack.
# (!)
initial_state.stack_push(password0) # :bitvector (claripy.BVS, claripy.BVV, claripy.BV)
initial_state.stack_push(password1)

simulation = project.factory.simgr(initial_state)

def is_successful(state):
stdout_output = state.posix.dumps(sys.stdout.fileno())
if b'Good Job.\n' in stdout_output:
return True
else:
return False

def should_abort(state):
stdout_output = state.posix.dumps(sys.stdout.fileno())
if b'Try again.\n' in stdout_output:
return True
else:
return False

simulation.explore(find=is_successful, avoid=should_abort)

if simulation.found:
solution_state = simulation.found[0]

solution0 = solution_state.se.eval(password0)
solution1 = solution_state.se.eval(password1)
print("[+] Success! Solution is: {0} {1}".format(solution0, solution1))
else:
raise Exception('Could not find the solution')

if __name__ == '__main__':
main(sys.argv)

eval

  • solver.eval(expression) 将会解出一个可行解
  • solver.eval_one(expression)将会给出一个表达式的可行解,若有多个可行解,则抛出异常。
  • solver.eval_upto(expression, n)将会给出最多n个可行解,如果不足n个就给出所有的可行解。
  • solver.eval_exact(expression, n)将会给出n个可行解,如果解的个数不等于n个,将会抛出异常。
  • solver.min(expression)将会给出最小可行解
  • solver.max(expression)将会给出最大可行解

另外还有还有cast_to可以接收一个参数来指定把结果映射到哪种数据类型。目前这个参数只能是str,它将会以字符串形式展示返回的结果

05_angr_symbolic_memory

程序逻辑

  • 程序将四个8字节长的字符串作为输入
  • 字符串分别位于以下地址[0xA1BA1C0, 0xA1BA1C8, 0xA1BA1D0, 0xA1BA1D8]
  • 输入的字符串循环输入complex_function()函数进行变换
  • 循环变换后的字符串与 "NJPURZPCDYEAXCSJZJMPSOMBFDDLHBVN"比较前0x20个字符
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
int __cdecl main(int argc, const char **argv, const char **envp)
{
int i; // [esp+Ch] [ebp-Ch]

memset(user_input, 0, 0x21u);
printf("Enter the password: ");
__isoc99_scanf("%8s %8s %8s %8s", user_input, &unk_A1BA1C8, &unk_A1BA1D0, &unk_A1BA1D8);
for ( i = 0; i <= 31; ++i )
*(_BYTE *)(i + 0xA1BA1C0) = complex_function(*(char *)(i + 169583040), i);
if ( !strncmp(user_input, "NJPURZPCDYEAXCSJZJMPSOMBFDDLHBVN", 0x20u) )
puts("Good Job.");
else
puts("Try again.");
return 0;
}

exp

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
import angr
import claripy
import sys

def main(argv):
path_to_binary = "./05_angr_symbolic_memory"
project = angr.Project(path_to_binary)

start_address = 0x8048601
initial_state = project.factory.blank_state(addr=start_address)

# The binary is calling scanf("%8s %8s %8s %8s").
# (!)
password_size_in_bits = 64
password0 = claripy.BVS('passwd0', password_size_in_bits)
password1 = claripy.BVS('passwd1', password_size_in_bits)
password2 = claripy.BVS('passwd2', password_size_in_bits)
password3 = claripy.BVS('passwd3', password_size_in_bits)

# Determine the address of the global variable to which scanf writes the user
# input. The function 'initial_state.memory.store(address, value)' will write
# 'value' (a bitvector) to 'address' (a memory location, as an integer.) The
# 'address' parameter can also be a bitvector (and can be symbolic!).
# (!)
password0_address = 0xA1BA1C0
password1_address = 0xA1BA1C8
password2_address = 0xA1BA1D0
password3_address = 0xA1BA1D8
initial_state.memory.store(password0_address, password0)
initial_state.memory.store(password1_address, password1)
initial_state.memory.store(password2_address, password2)
initial_state.memory.store(password3_address, password3)

simulation = project.factory.simgr(initial_state)

def is_successful(state):
stdout_output = state.posix.dumps(sys.stdout.fileno())
if b'Good Job.\n' in stdout_output:
return True
else:
return False

def should_abort(state):
stdout_output = state.posix.dumps(sys.stdout.fileno())
if b'Try again.\n' in stdout_output:
return True
else:
return False

simulation.explore(find=is_successful, avoid=should_abort)

if simulation.found:
solution_state = simulation.found[0]
# Solve for the symbolic values. We are trying to solve for a string.
# Therefore, we will use eval, with named parameter cast_to=str
# which returns a string instead of an integer.
# (!)
solution0 = solution_state.se.eval(password0,cast_to=bytes)
solution1 = solution_state.se.eval(password1,cast_to=bytes)
solution2 = solution_state.se.eval(password2,cast_to=bytes)
solution3 = solution_state.se.eval(password3,cast_to=bytes)
solution = solution0 + b" " + solution1 + b" " + solution2 + b" " + solution3
print(solution)
else:
raise Exception('Could not find the solution')

if __name__ == '__main__':
main(sys.argv)

state.memery

前面提到可以通过 state.mem[index] 访问内存,但对于一段连续内存的操作十分不方便。因此我们也可以使用 state.memory.load(addr, size) 或者 .store(addr, val) 接口读写内存, size 以 bytes 为单位

这些函数的原型:

1
2
3
4
5
6
7
8
9
10
def load(self, addr, size=None, condition=None, fallback=None, add_constraints=None, action=None,                          endness=None,inspect=True, disable_actions=False, ret_on_segv=False):
"""
Loads size bytes from dst.
:param addr: The address to load from. #读取的地址
:param size: The size (in bytes) of the load. #大小
:param condition: A claripy expression representing a condition for a conditional load.
:param fallback: A fallback value if the condition ends up being False.
:param add_constraints: Add constraints resulting from the merge (default: True).
:param action: A SimActionData to fill out with the constraints.
:param endness: The endness to load with. #端序
1
2
3
4
5
6
7
8
def store(self, addr, data, size=None, condition=None, add_constraints=None, endness=None, action=None,
inspect=True, priv=None, disable_actions=False):
"""
Stores content into memory.
:param addr: A claripy expression representing the address to store at. #内存地址
:param data: The data to store (claripy expression or something convertable to a claripy expression).#写入的数据
:param size: A claripy expression representing the size of the data to store. #大小
...

06_angr_symbolic_dynamic_memory

这题主要是学会符号化动态内存,这个题与上题没有太大区别,除了字符串的内存是通过malloc()分配的。

程序逻辑

  • 程序使用malloc()函数分配出了两个大小为9字节的缓冲区,并将其初始化为0
  • 然后将两个字符串以scanf("%8s %8s")作为格式化字符串分别输入进缓冲区内
  • 然后利用complex_function()函数分别对两个字符串进行变换
  • 然后将变换后的字符串分别与**”UODXLZBI”“UAORRAYF”**进行比较
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
int __cdecl main(int argc, const char **argv, const char **envp)
{
char *v3; // ebx
char *v4; // ebx
int v6; // [esp-10h] [ebp-1Ch]
int i; // [esp+0h] [ebp-Ch]
/*
* 我们可以看到malloc()分配了两个缓冲区,因为maclloc()函数只有一个参数,申请9字节大小不满一个chunk,
* 会申请 0x20 大小的chunk。根据mov ds:buffer0, eax和mov ds:buffer1,
* eax得知开辟后的缓冲区被复制到标识为buffer0和buffer1的两个存储区中
*/
buffer0 = (char *)malloc(9u);
buffer1 = (char *)malloc(9u);
memset(buffer0, 0, 9u);
memset(buffer1, 0, 9u);
printf("Enter the password: ");
__isoc99_scanf("%8s %8s", buffer0, buffer1, v6);
for ( i = 0; i <= 7; ++i )
{
v3 = &buffer0[i];
*v3 = complex_function(buffer0[i], i);
v4 = &buffer1[i];
*v4 = complex_function(buffer1[i], i + 32);
}
if ( !strncmp(buffer0, "UODXLZBI", 8u) && !strncmp(buffer1, "UAORRAYF", 8u) )
puts("Good Job.");
else
puts("Try again.");
free(buffer0);
free(buffer1);
return 0;
}

exp

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
import angr
import claripy
import sys

def main():
path_to_binary = "./06_angr_symbolic_dynamic_memory"
project = angr.Project(path_to_binary)
# 跳过所有`malloc()`,稍后将在脚本中处理它们。
start_address = 0x8048699
initial_state = project.factory.blank_state(addr=start_address)

# The binary is calling scanf("%8s %8s").
# (!)
# 因为缓冲区的大小是8字节,故换算成比特即为64比特的大小,最后我们初始化两个大小为64位的符号位向量
password_size_in_bits = 64
password0 = claripy.BVS('password0', password_size_in_bits)
password1 = claripy.BVS('password1', password_size_in_bits)

# Instead of telling the binary to write to the address of the memory
# allocated with malloc, we can simply fake an address to any unused block of
# memory and overwrite the pointer to the data. This will point the pointer
# with the address of pointer_to_malloc_memory_address0 to fake_heap_address.
# Be aware, there is more than one pointer! Analyze the binary to determine
# global location of each pointer.
# Note: by default, Angr stores integers in memory with big-endianness. To
# specify to use the endianness of your architecture, use the parameter
# endness=project.arch.memory_endness. On x86, this is little-endian.
# (!)
# 之前buffer指向的是malloc分配好的内存地址,string存在这里。现在是buffer指向的是我们伪造的地址,符号位向量存在这里。
fake_heap_address0 = 0xffffc93c
pointer_to_malloc_memory_address0 = 0xabcc8a4
fake_heap_address1 = 0xffffc95c
pointer_to_malloc_memory_address1 = 0xabcc8ac
# 参数 endness 用于设置端序,angr默认为大端序,总共可选的值如下:
"""
LE – 小端序(little endian, least significant byte is stored at lowest address)
BE – 大端序(big endian, most significant byte is stored at lowest address)
ME – 中间序(Middle-endian. Yep.)
"""
initial_state.memory.store(pointer_to_malloc_memory_address0, fake_heap_address0, endness=project.arch.memory_endness)
initial_state.memory.store(pointer_to_malloc_memory_address1, fake_heap_address1, endness=project.arch.memory_endness)

# Store our symbolic values at our fake_heap_address. Look at the binary to
# determine the offsets from the fake_heap_address where scanf writes.
# (!)
initial_state.memory.store(fake_heap_address0, password0)
initial_state.memory.store(fake_heap_address1, password1)

simulation = project.factory.simgr(initial_state)

def is_successful(state):
stdout_output = state.posix.dumps(sys.stdout.fileno())
if b'Good Job.\n' in stdout_output:
return True
else:
return False

def should_abort(state):
stdout_output = state.posix.dumps(sys.stdout.fileno())
if b'Try again.\n' in stdout_output:
return True
else:
return False

simulation.explore(find=is_successful, avoid=should_abort)

if simulation.found:
solution_state = simulation.found[0]

solution0 = solution_state.se.eval(password0,cast_to=bytes)
solution1 = solution_state.se.eval(password1,cast_to=bytes)

solution = solution0 + b" " + solution1
print(solution)
else:
raise Exception('Could not find the solution')

if __name__ == '__main__':
main()

07_angr_symbolic_file

这题主要学习如何符号化一个文件里面的内容:

程序逻辑

我们可以得知程序使用fread函数从文件中加载密码,如果密码正确,则会打印"Good Job"ignore_me主要是把第一个读取的内容存入OJKSQYDP.txt, 不用我们自己创建文件 ,然后从文件OJKSQYDP.txt读取数据存入buff

  • 读取一个名叫’OJKSQYDP.txt’的文件作为密码
  • 我们需要使用Angr模拟一个文件系统,其中该文件被我们自己的模拟文件所替代
  • 然后将该文件进行符号化处理
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
int __cdecl __noreturn main(int argc, const char **argv, const char **envp)
{
int i; // [esp+Ch] [ebp-Ch]

memset(buffer, 0, sizeof(buffer));
printf("Enter the password: ");
__isoc99_scanf("%64s", buffer);
ignore_me((int)buffer, 0x40u);
memset(buffer, 0, sizeof(buffer));
fp = fopen("OJKSQYDP.txt", "rb");
fread(buffer, 1u, 0x40u, fp);
fclose(fp);
unlink("OJKSQYDP.txt");
for ( i = 0; i <= 7; ++i )
*(_BYTE *)(i + 134520992) = complex_function(*(char *)(i + 134520992), i);
if ( strncmp(buffer, "AQWLCTXB", 9u) )
{
puts("Try again.");
exit(1);
}
puts("Good Job.");
exit(0);
}
unsigned int __cdecl ignore_me(void *a1, size_t n)
{
void *v2; // esp
_BYTE v4[12]; // [esp+0h] [ebp-28h] BYREF
void *ptr; // [esp+Ch] [ebp-1Ch]
int v6; // [esp+10h] [ebp-18h]
void *s; // [esp+14h] [ebp-14h]
FILE *stream; // [esp+18h] [ebp-10h]
unsigned int v9; // [esp+1Ch] [ebp-Ch]

ptr = a1;
v9 = __readgsdword(0x14u);
v6 = n - 1;
v2 = alloca(16 * ((n + 15) / 0x10));
s = v4;
memset(v4, 0, n);
unlink("OJKSQYDP.txt");
stream = fopen("OJKSQYDP.txt", "a+b");
fwrite(ptr, 1u, n, stream);
fseek(stream, 0, 0);
__isoc99_fscanf(stream, "%64s", s);
fseek(stream, 0, 0);
fwrite(s, 1u, n, stream);
fclose(stream);
return __readgsdword(0x14u) ^ v9;
}

exp

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
# This challenge could, in theory, be solved in multiple ways. However, for the
# sake of learning how to simulate an alternate filesystem, please solve this
# challenge according to structure provided below. As a challenge, once you have
# an initial solution, try solving this in an alternate way.
#
# Problem description and general solution strategy:
# The binary loads the password from a file using the fread function. If the
# password is correct, it prints "Good Job." In order to keep consistency with
# the other challenges, the input from the console is written to a file in the
# ignore_me function. As the name suggests, ignore it, as it only exists to
# maintain consistency with other challenges.
# We want to:
# 1. Determine the file from which fread reads.
# 2. Use Angr to simulate a filesystem where that file is replaced with our own
# simulated file.
# 3. Initialize the file with a symbolic value, which will be read with fread
# and propogated through the program.
# 4. Solve for the symbolic input to determine the password.

import angr
import claripy
import sys

def main():
path_to_binary = "./07_angr_symbolic_file"
project = angr.Project(path_to_binary)

start_address = 0x80488EA
initial_state = project.factory.blank_state(
addr=start_address,
add_options = { angr.options.SYMBOL_FILL_UNCONSTRAINED_MEMORY,
angr.options.SYMBOL_FILL_UNCONSTRAINED_REGISTERS}
)

# Specify some information needed to construct a simulated file. For this
# challenge, the filename is hardcoded, but in theory, it could be symbolic.
# Note: to read from the file, the binary calls
# 'fread(buffer, sizeof(char), 64, file)'.
# (!)
filename = 'OJKSQYDP.txt' # :string
symbolic_file_size_bytes = 64

# Construct a bitvector for the password and then store it in the file's
# backing memory. For example, imagine a simple file, 'hello.txt':
#
# Hello world, my name is John.
# ^ ^
# ^ address 0 ^ address 24 (count the number of characters)
# In order to represent this in memory, we would want to write the string to
# the beginning of the file:
#
# hello_txt_contents = claripy.BVV('Hello world, my name is John.', 30*8)
#
# Perhaps, then, we would want to replace John with a
# symbolic variable. We would call:
#
# name_bitvector = claripy.BVS('symbolic_name', 4*8)
#
# Then, after the program calls fopen('hello.txt', 'r') and then
# fread(buffer, sizeof(char), 30, hello_txt_file), the buffer would contain
# the string from the file, except four symbolic bytes where the name would be
# stored.
# (!)
password = claripy.BVS('password', symbolic_file_size_bytes * 8)

# Construct the symbolic file. The file_options parameter specifies the Linux
# file permissions (read, read/write, execute etc.) The content parameter
# specifies from where the stream of data should be supplied. If content is
# an instance of SimSymbolicMemory (we constructed one above), the stream will
# contain the contents (including any symbolic contents) of the memory,
# beginning from address zero.
# Set the content parameter to our BVS instance that holds the symbolic data.
# (!)
password_file = angr.storage.SimFile(filename, content=password)

# Add the symbolic file we created to the symbolic filesystem.
initial_state.fs.insert(filename, password_file)

simulation = project.factory.simgr(initial_state)

def is_successful(state):
stdout_output = state.posix.dumps(sys.stdout.fileno())
if b'Good Job.\n' in stdout_output:
return True
else:
return False

def should_abort(state):
stdout_output = state.posix.dumps(sys.stdout.fileno())
if b'Try again.\n' in stdout_output:
return True
else:
return False

simulation.explore(find=is_successful, avoid=should_abort)

if simulation.found:
solution_state = simulation.found[0]

solution = solution_state.solver.eval(password,cast_to=bytes).decode()

print(solution)
else:
raise Exception('Could not find the solution')

if __name__ == '__main__':
main()

状态插件-state plugin

除了刚刚讨论过的选项集,所有存储在SimState中的东西实际上都存储在附加在state上的“插件”中。到目前为止我们讨论的几乎所有state的属性都是一个插件——memoryregistersmemregssolver等等。这种设计带来了代码的模块化和能够便捷地为模拟状态的其他方面实现新的数据存储,或者提供插件的替代实现能力。

比如说,通常memory插件模拟一个平坦地址空间,但是在分析中可以选择开启“抽象内存”插件来支持state.memory,“抽象内存”使用新的数据类型表示地址,以模拟浮动的独立内存空间映射。反过来,插件可以减少代码的复杂性:state.memorystate.registers实际上是同一个插件的不同实例,因为寄存器也是用一块地址空间模拟的。

能够控制仿真程序所看到的环境,包括如何从环境中引入符号数据,这一点非常重要!angr具有一系列可靠的抽象概念,可帮助您设置所需的环境。

仿真文件系统-The Emulated Filesystem

这题的关键是利用了angr强大的仿真文件系统。在angr中与文件系统,套接字,管道或终端的任何交互的根源都是SimFile对象。SimFile是一种存储抽象,它定义符号或其他形式的字节序列。您可以从某个位置读取文件,可以在某个位置写入文件,可以询问文件中当前存储了多少字节,还可以具体化文件,并为其生成测试用例。

简单来说利用SimFile形成符号化的文件的格式:

1
simgr_file = angr.storage.SimFile(filename, content=xxxxxx, size=file_size)

然后需要传给state的初始化过程来影响对文件系统的使用。我们可以利用fs选项以文件名的字典来预配置SimFile对象,也可以fs.insert是将文件插入到文件系统中,需要文件名与符号化的文件

1
initial_state.fs.insert(filename, simgr_file)

08_angr_constraints

该题主要学习通过添加约束条件来解决路径爆炸问题。

程序分析

  • 用户输入的字符串存储在buffer,buffer的地址为:0x804A050
  • 比较函数check_equals_AUPDNNPROEZRJWKB的地址为:0x08048565
  • 其实只要当程序运行到地址0x08048565时,处于buffer地址内的字符串等于AUPDNNPROEZRJWKB即可
  • 添加上述约束条件即可一步得出结果,而不用进入比较函数逐一字符比较而产生路径爆炸问题
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
int __cdecl main(int argc, const char **argv, const char **envp)
{
int i; // [esp+Ch] [ebp-Ch]

qmemcpy(&password, "AUPDNNPROEZRJWKB", 16);
memset(&buffer, 0, 0x11u);
printf("Enter the password: ");
__isoc99_scanf("%16s", &buffer);
for ( i = 0; i <= 15; ++i )
*(_BYTE *)(i + 134520912) = complex_function(*(char *)(i + 134520912), 15 - i);
if ( check_equals_AUPDNNPROEZRJWKB(&buffer, 16) )
puts("Good Job.");
else
puts("Try again.");
return 0;
}
_BOOL4 __cdecl check_equals_AUPDNNPROEZRJWKB(int a1, unsigned int a2)
{
int v3; // [esp+8h] [ebp-8h]
unsigned int i; // [esp+Ch] [ebp-4h]

v3 = 0;
for ( i = 0; i < a2; ++i )
{
if ( *(_BYTE *)(i + a1) == *(_BYTE *)(i + 0x804A040) )
++v3;
}
return v3 == a2;
}

路径爆炸

通过我们之前的学习体验感觉到angr这么强大的应用怎么没有在实际的测试生产中大规模应用,这是因为给符号执行技术在复杂程序的测试案例生成的应用中造成阻碍的两个大问题:一个是约束求解问题,另一个就是路径爆炸问题

所谓符号执行就是把程序中的变量符号化去模拟程序运行,搜集路径约束条件并使用约束求解器对其进行求解后得到结果。当一个程序存在循环结构时,即使逻辑十分简单也可能会产生规模十分巨大的执行路径。在符号执行的过程中,每个分支点都会产生两个实例,当程序中存在循环结构展开时,可能会导致程序分支路径数呈指数级增长,即路径爆炸问题。故我们需要提供更多的约束条件控制路径爆照问题。

check_equals_AUPDNNPROEZRJWKB()函数就是一个字符一个字符的比较,就会产生路径爆炸问题,原始也是每次调用循环中的if语句(16次)时,计算机都需要产生判断分支,从而导致2 ^ 16 = 65,536分支,这将花费很长时间来测试并获得我们的答案。我们解决这个问题的答案,直接用约束条件取代这个判断函数,用字符串直接比较约束,从而避免因为循环和判断语句逐一字符比较而产生分支引起路径爆炸问题

约束求解

在angr中提供了可以用加入一个约束条件到一个state中的方法(state.solver.add),将每一个符号化的布尔值作为一个关于符号变量合法性的断言。之后可以通过使用state.solver.eval(symbol)对各个断言进行评测来求出一个合法的符号值(若有多个合法值,返回其中的一个)。简单来说就是通过 .add 对 state 对象添加约束,并使用 .eval 接口求解,得到符号变量的可行解。

exp

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
# The binary asks for a 16 character password to which is applies a complex
# function and then compares with a reference string with the function
# check_equals_[reference string]. (Decompile the binary and take a look at it!)
# The source code for this function is provided here. However, the reference
# string in your version will be different than AABBCCDDEEFFGGHH:
#
# #define REFERENCE_PASSWORD = "AABBCCDDEEFFGGHH";
# int check_equals_AABBCCDDEEFFGGHH(char* to_check, size_t length) {
# uint32_t num_correct = 0;
# for (int i=0; i<length; ++i) {
# if (to_check[i] == REFERENCE_PASSWORD[i]) {
# num_correct += 1;
# }
# }
# return num_correct == length;
# }
#
# ...
#
# char* input = user_input();
# char* encrypted_input = complex_function(input);
# if (check_equals_AABBCCDDEEFFGGHH(encrypted_input, 16)) {
# puts("Good Job.");
# } else {
# puts("Try again.");
# }
#
# The function checks if *to_check == "AABBCCDDEEFFGGHH". Verify this yourself.
# While you, as a human, can easily determine that this function is equivalent
# to simply comparing the strings, the computer cannot. Instead the computer
# would need to branch every time the if statement in the loop was called (16
# times), resulting in 2^16 = 65,536 branches, which will take too long of a
# time to evaluate for our needs.
#
# We do not know how the complex_function works, but we want to find an input
# that, when modified by complex_function, will produce the string:
# AABBCCDDEEFFGGHH.
#
# In this puzzle, your goal will be to stop the program before this function is
# called and manually constrain the to_check variable to be equal to the
# password you identify by decompiling the binary. Since, you, as a human, know
# that if the strings are equal, the program will print "Good Job.", you can
# be assured that if the program can solve for an input that makes them equal,
# the input will be the correct password.

import angr
import claripy
import sys

def main():
path_to_binary = "./08_angr_constraints"
project = angr.Project(path_to_binary)

start_address = 0x8048625
initial_state = project.factory.blank_state(
addr=start_address,
add_options = { angr.options.SYMBOL_FILL_UNCONSTRAINED_MEMORY,
angr.options.SYMBOL_FILL_UNCONSTRAINED_REGISTERS}
)
char_size_in_bits = 8
password_len = 16
password = claripy.BVS('password', char_size_in_bits * password_len)

password_address = 0x0804A050
initial_state.memory.store(password_address, password)

simulation = project.factory.simgr(initial_state)

# Angr will not be able to reach the point at which the binary prints out
# 'Good Job.'. We cannot use that as the target anymore.
# (!)
address_to_check_constraint = 0x08048565
simulation.explore(find=address_to_check_constraint)

if simulation.found:
solution_state = simulation.found[0]

# Recall that we need to constrain the to_check parameter (see top) of the
# check_equals_ function. Determine the address that is being passed as the
# parameter and load it into a bitvector so that we can constrain it.
# (!)
constrained_parameter_address = password_address
constrained_parameter_size_bytes = 16
constrained_parameter_bitvector = solution_state.memory.load(
constrained_parameter_address,
constrained_parameter_size_bytes
)
# We want to constrain the system to find an input that will make
# constrained_parameter_bitvector equal the desired value.
# (!)
constrained_parameter_desired_value = 'AUPDNNPROEZRJWKB' # :string (encoded)

# Specify a claripy expression (using Pythonic syntax) that tests whether
# constrained_parameter_bitvector == constrained_parameter_desired_value.
# Add the constraint to the state to let z3 attempt to find an input that
# will make this expression true.
solution_state.add_constraints(constrained_parameter_bitvector == constrained_parameter_desired_value)

# Solve for the constrained_parameter_bitvector.
# (!)
solution = solution_state.solver.eval(password,cast_to=bytes)

print(solution)
else:
raise Exception('Could not find the solution')

if __name__ == '__main__':
main()

09_angr_hooks

这题如题目所言,主要就是学习使用angr的hook技术解决路径爆炸问题,与我们之前利用的约束条件不同,hook技术则更为强大。

程序分析

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
int __cdecl main(int argc, const char **argv, const char **envp)
{
_BOOL4 v3; // eax
int i; // [esp+8h] [ebp-10h]
int j; // [esp+Ch] [ebp-Ch]

qmemcpy(password, "XYMKBKUHNIQYNQXE", 16);
memset(buffer, 0, 0x11u);
printf("Enter the password: ");
__isoc99_scanf("%16s", buffer);
for ( i = 0; i <= 15; ++i )
*(_BYTE *)(i + 134520916) = complex_function(*(char *)(i + 134520916), 18 - i);
/*
.text:080486B3 call check_equals_XYMKBKUHNIQYNQXE
.text:080486B8 add esp, 10h
*/
equals = check_equals_XYMKBKUHNIQYNQXE(buffer, 16);
for ( j = 0; j <= 15; ++j )
*(_BYTE *)(j + 134520900) = complex_function(*(char *)(j + 134520900), j + 9);
__isoc99_scanf("%16s", buffer);
v3 = equals && !strncmp(buffer, password, 0x10u);
equals = v3;
if ( v3 )
puts("Good Job.");
else
puts("Try again.");
return 0;
}
_BOOL4 __cdecl check_equals_XYMKBKUHNIQYNQXE(int a1, unsigned int a2)
{
int v3; // [esp+8h] [ebp-8h]
unsigned int i; // [esp+Ch] [ebp-4h]

v3 = 0;
for ( i = 0; i < a2; ++i )
{
if ( *(_BYTE *)(i + a1) == *(_BYTE *)(i + 134520900) )
++v3;
}
return v3 == a2;
}
int __cdecl complex_function(signed int a1, int a2)
{
if ( a1 <= 64 || a1 > 90 )
{
puts("Try again.");
exit(1);
}
return (a1 - 65 + 23 * a2) % 26 + 65;
}

其实和上一题并没有什么太大的变化,主要是我们上一题是使用增加条件约束的方法减少路径分支,而这一题我们直接利用hook改写complex_function函数为我们自己的函数

hook

angr使用一系列引擎(SimEngine的子类)来模拟被执行代码对输入状态产生的影响。其中就有hook engine来处理hook的情况。默认情况下,angr 会使用 SimProcedures 中的符号摘要替换库函数,即设置 Hooking,这些 python 函数摘要高效地模拟库函数对状态的影响。可以通过 angr.proceduresangr.SimProcedures 查看列表

SimProcedure 其实就是 Hook 机制,可以通过 proj.hook(addr,hook) 设置,其中 hook 是一个 SimProcedure 实例,第一个参数即需要Hook的调用函数的地址,第二个参数length即指定执行引擎在完成挂钩后应跳过多少字节。具体多少字节由Hook处地址的指令长度确定。 通过 .is_hooked / .unhook / .hook_by 进行管理。将 proj.hook(addr) 作为函数装饰器,可以编写自己的 hook 函数。还可以通过 proj.hook_symbol(name,hook) hook 函数。

exp

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
# This level performs the following computations:
#
# 1. Get 16 bytes of user input and encrypt it.
# 2. Save the result of check_equals_AABBCCDDEEFFGGHH (or similar)
# 3. Get another 16 bytes from the user and encrypt it.
# 4. Check that it's equal to a predefined password.
#
# The ONLY part of this program that we have to worry about is #2. We will be
# replacing the call to check_equals_ with our own version, using a hook, since
# check_equals_ will run too slowly otherwise.

import angr
import claripy
import sys

def main():
path_to_binary = './09_angr_hooks'
project = angr.Project(path_to_binary)

# Since Angr can handle the initial call to scanf, we can start from the
# beginning.
initial_state = project.factory.entry_state()

# Hook the address of where check_equals_ is called.
# (!)
check_equals_called_address = 0x80486B3

# The length parameter in angr.Hook specifies how many bytes the execution
# engine should skip after completing the hook. This will allow hooks to
# replace certain instructions (or groups of instructions). Determine the
# instructions involved in calling check_equals_, and then determine how many
# bytes are used to represent them in memory. This will be the skip length.
# (!)
instruction_to_skip_length = 5
"""
然后我们需要在在@project.hook语句之后书写我们的模拟函数。然后如上题一致,
我们利用使用 state.memory 的 .load(addr, size)接口读出buffer处的内存数据,与答案进行比较
"""
@project.hook(check_equals_called_address, length=instruction_to_skip_length)
def skip_check_equals_(state):
# Determine the address where user input is stored. It is passed as a
# parameter ot the check_equals_ function. Then, load the string. Reminder:
# int check_equals_(char* to_check, int length) { ...
user_input_buffer_address = 0x804A054 # :integer, probably hexadecimal
user_input_buffer_length = 16

# Reminder: state.memory.load will read the stored value at the address
# user_input_buffer_address of byte length user_input_buffer_length.
# It will return a bitvector holding the value. This value can either be
# symbolic or concrete, depending on what was stored there in the program.
user_input_string = state.memory.load(
user_input_buffer_address,
user_input_buffer_length
)

# Determine the string this function is checking the user input against.
# It's encoded in the name of this function; decompile the program to find
# it.
check_against_string = 'XKSPZSJKJYQCQXZV' # :string

# gcc uses eax to store the return value, if it is an integer. We need to
# set eax to 1 if check_against_string == user_input_string and 0 otherwise.
# However, since we are describing an equation to be used by z3 (not to be
# evaluated immediately), we cannot use Python if else syntax. Instead, we
# have to use claripy's built in function that deals with if statements.
# claripy.If(expression, ret_if_true, ret_if_false) will output an
# expression that evaluates to ret_if_true if expression is true and
# ret_if_false otherwise.
# Think of it like the Python "value0 if expression else value1".
"""
这个函数是利用EAX寄存器作为返回值,然后成功则返回1,不成功则返回0,
还需要注意在构建符号位向量的时候EAX寄存器是32位寄存器
"""
state.regs.eax = claripy.If(
user_input_string == check_against_string,
claripy.BVV(1, 32),
claripy.BVV(0, 32)
)

simulation = project.factory.simgr(initial_state)

def is_successful(state):
stdout_output = state.posix.dumps(sys.stdout.fileno())
if b'Good Job.\n' in stdout_output:
return True
else:
return False

def should_abort(state):
stdout_output = state.posix.dumps(sys.stdout.fileno())
if b'Try again.\n' in stdout_output:
return True
else:
return False

simulation.explore(find=is_successful, avoid=should_abort)

if simulation.found:
solution_state = simulation.found[0]

# Since we are allowing Angr to handle the input, retrieve it by printing
# the contents of stdin. Use one of the early levels as a reference.
solution = solution_state.posix.dumps(0)
print(solution)
else:
raise Exception('Could not find the solution')

if __name__ == '__main__':
main()

10_angr_simprocedures

这题主要学习如何利用函数名进行hook,而不是复杂的利用函数的调用地址。

程序分析

这一题与上一题相似, 我们必须替换check_equals函数 。但是,我们可以发现check_equals被调用了很多次,以致于无法通过地址Hook每个调用位置。 这时我们必须使用SimProcedure编写我们自己的check_equals实现,然后通过函数名Hook所有对check_equals的调用。

image-20240717101443828

hook sym

每一个程序都有一个符号表,angr可以确保从每个导入符号都可以解析出地址,可以使用angr提供的Project.hook_symbolAPI来通过符号名来Hook函数所有的调用地址,但符号表也是可以被去除的(大多数程序都会去除符号表)。

exp

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
# This challenge is similar to the previous one. It operates under the same
# premise that you will have to replace the check_equals_ function. In this
# case, however, check_equals_ is called so many times that it wouldn't make
# sense to hook where each one was called. Instead, use a SimProcedure to write
# your own check_equals_ implementation and then hook the check_equals_ symbol
# to replace all calls to scanf with a call to your SimProcedure.
#
# You may be thinking:
# Why can't I just use hooks? The function is called many times, but if I hook
# the address of the function itself (rather than the addresses where it is
# called), I can replace its behavior everywhere. Furthermore, I can get the
# parameters by reading them off the stack (with memory.load(regs.esp + xx)),
# and return a value by simply setting eax! Since I know the length of the
# function in bytes, I can return from the hook just before the 'ret'
# instruction is called, which will allow the program to jump back to where it
# was before it called my hook.
# If you thought that, then congratulations! You have just invented the idea of
# SimProcedures! Instead of doing all of that by hand, you can let the already-
# implemented SimProcedures do the boring work for you so that you can focus on
# writing a replacement function in a Pythonic way.
# As a bonus, SimProcedures allow you to specify custom calling conventions, but
# unfortunately it is not covered in this CTF.

import angr
import claripy
import sys

def main():
path_to_binary = './10_angr_simprocedures'
project = angr.Project(path_to_binary)

initial_state = project.factory.entry_state()

# Define a class that inherits angr.SimProcedure in order to take advantage
# of Angr's SimProcedures.
"""
这里前面的部分都可以直接照抄上面一题的代码,关键是定义一个继承angr.SimProcedure的类,以利用Angr的SimProcedures。
SimProcedure用Python编写的我们自己的函数代替了原来函数。 除了用Python编写之外,该函数的行为与用C编写的任何函数基本相同。
self之后的任何参数都将被视为要替换的函数的参数, 参数将是符号位向量。
另外,Python可以以常用的Python方式返回,Angr将以与原来函数相同的方式对待它
"""
class ReplacementCheckEquals(angr.SimProcedure):
# A SimProcedure replaces a function in the binary with a simulated one
# written in Python. Other than it being written in Python, the function
# acts largely the same as any function written in C. Any parameter after
# 'self' will be treated as a parameter to the function you are replacing.
# The parameters will be bitvectors. Additionally, the Python can return in
# the ususal Pythonic way. Angr will treat this in the same way it would
# treat a native function in the binary returning. An example:
#
# int add_if_positive(int a, int b) {
# if (a >= 0 && b >= 0) return a + b;
# else return 0;
# }
#
# could be simulated with...
#
# class ReplacementAddIfPositive(angr.SimProcedure):
# def run(self, a, b):
# if a >= 0 and b >=0:
# return a + b
# else:
# return 0
#
# Finish the parameters to the check_equals_ function. Reminder:
# int check_equals_AABBCCDDEEFFGGHH(char* to_check, int length) { ...
# (!)
"""
check_equals_AABBCCDDEEFFGGHH(char* to_check, int length)函数的第一个参数是待检测字符串首地址指针,
然后就是字符串的长度,接下来我们就可以开始书写我们的模拟函数
"""
def run(self, to_check, length):
# We can almost copy and paste the solution from the previous challenge.
# Hint: Don't look up the address! It's passed as a parameter.
# (!)
user_input_buffer_address = to_check
user_input_buffer_length = length

# Note the use of self.state to find the state of the system in a
# SimProcedure.
user_input_string = self.state.memory.load(
user_input_buffer_address,
user_input_buffer_length
)

check_against_string = 'ORSDDWXHZURJRBDH'

# Finally, instead of setting eax, we can use a Pythonic return statement
# to return the output of this function.
# Hint: Look at the previous solution.
return claripy.If(
user_input_string == check_against_string,
claripy.BVV(1, 32),
claripy.BVV(0, 32)
)


# Hook the check_equals symbol. Angr automatically looks up the address
# associated with the symbol. Alternatively, you can use 'hook' instead
# of 'hook_symbol' and specify the address of the function. To find the
# correct symbol, disassemble the binary.
# (!)
check_equals_symbol = 'check_equals_ORSDDWXHZURJRBDH' # :string
project.hook_symbol(check_equals_symbol, ReplacementCheckEquals())

simulation = project.factory.simgr(initial_state)

def is_successful(state):
stdout_output = state.posix.dumps(sys.stdout.fileno())
if b'Good Job.\n' in stdout_output:
return True
else:
return False

def should_abort(state):
stdout_output = state.posix.dumps(sys.stdout.fileno())
if b'Try again.\n' in stdout_output:
return True
else:
return False

simulation.explore(find=is_successful, avoid=should_abort)

if simulation.found:
solution_state = simulation.found[0]

solution = solution_state.posix.dumps(0)
print(solution)
else:
raise Exception('Could not find the solution')

if __name__ == '__main__':
main()

11_angr_sim_scanf

这题主要是学习如何hookscanf函数,步骤其实与上一题是几乎一致的,得先找到需要hook的函数符号,然后编写一个继承angr.SimProcedure的类,然后利用hook_symbol对函数进行hook

程序分析

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
int __cdecl main(int argc, const char **argv, const char **envp)
{
int i; // [esp+20h] [ebp-28h]
char s[20]; // [esp+28h] [ebp-20h] BYREF
unsigned int v7; // [esp+3Ch] [ebp-Ch]

v7 = __readgsdword(0x14u);
memset(s, 0, sizeof(s));
qmemcpy(s, "SUQMKQFX", 8);
for ( i = 0; i <= 7; ++i )
s[i] = complex_function(s[i], i);
printf("Enter the password: ");
__isoc99_scanf("%u %u", buffer0, buffer1);
if ( !strncmp(buffer0, s, 4u) && !strncmp(buffer1, &s[4], 4u) )
puts("Good Job.");
else
puts("Try again.");
return 0;
}

exp

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
# This time, the solution involves simply replacing scanf with our own version,
# since Angr does not support requesting multiple parameters with scanf.

import angr
import claripy
import sys

def main():
path_to_binary = './11_angr_sim_scanf'
project = angr.Project(path_to_binary)

initial_state = project.factory.entry_state()

class ReplacementScanf(angr.SimProcedure):
# Finish the parameters to the scanf function. Hint: 'scanf("%u %u", ...)'.
# (!)
def run(self, format_string, scanf0_address, scanf1_address):
scanf0 = claripy.BVS('scanf0', 32)
scanf1 = claripy.BVS('scanf1', 32)

# The scanf function writes user input to the buffers to which the
# parameters point.
# 这里Scanf是要向内存写入数据的,于是我们利用使用 state.memory 的 .store(addr, val) 接口
# 将符号位向量写入两个字符串的内存区域
self.state.memory.store(scanf0_address, scanf0, endness=project.arch.memory_endness)
self.state.memory.store(scanf1_address, scanf1, endness=project.arch.memory_endness)

# Now, we want to 'set aside' references to our symbolic values in the
# globals plugin included by default with a state. You will need to
# store multiple bitvectors. You can either use a list, tuple, or multiple
# keys to reference the different bitvectors.
# (!)
self.state.globals['solution0'] = scanf0
self.state.globals['solution1'] = scanf1

scanf_symbol = '__isoc99_scanf'
project.hook_symbol(scanf_symbol, ReplacementScanf())

simulation = project.factory.simgr(initial_state)

def is_successful(state):
stdout_output = state.posix.dumps(sys.stdout.fileno())
if b'Good Job.\n' in stdout_output:
return True
else:
return False

def should_abort(state):
stdout_output = state.posix.dumps(sys.stdout.fileno())
if b'Try again.\n' in stdout_output:
return True
else:
return False

simulation.explore(find=is_successful, avoid=should_abort)

if simulation.found:
solution_state = simulation.found[0]

# Grab whatever you set aside in the globals dict.
stored_solutions0 = solution_state.globals['solution0']
stored_solutions1 = solution_state.globals['solution1']
solution0 = solution_state.solver.eval(stored_solutions0, cast_to=int)
solution1 = solution_state.solver.eval(stored_solutions1, cast_to=int)
print(hex(solution0) + ' ' + hex(solution1))
else:
raise Exception('Could not find the solution')

if __name__ == '__main__':
main()

这里的关键我们都知道Python的变量生存周期,在这里scanf0scanf1是函数ReplacementScanf的局部变量,为了让函数外部也能获得我们输入的符号位向量,从而调用求解器获得答案,需要将这两个符号位向量变为全局变量,这里我们需要调用带有全局状态的globals插件中“保存”对我们的符号值的引用。globals插件允许使用列表,元组或多个键的字典来存储多个位向量

12_angr_veritesting

主要学习使用Veritesting的技术解决路径爆炸问题。

程序逻辑

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
int __cdecl main(int argc, const char **argv, const char **envp)
{
int v3; // ebx
int v5; // [esp-14h] [ebp-60h]
int v6; // [esp-10h] [ebp-5Ch]
int v7; // [esp-Ch] [ebp-58h]
int v8; // [esp-8h] [ebp-54h]
int v9; // [esp-4h] [ebp-50h]
const char **v10; // [esp+0h] [ebp-4Ch]
int v11; // [esp+4h] [ebp-48h]
int v12; // [esp+8h] [ebp-44h]
int v13; // [esp+Ch] [ebp-40h]
int v14; // [esp+10h] [ebp-3Ch]
int v15; // [esp+10h] [ebp-3Ch]
int v16; // [esp+14h] [ebp-38h]
int i; // [esp+14h] [ebp-38h]
int v18; // [esp+18h] [ebp-34h]
_DWORD v19[9]; // [esp+1Ch] [ebp-30h] BYREF
unsigned int v20; // [esp+40h] [ebp-Ch]
int *p_argc; // [esp+44h] [ebp-8h]

p_argc = &argc;
v10 = argv;
v20 = __readgsdword(0x14u);
memset((char *)v19 + 3, 0, 0x21u);
printf("Enter the password: ");
((void (__stdcall *)(const char *, char *, int, int, int, int, int, const char **, int, int, int, int, int, int, _DWORD))__isoc99_scanf)(
"%32s",
(char *)v19 + 3,
v5,
v6,
v7,
v8,
v9,
v10,
v11,
v12,
v13,
v14,
v16,
v18,
v19[0]);
v15 = 0;
for ( i = 0; i <= 31; ++i )
{
v3 = *((char *)v19 + i + 3);
if ( v3 == complex_function(75, i + 93) )
++v15;
}
if ( v15 != 32 || (_BYTE)v20 )
puts("Try again.");
else
puts("Good Job.");
return 0;
}

Veritesting

动态符号执行(DSE)和静态符号执行(SSE)一个为路径生成公式,一个为语句生成公式。前者生成公式时会产生很高的负载,但生成的公式很容易解;后者生成公式很容易,公式也能覆盖更多的路径,但是公式更长更难解。方法上的区别在于DSE会摘要路径汇合点上两条分支的情况,而SSE为两条分支fork两条独立的执行路径

SSE目前还不能对大规模的程序分析(如Cloud9+state merging),问题主要在于循环的表示、方程复杂度、缺少具体状态、和对syscall等的模拟。Veritesting可以在SSE和DSE之间切换,减少负载和公式求解难度,并解决静态方法需要摘要或其他方法才能处理的系统调用和间接跳转

简单来说就是Veritesting结合了静态符合执行与动态符号执行,减少了路径爆炸的影响,在angr里我们只要在构造模拟管理器时,启用Veritesting了就行。

1
project.factory.simgr(initial_state, veritesting=True)

exp

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
# When you construct a simulation manager, you will want to enable Veritesting:
# project.factory.simgr(initial_state, veritesting=True)
# Hint: use one of the first few levels' solutions as a reference.
import angr
import sys
import claripy

def main():
path_to_binary = './12_angr_veritesting'
project = angr.Project(path_to_binary, auto_load_libs=False)
initial_state = project.factory.entry_state()
simgr = project.factory.simgr(initial_state, veritesting=True)

def is_successful(state):
stdout_output = state.posix.dumps(1)
if b'Good Job.\n' in stdout_output:
return True
else:
return False

def should_abort(state):
stdout_output = state.posix.dumps(1)
if b'Try again.\n' in stdout_output:
return True
else:
return False

simgr.explore(find=is_successful, avoid=should_abort)

if simgr.found:
solution_state = simgr.found[0]

solution = solution_state.posix.dumps(0)
print(solution)
else:
print("No solution found.")

if __name__ == '__main__':
main()

13_angr_static_binary

这题如题就是主要学习如何使用angr解出静态编译的题目,学习如何Hook静态库函数。

程序逻辑

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
int __cdecl main(int argc, const char **argv, const char **envp)
{
int i; // [esp+1Ch] [ebp-3Ch]
int j; // [esp+20h] [ebp-38h]
char v6[20]; // [esp+24h] [ebp-34h] BYREF
char v7[20]; // [esp+38h] [ebp-20h] BYREF
unsigned int v8; // [esp+4Ch] [ebp-Ch]

v8 = __readgsdword(0x14u);
for ( i = 0; i <= 19; ++i )
v7[i] = 0;
qmemcpy(v7, "PYIEFPIC", 8);
printf("Enter the password: ");
_isoc99_scanf("%8s", v6);
for ( j = 0; j <= 7; ++j )
v6[j] = complex_function(v6[j], j);
if ( !strcmp(v6, v7) )
puts("Good Job.");
else
puts("Try again.");
return 0;
}

hook static_bin

通常,Angr会自动地用工作速度快得多的simprocedure代替标准库函数,但是这题中库函数都已经因为静态编译成了静态函数了,angr没法自动替换。要解决这题,需要手动Hook所有使用标准库的C函数,angr已经在simprocedure中为我们提供了这些静态函数。我们只需要手动找到程序中用到静态函数的地址,将其利用simprocedure提供的函数Hook掉即可。

exp

这题解题真正需要用的函数也就是__libc_start_main``printfscnafputs,即完成了angr需要的输出、输入、路径选择的功能。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
# This challenge is the exact same as the first challenge, except that it was
# compiled as a static binary. Normally, Angr automatically replaces standard
# library functions with SimProcedures that work much more quickly.
#
# Here are a few SimProcedures Angr has already written for you. They implement
# standard library functions. You will not need all of them:
# angr.SIM_PROCEDURES['libc']['malloc']
# angr.SIM_PROCEDURES['libc']['fopen']
# angr.SIM_PROCEDURES['libc']['fclose']
# angr.SIM_PROCEDURES['libc']['fwrite']
# angr.SIM_PROCEDURES['libc']['getchar']
# angr.SIM_PROCEDURES['libc']['strncmp']
# angr.SIM_PROCEDURES['libc']['strcmp']
# angr.SIM_PROCEDURES['libc']['scanf']
# angr.SIM_PROCEDURES['libc']['printf']
# angr.SIM_PROCEDURES['libc']['puts']
# angr.SIM_PROCEDURES['libc']['exit']
# angr.SIM_PROCEDURES['glibc']['__libc_start_main']
#
# As a reminder, you can hook functions with something similar to:
# project.hook(malloc_address, angr.SIM_PROCEDURES['libc']['malloc']())
#
# There are many more, see:
# https://github.com/angr/angr/tree/master/angr/procedures/libc
#
# Additionally, note that, when the binary is executed, the main function is not
# the first piece of code called. In the _start function, __libc_start_main is
# called to start your program. The initialization that occurs in this function
# can take a long time with Angr, so you should replace it with a SimProcedure.
# angr.SIM_PROCEDURES['glibc']['__libc_start_main']
# Note 'glibc' instead of 'libc'.
import angr
import sys
import claripy

def main():
# Load the binary
path_to_binary = './13_angr_static_binary'
project = angr.Project(path_to_binary, auto_load_libs=False)

initial_state = project.factory.entry_state()

project.hook(0x804ed40, angr.SIM_PROCEDURES['libc']['printf']())
project.hook(0x804ed80, angr.SIM_PROCEDURES['libc']['scanf']())
project.hook(0x804f350, angr.SIM_PROCEDURES['libc']['puts']())
project.hook(0x8048d10, angr.SIM_PROCEDURES['glibc']['__libc_start_main']())

simgr = project.factory.simulation_manager(initial_state)

def is_successful(state):
stdout_output = state.posix.dumps(1)
if b'Good Job.\n' in stdout_output:
return True
else:
return False

def should_abort(state):
stdout_output = state.posix.dumps(1)
if b'Try again.\n' in stdout_output:
return True
else:
return False

simgr.explore(find=is_successful, avoid=should_abort)

if simgr.found:
solution_state = simgr.found[0]
solution = solution_state.posix.dumps(0)
print(solution)
else:
print("No solution found.")

if __name__ == '__main__':
main()

14_angr_shared_library

这题如题主要是学习如何使用angr求解函数是外部导入在动态库(.so)里的题目,这题我们有了两个文件,一个是主程序14_angr_shared_library,另一个就是库文件lib14_angr_shared_library.so

程序分析

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
int __cdecl main(int argc, const char **argv, const char **envp)
{
char s; // [esp+1Ch] [ebp-1Ch]
unsigned int v5; // [esp+2Ch] [ebp-Ch]

v5 = __readgsdword(0x14u);
memset(&s, 0, 0x10u);
print_msg();
printf("Enter the password: ");
__isoc99_scanf("%8s", &s);
if ( validate(&s, 8) )
puts("Good Job.");
else
puts("Try again.");
return 0;
}
_BOOL4 __cdecl validate(char *s1, int a2)
{
char *v3; // esi
char s2[4]; // [esp+4h] [ebp-24h]
int v5; // [esp+8h] [ebp-20h]
int j; // [esp+18h] [ebp-10h]
int i; // [esp+1Ch] [ebp-Ch]

if ( a2 <= 7 )
return 0;
for ( i = 0; i <= 19; ++i )
s2[i] = 0;
*(_DWORD *)s2 = 'GKLW';
v5 = 'HWJL';
for ( j = 0; j <= 7; ++j )
{
v3 = &s1[j];
*v3 = complex_function(s1[j], j);
}
return strcmp(s1, s2) == 0;
}

exp

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
# The shared library has the function validate, which takes a string and returns
# either true (1) or false (0). The binary calls this function. If it returns
# true, the program prints "Good Job." otherwise, it prints "Try again."
#
# Note: When you run this script, make sure you run it on
# lib14_angr_shared_library.so, not the executable. This level is intended to
# teach how to analyse binary formats that are not typical executables.

import angr
import claripy
import sys

def main():
path_to_binary = './lib14_angr_shared_library.so'

# The shared library is compiled with position-independent code. You will need
# to specify the base address. All addresses in the shared library will be
# base + offset, where offset is their address in the file.
# (!)
base = 0x4000000
project = angr.Project(path_to_binary, load_options={
'main_opts' : {
'custom_base_addr' : base
}
})

# Initialize any symbolic values here; you will need at least one to pass to
# the validate function.
buffer_pointer = claripy.BVV(0x3000000, 32)

# Begin the state at the beginning of the validate function, as if it was
# called by the program. Determine the parameters needed to call validate and
# replace 'parameters...' with bitvectors holding the values you wish to pass.
# Recall that 'claripy.BVV(value, size_in_bits)' constructs a bitvector
# initialized to a single value.
# Remember to add the base value you specified at the beginning to the
# function address!
# Hint: int validate(char* buffer, int length) { ...
# Another hint: the password is 8 bytes long.
# (!)
validate_function_address = base + 0x6d7
initial_state = project.factory.call_state(validate_function_address, buffer_pointer, claripy.BVV(8, 32))

# You will need to add code to inject a symbolic value into the program at the
# end of the function that constrains eax to equal true (value of 1) just
# before the function returns. There are multiple ways to do this:
# 1. Use a hook.
# 2. Search for the address just before the function returns and then
# constrain eax (this may require putting code elsewhere)
password = claripy.BVS('password', 8*8)
initial_state.memory.store(buffer_pointer, password)

simulation = project.factory.simgr(initial_state)

success_address = base + 0x783
simulation.explore(find=success_address)

if simulation.found:
solution_state = simulation.found[0]

# Determine where the program places the return value, and constrain it so
# that it is true. Then, solve for the solution and print it.
# (!)
solution_state.add_constraints(solution_state.regs.eax != 0)
solution = solution_state.solver.eval(password,cast_to=bytes)
print(solution)
else:
raise Exception('Could not find the solution')

if __name__ == '__main__':
main()

15_angr_arbitrary_read

主要是学习如何利用Angr实现内存地址的任意读。

程序逻辑

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
int __cdecl main(int argc, const char **argv, const char **envp)
{
char v4; // [esp+Ch] [ebp-1Ch]
char *s; // [esp+1Ch] [ebp-Ch]

s = try_again;
print_msg();
printf("Enter the password: ");
__isoc99_scanf("%u %20s", &key, &v4);
if ( key == 19511649 )
puts(s);
else
puts(try_again);
return 0;
}
//.rodata:484F4A47 0000000A C Good Job.

exp

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
# This binary takes both an integer and a string as a parameter. A certain
# integer input causes the program to reach a buffer overflow with which we can
# read a string from an arbitrary memory location. Our goal is to use Angr to
# search the program for this buffer overflow and then automatically generate
# an exploit to read the string "Good Job."
#
# What is the point of reading the string "Good Job."?
# This CTF attempts to replicate a simplified version of a possible vulnerability
# where a user can exploit the program to print a secret, such as a password or
# a private key. In order to keep consistency with the other challenges and to
# simplify the challenge, the goal of this program will be to print "Good Job."
# instead.
#
# The general strategy for crafting this script will be to:
# 1) Search for calls of the 'puts' function, which will eventually be exploited
# to print out "Good Job."
# 2) Determine if the first parameter of 'puts', a pointer to the string to be
# printed, can be controlled by the user to be set to the location of the
# "Good Job." string.
# 3) Solve for the input that prints "Good Job."
#
# Note: The script is structured to implement step #2 before #1.

# Some of the source code for this challenge:
#
# #include <stdio.h>
# #include <stdlib.h>
# #include <string.h>
# #include <stdint.h>
#
# // This will all be in .rodata
# char msg[] = "${ description }$";
# char* try_again = "Try again.";
# char* good_job = "Good Job.";
# uint32_t key;
#
# void print_msg() {
# printf("%s", msg);
# }
#
# uint32_t complex_function(uint32_t input) {
# ...
# }
#
# struct overflow_me {
# char buffer[16];
# char* to_print;
# };
#
# int main(int argc, char* argv[]) {
# struct overflow_me locals;
# locals.to_print = try_again;
#
# print_msg();
#
# printf("Enter the password: ");
# scanf("%u %20s", &key, locals.buffer);
#
# key = complex_function(key);
#
# switch (key) {
# case ?:
# puts(try_again);
# break;
#
# ...
#
# case ?:
# // Our goal is to trick this call to puts to print the "secret
# // password" (which happens, in our case, to be the string
# // "Good Job.")
# puts(locals.to_print);
# break;
#
# ...
# }
#
# return 0;
# }

import angr
import claripy
import sys

def main():
path_to_binary = './15_angr_arbitrary_read'
project = angr.Project(path_to_binary)

# You can either use a blank state or an entry state; just make sure to start
# at the beginning of the program.
# (!)
initial_state = initial_state = project.factory.entry_state()

# Again, scanf needs to be replaced.
class ReplacementScanf(angr.SimProcedure):
# Hint: scanf("%u %20s")
def run(self, format_string, param0, param1):
# %u
scanf0 = claripy.BVS('scanf0', 32)

# %20s
scanf1 = claripy.BVS('scanf1', 20*8)

# The bitvector.chop(bits=n) function splits the bitvector into a Python
# list containing the bitvector in segments of n bits each. In this case,
# we are splitting them into segments of 8 bits (one byte.)
for char in scanf1.chop(bits=8):
# Ensure that each character in the string is printable. An interesting
# experiment, once you have a working solution, would be to run the code
# without constraining the characters to the capital letters.
# Even though the solution will technically work without this, it's more
# difficult to enter in a solution that contains character you can't
# copy, paste, or type into your terminal or the web form that checks
# your solution.
# If you are using the web form to submit answers, your solution must be
# entirely alphanumeric except for spaces.
# (!)
self.state.add_constraints(char >= 'A', char <= 'Z')

# Warning: Endianness only applies to integers. If you store a string in
# memory and treat it as a little-endian integer, it will be backwards.
scanf0_address = param0
self.state.memory.store(scanf0_address, scanf0, endness=project.arch.memory_endness)
scanf1_address = param1
self.state.memory.store(scanf1_address, scanf1, endness=project.arch.memory_endness)

self.state.globals['solutions'] = (scanf0,scanf1)

scanf_symbol = '__isoc99_scanf' # :string
project.hook_symbol(scanf_symbol, ReplacementScanf())

# We will call this whenever puts is called. The goal of this function is to
# determine if the pointer passed to puts is controllable by the user, such
# that we can rewrite it to point to the string "Good Job."
def check_puts(state):
# Recall that puts takes one parameter, a pointer to the string it will
# print. If we load that pointer from memory, we can analyse it to determine
# if it can be controlled by the user input in order to point it to the
# location of the "Good Job." string.
#
# Treat the implementation of this function as if puts was just called.
# The stack, registers, memory, etc should be set up as if the x86 call
# instruction was just invoked (but, of course, the function hasn't copied
# the buffers yet.)
# The stack will look as follows:
# ...
# esp + 7 -> /----------------\
# esp + 6 -> | puts |
# esp + 5 -> | parameter |
# esp + 4 -> \----------------/
# esp + 3 -> /----------------\
# esp + 2 -> | return |
# esp + 1 -> | address |
# esp -> \----------------/
#
# Hint: Look at level 08, 09, or 10 to review how to load a value from a
# memory address. Remember to use the correct endianness in the future when
# loading integers; it has been included for you here.
# (!)
puts_parameter = state.memory.load(state.regs.esp + 4, 4, endness=project.arch.memory_endness)

# The following function takes a bitvector as a parameter and checks if it
# can take on more than one value. While this does not necessary tell us we
# have found an exploitable state, it is a strong indication that the
# bitvector we checked may be controllable by the user.
# Use it to determine if the pointer passed to puts is symbolic.
# (!)
if state.se.symbolic(puts_parameter):
# Determine the location of the "Good Job." string. We want to print it
# out, and we will do so by attempting to constrain the puts parameter to
# equal it. (Hint: look at .rodata).
# Hint: use 'objdump -s <binary>' to look for the string's address.
# (!)
good_job_string_address = 0x484F4A47 # :integer, probably hexadecimal

# Create an expression that will test if puts_parameter equals
# good_job_string_address. If we add this as a constraint to our solver,
# it will try and find an input to make this expression true. Take a look
# at level 08 to remind yourself of the syntax of this.
# (!)
is_vulnerable_expression = puts_parameter == good_job_string_address # :boolean bitvector expression

# Have Angr evaluate the state to determine if all the constraints can
# be met, including the one we specified above. If it can be satisfied,
# we have found our exploit!
#
# When doing this, however, we do not want to edit our state in case we
# have not yet found what we are looking for. To test if our expression
# is satisfiable without editing the original, we need to clone the state.
copied_state = state.copy()

# We can now play around with the copied state without changing the
# original. We need to add our vulnerable expression as a state to test it.
# Look at level 08 and compare this call to how it is called there.
copied_state.add_constraints(is_vulnerable_expression)

# Finally, we test if we can satisfy the constraints of the state.
if copied_state.satisfiable():
# Before we return, let's add the constraint to the solver for real.
state.add_constraints(is_vulnerable_expression)
return True
else:
return False
else: # not state.se.symbolic(???)
return False

simulation = project.factory.simgr(initial_state)

# In order to determine if we have found a vulnerable call to 'puts', we need
# to run the function check_puts (defined above) whenever we reach a 'puts'
# call. To do this, we will look for the place where the instruction pointer,
# state.addr, is equal to the beginning of the puts function.
def is_successful(state):
# We are looking for puts. Check that the address is at the (very) beginning
# of the puts function. Warning: while, in theory, you could look for
# any address in puts, if you execute any instruction that adjusts the stack
# pointer, the stack diagram above will be incorrect. Therefore, it is
# recommended that you check for the very beginning of puts.
# (!)
puts_address = 0x8048370
if state.addr == puts_address:
# Return True if we determine this call to puts is exploitable.
return check_puts(state)
else:
# We have not yet found a call to puts; we should continue!
return False

simulation.explore(find=is_successful)

if simulation.found:
solution_state = simulation.found[0]
(scanf0, scanf1) = solution_state.globals['solutions']
solution0 = (solution_state.solver.eval(scanf0,cast_to=bytes))
solution1 = (solution_state.solver.eval(scanf1,cast_to=bytes))
print(solution0 + b' ' + solution1)
else:
raise Exception('Could not find the solution')

if __name__ == '__main__':
main()

16_angr_arbitrary_write

学习如何任意写。

程序分析

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
int __cdecl main(int argc, const char **argv, const char **envp)
{
char s[16]; // [esp+Ch] [ebp-1Ch] BYREF
char *dest; // [esp+1Ch] [ebp-Ch]

dest = unimportant_buffer;
memset(s, 0, sizeof(s));
strncpy(password_buffer, "PASSWORD", 0xCu);
printf("Enter the password: ");
__isoc99_scanf("%u %20s", &key, s);
if ( key == 0xB11403 )
strncpy(dest, s, 0x10u);
else
strncpy(unimportant_buffer, s, 0x10u);
if ( !strncmp(password_buffer, "NDYNWEUJ", 8u) )
puts("Good Job.");
else
puts("Try again.");
return 0;
}

exp

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
# Essentially, the program does the following:
#
# scanf("%d %20s", &key, user_input);
# ...
# // if certain unknown conditions are true...
# strncpy(random_buffer, user_input);
# ...
# if (strncmp(secure_buffer, reference_string)) {
# // The secure_buffer does not equal the reference string.
# puts("Try again.");
# } else {
# // The two are equal.
# puts("Good Job.");
# }
#
# If this program has no bugs in it, it would _always_ print "Try again." since
# user_input copies into random_buffer, not secure_buffer.
#
# The question is: can we find a buffer overflow that will allow us to overwrite
# the random_buffer pointer to point to secure_buffer? (Spoiler: we can, but we
# will need to use Angr.)
#
# We want to identify a place in the binary, when strncpy is called, when we can:
# 1) Control the source contents (not the source pointer!)
# * This will allow us to write arbitrary data to the destination.
# 2) Control the destination pointer
# * This will allow us to write to an arbitrary location.
# If we can meet both of those requirements, we can write arbitrary data to an
# arbitrary location. Finally, we need to contrain the source contents to be
# equal to the reference_string and the destination pointer to be equal to the
# secure_buffer.

import angr
import claripy
import sys

def main():
path_to_binary = './16_angr_arbitrary_write'
project = angr.Project(path_to_binary)

# You can either use a blank state or an entry state; just make sure to start
# at the beginning of the program.
initial_state = project.factory.entry_state()

class ReplacementScanf(angr.SimProcedure):
# Hint: scanf("%u %20s")
def run(self, format_string, param0, param1):
# %u
scanf0 = claripy.BVS('scanf0', 32)

# %20s
scanf1 = claripy.BVS('scanf1', 20*8)

for char in scanf1.chop(bits=8):
self.state.add_constraints(char >= 'A', char <= 'Z')

scanf0_address = param0
self.state.memory.store(scanf0_address, scanf0, endness=project.arch.memory_endness)
scanf1_address = param1
self.state.memory.store(scanf1_address, scanf1, endness=project.arch.memory_endness)

self.state.globals['solutions'] = (scanf0, scanf1)

scanf_symbol = '__isoc99_scanf' # :string
project.hook_symbol(scanf_symbol, ReplacementScanf())

# In this challenge, we want to check strncpy to determine if we can control
# both the source and the destination. It is common that we will be able to
# control at least one of the parameters, (such as when the program copies a
# string that it received via stdin).
def check_strncpy(state):
# The stack will look as follows:
# ... ________________
# esp + 15 -> / \
# esp + 14 -> | param2 |
# esp + 13 -> | len |
# esp + 12 -> \________________/
# esp + 11 -> / \
# esp + 10 -> | param1 |
# esp + 9 -> | src |
# esp + 8 -> \________________/
# esp + 7 -> / \
# esp + 6 -> | param0 |
# esp + 5 -> | dest |
# esp + 4 -> \________________/
# esp + 3 -> / \
# esp + 2 -> | return |
# esp + 1 -> | address |
# esp -> \________________/
# (!)
strncpy_src = state.memory.load(state.regs.esp + 8, 4, endness=project.arch.memory_endness)
strncpy_dest = state.memory.load(state.regs.esp + 4, 4, endness=project.arch.memory_endness)
strncpy_len = state.memory.load(state.regs.esp + 12, 4, endness=project.arch.memory_endness)

# We need to find out if src is symbolic, however, we care about the
# contents, rather than the pointer itself. Therefore, we have to load the
# the contents of src to determine if they are symbolic.
# Hint: How many bytes is strncpy copying?
# (!)
src_contents = state.memory.load(strncpy_src, strncpy_len)

# Our goal is to determine if we can write arbitrary data to an arbitrary
# location. This means determining if the source contents are symbolic
# (arbitrary data) and the destination pointer is symbolic (arbitrary
# destination).
# (!)
if state.solver.symbolic(src_contents) and state.solver.symbolic(strncpy_dest):
# Use ltrace to determine the reference string. Decompile the binary to
# determine the address of the buffer it checks the password against. Our
# goal is to overwrite that buffer to store the password.
# (!)
password_string = 'NDYNWEUJ' # :string
buffer_address = 0x57584344 # :integer, probably in hexadecimal

# Create an expression that tests if the first n bytes is length. Warning:
# while typical Python slices (array[start:end]) will work with bitvectors,
# they are indexed in an odd way. The ranges must start with a high value
# and end with a low value. Additionally, the bits are indexed from right
# to left. For example, let a bitvector, b, equal 'ABCDEFGH', (64 bits).
# The following will read bit 0-7 (total of 1 byte) from the right-most
# bit (the end of the string).
# b[7:0] == 'H'
# To access the beginning of the string, we need to access the last 16
# bits, or bits 48-63:
# b[63:48] == 'AB'
# In this specific case, since we don't necessarily know the length of the
# contents (unless you look at the binary), we can use the following:
# b[-1:-16] == 'AB', since, in Python, -1 is the end of the list, and -16
# is the 16th element from the end of the list. The actual numbers should
# correspond with the length of password_string.
# (!)
does_src_hold_password = src_contents[-1:-64] == password_string

# Create an expression to check if the dest parameter can be set to
# buffer_address. If this is true, then we have found our exploit!
# (!)
does_dest_equal_buffer_address = strncpy_dest == buffer_address

# In the previous challenge, we copied the state, added constraints to the
# copied state, and then determined if the constraints of the new state
# were satisfiable. Since that pattern is so common, Angr implemented a
# parameter 'extra_constraints' for the satisfiable function that does the
# exact same thing:
if state.satisfiable(extra_constraints=(does_src_hold_password, does_dest_equal_buffer_address)):
state.add_constraints(does_src_hold_password, does_dest_equal_buffer_address)
return True
else:
return False
else: # not state.se.symbolic(???)
return False

simulation = project.factory.simgr(initial_state)

def is_successful(state):
strncpy_address = 0x8048410
if state.addr == strncpy_address:
return check_strncpy(state)
else:
return False

simulation.explore(find=is_successful)

if simulation.found:
solution_state = simulation.found[0]
scanf0, scanf1 = solution_state.globals['solutions']
solution0 = (solution_state.solver.eval(scanf0))
solution1 = (solution_state.solver.eval(scanf1,cast_to=bytes))
print("[+] Success! Solution is: {0} {1}".format(solution0, solution1))
else:
raise Exception('Could not find the solution')

if __name__ == '__main__':
main()

17_angr_arbitrary_jump

这题主要是学会任意地址跳转,即利用Angr处理无约束状态。

程序逻辑

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
int __cdecl main(int argc, const char **argv, const char **envp)
{
printf("Enter the password: ");
read_input();
puts("Try again.");
return 0;
}
int read_input()
{
char v1[32]; // [esp+28h] [ebp-20h] BYREF

return __isoc99_scanf("%s", v1);
}
void __noreturn print_good() // 0x42585249
{
puts("Good Job.");
exit(0);
}

exp

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
import angr
import claripy

def main():
path_to_binary = "./17_angr_arbitrary_jump"
proj = angr.Project(path_to_binary)

class SimScanfProcedure(angr.SimProcedure):
def run(self, fmtstr, input_addr):
input_bvs = claripy.BVS('input_addr', 200 * 8)
for chr in input_bvs.chop(bits = 8):
self.state.add_constraints(chr >= 'A', chr <= 'Z')
self.state.memory.store(input_addr, input_bvs)
self.state.globals['input_val'] = input_bvs

proj.hook_symbol('__isoc99_scanf', SimScanfProcedure())
init_state = proj.factory.entry_state()
# 不丢弃unconstrained中的state
simgr = proj.factory.simgr(init_state,
save_unconstrained=True,
stashes={
'active': [init_state],
'unconstrained': [],
'found': [],
})

# 下一条指令是print_good的情况下有解的state符合条件
def filter_func(state):
print_good_addr = 0x42585249
return state.satisfiable(
extra_constraints=(state.regs.eip == print_good_addr,))

while not simgr.found:
# 如果没有可执行的state,或者没找到unconstrained的state,就退出
if (not simgr.active) and (not simgr.unconstrained):
break

# 把符合filter_func的unconstrained转移到found中
simgr.move(from_stash='unconstrained',
to_stash='found',
filter_func=filter_func)

simgr.step()

if simgr.found:

solution_state = simgr.found[0]
print_good_addr = 0x42585249
solution_state.add_constraints(solution_state.regs.eip == print_good_addr)
input_val = solution_state.solver.eval(solution_state.globals['input_val'],
cast_to=bytes)

print('password: {}'.format(input_val))
else:
raise Exception('Could not find the solution!')

if __name__ == '__main__':
main()
  • Title: angr入门
  • Author: 韩乔落
  • Created at : 2024-07-15 15:18:14
  • Updated at : 2024-07-18 16:41:00
  • Link: https://jelasin.github.io/2024/07/15/angr入门/
  • License: This work is licensed under CC BY-NC-SA 4.0.
Comments