前言
这是南大软件分析课程的第五个作业,需要完成非上下文敏感的过程间指针分析。详细实验说明见官方网站。涉及到的主要课程知识如下:
虽然这是第一个指针分析相关的动手实践,一开始上手可能需要一段时间才能进入状态,但是个人感觉它实际上并不是很难,需要考虑的边缘情况也不多,认真把算法实现即可。在本地测试通过后,我的代码也通过了OJ平台上的所有测试用例。
除了课上学的指针分析相关知识外,这个实验本身还引入了两个有意思的东西:在分析范围上,增加了对静态字段、数组索引和静态方法调用的支持;介绍并采用访问者模式来设计部分功能。
如果一开始有些东西没想明白,可以参考gonghr的博客,质量很高。另外,这个博客分享了A1到A5共五个实验的OJ满分源码,可以用来对比分析自己写的代码存在的问题。
实验用到的过程间指针分析算法如下图所示:
新增的分析规则
我们在第10课中的分析规则表的基础上新增针对静态字段、数组索引和静态方法调用的规则,在这里给出一张完整的表,方便参考:
种类 | 语句 | 规则 | PFG边 |
---|---|---|---|
New | i: x = new T() |
$\frac{}{o_{i} \in pt(x)}$ | N/A |
Assign | x = y |
$\frac{o_{i} \in pt(y)}{o_{i} \in pt(x)}$ | $x \leftarrow y$ |
Store | x.f = y |
$\frac{o_{i} \in pt(x),\ o_{j} \in pt(y)}{o_{j} \in pt(o_{i}.f)}$ | $o_{i}.f \leftarrow y$ |
Load | y = x.f |
$\frac{o_{i} \in pt(x),\ o_{j} \in pt(o_{i}.f)}{o_{j} \in pt(y)}$ | $y \leftarrow o_{i}.f$ |
Call | l: r = x.k(a1, ..., an) |
||
Static Store | T.f = y |
$\frac{o_{i} \in pt(y)}{o_{i} \in pt(T.f)}$ | $T.f \leftarrow y$ |
Static Load | y = T.f |
$\frac{o_{i} \in pt(T.f)}{o_{i} \in pt(y)}$ | $y \leftarrow T.f$ |
Array Store | x[i] = y |
$\frac{o_{u} \in pt(x),\ o_{v} \in pt(y)}{o_{v} \in pt(o_{u}[*])}$ | $o_{u}[*] \leftarrow y$ |
Array Load | y = x[i] |
$\frac{o_{u} \in pt(x),\ o_{v} \in pt(o_{u}[*])}{o_{v} \in pt(y)}$ | $y \leftarrow o_{u}[*]$ |
Static Call | l: r = T.m(a1, ..., an) |
实验内容
接下来,结合算法和实验手册分步实现即可。
void addReachable(JMethod)
这个方法实现了算法伪代码中的AddReachable
函数。这里我们采用了“访问者模式”设计实现,addReachable
会遍历方法的所有语句:
private void addReachable(JMethod method) {
if (callGraph.contains(method)) {
return;
}
callGraph.addReachableMethod(method);
method.getIR().forEach(stmt -> {
stmt.accept(stmtProcessor);
});
}
核心逻辑实现代码位于StmtProcessor
中:
private class StmtProcessor implements StmtVisitor<Void> {
@Override
public Void visit(Invoke stmt) {
if (stmt.isStatic()) { // static invocation
JMethod m = resolveCallee(null, stmt);
Edge<Invoke, JMethod> edge = new Edge<>(CallKind.STATIC, stmt, m);
if (callGraph.addEdge(edge)) { // the call graph changes
addReachable(m);
for (int i = 0; i < m.getParamCount(); i++) {
Var arg = stmt.getRValue().getArg(i);
Var param = m.getIR().getParam(i);
addPFGEdge(pointerFlowGraph.getVarPtr(arg), pointerFlowGraph.getVarPtr(param));
}
Var lv = stmt.getLValue();
if (lv != null) {
for (Var ret: m.getIR().getReturnVars()) {
addPFGEdge(pointerFlowGraph.getVarPtr(ret), pointerFlowGraph.getVarPtr(lv));
}
}
}
}
return null;
}
public Void visit(New stmt) {
VarPtr p = pointerFlowGraph.getVarPtr(stmt.getLValue());
Obj o = heapModel.getObj(stmt);
PointsToSet objects = new PointsToSet(o);
workList.addEntry(p, objects);
return null;
}
public Void visit(Copy stmt) {
addPFGEdge(pointerFlowGraph.getVarPtr(stmt.getRValue()), pointerFlowGraph.getVarPtr(stmt.getLValue()));
return null;
}
public Void visit(StoreField stmt) {
JField field = stmt.getFieldRef().resolve();
if (field.isStatic()) {
addPFGEdge(pointerFlowGraph.getVarPtr(stmt.getRValue()), pointerFlowGraph.getStaticField(field));
}
return null;
}
public Void visit(LoadField stmt) {
JField field = stmt.getFieldRef().resolve();
if (field.isStatic()) {
addPFGEdge(pointerFlowGraph.getStaticField(field), pointerFlowGraph.getVarPtr(stmt.getLValue()));
}
return null;
}
public Void visit(StoreArray stmt) {
return StmtVisitor.super.visit(stmt);
}
public Void visit(LoadArray stmt) {
return StmtVisitor.super.visit(stmt);
}
}
void addPFGEdge(Pointer, Pointer)
这里对应算法的AddEdge
函数,也比较简单:
private void addPFGEdge(Pointer source, Pointer target) {
if (pointerFlowGraph.addEdge(source, target)) {
if (!source.getPointsToSet().isEmpty()) {
workList.addEntry(target, source.getPointsToSet());
}
}
}
void analyze()
这里对应Solve
函数的主要部分——while循环:
private void analyze() {
while (!workList.isEmpty()) {
WorkList.Entry entry = workList.pollEntry();
PointsToSet delta = propagate(entry.pointer(), entry.pointsToSet());
if (entry.pointer() instanceof VarPtr varptr) {
Var var = varptr.getVar();
delta.forEach(obj -> {
var.getStoreFields().forEach(stmt -> {
VarPtr rv = pointerFlowGraph.getVarPtr(stmt.getRValue());
JField f = stmt.getFieldRef().resolve();
addPFGEdge(rv, pointerFlowGraph.getInstanceField(obj, f));
});
var.getLoadFields().forEach(stmt -> {
VarPtr lv = pointerFlowGraph.getVarPtr(stmt.getLValue());
JField f = stmt.getFieldRef().resolve();
addPFGEdge(pointerFlowGraph.getInstanceField(obj, f), lv);
});
var.getStoreArrays().forEach(stmt -> {
VarPtr rv = pointerFlowGraph.getVarPtr(stmt.getRValue());
ArrayIndex ai = pointerFlowGraph.getArrayIndex(obj);
addPFGEdge(rv, ai);
});
var.getLoadArrays().forEach(stmt -> {
VarPtr lv = pointerFlowGraph.getVarPtr(stmt.getLValue());
ArrayIndex ai = pointerFlowGraph.getArrayIndex(obj);
addPFGEdge(ai, lv);
});
processCall(var, obj);
});
}
}
}
PointsToSet propagate(Pointer,PointsToSet)
这里对应算法的Propagate
函数。需要注意的是,差集的计算也被放入了这个函数:
private PointsToSet propagate(Pointer pointer, PointsToSet pointsToSet) {
PointsToSet delta = new PointsToSet();
pointsToSet.forEach(obj -> {
if (!pointer.getPointsToSet().contains(obj)) {
delta.addObject(obj);
}
});
if (!delta.isEmpty()) {
delta.forEach(obj -> {
pointer.getPointsToSet().addObject(obj);
});
Set<Pointer> succ = pointerFlowGraph.getSuccsOf(pointer);
succ.forEach(pointer1 -> {
workList.addEntry(pointer1, delta);
});
}
return delta;
}
void processCall(Var,Obj)
最后是算法中ProcessCall
函数的实现:
private void processCall(Var var, Obj recv) {
var.getInvokes().forEach(stmt -> {
JMethod m = resolveCallee(recv, stmt);
VarPtr vp = pointerFlowGraph.getVarPtr(m.getIR().getThis());
workList.addEntry(vp, new PointsToSet(recv));
Edge<Invoke, JMethod> edge = null;
if (stmt.isDynamic()) {
edge = new Edge<>(CallKind.DYNAMIC, stmt, m);
} else if (stmt.isInterface()) {
edge = new Edge<>(CallKind.INTERFACE, stmt, m);
} else if (stmt.isSpecial()) {
edge = new Edge<>(CallKind.SPECIAL, stmt, m);
} else if (stmt.isVirtual()) {
edge = new Edge<>(CallKind.VIRTUAL, stmt, m);
}
if (callGraph.addEdge(edge)) {
addReachable(m);
for (int i = 0; i < m.getParamCount(); i++) {
Var arg = stmt.getRValue().getArg(i);
Var param = m.getIR().getParam(i);
addPFGEdge(pointerFlowGraph.getVarPtr(arg), pointerFlowGraph.getVarPtr(param));
}
Var lv = stmt.getLValue();
if (lv != null) {
for (Var ret: m.getIR().getReturnVars()) {
addPFGEdge(pointerFlowGraph.getVarPtr(ret), pointerFlowGraph.getVarPtr(lv));
}
}
}
});
}
测试用例
下面是一个本地测试用例:
class StaticField {
public static void main(String[] args) {
A.b = new B();
B b = A.b;
}
}
class A {
static B b;
}
class B {
}
我们来看一看前面程序分析的结果到底长什么样:
Points-to sets of all variables
<B: void <init>()>/%this -> [NewObj{<StaticField: void main(java.lang.String[])>[0@L4] new B}]
<StaticField: void main(java.lang.String[])>/b -> [NewObj{<StaticField: void main(java.lang.String[])>[0@L4] new B}]
<StaticField: void main(java.lang.String[])>/temp$0 -> [NewObj{<StaticField: void main(java.lang.String[])>[0@L4] new B}]
<java.lang.Object: void <init>()>/%this -> [NewObj{<StaticField: void main(java.lang.String[])>[0@L4] new B}]
Points-to sets of all static fields
<A: B b> -> [NewObj{<StaticField: void main(java.lang.String[])>[0@L4] new B}]
Points-to sets of all instance fields
Points-to sets of all array indexes
总结与思考
时隔一年,恢复之前中断的程序分析学习,再一次感受到这门课的质量之高。另外,确确实实,之前的笔记和实验记录给了我快速“恢复函数执行现场”的机会,节省了不少时间。