前言

这是南大软件分析课程的第五个作业,需要完成非上下文敏感的过程间指针分析。详细实验说明见官方网站。涉及到的主要课程知识如下:

虽然这是第一个指针分析相关的动手实践,一开始上手可能需要一段时间才能进入状态,但是个人感觉它实际上并不是很难,需要考虑的边缘情况也不多,认真把算法实现即可。在本地测试通过后,我的代码也通过了OJ平台上的所有测试用例。

除了课上学的指针分析相关知识外,这个实验本身还引入了两个有意思的东西:在分析范围上,增加了对静态字段、数组索引和静态方法调用的支持;介绍并采用访问者模式来设计部分功能。

如果一开始有些东西没想明白,可以参考gonghr的博客,质量很高。另外,这个博客分享了A1到A5共五个实验的OJ满分源码,可以用来对比分析自己写的代码存在的问题。

实验用到的过程间指针分析算法如下图所示:

Xnip2023-03-29_22-52-32

新增的分析规则

我们在第10课中的分析规则表的基础上新增针对静态字段、数组索引和静态方法调用的规则,在这里给出一张完整的表,方便参考:

种类 语句 规则 PFG边
New i: x = new T() $\frac{}{o_{i} \in pt(x)}$ N/A
Assign x = y $\frac{o_{i} \in pt(y)}{o_{i} \in pt(x)}$ $x \leftarrow y$
Store x.f = y $\frac{o_{i} \in pt(x),\ o_{j} \in pt(y)}{o_{j} \in pt(o_{i}.f)}$ $o_{i}.f \leftarrow y$
Load y = x.f $\frac{o_{i} \in pt(x),\ o_{j} \in pt(o_{i}.f)}{o_{j} \in pt(y)}$ $y \leftarrow o_{i}.f$
Call l: r = x.k(a1, ..., an) image-2023032920150796 image-2023032920153346
Static Store T.f = y $\frac{o_{i} \in pt(y)}{o_{i} \in pt(T.f)}$ $T.f \leftarrow y$
Static Load y = T.f $\frac{o_{i} \in pt(T.f)}{o_{i} \in pt(y)}$ $y \leftarrow T.f$
Array Store x[i] = y $\frac{o_{u} \in pt(x),\ o_{v} \in pt(y)}{o_{v} \in pt(o_{u}[*])}$ $o_{u}[*] \leftarrow y$
Array Load y = x[i] $\frac{o_{u} \in pt(x),\ o_{v} \in pt(o_{u}[*])}{o_{v} \in pt(y)}$ $y \leftarrow o_{u}[*]$
Static Call l: r = T.m(a1, ..., an) image-20240410224753572 image-20240410224822885

实验内容

接下来,结合算法和实验手册分步实现即可。

void addReachable(JMethod)

这个方法实现了算法伪代码中的AddReachable函数。这里我们采用了“访问者模式”设计实现,addReachable会遍历方法的所有语句:

private void addReachable(JMethod method) {
    if (callGraph.contains(method)) {
        return;
    }
    callGraph.addReachableMethod(method);
    method.getIR().forEach(stmt -> {
        stmt.accept(stmtProcessor);
    });
}

核心逻辑实现代码位于StmtProcessor中:

private class StmtProcessor implements StmtVisitor<Void> {
    @Override
    public Void visit(Invoke stmt) {
        if (stmt.isStatic()) { // static invocation
            JMethod m = resolveCallee(null, stmt);
            Edge<Invoke, JMethod> edge = new Edge<>(CallKind.STATIC, stmt, m);
            if (callGraph.addEdge(edge)) { // the call graph changes
                addReachable(m);
                for (int i = 0; i < m.getParamCount(); i++) {
                    Var arg = stmt.getRValue().getArg(i);
                    Var param = m.getIR().getParam(i);
                    addPFGEdge(pointerFlowGraph.getVarPtr(arg), pointerFlowGraph.getVarPtr(param));
                }
                Var lv = stmt.getLValue();
                if (lv != null) {
                    for (Var ret: m.getIR().getReturnVars()) {
                        addPFGEdge(pointerFlowGraph.getVarPtr(ret), pointerFlowGraph.getVarPtr(lv));
                    }
                }
            }
        }
        return null;
    }
    public Void visit(New stmt) {
        VarPtr p = pointerFlowGraph.getVarPtr(stmt.getLValue());
        Obj o = heapModel.getObj(stmt);
        PointsToSet objects = new PointsToSet(o);
        workList.addEntry(p, objects);
        return null;
    }
    public Void visit(Copy stmt) {
        addPFGEdge(pointerFlowGraph.getVarPtr(stmt.getRValue()), pointerFlowGraph.getVarPtr(stmt.getLValue()));
        return null;
    }
    public Void visit(StoreField stmt) {
        JField field = stmt.getFieldRef().resolve();
        if (field.isStatic()) {
            addPFGEdge(pointerFlowGraph.getVarPtr(stmt.getRValue()), pointerFlowGraph.getStaticField(field));
        }
        return null;
    }
    public Void visit(LoadField stmt) {
        JField field = stmt.getFieldRef().resolve();
        if (field.isStatic()) {
            addPFGEdge(pointerFlowGraph.getStaticField(field), pointerFlowGraph.getVarPtr(stmt.getLValue()));
        }
        return null;
    }
    public Void visit(StoreArray stmt) {
        return StmtVisitor.super.visit(stmt);
    }
    public Void visit(LoadArray stmt) {
        return StmtVisitor.super.visit(stmt);
    }
}

void addPFGEdge(Pointer, Pointer)

这里对应算法的AddEdge函数,也比较简单:

private void addPFGEdge(Pointer source, Pointer target) {
    if (pointerFlowGraph.addEdge(source, target)) {
        if (!source.getPointsToSet().isEmpty()) {
            workList.addEntry(target, source.getPointsToSet());
        }
    }
}

void analyze()

这里对应Solve函数的主要部分——while循环:

private void analyze() {
    while (!workList.isEmpty()) {
        WorkList.Entry entry = workList.pollEntry();
        PointsToSet delta = propagate(entry.pointer(), entry.pointsToSet());
        if (entry.pointer() instanceof VarPtr varptr) {
            Var var = varptr.getVar();
            delta.forEach(obj -> {
                var.getStoreFields().forEach(stmt -> {
                    VarPtr rv = pointerFlowGraph.getVarPtr(stmt.getRValue());
                    JField f = stmt.getFieldRef().resolve();
                    addPFGEdge(rv, pointerFlowGraph.getInstanceField(obj, f));
                });
                var.getLoadFields().forEach(stmt -> {
                    VarPtr lv = pointerFlowGraph.getVarPtr(stmt.getLValue());
                    JField f = stmt.getFieldRef().resolve();
                    addPFGEdge(pointerFlowGraph.getInstanceField(obj, f), lv);
                });
                var.getStoreArrays().forEach(stmt -> {
                    VarPtr rv = pointerFlowGraph.getVarPtr(stmt.getRValue());
                    ArrayIndex ai = pointerFlowGraph.getArrayIndex(obj);
                    addPFGEdge(rv, ai);
                });
                var.getLoadArrays().forEach(stmt -> {
                    VarPtr lv = pointerFlowGraph.getVarPtr(stmt.getLValue());
                    ArrayIndex ai = pointerFlowGraph.getArrayIndex(obj);
                    addPFGEdge(ai, lv);
                });
                processCall(var, obj);
            });
        }
    }
}

PointsToSet propagate(Pointer,PointsToSet)

这里对应算法的Propagate函数。需要注意的是,差集的计算也被放入了这个函数:

private PointsToSet propagate(Pointer pointer, PointsToSet pointsToSet) {
    PointsToSet delta = new PointsToSet();
    pointsToSet.forEach(obj -> {
        if (!pointer.getPointsToSet().contains(obj)) {
            delta.addObject(obj);
        }
    });
    if (!delta.isEmpty()) {
        delta.forEach(obj -> {
            pointer.getPointsToSet().addObject(obj);
        });
        Set<Pointer> succ = pointerFlowGraph.getSuccsOf(pointer);
        succ.forEach(pointer1 -> {
            workList.addEntry(pointer1, delta);
        });
    }
    return delta;
}

void processCall(Var,Obj)

最后是算法中ProcessCall函数的实现:

private void processCall(Var var, Obj recv) {
    var.getInvokes().forEach(stmt -> {
        JMethod m = resolveCallee(recv, stmt);
        VarPtr vp = pointerFlowGraph.getVarPtr(m.getIR().getThis());
        workList.addEntry(vp, new PointsToSet(recv));
        Edge<Invoke, JMethod> edge = null;
        if (stmt.isDynamic()) {
            edge = new Edge<>(CallKind.DYNAMIC, stmt, m);
        } else if (stmt.isInterface()) {
            edge = new Edge<>(CallKind.INTERFACE, stmt, m);
        } else if (stmt.isSpecial()) {
            edge = new Edge<>(CallKind.SPECIAL, stmt, m);
        } else if (stmt.isVirtual()) {
            edge = new Edge<>(CallKind.VIRTUAL, stmt, m);
        }
        if (callGraph.addEdge(edge)) {
            addReachable(m);
            for (int i = 0; i < m.getParamCount(); i++) {
                Var arg = stmt.getRValue().getArg(i);
                Var param = m.getIR().getParam(i);
                addPFGEdge(pointerFlowGraph.getVarPtr(arg), pointerFlowGraph.getVarPtr(param));
            }
            Var lv = stmt.getLValue();
            if (lv != null) {
                for (Var ret: m.getIR().getReturnVars()) {
                    addPFGEdge(pointerFlowGraph.getVarPtr(ret), pointerFlowGraph.getVarPtr(lv));
                }
            }
        }
    });
}

测试用例

下面是一个本地测试用例:

class StaticField {
    public static void main(String[] args) {
        A.b = new B();
        B b = A.b;
    }
}
class A {
    static B b;
}
class B {
}

我们来看一看前面程序分析的结果到底长什么样:

Points-to sets of all variables
<B: void <init>()>/%this -> [NewObj{<StaticField: void main(java.lang.String[])>[0@L4] new B}]
<StaticField: void main(java.lang.String[])>/b -> [NewObj{<StaticField: void main(java.lang.String[])>[0@L4] new B}]
<StaticField: void main(java.lang.String[])>/temp$0 -> [NewObj{<StaticField: void main(java.lang.String[])>[0@L4] new B}]
<java.lang.Object: void <init>()>/%this -> [NewObj{<StaticField: void main(java.lang.String[])>[0@L4] new B}]

Points-to sets of all static fields
<A: B b> -> [NewObj{<StaticField: void main(java.lang.String[])>[0@L4] new B}]

Points-to sets of all instance fields

Points-to sets of all array indexes

总结与思考

时隔一年,恢复之前中断的程序分析学习,再一次感受到这门课的质量之高。另外,确确实实,之前的笔记和实验记录给了我快速“恢复函数执行现场”的机会,节省了不少时间。