前言

这是南大软件分析课程的第八个实验,需要在实验A6上下文敏感的过程间指针分析)的基础上实现污点分析。详细实验说明见官方网站。涉及到的主要课程知识为:

实验说明中包含很多课上没有提到的细节,需要仔细阅读。大体的实现策略如下:

  • 在原指针分析中处理invoke语句的地方添加对source和污点传播规则的实现。
  • 借助原指针分析的工作流实现“taint作为普通对象”的常规传播。
  • 指针分析结束后,在collectTaintFlows方法中进行sink规则的实现(生成taintflow)。

我最初实现的版本能够通过除了StringAppend.java测试用例外的所有本地测试,并找到OJ平台上所有测试用例一共29条taintflow中的23条,但是我始终不知道问题出在哪里。经过一天多的调试(中间的健身、休息、跟彭老师的诗词交流起到了重要作用!),我终于发现问题出在污点传播上,详情见后文。最终,我的代码能够通过OJ平台上所有测试用例,FP和FN均为0。当然,这并不能证明代码是完全正确的——至少,我的代码并没有主动考虑污点类型的变化(如StringBuilderString)。

Source、Sink与污点传播规则

Source和sink的生成规则比较简单:

类型 语句 规则(上下文$c$)
Call (Source) l: r = x.k(a1, ..., an) image-20240420075747700
Call (Sink) l: r = x.k(a1, ..., an) image-20240420075822539

除了常规的对象传播外,污点传播还包括基于invoke语句的传播,因为目前的分析算法并不了解调用的API的语义。这部分的传播规则需要我们直接给出。对此,实验手册给出了三条传播规则,很好理解:

类型 语句 规则(上下文$c$)
Call (base-to-result) l: r = x.k(a1, ... ,an) image-20240420080221081
Call (arg-to-base) l: r = x.k(a1, ... ,an) image-20240420080432881
Call (arg-to-result) l: r = x.k(a1, ... ,an) image-20240420080455042

初次实现的版本

Solver.java

StmtProcessor

在处理静态方法调用时增加对source和污点传播规则的处理:

private class StmtProcessor implements StmtVisitor<Void> {
    // ...
    public Void visit(Invoke stmt) {
        if (stmt.isStatic()) { // static invocation
            // ...
            if (callGraph.addEdge(edge)) { // the call graph changes
                // ...
                if (lVar != null) {
                    // ...                    
                    if (taintAnalysis.isSource(m, m.getReturnType())) { // source
                        Obj taint = taintAnalysis.makeTaint(stmt, m.getReturnType());
                        CSObj csTaint = csManager.getCSObj(taintAnalysis.getEmptyContext(), taint);
                        PointsToSet pts = PointsToSetFactory.make(csTaint);
                        workList.addEntry(lv, pts);
                    }
                    // transfer if applicable
                    taintAnalysis.doTaintTransfer(null, this.context, stmt, m);
        // ... } ...
        return null;
    }
    // ...
}

analyze

Taint是虚拟出来的,我们不需要调用污点对象的对应方法。因此,在指针分析的工作流中需要增加对目标对象是否为taint的判断:

private void analyze() {
    while (!workList.isEmpty()) {
        WorkList.Entry entry = workList.pollEntry();
        PointsToSet delta = propagate(entry.pointer(), entry.pointsToSet());
        if (entry.pointer() instanceof CSVar csVar) {
            Var var = csVar.getVar();
            delta.forEach(csObj -> {
                // ...
                if (!taintAnalysis.isTaint(csObj.getObject())) {
                    processCall(csVar, csObj);
                }
            });
// ... } ...

processCall

在处理非静态方法调用时增加对source和污点传播规则的处理:

private void processCall(CSVar recv, CSObj recvObj) {
    Var var = recv.getVar();
    var.getInvokes().forEach(stmt -> {
        // ...
        if (callGraph.addEdge(edge)) {
            // ...
            if (lVar != null) {
                // ...
                if (taintAnalysis.isSource(m, m.getReturnType())) { // source
                    Obj taint = taintAnalysis.makeTaint(stmt, m.getReturnType());
                    CSObj csTaint = csManager.getCSObj(taintAnalysis.getEmptyContext(), taint);
                    PointsToSet pts = PointsToSetFactory.make(csTaint);
                    workList.addEntry(lv, pts);
        // ... } ...
        taintAnalysis.doTaintTransfer(recv, recv.getContext(), stmt, m);
    });
}

TaintAnalysiss.java

doTaintTransfer

doTaintTransfer方法实现了污点传播规则——分成三种情况分别讨论即可。其中,静态方法的invoke只会涉及arg-to-result的情况。

public void doTaintTransfer(CSVar recv, Context callerContext, Invoke stmt, JMethod jMethod) {
    if (!this.transMethods.contains(jMethod.getSubsignature())) { return; }
    for (TaintTransfer trans: this.config.getTransfers()) {
        if (trans.method().getSubsignature() == jMethod.getSubsignature()) {
            // case #1 arg-to-result
            if (trans.from() >= 0 && trans.to() == TaintTransfer.RESULT) {
                Var lVar = stmt.getLValue();
                if (lVar != null) {
                    CSVar lv = csManager.getCSVar(callerContext, lVar);
                    CSVar ai = csManager.getCSVar(callerContext, stmt.getInvokeExp().getArg(trans.from()));
                    PointsToSet pts = PointsToSetFactory.make();
                    ai.getPointsToSet().forEach(csobj -> {
                        if (manager.isTaint(csobj.getObject())) { pts.addObject(csobj); }
                    });
                    if (!pts.isEmpty()) { solver.addWorklistEntry(lv, pts); }
                }
            }
            // check if the method is static, just return
            if (stmt.isStatic()) { continue; }
            // case #2 base-to-result
            if (trans.from() == TaintTransfer.BASE){
                Var lVar = stmt.getLValue();
                if (lVar != null) {
                    CSVar lv = csManager.getCSVar(callerContext, lVar);
                    PointsToSet pts = PointsToSetFactory.make();
                    recv.getPointsToSet().forEach(csobj -> {
                        if (manager.isTaint(csobj.getObject())) { pts.addObject(csobj); }
                    });
                    if (!pts.isEmpty()) { solver.addWorklistEntry(lv, pts); }
                }
            }
            // case #3 arg-to-base
            if (trans.from() >= 0 && trans.to() == TaintTransfer.BASE) {
                CSVar ai = csManager.getCSVar(callerContext, stmt.getInvokeExp().getArg(trans.from()));
                PointsToSet pts = PointsToSetFactory.make();
                ai.getPointsToSet().forEach(csobj -> {
                    if (manager.isTaint(csobj.getObject())) { pts.addObject(csobj); }
                });
                if (!pts.isEmpty()) { solver.addWorklistEntry(recv, pts); }
// ... } ...

collectTaintFlows

这里只需要处理一下sink即可:

private Set<TaintFlow> collectTaintFlows() {
    Set<TaintFlow> taintFlows = new TreeSet<>();
    PointerAnalysisResult result = solver.getResult();
    result.getCSCallGraph().edges().forEach(csCallSiteCSMethodEdge -> {
        CSCallSite csCallSite = csCallSiteCSMethodEdge.getCallSite();
        CSMethod csMethod = csCallSiteCSMethodEdge.getCallee();
        JMethod jMethod = csMethod.getMethod();
        if (isSink(jMethod)) {
            for (Sink sink: this.config.getSinks()) {
                if (jMethod.getSubsignature() == sink.method().getSubsignature()) {
                    Var var = csCallSite.getCallSite().getInvokeExp().getArg(sink.index());
                    CSVar csVar = csManager.getCSVar(csCallSite.getContext(), var);
                    for (CSObj csObj: result.getPointsToSet(csVar)) {
                        if (manager.isTaint(csObj.getObject())) {
                            MockObj mockObj = (MockObj) csObj.getObject();
                            Invoke sourceInvoke = (Invoke) mockObj.getAllocation();
                            Invoke sinkInvoke = csCallSite.getCallSite();
                            TaintFlow taintFlow = new TaintFlow(sourceInvoke, sinkInvoke, sink.index());
                            taintFlows.add(taintFlow);
	// ... } ...
    return taintFlows;
}

辅助变量和方法

下面是一些辅助变量的处理:

public TaintAnalysiss(Solver solver) {
	// ...
    transMethods = new HashSet<>();
    config.getTransfers().forEach(trans -> {
        transMethods.add(trans.method().getSubsignature());
    });
    sinkMethods = new HashSet<>();
    config.getSinks().forEach(sink -> {
        sinkMethods.add(sink.method().getSubsignature());
    });
    // ...
}

下面是一些辅助方法:

public boolean isSource(JMethod jMethod, Type type) {
    Source source = new Source(jMethod, type);
    return this.config.getSources().contains(source);
}
public boolean isSink(JMethod jMethod) { return this.sinkMethods.contains(jMethod.getSubsignature()); }
public Obj makeTaint(Invoke source, Type type) { return this.manager.makeTaint(source, type); }
public boolean isTaint(Obj obj) { return manager.isTaint(obj); }
public Context getEmptyContext() { return this.emptyContext; }

如何通过StringAppend.java测试

问题分析

StringAppend.java的源码如下:

class StringAppend {
    public static void main(String[] args) {
        stringAdd();
        stringBuffer();
        stringBuilder();
    }
    static void stringAdd() {
        String taint = SourceSink.source();
        String s = "abc" + taint + "xyz";
        SourceSink.sink(s);
    }
    static void stringBuffer() {
        String taint = SourceSink.source();
        StringBuffer sb = new StringBuffer();
        sb.append("abc");
        sb.append(taint);
        sb.append("xyz");
        String s = sb.toString();
        SourceSink.sink(s); // taint
    }
    static void stringBuilder() {
        String taint = SourceSink.source();
        StringBuilder sb = new StringBuilder();
        sb.append("abc");
        sb.append(taint);
        sb.append("xyz");
        String s = sb.toString();
        SourceSink.sink(s); // taint
    }
}

我的代码一条taintflow也没有找到。正确的TaintFlow结果如下:

TaintFlow{<StringAppend: void stringAdd()>[0@L10] temp$0 = invokestatic <SourceSink: java.lang.String source()>(); -> <StringAppend: void stringAdd()>[11@L12] invokestatic <SourceSink: void sink(java.lang.String)>(s);/0}
TaintFlow{<StringAppend: void stringBuffer()>[0@L16] temp$0 = invokestatic <SourceSink: java.lang.String source()>(); -> <StringAppend: void stringBuffer()>[12@L22] invokestatic <SourceSink: void sink(java.lang.String)>(s);/0}
TaintFlow{<StringAppend: void stringBuilder()>[0@L26] temp$0 = invokestatic <SourceSink: java.lang.String source()>(); -> <StringAppend: void stringBuilder()>[12@L32] invokestatic <SourceSink: void sink(java.lang.String)>(s);/0}

一开始,我的思路不太清晰,主要是通过添加打印语句、在脑子里按照代码逻辑推理污点传播情况等方式来debug。然而,这样效果并不好。一方面,打印输出的内容太多(即使是上下文不敏感),难以发现问题;另一方面,算法不是单一逻辑直接推进的,有多个程序点会向worklist中添加待处理项,很难在脑海中模拟这个过程。

后来,我得到了一个启示——既然污点分析是依托于指针分析进行的,那可以通过研究指针分析产出的变量指向集合来找出是哪一步出了问题。以stringAdd方法为例(下方列出了相关IR),在添加相关打印语句并输出结果后,我很快发现,taint变量指向的污点对象并没有传递给temp$1

static void stringAdd() {
    java.lang.String temp$0, taint, %stringconst0, %stringconst1, temp$2, s;
    java.lang.StringBuffer temp$1;
    [0@L10] temp$0 = invokestatic <SourceSink: java.lang.String source()>();
    [1@L10] taint = temp$0;
    [2@L11] temp$1 = new java.lang.StringBuffer;
    [3@L11] invokespecial temp$1.<java.lang.StringBuffer: void <init>()>();
    [4@L11] %stringconst0 = "abc";
    [5@L11] invokevirtual temp$1.<java.lang.StringBuffer: java.lang.StringBuffer append(java.lang.Object)>(%stringconst0);
    [6@L11] invokevirtual temp$1.<java.lang.StringBuffer: java.lang.StringBuffer append(java.lang.Object)>(taint);
    [7@L11] %stringconst1 = "xyz";
    [8@L11] invokevirtual temp$1.<java.lang.StringBuffer: java.lang.StringBuffer append(java.lang.Object)>(%stringconst1);
    [9@L11] temp$2 = invokevirtual temp$1.<java.lang.StringBuffer: java.lang.String toString()>();
    [10@L11] s = temp$2;
    [11@L12] invokestatic <SourceSink: void sink(java.lang.String)>(s);
    [12@L12] return;
}

继续分析可以发现,根本问题是我没有为基于invoke语句的污点传播规则涉及到的“source”和“target”变量像在PFG中那样连上某种“污点传播边”;另外,在propagate方法中,当delta中包含污点对象时,应该基于这些“污点传播边”向后继节点传播污点对象。

解决方案

明确问题后,解决思路就很清晰了——缺什么补什么。我们需要一个类似PFG的东西去记录“污点传播边”,姑且叫它TFG(Taint Flow Graph),然后在这个TFG上做污点对象的propagation。

首先在TaintAnalysis.java中增加TFG的实现(复制PFG的实现即可):

class TaintFlowGraph {
    private final MultiMap<Pointer, Pointer> successors = Maps.newMultiMap();
    boolean addEdge(Pointer source, Pointer target) { return successors.put(source, target); }
    Set<Pointer> getSuccsOf(Pointer pointer) { return successors.get(pointer); }
}

并为TaintAnalysiss类增加TFG和辅助方法:

private final TaintFlowGraph taintFlowGraph;
public TaintAnalysiss(Solver solver) {
	// ...
    taintFlowGraph = new TaintFlowGraph();
    // ...
}
public void addTFGEdge(Pointer source, Pointer target) {
    if (taintFlowGraph.addEdge(source, target)) {
        if (!source.getPointsToSet().isEmpty()) {
            PointsToSet pts = PointsToSetFactory.make();
            source.getPointsToSet().forEach(csObj -> {
                if (manager.isTaint(csObj.getObject())) { pts.addObject(csObj); }
            });
            if (!pts.isEmpty()) { solver.addWorklistEntry(target, pts); }
        }
    }
}
public Set<Pointer> getSuccsOfTFG(Pointer pointer) { return this.taintFlowGraph.getSuccsOf(pointer); }

修改doTaintTransfer,变成基于TFG的实现(也简洁了许多):

public void doTaintTransfer(CSVar recv, Context callerContext, Invoke stmt, JMethod jMethod) {
    if (!this.transMethods.contains(jMethod.getSubsignature())) { return; }
    for (TaintTransfer trans: this.config.getTransfers()) {
        if (trans.method().getSubsignature() == jMethod.getSubsignature()) {
            // arg-to-result
            if (trans.from() >= 0 && trans.to() == TaintTransfer.RESULT) {
                Var lVar = stmt.getLValue();
                if (lVar != null) {
                    CSVar lv = csManager.getCSVar(callerContext, lVar);
                    CSVar ai = csManager.getCSVar(callerContext, stmt.getInvokeExp().getArg(trans.from()));
                    this.addTFGEdge(ai, lv);
                }
            }
            // check if the method is static, just return
            if (stmt.isStatic()) { continue; }
            // base-to-result
            if (trans.from() == TaintTransfer.BASE){
                Var lVar = stmt.getLValue();
                if (lVar != null) {
                    CSVar lv = csManager.getCSVar(callerContext, lVar);
                    this.addTFGEdge(recv, lv);
                }
            }
            // arg-to-base
            if (trans.from() >= 0 && trans.to() == TaintTransfer.BASE) {
                CSVar ai = csManager.getCSVar(callerContext, stmt.getInvokeExp().getArg(trans.from()));
                this.addTFGEdge(ai, recv);
// ... } ...

最后,在propagate方法中添加基于TFG的传播:

private PointsToSet propagate(Pointer pointer, PointsToSet pointsToSet) {
	// ...
    if (!delta.isEmpty()) {
        PointsToSet taintObjs = PointsToSetFactory.make();
        delta.forEach(csobj -> {
            pointer.getPointsToSet().addObject(csobj);
            if (taintAnalysis.isTaint(csobj.getObject())) { taintObjs.addObject(csobj); }
        });
        Set<Pointer> succ = pointerFlowGraph.getSuccsOf(pointer);
        succ.forEach(pointer1 -> { workList.addEntry(pointer1, delta); });
        // Taint Analysis
        if (!taintObjs.isEmpty()) {
            if (pointer instanceof CSVar) {
                Set<Pointer> succTaint = this.taintAnalysis.getSuccsOfTFG(pointer);
                succTaint.forEach(tp -> { workList.addEntry(tp, taintObjs); });
    // ... } ...
    return delta;
}

总结与思考

遇到问题不要放弃,另外污点分析确实很有意思。