南大软分实验笔记｜A2 常量传播与Worklist求解器

前言

这是南大软件分析课程的第二个作业，对应第六节课的内容，实现常量传播模块和Worklist求解器的关键方法。详细实验说明见官方网站。

有了第一个实验作业的基础，这个作业其实不算难。然而，其中需要注意的编程细节要更多，边缘情况也更复杂，这也导致了课程OJ平台上本实验的通过率是所有八个实验中最低的，只有16%。通过率最高的是第一个实验（58%）。我的代码也暂时未能通过所有测试用例（50/52）：

2023-03-19 20:30:39 (GMT+08:00)	ERROR	analysis has "Exception" on [hidden] testcase(s)
Analyze 52 methods, pass 50 methods
There are 481 Stmts in all test cases
Your submission correctly analyzes 472 Stmts

[!] Failed on [hidden] testcase(s)
Tips: we will not give you any information about [hidden] testcase(s)

由于未通过的两个是非公开测试用例，我们无法根据反馈进行针对性的调试。如果你通过了所有测试用例，或者遇到了和我类似的问题，还请告知我，十分感谢。

更新 - 2024-04-13

通过与另一位同学的代码对比，我找到了问题，修改代码通过了所有测试用例，见下文“更新”部分。

任务一：实现常量传播

任务是将ConstantPropagation类的六个方法：

CPFact newBoundaryFact(CFG)
CPFact newInitialFact()
void meetInto(CPFact, CPFact)
boolean transferNode(Stmt, CPFact, CPFact)
Value meetValue(Value, Value)
Value evaluate(Exp, CPFact)

其中，前四个方法分别对应常量传播算法中的边界初始化、非边界基本块初始化、control flow handling和transfer function。meetValue和evaluate则分别是被meetInto和transferNode调用的方法。在任务二中，前四个方法将被我们的Worklist求解器调用。

其中，在实现newBoundaryFact时，我们需要注意将待分析方法的参数初始化为NAC，因为它们是由调用者传入的，一定不是常量。newBoundaryFact和newInitialFact的代码实现如下：

public CPFact newBoundaryFact(CFG<Stmt> cfg) {
    CPFact cpf = new CPFact();
    for (Var var: cfg.getIR().getParams()) {
        if (canHoldInt(var)) {
            cpf.update(var, Value.getNAC());
        }
    }
    return cpf;
}
public CPFact newInitialFact() {
    return new CPFact();
}

接下来是meetInto及相关方法。这里完全按照课上讲的meet方法编程实现即可：

public void meetInto(CPFact fact, CPFact target) {
    for(Var var: fact.keySet()) {
        target.update(var, meetValue(fact.get(var), target.get(var)));
    }
}
public Value meetValue(Value v1, Value v2) {
    if (v1.isNAC() || v2.isNAC()) {
        return Value.getNAC();
    }
    if (v1.isUndef()) {
        return v2;
    } else if (v2.isUndef()) {
        return v1;
    }
    if (v1.getConstant() == v2.getConstant()) {
        return Value.makeConstant(v1.getConstant());
    } else {
        return Value.getNAC();
    }
}

接下来是transferNode方法。这里需要注意的是，应该用.equals()方法来判断CPFact对象是否相等；另外，如果当前语句非赋值语句，那么transferNode里实现恒等逻辑即可。transferNode方法的代码实现如下：

public boolean transferNode(Stmt stmt, CPFact in, CPFact out) {
    if (stmt.getDef().isPresent()) {
        if ((stmt.getDef().get() instanceof Var) && canHoldInt((Var)stmt.getDef().get())) {
            CPFact outTmp = in.copy();
            outTmp.update((Var)stmt.getDef().get(), evaluate(((DefinitionStmt<Var, RValue>)stmt).getRValue(), in));
            if (!outTmp.equals(out)) {
                for (Var var: outTmp.keySet()) {
                    out.update(var, outTmp.get(var));
                }
            }
            return outTmp != out;
        }
    }
    // for other cases, just assign IN to OUT
    if (!in.equals(out)) {
        for(Var var: in.keySet()) {
            out.update(var, in.get(var));
        }
        return true;
    }
    return false;
}

其中，transferNode将右侧表达式求值的任务委托给了evaluate方法。该方法虽然看起来比较复杂，但实际上只需要分情况实现即可：

public static Value evaluate(Exp exp, CPFact in) {
    if (exp instanceof Var) {
        return in.get((Var)exp);
    }
    if (exp instanceof IntLiteral) {
        return Value.makeConstant(((IntLiteral) exp).getValue());
    }
    if (exp instanceof BinaryExp) {
        in.get(((BinaryExp) exp).getOperand2());
        Value v1 = in.get(((BinaryExp) exp).getOperand1());
        Value v2 = in.get(((BinaryExp) exp).getOperand2());

        if (v1.isNAC() || v2.isNAC()) {
            if (exp instanceof ArithmeticExp) {
                // consider mod and div by zero
                if (((ArithmeticExp) exp).getOperator() == ArithmeticExp.Op.DIV && v2.isConstant() && v2.getConstant() == 0) {
                    return Value.getUndef();
                }
                if (((ArithmeticExp) exp).getOperator() == ArithmeticExp.Op.REM && v2.isConstant() && v2.getConstant() == 0) {
                    return Value.getUndef();
                }
            }
            return Value.getNAC();
        }
        if (v1.isConstant() && v2.isConstant()) {
            if (exp instanceof ArithmeticExp) {
                if (((ArithmeticExp) exp).getOperator() == ArithmeticExp.Op.DIV && v2.getConstant() == 0) {
                    return Value.getUndef();
                }
                return switch (((ArithmeticExp) exp).getOperator()) {
                    case ADD -> Value.makeConstant(v1.getConstant() + v2.getConstant());
                    case SUB -> Value.makeConstant(v1.getConstant() - v2.getConstant());
                    case MUL -> Value.makeConstant(v1.getConstant() * v2.getConstant());
                    case DIV -> Value.makeConstant(v1.getConstant() / v2.getConstant());
                    case REM -> Value.makeConstant(v1.getConstant() % v2.getConstant());
                };
            }
            if (exp instanceof ConditionExp) {
                return switch (((ConditionExp) exp).getOperator()) {
                    case EQ -> Value.makeConstant(v1.getConstant() == v2.getConstant() ? 1 : 0);
                    case NE -> Value.makeConstant(v1.getConstant() != v2.getConstant() ? 1 : 0);
                    case GE -> Value.makeConstant(v1.getConstant() >= v2.getConstant() ? 1 : 0);
                    case GT -> Value.makeConstant(v1.getConstant() > v2.getConstant() ? 1 : 0);
                    case LE -> Value.makeConstant(v1.getConstant() <= v2.getConstant() ? 1 : 0);
                    case LT -> Value.makeConstant(v1.getConstant() < v2.getConstant() ? 1 : 0);
                };
            }
            if (exp instanceof ShiftExp) {
                return switch (((ShiftExp) exp).getOperator()) {
                    case SHL -> Value.makeConstant(v1.getConstant() << v2.getConstant());
                    case SHR -> Value.makeConstant(v1.getConstant() >> v2.getConstant());
                    case USHR -> Value.makeConstant(v1.getConstant() >>> v2.getConstant());
                };
            }
            if (exp instanceof BitwiseExp) {
                return switch (((BitwiseExp) exp).getOperator()) {
                    case OR -> Value.makeConstant(v1.getConstant() | v2.getConstant());
                    case AND -> Value.makeConstant(v1.getConstant() & v2.getConstant());
                    case XOR -> Value.makeConstant(v1.getConstant() ^ v2.getConstant());
                };
            }
        }
        return Value.getUndef();
    }
    return Value.getNAC();
}

更新 - 2024-04-13

上面的evaluate方法的考虑不太到位，具体可以参考下面的代码diff：

@@ -172,11 +171,17 @@ public class ConstantPropagation extends
                 }
                 return Value.getNAC();
             }
+            if (v1.isUndef() || v2.isUndef()) {
+                return Value.getUndef();
+            }
             if (v1.isConstant() && v2.isConstant()) {
                 if (exp instanceof ArithmeticExp) {
                     if (((ArithmeticExp) exp).getOperator() == ArithmeticExp.Op.DIV && v2.getConstant() == 0) {
                         return Value.getUndef();
                     }
+                    if (((ArithmeticExp) exp).getOperator() == ArithmeticExp.Op.REM && v2.getConstant() == 0) {
+                        return Value.getUndef();
+                    }
                     return switch (((ArithmeticExp) exp).getOperator()) {
                         case ADD -> Value.makeConstant(v1.getConstant() + v2.getConstant());
                         case SUB -> Value.makeConstant(v1.getConstant() - v2.getConstant());
@@ -210,7 +215,7 @@ public class ConstantPropagation extends
                     };
                 }
             }
-            return Value.getUndef();
+            return Value.getNAC();
         }
         return Value.getNAC();
     }

任务二：实现Worklist求解器

任务二涉及两个方法的实现：

Solver中的void initializeForward(CFG, DataflowResult)
WorklistSolver中的void doSolveForward(CFG, DataflowResult)

其中，Solver是求解器的基类，WorklistSolver继承了Solver类。

有了实验一积累的经验，这里就比较简单了。首先是initializeForward方法的实现：

protected void initializeForward(CFG<Node> cfg, DataflowResult<Node, Fact> result) {
    result.setOutFact(cfg.getEntry(), analysis.newBoundaryFact(cfg));
    for (Node node: cfg) {
        if (cfg.isEntry(node)) {
            continue;
        }
        result.setInFact(node, analysis.newInitialFact());
        result.setOutFact(node, analysis.newInitialFact());
    }
}

接着是doSolveForward，也很直观，按照课上讲的算法伪代码实现即可：

protected void doSolveForward(CFG<Node> cfg, DataflowResult<Node, Fact> result) {
    ArrayList<Node> nodeArrayList = new ArrayList<Node>();
    for (Node node: cfg) {
        if (cfg.isEntry(node)) {
            continue;
        }
        nodeArrayList.add(node);
    }
    while(!nodeArrayList.isEmpty()) {
        Node node = nodeArrayList.remove(0);
        for (Node p: cfg.getPredsOf(node)) {
            analysis.meetInto(result.getOutFact(p), result.getInFact(node));
        }
        boolean changed = analysis.transferNode(node, result.getInFact(node), result.getOutFact(node));
        if (changed) {
            nodeArrayList.addAll(cfg.getSuccsOf(node));
        }
    }
}

测试示例

我们首先用JUnit来测试一下所有本地测试用例，非常方便：

这里我们展示一下对SimpleBranch用例的分析结果。SimpleBranch类的源码如下所示：

class SimpleBranch {
    static void NAC(int p) {
        int x;
        if (p > 0) {
            x = 1;
        } else {
            x = 2;
        }
        int y = x;
    }
}

分析结果如下：

-------------------- <SimpleBranch: void <init>()> (constprop) --------------------
[0@L1] invokespecial %this.<java.lang.Object: void <init>()>(); {}
[1@L1] return; {}

-------------------- <SimpleBranch: void NAC(int)> (constprop) --------------------
[0@L5] %intconst0 = 0; {%intconst0=0, p=NAC}
[1@L5] if (p > %intconst0) goto 3; {%intconst0=0, p=NAC}
[2@L5] goto 6; {%intconst0=0, p=NAC}
[3@L5] nop; {%intconst0=0, p=NAC}
[4@L6] x = 1; {%intconst0=0, p=NAC, x=1}
[5@L5] goto 8; {%intconst0=0, p=NAC, x=1}
[6@L5] nop; {%intconst0=0, p=NAC}
[7@L8] x = 2; {%intconst0=0, p=NAC, x=2}
[8@L8] nop; {%intconst0=0, p=NAC, x=NAC}
[9@L10] y = x; {%intconst0=0, p=NAC, x=NAC, y=NAC}
[10@L10] return; {%intconst0=0, p=NAC, x=NAC, y=NAC}

总结与思考

这个实验再一次证明了“纸上得来终觉浅”。看起来很简单的算法，在实现时有许多边边角角需要考虑。对于我来说，关键是那两个未通过的非公开测试用例。噫吁嚱，意难平！