Group of Software Security In Progress

GoSSIP @ LoCCS.Shanghai Jiao Tong University

Automatically Learning Semantic Features for Defect Prediction

Paper download

Abstract & Introduction

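  • Defect prediction: predicting which code regions are likely to be defective
  • The paper uses a Deep Belief Network (DBN) to automatically learn semantic features from source code
  • (A DBN is essentially a neural network: a generative model built by stacking Restricted Boltzmann Machines)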

Approach

[Figure]

  • Targets Java source code

Parsing Source Code

  • Represent the code as an AST, then extract three kinds of AST nodes:
    • method invocations and class instance creations
    • declaration nodes
    • control flow nodes
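To make the extraction step concrete, here is a minimal sketch over a toy AST. It assumes a hypothetical `Node` type and hypothetical node-kind names (loosely modeled on Java AST node names such as `MethodInvocation` and `IfStatement`); a real implementation would walk the output of an actual Java parser. It also assumes that identifiers are kept for invocation/creation and declaration nodes, while only the node type is kept for control flow nodes. Which kinds belong to each category here is illustrative, not the paper's exact list.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    kind: str  # hypothetical node kind, e.g. "MethodInvocation", "IfStatement"
    name: str = ""  # identifier for invocations/declarations; empty otherwise
    children: List["Node"] = field(default_factory=list)


# Hypothetical grouping of node kinds into the three categories from the notes.
INVOCATION_KINDS = {"MethodInvocation", "ClassInstanceCreation"}
DECLARATION_KINDS = {"TypeDeclaration", "MethodDeclaration", "EnumDeclaration"}
CONTROL_FLOW_KINDS = {"IfStatement", "WhileStatement", "ForStatement",
                      "DoStatement", "SwitchStatement", "TryStatement",
                      "CatchClause", "ThrowStatement", "ReturnStatement"}


def extract_tokens(node: Node) -> List[str]:
    """Pre-order walk that keeps only the three kinds of AST nodes."""
    tokens: List[str] = []
    if node.kind in INVOCATION_KINDS or node.kind in DECLARATION_KINDS:
        tokens.append(node.name)  # keep the method/class/declared name
    elif node.kind in CONTROL_FLOW_KINDS:
        tokens.append(node.kind)  # keep only the node type
    for child in node.children:  # every other node is skipped, children still visited
        tokens.extend(extract_tokens(child))
    return tokens


# Toy tree standing in for a parsed Java class.
tree = Node("TypeDeclaration", "Parser", [
    Node("MethodDeclaration", "parse", [
        Node("IfStatement", children=[
            Node("ClassInstanceCreation", "Scanner"),
            Node("MethodInvocation", "nextLine"),
        ]),
        Node("ReturnStatement"),
    ]),
])
print(extract_tokens(tree))
# ['Parser', 'parse', 'IfStatement', 'Scanner', 'nextLine', 'ReturnStatement']
```

The resulting token sequence per file is what later steps turn into numeric input.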

Experimental Setup

Metrics

[Figure]
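The metrics figure is not reproduced here. Defect prediction results are commonly reported as precision, recall and F1 on the defective class. As a hedged illustration only, assuming these are the metrics the figure defines, here is a small sketch where label 1 means "defective". The function name `prf1` is hypothetical.

```python
from typing import Sequence, Tuple


def prf1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Precision, recall and F1 for the defective (label 1) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # defective, flagged
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # clean, flagged
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # defective, missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


print(prf1([1, 0, 1, 1, 0], [1, 1, 0, 1, 0]))
```

In this example there are 2 true positives, 1 false positive and 1 false negative, so precision, recall and F1 are all 2/3.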

Two Baselines of Traditional Features

  • PROMISE data
    • traditional features such as LOC, operand and operator counts, number of methods in a class, and position in the inheritance tree
  • AST nodes
    • each instance is a vector of term frequencies of the AST nodes
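The AST-node baseline turns each file into a vector of term frequencies. A minimal sketch, assuming the input is a per-file list of AST-node tokens (as produced by an extraction step like the one under Parsing Source Code) and that tokens not in the vocabulary are simply ignored. The helper names `build_vocabulary` and `tf_vector` are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def build_vocabulary(token_lists: List[List[str]]) -> Dict[str, int]:
    """Assign each distinct AST-node token a fixed column index."""
    vocab: Dict[str, int] = {}
    for tokens in token_lists:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab


def tf_vector(tokens: List[str], vocab: Dict[str, int]) -> List[int]:
    """One instance = how often each vocabulary token occurs in this file."""
    vec = [0] * len(vocab)
    for tok, n in Counter(tokens).items():
        if tok in vocab:  # tokens outside the vocabulary are dropped
            vec[vocab[tok]] = n
    return vec


files = [
    ["IfStatement", "foo", "IfStatement", "bar"],
    ["foo", "WhileStatement"],
]
vocab = build_vocabulary(files)
print([tf_vector(f, vocab) for f in files])
# [[2, 1, 1, 0], [0, 1, 0, 1]]
```

Note that a term-frequency vector discards token order, which is one reason the paper argues for learning semantic features instead.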

Training DBN and Generating Features

  • Three parameters mainly need to be determined:
    • number of layers
    • number of nodes per layer
    • number of training iterations
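  • Data Sets
    • [Figure]
  • First, fix training iterations = 50 and tune the network shape:
    • [Figure]
    • chosen: hidden layers = 10
    • chosen: nodes per layer = 100
  • Then tune the number of iterations:
    • [Figure]
    • iterations = 200: error rate = 0.098, with an average time of 15 s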
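The three parameters above map directly onto a generic DBN. Below is a minimal NumPy sketch of one, not the paper's implementation. It assumes Bernoulli RBM layers trained greedily, layer by layer, with one-step contrastive divergence (CD-1), a hypothetical learning rate of 0.1, and inputs already scaled into [0, 1]. `n_layers`, `n_nodes` and `iterations` correspond to the number of layers, nodes per layer and training iterations. The notes do not say how the paper measures its "error rate", so the mean squared reconstruction error returned here is only an analogue.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class RBM:
    """Bernoulli RBM trained with one-step contrastive divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden, rng):
        self.W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)
        self.b_h = np.zeros(n_hidden)
        self.rng = rng

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def train(self, data, iterations, lr=0.1):  # lr is a hypothetical default
        n = data.shape[0]
        for _ in range(iterations):
            # Positive phase: hidden activations driven by the data.
            h0 = self.hidden_probs(data)
            h0_sample = (self.rng.random(h0.shape) < h0).astype(float)
            # Negative phase: one Gibbs step down to the visible layer and back up.
            v1 = self.visible_probs(h0_sample)
            h1 = self.hidden_probs(v1)
            # CD-1 gradient estimate, averaged over all instances.
            self.W += lr * (data.T @ h0 - v1.T @ h1) / n
            self.b_v += lr * (data - v1).mean(axis=0)
            self.b_h += lr * (h0 - h1).mean(axis=0)
        # Mean squared reconstruction error after training.
        recon = self.visible_probs(self.hidden_probs(data))
        return float(np.mean((data - recon) ** 2))


def train_dbn(x, n_layers, n_nodes, iterations, seed=0):
    """Greedy layer-wise pre-training; returns the stack of trained RBMs."""
    rng = np.random.default_rng(seed)
    layers, inp = [], x
    for _ in range(n_layers):
        rbm = RBM(inp.shape[1], n_nodes, rng)
        rbm.train(inp, iterations)
        layers.append(rbm)
        inp = rbm.hidden_probs(inp)  # this layer's output feeds the next layer
    return layers


def dbn_features(layers, x):
    """Activations of the top hidden layer serve as the learned features."""
    for rbm in layers:
        x = rbm.hidden_probs(x)
    return x


rng = np.random.default_rng(1)
x = rng.random((20, 30))  # 20 hypothetical files, 30 input values in [0, 1]
dbn = train_dbn(x, n_layers=3, n_nodes=8, iterations=50)
feats = dbn_features(dbn, x)
print(feats.shape)  # (20, 8)
```

The demo uses a small network so it runs quickly. The setting from the notes would be `train_dbn(x, n_layers=10, n_nodes=100, iterations=200)`.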

Results

Within-project Defect Prediction

[Figure]

Different Classification Algorithms

[Figure]

Cross-project Defect Prediction

[Figure]

Costs

[Figure]