[Benchmark] Codeflaws: A Programming Competition Benchmark for Evaluating Automated Program Repair Tools

Basic Information

  • Publication: ICSE'17
  • Authors: Shin Hwei Tan, Jooyong Yi, Yulis, Sergey Mechtaev, Abhik Roychoudhury
  • Language: C
  • Source: Codeforces programming contests (rejected/accepted submission pairs)
  • Description: a set of 3902 defects from 7436 programs, automatically classified into 39 defect classes
  • Dataset Homepage

Summary

Existing benchmarks for automated program repair (such as ManyBugs and IntroClass) do not allow a thorough investigation of the relationship between fault types and the effectiveness of repair tools.
The authors identify five criteria that a benchmark must satisfy to allow extensive evaluation of repair tools:

  • C1: Diverse types of real defects.
  • C2: Large number of defects.
  • C3: Large number of programs.
  • C4: Programs that are algorithmically complex.
  • C5: Large held-out test suite for patch correctness verification.

Overall, the authors crawled over 10000 web pages from the Codeforces programming contest site. For each rejected submission r, they find an accepted submission a by the same user for the same programming problem in the crawled data; each fault is then represented by the submission pair (r, a), as sketched below. In total, they obtain 5544 defects. They then exclude 924 defects with inadequate held-out tests, 677 defects whose bugs are not reproducible, and 41 defects affected by a known CIL bug in handling variable-sized multidimensional arrays, leaving the 3902 defects of the final benchmark.
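The pairing step itself is simple bookkeeping over the crawled data. The sketch below shows one way it could look; the record fields (`user`, `problem`, `verdict`) and the choice of the first accepted fix are assumptions for illustration, not the paper's actual schema or selection rule.

```python
from collections import defaultdict

# Sketch of the (r, a) pairing step. Each crawled submission is
# assumed to be a dict with hypothetical fields 'user', 'problem',
# and 'verdict'.
def pair_defects(submissions):
    # Index accepted submissions by (user, problem).
    accepted = defaultdict(list)
    for s in submissions:
        if s["verdict"] == "Accepted":
            accepted[(s["user"], s["problem"])].append(s)

    # For each rejected submission r, look for an accepted
    # submission a by the same user for the same problem.
    pairs = []
    for r in submissions:
        if r["verdict"] != "Accepted":
            fixes = accepted.get((r["user"], r["problem"]))
            if fixes:
                pairs.append((r, fixes[0]))  # each pair is one defect
    return pairs
```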

All defects are divided into 39 classes by running GumTree on the AST-level syntactic differences between the buggy program and the accepted (patched) program; a rough sketch of this classification step follows.
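As a minimal sketch, the snippet below invokes GumTree's `textdiff` subcommand on a buggy/fixed pair and buckets the defect by the kinds of AST edit actions in the resulting edit script. The two class labels in the lookup table are placeholders, not the paper's actual 39 class names, and the real classification rules are finer-grained than action kinds alone.

```python
import subprocess

# GumTree 3.x edit-script action kinds.
ACTION_KINDS = {"insert-node", "insert-tree", "delete-node",
                "delete-tree", "update-node", "move-tree"}

# Illustrative-only mapping from an action signature to a class
# label; the benchmark's real table covers 39 classes.
DEFECT_CLASSES = {
    ("update-node",): "single-token replacement (placeholder label)",
    ("insert-node",): "statement insertion (placeholder label)",
}

def classify(buggy_c: str, fixed_c: str) -> str:
    """Run GumTree on the pair and map the AST edit actions to a
    defect class (sketch only)."""
    out = subprocess.run(
        ["gumtree", "textdiff", buggy_c, fixed_c],
        capture_output=True, text=True, check=True,
    ).stdout
    # Collect the distinct action kinds appearing in the script.
    actions = tuple(sorted({ln.strip() for ln in out.splitlines()
                            if ln.strip() in ACTION_KINDS}))
    return DEFECT_CLASSES.get(actions, "unclassified")
```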

Reposted from www.cnblogs.com/XBWer/p/9195057.html