Cooperative Bug Isolation

The resources available for improving software are always limited, and through sheer numbers a program’s user community will uncover many flaws not caught during the program’s development. We propose a low-overhead sampling infrastructure for gathering information from the runs experienced by a program’s user community. Monitoring is both sparse and random, so complete information is never available about any single run. However, the instrumentation strategy is also lightweight and therefore practical to deploy to user communities numbering in the thousands or millions. Additionally, we have developed a body of techniques, called “statistical debugging,” for isolating the causes of bugs using this sparsely sampled data. Statistical debugging combines statistical modeling, machine learning, and program analysis to identify those program behaviors that are highly predictive of program failure. We will also briefly consider other potential applications of sampling deployed software.
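
To make the idea concrete, here is a minimal sketch of how sparsely sampled reports from many runs might still single out a failure-predicting behavior. It is an illustration under stated assumptions, not the infrastructure described above: the predicate names (`ptr_is_null`, `len_gt_10`), the simulated program, the independent per-observation coin flip used for sampling, and the simple "fraction of failing runs among runs where the predicate was observed true" score are all hypothetical stand-ins.

```python
import random


def sample_run(observations, rate, rng):
    """Keep each (predicate, value) observation independently with probability `rate`.

    Assumption: a per-observation coin flip stands in for whatever
    sampling scheme a real deployment would use.
    """
    return [(pred, value) for pred, value in observations if rng.random() < rate]


def failure_scores(reports):
    """Score each predicate by how often runs failed when it was observed true.

    `reports` is a list of (sampled_observations, failed) pairs, one per run.
    Predicates never observed true in any report receive no score.
    """
    true_runs = {}
    true_failed = {}
    for observed, failed in reports:
        # Count each predicate at most once per run.
        for pred in {p for p, value in observed if value}:
            true_runs[pred] = true_runs.get(pred, 0) + 1
            true_failed[pred] = true_failed.get(pred, 0) + (1 if failed else 0)
    return {pred: true_failed[pred] / true_runs[pred] for pred in true_runs}


def simulate(num_runs, rate, seed=0):
    """Simulate a hypothetical program that fails exactly when `ptr_is_null` holds."""
    rng = random.Random(seed)
    reports = []
    for _ in range(num_runs):
        ptr_is_null = rng.random() < 0.2   # the (hypothetical) bug trigger
        len_gt_10 = rng.random() < 0.5     # an unrelated behavior
        failed = ptr_is_null
        observations = [("ptr_is_null", ptr_is_null), ("len_gt_10", len_gt_10)]
        reports.append((sample_run(observations, rate, rng), failed))
    return failure_scores(reports)


if __name__ == "__main__":
    # Each run reports only about one observation in ten.
    for pred, score in sorted(simulate(5000, 0.1).items()):
        print(f"{pred}: {score:.2f}")
```

No single simulated run reports much, yet across thousands of runs the predicate tied to the failure stands out while the unrelated one scores near the overall failure rate. A realistic analysis would also have to account for predicates that were simply not sampled in a given run and for multiple interacting bugs, which this sketch ignores.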