System fixes bugs by importing functionality

At the Association for Computing Machinery’s Programming Language Design and Implementation conference this month, MIT researchers presented a new system that repairs dangerous software bugs by automatically importing functionality from other, more secure applications.

Remarkably, the system, dubbed CodePhage, doesn’t require access to the source code of the applications whose functionality it’s borrowing. Instead, it analyzes the applications’ execution and characterizes the types of security checks they perform. As a consequence, it can import checks from applications written in programming languages other than the one in which the program it’s repairing was written.

Once it’s imported code into a vulnerable application, CodePhage can provide a further layer of analysis that guarantees that the bug has been repaired.

“We have tons of source code available in open-source repositories, millions of projects, and a lot of these projects implement similar specifications,” says Stelios Sidiroglou-Douskos, a research scientist at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) who led the development of CodePhage. “Even though that might not be the core functionality of the program, they frequently have subcomponents that share functionality across a large number of projects.”

With CodePhage, he says, “over time, what you’d be doing is building this hybrid system that takes the best components from all these implementations.”

Sidiroglou-Douskos and his coauthors — MIT professor of computer science and engineering Martin Rinard, graduate student Fan Long, and Eric Lahtinen, a researcher in Rinard’s group — refer to the program CodePhage is repairing as the “recipient” and the program whose functionality it’s borrowing as the “donor.” To begin its analysis, CodePhage requires two sample inputs: one that causes the recipient to crash and one that doesn’t. A bug-locating program that the same group reported in March, dubbed DIODE, generates crash-inducing inputs automatically. But a user may simply have found that trying to open a particular file caused a crash.

Carrying the past

First, CodePhage feeds the “safe” input — the one that doesn’t induce crashes — to the donor. It then tracks the sequence of operations the donor executes and records them using a symbolic expression, a string of symbols that describes the logical constraints the operations impose.

At some point, for instance, the donor may check to see whether the size of the input is below some threshold. If it is, CodePhage will add a term to its growing symbolic expression that represents the condition of being below that threshold. It doesn’t record the actual size of the file — just the constraint imposed by the check.

Next, CodePhage feeds the donor the crash-inducing input. Again, it builds up a symbolic expression that represents the operations the donor performs. When the new symbolic expression diverges from the old one, however, CodePhage interrupts the process. The divergence represents a constraint that the safe input met and the crash-inducing input does not. As such, it could be a security check missing from the recipient.

CodePhage then analyzes the recipient to find locations at which the input meets most, but not quite all, of the constraints described by the new symbolic expression. The recipient may perform different operations in a different order than the donor does, and it may store data in different forms. But the symbolic expression describes the state of the data after it’s been processed, not the processing itself.