Memory without sacrificing speed

Random-access memory, or RAM, is where computers like to store the data they’re working on. A processor can retrieve data from RAM tens of thousands of times more rapidly than it can from the computer’s disk drive.

But in the age of big data, data sets are often much too large to fit in a single computer’s RAM. Sequencing data describing a single large genome could take up the RAM of somewhere between 40 and 100 typical computers.

Flash memory — the type of memory used by most portable devices — could provide an alternative to conventional RAM for big-data applications. It’s about a tenth as expensive, and it consumes about a tenth as much power.

The problem is that it’s also a tenth as fast. But at the International Symposium on Computer Architecture in June, MIT researchers presented a new system that, for several common big-data applications, should make servers using flash memory as efficient as those using conventional RAM, while preserving their power and cost savings.

The researchers also presented experimental evidence showing that, if the servers executing a distributed computation have to go to disk for data even 5 percent of the time, their performance falls to a level that’s comparable with flash, anyway.

In other words, even without the researchers’ new techniques for accelerating data retrieval from flash memory, 40 servers with 10 terabytes’ worth of RAM couldn’t handle a 10.5-terabyte computation any better than 20 servers with 20 terabytes’ worth of flash memory, which would consume only a fraction as much power.
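The figures in the last two paragraphs fit together with some simple arithmetic. A 10.5-terabyte computation on a cluster with 10 terabytes of RAM leaves just under 5 percent of the data outside memory. The following is a rough sketch that checks this. It assumes accesses are spread evenly across the data set, so the share of data that spills to disk stands in for the share of accesses that go to disk. That is a simplification for illustration, not the researchers' experimental method.

```python
# Back-of-envelope check of the cluster figures quoted in the article.
# Assumption (not from the article): accesses are spread evenly over the data,
# so the fraction of data that doesn't fit in RAM approximates how often
# the servers must go to disk.

RAM_CLUSTER_SERVERS = 40
RAM_CLUSTER_TB = 10.0        # total RAM across the 40 servers
DATASET_TB = 10.5            # size of the computation's working data
FLASH_CLUSTER_SERVERS = 20
FLASH_CLUSTER_TB = 20.0      # total flash across the 20 servers


def overflow_fraction(dataset_tb: float, capacity_tb: float) -> float:
    """Fraction of the data set that does not fit in the given capacity."""
    return max(0.0, dataset_tb - capacity_tb) / dataset_tb


spill = overflow_fraction(DATASET_TB, RAM_CLUSTER_TB)

# 1 TB is treated as 1,000 GB, matching the article's "half-terabyte, or 500-gigabyte".
print(f"RAM per server: {RAM_CLUSTER_TB / RAM_CLUSTER_SERVERS * 1000:.0f} GB")
print(f"Share of data left on disk: {spill:.1%}")
print(f"Flash cluster overflow: {overflow_fraction(DATASET_TB, FLASH_CLUSTER_TB):.1%}")
```

With these inputs the RAM cluster holds 250 GB per server and leaves about 4.8 percent of the data on disk, while the 20-terabyte flash cluster holds the whole data set.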

“This is not a replacement for DRAM [dynamic RAM] or anything like that,” says Arvind, the Johnson Professor of Computer Science and Engineering at MIT, whose group performed the new work. “But there may be many applications that can take advantage of this new style of architecture. Which companies recognize: Everybody’s experimenting with different aspects of flash. We’re just trying to establish another point in the design space.”

Joining Arvind on the new paper are Sang Woo Jun and Ming Liu, MIT graduate students in computer science and engineering and joint first authors; their fellow grad student Shuotao Xu; Sungjin Lee, a postdoc in Arvind’s group; Myron King and Jamey Hicks, who did their PhDs with Arvind and were researchers at Quanta Computer when the new system was developed; and one of their colleagues from Quanta, John Ankcorn — who is also an MIT alumnus.

Outsourced computation

The researchers were able to make a network of flash-based servers competitive with a network of RAM-based servers by moving a little computational power off the servers and onto the chips that control the flash drives. By preprocessing some of the data on the flash drives before passing it back to the servers, those chips can make distributed computation much more efficient. And since the preprocessing algorithms are wired into the chips, the chips avoid the computational overhead associated with running an operating system, maintaining a file system, and the like.
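The article doesn't say which preprocessing algorithms were wired into the chips. The sketch below uses a simple record filter, a common example of this kind of near-data processing, to show why doing the work at the flash controller can reduce the load on the servers. Only the matching records have to cross the link back to the server. The names `Record`, `FlashController`, and `read_filtered`, and the 64-byte record layout, are hypothetical. The sketch models only the bytes transferred. It says nothing about the hardware or the operating-system overhead the chips avoid.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Record:
    key: int
    payload: bytes

    def size(self) -> int:
        # Hypothetical on-flash encoding: an 8-byte key plus the payload bytes.
        return 8 + len(self.payload)


Predicate = Callable[[Record], bool]


class FlashController:
    """Toy model of a flash-drive controller that can optionally filter data."""

    def __init__(self, records: List[Record]):
        self.records = records
        self.bytes_sent = 0  # bytes shipped back to the host server

    def read_all(self) -> List[Record]:
        # Conventional path: every record crosses the link to the server.
        self.bytes_sent += sum(r.size() for r in self.records)
        return list(self.records)

    def read_filtered(self, keep: Predicate) -> List[Record]:
        # Offloaded path: the controller applies the predicate itself,
        # so only matching records cross the link.
        matches = [r for r in self.records if keep(r)]
        self.bytes_sent += sum(r.size() for r in matches)
        return matches


def wanted(r: Record) -> bool:
    # Example query: keep 1 record in every 100.
    return r.key % 100 == 0


records = [Record(key=i, payload=bytes(56)) for i in range(1000)]

host_side = FlashController(records)
host_result = [r for r in host_side.read_all() if wanted(r)]

offloaded = FlashController(records)
offloaded_result = offloaded.read_filtered(wanted)

print(host_side.bytes_sent, offloaded.bytes_sent)
```

Both paths return the same ten records. When the server does the filtering, 64,000 bytes are transferred. When the controller filters first, only 640 bytes are.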

With hardware contributed by some of their sponsors — Quanta, Samsung, and Xilinx — the researchers built a prototype network of 20 servers. Each server was connected to a field-programmable gate array, or FPGA, a kind of chip that can be reprogrammed to mimic different types of electrical circuits. Each FPGA, in turn, was connected to two half-terabyte — or 500-gigabyte — flash chips and to the two FPGAs nearest it in the server rack.
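The prototype's layout can be modeled as a small graph. This is a sketch under one stated assumption: the article says each FPGA links to "the two FPGAs nearest it," and the code treats that as a ring, so the two end nodes are also joined to each other. The article does not confirm that detail. The code counts the FPGA-to-FPGA hops between nodes with a breadth-first search and totals the flash capacity. The names `ring_neighbors` and `hop_counts` are hypothetical.

```python
from collections import deque
from typing import Dict, List

NUM_NODES = 20        # one FPGA per server in the prototype
CHIPS_PER_FPGA = 2    # two flash chips per FPGA
GB_PER_CHIP = 500     # half a terabyte each


def ring_neighbors(n: int) -> Dict[int, List[int]]:
    """Assumed layout: FPGAs in a ring, each linked to the node on either side."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def hop_counts(adj: Dict[int, List[int]], start: int) -> Dict[int, int]:
    """Breadth-first search: number of FPGA-to-FPGA links from start to each node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


adj = ring_neighbors(NUM_NODES)
hops = hop_counts(adj, start=0)
total_tb = NUM_NODES * CHIPS_PER_FPGA * GB_PER_CHIP / 1000

print(f"Total flash: {total_tb:.0f} TB")
print(f"Farthest FPGA from node 0: {max(hops.values())} hops")
```

Under the ring assumption, the total comes to 20 terabytes of flash, which matches the figure cited earlier, and no FPGA is more than 10 hops from any other. If the two ends of the chain are not joined, the worst case grows to 19 hops.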