Memory management scheme could help enable chips with thousands of cores

In a modern, multicore chip, every core — or processor — has its own small memory cache, where it stores frequently used data. But the chip also has a larger, shared cache, which all the cores can access.

If one core tries to update data in the shared cache, other cores working on the same data need to know. So the shared cache keeps a directory of which cores have copies of which data.

That directory takes up a significant chunk of memory: In a 64-core chip, it might be 12 percent of the shared cache. And that percentage will only increase with the core count. Envisioned chips with 128, 256, or even 1,000 cores will need a more efficient way of maintaining cache coherence.

At the International Conference on Parallel Architectures and Compilation Techniques in October, MIT researchers will unveil the first fundamentally new approach to cache coherence in more than three decades. Whereas with existing techniques, the directory’s memory allotment increases in direct proportion to the number of cores, with the new approach, it increases according to the logarithm of the number of cores.

In a 128-core chip, that means that the new technique would require only one-third as much memory as its predecessor. With Intel set to release a 72-core high-performance chip in the near future, that’s a more than hypothetical advantage. But with a 256-core chip, the space savings rise to 80 percent, and with a 1,000-core chip, 96 percent.
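The scaling difference can be made concrete with a back-of-envelope comparison. This is purely illustrative and not the paper’s storage accounting: a full-map directory keeps one sharer bit per core for every cache line, while a timestamp-based scheme keeps a field whose width grows with the logarithm of the core count (the `slack` constant below, standing in for fixed per-entry bits, is invented).

```python
import math

# Back-of-envelope illustration (not the paper's actual accounting).
# A full-map directory stores one sharer bit per core per cache line;
# a logical-time scheme stores a timestamp whose width grows only
# logarithmically with the number of cores.

def bitvector_bits(cores):
    return cores  # one sharer bit per core

def timestamp_bits(cores, slack=8):
    # log2-sized field plus some fixed bits; "slack" is an invented constant
    return math.ceil(math.log2(cores)) + slack

# Per-entry storage ratio shrinks as core counts grow.
ratios = {n: timestamp_bits(n) / bitvector_bits(n) for n in (128, 256, 1024)}
```

Under these toy assumptions, the relative overhead of the timestamp scheme keeps falling as cores are added, which is the qualitative trend the researchers report.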

When multiple cores are simply reading data stored at the same location, there’s no problem. Conflicts arise only when one of the cores needs to update the shared data. With a directory system, the chip looks up which cores are working on that data and sends them messages invalidating their locally stored copies of it.
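The directory bookkeeping just described can be sketched in a few lines. This is a deliberately simplified toy model, not the actual hardware protocol: a table maps each address to the set of cores holding a copy, and a write triggers invalidation messages to every other sharer.

```python
# Toy model of directory-based cache coherence (a simplification for
# illustration, not the real hardware protocol).

class Directory:
    def __init__(self):
        self.sharers = {}  # address -> set of core IDs holding a copy

    def read(self, core, addr):
        # Record that this core now holds a local copy of the data.
        self.sharers.setdefault(addr, set()).add(core)

    def write(self, core, addr):
        # Before the write proceeds, every other sharer's copy is invalidated.
        invalidated = self.sharers.get(addr, set()) - {core}
        self.sharers[addr] = {core}
        return invalidated  # cores that must drop their stale copies

d = Directory()
d.read(0, "x")
d.read(1, "x")
d.read(2, "x")
stale = d.write(1, "x")  # cores 0 and 2 receive invalidation messages
```

The cost of this scheme is the `sharers` table itself, which is exactly the per-core storage that grows linearly with the core count.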

“Directories guarantee that when a write happens, no stale copies of the data exist,” says Xiangyao Yu, an MIT graduate student in electrical engineering and computer science and first author on the new paper. “After this write happens, no read to the previous version should happen. So this write is ordered after all the previous reads in physical-time order.”

Time travel

What Yu and his thesis advisor — Srini Devadas, the Edwin Sibley Webster Professor in MIT’s Department of Electrical Engineering and Computer Science — realized was that the physical-time order of distributed computations doesn’t really matter, so long as their logical-time order is preserved. That is, core A can keep working away on a piece of data that core B has since overwritten, provided that the rest of the system treats core A’s work as having preceded core B’s.

The ingenuity of Yu and Devadas’ approach is in finding a simple and efficient means of enforcing a global logical-time ordering. “What we do is we just assign time stamps to each operation, and we make sure that all the operations follow that time stamp order,” Yu says.

With Yu and Devadas’ system, each core has its own counter, and each data item in memory has an associated counter, too. When a program launches, all the counters are set to zero. When a core reads a piece of data, it takes out a “lease” on it, meaning that it increments the data item’s counter to, say, 10. As long as the core’s internal counter doesn’t exceed 10, its copy of the data is valid. (The particular numbers don’t matter much; what matters is their relative value.)
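The lease idea can be sketched as follows. This is a hypothetical simplification of the timestamp scheme described above, not the researchers’ actual design: a read extends the data item’s lease, and a writer jumps its own logical clock past that lease, so the write is ordered after all leased reads in logical time even if it happens first in physical time.

```python
# Sketch of logical-time leases (a hypothetical simplification of the
# timestamp scheme described in the article, not the actual hardware design).
LEASE = 10  # lease length in logical-time units; the exact value is arbitrary

class Memory:
    def __init__(self):
        self.data, self.lease, self.write_ts = {}, {}, {}

    def alloc(self, addr, value):
        self.data[addr], self.lease[addr], self.write_ts[addr] = value, 0, 0

class Core:
    def __init__(self):
        self.clock = 0    # the core's logical-time counter
        self.copies = {}  # addr -> lease expiry timestamp for the local copy

    def read(self, mem, addr):
        # A read must logically follow the last write to this address.
        self.clock = max(self.clock, mem.write_ts[addr])
        # Take out a lease: the local copy is valid while clock <= expiry.
        expiry = self.clock + LEASE
        mem.lease[addr] = max(mem.lease[addr], expiry)
        self.copies[addr] = expiry
        return mem.data[addr]

    def write(self, mem, addr, value):
        # Jump logical time past every outstanding lease, so this write
        # is ordered after all earlier leased reads in logical time.
        self.clock = max(self.clock, mem.lease[addr] + 1)
        mem.data[addr] = value
        mem.write_ts[addr] = self.clock

mem = Memory()
mem.alloc("x", 1)
a, b = Core(), Core()
a.read(mem, "x")      # core A leases "x" until logical time 10
b.write(mem, "x", 2)  # core B's clock jumps to 11: logically after A's read
stale_ok = a.clock <= a.copies["x"]  # A may keep using its old copy
```

Note what the last line captures: core A can keep working with its old copy of “x” after B’s overwrite, because in logical time all of A’s reads still precede B’s write. Only the per-item timestamps must be stored, not a list of sharers.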

Could a digital pen change how we diagnose brain function?

For all of the advances in medical technology, many of the world’s most widely used diagnostic tools essentially involve just two things: pen and paper.

Tests such as the Montreal Cognitive Assessment (MoCA) and the Clock Drawing Test (CDT) are used to detect cognitive change arising from a wide range of causes, from strokes and concussions to dementias such as Alzheimer’s disease.

What’s disconcerting, though, is that, with dementia and other disorders growing in prevalence, most current diagnostic methods detect cognitive impairment only after it starts affecting people’s lives. In Alzheimer’s, for example, changes in the brain may occur 10 or more years before the cognitive change becomes noticeable, and no easily administered test can detect these changes at the very earliest stage.

At least, not yet.

This month researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) were part of a team that published a paper demonstrating a predictive model that, coupled with existing hardware, opens up the possibility of detecting disorders such as dementia earlier than ever before.

Clock-drawing test

For several decades, doctors have screened for conditions including Parkinson’s and Alzheimer’s with the CDT, which asks subjects to draw an analog clock-face showing a specified time, and to copy a pre-drawn clock.

But the test has limitations, because its benchmarks rely on doctors’ subjective judgments, such as determining whether a clock circle has “only minor distortion.”

CSAIL researchers were particularly struck by the fact that CDT analysis was typically based on the person’s final drawing rather than on the process as a whole.

Enter the Anoto Live Pen, a digitizing ballpoint pen that measures its position on the paper upwards of 80 times a second, using a camera built into the pen. The pen provides data that are far more precise than could be obtained from an ordinary drawing, and captures timing information that allows the system to analyze each and every one of a subject’s movements and hesitations.
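To make the timing idea concrete, here is a hypothetical sketch of extracting hesitation features from digitized pen strokes; the function, field names, and threshold are invented for illustration and are not the team’s actual dCDT software. Each sample is an (x, y, t) tuple, and at roughly 80 samples per second, a gap much longer than 1/80 of a second suggests the pen paused.

```python
# Hypothetical sketch of timing-feature extraction from digitized pen
# strokes (invented for illustration; not the team's actual dCDT software).
# Each sample is (x, y, t), with t in seconds.

def timing_features(samples, pause_threshold=0.5):
    total_time = samples[-1][2] - samples[0][2]
    # Gaps between consecutive samples that exceed the threshold count
    # as hesitations -- the pen slowed, stopped, or left the paper.
    pauses = [t2 - t1
              for (_, _, t1), (_, _, t2) in zip(samples, samples[1:])
              if t2 - t1 > pause_threshold]
    return {
        "total_time": total_time,
        "num_pauses": len(pauses),
        "time_paused": sum(pauses),
    }

# A toy stroke: steady drawing at ~80 Hz, with one long hesitation.
stroke = [(0, 0, 0.000), (1, 0, 0.0125), (2, 0, 0.025),
          (2, 1, 1.225),   # a ~1.2-second hesitation
          (2, 2, 1.2375)]
features = timing_features(stroke)
```

Features of this kind, invisible in the final drawing, are exactly the process information the researchers found was being discarded by conventional scoring.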

Research at Lahey Hospital and Medical Center and CSAIL produced novel software for analyzing this version of the test, producing what the team calls the digital Clock Drawing Test (dCDT).

Predictive power of drawings

Working with a collection of 2,600 tests administered over the past nine years, the team developed computational models that show early promise in being able to better detect whether someone has a cognitive impairment, and even determine precisely which impairment they may have.

They tested their models against standard methods used by physicians and found that the machine learning models were significantly more accurate.

“We’ve improved the analysis so that it is automated and objective,” says CSAIL principal investigator Cynthia Rudin, a professor at the Sloan School of Management and co-author of the paper. “With the right equipment, you can get results wherever you want, quickly, and with higher accuracy.”

Some of the machine learning techniques they used were designed to produce “transparent” classifiers, which provide insights into what factors are important for screening and diagnosis.

“These examples help calibrate the predictive power of each part of the drawing,” says first author William Souillard-Mandar, a graduate student at CSAIL. “They allow us to extract thousands of features from the drawing process that give hints about the subject’s cognitive state, and our algorithms help determine which ones can make the most accurate prediction.”
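What a “transparent” classifier looks like can be sketched with a toy linear scoring rule; the features, weights, and threshold below are invented for illustration, not the study’s actual model. Because the score is a weighted sum, anyone can read off exactly which features drove a given decision.

```python
# Toy "transparent" classifier over drawing features (the feature names,
# weights, and threshold are invented, not the study's actual model).
# A weighted sum makes each feature's contribution directly inspectable.

WEIGHTS = {
    "total_time": 0.02,   # slower drawing -> higher risk score
    "num_pauses": 0.30,   # more hesitations -> higher risk score
    "clock_error": 0.50,  # distortion of the drawn clock face
}
THRESHOLD = 1.0

def risk_score(features):
    return sum(WEIGHTS[name] * features.get(name, 0.0) for name in WEIGHTS)

def screen(features):
    score = risk_score(features)
    # Per-feature contributions explain why the score came out as it did.
    contributions = {name: WEIGHTS[name] * features.get(name, 0.0)
                     for name in WEIGHTS}
    return score > THRESHOLD, contributions

flagged, why = screen({"total_time": 45.0, "num_pauses": 3, "clock_error": 0.4})
```

This inspectability is the point of the quote above: the model does not just flag a subject, it indicates which aspects of the drawing process mattered.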

Souillard-Mandar and Rudin co-wrote the paper with MIT Professor Randall Davis and researchers Dana Penney of Lahey Hospital, Rhoda Au of Boston University, David Libon of Drexel University, Catherine Price of the University of Florida, Melissa Lamar of the University of Illinois Chicago, and Rod Swenson of the University of North Dakota Medical School.

Different disorders reveal themselves in different ways on the CDT, which asks people to draw a clock showing 10 minutes after 11, and then asks them to copy a pre-drawn clock showing that time.

From MRI scans to physical heart models in a few hours

Researchers at MIT and Boston Children’s Hospital have developed a system that can take MRI scans of a patient’s heart and, in a matter of hours, convert them into a tangible, physical model that surgeons can use to plan surgery.

The models could provide a more intuitive way for surgeons to assess and prepare for the anatomical idiosyncrasies of individual patients. “Our collaborators are convinced that this will make a difference,” says Polina Golland, a professor of electrical engineering and computer science at MIT, who led the project. “The phrase I heard is that ‘surgeons see with their hands,’ that the perception is in the touch.”

This fall, seven cardiac surgeons at Boston Children’s Hospital will participate in a study intended to evaluate the models’ usefulness.

Golland and her colleagues will describe their new system at the International Conference on Medical Image Computing and Computer Assisted Intervention in October. Danielle Pace, an MIT graduate student in electrical engineering and computer science, is first author on the paper and spearheaded the development of the software that analyzes the MRI scans. Mehdi Moghari, a physicist at Boston Children’s Hospital, developed new procedures that increase the precision of MRI scans tenfold, and Andrew Powell, a cardiologist at the hospital, leads the project’s clinical work.

The work was funded by Boston Children’s Hospital and by Harvard Catalyst, a consortium aimed at rapidly moving scientific innovation into the clinic.

MRI data consist of a series of cross sections of a three-dimensional object. Like a black-and-white photograph, each cross section has regions of dark and light, and the boundaries between those regions may indicate the edges of anatomical structures. Then again, they may not.

Determining the boundaries between distinct objects in an image is one of the central problems in computer vision, known as “image segmentation.” But general-purpose image-segmentation algorithms aren’t reliable enough to produce the very precise models that surgical planning requires.
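A minimal example shows what segmentation means in practice, and hints at why general-purpose methods fall short. The sketch below is purely illustrative, far simpler than any surgical-planning pipeline: pixels brighter than a threshold are labeled foreground, and the boundary of that labeling is a crude guess at an anatomical edge.

```python
# Minimal intensity-threshold segmentation of one grayscale cross section
# (illustrative only; real surgical-planning pipelines are far more
# sophisticated than a single global threshold).

def segment(slice_2d, threshold):
    return [[1 if px > threshold else 0 for px in row] for row in slice_2d]

def boundary_pixels(labels):
    # A foreground pixel lies on the boundary if any 4-neighbor is background
    # (or off the edge of the image).
    h, w = len(labels), len(labels[0])
    edges = set()
    for y in range(h):
        for x in range(w):
            if labels[y][x] != 1:
                continue
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if not (0 <= ny < h and 0 <= nx < w) or labels[ny][nx] == 0:
                    edges.add((y, x))
    return edges

slice_2d = [[10, 10, 10, 10],
            [10, 90, 95, 10],
            [10, 92, 94, 10],
            [10, 10, 10, 10]]
labels = segment(slice_2d, threshold=50)
edges = boundary_pixels(labels)
```

On real MRI data the bright/dark boundary is noisy and ambiguous, which is why the boundaries “may indicate the edges of anatomical structures. Then again, they may not.”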

Human factors

Typically, the way to make an image-segmentation algorithm more precise is to augment it with a generic model of the object to be segmented. Human hearts, for instance, have chambers and blood vessels that are usually in roughly the same places relative to each other. That anatomical consistency could give a segmentation algorithm a way to weed out improbable conclusions about object boundaries.

The problem with that approach is that many of the cardiac patients at Boston Children’s Hospital require surgery precisely because the anatomy of their hearts is irregular. Inferences from a generic model could obscure the very features that matter most to the surgeon.

Recommendation software analyzes data from multiple sources

All activity on your social media accounts contributes to your “social graph,” which maps your interconnected online relationships, likes, preferred activities, and affinity for certain brands, among other things.

Now MIT spinout Infinite Analytics is leveraging these social graphs, and other sources of data, for very precise recommendation software that better predicts customers’ buying preferences. Consumers get a more personalized online-buying experience, while e-commerce businesses see more profit, the startup says.

The neat trick behind the software — packaged as a plug-in for websites — is breaking down various “data silos,” isolated data that cannot easily be integrated with other data. Basically, the software merges disparate social media, personal, and product information to rapidly build a user profile and match that user with the right product. The algorithm also follows users’ changing tastes.
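The silo-merging idea can be sketched as follows. Everything here is hypothetical and invented for illustration: the field names, sources, and matching rule are not Infinite Analytics’ actual schema or algorithm. Each source contributes partial information about the same user; a toy recommender then ranks products by shared interest tags.

```python
# Hypothetical sketch of merging "siloed" records into one user profile
# (field names, sources, and scoring are invented for illustration; this
# is not Infinite Analytics' actual schema or algorithm).

def build_profile(*sources):
    profile = {"interests": set()}
    for record in sources:
        # Union interest tags; keep the first value seen for other fields.
        profile["interests"] |= set(record.get("interests", []))
        for key, value in record.items():
            if key != "interests":
                profile.setdefault(key, value)
    return profile

def recommend(profile, catalog):
    # Rank products by how many interest tags they share with the user.
    scored = sorted(catalog,
                    key=lambda p: len(profile["interests"] & set(p["tags"])),
                    reverse=True)
    return [p["name"] for p in scored]

social = {"name": "Ada", "interests": ["sailing", "jazz"]}
purchases = {"interests": ["cameras"], "last_order": "tripod"}
profile = build_profile(social, purchases)
ranked = recommend(profile, [
    {"name": "lens kit", "tags": ["cameras"]},
    {"name": "deck shoes", "tags": ["sailing", "fashion"]},
    {"name": "blender", "tags": ["kitchen"]},
])
```

The point of the sketch is the merge step: no single silo knows that this user both sails and buys camera gear, but the combined profile does.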

Think of the software as a digital salesman, says Chief Technology Officer Purushotham Botla SM ’13, who co-founded Infinite Analytics and co-developed the software with Akash Bhatia MBA ’12. A real-world salesperson will ask consumers questions about their background, financial limits, and preferences to find an affordable and relevant product. “In the online world, we try to do that by looking at all these different data sources,” Botla says.

Launched in 2012, Infinite Analytics has now processed more than 100 million users for 15 clients, including Airbnb, Comcast, and eBay. According to the company, clients have seen around a 25 percent increase in user engagement.

Bhatia says the software also makes online-shopping searches incredibly specific. Users could, for instance, search for products based on color shade, textures, and popularity, among other details. “Someone could go [online] and search for ‘the most trending, 80 percent blue dress,’ and find that product,” Bhatia says.

Dismantling data silos

The two co-founders met and designed the software in course 6.932J (Linked Data Ventures), co-taught by Tim Berners-Lee, the 3Com Founders Professor of Engineering. Berners-Lee later joined Infinite Analytics as an advisor, along with Deb Roy, an associate professor of media arts and sciences, and Erik Brynjolfsson, the Schussel Family Professor of Management Science at the MIT Sloan School of Management.

As a class project, Bhatia and Botla, along with several classmates, designed software meant to dismantle data silos — a major theme in the class. “There’s so much data around us, but all the data is in silos, disconnected,” Botla says. “The goal was to take this data and make it more machine-readable and associate semantic meanings to it.”

But this first prototype wasn’t for finding products — it was for finding people. Looking at social media and other data, the software would determine the best way to reach out to a specific person, whether through mutual friends on LinkedIn or through other online channels. For instance, you could search for and find how to connect with any person who happens to, say, golf at a specific course.

Siting wind farms more quickly and cheaply

When a power company wants to build a new wind farm, it generally hires a consultant to make wind speed measurements at the proposed site for eight to 12 months. Those measurements are correlated with historical data and used to assess the site’s power-generation capacity.

At the International Joint Conference on Artificial Intelligence later this month, MIT researchers will present a new statistical technique that yields better wind-speed predictions than existing techniques do — even when it uses only three months’ worth of data. That could save power companies time and money, particularly in the evaluation of sites for offshore wind farms, where maintaining measurement stations is particularly costly.

“We talked with people in the wind industry, and we found that they were using a very, very simplistic mechanism to estimate the wind resource at a site,” says Kalyan Veeramachaneni, a research scientist at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and first author on the new paper. In particular, Veeramachaneni says, standard practice in the industry is to model correlations in wind-speed data using a so-called Gaussian distribution — the “bell curve” familiar from basic statistics.

“The data here is non-Gaussian; we all know that,” Veeramachaneni says. “You can fit a bell curve to it, but that’s not an accurate representation of the data.”

Typically, a wind energy consultant will find correlations between wind speed measurements at a proposed site and those made, during the same period, at a nearby weather station where records stretch back for decades. On the basis of those correlations, the consultant will adjust the weather station’s historical data to provide an approximation of wind speeds at the new site.

The correlation model is what’s known in statistics as a joint distribution. That means that it represents the probability not only of a particular measurement at one site, but also of that measurement’s coincidence with a particular measurement at the other. Wind-industry consultants, Veeramachaneni says, usually characterize that joint distribution as a Gaussian distribution.

Different curves

The first novelty of the model that Veeramachaneni developed with his colleagues — Una-May O’Reilly, a principal research scientist at CSAIL, and Alfredo Cuesta-Infante of the Universidad Rey Juan Carlos in Madrid — is that it can factor in data from more than one weather station. In some of their analyses, the researchers used data from 15 or more other sites.

But its main advantage is that it’s not restricted to Gaussian probability distributions. Moreover, it can use different types of distributions to characterize data from different sites, and it can combine them in different ways. It can even use so-called nonparametric distributions, in which the data are described not by a mathematical function, but by a collection of samples, much the way a digital music file consists of discrete samples of a continuous sound wave.

Another aspect of the model is that it can find nonlinear correlations between data sets. Standard regression analysis, of the type commonly used in the wind industry, identifies the straight line that best approximates a scattering of data points, according to some distance measure. But often, a curved line would offer a better approximation. The researchers’ model allows for that possibility.
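The gap between a straight-line fit and a curved one is easy to demonstrate. The toy example below is illustrative only, not the researchers’ actual model: it fits both a linear and a quadratic least-squares model to data with a genuinely curved relationship and compares their errors.

```python
# Toy demonstration of why a straight-line fit can underperform on curved
# data (illustrative only; not the researchers' actual model). We fit
# y = a*x + b and y = a*x^2 + b by closed-form least squares.

def fit_least_squares(xs, ys, basis):
    # One-feature least squares: y ~ a*basis(x) + b.
    fs = [basis(x) for x in xs]
    n = len(xs)
    mean_f, mean_y = sum(fs) / n, sum(ys) / n
    cov = sum((f - mean_f) * (y - mean_y) for f, y in zip(fs, ys))
    var = sum((f - mean_f) ** 2 for f in fs)
    a = cov / var
    b = mean_y - a * mean_f
    return lambda x: a * basis(x) + b

def sq_error(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys))

xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]  # a genuinely curved relationship

linear = fit_least_squares(xs, ys, basis=lambda x: x)
quadratic = fit_least_squares(xs, ys, basis=lambda x: x * x)

err_linear = sq_error(linear, xs, ys)
err_quadratic = sq_error(quadratic, xs, ys)
```

On this symmetric data the best straight line is flat and misses the shape entirely, while the curved basis fits exactly; allowing curved relationships is the possibility the researchers’ model admits.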

Learning spoken language without human annotation of training data

Every language has its own collection of phonemes, or the basic phonetic units from which spoken words are composed. Depending on how you count, English has somewhere between 35 and 45. Knowing a language’s phonemes can make it much easier for automated systems to learn to interpret speech.

In the 2015 volume of Transactions of the Association for Computational Linguistics, MIT researchers describe a new machine-learning system that, like several systems before it, can learn to distinguish spoken words. But unlike its predecessors, it can also learn to distinguish lower-level phonetic units, such as syllables and phonemes.

As such, it could aid in the development of speech-processing systems for languages that are not widely spoken and don’t have the benefit of decades of linguistic research on their phonetic systems. It could also help make speech-processing systems more portable, since information about lower-level phonetic units could help iron out distinctions between different speakers’ pronunciations.

Unlike the machine-learning systems that led to, say, the speech recognition algorithms on today’s smartphones, the MIT researchers’ system is unsupervised, which means it acts directly on raw speech files: It doesn’t depend on the laborious hand-annotation of its training data by human experts. So it could prove much easier to extend to new sets of training data and new languages.

Finally, the system could offer some insights into human speech acquisition. “When children learn a language, they don’t learn how to write first,” says Chia-ying Lee, who completed her PhD in computer science and engineering at MIT last year and is first author on the paper. “They just learn the language directly from speech. By looking at patterns, they can figure out the structures of language. That’s pretty much what our paper tries to do.”

Lee is joined on the paper by her former thesis advisor, Jim Glass, a senior research scientist at the Computer Science and Artificial Intelligence Laboratory and head of the Spoken Language Systems Group, and Timothy O’Donnell, a postdoc in the MIT Department of Brain and Cognitive Sciences.

Shaping up

Since the researchers’ system doesn’t require annotation of the data on which it’s trained, it needs to make a few assumptions about the structure of the data in order to draw coherent conclusions. One is that the frequency with which words occur in speech follows a standard distribution known as a power-law distribution, which means that a small number of words will occur very frequently but that the majority of words occur infrequently — the statistical phenomenon of the “long tail.” The exact parameters of that distribution — its maximum value and the rate at which it tails off — are unknown, but its general shape is assumed.
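The long-tail shape of the assumed distribution is easy to visualize numerically. The sketch below is illustrative; the paper’s actual prior and its parameters differ. Under a Zipf-style power law, the k-th most common word has frequency proportional to 1/k**s, so a handful of words dominate while the rest form the long tail.

```python
# Illustrative sketch of the long-tail word-frequency assumption (the
# paper's actual prior and parameters differ). Under a Zipf-style power
# law, the k-th most common word has frequency proportional to 1/k**s.

def zipf_frequencies(num_words, s=1.0):
    weights = [1.0 / (k ** s) for k in range(1, num_words + 1)]
    total = sum(weights)
    return [w / total for w in weights]  # normalized to a distribution

freqs = zipf_frequencies(1000)
head_share = sum(freqs[:10])   # share held by the 10 most common words
tail_share = sum(freqs[100:])  # share held by everything past rank 100
```

With 1,000 words and s = 1, the ten most frequent words account for more probability mass than the 900 rarest combined, which is the “long tail” phenomenon the model assumes.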

The key to the system’s performance, however, is what Lee describes as a “noisy-channel” model of phonetic variability. English may have fewer than 50 phonemes, but any given phoneme may correspond to a wide range of sounds, even in the speech of a single person. For example, Lee says, “depending on whether ‘t’ is at the beginning of the word or the end of the word, it may have a different phonetic realization.”

To model this phenomenon, the researchers borrowed a notion from communication theory. They treat an audio signal as if it were a sequence of perfectly regular phonemes that had been sent through a noisy channel — one subject to some corrupting influence. The goal of the machine-learning system is then to learn the statistical correlations between the “received” sound — the one that may have been corrupted by noise — and the associated phoneme. A given sound, for instance, may have an 85 percent chance of corresponding to the ‘t’ phoneme but a 15 percent chance of corresponding to a ‘d’ phoneme.
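The noisy-channel decoding step can be sketched with Bayes’ rule. The probabilities below are hand-picked to echo the 85/15 example in the text; in the actual system the channel model is learned from data, not specified by hand.

```python
# Sketch of noisy-channel decoding (hand-picked probabilities echoing the
# article's 85/15 example; the paper's model learns these from data).

# P(observed sound | phoneme): the "channel".
CHANNEL = {
    "t": {"t_sound": 0.85, "d_sound": 0.15},
    "d": {"t_sound": 0.20, "d_sound": 0.80},
}
# Prior over phonemes, e.g. from how often each occurs in the language
# (these numbers are invented for illustration).
PRIOR = {"t": 0.6, "d": 0.4}

def decode(sound):
    # Posterior is proportional to P(sound | phoneme) * P(phoneme).
    scores = {ph: CHANNEL[ph].get(sound, 0.0) * PRIOR[ph] for ph in CHANNEL}
    total = sum(scores.values())
    posterior = {ph: s / total for ph, s in scores.items()}
    best = max(posterior, key=posterior.get)
    return best, posterior

best, posterior = decode("t_sound")
```

Given a sound, the system recovers the most probable underlying phoneme, which is exactly the inverse-channel inference the learning system performs at scale.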