When Israel invaded the Gaza Strip in July, I turned a profit of $146.55. That same month, I made another $152.36 by successfully predicting that Joko Widodo would be elected president of Indonesia. And when Congress failed to approve President Obama’s request for $3.7 billion to deal with unaccompanied child migrants at the border, I collected a tidy $293.96.
The events were real, of course, but alas, the payouts were not. My earnings were phony digital currency awarded by the website Inkling Markets, a public “prediction market” in which people use fake cash to buy fake shares in the outcomes of various developments in world news, politics, sports, and entertainment. Guess correctly often enough and you watch your screen name rise up the site’s leaderboard.
But if the shares have no monetary value, the results, collected and analyzed by Inkling, increasingly do. Mass-prediction models such as Inkling’s have existed as long as the internet has. For years, Intrade, a platform that did accept legal tender, was a favorite among politicos for besting pollsters in predicting vote totals, until it suspended trading in 2013 because of regulatory issues.
In recent years, as Americans spend more and more time on the internet, major corporations have increasingly turned to online markets to gauge bestselling products and choose which pharmaceuticals to develop. And now, even as public unease rises about the government’s prowling around our digital lives, the most shadowy federal agencies are asking what they can find out simply by soliciting our collective opinion.
Inkling Markets is the brainchild of Adam Siegel and Nathan Kontny, former consultants for Accenture. (Kontny is now CEO of Highrise, a business networking site.) Since its founding in 2006, Inkling has counted among its clients Fortune 500 companies such as Chevron, Ford, and Lockheed Martin, as well as the Defense Department and several unspecified agencies in the US intelligence community.
A political-science major at Indiana University, Siegel, now 41, began informally studying user-interface design—how humans relate to computers—and became, as he puts it, “a hack” at front-end software development. As his consultant work expanded, he often found that executives weren’t incorporating enough feedback from their employees in decisions. He came to believe in prediction markets as a novel way of trying to attack the problem.
Inkling caught a break early on when it was contacted by officials at the Intelligence Advanced Research Projects Activity (IARPA), the research arm of the Office of the Director of National Intelligence (ODNI).
The first meeting was hardly the stuff of spy novels. Siegel—who splits his time between Washington and Chicago—got a call from the Mitre Corporation, a nonprofit that manages research-and-development centers for the Defense Department and other agencies, inviting him to its offices in McLean. But if the initial meeting was “nothing that sexy,” Siegel won’t reveal what was discussed or how Inkling software might be applied for national-security purposes.
“I can’t talk about it,” he says. “I can’t confirm that it’s even happening.”
The US spent more than $48 billion in 2014 to finance the National Intelligence Program, which funds the conglomeration of three-letter agencies that includes the CIA, NSA, and FBI. These agencies’ analysts collect information from many sources—human contacts, computer networks, satellite imagery—and draw conclusions that go into purportedly objective, apolitical assessments presented to military commanders and State Department officials across the world, all the way up to the President in his daily briefing.
The problem, says Jason Matheny, who started a program at IARPA called Aggregative Contingent Estimation, is that “people who have the traditional markers of expertise are typically not the most accurate forecasters.” Experts, Matheny explains, are prone to highlight information that confirms their experience. This tendency is called “the paradox of expertise.”
A more troubling pitfall is our natural reluctance to give our bosses bad news. “There’s a culture in government organizations and in companies where it is just not welcome to report bad information,” says Siegel.
Recent history has only amplified the human static in the system. Intelligence agencies are “in a quite awkward situation,” says Barbara Mellers, a University of Pennsylvania expert in decision-making who works with IARPA. “When they under-connect the dots, they get extreme death, like 9/11. That can change their threshold, so that they over-connect the dots, like with weapons of mass destruction [in Iraq], and they get slapped back in the other direction.”
No matter how impartial analysts strive to be, in other words, human frailty and outright bias inevitably cloud their judgments.
In 2010, IARPA and Matheny put out a call for academic researchers to help improve “accuracy of judgment-based forecasts” by “mathematically aggregating many independent judgments.” Mellers, fellow UPenn psychologist Philip Tetlock, and Berkeley organizational-behavior scientist Don Moore responded with a proposal that won them the job. With funding from ODNI, they created a forecasting tournament called the Good Judgment Project. Roughly 3,000 amateur forecasters were recruited through listservs, social media, and the project’s own website to compete against one another over the course of a year.
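One simple way to “mathematically aggregate” independent judgments is to average the probabilities that forecasters assign to an event, then score the result once the outcome is known. The sketch below is an illustration under stated assumptions, not a description of the tournament's actual method: it uses a plain unweighted mean and the Brier score (the squared gap between a forecast probability and the 0-or-1 outcome), and the forecasts, function names, and outcome are all invented for the example.

```python
def brier_score(probability, outcome):
    """Squared error between a forecast probability and the outcome (1 or 0).

    Lower is better; 0.0 is a perfect forecast.
    """
    return (probability - outcome) ** 2


def pool_forecasts(probabilities):
    """Combine individual probabilities with an unweighted mean (an assumption)."""
    if not probabilities:
        raise ValueError("need at least one forecast")
    return sum(probabilities) / len(probabilities)


# Hypothetical probabilities five forecasters gave to one event; made-up data.
forecasts = [0.9, 0.6, 0.3, 0.8, 0.7]
outcome = 1  # assume the event happened

pooled = pool_forecasts(forecasts)
pooled_score = brier_score(pooled, outcome)
average_individual_score = (
    sum(brier_score(p, outcome) for p in forecasts) / len(forecasts)
)

print(f"pooled forecast: {pooled:.2f}")
print(f"pooled Brier score: {pooled_score:.4f}")
print(f"average individual Brier score: {average_individual_score:.4f}")
```

Because squared error is convex, the pooled forecast can never score worse than the average individual forecaster under this rule, though it can still lose to the best individual in the group.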
In the program’s second year, the most accurate 2 percent of Good Judgment players, dubbed “super-forecasters,” were grouped into teams that pooled their knowledge. Their predictions were roughly 30 percent more accurate than those of their counterparts in the agency, Mellers heard—despite the fact that the super-forecasters weren’t using classified information.
What sets successful amateur forecasters apart, according to Matheny, is “a need to challenge your own beliefs and update beliefs frequently,” a process he says is less common among experts. Good Judgment Project participants tend to be political-news junkies who “find things on the internet that are obscure and interesting and wonderful,” adds Mellers.
The Good Judgment Project doesn’t just identify super-forecasters—it cultivates them. Some participants receive training in “probabilistic reasoning.” Mellers says the best treat it as “a skill to be learned.”
One of the program’s most successful guinea pigs is Jay Ulfelder. The 45-year-old Takoma Park resident has worked with the CIA-funded Political Instability Task Force, which assesses the risk of state failures around the globe, and he currently leads a project with the US Holocaust Memorial Museum’s Center for the Prevention of Genocide to build an “early-warning system” for mass atrocities. Ulfelder’s expert background makes him different from many of the other Good Judgment forecasters, but his curiosity, along with a somewhat jaundiced attitude toward his own analysis, is in line with the program’s principles. Ulfelder, who writes a blog titled Dart-Throwing Chimp, cautions that what he does isn’t an exact science. Of the 30 nations he might say are at high risk of a coup, he explains, only a handful may actually try to overthrow their governments in a given year: “That would be a relatively successful model in our field. In other fields, people would look at that and go, ‘Wow, that’s terrible.’ ”
Ulfelder says his goal in analyzing something like genocide is not to predict events definitively but to raise awareness in order to get responsible parties talking about “things we can do to mitigate the risk.”
Michael Birmingham, a spokesman for ODNI, clarifies that analysis is different from making predictions. Rather, analysts use “estimative language” to describe the likelihood of a certain outcome, given the available data. “A prediction is a certain definitive statement about something happening at a certain place and time,” says Birmingham. “That’s not necessarily what analysts do. It’s not like picking the winner of the Preakness on Saturday in Race 10.”
But if intelligence analysts are just guessing, the question remains: How do we guess better?
The basic theory behind crowd-sourcing predictions dates to 1906, when a British scholar named Francis Galton stumbled on an idea later dubbed “the wisdom of crowds.” Legend has it that Galton was attending a livestock fair that featured a prize for the person who could come closest to guessing the weight of a dead ox. Galton averaged the guesses of all the entrants and arrived at an estimate of 1,197 pounds for the carcass. His aggregated estimate was supposedly just one pound shy of the correct weight.
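Galton's arithmetic is easy to reproduce. The sketch below is illustrative only: the seven guesses are invented, not drawn from the 1906 fair, the function name is hypothetical, and the true weight of 1,198 pounds is simply the figure implied by the account above. It shows how errors in both directions tend to cancel when guesses are averaged.

```python
from statistics import mean, median


def crowd_estimate(guesses):
    """Return the mean and median of a list of individual guesses."""
    if not guesses:
        raise ValueError("need at least one guess")
    return mean(guesses), median(guesses)


# Hypothetical guesses in pounds; these numbers are made up for illustration.
guesses = [1050, 1120, 1180, 1200, 1230, 1290, 1310]
true_weight = 1198  # implied by the legend: one pound above the crowd's 1,197

crowd_mean, crowd_median = crowd_estimate(guesses)

# How far off was the crowd, versus the typical individual?
crowd_error = abs(crowd_mean - true_weight)
typical_individual_error = mean(abs(g - true_weight) for g in guesses)

print(f"crowd mean: {crowd_mean:.0f} lb, crowd median: {crowd_median} lb")
print(f"crowd error: {crowd_error:.1f} lb")
print(f"typical individual error: {typical_individual_error:.1f} lb")
```

In this made-up sample, no single guesser lands as close to the true weight as the average does, which is the effect the legend describes.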
That’s essentially what happens at Inkling Markets. Click the “predictions” tab at the top of the website’s home page and a lengthy list of questions appears asking users to buy into the outcomes of sporting events, elections, economic trends, and weather. With 50,000 registered users, the public site sees anywhere from 750 to 1,000 “trades,” or forecasts, a day. Inkling’s corporate clients use a more sophisticated version of the software to poll employees about sales forecasts and other business questions. SciCast, another IARPA-funded project, run by George Mason University, uses a model similar to Inkling’s to forecast developments in science and technology. SciCast has a mere 200 active users in a given month, according to project leader Charles Twardy, and the top 15 of those carry one-third of the site’s activity.
The lack of mass appeal is a persistent problem for crowd-sourcing sites, which typically require large numbers to yield accurate forecasts. “The whole thing hinges on participation,” says Sean J. Taylor, a member of Facebook’s data-science team and a creator of the public forecasting site Creds. “You need a lot of people for a market to function efficiently.”
The Good Judgment Project’s results suggest that its super-forecaster model solves this glitch—a small team of trained forecasters seems to outperform the mob every time. Ulfelder points out that Galton’s dead-ox fable is valid but can be misinterpreted, arguing that the “wisdom of the crowd” works best when the crowd knows what it’s talking about.
“If you asked a bunch of people from DC to do that today, they’d be terrible because they have no frame of reference,” Ulfelder says. “A bunch of people in rural England are going to know about animal husbandry.”
The flip side to his argument is the philosophy that underpins sites such as Inkling Markets and Creds: If enough users participate, they naturally gravitate toward topics that suit their knowledge and interests.
This was true of my predictions. As a journalist who follows world news and domestic politics closely, I was drawn to make bets on those topics while avoiding questions about the intricacies of business, technology, and science. A few news stories about subsiding tensions between Japan and China convinced me to wager heavily against a statement posted on Creds that “the Chinese military will open fire on a Japanese vehicle (boat, plane, etc.) in the Chinese ADIZ (Air Defense Identification Zone) by June 27, 2014.” I enjoyed a stint atop the site’s leaderboard when China held its fire.
My reign was brief: It’s hard to maintain one’s spot on the board because one is rarely alone in being right. The majority of fellow respondents also thought violence wouldn’t erupt over the ADIZ. In fact, after making several dozen forecasts over a three-month span, I was impressed by how often the collective wisdom got things right on obscure topics.
“One of the big bets we made with Creds was if we could create the right system, where people would self-select about certain kinds of statements, we might get the best of both worlds,” Taylor says. “The Jay Ulfelders of the world show up and bet on statements they’re best suited to bet on.”
However, I also wagered on sports questions, typically with disastrous results. My Inkling Markets picks on the World Cup, NBA Finals, and Triple Crown horse races were all incorrect. The crowd appeared to be less accurate here, too: Brazil was picked to win the World Cup right up until it was annihilated, 7-1, by Germany. Fandom, it might be surmised, is akin to expertise: it drives people to cling to their beliefs. On the other hand, even the best data and objective analysis can predict a win for teams that suffer shocking upsets. As the adage goes, that’s why they play the games.
“We know there are limits,” IARPA’s Matheny says. “There are some events we can’t accurately forecast because they’re effectively random.”
The scientific theory behind prediction markets goes back to the 19th century, but their practical power to tap collective hunches was vastly increased by the internet. Today governments, universities, and corporations use websites like these to crowd-source their intelligence.
Created by a current Facebook employee*, Creds asks its relatively small number of active users to click simple plus and minus buttons to rate the likelihood of events in politics, culture, news, and sports to generate scores of 1 to 100. Predictions that go against the grain are the most valuable, vaulting newbies to the top of the user rankings.
In the Good Judgment Project, teams compete annually in a months-long tournament to predict events in the spheres of national security, global affairs, and economics. Anyone can apply to take part in this federally funded experiment, but the competition requires a substantial time commitment.
Using a stock-market model similar to Inkling Markets, this project from George Mason University focuses on science, technology, and the environment. Geared toward serious experts, SciCast can be intimidating but occasionally touches on accessible topics such as flu season, hurricanes, and space.
Part corporate consulting tool, part addictive time-suck, Inkling Markets is a slick, free site and one of the web’s most popular forecasting games. Users trade virtual stocks in future events such as the success of Hillary Clinton’s presidential bid and which NFL coach will be fired first.
Keegan Hamilton is a journalist in New York City who covers drugs, crime, and conflict.
*A version of this article incorrectly stated that Creds was created by a former Facebook employee.
This article appears in the January 2015 issue of Washingtonian.