Price was highly critical of casualty maps, which are increasingly used by the press to attract attention and by aid agencies to raise funds. These maps are based on raw data and are likely to give completely misleading information. “You are not obliged to put whatever data you have on a map. Maybe you don’t need a map to tell the story,” she observed. “I would like to see more education in the activist groups and more commitment on the academic side to speak out that certain things are wrong,” she added.
“We live in a world that is obsessed with Big Data’s promises of salvation for human life, but it’s important to introduce statistics into Big Data,” commented Abdul Wasai, a PhD student in data systems at the Harvard School of Engineering and Applied Sciences, who reported to the plenary on Price’s workshop. “The challenge is preventing data and technology from telling incorrect stories,” he concluded.
Towards a black box society
Big Data is increasingly fed into software that makes decisions about people and can even nudge their behaviour in one direction or another. Mistaken decisions and discrimination could be prevented with more transparency and regulation of algorithmic black boxes, drawing on examples from public health.
Ciro Cattuto is Scientific Director and leader of the Data Science Laboratory at the ISI Foundation (Italy). His research focuses on behavioural social networks, digital epidemiology, online social networks and web science. He also leads SocioPatterns, a research project aimed at measuring and mapping human spatial behaviour.
“Q: Can I drive late at night with the Smartbox installed? A: You can drive at anytime of the night. However if you drive a lot between 11pm and 6am you may find that your night-time score is negatively affected. […] You can still […] get […] a discount providing that all your other scores are very good.”
The above is a real Q&A from the website of an insurance company that installs a GPS box in cars to monitor driving behaviour and charge the driver accordingly. It seems a smart way to calculate insurance fees by means of data and software. However, the company admits that its algorithm can rate night driving negatively: a minority behaviour, but a completely innocent one for people such as night guards or DJs.
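A pricing rule of this kind can be sketched in a few lines. The weights, thresholds and function names below are invented for illustration; the company's actual algorithm is not public.

```python
def night_time_score(trip_hours):
    """Score from 0 to 100; trips between 23:00 and 06:00 lower it."""
    if not trip_hours:
        return 100.0
    night = sum(1 for hour in trip_hours if hour >= 23 or hour < 6)
    return round(100.0 * (1 - night / len(trip_hours)), 1)

def discount(night_score, other_scores):
    # Mirrors the Q&A: a poor night-time score forfeits the discount
    # unless "all your other scores are very good" (assumed: >= 90).
    if night_score >= 70:
        return 0.15
    if other_scores and all(s >= 90 for s in other_scores):
        return 0.10
    return 0.0
```

Under this sketch, a night guard whose trips all start around 2 a.m. scores 0 on the night-time dimension and, with merely average other scores, loses the discount entirely, despite driving safely.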
“We are increasingly tracked by black boxes like mobile phones, fitness trackers, smart watches… And we are moving towards the Internet of Things and Smart Cities, with sensors being planted everywhere. The places we visit, the people we interact with, what we like, what we say, how we behave, are being tracked. Wearable devices, in the near future, will collect and store an ever increasing amount of bio-medical signals and data,” said Ciro Cattuto in his workshop.
It’s easy to de-anonymize these data, and it’s difficult to trace how they are shared with third parties, for example through ad networks. However, this is just the tip of the iceberg, according to Cattuto. His concerns focus on a second, less material black box through which many of these data are passed. “They are fed into machine learning black boxes: algorithms that categorize us under certain profiles, assign us a score on a scale, make predictions and even make decisions on us,” he explained. The amount of money to charge a person for car insurance, whether to grant credit to a person, or even the likelihood of a person committing a crime are already being calculated through algorithmic black boxes.
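The ease of de-anonymization comes from linkage: records stripped of names still carry quasi-identifiers that can be joined against a public dataset. The records and field names below are invented for illustration.

```python
# "Anonymized" records: names removed, quasi-identifiers kept.
anonymized = [
    {"zip": "10115", "birth_year": 1984, "sex": "F", "diagnosis": "asthma"},
    {"zip": "10117", "birth_year": 1990, "sex": "M", "diagnosis": "diabetes"},
]

# A public register (e.g. a voter roll) with the same quasi-identifiers.
public_register = [
    {"name": "Alice", "zip": "10115", "birth_year": 1984, "sex": "F"},
    {"name": "Bob", "zip": "10117", "birth_year": 1990, "sex": "M"},
]

def reidentify(anon, register):
    # Join the two datasets on the quasi-identifiers they share.
    key = lambda r: (r["zip"], r["birth_year"], r["sex"])
    lookup = {key(r): r["name"] for r in register}
    return [(lookup.get(key(r)), r["diagnosis"]) for r in anon]
```

With only three shared fields, every “anonymous” record in this toy example links back to a named individual.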
On occasion, this software can even provide feedback to users in a way that pushes their behaviour in one direction or another, a practice known as “behavioural nudging”. “Even in standard marketing, sociodemographic data are used to segment and target the population. But now we have more behavioural signals and we can segment in a higher dimensional space,” Cattuto explained.
“We are seeing more and more data-intensive, algorithmic decision making in many products and services,” Cattuto insisted. But as the car insurance box example shows, things can sometimes go wrong. “The fact that decisions are taken by a machine does not mean that they are unbiased. For example, they can be biased towards the features of the representative sample used to build the original model,” Cattuto pointed out. “Or if they learn from data, they are most of the time biased towards majority groups. Their predictions may be considered accurate, because errors in minority groups are by definition negligible in quantitative terms,” Cattuto observed.
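The accuracy illusion Cattuto describes is easy to reproduce with a toy example (not his own): a model that simply learns the majority pattern scores well overall while being wrong about every member of the minority.

```python
# 95 daytime drivers (label 0, "low risk") and 5 night workers
# (label 1, here equally safe but rated differently by the model).
labels = [0] * 95 + [1] * 5

# A lazy model that always predicts the majority class.
predictions = [0] * 100

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
minority_error = sum(p != y for p, y in zip(predictions, labels) if y == 1) / 5

print(f"overall accuracy: {accuracy:.0%}")   # looks good on paper
print(f"minority error:   {minority_error:.0%}")  # every night worker misrated
```

Overall accuracy is 95%, yet the error rate on the minority group is 100%: exactly the situation in which a prediction “may be considered accurate” while failing the people it matters to.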
This problem does not have an easy solution. One could try to correct the bias by digging into data for hidden parameters that identify minority groups. However, beyond the fact that minorities’ boundaries may be fuzzy, this technique may expose sensitive demographics. “[We need to] ensure that by using big data algorithms [firms] are not accidentally classifying people based on categories that society has decided— by law or ethics— not to use, such as race, ethnic background, gender, and sexual orientation,” said Cattuto, reading a quote from Edith Ramirez, chair of the US Federal Trade Commission.
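Ramirez’s warning about accidental classification often plays out through proxy variables: a model never sees the protected attribute, yet a correlated feature encodes it anyway. The data below are invented to show the mechanism in its simplest form.

```python
# (postcode, protected_group) pairs -- fabricated for illustration.
records = [
    ("10001", "group_a"), ("10001", "group_a"),
    ("10002", "group_b"), ("10002", "group_b"),
]

# Collect which groups appear under each postcode.
by_zip = {}
for zip_code, group in records:
    by_zip.setdefault(zip_code, set()).add(group)

# If every postcode maps to exactly one group, postcode is a perfect
# proxy: a model "blind" to the group still classifies by it.
is_proxy = all(len(groups) == 1 for groups in by_zip.values())
print(is_proxy)
```

In real data the correlation is rarely perfect, but even a partial proxy lets an algorithm reconstruct, and act on, a category that law or ethics forbids it to use.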
A different approach is to increase black boxes’ transparency: that is, to make explicit to users which parameters they are scored on and how they can improve their score. But this approach also has pitfalls. “Transparency may increase competition among users, help cheaters to deceive the system, and hamper competition among companies,” Cattuto noted.
Inspiration for tackling this complicated problem can be found in public health, according to Cattuto. This discipline has a long tradition of making decisions based on sensitive human data, with complex behavioural feedback. “The public health community has in their culture the fact that they are dealing with humans,” he remarked. In addition, Cattuto thinks that regulation is essential. “It is too optimistic to think that industry will self-regulate to protect users’ data. We need privacy by default, rather than privacy by design, and this must be enforced top-down by strong regulation. Privacy is a constitutional right,” he said.