Big Data is a Weapon of Math Destruction – Ep. 12

Show Notes:

Episode 12
Big Data is a Weapon of Math Destruction
15 August 2017

Big data is bandied about as the silver bullet to countless problems.

From helping us make smarter business decisions through to choosing which suburb we should live in, big data is a catch-all solution to issues around the world.

Mix in a little artificial intelligence and Bob’s your uncle, right?

Not necessarily.

Cathy O’Neil, an American mathematician, data scientist, TED Talk speaker and author (her latest book is called Weapons of Math Destruction), argues big data contains more human bias than we care to admit. She warns that it could be toxic to our very democracy.

More information:
Host: Mike Lynch
Guest Co-host: Andy McLean

Guests:
Cathy O’Neil

Links:

Weapons of Math Destruction

https://mathbabe.org/

Tech Bootcamp Online Conference:
Are you prepared for the future of finance? Discover the possibilities that data and technology can bring to your business.

Sponsor:

Accountancy Insurance Australia: https://www.accountancyinsurance.com.au/products-services/audit-shield

Accountancy Insurance New Zealand: https://www.accountancyinsurance.co.nz/products-services

Mike Lynch:                        Welcome to Episode 12 of the Acuity Magazine Podcast, Big Data is a Weapon of Math Destruction. This episode is sponsored by Accountancy Insurance, providers of Audit Shield, the preeminent tax audit insurance solution for accountants in Australia and New Zealand. I’m your host, Mike Lynch.

Andy McLean:                   And I’m your co-host, Andy McLean. Hello. Big data is bandied around as the silver bullet [00:00:30] to countless problems from helping us to make smarter business decisions through to choosing which suburb we should live in. Big data is the catch cry that’s heard all around the world and if you stir in a little bit of artificial intelligence, well, Bob’s your uncle, right? Not necessarily.

Cathy O’Neil, an American mathematician, data scientist and TED Talk speaker, in her latest book, Weapons of Math Destruction, argues big data contains more human bias than we care to admit [00:01:00] and that’s leading to a toxic cocktail for our very democracy. She joins us on the line now.

Mike Lynch:                        Cathy, thanks for joining us on the Acuity Podcast.

Cathy O’Neil:                      My pleasure.

Mike Lynch:                        Perhaps we should start with defining what we mean when we talk about big data.

Cathy O’Neil:                      It really depends on who you are when you’re trying to answer that question. If you’re a technologist, then big data probably is actually like a question of how do you deal with terabytes of data so you’re talking [00:01:30] about doing computations on the cloud and crunching numbers overnight and batch jobs and stuff like that, but for the most part, that’s not what we mean when we talk about big data. Instead, we really mean what kind of insights we’re expected to glean from massive data that is sort of incidentally collected.

What we typically try to do is we try to use proxies like how long you spend on a web page to figure out insights, which are things like how interested you are in a product. [00:02:00] It’s not an old-fashioned sort of statistical concept where you go up and ask somebody, “Are you interested in this product?” Rather you sort of try to use the data that is collectible and easy to gather to infer something that you actually want to know. The promise of big data, if you will, is that we have so much incidental data now collected about us that the people who are good at big data techniques can infer all sorts of things about us.
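
To make the proxy idea concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: the dwell times, the purchase labels and the simple model are not anything O’Neil describes building, just a toy version of inferring interest in a product from time spent on a page.

```python
# Illustrative only: invented data, not a real recommendation system.
# Time spent on a product page (seconds) is used as a proxy for purchase interest.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: dwell time in seconds, and whether the visitor bought.
dwell_seconds = np.array([[3], [8], [15], [40], [75], [120], [200], [300]])
purchased     = np.array([ 0,   0,   0,    0,    1,    1,     1,     1])

model = LogisticRegression().fit(dwell_seconds, purchased)

# The model now "infers" interest for new visitors from the proxy alone.
for t in [10, 60, 180]:
    p = model.predict_proba([[t]])[0, 1]
    print(f"{t:>4}s on page -> estimated purchase probability {p:.2f}")
```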

Andy McLean:                   [00:02:30] With so much data sloshing about connected to just about everything, how can we be sure that the information we are seeing is truly accurate and reliable?

Cathy O’Neil:                      Well, we can’t. That’s exactly, sort of almost the point, right, of this promise that we’re taking stuff that is not directly relevant to what we want to know and we’re inferring things that we want to know. Often, the things that we actually want to know are really hard to measure directly. [00:03:00] They’re very expensive, if you will. It’s really expensive to go around to all consumers and ask them if they’re interested in a specific brand. So we essentially have given up on that, at least for some kinds of brands. Maybe not for enormous international brands like Coca-Cola, you can probably still do that, but if you’re talking about niche brands, those niche brands are never going to do that kind of direct polling. So they’re only going to have this indirect stuff.

Likewise, we sort of have given up the concept [00:03:30] in a larger sense of getting direct information about things. So instead, we rely on these proxies. So, for example, political campaigns, they do focus groups where they actually ask specific people that they’ve brought in and paid for their time. They ask them like, “What do you think of this candidate? What would you think if the candidate said this?” For that group of 25 people, they actually have direct information, but then, they sort of put those 25 people in marketing [00:04:00] silos and they often assume that other people in the same marketing silos will have the same opinions. That’s a very indirect way of inferring people’s opinions about things, but that is the way big data works.

Andy McLean:                   Because it’s humans that are inputting the data, isn’t it? We’re human. We’re prone to mistakes. How correct is the data and are we asking for trouble to rely solely on this information?

Cathy O’Neil:                      It’s very important, this question. Is accuracy [00:04:30] an issue? Accuracy is absolutely an issue, right? But at the same time, it’s still worth it. In other words, these are vastly inaccurate approaches to understanding people’s opinions. There’s really not very good reason to think that just because I like the same websites as someone else that I will have the same political opinions as that person, and yet it is actually a better [00:05:00] guess about my political opinions than what we had before, which was really nothing or, if you will, it was simply demographics like how old am I, what gender am I, where do I live. So now what we have is we have demographics, of course, but we also have browsing history and we also have consumer behaviour and it gets much, much better as a wild guess.

I think instead of thinking about accuracy for most big data algorithms, you should think about the wild guesses getting less [00:05:30] wild and if you will, I would like to strike an analogy from when I was a quant in finance. I was a hedge fund quant like trying to predict the futures market. The futures market is heavily traded, enormously fast trades at all times when it’s open anyway. When we build the trading algorithm, almost half the time, a successful trading algorithm will be wrong. It’s wildly inaccurate, but it doesn’t have [00:06:00] to be 99% accurate for it to make money. It only has to be 51% accurate. It has to be right a little bit more than half the time.

I think that’s the kind of metaphor you should think about when you’re talking about dealing with consumers. We’re not trying to get perfect accuracy. We will never know exactly what this person wants to buy, but we’ll have a better guess than nothing and as we get more and more information about people and we have more and more history of people like you doing things like this, then [00:06:30] we’ll be becoming less and less wild guesses and more and more accurate, but it’ll never approach 99% accuracy ever.

Mike Lynch:                        Can you give me some examples where you’ve seen the inaccuracy in data go horribly wrong?

Cathy O’Neil:                      When you’re talking about big data as relatively inaccurate, it’s already a scary thing to think that it’s being applied in places where we might assume very high standards and that’s kind of exactly what’s happened. So things [00:07:00] like hiring people for jobs or deciding how long they should go to prison or deciding whether a given person is good at their job and whether they deserve to get fired, how much people should pay for insurance, all those things, and as I mentioned already, political opinions, all those things have been actually applied to real world situations. I think of those as high stake situations and when these algorithms are being applied to hundreds, [00:07:30] if not thousands, of people in high stakes situations, I think everyone deserves to assume a sort of standard of accuracy, but that is not what’s happened.

For example, there’s been a sort of national attempt to improve education in the US for the last couple of presidencies, not including Trump because he doesn’t seem to be focused on that, but a few presidencies before that, there were these calls for [00:08:00] accountability for education and it sort of fell to the teacher, at the teacher level, like let’s make teachers accountable. The idea was that teachers are ruining education, bad teachers are ruining education so, let’s find those bad teachers and let’s get rid of them. They found the bad teachers using an algorithm that was based solely on student test scores.

Now a given teacher, by the way, would get a score between 0 [00:08:30] and 100 at the end of the year for how they had done that year and it was based on like basically how much did they raise these students’ test scores for the students in their class beyond what was expected for those students.

The underlying model, what was the expected score for a student, was essentially an algorithm, a big data algorithm that relied on things like what school were [00:09:00] they in, how many kids were in their classroom, what did they get last year at the end of the year for the test scores. It wasn’t accurate and, therefore, the teacher’s score, like the question of how much did they raise these students’ scores above what was expected or below what was expected, they were also inaccurate and actually almost a random number generator in fact. And this is the thing we haven’t talked about yet, but because people are afraid of math, people are intimidated [00:09:30] by math, people are intimidated by big data algorithms, the teachers themselves, they were not given the power to push back on their scores, even though they would get wildly different scores from year to year or even in the same year for two different classes.
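
As a rough illustration of the kind of calculation she is describing, here is a toy value-added score in Python. The students, the coefficients and the stand-in “expected score” model are all invented; real teacher value-added models are far more elaborate, but the shape is the same: predict an expected score for each student, then rate the teacher on how far their students landed above or below expectation.

```python
# Toy value-added calculation with invented numbers; not the actual district model.
from statistics import mean

# Hypothetical students: (prior_year_score, class_size, actual_score_this_year)
students = [
    (62, 28, 70),
    (75, 28, 74),
    (58, 28, 66),
    (81, 28, 85),
]

def expected_score(prior, class_size):
    # Stand-in for the district's regression model: mostly prior score,
    # with a small penalty for larger classes. Coefficients are made up.
    return 5 + 0.95 * prior - 0.1 * class_size

# Teacher's "value added" = average amount students beat their expected score.
residuals = [actual - expected_score(prior, size) for prior, size, actual in students]
value_added = mean(residuals)
print(f"teacher value-added estimate: {value_added:+.1f} points")
```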

Moreover, again, it was a high stakes situation so some of the teachers were denied tenure based on bad scores. Other teachers were even fired for bad scores. They weren’t [00:10:00] explained what the algorithm was doing, how it worked. They couldn’t even appeal if they thought they had the wrong score. It was an absolutely opaque system with power behind it and, again, it was all based on this primary algorithm that we have no reason to think was accurate, trying to guess a student’s score at the end of the year based on information that they knew a year before that. Altogether, we see a high stakes situation. By the way, I should mention that this [00:10:30] is happening in more than half the states in the United States, mostly in urban school districts.

Long story short, the attempt to get improved education by getting rid of bad teachers actually had the opposite effect because instead of the bad teachers leaving, they randomly sort of chose teachers to target and because good teachers don’t want to live in a system, don’t want to work in a system where random colleagues, [00:11:00] possibly them, are fired for no good reason, many of the best teachers in those systems have actually left instead of the worst teachers. We’ve had the opposite effect that we’ve actually been going for.

Andy McLean:                   It’s ironic, isn’t it? When we talk about big data solutions from companies like Microsoft and IBM by way of example, there’s a high level of confidence that’s kind of assumed about these products. It’s a silver bullet. It sounds like it’s actually a bit of a false sense of security.

Cathy O’Neil:                      [00:11:30] Yeah, that’s exactly right. It’s a silver bullet. I would go further. I would say that after observing many, many of these very flawed, very powerful algorithms destroying what they’re attempting to do, undermining their own goals, after seeing so many examples that I have, I would go further … It’s not only just that there’s a silver bullet that everybody’s glad to see, but it’s a silver bullet that replaces a difficult conversation.

Instead of actually having a conversation about what [00:12:00] makes a teacher a good teacher, like what do they actually want in a teacher, they say, “We don’t have to think about that difficult conversation anymore since we have trouble agreeing on that question. Instead, we’re just going to replace this entire conversation with a silver bullet algorithm and we’ll be able to stop thinking because we could just hand the reins over to an opaque, secret algorithm that no one understands.” It’s almost like they want to do that.

Let me give you [00:12:30] another example to give you a flavour for what this looks like in a corporate setting. There was a personality test that this young man named Kyle Behm took and failed to get a job at a grocery store in Atlanta, Georgia. The personality test … By the way, I should mention that 70% of job seekers in the States have to take personality tests. This is incredibly widespread and the same personality test might be used in many, many different chain stores. It’s very [00:13:00] scaled as well. There aren’t that many personality test makers. It’s used extensively. Actually, Kyle took this test at seven different places in the Atlanta, Georgia area. He failed all of them.

Now the next wrinkle in the story is that Kyle actually has bipolar disorder and when he took the test, he realised that many of the questions of the personality test were akin or close to the same questions that he received in the hospital when he was being treated, as part of a mental health assessment. It’s called the five-factor [00:13:30] model. That’s actually illegal under the Americans with Disabilities Act. It’s illegal to force someone to take a health exam, including a mental health exam, as part of a hiring process.

His father, who’s a lawyer, actually filed class action lawsuits against all seven of these companies on the basis that they were violating his rights under the ADA. That’s an incredible story and it’s still in the courts. It’s passed the first hurdle, which is [00:14:00] that there was no business reason to ask these questions, and the next question is whether it was actually filtering out people with mental health status.

The larger question, the larger point I was trying to make is why did Kroger grocery store and those seven other big chain stores in Atlanta, why did they choose to use this algorithm? There’s a couple of really simple reasons. One is that it saves time. Another one is that it saves money, lots of money, right. They don’t have to hire a bunch of HR [00:14:30] people to interview people, to interview the applicants.

There’s time and money, but then there’s another question, which is why didn’t they make sure that these algorithms were legal? There are all sorts of regulations on the kinds of discrimination you’re not allowed to do and Kroger’s grocery … Let’s say it this way, their corporate lawyers were certainly aware that when you take a highly regulated process like hiring and then replace it with an algorithm, [00:15:00] they should’ve been careful to make sure that this new process was also legal, but they didn’t do it. My feeling is that they just didn’t want to think about an algorithm being illegal. Not that they thought it would be impossible, but they assumed that if it happened, that they would sort of be able to claim plausible deniability. It was just assumed that the algorithm was legal and fair.

Now when Roland Behm, the father, when he filed these class action lawsuits, he got [00:15:30] emails and phone calls from these corporate lawyers explaining that they didn’t have liability here, which is a false statement. Their argument was that they had, some of them anyway argued that they had actually signed indemnification contracts with the big data company that built the algorithm. Now, that’s actually a sign that they knew that there was an issue here, right. They signed this indemnification contract, which meant that any kinds of costs related to unfairness or discrimination [00:16:00] or any illegalities would be borne by the big data company, but the big data company, of course, even though it’s called a big data company, it’s a small company. It couldn’t possibly pay out the kinds of money that the courts might be charging these big companies for using illegal hiring processes.

Anyway, long story short, I feel like algorithms, big data algorithms, although they’re extremely flawed and inaccurate, they’re being used in these high stakes decisions, [00:16:30] in high stakes contexts. They’re being used by people who really don’t want to think about what could go wrong, for whatever reason. They just don’t feel like they are going to be responsible. That brings up the sort of most important question of all, like who holds these algorithms accountable? How can we as a society make sure that these algorithms are held accountable?

Andy McLean:                   A couple of things spring to mind when you cite these examples. Could you touch on [00:17:00] the issue of privacy? Does the age of big data mean we’re completely exposed?

Cathy O’Neil:                      Yeah, there’s lots of privacy issues. As a teacher, you don’t have privacy in your classroom. There’s no privacy rights. As somebody applying for a job, you don’t have privacy rights. If you don’t answer the questions that you’re asked, then you’re not going to get the job. Now, of course, there are rules like I just discussed. You’re not allowed to be given a health exam, but you’re still allowed to be given [00:17:30] a questionnaire and the questionnaire can ask all sorts of questions and, like you say, in the age of big data, questionnaires that seem irrelevant can actually end up allowing people to infer things like health status.

I should mention one last example, we didn’t talk about this yet, but with respect to recidivism risk algorithms, which are scoring systems that judges use to decide whether defendants should go to [00:18:00] prison for longer or shorter, those are also contexts where the defendant has no privacy rights. Now, there are also examples in my book where it is a question of privacy and there are examples of terrible algorithms that would not be able to exist in other contexts like in Europe, where they have much stricter online data privacy laws.

I think political micro-targeting is a good example of that, like profiling [00:18:30] that’s used, the extensive profiling and consumer behaviour that’s being used to decide how you would vote for different candidates. I think that kind of thing doesn’t happen in other countries that have better privacy laws. But I think the long term privacy questions are very, very important.

I wrote in Bloomberg View recently about what I worry about with future health risk scores. Now, this is not the same thing as a preexisting [00:19:00] condition where you already have diabetes and you’re looking for healthcare. As you know, the US is in turmoil over the healthcare debate, but what I worry about with big data, and it’s not just for US citizens but for everyone in the world, is to what extent your consumer behaviours, your online behaviours, your demographics, what kind of job you have, your education level, your financials, are implying what [00:19:30] your future health will look like. Who’s going to get sick in the future?

I’m not talking about DNA data. I’m not talking about actual medical data that is protected by law because of privacy, that your doctor has about you. I’m talking about stuff that is incidental data. I think the power of big data is that it will eventually become very good at inferring who is going to be sick and who is going to stay healthy and, in particular, who’s going to cost a lot [00:20:00] for insurance companies. Those insurance companies, that’s their job: to separate people into high expense and low expense. In the context we have of no universal healthcare, this is going to be a real problem for us.

Andy McLean:                   Now this year and last year, we’ve heard a lot about big data being used to get votes in political elections. I’m thinking about Cambridge Analytica in particular. How concerned should we be about this?

Cathy O’Neil:                      I do think political micro targeting is potentially a very [00:20:30] dangerous tool because it’s a propaganda tool, right, but if you look at it that way, I would say that in this country, Fox News was just as powerful as any political campaign at streaming and disseminating propaganda, as was all the fake news stuff that wasn’t Cambridge Analytica at all.

So in the context of propaganda though, I do think that political targeting, micro targeting campaigns have an enormous amount of power [00:21:00] and it’s going to get stronger as dark money comes in and they send whatever they want, literally whatever they want, to individual voters. The point here is that it’s not informative, right. It’s not like they’re sending information, “Oh, here’s how your candidate’s going to vote on this issue.” It’s much more emotional.

I think the worst example, which I do think maybe had something [00:21:30] to do with Cambridge Analytica, of this kind of emotional cueing was that they actually … and they bragged about this, which is the only reason we know about it, they sent out voter suppression ads right before the election. They actually sent out … There’s a well-established history of get the vote out campaigns, like trying to convince people who are on your side to [00:22:00] go to the polls and actually vote. Get out the vote, it’s called, but now that we have information on all voters, and everybody has information on all voters, we also know who’s likely not to vote for you.

This new dark campaign happened on Facebook, dark ads on Facebook, we can’t see them, Facebook will not show them to us, the campaign will not show them to us. The ads were sent out specifically to African Americans to convince them not to vote at all and we can’t see the ads. [00:22:30] The word on the street is that these voter suppression ads had something to do with Clinton and it was in the style of South Park and it was supposed to be funny, but anti-Clinton. I feel like that really undermines democracy, right. We don’t know if it’s even correct information, right. We don’t even know if there was information in it. All we know is that these were intended to keep people from voting, to get people depressed [00:23:00] and, for that matter, we don’t even need it to be about the opposing candidate.

Maybe people will find out in the future that it’s effective to just make people depressed, period, to keep them from voting, that they just won’t have that extra bit of spark and energy to get out and go vote. Maybe you could just send them messages that they look fat today. It’s like, I don’t know, because I’m not an expert in propaganda, but I do know that what we’ve built, this internet thing, [00:23:30] especially the Facebook part of the internet, is a perfect mechanism to deliver propaganda and that’s a threat to democracy.

Andy McLean:                   This relates to the idea in your book about how big data can be weaponized, which sounds very dramatic. Can you expand a little bit on that theme for us?

Cathy O’Neil:                      Right. I will tell you the story of how I learned about the teacher value added model, which is I learned it from a good friend of mine who’s a principal of a high school in Brooklyn, New York. [00:24:00] She’s known me since college and so she knew I was a mathematician so she said, “Hey, my teachers are being scored by this newfangled algorithm that I don’t understand and my teachers don’t understand. Can you help?” I said, “Why don’t you get me the formula and I’ll look at it?” I figured it was just a simple formula I could look at and explain to her in simple English. She said, “I asked for that formula, but [00:24:30] my Department of Education contact told me that it was math and I wouldn’t understand it.”

Of course, as a mathematician, that’s the last thing you want to hear, that math is being brandished as a weapon, as an authority of the inscrutable, like, “It’s so complicated that you couldn’t possibly understand it so just trust us.” That’s what I mean by weaponization. You’re sort of flashing your badge, your math badge, like, “It’s math, you can trust it and you [00:25:00] should sure as hell be intimidated by us.”

I see that in all sorts of ways. Normally, it’s just slick marketing that says, “Oh, we’re using big data techniques and they’re really sophisticated and they have to do with earthquake predictions and so you can trust us because it is objective and unbiased because it’s mathematical.” Now that is, of course, not true. There’s no inherent reason for a mathematical algorithm to be unbiased [00:25:30] or fair, but it’s something that people are intimidated enough by that they don’t ask any further questions.

Mike Lynch:                        In order to avoid these types of scenarios where there is no transparency, what measures need to be put in place to ensure that data is more accurate and accessible?

Cathy O’Neil:                      The short answer to that is that we don’t really have tools yet to do this, but the longer answer is that we need to [00:26:00] build these tools, we need to make algorithms accountable and we don’t need to be intimidated by the very thought of it because I want people to think about it like this. Let’s say you’re a teacher and if you’re told, “We have a new way of evaluating you as a teacher.” What would the first thing be that you would ask? You would ask, “How different is it from the old way you evaluated me?”

If you were [00:26:30] in charge of the school system, you would ask, “To what extent does this new system agree with me, with what I think the best … How does it score the best teachers in my opinion? How does it score the teachers that I think are the worst in my opinion?” There would be some kind of check. I would call it ground truth. There would be some ground truth to make sure that the algorithm is performing as expected. For example, for the first [00:27:00] two years this new system was rolled out, they would continue to evaluate teachers by the old system and they would check to make sure there was some relationship between those two things.

If it’s too expensive to use the old system everywhere, just do it in four schools. Make sure that for those four schools, for the teachers in those four schools, the two systems, the expensive old one and the cheaper new one, they agree for the most part. In other words, it’s a way of saying this is relatively accurate.
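
A minimal sketch of that ground-truth check, with invented scores: run the expensive old evaluation and the cheap new one side by side for a small sample of teachers and confirm the two sets of scores at least move together before trusting the new system on its own.

```python
# Invented scores for illustration; the point is the comparison, not the numbers.
from statistics import correlation  # requires Python 3.10+

# The same twelve teachers scored by the old (expensive) and new (algorithmic) systems.
old_scores = [78, 82, 65, 90, 55, 70, 88, 60, 73, 81, 67, 92]
new_scores = [74, 85, 65, 88, 58, 72, 84, 55, 70, 90, 60, 95]

r = correlation(old_scores, new_scores)
print(f"old vs new correlation: r = {r:.2f}")

if r < 0.5:  # the threshold is a policy choice, not a statistical law
    print("Weak agreement: don't retire the old system yet.")
else:
    print("Reasonable agreement on this sample; keep monitoring.")
```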

[00:27:30] The second remark about accountability is that often what we really want to make sure of is that what we’re doing isn’t discriminatory. So what we’d want to say is, “What is the success rate for … ” Let’s say it’s a hiring algorithm at Google and Google’s been in the news recently for having sexist practices and I’m not singling out Google. It could happen to literally any large company, but they possibly have an algorithm to decide who gets hired.

[00:28:00] The question is, is that a fair algorithm? I don’t know, but what we should do is we should use human beings to decide: here are 10 qualified women, 10 qualified men. How many of these women get through this filtering system and how many of the men do? And if you see that twice as many women are filtered out as men, then you’d be like, “Oh, this doesn’t seem fair.”
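
Here is a minimal sketch of that kind of fairness check. The screen_candidate function and the candidate scores are hypothetical stand-ins for whatever the real filter does; the point is the comparison of pass rates, along the lines of the “four-fifths” rule US regulators use as a rough test for adverse impact.

```python
# Hypothetical audit of a hiring filter; screen_candidate and the candidates
# are invented stand-ins for whatever the real system does.
def screen_candidate(candidate):
    # Placeholder for the opaque model being audited.
    return candidate["score"] >= 0.6

women = [{"score": s} for s in [0.55, 0.62, 0.58, 0.71, 0.50, 0.66, 0.59, 0.80, 0.57, 0.63]]
men   = [{"score": s} for s in [0.61, 0.64, 0.70, 0.58, 0.75, 0.66, 0.62, 0.59, 0.81, 0.68]]

def pass_rate(group):
    return sum(screen_candidate(c) for c in group) / len(group)

women_rate, men_rate = pass_rate(women), pass_rate(men)
print(f"women pass rate: {women_rate:.0%}, men pass rate: {men_rate:.0%}")

# Rule of thumb: flag if one group's selection rate is < 80% of the other's.
ratio = women_rate / men_rate
print(f"selection ratio: {ratio:.2f}",
      "-> possible adverse impact" if ratio < 0.8 else "-> within the rule of thumb")
```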

In other words, it’s not rocket science. I’m asking us to think about every process that is [00:28:30] an algorithmic process. Think of it as like an old fashioned human process. How would we check an old fashioned human process was fair? We had ways of thinking about it. We could still use those same ways. We can ask for audits. That’s what I call it, an algorithmic audit. Let’s build methodology, and that’s what I’m actually trying to do right now in my company, which is an algorithmic auditing company, like build methodology to test whether algorithms are doing [00:29:00] what they’re supposed to do, whether they’re meaningful, they’re accurate, and whether they’re fair, whether they’re legal, and these are all questions that we absolutely must start asking.

Andy McLean:                   When it comes to data, accuracy and integrity are the name of the game. Because of the complexities that we have discussed, do you feel, at least for now, that big data, both the input and the interpretation of it, needs to be done by a data scientist?

Cathy O’Neil:                      I do think there’s a new field here, right. There’s a new field of expertise here and it’s [00:29:30] not exactly the same as building algorithms. It’s a new field, which is interpreting and auditing algorithms, and that field will grow up in the next 10 years and I’m hoping to be one of the founders of that field, but I also think that if we are successful in this new field, if we’re successful, what we’re going to do is we’re going to decouple the technical issues of formalising code and deciding how to interpret results from algorithms. We’re going to decouple that [00:30:00] technical stuff that very few people will understand from the question of the ethics and the values that we’re trying to insert into these algorithms.

In other words, what I would like to see is for us to be interpreters or translators of ethics, ethical decisions that are made in a very large group, in as large as possible group, like people, stakeholders of these algorithms. In the case of the teachers, it would be like teachers [00:30:30] should be part of this, principals, federal people who care about accountability. Everyone should decide what we’re actually trying to measure, what we value, the cost of failure and who should bear the cost and how should that cost be distributed, etc. Then once they’ve made those decisions, then the data scientist in question should translate that into the code. That is the technical part, but it shouldn’t have anything to do with [00:31:00] the ethics.

Let me just back up a second. The real issue here is that data scientists are doing two things at once. They’re building code. They’re building these algorithms, but they’re also implicitly or explicitly inserting values and ethics into their code that they really have no business doing. They’re not trained in ethics. Often, they don’t even think about the ethical implications of their code [00:31:30] and that’s the problem. The problem is that we have, if you will, default ethics that are often extremely destructive and lay waste to thousands of people’s lives, but because we’re not acknowledging it, we’re not addressing it, it continues to happen.

What I want us to do as data scientists as a field is to say, “That’s not my job, that’s not my job and I refuse to do it, but what I’ll do is I’ll listen to what you guys decide the ethics should be. I’ll insert that and I will also [00:32:00] insert an ongoing monitor of my algorithm to make sure it is functioning appropriately to this chosen set of ethics and values.”

Andy McLean:                   Now big data mistakes can have devastating consequences when algorithms are wrong or biases have been introduced. Should governments heavily regulate the industry until we’ve got the balance right, do you think?

Cathy O’Neil:                      The answer is yes, I do think it needs to be regulated by government, but I think it’s [00:32:30] important to point out that right now, there are all sorts of laws that I don’t think are being enforced. So we already have laws on the books. Like the example I gave about the Americans with Disabilities Act, the personality test filtering out people with mental health status, that’s illegal, right, if it’s actually happening, which we have strong evidence that it is. It’s illegal and yet, they’re getting away with it.

When we talk about regulating algorithms, the first step [00:33:00] is just enforcing the laws that we already have, even in the context of algorithms. Right now, what’s happening is the regulators are so intimidated themselves by the algorithms and by the technology, they don’t know how to look into it, they don’t know how to audit it, they don’t have the expertise and, frankly, we don’t have the methodology and the tools to do it yet, that people are just getting away with doing illegal things because they’re doing it via algorithm. The very first step is to make the algorithms legal in our [00:33:30] current legal context.

The second step would be understanding that these algorithms around political micro targeting and other kinds of online stuff, they’re creating problems that we don’t yet have laws to address, and so in that sense, I do think we need to update our laws and expand them to address the questions of propaganda and the undermining of democracy.

That might look pretty simple. One way that [00:34:00] could look would be for Facebook to show us the kinds of ads that they’re showing people, some kind of transparency so that at least a nosy journalist could see whether a campaign is actually feeding false information to people who might vote a certain way or not vote at all. Right now, we have no transparency into that and I think that’s a problem and I think there should be a law, but there’s no law right now.

Mike Lynch:                        [00:34:30] In the near future, will the combination of AI and big data, which may require less human interaction, mean we’ll see more accurate information or could it be the start of a bigger problem?

Cathy O’Neil:                      I think over time, as more and more data is collected, assuming it continues to be legal to do such surveillance and collection, algorithms will become more accurate over time and that’s the thing I was worried about with respect to [00:35:00] those health scores, the future health risk scores, that they will get more and more accurate to the point where people who will someday get diabetes are being asked to pay for their diabetes costs in advance and those costs will be completely unaffordable.

I would say that the biggest fear I have is that as the algorithms become omnipresent in people’s lives, from the age of 5 on, because they’re being tracked as primary school students and then there are big data [00:35:30] algorithms all over the place, getting accepted to college, getting accepted for a job, all these sorts of things at every point, what I worry about is that, let me put it this way, it’ll be a force against mobility, right. We’ll have people who are defined more and more by their demographics, where they grew up, what their race is, what their gender is, what their zip code is, and if they are unlucky enough to be in the same family as somebody who went to prison, that [00:36:00] is actually used against them in recidivism risk scores right now. If your father went to prison, your risk score will go up and your sentence will be longer. It’s crazy.

It has very little to do with your behaviour as an individual, but it has everything to do with demographics, like where you’re from. As I said, because it works 51% of the time, because it’s a better guess than nothing, this is used by the algorithms, [00:36:30] so unlucky people will continue to be unlucky. Lucky people will continue to be lucky. The status quo is propagated. If you think about the status quo being propagated over a lifetime, that means that poor people are funnelled into poor neighbourhoods and rich people into richer neighbourhoods and it works against the concept of mobility itself, which we already have enough problems with in the age of deep inequality, but I think algorithms [00:37:00] are going to make that even worse.

Mike Lynch:                        Cathy, thank you so much for your time.

Cathy O’Neil:                      Thanks. Thanks for having me.

Andy McLean:                   Cathy O’Neil is an American mathematician, data scientist, TED Talk speaker and author. Her latest book is Weapons of Math Destruction. Check out our podcast show notes for details.

If you’re curious to know more about big data, how it’s shaping the future of business and how you can harness it for the powers of good, then you might want to register for the online tech bootcamp conference on the 13th of September. [00:37:30] You can access the conference online from anywhere in the world. Check out our show notes for details.

Mike Lynch:                        That concludes Episode 12 of the Acuity Magazine Podcast. We encourage you to subscribe to the podcast at acuitypodcast.com or on iTunes. Acuitypodcast.com is also the place where you can keep up to date with our latest interviews or view the show notes for each episode. You can email us directly, [email protected] There’s lots more ahead on Acuity [00:38:00] Podcast. Until next time, bye for now.

Announcer:                        The Acuity Podcast is brought to you by Chartered Accountants Australia and New Zealand.
