Tag Archives: ML

BaRT: Barrage of Random Transforms for Adversarially Robust Defense

This week I'm at CVPR — the IEEE's Computer Vision and Pattern Recognition Conference, which is a huge AI event. I'm currently rehearsing the timing of my talk one last time, but I wanted to take a minute between run-throughs to link to my co-author Steven Forsyth's wonderful post on the NVIDIA research blog about our paper.

Steven does a fantastic job of describing our work, so head over there to see what he has to say. I couldn't resist putting a post of my own because (a) I love this video we created...

...and (b), Steven left out what I think was the most convincing result we had, which shows that BaRT achieves a Top-1 accuracy on ImageNet that is higher than the Top-5 accuracy of the previous state-of-the-art defense, Adversarial Training.

Accuracy of BaRT under attack by PGD for varying adversarial distances, compared to the previous state-of-the-art defense.

Also, (c) I am very proud of this work. It's been an idea I've been batting around for almost three years now, and I finally got approval from my client to pursue it last year. It turns out it works exactly how I expected, and I can honestly say that this is the first — and probably only — time in my scientific career that has ever happened.

If you want a copy of the paper, complete with some code in the appendices, ((Our hands are somewhat tied releasing the full code due to the nature of our client relationship with the wonderful Laboratory for Physical Sciences, who funded this work.)) our poster, and the slides for our oral presentation, you can find them on the BaRT page I slapped together on my website.


Why we worry about the Ethics of Machine Intelligence

This essay was co-authored by myself and Steve Mills.

We worry about the ethics of Machine Intelligence (MI) and we fear our community is completely unprepared for the power we now wield. Let us tell you why.

To be clear, we’re big believers in the far-reaching good MI can do. Every week there are new advances that will dramatically improve the world. In the past month we have seen research that could improve the way we control prosthetic devices, detect pneumonia, understand long-term patient trajectories, and monitor ocean health. That’s in the last 30 days. By the time you read this, there will be even more examples. We really do believe MI will transform the world around us for the better, which is why we are actively involved in researching and deploying new MI capabilities and products.

There is, however, a darker side. MI also has the potential to be used for evil. One illustrative example is a recent study by Stanford University researchers who developed an algorithm to predict sexual orientation from facial images. When you consider recent news of the detention and torture of more than 100 gay men in the Russian republic of Chechnya, you quickly see the cause for concern. This software and a few cameras positioned on busy street corners will allow the targeting of homosexuals at industrial scale: hundreds quickly become thousands. The potential for this isn't so far-fetched. China is already using CCTV and facial recognition software to catch jaywalkers. The researchers pointed out that their findings "expose[d] a threat to the privacy and safety of gay men and women." That warning does little to prevent outside groups from implementing the technology for mass targeting and persecution.

Many technologies have the potential to be applied for nefarious purposes. This is not new. What is new about MI is the scale and magnitude of impact it can achieve. This scope is what will allow it to do so much good, but also so much bad. It is like no other technology that has come before, with the notable exception of atomic weapons, a comparison others have already drawn. We hesitate to draw such a comparison for fear of perpetuating a sensationalistic narrative that distracts from this conversation about ethics. That said, it's the closest parallel we can think of in terms of scale (the potential to impact tens of millions of people) and magnitude (the potential to do physical harm).

None of this is why we worry so much about the ethics of MI. We worry because MI is unique in so many ways that we are left completely unprepared to have this discussion.

Ethics is not [yet] a core commitment in the MI field. Compare this with medicine where a commitment to ethics has existed for centuries in the form of the Hippocratic Oath. Members of the physics community now pledge their intent to do no harm with their science. In other fields ethics is part of the very ethos. Not so with MI. Compared to other disciplines the field is so young we haven’t had time to mature and learn lessons from the past. We must look to these other fields and their hard-earned lessons to guide our own behavior.

Computer scientists and mathematicians have never before wielded this kind of power. The atomic bomb is one exception; cyber weapons may be another. Both of these, however, represent intentional applications of technology.  While the public was unaware of the Manhattan Project, the scientists involved knew the goal and made an informed decision to take part. The Stanford study described earlier has clear nefarious applications; many other research efforts in MI may not. Researchers run the risk of unwittingly conducting studies that have applications they never envisioned and do not condone. Furthermore, research into atomic weapons could only be implemented by a small number of nation-states with access to proper materials and expertise. Contrast that with MI, where a reasonably talented coder who has taken some open source machine learning classes can easily implement and effectively ‘weaponize’ published techniques. Within our field, we have never had to worry about this degree of power to do harm. We must reset our thinking and approach our work with a new degree of rigor, humility, and caution.

Ethical oversight bodies from other scientific fields seem ill-prepared for MI. Looking to existing ethical oversight bodies is a logical approach. Even we suggested that MI is a "grand experiment on all of humanity" and should follow principles borrowed from human subject research. The fact that Stanford's Institutional Review Board (IRB), a respected body within the research community, reviewed and approved research with questionable applications should give us all pause. Researchers have long raised questions about the broken IRB system. An IRB system designed to protect the interests of study participants may be unsuited for situations in which potential harm accrues not to the subjects but to society at large. It's clear that standards that have served other scientific fields for decades or even centuries may not be adequate for MI's unique data and technology issues. These challenges are compounded even further by the general lack of MI expertise, or sometimes even technology expertise, among the members of these boards. We should continue to work with existing oversight bodies, but we must also take an active role in educating them and evolving their thinking toward MI.

MI ethical concerns are often not obvious. This differs dramatically from other scientific fields where ethical dilemmas are self-evident. That’s not to say they are easy to navigate. A recent story about an unconscious emergency room patient with a “Do Not Resuscitate” tattoo is a perfect example. Medical staff had to decide whether they should administer life-saving treatment despite the presence of the tattoo. They were faced with a very complex, but very obvious, ethical dilemma. The same is rarely true in MI where unintended consequences may not be immediately apparent and issues like bias can be hidden in complex algorithms. We have a responsibility to ourselves and our peers to be on the lookout for ethical issues and raise concerns as soon as they emerge.  

MI technology is moving faster than our approach to ethics. Other scientific fields have had hundreds of years for their approaches to ethics to evolve alongside the science. MI is still nascent, yet we are already moving technology from the 'lab' to full deployment. The speed of that transition has led to notable ethical issues, including potential racism in criminal sentencing and discrimination in job hiring. The ethics of MI needs to be studied as much as the core technology if we ever hope to catch up and avoid these issues in the future. We need to catalyze an ongoing conversation around ethics, much as we see in other fields like medicine, where there is active research and discussion within the community.

The issue that looms behind all of this, however, is the fact that we can’t ‘put the genie back in the bottle’ once it has been released. We can’t undo the Stanford research now that it’s been published. As a community, we will forever be accountable for the technology that we create.

In the age of MI, corporate and personal values take on entirely new importance. We have to decide what we stand for and use that as a measure to evaluate our decisions. We can’t wait for issues to present themselves. We must be proactive and think in hypotheticals to anticipate the situations we will inevitably face.

Be assured that every organization will be faced with hard choices related to MI. Choices that could hurt the bottom line or, worse, harm the well-being of people now or in the future. We will need to decide, for example, if and how we want to be involved in Government efforts to vet immigrants or create technology that could ultimately help hackers. If we fail to accept that these choices inevitably exist, we run the risk of compromising our values. We need to stand strong in our beliefs and live the values we espouse for ourselves, our organizations, and our field of study. Ethics, like many things, is a slippery slope. Compromising once almost always leads to compromising again.

We must also recognize that the values of others may not mirror our own. We should approach those situations without prejudice. Instead of anger or defensiveness we should use them as an opportunity to have a meaningful dialog around ethics and values. When others raise concerns about our own actions, we must approach those conversations with humility and civility. Only then can we move forward as a community.

Machines are neither moral nor immoral. We must work together to ensure they behave in a way that benefits, not harms, humanity. We don't purport to have the answers to these complex issues. We simply request that you keep asking questions and take part in the discussion.


This has been crossposted to Medium and to the Booz Allen website as well.

We’re not the only one discussing these issues. Check out this Medium post by the NSF-Funded group Pervasive Data Ethics for Computational Research, Kate Crawford’s amazing NIPS keynote, Mustafa Suleyman’s recent essay in Wired UK, and Bryor Snefjella’s recent piece in BuzzFeed.


AIES 2018

Last week I attended the first annual conference on AI, Ethics & Society, where I presented some work on a Decision Tree/Random Forest algorithm that makes decisions that are less biased or discriminatory. ((In the colloquial rather than technical sense)) You can read all the juicy details in our paper. This isn't a summary of our paper, although that blog post is coming soon. Instead I want to use this space to post some reactions to the conference itself. I was going to put this in a Twitter thread, but it quickly grew out of control. So, in no particular order, here goes nothing:

Many of the talks people gave were applicable to GOFAI but don't fit with contemporary approaches. Approaches to improving/limiting/regulating/policing rule-based or expert systems won't work well (if at all) with emergent systems.

Many, many people are making the mistake of thinking that all machine learning is black box. Decision trees are ML but also some of the most transparent models possible. Everyone involved in this AI ethics discussion should learn a rudimentary taxonomy of AI systems. It would avoid mistakes and conflations like this, and it would take maybe an hour of time.
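To make that concrete, here is a minimal scikit-learn sketch (my own toy example, not anything from a talk at the conference): the fitted tree prints out as a short list of human-readable rules, which is about as far from a black box as a learned model gets.

```python
# Minimal sketch: a decision tree is machine learning, yet its learned rules
# are fully inspectable.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(data.data, data.target)

# The entire model prints as a handful of if/else rules over the input features.
print(export_text(tree, feature_names=list(data.feature_names)))
```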

Now that I think of it, it would be great if next year's program included some tutorials. A crash course in AI taxonomy would be useful, as would a walk-through of what an AI programmer does day-to-day. (I think it would help people to understand what kinds of control we can have over AI behavior if they knew a little more about what goes into getting any sort of behavior at all.) I'd also be interested in some lessons on liability law and engineering, or on how standards organizations operate.

Lots of people are letting the perfect be the enemy of the good. I heard plenty of complaints about solutions that alleviate problems but don't eliminate them completely, or work in a majority of situations but don't cover every possible sub-case.

Some of that was the standard posturing that happens at academic conferences ("well, sure, but have you ever thought of this??!") but that's a poor excuse for this kind of gotcha-ism.

Any academic conference has people who ask questions to show off how intelligent they are. This one had the added scourge of people asking questions to show off how intelligent and righteous they are. If ever there was a time to enforce concise Q&A rules, this is it.

We’re starting from near scratch here and working on a big problem. Adding any new tool to the toolbox should be welcome. Taking any small step towards the goal should be welcome.

People were in that room because they care about these problems. I heard too much grumbly backbiting about presenters that care about ethics, but don't care about it exactly the right way.

We can solve problems, or we can enforce orthodoxy, but I doubt we can do both.

It didn't occur to me at the time, but in retrospect I'm surprised how circumscribed the ethical scenarios being discussed were. There was very little talk of privacy, for instance, and not much about social networks/filter bubbles/"fake news"/etc. that has been such a part of the zeitgeist.

Speaking of zeitgeist, I didn't have to hear the word "blockchain" even one single time, for which I am thankful.

If I had to give a rough breakdown of topics, it would be 30% AV/trolley problems, 20% discrimination, 45% meta-discussion, and 5% everything else.

One questioner brought up Jonathan Haidt's Moral Foundations Theory at the very end of the last day. I think he slightly misinterpreted Haidt (but I'm not sure since the questioner was laudably concise), but I was waiting all weekend for someone to bring him up at all.

If any audience would recognize the difference between “bias” in the colloquial sense and “bias” in the technical, ML/stats sense, I would have hoped it was here. No such luck. This wasn't a huge problem in practice, but it’s still annoying.
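For anyone unfamiliar with the distinction, here is a toy example of the technical sense (again, my own illustration): statistical bias just means an estimator's expected value differs from the quantity it estimates, with no unfairness to anyone involved.

```python
# Toy illustration of "bias" in the statistical sense: the divide-by-n variance
# estimator is biased low; Bessel's correction (divide by n-1) removes the bias.
import numpy as np

rng = np.random.default_rng(0)
true_var = 4.0
samples = rng.normal(0.0, np.sqrt(true_var), size=(100_000, 5))  # many samples of n=5

biased = samples.var(axis=1, ddof=0).mean()    # divide by n
unbiased = samples.var(axis=1, ddof=1).mean()  # divide by n-1

print(f"true variance: {true_var}")
print(f"ddof=0 estimates average {biased:.2f}  (biased low)")
print(f"ddof=1 estimates average {unbiased:.2f} (roughly unbiased)")
```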

There’s a ton of hand-waving about how many of the policies being proposed for ethical AI will actually work at the implementation level. “Hand-waving” is even too generous of a term. It’s one thing to propose rules, but how do you make that work when fingers are hitting keyboards?

I’ll give people some slack here because most talks were very short, but “we’ll figure out what we want, and then tell the engineers to go make it happen somehow” is not really a plan. The plan needs to be grounded in what's possible starting at its conception, not left as an implementation detail for the technicians to figure out later.

"We'll figure out what to do, and then tell the geeks to do it" is not an effective plan. One of the ways it can fail is because it is tinged with elitism. (I don't think participants intended to be elitist, but that's how some of these talks could be read.) I fully endorse working with experts in ethics, sociology, law, psychology, etc. But if the technicians involved interpret what those experts say — accurately or not — as "we, the appointed high priesthood of ethics, will tell you, the dirty code morlocks, what right and wrong is, and you will make our vision reality" then the technicians will not be well inclined to listen to those experts.

Everyone wants to 'Do The Right Thing'. Let's work together to help each other do that and refrain as much as possible from finger-pointing at people who are 'Doing It Wrong.' Berating people who have fallen short of your ethical standards — even those who have fallen way, way short — feels immensely satisfying and is a reliable way to solidify your in-group, but it's not productive in the long run. That doesn't mean we need to equivocate or let people off the hook for substandard behavior, but it does mean that the response should be to lead people away from their errors as much as possible rather than punishing for the sake of punishing.

I wish the policy & philosophy people here knew more about how AI is actually created.

(I’m sure the non-tech people wish I knew more about how moral philosophy, law, etc. works.)

Nonetheless, engineers are going to keep building AI systems whether or not philosophers etc. get on board. If the latter want to help drive development, there is some onus on them to better learn the lay of the land. That's not just, but they have the weaker bargaining position, so I think that's how things will have to be.

Of course I'm an engineer, so this is admittedly a self-serving opinion. I still think it's accurate though.

Even if every corporation, university, and government lab stopped working on AI because of ethical concerns, the research would slow but not stop. I cannot emphasize enough how low the barriers to entry in this space are. Anyone with access to arXiv, GitHub, and a $2000 gaming computer or some AWS credits can get in the game.

I was always happy to hear participants recognize that while AI decision making can be unethical/amoral, human decision making is also often terrible. It’s not enough to say the machine is bad if you don’t ask “bad compared to what alternative?”. Analyze on the right margin! Okay, the AI recidivism model has non-zero bias. How biased is the parole board? Don't compare real machines to ideal humans.

Similarly, don't compare real-world AI systems with ideal regulations or standards. Consider how regulations will end up in the real world. Say what you will about the Public Choice folks, but their central axiom is hard to dispute: actors in the public sector aren't angels either.

One poster explicitly mentioned Hume and the Induction Problem, which I would love to see taught in all Data Science classes.

Several commenters brought up the very important point that datasets are not reality. This map-is-not-the-territory point also deserves to be repeated in every Data Science classroom far more often.

That said, I still put more trust in quantitative analysis over qualitative. But let's be humble. A data set is not the world, it is a lens with which we view the world, and with it we see but through a glass darkly.

I'm afraid that overall this post makes me seem much more negative on AIES than I really am. Complaining is easier than complimenting. Sorry. I think this was a good conference full of good people trying to do a good job. It was also a very friendly crowd, so as someone with a not-insignificant amount of social anxiety, thank you to all the attendees.


MalConv: Lessons learned from Deep Learning on executables

I don't usually write up my technical work here, mostly because I spend enough hours as is doing technical writing. But a co-author, Jon Barker, recently wrote a post on the NVIDIA Parallel For All blog about one of our papers on neural networks for detecting malware, so I thought I'd link to it here. (You can read the paper itself, "Malware Detection by Eating a Whole EXE" here.) Plus it was on the front page of Hacker News earlier this week, which is not something I thought would ever happen to my work.

Rather than rehashing everything in Jon's Parallel for All post about our work, I want to highlight some of the lessons we learned from doing this about ML/neural nets/deep learning.

By way of background, I'll lift a few paragraphs from Jon's introduction:

The paper introduces an artificial neural network trained to differentiate between benign and malicious Windows executable files with only the raw byte sequence of the executable as input. This approach has several practical advantages:

  • No hand-crafted features or knowledge of the compiler used are required. This means the trained model is generalizable and robust to natural variations in malware.
  • The computational complexity is linearly dependent on the sequence length (binary size), which means inference is fast and scalable to very large files.
  • Important sub-regions of the binary can be identified for forensic analysis.
  • This approach is also adaptable to new file formats, compilers and instruction set architectures—all we need is training data.

We also hope this paper demonstrates that malware detection from raw byte sequences has unique and challenging properties that make it a fruitful research area for the larger machine learning community.

One of the big issues we were confronting with our approach, MalConv, is that executables are often millions of bytes in length. That's orders of magnitude more time steps than most sequence processing networks deal with. Big data usually refers to lots and lots of small data points, but for us each individual sample was big. Saying this was a non-trivial problem is a serious understatement.

Architecture of the MalConv malware detection network. (Image copyright NVIDIA.)

Here are three lessons we learned, not about malware or cybersecurity, but about the process of building neural networks on such unusual data.

1. Deep learning != image processing

The large majority of the work in deep learning has been done in the image domain. Of the remainder, the large majority has been in either text or speech. Many of the lessons, best practices, rules of thumb, etc., that we think apply to deep learning may actually be specific to these domains.

For instance, the community has settled on narrow convolutional filters stacked with a lot of depth as generally the best way to go. And for images, narrow-and-deep absolutely seems to be the correct choice. But in order to get a network that processes two million time steps to fit in memory at all (on beefy 16GB cards, no less), we were forced to go wide-and-shallow.
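To give a sense of what wide-and-shallow looks like in practice, here is a rough PyTorch sketch of a byte-level network in that spirit. The embedding size, filter count, kernel width, and stride below are illustrative placeholders, not the exact values from our paper.

```python
# Sketch of a wide-and-shallow byte-level network: one very wide, heavily
# strided gated convolution over the raw byte sequence, then a global max pool.
import torch
import torch.nn as nn

class WideShallowByteNet(nn.Module):
    def __init__(self, embed_dim=8, channels=128, kernel=500, stride=500):
        super().__init__()
        self.embed = nn.Embedding(257, embed_dim, padding_idx=0)  # 256 byte values + padding
        self.conv = nn.Conv1d(embed_dim, channels, kernel, stride=stride)
        self.gate = nn.Conv1d(embed_dim, channels, kernel, stride=stride)
        self.fc = nn.Linear(channels, 1)

    def forward(self, x):                                 # x: (batch, seq_len) byte ids
        z = self.embed(x).transpose(1, 2)                 # (batch, embed_dim, seq_len)
        h = self.conv(z) * torch.sigmoid(self.gate(z))    # gated convolution
        h = h.max(dim=2).values                           # global max pool over time
        return self.fc(h)                                 # single malicious/benign logit

# A two-million-byte input collapses to roughly 4000 time steps after the strided conv.
model = WideShallowByteNet()
logit = model(torch.randint(1, 257, (1, 2_000_000)))
```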

With images, a pixel value is always a pixel value. 0x20 in a grayscale image is always darkish gray, no matter what. In an executable, a byte value is ridiculously polysemous: 0x20 may be part of an instruction, a string, a bit array, a compressed or encrypted value, an address, etc. You can't interpolate between values at all, so you can't resize or crop the way you would with images to make your data set smaller or introduce data augmentation. Binaries also play havoc with locality, since you can re-arrange functions in any order, among other things. You can't rely on any Tobler's Law ((Everything is related, but near things are more related than far things.)) relationship the way you can in images, text, or speech.

2. BatchNorm isn't pixie dust

Batch Normalization has this bippity-boppity-boo magic quality. Just sprinkle it on top of your network architecture, and things that didn't converge before now do, and things that did converge now converge faster. It's worked like that every time I've tried it — on images. When we tried it on binaries it actually had the opposite effect: networks that converged slowly now didn't at all, no matter what variety of architecture we tried. It's also had no effect at all on some other esoteric data sets that I've worked on.

We discuss this at more length in the paper (§5.3), but here's the relevant figure:

KDE plots of the convolution response (pre-ReLU) for multiple architectures. Red and orange: two layers of ResNet; green: Inception-v4; blue: our network; black dashed: a true Gaussian distribution for reference.

This is showing the pre-BN activations from MalConv (blue) and from ResNet (red and orange) and Inception-v4 (green). The purpose of BatchNorm is to output values that follow a standard normal distribution, and it implicitly expects inputs that are already relatively close to that. What we suspect is happening is that the input values from the other networks aren't Gaussian, but they're close-ish. ((I'd love to be able to quantify that closeness, but none of the tests for normality I'm aware of apply when you have this many samples. If anyone knows of a more robust test, please let me know.)) The input values for MalConv are extremely rough and aren't even unimodal. If BatchNorm is being wonky for you, I'd suggest plotting the pre-BN activations and checking that they're relatively smooth and unimodal.
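If you want to run that check on your own network, here is one hedged way to do it: attach a forward pre-hook to the BatchNorm layer, collect the incoming activations, and plot a KDE. The tiny model below is a stand-in; swap in your own network and layer.

```python
# Sketch: capture the tensors flowing *into* a BatchNorm layer, then eyeball
# their distribution. The model and layer index here are placeholders.
import numpy as np
import torch
import torch.nn as nn
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

captured = []

def grab_pre_bn(module, inputs):
    # Forward pre-hooks receive the inputs as a tuple; inputs[0] is the tensor
    # that is about to be normalized.
    captured.append(inputs[0].detach().flatten().cpu())

model = nn.Sequential(nn.Conv1d(8, 16, kernel_size=5), nn.BatchNorm1d(16), nn.ReLU())
hook = model[1].register_forward_pre_hook(grab_pre_bn)

with torch.no_grad():
    model(torch.randn(32, 8, 1000))   # stand-in for a real batch of data
hook.remove()

acts = torch.cat(captured).numpy()
acts = np.random.default_rng(0).choice(acts, size=20_000, replace=False)  # subsample for speed
xs = np.linspace(acts.min(), acts.max(), 200)
plt.plot(xs, gaussian_kde(acts)(xs))
plt.title("Pre-BatchNorm activations: want smooth and roughly unimodal")
plt.show()
```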

3. The Lump of Regularization Fallacy

If you're overfitting, you probably need more regularization. Simple advice, and easily executed. Every time I see this brought up, though, people treat regularization as if it were monolithic. Implicitly, people talk as if you have some pile of regularization, and if you need to fight overfitting then you just shovel more regularization on top. It doesn't matter what kind, just add more.

We ran into overfitting problems and tried every method we could think of: weight decay, dropout, regional dropout, gradient noise, activation noise, and on and on. The only one that had any impact was DeCov, which penalizes activations in the penultimate layer that are highly correlated with each other. I have no idea what will work on your data — especially if it's not images/speech/text — so try different types. Don't just treat regularization as a single knob that you crank up or down.
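For anyone who wants to experiment with something DeCov-flavored, here is a rough sketch of that style of penalty. It follows the published formulation loosely and is not the exact code we used.

```python
# Sketch of a DeCov-style penalty: discourage redundant features by penalizing
# the off-diagonal entries of the batch covariance of a hidden layer.
import torch

def decov_penalty(h):
    """h: (batch, features) activations of, e.g., the penultimate layer."""
    centered = h - h.mean(dim=0, keepdim=True)
    cov = centered.t() @ centered / h.shape[0]          # feature-by-feature covariance
    off_diag = cov - torch.diag(torch.diagonal(cov))    # zero out the variances
    return 0.5 * off_diag.pow(2).sum()

# Added to the task loss with a small weight, e.g.:
#   loss = criterion(logits, labels) + 0.1 * decov_penalty(penultimate_activations)
```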

I hope some of these lessons are helpful to you if you're into cybersecurity, or pushing machine learning into new domains in general. We'll be presenting the paper this is all based on at the Artificial Intelligence for Cyber Security (AICS) workshop at AAAI in February, so if you're at AAAI then stop by and talk.


AI's "one trick pony" has a hell of a trick

The MIT Technology Review has a recent article by James Somers about error backpropagation, "Is AI Riding a One-Trick Pony?" Overall, I agree with the message in the article. We need to keep thinking of new paradigms because the SotA right now is very useful, but not correct in any rigorous way. However, as much as I agree with the thesis, I think Somers oversells it, especially in the beginning of the piece. For instance, the introductory segment concludes:

When you boil it down, AI today is deep learning, and deep learning is backprop — which is amazing, considering that backprop is more than 30 years old. It’s worth understanding how that happened—how a technique could lie in wait for so long and then cause such an explosion — because once you understand the story of backprop, you’ll start to understand the current moment in AI, and in particular the fact that maybe we’re not actually at the beginning of a revolution. Maybe we’re at the end of one.

That's a bit like saying "When you boil it down, flight is airfoils, and airfoils are Bernoulli's principle — which is amazing, considering that Bernoulli's principle is almost 300 years old." I totally endorse the idea that we ought to understand backprop; I've spent a lot of effort in the last couple of months organizing training for some of my firm's senior leadership on neural networks, and EBP/gradient descent is the heart of my presentation. But I would be very, very careful about concluding that backprop is the entire show.
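For readers who haven't seen it spelled out, here is the kind of bare-bones sketch I have in mind when I say EBP/gradient descent is the heart of the matter: a two-layer network trained on made-up data in plain NumPy, using nothing but the chain rule. The toy data and layer sizes are purely illustrative.

```python
# Minimal sketch of error backpropagation / gradient descent for a tiny
# two-layer network: forward pass, chain rule backward, weight update.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))                              # toy inputs
y = (X.sum(axis=1, keepdims=True) > 0).astype(float)      # toy binary targets

W1 = rng.normal(size=(3, 8)) * 0.1                        # hidden layer weights
W2 = rng.normal(size=(8, 1)) * 0.1                        # output layer weights
lr = 0.5

for step in range(500):
    # forward pass
    h = np.tanh(X @ W1)
    p = 1.0 / (1.0 + np.exp(-(h @ W2)))                   # sigmoid output
    loss = -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))

    # backward pass: propagate the error signal layer by layer
    dlogits = (p - y) / len(X)                            # dLoss/d(pre-sigmoid)
    dW2 = h.T @ dlogits
    dh = (dlogits @ W2.T) * (1 - h ** 2)                  # back through tanh
    dW1 = X.T @ dh

    # gradient descent update
    W2 -= lr * dW2
    W1 -= lr * dW1
```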

Backprop was also not "lying in wait." People had been working on it ever since it was introduced in 1986. The problem was that '86 was the height of the second AI winter, which lasted another decade. Just as people should understand backprop to understand contemporary AI, they should learn the history of AI to understand contemporary AI. Just because no one outside of CS (and precious few people in CS, for that matter) paid any attention to neural networks before 2015 doesn't mean they were completely dormant, only to spring up fully formed in some sort of intellectual Athenian birth.

I really don't want to be in the position of defending backprop. I took the trouble to write a dissertation about non-backprop neural nets for a reason, after all. ((That reason being, roughly put, that we're pretty sure the brain is not using backprop, and it seems ill-advised to ignore the mechanisms employed by the most intelligent thing we are aware of.)) But I also don't want to be in the position of letting sloppy arguments against neural nets go unremarked. That road leads to people mischaracterizing Minsky and Papert, abandoning neural nets for generations, and putting us epochs behind where we might have been. ((Plus sloppy arguments should be eschewed on the basis of the sloppiness alone, irrespective of their consequences.))


PS This is also worth a rejoinder:

Big patterns of neural activity, if you’re a mathematician, can be captured in a vector space, with each neuron’s activity corresponding to a number, and each number to a coordinate of a really big vector. In Hinton’s view, that’s what thought is: a dance of vectors.

That's not what thought is, that's how thought can be represented. Planets are not vectors, but their orbits can be profitably described that way, because "it behooves us to place the foundations of knowledge in mathematics." I'm sorry if that seems pedantic, but the distinction between a thing and its representation—besides giving semioticians something to talk about—underpins much of our interpretation of AI systems and cognitive science as well. Indeed, a huge chunk of data science work is figuring out the right representations. If you can get that, your problem is often largely solved. ((IIRC both Knuth and Torvalds have aphorisms to the effect that once you have chosen the correct data structures, the correct algorithms will naturally follow. I think AI and neuroscience are dealing with a lot of friction because we haven't been able to figure out the right representations/data structures. When we do, the right learning algorithms will follow much more easily.))

PPS This, on the other hand, I agree with entirely:

Deep learning in some ways mimics what goes on in the human brain, but only in a shallow way. … What we know about intelligence is nothing against the vastness of what we still don’t know.

What I fear is that people read that and conclude that artificial neural networks are built on a shallow foundation, so we should give up on them as being unreliable. A much better conclusion would be that we need to keep working and build better, deeper foundations.
