Transcript for Elon Musk: Tesla Autopilot
SPEAKER_01
00:00 - 03:05
The following is a conversation with Elon Musk. He's the CEO of Tesla, SpaceX, Neuralink, and a co-founder of several other companies. This conversation is part of the Artificial Intelligence podcast. The series includes leading researchers in academia and industry, including CEOs and CTOs of automotive, robotics, AI, and technology companies. This conversation happened after the release of the paper from our group at MIT on driver functional vigilance during use of Tesla's Autopilot. The Tesla team reached out to me offering a podcast conversation with Mr. Musk. I accepted, with full control of questions I could ask and the choice of what is released publicly. I ended up editing out nothing of substance. I've never spoken with Elon before this conversation, publicly or privately. Neither he nor his companies have any influence on my opinion, nor on the rigor and integrity of the scientific method that I practice in my position at MIT. Tesla has never financially supported my research, I've never owned a Tesla vehicle, and I've never owned Tesla stock. This podcast is not a scientific paper; it is a conversation. I respect Elon as I do all other leaders and engineers I've spoken with. We agree on some things and disagree on others. My goal with these conversations is always to understand the way the guest sees the world. One particular point of disagreement in this conversation was the extent to which camera-based driver monitoring will improve outcomes, and for how long it will remain relevant for AI-assisted driving. As someone who works on and is fascinated by human-centered artificial intelligence, I believe that if implemented and integrated effectively, camera-based driver monitoring is likely to be of benefit in both the short term and the long term. In contrast, Elon and Tesla's focus is on the improvement of Autopilot such that its statistical safety benefits override any concern of human behavior and psychology. Elon and I may not agree on everything.
But I deeply respect the engineering and innovation behind the efforts that he leads. My goal here is to catalyze a rigorous, nuanced, and objective discussion in industry and academia on AI-assisted driving, one that ultimately makes for a safer and better world. And now, here's my conversation with Elon Musk. What was the vision, the dream, of Autopilot in the beginning? The big picture, system level, when it was first conceived and started being installed in 2014 in the hardware and the cars. What was the vision, the dream?
SPEAKER_00
03:05 - 04:10
I wouldn't characterize it as a vision or dream. It's simply that there are obviously two massive revolutions in the automobile industry: one is the transition to electrification, and then the other is autonomy. And it became obvious to me that, in the future, any car that does not have autonomy would be about as useful as a horse. Which is not to say that there's no use; it's just rare, and somewhat idiosyncratic, if somebody has a horse at this point. So it's obvious that cars will drive themselves completely. It's just a question of time. And if we did not participate in the autonomy revolution, then our cars would not be useful to people relative to cars that are autonomous. I mean, an autonomous car is arguably worth five to ten times more than a car which is not autonomous.
SPEAKER_01
04:10 - 04:12
In the long term.
SPEAKER_00
04:12 - 04:18
Depends on what you mean by long term, but let's say at least for the next five years, perhaps ten years.
SPEAKER_01
04:18 - 04:36
So there are a lot of very interesting design choices with Autopilot early on. First is showing, on the instrument cluster, or in the Model 3 on the center stack display, what the combined sensor suite sees. What was the thinking behind that choice? Was there debate? What was the process?
SPEAKER_00
04:37 - 05:18
The whole point of the display is to provide a health check on the vehicle's perception of reality. So the vehicle is taking in information from a bunch of sensors, primarily cameras, but also radar, ultrasonics, GPS, and so forth. And then that information is rendered into vector space, you know, with a bunch of objects with properties like lane lines and traffic lights and other cars. And then, from vector space, that is re-rendered onto a display, so you can confirm whether the car knows what's going on or not by looking out the window.
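As a toy sketch of that pipeline, here is a hypothetical vector-space world model re-rendered as display labels. The `TrackedObject` schema, its fields, and the rendering are illustrative assumptions, not Tesla's actual representation:

```python
from dataclasses import dataclass

@dataclass
class TrackedObject:
    # One entry in a vector-space world model (hypothetical schema).
    kind: str    # e.g. "car", "lane_line", "traffic_light"
    x_m: float   # meters ahead of the ego vehicle
    y_m: float   # meters to the side (negative = left)

def render_summary(world):
    # Re-render the vector-space model as display labels, nearest object
    # first, so a driver can sanity-check the car's perception of reality.
    return [f"{o.kind} at ({o.x_m:.0f} m, {o.y_m:.0f} m)"
            for o in sorted(world, key=lambda o: o.x_m)]

world = [
    TrackedObject("car", 22.0, 0.0),
    TrackedObject("traffic_light", 60.0, -3.0),
    TrackedObject("car", 8.0, 3.5),
]
for label in render_summary(world):
    print(label)
```

The point of the sketch is the separation: perception produces typed objects with properties, and the display is just one consumer of that list.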
SPEAKER_01
05:19 - 06:09
Right, I think that's an extremely powerful thing for people to get an understanding of, so as to become one with the system and understand what the system is capable of. Now, have you considered showing more? So if we look at the computer vision, like road segmentation, lane detection, vehicle detection, object detection, underlying the system, there is, at the edges, some uncertainty. Have you considered revealing the uncertainty in the system? Yeah, so right now it shows, like, the vehicles in the vicinity as a very clean, crisp image, and people do confirm that there's a car in front of me, and the system sees there's a car in front of me. But to help people build an intuition of what computer vision is, by showing some of the uncertainty?
SPEAKER_00
06:10 - 06:55
Well, I think it's, I mean, in my car I always look at the sort of debug view. And there are two debug views. One is augmented vision, which I'm sure you've seen, where we basically draw boxes and labels around objects that are recognized. And then there's what we call the visualizer, which is basically a vector-space representation summing up the input from all sensors. That does not show any pictures; it basically shows the car's view of the world in vector space. But I think this is very difficult for normal people to understand. They would not know what they're looking at.
SPEAKER_01
06:56 - 07:05
So it's almost an HMI challenge: what is currently being displayed is optimized for the general public's understanding of what the system is capable of.
SPEAKER_00
07:05 - 07:27
If you have no idea how computer vision works, or anything, you can still look at the screen and see if the car knows what's going on. And then if you're a development engineer, or if you have the development build, like I do, then you can see, you know, all the debug information. But that would just be total gibberish to most people.
SPEAKER_01
07:28 - 08:01
What's your view on how to best distribute effort? So there are three, I would say, technical aspects of Autopilot that are really important: the underlying algorithms, like the neural network architecture; the data that it's trained on; and the hardware development. There may be others. But of algorithm, data, hardware, you only have so much money, only have so much time. What do you think is the most important thing to allocate resources to? Or do you see it as pretty evenly distributed between those three?
SPEAKER_00
08:01 - 09:46
We automatically get vast amounts of data because all of our cars have eight external-facing cameras, and radar, and usually 12 ultrasonic sensors, GPS obviously, and an IMU. And so we basically have a fleet; we've got about 400,000 cars on the road that have that level of data. Actually, I think you keep quite close track of it. Actually, yes. Yeah, so we're approaching half a million cars on the road that have the full sensor suite. I'm not sure how many other cars on the road have this sensor suite, but I'd be surprised if it's more than 5,000, which means that we have 99% of all the data. So there's this huge inflow of data. Absolutely massive inflow of data. And then it's taken about three years, but now we've finally developed the full self-driving computer, which can process an order of magnitude as much as the Nvidia system that we currently have in the cars. And to use it, you just unplug the Nvidia computer and plug the Tesla computer in. That's it. And in fact, we're still exploring the boundaries of its capabilities. We're able to run the cameras at full frame rate, full resolution, not even crop the images, and it's still got headroom, even on one of the systems. The full self-driving computer is really two computers, two systems on a chip, that are fully redundant. So you could put a bolt through basically any part of that system and it still works.
SPEAKER_01
09:47 - 09:58
The redundancy, are they perfect copies of each other? Yeah. Also, is it purely for redundancy, as opposed to an arguing-machines kind of architecture where they're both making decisions? This is purely for redundancy?
SPEAKER_00
09:58 - 10:33
I think of it as, if you have, say, a twin-engine commercial aircraft: the system will operate best if both systems are operating, but it's capable of operating safely on one. But as it is right now, we haven't even hit the edge of performance, so there's no need to actually distribute functionality across both SoCs. We can actually just run a full duplicate on each one.
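A minimal sketch of that full-duplication idea, assuming a made-up planning function; nothing here reflects Tesla's actual software, only the architectural point that each chip runs the complete computation:

```python
def plan(frame):
    # Stand-in for the complete driving computation one SoC performs
    # (hypothetical control law, invented field names).
    return {"steer": frame["lane_offset_m"] * -2.0,
            "brake": frame["obstacle_m"] < 10.0}

def redundant_plan(frame):
    # Run the full computation on both chips instead of splitting the
    # functionality: either result alone is a complete plan, so losing one
    # side degrades nothing, and a mismatch flags a hardware fault.
    result_a = plan(frame)  # SoC A
    result_b = plan(frame)  # SoC B, an identical copy
    if result_a != result_b:
        raise RuntimeError("cross-check failed; fall back to safe state")
    return result_a

frame = {"lane_offset_m": 0.5, "obstacle_m": 40.0}
print(redundant_plan(frame))
```

The design choice mirrored here is duplication for fault tolerance rather than an arguing-machines architecture: both copies must agree, and neither is load-bearing alone.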
SPEAKER_01
10:34 - 11:15
Have you really explored or hit the limit? No, not yet. So the magic of deep learning is that it gets better with data. You said there's a huge inflow of data, but the thing about driving is that the really valuable data to learn from is the edge cases. I've heard you talk somewhere about Autopilot disengagements being an important moment of time to use. Are there other edge cases, or perhaps can you speak to those edge cases? What aspects of them might be valuable, or do you have other ideas for how to discover more and more and more edge cases in driving?
SPEAKER_00
11:17 - 12:01
Well, there's a lot of things that are learned. There are certainly edge cases where, say, somebody's on Autopilot and they take over. And then, okay, that's a trigger that goes to us and says, okay, did they take over for convenience, or did they take over because the Autopilot wasn't working properly? There's also, let's say, we're trying to figure out what is the optimal spline for traversing an intersection. Then the ones where there are no interventions are the right ones. So you then say, okay, when it looks like this, do the following. And then you get the optimal spline for navigating a complex intersection.
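A crude sketch of that "the intervention-free runs are the right ones" idea: instead of fitting an actual spline, this just averages lateral offsets of clean traversals point by point. All numbers and the waypoint representation are invented for illustration:

```python
def average_path(clean_runs):
    # Combine traversals where the driver never intervened into a single
    # reference path; a stand-in for fitting the "optimal spline" through
    # an intersection from fleet data.
    n = len(clean_runs)
    return [sum(run[i] for run in clean_runs) / n
            for i in range(len(clean_runs[0]))]

# Lateral offsets (meters) at fixed waypoints, from three runs through the
# same intersection where there was no intervention.
clean_runs = [
    [0.0, 0.2, 0.5, 0.4],
    [0.0, 0.4, 0.7, 0.4],
    [0.0, 0.3, 0.6, 0.4],
]
print(average_path(clean_runs))
```

The disengagement signal described above acts as the filter: runs with a takeover are excluded before averaging, so only "went right" examples define the target path.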
SPEAKER_01
12:02 - 12:18
So that's for the common case: you're trying to capture a huge amount of samples of a particular intersection, of how things go right. And then there's the edge case where, as you said, someone takes over not for convenience, but because something went wrong.
SPEAKER_00
12:18 - 12:30
Somebody took over, asserted manual control from Autopilot. And really, the way to look at this is: view all input as error. If the user had to do input, all input is error.
SPEAKER_01
12:30 - 12:43
That's a powerful line to think of it that way, because it may very well be error. But if you want to exit the highway, or if you want to, it's a navigation decision that Autopilot is not currently designed to make, then the driver takes over.
SPEAKER_00
12:44 - 13:05
That's going to change with Navigate on Autopilot, which we just released, and without stalk confirm. So for the navigation-style lane changes, asserting control in order to do a lane change, or exit a freeway, or do a highway interchange, the vast majority of that will go away with the release that just went out.
SPEAKER_01
13:05 - 13:11
Yeah, so that I don't think people quite understand how big of a step that is.
SPEAKER_00
13:11 - 13:15
Yeah, they don't. So if you drive the car, then you do.
SPEAKER_01
13:15 - 13:38
So you still have to keep your hands on the steering wheel currently when it does the automatic lane change. So there have been these big leaps through the development of Autopilot, through its history. What stands out to you as the big leaps? I would say this one, Navigate on Autopilot without having to confirm, is a huge leap.
SPEAKER_00
13:38 - 14:12
It is a huge leap. It also automatically overtakes slow cars. So it's both navigation and seeking the fastest lane. So it'll overtake slow cars, and exit the freeway, and take highway interchanges. And then we have traffic light recognition, which is introduced initially as a warning. I mean, on the development version that I'm driving, the car fully stops and goes at traffic lights.
SPEAKER_01
14:13 - 14:26
So those are the steps, right? You've just mentioned an inkling of a step towards full autonomy. What would you say are the biggest technological roadblocks to full self-driving?
SPEAKER_00
14:26 - 15:17
Actually, I don't think there is one. The full self-driving computer that we just developed, we call it the FSD computer, that's now in production. So if you order any Model S or X, or any Model 3 that has the full self-driving package, you'll get the FSD computer. That's important to have enough base computation. Then it's refining the neural net and the control software. But all of that can be provided as an over-the-air update. The thing that's really profound, and what we'll be emphasizing at that investor day that we're having focused on autonomy, is that the car currently being produced, with the hardware currently being produced, is capable of full self-driving.
SPEAKER_01
15:17 - 15:24
But "capable" is an interesting word, because the hardware is, and as we refine the software,
SPEAKER_00
15:26 - 15:49
the capabilities will increase dramatically, and then the reliability will increase dramatically, and then it will receive regulatory approval. So essentially, buying a car today is an investment in the future. I think the most profound thing is that if you buy a Tesla today, I believe you are buying an appreciating asset, not a depreciating asset.
SPEAKER_01
15:50 - 16:00
So that's a really important statement there, because if the hardware is capable enough, that's the hard thing to upgrade. Yes. Usually. Exactly. So then the rest is a software problem.
SPEAKER_00
16:00 - 16:03
Yes. Software has no marginal cost, really.
SPEAKER_01
16:04 - 16:25
But what's your intuition on the software side? How hard are the remaining steps to get it to the point where the experience, not just the safety, but the full experience, is something that people would enjoy?
SPEAKER_00
16:26 - 17:08
Well, I think people enjoy it very much on the highways. It's a total game changer for quality of life, using Tesla Autopilot on the highways. So it's really just extending that functionality to city streets, adding in traffic light recognition, navigating complex intersections, and then being able to navigate complicated parking lots, so the car can exit a parking space and come find you, even if it's in a complete maze of a parking lot. And then it can just drop you off and find a parking spot by itself.
SPEAKER_01
17:09 - 18:14
Yeah, in terms of enjoyability, and something that people would actually find a lot of use from, the parking lot is, you know, a rich source of annoyance when you have to do it manually, so there's a lot of benefit to be gained from automation there. So let me start injecting the human into this discussion a little bit. Let's talk about full autonomy. If you look at the current level-four vehicles being tested on roads, like Waymo and so on, they're only technically autonomous. They're really level-two systems with just a different design philosophy, because there's always a safety driver in almost all cases, and they're monitoring the system. Do you see Tesla's full self-driving as still, for a time to come, requiring supervision of the human being? So its capabilities are powerful enough to drive, but nevertheless it requires a human to still be supervising, just like a safety driver is in other fully autonomous vehicles?
SPEAKER_00
18:14 - 19:08
I think it will require detecting hands on the wheel for at least six months or something like that from here. Really, it's a question, from a regulatory standpoint, of how much safer than a person Autopilot needs to be for it to be okay to not monitor the car. And this is a debate that one can have. But you need a large amount of data, so that you can prove with high confidence, statistically speaking, that the car is dramatically safer than a person, and that adding in the person monitoring does not materially affect the safety. So it might need to be like two or three hundred percent safer than a person. And how do you prove that?
SPEAKER_01
19:08 - 19:13
Incidents per mile. Incidents per mile. So crashes and fatalities.
SPEAKER_00
19:13 - 19:48
Yeah. Crashes and fatalities would be a factor, but there are just not enough fatalities to be statistically significant at scale. But there are enough crashes; there are far more crashes than there are fatalities. So you can assess what is the probability of a crash. Then there's another step, which is the probability of injury, and the probability of permanent injury, and the probability of death. And all of those need to be much better than a person, by at least, perhaps, 200%.
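As a back-of-the-envelope sketch of that comparison, here are per-mile rates and the resulting safety factor. The crash counts and mileage below are invented for illustration, not real Tesla or NHTSA statistics:

```python
def crashes_per_million_miles(crashes, miles):
    # Normalize raw counts into a rate that can be compared across fleets.
    return crashes / miles * 1_000_000

# Hypothetical numbers: a human-driven baseline vs. the automated system.
human_rate = crashes_per_million_miles(crashes=2_000, miles=500_000_000)
system_rate = crashes_per_million_miles(crashes=300, miles=600_000_000)

# The "two or three hundred percent safer" bar is a ratio of these rates;
# the same comparison would then be repeated for injury and fatality rates.
safety_factor = human_rate / system_rate
print(f"human: {human_rate} crashes per million miles, "
      f"system: {system_rate}, factor: {safety_factor:.1f}x")
```

A real argument would also attach confidence intervals to each rate, which is why the conversation stresses needing a large amount of data: rare events need many miles before the ratio is statistically meaningful.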
SPEAKER_01
19:48 - 19:56
And you think there's the ability to have a healthy discourse with the regulatory bodies on this topic?
SPEAKER_00
19:57 - 20:24
I mean, there's no question that regulators pay a disproportionate amount of attention to that which generates press. This is just an objective fact. And Tesla generates a lot of press. So in the United States there are, I think, almost 40,000 automotive deaths per year. But if there are four in a Tesla, they'll probably receive a thousand times more press than anyone else.
SPEAKER_01
20:25 - 21:56
So the psychology of that is actually fascinating. I don't think we'll have enough time to talk about that, but I have to talk to you about the human side of things. So, myself and our team at MIT recently released a paper on functional vigilance of drivers while using Autopilot. This is work we've been doing since Autopilot was first released publicly, over three years ago, collecting video of driver faces and driver body. So I saw that you tweeted a quote from the abstract, so I can at least guess that you've glanced at it. Yeah, right. Can I talk you through what we found? Sure. Okay. So it appears that, in the data that we've collected, drivers are maintaining functional vigilance, such that, when we looked at 18,000 disengagements from Autopilot, 18,900, and annotated whether they were able to take over control in a timely manner, they were there, present, looking at the road, ready to take over control. Okay. So this goes against what many would predict from the body of literature on vigilance with automation. Now, the question is, do you think these results hold across the broader population? So ours is just a small subset. One of the criticisms is that, you know, there's a small minority of drivers that may be highly irresponsible, where their vigilance decrement would increase with Autopilot use?
SPEAKER_00
21:57 - 22:25
I think this is all really going to be swept aside. I mean, the system is improving so much, so fast, that this is going to be a moot point very soon. Where vigilance is concerned, if something's many times safer than a person, then adding a person, the effect on safety is limited. And in fact, it could be negative.
SPEAKER_01
22:27 - 22:39
That's really interesting. So the fact that a human, some percent of the population, may exhibit a vigilance decrement will not affect the overall statistics, the numbers, of safety?
SPEAKER_00
22:39 - 23:25
No, in fact, I think it will become very, very quickly, maybe even towards the end of this year, but I'd be shocked if it's not next year at the latest, that having a human intervene will decrease safety. I can imagine, like, think of an elevator. There used to be elevator operators, and you couldn't go in an elevator by yourself and work the lever to move between floors. Now nobody wants an elevator operator, because the automated elevator that stops at the floors is much safer than the elevator operator. In fact, it would now be quite dangerous to have someone with a lever that can move the elevator between floors.
SPEAKER_01
23:26 - 23:57
So that's a really powerful statement, and a really interesting one. But I also have to ask, from a user experience and from a safety perspective, one of the passions for me algorithmically is camera-based detection of just sensing the human: detecting what the driver is looking at, cognitive load, body pose. On the computer vision side, that's a fascinating problem. And there are many in industry who believe you have to have camera-based driver monitoring. Do you think there could be benefit gained from driver monitoring?
SPEAKER_00
23:57 - 24:32
If you have a system that's at or below human-level reliability, then driver monitoring makes sense. But if your system is dramatically better, more reliable than a human, then driver monitoring does not help much. And, like I said, just as you wouldn't want someone in the elevator: if you're in an elevator, do you really want someone with a big lever, some random person, operating the elevator between floors? I wouldn't trust that. I would rather have the buttons.
SPEAKER_01
24:34 - 24:42
Okay, so you're optimistic about the pace of improvement of the system, from what you've seen with the full self-driving computer.
SPEAKER_00
24:42 - 24:43
The rate of improvement is exponential.
SPEAKER_01
24:45 - 25:55
So one of the other very interesting design choices early on that connects to this is the operational design domain of Autopilot: where Autopilot is able to be turned on. In contrast, another vehicle system that we're studying is the Cadillac Super Cruise system, which, in terms of ODD, is very constrained to particular kinds of highways, well-mapped, tested, much narrower than the ODD of Tesla vehicles. That's the Tesla ODD. Yeah, that's good. That's a good line. What was the design decision? What was the difference in philosophy of thinking? Because there are pros and cons. What we see with a wide ODD is that Tesla drivers are able to explore more of the limitations of the system, at least early on, and, together with the instrument cluster display, they start to understand what the capabilities are. So that's a benefit. The con is, you're letting drivers use it basically anywhere.
SPEAKER_00
25:55 - 25:58
Anywhere that it could detect lanes with confidence.
SPEAKER_01
25:58 - 26:10
Were there philosophy design decisions that were challenging, that were being made there? Or from the very beginning, was that done on purpose, with intent?
SPEAKER_00
26:11 - 26:49
Well, I mean, I think, frankly, it's pretty crazy letting people drive a two-ton death machine manually. That's crazy. Like, in the future people will be like, I can't believe anyone was just allowed to drive one of these two-ton death machines, and they just drove wherever they wanted. Just like elevators: you could just move the elevator with that lever wherever you wanted; it can stop halfway between floors if you want. It's pretty crazy. So it's going to seem like a mad thing in the future that people were driving cars.
SPEAKER_01
26:49 - 27:39
So I have a bunch of questions about human psychology, about behavior and so on, that would all become moot. Right. Because you have faith in the AI system, not faith, but the belief that both the hardware side and the deep learning approach of learning from data will make it just far safer than humans. Yeah, exactly. Recently, there were a few hackers who tricked Autopilot into acting in unexpected ways with adversarial examples. So we all know that neural network systems are very sensitive to minor disturbances, these adversarial examples, on input. Do you think it's possible to defend against something like this, for the industry? Sure. Can you elaborate on the confidence behind that answer?
SPEAKER_00
27:40 - 28:19
Well, you know, a neural net is just, like, basically a bunch of matrix math. You have to be a very sophisticated somebody who really understands neural nets, and basically reverse-engineer how the matrix is being built, and then create a little thing that's just exactly what causes the matrix math to be slightly off. But it's very easy to then block that by having, basically, negative recognition. If the system sees something that looks like a matrix hack, exclude it. It's such an easy thing to do.
SPEAKER_01
28:19 - 28:26
So learn both on the valid data and the invalid data. So basically, learn on the adversarial examples to be able to exclude them.
SPEAKER_00
28:26 - 28:46
Yeah, like you basically want the system to both know what is a car and what is definitely not a car. And you train for: this is a car, and this is definitely not a car. Those are two different things. People have, like, no idea of neural nets, really. They probably think neural nets involve, you know, a fishing net or something.
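A toy sketch of that train-on-explicit-negatives idea. The labels and sample names are made up; a real pipeline would do this over images with a large net, but the data-side move is the same, which is to fold detected adversarial inputs back in as hard negatives:

```python
def build_training_set(car_frames, not_car_frames):
    # Pair every example with an explicit label: 1 = "this is a car",
    # 0 = "this is definitely not a car". Adversarial inputs that were
    # caught in the field are added as hard negatives, so the net learns
    # to exclude them rather than misread them as cars.
    return [(x, 1) for x in car_frames] + [(x, 0) for x in not_car_frames]

dataset = build_training_set(
    car_frames=["frame_0012", "frame_0013"],
    not_car_frames=["billboard_07", "adversarial_sticker_01"],
)
print(dataset)
```

Training on both classes, rather than on positives alone, is what gives the model a decision boundary it can use to reject look-alike attacks.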
SPEAKER_01
28:46 - 29:10
So as you know, taking a step beyond just Tesla and autopilot, current deep learning approaches still seem in some ways to be far from general intelligence systems. Do you think the current approaches will take us to general intelligence or do totally new ideas need to be invented?
SPEAKER_00
29:10 - 29:55
I think we're missing a few key ideas for artificial general intelligence. But it's going to be upon us very quickly, and then we'll need to figure out what we should do, if we even have that choice. But it's amazing how people can't differentiate between, say, the narrow AI that allows a car to figure out what a lane line is and navigate streets, versus general intelligence. These are just very different things. Like, your toaster and your computer are both machines, but one's much more sophisticated than the other.
SPEAKER_01
29:55 - 30:00
You're confident that with Tesla you can create the world's best toaster?
SPEAKER_00
30:00 - 30:27
The world's best toaster, yes. The world's best self-driving, yes. To me, right now, this seems game, set, match. I don't want to be complacent or overconfident, but that is literally how it appears right now. I could be wrong, but it appears to be the case that Tesla is vastly ahead of everyone.
SPEAKER_01
30:27 - 30:35
Do you think we'll ever create an AI system that we can love, and that loves us back in a deep, meaningful way, like in the movie Her?
SPEAKER_00
30:37 - 30:42
I think AI will be capable of convincing you to fall in love with it very well.
SPEAKER_01
30:42 - 30:44
And that's different than us humans?
SPEAKER_00
30:46 - 31:34
You know, we start getting into a metaphysical question of: do emotions and thoughts exist in a different realm than the physical? And maybe they do, maybe they don't, I don't know. But from a physics standpoint, I tend to think of things, you know, physics was my main sort of training. And from a physics standpoint, essentially, if it loves you in a way that you can't tell whether it's real or not, it is real. That's a physics view of love. Yeah. If you cannot prove that it does not, if there's no test that you can apply that would allow you to tell the difference, then there is no difference.
SPEAKER_01
31:34 - 31:45
And it's similar to seeing our world as a simulation: there may not be a test to tell the difference between what is the real world and the simulation, and therefore, from a physics perspective, it might as well be the same thing.
SPEAKER_00
31:46 - 32:08
Yes. And there may be ways to test whether it's a simulation. There might be; I'm not saying there aren't. But you could certainly imagine that a simulation could correct for that: once an entity in the simulation found a way to detect the simulation, it could either restart it, you know, pause the simulation, start a new simulation, or do one of many other things that then corrects for that error.
SPEAKER_01
32:10 - 32:33
So when maybe you, or somebody else, creates an AGI system, and you get to ask her one question, what would that question be?
SPEAKER_00
32:33 - 32:35
What's outside the simulation?
SPEAKER_01
32:38 - 32:41
Elon, thank you so much for talking today. It was a pleasure. All right, thank you.