WEBVTT 1 00:00:27.410 --> 00:00:34.620 Dilma Da Silva: I see the number of participants is still going up, so let's wait until it gets a little stable, I guess. 2 00:01:00.160 --> 00:01:02.050 Dilma Da Silva: So, hello, everyone. 3 00:01:02.230 --> 00:01:17.290 Dilma Da Silva: I'm Dilma da Silva. I am the division director for CCF, Computing and Communication Foundations, in CISE, and here at CISE we are delighted to have the opportunity to host Professor Mike Carbin. 4 00:01:17.630 --> 00:01:35.330 Dilma Da Silva: Professor Carbin is an associate professor of electrical engineering and computer science at MIT, where he leads the Programming Systems Group. In his work he aims at improving reliability, performance, energy consumption, and resilience for computer systems 5 00:01:35.340 --> 00:01:50.219 Dilma Da Silva: using techniques from programming languages. We are happy to say that Professor Carbin has received an NSF CAREER Award, a Sloan Foundation Research Fellowship, and the MIT Frank E. Perkins Award for excellence in graduate advising. 6 00:01:50.230 --> 00:01:59.490 Dilma Da Silva: His work has been recognized in many forums, among them best paper awards at ICLR and ICFP. 7 00:02:07.430 --> 00:02:22.229 Dilma Da Silva: He got his undergrad in computer science from Stanford in 2006, and then a PhD in electrical engineering and computer science from MIT. 8 00:02:22.240 --> 00:02:31.299 Dilma Da Silva: He then went to Microsoft Research, where he worked on deep learning systems from 2014 to 2018. 9 00:02:31.740 --> 00:02:33.390 Dilma Da Silva: So, ah! 10 00:02:33.490 --> 00:02:44.129 Dilma Da Silva: We are very pleased to have you here, Professor Carbin, Mike. Go ahead; we are looking forward to learning a lot about programming, 11 00:02:44.300 --> 00:02:48.000 Dilma Da Silva: uncertainty, and the relationship between them. 12 00:02:48.370 --> 00:02:52.280 Michael Carbin: Great.
Thank you so much for that generous introduction. 13 00:02:52.290 --> 00:02:57.149 Michael Carbin: So yeah, so today I'm going to talk about some work that I've been working on 14 00:02:57.280 --> 00:03:00.990 Michael Carbin: since graduate school, coming on almost fifteen years now, and 15 00:03:01.000 --> 00:03:30.880 Michael Carbin: the flavor has changed over the years as our computing systems have changed. I mean, just think about the past few months of what's going on with, you know, ChatGPT and GPT-4, and just new ways of thinking about software. It seems quite fantastical in a way. Um, but for a long time it's sort of been what I've been expecting, because the way I think about the world, about where computation is going, is that many of the very interesting, emerging computations that we're thinking about are what I call uncertain computations, 16 00:03:31.210 --> 00:03:42.489 Michael Carbin: and so let me try to explain that over the course of this talk: give some examples, and also sort of organize how we might think about the future of software and the important problems that we might need to be working on. 17 00:03:42.500 --> 00:03:45.369 Michael Carbin: So what is an uncertain computation? 18 00:03:45.380 --> 00:03:50.390 Michael Carbin: Well, the way I like to explain this these days is by looking at this cartoon diagram of the computing stack. 19 00:03:50.400 --> 00:04:06.929 Michael Carbin: And the way I came to this explanation was, I was teaching a computer systems engineering class a couple of years ago, and I was talking with students about, basically, trust: what parts of your system do you trust to be reliable,
20 00:04:06.940 --> 00:04:20.879 Michael Carbin: perhaps error-free, or secure? We went through the levels of the stack, and we started at the top with applications. The students said, oh, applications, libraries? Can't trust those. I write code; it has bugs. 21 00:04:20.890 --> 00:04:34.469 Michael Carbin: You know, the compiler and runtime system, a layer lower than that: it's not necessarily so obvious to everyone that there are bugs we need to deal with in those systems, but there's a long history of work in my community, the programming languages community broadly, 22 00:04:34.480 --> 00:04:44.089 Michael Carbin: and of course supported wonderfully by NSF, on actually identifying bugs in these types of systems and ridding ourselves of them. How can we actually do secure computation? 23 00:04:44.100 --> 00:04:59.889 Michael Carbin: Now, below that, operating systems: I think all of us have had the unfortunate experience of dealing with an operating system bug, be it Windows or any of these systems. But where things got interesting was when I asked the students about hardware systems, and they said, oh no, hardware, that's great, 24 00:04:59.900 --> 00:05:17.829 Michael Carbin: you know. And this was before Spectre and Meltdown and the broader security focus that perhaps we're now seeing in the community around hardware. But these again were undergraduate students, and they said, yes, if I have some instruction, it's one plus one, then I'm going to get two as the answer. And unfortunately that 25 00:05:17.840 --> 00:05:20.030 Michael Carbin: is not always the case. 26 00:05:20.040 --> 00:05:36.960 Michael Carbin: And so this comes as a shock and a surprise, sometimes even to people who really do think about these issues.
But, in fact, you could go right now, on the machine that you're sitting at, to Intel's website and find the errata for whichever processor is sitting in your machine. Here's one from a few years ago. 27 00:05:36.970 --> 00:05:51.099 Michael Carbin: And what it's going to tell you is actually all the errors that may happen while executing programs or software on your system. Now, some of these may be rather benign. So this first one at the top: cache performance-monitoring events may not be accurate. 28 00:05:51.110 --> 00:06:11.530 Michael Carbin: Unless you're doing low-level performance engineering, maybe this isn't something that's going to affect your day-to-day. But this one below here at the bottom is a little bit more ominous. It says: processor may hang or cause unpredictable system behavior. And really, "unpredictable system behavior" here means that your system can return arbitrarily incorrect results at times. 29 00:06:11.540 --> 00:06:19.889 Michael Carbin: And this is not just in theory, in very, very rare circumstances; this actually shows up in practice. 30 00:06:19.900 --> 00:06:49.040 Michael Carbin: If you're a consumer and you're really hammering on your machine, even at the consumer level, really doing extensive computations, you can see these types of errors actually show up as corrupted computations. Now, if you go up another level and think about large, massive organizations running supercomputing and data-center jobs, these types of errors actually are a primary, first-order concern in the design of these systems, because even within the course of a single day, you're very likely to see some sort of corruption inside your 31 00:06:49.050 --> 00:06:50.150 computation, 32 00:06:50.200 --> 00:07:10.170 Michael Carbin: and this is because, when we look at a large distributed system, well,
The probability that, let's say, one processor encounters some sort of issue as it executes may be very small. But if you take a large distributed computation, with tens of thousands, hundreds of thousands of cores, the probability that one of them encounters an error becomes much, much higher. 33 00:07:10.180 --> 00:07:23.689 Michael Carbin: And we've started to see more attention here. There's been a long-held focus in the research community on thinking about these sorts of issues, and we're starting to see them re-emerge, with different flavors and different names, in research that's been coming out of companies like Facebook over these years. 34 00:07:23.850 --> 00:07:26.889 Michael Carbin: So how do we cope with this type of problem 35 00:07:27.030 --> 00:07:34.790 Michael Carbin: in practice? The type of errors I want to talk about here are transient, nondeterministic errors, in the sense that if I have a computation, I run it once, 36 00:07:34.800 --> 00:07:36.710 Michael Carbin: and it encounters an error, 37 00:07:36.780 --> 00:07:44.689 Michael Carbin: if I start over and run that exact same computation again on that same machine, with high probability I'm going to get the correct answer. 38 00:07:44.700 --> 00:07:49.369 Michael Carbin: So these are nondeterministic errors; these are not permanent errors where we're always going to get an incorrect result. 39 00:07:49.380 --> 00:07:51.159 Michael Carbin: Well, in this instance, 40 00:07:51.170 --> 00:07:54.389 Michael Carbin: classically, what we've seen is people use something like replication, 41 00:07:54.400 --> 00:07:59.910 Michael Carbin: right? So there's been a long history of essentially taking a computation that I might have and duplicating it. 42 00:07:59.940 --> 00:08:03.330 Michael Carbin: I run it twice. I check to see if the results agree. 43 00:08:03.340 --> 00:08:15.739 Michael Carbin: If the results disagree, then I'm going to go ahead and run it again.
I'll keep running it and running it until I get the right result. Of course, there are some specific restrictions on the types of errors and frequency of errors for which this is a sensible approach. 44 00:08:15.750 --> 00:08:24.440 Michael Carbin: This has been a classic approach. It's so classic that, going back in the computing literature, you can find that the first time this really came into computing was von Neumann, 45 00:08:24.450 --> 00:08:38.140 Michael Carbin: when he was trying to construct reliable computations from unreliable components. And what was very interesting at that time, if you go read this paper, is that these unreliable components actually look a lot like neural networks, and I'll come back to that. 46 00:08:39.000 --> 00:08:50.109 Michael Carbin: But I think it's reminiscent of these types of concerns that we're seeing within modern software, where things like neural networks are starting to become part and parcel of the way that we build systems. 47 00:08:50.410 --> 00:09:00.590 Michael Carbin: So now, replication is the classic strategy for doing this. But I want to encourage us to think differently about this problem: how might we solve this otherwise? 48 00:09:00.870 --> 00:09:12.039 Michael Carbin: Well, one alternative approach here is: what if we just let errors happen? Right, instead of trying to correct errors, let's just let them happen, let the system proceed, and see what could go wrong. 49 00:09:12.420 --> 00:09:22.120 Michael Carbin: Well, let's explore this. I'm sure, if someone said this to you, you'd probably get in your mind all these horrible ways in which this could go wrong. But I want to show an example. 50 00:09:22.170 --> 00:09:32.140 Michael Carbin: And this example here is a very simple method for solving systems of linear equations, called the Jacobi method, and you don't need to understand this program at all.
51 00:09:32.150 --> 00:09:44.829 Michael Carbin: But I do want to highlight a few specific features of this. So, one: this is an iterative algorithm. You can see there's this while loop where this program is going to run until it converges after some number of steps. 52 00:09:44.940 --> 00:09:46.710 Michael Carbin: So that's the first thing. 53 00:09:46.720 --> 00:10:03.750 Michael Carbin: The other thing I want to highlight are these operations in red. These are operations where I want to show you what happens if we inject errors into them: let these operations be corrupted, and let the program compute on, and see what happens. 54 00:10:04.590 --> 00:10:22.730 Michael Carbin: So this is what happens. Here is an illustration of the Jacobi method executing over some number of iterations. Iterations are on the x-axis; on the y-axis, what we're going to have is the difference between our current estimate of the solution to the system of linear equations and the actual solution. 55 00:10:22.740 --> 00:10:31.389 Michael Carbin: And so this blue line is what we get if we just run the computation with no errors: we see this nice convergent behavior over time, which is very nice. 56 00:10:31.400 --> 00:10:35.280 Michael Carbin: Now, this red line here is what happens if we let errors happen, 57 00:10:35.520 --> 00:10:39.519 Michael Carbin: and in particular we're going to inject errors here on iteration ten thousand. 58 00:10:39.620 --> 00:10:44.039 Michael Carbin: And so what we see here is, initially, there's quite a significant leap up 59 00:10:44.200 --> 00:10:52.709 Michael Carbin: in error from where we were before, and in fact it looks like we've lost nearly five or six thousand iterations of progress 60 00:10:52.770 --> 00:10:54.660 Michael Carbin: by encountering this error.
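[The error-injection experiment described here can be sketched in a few lines of Python. This is a toy setup, not the actual benchmark from the talk: the system, the fault magnitude, and the injection point are all made up for illustration.]

```python
# Jacobi iteration on a small diagonally dominant system Ax = b,
# with one transient fault corrupting the state partway through.
A = [[10.0, 1.0, 2.0],
     [1.0, 12.0, 3.0],
     [2.0, 3.0, 15.0]]
x_true = [1.0, 1.0, 1.0]
# b chosen so that the exact solution is x_true
b = [sum(A[i][j] * x_true[j] for j in range(3)) for i in range(3)]

def jacobi(iters=80, inject_at=None):
    """Run Jacobi iteration; optionally corrupt one entry once."""
    x = [0.0, 0.0, 0.0]
    errs = []
    for k in range(iters):
        # one Jacobi update: x_i = (b_i - sum of off-diagonal terms) / a_ii
        x = [(b[i] - sum(A[i][j] * x[j] for j in range(3) if j != i)) / A[i][i]
             for i in range(3)]
        if k == inject_at:
            x[0] += 1e3          # transient hardware fault: corrupt x[0]
        errs.append(max(abs(x[i] - x_true[i]) for i in range(3)))
    return errs

clean = jacobi()
faulty = jacobi(inject_at=40)
# The faulty run spikes at the injection point, then the iteration's own
# contraction pulls it back, and both runs end up equally accurate.
print(faulty[40], faulty[-1], clean[-1])
```

[The key property the sketch exhibits is self-stabilization: because each Jacobi step contracts the error regardless of the current state, a one-off corruption only costs extra iterations, not correctness.]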
61 00:10:54.670 --> 00:10:56.100 Michael Carbin: However, 62 00:10:56.110 --> 00:11:06.699 Michael Carbin: if we actually let this computation continue running, what we see is that we actually add only 2.6 percent more iterations until we converge 63 00:11:06.740 --> 00:11:26.090 Michael Carbin: to an equally accurate result. So this is interesting, because now it gives us another strategy for thinking about dealing with these types of errors. Right? So, on one hand, we can do something like replication. And replication, you might think, would be horribly expensive, and in some cases it really is. But if you scour through the literature, with 64 00:11:26.100 --> 00:11:36.879 Michael Carbin: very clever tricks for compilation and hardware support, you can find that maybe you can get it down to a fifty percent overhead, at least in latency; energy, hardware, and all these other measures you may also need to keep in mind. But, 65 00:11:37.240 --> 00:11:40.890 Michael Carbin: conceptually, fifty percent overhead when we're thinking about time to execution. 66 00:11:41.100 --> 00:11:43.710 Michael Carbin: Now, approximation, on the other hand: 67 00:11:43.920 --> 00:11:47.610 Michael Carbin: what this gives us is instead a 2.6 percent overhead. 68 00:11:47.780 --> 00:11:59.229 Michael Carbin: But the difference here, the crux, is that our intermediate results, or our results in general, may differ from what we intended them to be by some epsilon. 69 00:12:00.310 --> 00:12:01.989 Michael Carbin: And so this epsilon is important. 70 00:12:02.000 --> 00:12:16.389 Michael Carbin: So when I think about uncertain computations,
what I think about are computations where we've composed a software system and, somewhere along the way, or maybe even at the output, results are being computed with some imprecision or inaccuracy, 71 00:12:16.490 --> 00:12:34.069 Michael Carbin: and classically, as computer scientists, we have this very discrete view of the world; we don't necessarily think about these things. Of course, if we talk to our friends in numerical analysis or scientific computing, this is something they think about all the time. What I've found through my research is that if we can move towards software systems 72 00:12:34.080 --> 00:12:36.750 Michael Carbin: where this epsilon is more explicit, 73 00:12:36.880 --> 00:12:49.450 Michael Carbin: we get new opportunities. And we're seeing one right here with approximation: if we know about this epsilon, and we can reason about this epsilon, perhaps we can build new systems that are more efficient and tolerant of things like hardware errors. 74 00:12:49.460 --> 00:12:59.010 Michael Carbin: And I think this is critically important as we think forward as a community about how computing is going to evolve, because this type of uncertainty is now absolutely everywhere. 75 00:12:59.310 --> 00:13:18.450 Michael Carbin: So it's coming up from below, from our hardware systems. We already see this with things like errors that are happening in our hardware systems. But even more, think about the trajectory of machine learning systems: they're moving towards reduced-precision computation, where we're going to have approximation in the numerical computations that we're doing. 76 00:13:18.460 --> 00:13:32.890 Michael Carbin: We're seeing emerging computations in stochastic or analog or neuromorphic computing, or even quantum, where there are suddenly these fundamentally probabilistic or uncertain computational substrates that we can use,
or may need to use, 77 00:13:32.900 --> 00:13:36.490 Michael Carbin: where the trade-off is we can get much more efficient or much more capable systems, 78 00:13:36.500 --> 00:13:40.020 Michael Carbin: but it's unavoidable that we now need to think about this epsilon. 79 00:13:40.160 --> 00:13:52.339 Michael Carbin: The other way this is coming up is from the application, or, more specifically, thinking about the environment: the fact that we're building applications that are trying to reach out and model the fundamentally uncertain physical world. 80 00:13:52.570 --> 00:14:12.490 Michael Carbin: So take, for example, this computer vision example here, something like object detection. The goal of these types of systems is to try to identify the objects, or the locations and positions and bounding boxes around objects, that we have in the scene. And unfortunately, this system only has noisy access 81 00:14:12.500 --> 00:14:28.570 Michael Carbin: to this physical world, right? This environment is only partially observable, the data that we're getting back from sensors is noisy, and the models that we're using to construct these types of bounding boxes are, in essence, fundamentally only correct with some sort of probability, because they're typically approximations. 82 00:14:28.740 --> 00:14:40.399 Michael Carbin: And moving across to other applications, machine learning, image processing, computer vision, quantum: again, we fundamentally have applications themselves for which this epsilon is now inherent. 83 00:14:40.410 --> 00:14:45.570 Michael Carbin: So uncertainty, these epsilons, are starting to show up everywhere in our computing stack, 84 00:14:45.830 --> 00:14:57.490 Michael Carbin: again, coming from below and coming from above. And this is interesting; it's amazing. It's giving us new capabilities. We're able to build software systems that
we couldn't even build before. 85 00:14:57.500 --> 00:15:12.840 Michael Carbin: Think about what neural networks are really able to do these days, processing uncertain sensor data that's coming in. We're getting very interesting opportunities for performance: think again about what happened with approximate computing, or the approximation opportunity I talked about a few slides ago, 86 00:15:12.850 --> 00:15:26.420 Michael Carbin: and of course resilience as well, resilience to errors: if things go wrong, and they're only epsilon wrong, can we figure out a way to still work through those types of errors? But there's a fundamental research question here, which is: well, how do we build 87 00:15:26.800 --> 00:15:28.190 Michael Carbin: software this way? 88 00:15:28.200 --> 00:15:48.120 Michael Carbin: How can we productively build software that has all the nice properties we think about, things like modularity and compositionality, things we can reason about, in the presence of these epsilons? Because this is not how we formally or traditionally constructed software. It hasn't been the main focus of computer science, which again has been thinking about the deterministic and the discrete. 89 00:15:48.660 --> 00:16:07.840 Michael Carbin: So that brings me essentially to the subject of today's talk, and really what I've been doing over the years: thinking about ways of building software in this new reality, starting from programming methodologies. How do we think about these new emerging software systems that are now a mix of classical code, 90 00:16:07.850 --> 00:16:27.250 Michael Carbin: neural networks, and now, with things like ChatGPT, GPT-4, and all the other models that are out there, perhaps natural language? There are really interesting compositions where, ultimately, at the end of the day,
we only know what some of these computations do up to some epsilon. 91 00:16:27.500 --> 00:16:47.199 Michael Carbin: Moving beyond that, there are fundamental questions in semantics, for many of these computations are now real-valued, but nondeterministic or probabilistic. As I said before, this is not where computer science has spent most of its time, right? Instead, we need to fundamentally think about what the semantics are of programs that actually execute with these types of data types, because it is actually fundamentally different. 92 00:16:47.740 --> 00:17:02.660 Michael Carbin: Beneath that, when you think about how we represent things like real numbers, nondeterminism, probability: how do we compute? What are efficient representations of these types of data types, so that we can compute on them productively in these types of applications? 93 00:17:02.800 --> 00:17:05.150 Michael Carbin: And then, finally, reasoning: 94 00:17:05.359 --> 00:17:07.529 Michael Carbin: once I have this software system, 95 00:17:07.540 --> 00:17:22.060 Michael Carbin: how do I actually reason about what sort of guarantees I can get from it? How do I actually prove that, with reasonable probability, it can satisfy some specification that I have in mind, or perhaps is going to be epsilon-close to the specification I have? 96 00:17:22.500 --> 00:17:34.199 Michael Carbin: So today, what I want to do is talk to you about maybe just a couple of these. But I'm going to motivate how I've gone about looking at this problem from a very specific application: 97 00:17:34.210 --> 00:17:54.580 Michael Carbin: thinking about how we build software in this domain, with programming methodologies that are very rich and expressive, in the sense that we can take neural networks and perhaps other techniques and really apply them to solve a classic problem, one of my favorite problems, in compilation.
How can we build new compilers that incorporate learning 98 00:17:54.590 --> 00:17:55.890 Michael Carbin: ah, to tackle 99 00:17:55.910 --> 00:18:13.730 Michael Carbin: the fact that our underlying hardware systems are actually very uncertain, where we need very expressive, rich, learned models of their behavior? So I'm going to do that, and then use that as an example, really, to concretize a few of the opportunities that I see in this space. 100 00:18:14.320 --> 00:18:15.370 Michael Carbin: So, 101 00:18:15.480 --> 00:18:22.340 Michael Carbin: the core of a compiler. This is an incredibly simplified perspective; I know we have a broad audience here, 102 00:18:22.350 --> 00:18:42.049 Michael Carbin: and some of you spend time hacking on compilers. I try to write a compiler every year; it's sort of my therapy. All right. But for many of you this is perhaps a reasonably simple diagram, in the sense that I have source code, C, C++, Java, something coming in on one end into the system, and then on the other side I'm going to get machine code, 103 00:18:42.060 --> 00:18:58.899 Michael Carbin: and of course this can be broken out into finer and finer granularities. If you actually know how these systems are composed, oftentimes there's the front end, going from source code to an intermediate representation, and then from that intermediate representation down to the machine code itself. 104 00:18:59.150 --> 00:19:06.889 Michael Carbin: Now, where I want to focus my attention is mostly on this back end, where we have this rich intermediate representation that represents the program, 105 00:19:06.900 --> 00:19:09.279 Michael Carbin: and we're figuring out: how do we take that 106 00:19:09.460 --> 00:19:11.419 Michael Carbin: and really map it onto the machine?
107 00:19:11.720 --> 00:19:16.630 Michael Carbin: And so this problem, this optimization problem, we can think of 108 00:19:16.760 --> 00:19:19.790 Michael Carbin: as follows. Here is a simple example 109 00:19:19.800 --> 00:19:35.999 Michael Carbin: where I have this code here in the middle, where I've got three little instructions, and my goal here, one particular goal that you can have in a back end, is to figure out the optimal ordering of these instructions so that I can get the best performance on the underlying machine. 110 00:19:36.130 --> 00:19:53.290 Michael Carbin: Now, such an ordering is going to be some permutation of the instructions that I have, and this is going to give us essentially a transformation space, the space of valid transformations we can have of this particular code, and we can represent that here as this DAG that I'm showing. 111 00:19:53.300 --> 00:19:56.520 Michael Carbin: What this is representing is the data dependencies. 112 00:19:56.530 --> 00:20:07.849 Michael Carbin: Obviously, I can't compute C until I've computed A and B. So A and B need to precede C in any particular ordering of this computation that I'm ultimately going to generate. 113 00:20:08.030 --> 00:20:15.840 Michael Carbin: So any topological sort of this graph is actually a valid ordering of this computation. 114 00:20:15.970 --> 00:20:18.030 Michael Carbin: So that's my transformation space. 115 00:20:18.570 --> 00:20:31.959 Michael Carbin: The next critical component here is something called a performance model.
So I want to take any permutation of these instructions and be able to predict what the underlying performance is going to be on the actual machine, 116 00:20:31.970 --> 00:20:55.379 Michael Carbin: and oftentimes the reason I want to be able to predict is because I want to be able to explore broadly in this transformation space. The transformation space may be much larger than just the two options that we see here; rather, it could be thousands, more than thousands, even exponentially many in the size of the program. And with that in mind, we want efficient ways to search this space, rather than actually trying to run a potentially expensive computation on an actual machine, 117 00:20:55.390 --> 00:20:59.889 Michael Carbin: right? So our performance model is here to help us out. So, for example, here are some 118 00:20:59.900 --> 00:21:06.620 Michael Carbin: examples where a performance model may say: okay, the cost of the first one is going to be three; the cost of the second one is going to be two. 119 00:21:06.860 --> 00:21:10.360 So now, with that performance model in hand, we can search. 120 00:21:10.380 --> 00:21:22.190 Michael Carbin: There are methodologies for trying to efficiently search these types of spaces, and in this particular instance we're going to go ahead and choose this one with cost two, which we think is going to be more efficient. So this compiler back end, this is, 121 00:21:22.200 --> 00:21:29.700 Michael Carbin: in some sense, the basic strategy that we have. Now, a critical component for getting this to work is the underlying performance model: 122 00:21:30.060 --> 00:21:38.379 Michael Carbin: how do we understand how to model what the machine is doing beneath us, so that our compiler can make effective search decisions about the code it generates?
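[The pieces just described, a transformation space of valid orderings (topological sorts of the dependency DAG) plus a performance model that scores each one, can be sketched roughly as follows. The three instructions and the costs three and two mirror the slide's example; the cost table is a stand-in for a real model.]

```python
from itertools import permutations

# Instructions and their data dependencies: c uses a and b,
# as in the three-instruction example (a = ...; b = ...; c = a + b).
instrs = ["a", "b", "c"]
deps = {"a": [], "b": [], "c": ["a", "b"]}   # c depends on a and b

def valid_orderings(instrs, deps):
    """Yield every topological order of the dependency DAG."""
    for order in permutations(instrs):
        pos = {ins: i for i, ins in enumerate(order)}
        if all(pos[d] < pos[ins] for ins in instrs for d in deps[ins]):
            yield list(order)

def perf_model(order):
    """Stand-in performance model: made-up costs for the two valid schedules.
    A real back end would predict cycles on the target machine here."""
    cost_table = {("a", "b", "c"): 3.0, ("b", "a", "c"): 2.0}
    return cost_table[tuple(order)]

# The search: enumerate valid schedules, keep the one the model says is cheapest.
best = min(valid_orderings(instrs, deps), key=perf_model)
print(best)
```

[Brute-force enumeration is only viable for toy programs; as the transcript notes, real spaces grow exponentially, which is exactly why a cheap predictive model, rather than running each candidate on hardware, is needed to guide the search.]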
123 00:21:38.890 --> 00:21:42.309 Michael Carbin: Now, this performance modeling problem 124 00:21:42.320 --> 00:21:55.079 Michael Carbin: has always been a very tricky problem. It's becoming even more tricky in modern computing, because not only are we thinking about the traditional, classic processor; we're starting to get these very heterogeneous systems. 125 00:21:55.170 --> 00:21:57.390 Michael Carbin: They span many different chips 126 00:21:57.400 --> 00:22:14.879 Michael Carbin: of very divergent styles. Some are very efficient on particular dimensions of particular computations; some we really want to use for very expensive computations, such as machine learning. And we really want to be able to map an application, and parts of that application, down 127 00:22:15.090 --> 00:22:25.889 Michael Carbin: to each of the individual chips that I may have available inside of my system. And so this heterogeneous mapping problem is a hard problem, and it's become increasingly hard. 128 00:22:26.370 --> 00:22:45.460 Michael Carbin: And so in the systems communities there is this fundamentally hard systems challenge that many people at this point are trying to address, which is: we want to take our program, we have some sort of hardware specification of all the components, the sea of components that we may have down here at the bottom that we may be targeting, and we need a performance model, 129 00:22:45.470 --> 00:22:53.390 Michael Carbin: and our compiler is supposed to then, you know, magically come in, navigate this space, and generate code that appropriately targets the machines below.
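[The heterogeneous mapping problem can be caricatured as follows. Everything here is invented for illustration: the kernel names, device names, and cost estimates. A real compiler must also account for data movement between devices, which is much of what makes the problem hard; choosing greedily per kernel is only a baseline.]

```python
# Hypothetical sketch: given per-(kernel, device) cost estimates from a
# performance model, assign each kernel to its cheapest device.
kernels = ["matmul", "fft", "parse"]
devices = ["cpu", "gpu", "accelerator"]

est_cost = {  # the performance model's predictions (made up)
    ("matmul", "cpu"): 90, ("matmul", "gpu"): 10, ("matmul", "accelerator"): 15,
    ("fft",    "cpu"): 40, ("fft",    "gpu"): 12, ("fft",    "accelerator"): 30,
    ("parse",  "cpu"):  5, ("parse",  "gpu"): 50, ("parse",  "accelerator"): 60,
}

# Greedy mapping: each kernel goes to the device the model predicts is fastest.
mapping = {k: min(devices, key=lambda d: est_cost[(k, d)]) for k in kernels}
print(mapping)
```

[Even in this toy form, the structure of the challenge is visible: the quality of the mapping is entirely bounded by the quality of the performance model feeding it.]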
130 00:22:53.720 --> 00:22:56.599 Michael Carbin: Now, this is becoming increasingly harder, 131 00:22:56.840 --> 00:23:03.159 Michael Carbin: partly because of what I talked about before, in the sense that each of these components, each of these 132 00:23:03.780 --> 00:23:21.969 Michael Carbin: chips, each of these machine components, may have a different behavioral model. Again, we're moving away from these traditional models of computation where I have a single input going into my processor and a single output coming out. Instead, I need to think about emerging models of computation 133 00:23:21.980 --> 00:23:25.589 Michael Carbin: where single inputs may now be turning into distributions of outputs, 134 00:23:25.600 --> 00:23:37.089 Michael Carbin: and individual devices may have very different distributions. And so now, if I really want to think about this emerging systems challenge, I need to broaden it to include this behavioral model. 135 00:23:37.420 --> 00:23:39.989 Michael Carbin: And so, given the complexity of these systems, 136 00:23:40.000 --> 00:23:44.599 Michael Carbin: and the fact that really the only effective way that we can measure 137 00:23:44.780 --> 00:23:52.269 Michael Carbin: the true behavior and performance of machines is often through observations, by running things and observing: this, in essence, 138 00:23:52.310 --> 00:24:03.419 Michael Carbin: has for a long time been viewed as a learning problem, and is increasingly being viewed as a learning problem, something where we're starting to see very effective techniques being applied.
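[The "performance modeling as learning" framing can be shown with a deliberately tiny stand-in: instead of a neural network over instruction sequences, this sketch fits per-opcode cycle costs by gradient descent from end-to-end measurements of whole blocks. The blocks, opcode names, and "measured" cycle counts are all fabricated, and real machines are far from this linear.]

```python
# Opcode counts per basic block, and a pretend end-to-end measurement
# of each block's cost in cycles (consistent with add=1, mul=2, load=5).
blocks = [
    {"add": 2, "mul": 1},
    {"add": 1, "load": 2},
    {"mul": 2, "load": 1},
]
measured = [4.0, 11.0, 9.0]

opcodes = ["add", "mul", "load"]
w = {op: 0.0 for op in opcodes}   # learned per-opcode cost, starts at zero

def predict(block):
    """Linear cost model: sum of learned opcode costs times their counts."""
    return sum(w[op] * n for op, n in block.items())

# Plain stochastic gradient descent on squared prediction error.
for _ in range(5000):
    for block, y in zip(blocks, measured):
        err = predict(block) - y
        for op, n in block.items():
            w[op] -= 0.01 * err * n

# The fitted costs now reproduce the observations, recovered purely
# from whole-block measurements rather than from a vendor manual.
print({op: round(w[op], 3) for op in opcodes})
```

[The point of the toy is the workflow, observe end-to-end, fit a model, predict unseen blocks, which is the same shape as the learned x86 basic-block models discussed next, just with a linear model in place of a learned sequence model.]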
139 00:24:04.780 --> 00:24:09.319 So what I want to talk about today is a little bit of the work we've done on the performance modeling side, 140 00:24:09.330 --> 00:24:16.910 Michael Carbin: using machine learning to build these performance models, right, and again with an eye towards concretizing exactly 141 00:24:16.920 --> 00:24:33.280 Michael Carbin: how these sorts of exotic software systems are starting to be developed, motivated by the fact that, well, learning may be one of the most effective ways to capture the underlying uncertainty, the exact behavior of the underlying machine. And therefore, if we incorporate learning, we can get a more effective 142 00:24:33.820 --> 00:24:35.090 Michael Carbin: performance model. 143 00:24:35.100 --> 00:24:46.790 Michael Carbin: So we already got a little sneak preview of this a few slides ago, and as I mentioned, this is critical for compilation, but also for other areas of performance engineering and debugging. But essentially what this is going to take is 144 00:24:46.910 --> 00:25:03.949 Michael Carbin: some piece of code, let's say x86 basic blocks (this is the domain in which we've actually validated some of our results), and we can throw it into a performance model, and it's going to give us a number, a cost. And here, in this case, what we are looking at is throughput. So the throughput 145 00:25:03.960 --> 00:25:10.669 Michael Carbin: of a sequence of instructions: essentially, think about taking this code and just unrolling it all the way down, 146 00:25:10.680 --> 00:25:15.519 Michael Carbin: right, and then processing it in the steady state of executing that. 147 00:25:15.530 --> 00:25:27.360 Michael Carbin: How many cycles does it take to execute an iteration of that basic block?
And again, thinking about the steady state, unrolled and averaged across perhaps an infinite number of iterations, 148 00:25:27.450 --> 00:25:34.280 Michael Carbin: and, you know, the lower this cost, the fewer cycles it takes, the better, or the faster I can execute that code. So 149 00:25:34.670 --> 00:25:52.910 Michael Carbin: now, this problem is hard. I alluded to why it's hard, and I want to give you some more evidence that it's hard by going through some examples here. The reason it's hard is that this is a very complicated device, a system, that we're trying to characterize. So if you look at a modern microprocessor, what you're going to see 150 00:25:52.920 --> 00:26:07.779 Michael Carbin: is a very sophisticated system, and if you're building a performance model, an accurate performance model, you need to figure out or come up with some effective model of many of the components of this system: instructions are coming in at the front end, 151 00:26:07.790 --> 00:26:27.289 Michael Carbin: they're being fetched, they're being decoded, they're being scheduled, they're being actually executed. There's also, of course, a whole memory subsystem that has to be dealt with. And if you in principle want to model this, you need to come up with essentially some model of the behavior of all these individual components, how they work, how they fit together, and of course the appropriate cost 152 00:26:27.300 --> 00:26:29.810 Michael Carbin: associated with each one of these. 153 00:26:29.820 --> 00:26:33.789 Michael Carbin: Now, what makes this more challenging is, while we have these sorts of cartoon diagrams, 154 00:26:33.800 --> 00:26:36.389 and there's some fidelity in these diagrams, 155 00:26:36.400 --> 00:26:41.989 Michael Carbin: the reality is that the true access that you're going to have to one of these machines is going to look like this, 156 00:26:42.000 --> 00:26:44.030 Michael Carbin: right?
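Backing up to the cost metric defined above: the steady-state throughput of a block is just cycles per iteration, averaged as the unrolling grows without bound. A tiny sketch (the machine model here is invented purely for illustration):

```python
# Toy illustration (not a real x86 model): steady-state block throughput
# as defined in the talk: unroll the block many times and take the
# average number of cycles per iteration.

def steady_state_throughput(cycles_for_n_iterations, n=1_000_000):
    """cycles_for_n_iterations: callable mapping an iteration count
    to total cycles on some (toy) machine model."""
    return cycles_for_n_iterations(n) / n

# Hypothetical machine: 2 startup cycles, then 1.5 cycles per iteration.
toy_machine = lambda n: 2 + 1.5 * n

print(steady_state_throughput(toy_machine))  # startup cost washes out
```

In the limit the fixed startup cost vanishes, which is exactly why the steady-state view is the right abstraction for a block executed in a hot loop.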
And that's because 157 00:26:44.050 --> 00:26:45.790 Michael Carbin: we don't have a full specification. 158 00:26:45.800 --> 00:26:47.470 We do have some help, though. 159 00:26:48.010 --> 00:27:03.280 Michael Carbin: So, the Intel manual. When I give this part of the talk in person, there's usually someone who sees the Intel manual slide and sort of puts their head down and starts weeping, because they've spent endless hours reading through this documentation trying to figure out what exactly their processor is doing. 160 00:27:03.380 --> 00:27:09.159 Michael Carbin: But it's a beast. It documents many, many aspects 161 00:27:09.170 --> 00:27:21.390 Michael Carbin: of the architecture: interaction specifications, optimization guidance, CPU parameters, things that can help you toward the end of building, if not a mental performance model, an actual physical performance model. 162 00:27:21.400 --> 00:27:25.790 Michael Carbin: And again, it's filled with things like these sorts of instruction tables. 163 00:27:25.800 --> 00:27:28.130 Michael Carbin: But again, this is not complete. 164 00:27:30.000 --> 00:27:31.020 Michael Carbin: It's. 165 00:27:31.030 --> 00:27:32.550 Michael Carbin: Now, if you actually look at 166 00:27:32.560 --> 00:27:48.090 Michael Carbin: these instruction tables and you go through them, what you start to see are footnotes like this. These instruction tables are telling you various aspects of the latency and throughput of instructions on a given machine. But if you start to dig in, you'll see comments like this: all tables in this appendix 167 00:27:48.150 --> 00:27:49.390 Michael Carbin: are estimates. 168 00:27:49.400 --> 00:28:02.919 Michael Carbin: Right? Actual performance of these instructions can range from somewhat faster to significantly faster. So what it means is that you're getting, again, a peek.
You're not getting the full picture, 169 00:28:02.930 --> 00:28:14.119 Michael Carbin: partly because this is a very hard problem. These are very complicated systems, and to Intel's credit, they're trying to do a very reasonable job of exposing some sort of specification that people can work with. But 170 00:28:14.360 --> 00:28:19.589 Michael Carbin: this abstraction, this single-instruction abstraction with these latencies and throughputs, is an 171 00:28:19.600 --> 00:28:24.650 Michael Carbin: incomplete abstraction, and it's very hard to actually deliver this abstraction. 172 00:28:25.610 --> 00:28:44.610 Michael Carbin: So what this means is, if someone's sitting down trying to build a performance model, oftentimes of course they'll be consulting these tables. But faced with the full complexity of this processor, what they'll do is start to make some abstractions, especially when you're thinking about compilation rather than other areas of research where you need higher fidelity. 173 00:28:44.620 --> 00:28:46.450 Michael Carbin: And so what they do is they abstract. 174 00:28:47.130 --> 00:28:56.830 Michael Carbin: They'll choose a smaller subset of these components, make some assumptions about their behaviors, and then set some parameters based on what they've found, and they're going to get this. 175 00:28:57.180 --> 00:29:07.529 Michael Carbin: And so now your modern performance model, at least as represented in something like your modern compiler, LLVM, is going to look something like this: a very simplified version of the reality that we have. 176 00:29:07.700 --> 00:29:11.470 Michael Carbin: Now, here's an example, just to see how hard this is. 177 00:29:11.610 --> 00:29:14.689 Michael Carbin: I'm giving you a test. No, I don't expect an answer. 178 00:29:14.700 --> 00:29:22.659 Michael Carbin: We're gonna walk through this one.
So here's the example, and that is: how many cycles does it take to compute one hundred iterations of the following instruction? 179 00:29:23.060 --> 00:29:38.189 Michael Carbin: Now, I don't expect you to understand this instruction, but we'll go to the Intel manual and decode it. So I go and look at the right page in the Intel manual, and what I'll find is: aha, this is actually a vector instruction, but what it's doing is essentially a bitwise logical XOR 180 00:29:38.250 --> 00:29:40.300 Michael Carbin: of these registers, 181 00:29:40.720 --> 00:29:49.190 Michael Carbin: so I can rewrite this, and now I get something that's maybe a little bit more interpretable to your average computer scientist: I've got some register R1, and I'm going to 182 00:29:49.200 --> 00:29:52.199 Michael Carbin: store into it the value of R1 XOR R1, 183 00:29:54.270 --> 00:30:07.640 Michael Carbin: and now I can start to figure out a performance model. I can go to another part of the Intel manual and look up the latency and the throughput for this instruction, again similar to those tables that we saw a few slides ago. 184 00:30:07.670 --> 00:30:20.399 Michael Carbin: So if I go and look here, what I find is that this instruction has a latency of one. What that means is it's going to take one cycle, the abstraction says one cycle, for the processor to execute this instruction. And then it has a throughput of 185 00:30:20.540 --> 00:30:25.279 Michael Carbin: 0.33, really one over three. And the way to think about this is: 186 00:30:25.290 --> 00:30:26.870 Michael Carbin: the processor 187 00:30:26.910 --> 00:30:35.190 Michael Carbin: can execute three concurrent instances of this instruction at the same time, provided there are no data dependencies between them.
So it can execute 188 00:30:35.220 --> 00:30:36.890 Michael Carbin: three at the same time, 189 00:30:37.060 --> 00:30:39.020 Michael Carbin: and these are the numbers from before. 190 00:30:39.280 --> 00:30:46.790 Michael Carbin: So now, with this information, we can take a first crack at figuring out: how many cycles will it take to compute one hundred iterations 191 00:30:46.810 --> 00:30:48.230 Michael Carbin: of this instruction? 192 00:30:49.490 --> 00:30:51.030 Michael Carbin: It's! 193 00:30:57.130 --> 00:31:00.829 Michael Carbin: There are a wide variety of different answers when I get to do this in person, 194 00:31:00.840 --> 00:31:04.339 Michael Carbin: some, all of them, actually pretty good, because this is a weird problem. 195 00:31:05.080 --> 00:31:18.180 Michael Carbin: The first one, and a very reasonable one here, is actually the answer that you would get from LLVM's performance model as of several years ago, when we were doing this work. (After they saw our work, they changed the performance model.) But 196 00:31:18.410 --> 00:31:24.980 Michael Carbin: this is the answer they'd give: one hundred cycles. And the reason why you would think that is because, well, I've got this instruction here, 197 00:31:25.210 --> 00:31:30.390 Michael Carbin: it takes one cycle, and moreover, this instruction reads from R1 and 198 00:31:30.400 --> 00:31:35.190 Michael Carbin: then writes to R1. Right? I can't begin executing my next instruction 199 00:31:35.930 --> 00:31:43.430 Michael Carbin: unless I've completed executing the previous instruction, if they're reading and writing from that same data location. 200 00:31:43.440 --> 00:31:51.290 Michael Carbin: So this is a very reasonable thing to start with, just by looking at the data dependencies. Right? So R1 is going to flow into R1.
So 201 00:31:51.410 --> 00:31:57.849 Michael Carbin: of course this is going to take one hundred cycles, because each instruction takes one cycle to execute, and there's a data dependence between them. 202 00:32:00.540 --> 00:32:19.970 Michael Carbin: Now, there's a trick here. If you look at this a little bit more carefully, and perhaps you're familiar with your bit twiddling and bit vector arithmetic, what you'll see is that, well, this isn't really this instruction. This is really this instruction: I'm setting R1 to zero. 203 00:32:19.980 --> 00:32:20.989 Michael Carbin: You're like, okay, 204 00:32:21.000 --> 00:32:32.489 Michael Carbin: that's interesting. This is a cool optimization; I feel like my compiler should generate that. Well, not only might a compiler try to generate that, your processor will actually dynamically identify this and 205 00:32:32.500 --> 00:32:37.489 Michael Carbin: rewrite the instructions. So if you go and actually poke around through the Intel manual, 206 00:32:37.500 --> 00:32:53.210 Michael Carbin: you'll find that there are zero idioms. The processor itself has been manually optimized to find these sorts of zero idioms, and if you go and find this list, you'll see our instruction, the mnemonic that we were looking at before (we rewrote it to be simpler), 207 00:32:53.220 --> 00:33:04.540 Michael Carbin: and what we'll find is the processor finds all instances of these zero idioms, and they get removed; they have no execution. And so, great, suddenly our latency goes to zero, 208 00:33:04.890 --> 00:33:11.890 Michael Carbin: and now we can instead start thinking about throughput. There are no data dependences here, and throughput here said we can execute three at a time. 209 00:33:12.250 --> 00:33:15.999 Michael Carbin: So what that means is, now, well, maybe this is going to be thirty-three cycles.
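The two estimates so far can be reproduced with simple arithmetic. This is just a sanity-check sketch; renaming and reservation-station effects are ignored:

```python
# Back-of-the-envelope check of the two estimates for 100 iterations of
# the single zero-idiom instruction (xor r1, r1).

N = 100

# Estimate 1: a dependency chain. With latency 1, each iteration waits
# for the previous one, so 100 iterations take 100 cycles.
latency_bound = N * 1

# Estimate 2: once the processor recognizes the zero idiom, there is no
# dependence, so only the reciprocal throughput of 1/3 (three copies
# executing per cycle) limits us.
throughput_bound = N / 3   # about 33 cycles

print(latency_bound, round(throughput_bound))
```

Both answers follow mechanically from the instruction tables; as the talk goes on to show, neither matches what the machine actually does.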
210 00:33:16.770 --> 00:33:17.990 Michael Carbin: That's a bit better. 211 00:33:19.050 --> 00:33:20.060 Michael Carbin: But 212 00:33:20.210 --> 00:33:38.269 Michael Carbin: you do some more poking, and you actually go and measure this, and you end up with twenty-five cycles. Or, if you use Intel's proprietary performance analysis tool, IACA, which they had released at the time, you also see twenty-five cycles. And then you start to wonder, what's going on here? 213 00:33:38.280 --> 00:33:53.759 Michael Carbin: And in fact, what eventually ends up happening is, once you go through this renaming process, you're no longer even bottlenecked by these instruction specifications. Instead, you become bottlenecked by the front end, which can actually do four instructions per cycle. And so you get an entirely different 214 00:33:53.790 --> 00:33:55.730 Michael Carbin: answer, ultimately. 215 00:33:55.920 --> 00:33:58.120 Michael Carbin: So this is hard. 216 00:33:58.650 --> 00:34:00.539 Michael Carbin: This is hard, because 217 00:34:00.940 --> 00:34:04.000 Michael Carbin: how do we go from what Intel gave us, 218 00:34:04.250 --> 00:34:17.889 Michael Carbin: this very reasonable instruction specification which, you know, an actual production compiler was using, to the actual measured performance, or the proprietary performance model where Intel has worked in all the optimizations and smarts? 219 00:34:17.900 --> 00:34:22.680 Michael Carbin: How can we cobble together all this documentation, our understanding, and measurements to get there? 220 00:34:22.730 --> 00:34:28.480 Michael Carbin: Well, in practice, what people do is this: they will build their simplified performance model, 221 00:34:28.630 --> 00:34:34.599 Michael Carbin: they will stack that against their actual processor, they'll get a data set of examples,
222 00:34:34.850 --> 00:34:36.720 Michael Carbin: they'll pass that through both, 223 00:34:36.929 --> 00:34:39.950 Michael Carbin: compare the difference, and see where they're wrong. 224 00:34:41.560 --> 00:34:47.969 Michael Carbin: This, by and large, if you go and look through the development process of many of these, certainly open-source, 225 00:34:48.230 --> 00:34:51.789 Michael Carbin: performance models, is what you'll see: this iterative refinement process. 226 00:34:51.800 --> 00:34:59.349 Michael Carbin: And so, when we were doing this research at the time, we looked at this and said: this looks an awful lot like machine learning. 227 00:34:59.960 --> 00:35:11.500 Michael Carbin: I've got a data set, I've got labels, there's some sort of model I'm trying to tune, I can look at the difference, and I'm refining my model. So why don't we just go ahead and replace that with a neural network? Let's just see what happens. 228 00:35:11.510 --> 00:35:30.189 Michael Carbin: And so in some work called Ithemal, this is exactly what we did. We came up with the neural network, we built a data set, a very extensive data set at the time, that had some specific invariants designed to capture the invariants that we assume for these performance models: things like no cache misses, no preemptions, 229 00:35:30.200 --> 00:35:38.600 Michael Carbin: a very simplified setting, not the full monty. But this is, in fact, how systems like LLVM's MCA actually model things themselves. 230 00:35:38.660 --> 00:35:41.259 Michael Carbin: And so we got a comprehensive data set. 231 00:35:41.270 --> 00:35:52.160 Michael Carbin: And so we went and ran this, and we found that using this technique, just modern neural networks, we could build a system that had half the error of LLVM MCA and IACA at that time. 232 00:35:52.170 --> 00:36:05.229 Michael Carbin: And so this was great. This was an amazing success.
I think it was one of the first papers really out there, at least in the modern era, to demonstrate that we could scale up this type of deep learning to actually target this type of problem. 233 00:36:05.340 --> 00:36:18.979 Michael Carbin: And so, Ithemal. There are some details to its architecture. I don't really want to focus on those too much. There are some novelties, but I think the most important part here is really that it's entirely a learned representation. 234 00:36:18.990 --> 00:36:31.099 Michael Carbin: So, unlike previous work on incorporating machine learning into these types of compiler-engineering situations, what Ithemal does is it takes text, tokenized text, and predicts a number. 235 00:36:31.630 --> 00:36:36.790 Michael Carbin: There's no featurization of the code. We don't need to extract data dependences, all these sorts of things. 236 00:36:36.800 --> 00:36:43.359 Michael Carbin: Nothing. Right? Learned representations have been the interest and flair of, let's say, the past 237 00:36:43.510 --> 00:36:55.340 Michael Carbin: ten years of modern deep learning: that we can move away from the explicit featurization of our data and have these incredibly capable systems come in and learn purely from inputs, or text, 238 00:36:55.390 --> 00:36:57.380 Michael Carbin: to get a very effective result. 239 00:36:58.160 --> 00:37:00.609 Michael Carbin: And so, if you come back to 240 00:37:00.750 --> 00:37:17.989 Michael Carbin: this example that we have, Ithemal ends up getting pretty close to the measured version. So it actually is able to figure out this optimization that is happening behind the scenes in the processor, purely from data. 241 00:37:19.380 --> 00:37:24.370 Michael Carbin: Now, so Ithemal, this has been great. It's taken our standard methodology, and we 242 00:37:25.180 --> 00:37:27.149 Michael Carbin: replaced it with a neural network.
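The "tokenized text in, number out" point can be made concrete with a sketch. This is not Ithemal's actual tokenizer, just an illustration of the interface: a basic block is split into opcode and operand tokens, and a learned model (Ithemal uses a hierarchical recurrent network over such tokens) embeds them and regresses a single throughput value, with no hand-built features in between.

```python
import re

def tokenize_block(asm: str):
    """Illustrative tokenizer: split an x86 basic block (one instruction
    per line) into opcode and operand tokens. A learned model would
    embed these tokens and predict a throughput number directly."""
    tokens = []
    for line in asm.strip().splitlines():
        opcode, _, operands = line.strip().partition(" ")
        tokens.append(opcode)
        tokens.extend(t for t in re.split(r"[,\s]+", operands) if t)
        tokens.append("<end>")  # instruction boundary marker
    return tokens

print(tokenize_block("xor rax, rax\nadd rbx, rax"))
```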
243 00:37:27.160 --> 00:37:27.950 But, 244 00:37:28.150 --> 00:37:30.189 Michael Carbin: not all is perfect. 245 00:37:30.200 --> 00:37:46.519 Michael Carbin: Right? Because one benefit of your traditional performance model is that you can, in principle, understand what's happening. These performance models, oftentimes when they're written, have a trace, some sort of explanation. There's a simulation 246 00:37:46.530 --> 00:38:02.269 Michael Carbin: that you can read off that models individual stages of the processor, that gives you a hypothesis on how the system believes the underlying processor is actually behaving. And so this can be very helpful for things like performance tuning, if I'm really trying to figure out how to optimize my system, change my code, 247 00:38:02.400 --> 00:38:06.080 Michael Carbin: based on the observations that I'm seeing from the simulation. 248 00:38:06.540 --> 00:38:11.589 Michael Carbin: So this loss of interpretability motivated us to think a little bit differently here, 249 00:38:11.600 --> 00:38:20.020 Michael Carbin: once we had Ithemal, which was: can we maybe try to combine the best of both worlds? Where, instead of entirely throwing out 250 00:38:20.710 --> 00:38:25.039 Michael Carbin: our analytical model and replacing it with the neural network, maybe what we can do is 251 00:38:25.340 --> 00:38:29.249 Michael Carbin: combine both in more of a neuro-symbolic approach. 252 00:38:30.360 --> 00:38:45.589 Michael Carbin: And so in the next line of work, called DiffTune, we did exactly that. The way that we started out is, we said: okay, let's look at this llvm-mca simulator that we have, which had a very detailed simulation pipeline, 253 00:38:45.600 --> 00:38:47.419 Michael Carbin: but unfortunately 254 00:38:47.580 --> 00:38:51.010 Michael Carbin: had some wrong intuitions on what was actually going on. 255 00:38:51.460 --> 00:39:10.119 Michael Carbin: In particular,
It's parametrized. It's parametrized by these sorts of numbers that we're seeing down here at the bottom, and it has its own hypothesis of what these numbers are. And we said: could we use machine learning to automatically figure out better sets of parameters to then plug into llvm-mca, to therefore get better accuracy, but still 256 00:39:10.460 --> 00:39:14.829 Michael Carbin: be able to interpret, essentially, a trace of the behavior of the underlying system? 257 00:39:16.070 --> 00:39:26.009 Michael Carbin: So we dug into llvm-mca, and, like I said, it is a fairly detailed simulation. Again, this isn't going to be, you know, architecture-research-grade, 258 00:39:26.020 --> 00:39:36.899 Michael Carbin: cycle-accurate simulation, but it's reasonable enough that it's used inside of a compiler to validate various scheduling-based optimizations going on inside of it. 259 00:39:37.710 --> 00:39:46.390 Michael Carbin: Now, llvm-mca is not small. It's not super massive, but not small: it's about ten thousand lines of C++. 260 00:39:46.400 --> 00:39:52.089 Michael Carbin: But where it does become big is, it has about eleven thousand integer-valued parameters 261 00:39:52.100 --> 00:39:54.440 Michael Carbin: that determine its behavior. 262 00:39:54.990 --> 00:40:06.419 Michael Carbin: And so this is the challenge. These parameters come from plugging in the pieces of the model that it has. Right? So there's the dispatcher, 263 00:40:06.590 --> 00:40:18.119 Michael Carbin: which essentially is a very simplified model of a front end: instructions coming in, at what rate do they get decoded and dispatched up to the rest of the processing pipeline, with a single parameter: 264 00:40:18.420 --> 00:40:21.340 Michael Carbin: how many instructions can be dispatched per cycle? 265 00:40:22.540 --> 00:40:28.460 Michael Carbin: There's also a reorder buffer, right?
So it's also modeling out-of-order execution, and 266 00:40:28.640 --> 00:40:32.590 Michael Carbin: the size of the reorder buffer. Right? There's some number of instructions that it can hold. 267 00:40:32.680 --> 00:40:49.830 Michael Carbin: But where things start to get more complicated is that it starts to have per-instruction parameters, such as these: latency, and read-advance cycles, which is, if I have an instruction and, for example, it has a write, how soon can that 268 00:40:49.840 --> 00:40:56.550 Michael Carbin: computation be made available to a subsequent instruction before it actually needs to be fully retired? 269 00:40:56.690 --> 00:41:13.620 Michael Carbin: And as well, there are execution units with things like port maps. So I don't expect everyone to understand all the details in here, but the real primary point is that for every single one of these individual components we have some number of parameters, some of them global, some of them very specific to the individual instructions that we have, 270 00:41:13.700 --> 00:41:18.079 Michael Carbin: and all told, together we end up with eleven thousand parameters, 271 00:41:18.090 --> 00:41:19.080 Michael Carbin: where 272 00:41:19.110 --> 00:41:28.689 Michael Carbin: the llvm-mca developers, unfortunately, the way they set these is how we've talked about: spending a lot of time looking at design documents, and also doing some measurement and refining. 273 00:41:29.200 --> 00:41:31.240 Michael Carbin: So with our system, DiffTune, 274 00:41:31.310 --> 00:41:37.379 Michael Carbin: when we wanted to apply learning, we thought about this really as an optimization problem, 275 00:41:37.660 --> 00:41:40.789 Michael Carbin: where the way we can think about this is: 276 00:41:40.800 --> 00:41:51.240 Michael Carbin: I have some simulator. So f is my simulator here. It has some set of parameters, theta, and on a given input x
it's going to produce 277 00:41:51.450 --> 00:41:56.849 Michael Carbin: a prediction. Right? Some model of the performance of that particular input, meaning a basic block: 278 00:41:57.100 --> 00:41:58.990 Michael Carbin: how many cycles it's going to take. 279 00:41:59.340 --> 00:42:04.089 Michael Carbin: Now, the optimization problem is: well, if I have a data set, 280 00:42:04.100 --> 00:42:16.789 Michael Carbin: can I essentially minimize, or find, the best set of parameters that minimize the discrepancy between this prediction that I have and the true measured ground-truth value that I have for the data set? 281 00:42:16.800 --> 00:42:18.380 So nothing fancy here. 282 00:42:18.390 --> 00:42:24.850 Michael Carbin: Right? This is just an optimization-problem specification, thinking about how we can 283 00:42:25.110 --> 00:42:35.589 Michael Carbin: optimize the simulator, where it's indexed, again, by these eleven thousand parameters, which together give us an absolutely massive number of possible configurations. 284 00:42:35.600 --> 00:42:54.159 Michael Carbin: So this problem, as posed, is unfortunately horribly intractable using search, or stochastic search, as we validated by using existing tools and techniques available in the community, such as OpenTuner. These existing tools simply do not scale 285 00:42:54.170 --> 00:42:55.870 Michael Carbin: to this type of problem. 286 00:42:56.350 --> 00:43:01.739 Michael Carbin: So what we said is: perhaps we can build a surrogate optimization problem. 287 00:43:01.900 --> 00:43:08.090 Michael Carbin: The key thing here is that this is exactly the same optimization problem, except we have an f-hat here now, 288 00:43:08.260 --> 00:43:11.250 Michael Carbin: where the idea is, f-hat is going to be a surrogate. 289 00:43:11.720 --> 00:43:16.879 Michael Carbin: It's going to be a neural network that approximates the behavior of the simulator.
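To make the problem statement concrete, here is a toy restatement in Python. The parameter names, latency numbers, and data set are invented for illustration, not llvm-mca's real tables: the point is only the shape of the objective, minimizing the total discrepancy between simulated and measured throughput, and how fast even a tiny configuration space explodes.

```python
# Toy restatement of the tuning objective: find parameters theta that
# minimize total error between simulated and measured block throughput.

def simulate(theta, block):
    """Stand-in simulator f(theta; x): a latency sum bounded below by a
    dispatch-width limit. The real llvm-mca is ~10k lines of C++."""
    cycles = sum(theta["latency"][op] for op in block)
    return max(cycles, len(block) / theta["dispatch_width"])

def objective(theta, dataset):
    return sum(abs(simulate(theta, x) - y) for x, y in dataset)

# (block, measured cycles) pairs; entirely made up.
dataset = [(["xor", "add"], 2.0), (["add", "add", "mul"], 5.0)]

# Exhaustive search over even this tiny space is already 3 * 4 * 4 = 48
# configurations; with ~11,000 integer parameters it is hopeless.
best = min(
    ({"dispatch_width": w, "latency": {"xor": lx, "add": la, "mul": 3}}
     for w in (2, 4, 6) for lx in range(4) for la in range(4)),
    key=lambda th: objective(th, dataset),
)
print(objective(best, dataset))
```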
290 00:43:16.960 --> 00:43:26.320 Michael Carbin: Now, conditioned on a set of parameters. Right? So, given some set of parameters theta, this surrogate is going to intuit what our simulator would do 291 00:43:26.490 --> 00:43:29.049 Michael Carbin: given those parameters theta. 292 00:43:29.210 --> 00:43:39.290 Michael Carbin: And the reason why we went with this approach is that it's going to allow us to do gradient-based optimization. What we can do is now take gradients with respect to these parameters 293 00:43:39.840 --> 00:43:45.900 Michael Carbin: and use standard gradient descent to try to figure out if we can find effective parameters that are going to minimize 294 00:43:46.030 --> 00:43:48.070 Michael Carbin: this problem that we have. 295 00:43:48.470 --> 00:43:52.690 Michael Carbin: And just to put this pictorially, what we're trying to do here, 296 00:43:52.700 --> 00:43:55.520 Michael Carbin: this is sort of a visualization of 297 00:43:55.650 --> 00:44:06.130 Michael Carbin: an instruction that we're going to have inside of our architecture, where there's a parameter that we can control, particularly the dispatch width. 298 00:44:06.660 --> 00:44:19.930 Michael Carbin: Right? And for this parameter of llvm-mca, we can look at the timing, or the latency, or the throughput of this particular instruction as a function of this parameter, 299 00:44:21.060 --> 00:44:29.899 Michael Carbin: and so what we can see is, what llvm-mca is going to do is, for each individual discrete point of this parameter, it will have a prediction. 300 00:44:30.170 --> 00:44:41.929 Michael Carbin: Now, again, the problem is, this is essentially a discrete optimization, and even using zeroth-order methods, we were unable to find effective ways to navigate this space.
But if we build a surrogate, 301 00:44:42.340 --> 00:44:50.580 Michael Carbin: what the surrogate can do is give us a smooth interpolation, once trained, between all of these points, 302 00:44:50.590 --> 00:45:05.989 Michael Carbin: and so now what it allows us to do is try to find and navigate through this curve to identify the exact timing, where this is the ground-truth timing that we're going to have for this instruction, and this is going to intersect somewhere between here and here, 303 00:45:06.000 --> 00:45:12.430 Michael Carbin: and this is going to give us some understanding of what this parameter should be, which we can now find via gradient descent. 304 00:45:16.360 --> 00:45:18.419 Michael Carbin: So this is our surrogate optimization problem. 305 00:45:18.540 --> 00:45:24.719 We're gonna have this f-hat instead of our traditional simulator, our discrete simulator, which is otherwise 306 00:45:24.730 --> 00:45:42.799 Michael Carbin: not differentiable. This is a highly discrete computation that has not only discrete integers that it's operating on, but very many discrete data structures that it's using to organize the simulation. So this neural network gives us a differentiable surrogate of this otherwise discrete computation. 307 00:45:43.670 --> 00:46:02.140 Michael Carbin: Now, where does the surrogate come from? Well, what we did is we used Ithemal. So we used our previous approach, Ithemal, changed ever so slightly so that it can be conditioned on a set of parameters. We used a variant of the Ithemal architecture to mimic the behavior of the simulator. So now this introduces a surrogate learning problem. 308 00:46:02.150 --> 00:46:04.580 Michael Carbin: Our goal here is now to find f-hat, 309 00:46:04.770 --> 00:46:13.389 Michael Carbin: again over this data set that we're going to have.
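The whole pipeline just described (discrete simulator, smooth surrogate, gradient descent on the parameters) can be caricatured with one parameter instead of eleven thousand. Everything here is invented for illustration: the "simulator" is a ceiling division, and the "surrogate" is a closed-form curve standing in for the trained, parameter-conditioned network that DiffTune actually fits.

```python
# One-parameter caricature of the surrogate-optimization idea.

def simulator(width):
    """Discrete 'simulator': predicted cycles for a 16-instruction block
    as a function of integer dispatch width (hypothetical model)."""
    return -(-16 // int(width))   # ceil(16 / width)

# 1) Sample the simulator at its discrete settings (the surrogate's
#    training data; we skip the actual fitting step here).
data = [(v, simulator(v)) for v in range(1, 9)]

# 2) A smooth surrogate interpolating the same trend. DiffTune trains a
#    neural network for this; we cheat with a closed form.
surrogate = lambda w: 16.0 / w
assert all(abs(surrogate(v) - c) < 1 for v, c in data)  # tracks the simulator

# 3) Gradient descent on the surrogate toward a measured ground truth.
measured_cycles = 4.0
w, lr = 2.0, 0.01
for _ in range(500):
    grad = 2 * (surrogate(w) - measured_cycles) * (-16.0 / w**2)
    w -= lr * grad

print(round(w))  # converges to the true dispatch width, 4
```

The point of the sketch: the stair-step simulator exposes no gradient at all, while the smooth surrogate lets plain gradient descent walk directly to a parameter value consistent with the measurement.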
This is a data set we're using for learning, slightly different: enumerating over this data set, over the 310 00:46:13.400 --> 00:46:18.870 Michael Carbin: parameters and also the examples, what we want to do is minimize the difference between our surrogate here 311 00:46:19.770 --> 00:46:21.219 Michael Carbin: and our actual simulator. 312 00:46:21.280 --> 00:46:25.050 Michael Carbin: So we are training our surrogate to exactly mimic, 313 00:46:25.090 --> 00:46:31.889 Michael Carbin: the best that it can, the behavior of our simulator over a wide variety of different parameter settings. 314 00:46:32.090 --> 00:46:33.709 Michael Carbin: So the results here 315 00:46:34.210 --> 00:46:54.139 Michael Carbin: are actually quite great. So, compared to OpenTuner, again, essentially the state of the art at that time for zeroth-order methods (it uses a wide portfolio of random search techniques and zeroth-order methods to try to find solutions to tuning problems for systems), 316 00:46:54.320 --> 00:47:12.129 Michael Carbin: on the left here what we've got is the mean absolute percentage error, so lower is going to be better, and this is validated on various different microarchitectures from Intel and also AMD. What we see is that OpenTuner is by far unsuccessful on this problem. 317 00:47:12.140 --> 00:47:31.469 Michael Carbin: Our experts, these were the default expert parameters that had been provided for us, and had been tuned manually over the years through this open-source effort. And then, finally, we had DiffTune, which came in, in essentially all instances, with either competitive or better accuracy, and this is, again, learned entirely from data. 318 00:47:33.400 --> 00:47:35.689 Michael Carbin: So let's go back to our example again. 319 00:47:35.700 --> 00:47:47.070 Michael Carbin: Now, LLVM: where we started was at one hundred.
We had Ithemal, which did a great job of getting much closer to the true measured performance, and even to the expertly modeled, proprietary performance, 320 00:47:47.080 --> 00:47:55.009 Michael Carbin: and then DiffTune comes in at twenty-seven. So it's actually able to come in and get very close to the measurement. 321 00:47:55.020 --> 00:48:09.420 Michael Carbin: But, again, we're recovering the interpretability now of LLVM, in the sense that we can get a full trace of what we anticipate, or what llvm-mca hallucinates, or believes, is the underlying execution behavior of a piece of code on our microarchitecture. 322 00:48:10.630 --> 00:48:12.040 Michael Carbin: So now, 323 00:48:12.900 --> 00:48:20.989 Michael Carbin: Ithemal, DiffTune, and even efforts that we've had continuing on from there, have been tackling this performance-modeling problem with learning. 324 00:48:21.000 --> 00:48:38.510 Michael Carbin: We've had some work in behavioral modeling coming from a variety of different directions, oftentimes from our approximate computing work. I'm not going to talk about that here. But we've also made efforts in the past several years to think differently about even learning the compiler: the compiler itself can now take 325 00:48:38.520 --> 00:48:47.389 Michael Carbin: these learned performance models, these learned behavioral models, and really, now, can we think about automatically generating a compiler, or automatically generating a back end? 326 00:48:47.400 --> 00:48:51.920 Michael Carbin: Can we do all of this really from first principles, from data, 327 00:48:52.070 --> 00:49:10.580 Michael Carbin: without necessarily the manual process that we normally have? And there's been a very vibrant community standing up doing work in this space right now, but there is, by and large, a grand challenge here of automatically synthesizing a compiler from data,
And, of course, a hardware back-end, 328 00:49:11.230 --> 00:49:15.090 Michael Carbin: now. But what I want to highlight here again is the process that we went through. 329 00:49:15.100 --> 00:49:18.809 Michael Carbin: So we had this standard development methodology for performance modeling. 330 00:49:18.990 --> 00:49:37.070 Michael Carbin: We then realized that what we're trying to do is a modeling problem, right? In particular, we're trying to model uncertainty: we have some fundamental uncertainty about this hardware substrate, and neural networks, in the last decade or so, have emerged as a very effective technique for modeling uncertainty. 331 00:49:37.170 --> 00:49:39.469 Michael Carbin: So that's how we moved to Ithemal. 332 00:49:40.090 --> 00:49:54.359 Michael Carbin: Now, of course, with Ithemal we lacked the interpretability of understanding our system. While it may be great in performance, in terms of how well the model matches reality, 333 00:49:54.370 --> 00:50:02.579 Michael Carbin: it's unfortunately not necessarily usable for all circumstances, because in many circumstances we do want to understand what our system is doing. 334 00:50:02.740 --> 00:50:06.859 Michael Carbin: And so with that we moved to DiffTune, which is now this neural-symbolic approach. 335 00:50:07.940 --> 00:50:21.659 Michael Carbin: Now, of course, I imagine you've been paying attention to ChatGPT and everything that's been going on in the world for the past few months, and you can see now this trend of what we have with software. 336 00:50:21.670 --> 00:50:30.040 Michael Carbin: Again, where we were before: neural networks, then neural-symbolic combinations of neural networks and traditionally written components.
And now, 337 00:50:30.380 --> 00:50:48.390 Michael Carbin: really a future of software that may even include large language models: instances where our program is now going to be half code, half trained neural networks. Or not half: a third code, a third trained neural networks, and maybe another third of just English, right? You're really starting to see people 338 00:50:48.400 --> 00:50:57.120 Michael Carbin: report these very strange experiences of, yeah, my program is about half code and half English that prompts a large language model. 339 00:50:57.240 --> 00:51:00.389 Michael Carbin: And this is a very weird future, right? 340 00:51:00.400 --> 00:51:08.289 Michael Carbin: But I personally think this future wasn't unanticipated; this is where we've been moving towards for some number of years. 341 00:51:08.300 --> 00:51:15.410 Michael Carbin: Again, as I've said before, uncertainty has been creeping in downwards from our environment, 342 00:51:15.730 --> 00:51:35.330 Michael Carbin: thinking about how we're deploying systems in all these new places where we're trying to get them to model uncertainty, and machine learning has finally emerged as an effective way to model uncertainty; really, end-to-end, large-scale differentiable systems have been, you know, a huge way of thinking about how to address many of these problems, 343 00:51:36.090 --> 00:51:39.929 Michael Carbin: and we're seeing that as well with things like Ithemal and DiffTune. 344 00:51:41.010 --> 00:51:45.589 Michael Carbin: And coming up from below, as I've talked about, right, this is the other case where we're seeing uncertainty, 345 00:51:45.600 --> 00:52:05.379 Michael Carbin: and with DiffTune, yeah, with Ithemal and DiffTune, we are modeling this uncertainty in the hardware system, as I've talked about, using these techniques, where, of course, this hardware system is our environment for this performance-modeling system in particular.
But even moving forward, you know, as I talked about before with the approximate computing example, 346 00:52:05.390 --> 00:52:19.439 Michael Carbin: our hardware systems are just going to be increasingly less precise, right? And so here we saw that with, again, this Jacobi iterative method, where we saw, well, maybe if we recognize that we're actually 347 00:52:19.500 --> 00:52:31.070 Michael Carbin: fundamentally operating on an uncertain computing substrate, and perhaps architect our software with that in mind, we can get performance benefits versus a replication approach. 348 00:52:31.740 --> 00:52:43.360 Michael Carbin: Right? So for me, when I see a diagram where I'm being pushed from above and being pushed from below, with uncertainty coming in from all these directions, it's meant to me that it was inevitable that we're going to start thinking about 349 00:52:43.370 --> 00:53:00.590 Michael Carbin: how we apply these sorts of neural-symbolic systems, systems that themselves are starting to use uncertain components, things like large language models, to really tackle the fact that modeling this type of uncertainty in our modern software, or where we're deploying our modern software systems, is 350 00:53:00.600 --> 00:53:05.150 Michael Carbin: really a first-order concern. And, as we're seeing, it just can no longer be ignored. 351 00:53:06.720 --> 00:53:26.210 Michael Carbin: And I think, personally, there's a wide variety of different interesting research directions to take that we've been discovering, and that many people in the community have been working on as well, and I hope people are inspired, perhaps, today to go and think about some of these problems too. But just as a simple preview of some of the things that we've been doing: 352 00:53:26.220 --> 00:53:32.720 Michael Carbin: for example, we've been thinking about, what is a programming methodology
now? We think of methodologies like, you know, 353 00:53:33.550 --> 00:53:35.689 Michael Carbin: declarative programming, or functional programming, 354 00:53:35.700 --> 00:53:37.389 or object-oriented programming. 355 00:53:37.400 --> 00:53:54.039 Michael Carbin: Like, what are the types of programming methodologies that make sense in this space? And one thing that we've had in the previous few years, inspired by what we've done with DiffTune, is actually surrogate programming, which is starting to outline techniques and tactics for thinking about building these types of surrogates, such as what we saw with DiffTune. 356 00:53:54.800 --> 00:54:02.420 Michael Carbin: So, for example, the surrogate, just as we had talked about: the surrogate is going to be a neural network that's going to model the I/O behavior of some program. 357 00:54:03.340 --> 00:54:21.170 Michael Carbin: And what we've wanted to work towards is: can we start thinking of a theory of learning surrogates, and maybe associated program analyses, that allow us to relate the program that we're trying to surrogatize, for some definition, to the problem that we're trying to solve, or to the model that we are learning? 358 00:54:22.150 --> 00:54:29.970 Michael Carbin: We've made initial progress towards this theory of learning surrogates. So, you know, programs are just functions. 359 00:54:30.180 --> 00:54:40.689 Michael Carbin: There's nothing super special about that. And if you actually comb through the machine learning literature, you'll find, for example, sample complexity bounds on 360 00:54:40.700 --> 00:54:55.459 Michael Carbin: a neural network's ability to learn classes of functions. So, correspondingly:
There could be opportunities here to think about which classes of programs neural networks can learn effectively, if we're thinking about using surrogates as a primary programming tool. 361 00:54:55.950 --> 00:55:02.390 Michael Carbin: So, something that we've done over the past year or so has been using the intensional representation of the program, since we 362 00:55:02.400 --> 00:55:03.789 can analyze a program, 363 00:55:03.800 --> 00:55:06.879 Michael Carbin: and then using that to provide better learning bounds. 364 00:55:06.940 --> 00:55:11.389 Michael Carbin: So, as a concretization here: if I want to learn a surrogate of this type of program, 365 00:55:11.400 --> 00:55:22.399 Michael Carbin: where it has a branch in it (like, there's a large program; inside, if there's a branch, I can go left or I can go right, left to F or right to G), what I want to do is, again, learn a surrogate here. 366 00:55:22.410 --> 00:55:26.389 Michael Carbin: But the question now is: to learn a surrogate, I need data. 367 00:55:26.400 --> 00:55:45.789 Michael Carbin: What data should I generate here? Clearly, I need to generate data that's going to exercise both sides of this branch. Should I generate it uniformly? Should I weight it? Is there some sort of input data distribution on which I'm going to deploy it? There are all sorts of strategies here, but this is going to be a fundamental question if you're trying to build a surrogate for a system. 368 00:55:45.820 --> 00:55:47.939 Michael Carbin: So what we found is 369 00:55:48.260 --> 00:55:57.419 Michael Carbin: that what we can do is actually use program analysis. We can use program analysis to reason about the complexity of the functions 370 00:55:57.770 --> 00:56:17.599 Michael Carbin: below this branch: so, for example, a nice linear function over here versus something much more wiggly, which is going to have higher frequency.
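As a toy version of that branch example (illustrative only: the linear `f`, the wiggly `g`, and the second-difference "analysis" are stand-ins, not the actual program analysis from the talk), one can estimate each branch's roughness and bias a fixed sampling budget toward the more complex side:

```python
import numpy as np

def f(x):          # left branch: nice linear function
    return 2.0 * x

def g(x):          # right branch: much more "wiggly", higher-frequency
    return np.sin(20.0 * x)

def prog(x):
    return np.where(x < 0.0, f(x), g(x))

def roughness(fn, lo, hi, probes=64):
    # Crude stand-in for a program analysis: probe the branch and take
    # the mean absolute second difference as a curvature/frequency proxy.
    xs = np.linspace(lo, hi, probes)
    return np.abs(np.diff(fn(xs), n=2)).mean()

budget = 16
r_left, r_right = roughness(f, -1, 0), roughness(g, 0, 1)
# Bias examples toward the complex part of the program (>= 2 per side).
n_right = max(2, min(budget - 2, round(budget * r_right / (r_left + r_right))))
n_left = budget - n_right

def interp_error(n_left, n_right):
    # Build a piecewise-linear surrogate from the sampled points and
    # measure its worst-case error on a dense grid.
    xs = np.concatenate([np.linspace(-1, 0, n_left), np.linspace(0, 1, n_right)])
    ys = prog(xs)
    grid = np.linspace(-1, 1, 2001)
    return np.abs(np.interp(grid, xs, ys) - prog(grid)).max()

uniform_err = interp_error(8, 8)
guided_err = interp_error(n_left, n_right)
```

On this example, the guided allocation spends most of its budget on the high-frequency branch and yields a lower worst-case surrogate error than uniform sampling, for the same number of samples.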
What you'll find, based on our analysis (our analysis will tell you, when you generate inputs), is that what you want to do is bias your examples more towards the complex part of the program rather than the simple part. 371 00:56:18.700 --> 00:56:23.429 Michael Carbin: It seems obvious in hindsight, but it's something where we can actually now start to build 372 00:56:23.490 --> 00:56:30.599 Michael Carbin: programming-language tools, program analysis tools, that automatically inform the way in which we are going to train these sorts of neural networks. 373 00:56:30.610 --> 00:56:32.399 Michael Carbin: And in preliminary results, 374 00:56:32.590 --> 00:56:48.780 Michael Carbin: we applied this to a renderer, which is, again: if you take DiffTune, which is a simulator of hardware systems, this is a physical simulator of the dynamics of light. So we applied this to a renderer, and what we can see is: 375 00:56:48.790 --> 00:56:58.829 Michael Carbin: here is our ground-truth image, and naive uniform sampling, which you might otherwise do, versus our guided sampling, which gives a much higher-quality result for the same number of samples. 376 00:57:00.850 --> 00:57:12.829 Michael Carbin: The last one I wanted to talk about is really more open challenges. I myself have had work in the space of reasoning: how can we actually reason now about the correctness of software systems built in these ways? 377 00:57:12.850 --> 00:57:16.969 Michael Carbin: But many people in the community have as well. 378 00:57:16.980 --> 00:57:26.810 Michael Carbin: And so I put this here more as, you know, inspiration, an intro to a broader discussion, which is: what does it mean for these systems to even be correct now? 379 00:57:26.930 --> 00:57:39.370 Michael Carbin: Certainly we have traditional properties like safety. Does a program satisfy basic invariants? Right? Does it always return a result that is greater than zero?
If it is a distance metric, it always needs to do this, no matter if I 380 00:57:39.380 --> 00:57:54.849 Michael Carbin: do something strange, if I approximate this program in some way; this is a basic property that needs to hold of the final program. Or it could be something even more intensional, where I'm never going to dereference null. These are classic types of invariants that we may want to be true of a piece of software. 381 00:57:54.860 --> 00:58:06.030 Michael Carbin: Well, with this concept of uncertainty, we really do need to rethink these types of properties, where now, perhaps, they're more probabilistic. 382 00:58:06.050 --> 00:58:08.720 Michael Carbin: Does safety hold with some probability? 383 00:58:09.270 --> 00:58:18.049 Michael Carbin: Or are my results within epsilon of some idealized result, or a distribution over results, or both of these things? 384 00:58:18.450 --> 00:58:21.390 Michael Carbin: Now there are also concepts like generalization; 385 00:58:21.450 --> 00:58:34.089 Michael Carbin: it starts to become new. You know, I think we've only informally thought about this, in some sense, with our work in testing, and even in thinking about the difference between our dev environments and production. But 386 00:58:34.100 --> 00:58:51.749 Michael Carbin: fundamentally, what we're thinking about is a generalization problem. If I have one of these systems crafted in this way, it's been fundamentally developed with the data set that we've used during the training of this system. Will this system generalize in production to a distribution of unseen inputs? 387 00:58:52.260 --> 00:58:55.189 Michael Carbin: Now, also, questions about robustness. 388 00:58:55.200 --> 00:58:59.599 Michael Carbin: Do systems' behaviors change under slight perturbations of data? 389 00:58:59.880 --> 00:59:18.560 Michael Carbin: Now, if I were to ever so slightly regenerate my data set,
am I going to get an entirely different system with entirely different behaviors? Because that's not necessarily going to be the most comfortable idea in the world, if I'm thinking about how robust our system may be to, for example, replication: we need to build it again, or someone is trying to replicate 390 00:59:18.570 --> 00:59:20.250 Michael Carbin: the underlying system. 391 00:59:20.640 --> 00:59:26.310 Michael Carbin: Interpretability, right? Can we actually understand and triage failures of these systems? 392 00:59:26.320 --> 00:59:30.879 Michael Carbin: Now, there's been a wide variety of different results going on in the community: 393 00:59:31.100 --> 00:59:44.060 Michael Carbin: all sorts of adversarial verification of neural networks; thinking about reinforcement learning, how do you develop monitors and shields for reinforcement learning? I think a lot of this stuff is very interesting, and I think where the 394 00:59:44.070 --> 01:00:01.679 Michael Carbin: community broadly should go is how you start to really connect these to some of the real problems people are trying to solve in their day-to-day in constructing these types of software systems. My hope is that we'll be able to make strong progress here, and maybe what it requires of us is to rethink 395 01:00:01.690 --> 01:00:19.659 Michael Carbin: how our systems behave, a little bit differently. Because, for example, if we go back to our Jacobi iterative method: back in two thousand and eighteen, you know, we had this result, we demonstrated the benefit that we were only going to get two point six percent more overhead. But we also sat down and wrote some proofs, 396 01:00:19.670 --> 01:00:26.369 Michael Carbin: right? We realized that we can actually bound the number of additional iterations that you may incur 397 01:00:26.500 --> 01:00:35.389 Michael Carbin: by using this type of strategy for this particular problem.
And so, essentially, you have a delta-t that's going to be some function of the epsilon that you're going to have, 398 01:00:35.400 --> 01:00:37.189 Michael Carbin: how big of an error you're going to have. 399 01:00:37.200 --> 01:00:49.790 Michael Carbin: And the reason why this worked is because these systems are self-stabilizing systems. So Jacobi is very interesting, where, if you encounter one of these errors... so, Jacobi will converge to the right answer 400 01:00:49.800 --> 01:00:59.749 Michael Carbin: regardless of wherever you start in the space of guesses, of initializations; it's always going to converge. So that's good. And so, when you have an error, 401 01:00:59.830 --> 01:01:01.770 Michael Carbin: what happens is, 402 01:01:01.960 --> 01:01:06.609 Michael Carbin: essentially, you just restart, right? It's like you're restarting. 403 01:01:06.860 --> 01:01:18.290 Michael Carbin: Now, fortunately, the way these errors propagate through these systems is not catastrophic. So rather than restarting from a random point, the reason why we're converging much more quickly is because we're restarting from a point 404 01:01:18.340 --> 01:01:27.519 Michael Carbin: that has still preserved some of the information that we've gained in these initial ten thousand iterations of progress. It's not truly a restart from scratch, 405 01:01:27.860 --> 01:01:40.810 Michael Carbin: right? So, starting to rethink our systems as perhaps more self-stabilizing systems, systems where we can bound the impacts of errors, the impacts of uncertainty, may lead us towards new directions in thinking about software. 406 01:01:41.680 --> 01:01:49.339 Michael Carbin: So, with that, I'm just going to wrap up. My personal opinion: the landscape of software is just fundamentally changing; 407 01:01:49.480 --> 01:02:03.009 Michael Carbin: it's been changing, and I think it's finally here.
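The self-stabilization argument for Jacobi can be seen in a small sketch (a hypothetical setup, not the 2018 experiment: a random, strictly diagonally dominant system with one fault injected mid-run). The corrupted iterate acts like a restart that preserves progress in the uncorrupted coordinates, so the method still converges, just with extra iterations:

```python
import numpy as np

rng = np.random.default_rng(1)

# A strictly diagonally dominant system, so Jacobi iteration is
# guaranteed to converge from any starting point.
n = 20
A = rng.normal(size=(n, n))
A += np.diag(np.abs(A).sum(axis=1) + 1.0)
b = rng.normal(size=n)
D = np.diag(A)  # the diagonal entries, as a vector

def jacobi(inject_error, tol=1e-10, max_iters=10_000):
    """Run Jacobi iteration; optionally corrupt one entry mid-run."""
    x = np.zeros(n)
    for it in range(1, max_iters + 1):
        x = (b - (A @ x - D * x)) / D
        if inject_error and it == 10:
            x[0] += 1e3  # simulated fault on an unreliable substrate
        if np.linalg.norm(A @ x - b) < tol:
            return it  # iterations needed to converge
    return max_iters

clean = jacobi(inject_error=False)
faulty = jacobi(inject_error=True)
# `faulty` still converges, at the cost of a bounded number of
# additional iterations (the delta-t in the talk).
```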
I don't think you can deny the fact that virtually most programs going forward are now going to be programmed using a large language model, which is generating code that, 408 01:02:03.050 --> 01:02:05.650 Michael Carbin: who knows, who understands. 409 01:02:05.690 --> 01:02:08.639 Michael Carbin: So, I mean, this is the future. It's, 410 01:02:09.020 --> 01:02:23.699 Michael Carbin: again, the future coming from below, new hardware substrates; the future coming from above, deploying these systems in very rich, expressive, partially observable environments; and, again, computations with these opaque machine learning systems. 411 01:02:24.240 --> 01:02:29.990 Michael Carbin: Now, the consequence of all of this is, again, that there's an epsilon; there are these epsilons all along the way. 412 01:02:30.000 --> 01:02:47.870 Michael Carbin: We really, as a community, need to start thinking more carefully about epsilon correctness, probabilistic correctness, because these are the tools that we need to think about what these systems do, and it's not the way that, at least, a piece of me, as a classically trained computer scientist, has been trained to think about what our systems are doing. 413 01:02:47.880 --> 01:02:49.600 Michael Carbin: But I think there's an opportunity: 414 01:02:49.650 --> 01:02:57.959 Michael Carbin: if we make progress here, I think we can get much more amazing, much more capable systems, but systems that we'll also still be able to understand. 415 01:02:58.110 --> 01:02:59.360 Michael Carbin: And so, with that, 416 01:03:00.800 --> 01:03:02.740 Michael Carbin: happy to take any questions in the Q&A. 417 01:03:04.610 --> 01:03:10.830 Dilma Da Silva: Thank you so much, Mike. This was, you know, an amazing 418 01:03:11.030 --> 01:03:14.089 Dilma Da Silva: setup of things for all of us to think about, 419 01:03:14.100 --> 01:03:32.169 Dilma Da Silva: and really, ah! And we have people in the Q&A.
Thank you for your talk, and so on. I'll take, ah, the first question, which came early in your presentation, and I think that, go ahead. 420 01:03:32.180 --> 01:03:36.990 Michael Carbin: Can you see the Q&A? Or, I can, I can. 421 01:03:37.000 --> 01:03:43.759 Dilma Da Silva: I will, I will be, you know, triaging and asking you the questions. 422 01:03:43.770 --> 01:03:47.790 Michael Carbin: Okay, yeah, go ahead. Let me see, I might be able to get there, but it might take a little time. 423 01:03:47.800 --> 01:03:54.720 Michael Carbin: Yeah, that's okay. That's okay. So, 424 01:03:56.170 --> 01:04:26.000 Dilma Da Silva: So, the work... you started, you know, reminding us that uncertainty comes from everywhere, and then you started to talk about those applications, with the Jacobi iterative method as one example of applications that have some tolerance for faults. We had in the Q&A the comment that there was this direction of work in, 425 01:04:26.010 --> 01:04:47.490 Dilma Da Silva: it's like, supercomputing, you know? Because they worked so much with those models, right, those numerical-analysis iterative models, they have been looking at that. Do you think this is coming from the same perspective? Or how do you relate to when people thought, twenty years ago, about fault-tolerance research? 426 01:04:47.500 --> 01:04:59.050 Michael Carbin: Oh, no, absolutely, I do think this is coming from that same vein. Um, you know, the work that we had there in two thousand and eighteen built on some of the observations that we were seeing in, for example, systems like the fault-tolerant 427 01:04:59.060 --> 01:05:15.690 Michael Carbin: GMRES efforts that were going on. These were, again, systems coming from this intersection between the scientific computing, numerical analysis, and mathematician communities, as well as the high-performance computing, HPC,
community, thinking about how we can build fault-tolerant solvers. 428 01:05:15.700 --> 01:05:29.189 Michael Carbin: And I think in that space there are some very interesting results that came out. I'm not sure how broadly they've been adopted since, but I think my take on these is, really: how do we move that to other parts of our stack? 429 01:05:29.200 --> 01:05:31.110 Michael Carbin: Right? How do we think about 430 01:05:31.590 --> 01:05:41.109 Michael Carbin: fault tolerance and robustness for more traditional applications as well, where there's not necessarily quite as nice a mathematical formulation? 431 01:05:41.230 --> 01:05:52.959 Michael Carbin: We'd all need to start somewhere, but I think those results of those times are very interesting and impactful, and I think that's the open question: can we generalize the observations there to other domains? 432 01:05:53.810 --> 01:05:55.290 Dilma Da Silva: Thank you. 433 01:05:55.300 --> 01:05:59.560 Dilma Da Silva: Now, a question on 434 01:05:59.860 --> 01:06:07.809 Dilma Da Silva: neural network correctness. So, when you say that a neural net is correct with some probability, what does that even mean? 435 01:06:07.820 --> 01:06:29.310 Dilma Da Silva: It is often the case that if you give it the same input, it reproduces the same error, so the correctness probability is zero. I think that there is a weak point here, ah, says our colleague. But, you know, if you want to elaborate: what do you mean by a neural net being correct with some probability? 436 01:06:29.700 --> 01:06:58.040 Michael Carbin: So, yeah. So, what is your event space? That's really the question, right? Are we talking about a probability quantified over the distribution of inputs that may be passed to the system? That's one way to think about this. Because otherwise, yes, you're right, this is deterministic; at least, a neural network as constructed is deterministic.
There are certainly other techniques for randomizing behaviors that you're going to have coming out of a neural network, and so that's another, alternative way of instantiating this probability. 437 01:06:58.050 --> 01:07:10.730 Michael Carbin: Personally, I have played games with all sorts of different styles of these types of bounds when we were doing the approximate computing work: can we, for example, again, quantify... 438 01:07:10.740 --> 01:07:26.440 Michael Carbin: what are we quantifying over? That's always a fundamental question. Is it over outputs? If we look at some of the work in hardware reliability, sometimes the stochasticity is going to be in the hardware substrate itself, so it's subject to potential errors, and we have a distribution over those. 439 01:07:26.450 --> 01:07:34.289 Michael Carbin: But we've also even looked at techniques of randomizing the algorithms themselves, and in that way we're going to have a probability associated with that. 440 01:07:35.500 --> 01:08:05.309 Dilma Da Silva: Um, we have now a question that relates to, you know, a different domain in terms of uncertainty; so, possibly, another extreme. For example, with Space Shuttle control computers: in those systems they would have a number, the guest here said five, of independent computers, and they are configured to have this very high level of redundancy, computing the same thing, you know, the kind of thing that you mentioned as one 441 01:08:05.320 --> 01:08:19.719 Dilma Da Silva: way of doing that. And then there was one more computer that was a completely independent computer, from a different group, with different software, really meaning that it's not only replication, but there is 442 01:08:19.729 --> 01:08:34.830 Dilma Da Silva: ah, another entity that was trying to work on the same thing.
So the question would be, you know: when you see your research, how do you see it fitting this domain, for example, flight computers 443 01:08:34.840 --> 01:08:49.169 Dilma Da Silva: in computer-controlled aviation? So, if lives are on the line, or if it is really a space where reliability matters more than in average software, how do you see the 444 01:08:49.580 --> 01:08:53.539 Dilma Da Silva: space evolving regarding handling uncertainty? 445 01:08:53.550 --> 01:08:57.990 Dilma Da Silva: And I hope I made a reasonable rendition of the question. 446 01:08:58.000 --> 01:09:05.229 Michael Carbin: Yeah. So, things like redundancy, I'm still very supportive of these. You know, some of the idea of doing approximation rather than redundancy 447 01:09:05.390 --> 01:09:20.350 Michael Carbin: is illustrative of instances where, perhaps, we can think differently; we don't necessarily always need redundancy. But if lives are on the line, perhaps that's not something you want to do; you don't accept this at all. Now, having said that, there are other challenges to thinking 448 01:09:20.470 --> 01:09:35.019 Michael Carbin: in this type of space. So there's a line of work that we've done; my student, Eric Atkinson, has done what we call belief programming, and where we actually applied this was with the Mars Polar Lander. So the Mars Polar Lander is an interesting case study. 449 01:09:35.029 --> 01:09:43.949 Michael Carbin: So, JPL actually did a report out of this lander, where, unfortunately, the Mars Polar Lander crashed into the surface. 450 01:09:44.029 --> 01:09:46.289 Michael Carbin: Um, and so, what happened, 451 01:09:47.290 --> 01:09:51.859 Michael Carbin: in an abstract view, is: the lander was coming close to the surface. 452 01:09:51.910 --> 01:10:08.190 Michael Carbin: It has a sensor system, essentially a detection system, that's trying to detect whether or not it has landed on the surface.
So, unfortunately, several hundred feet above the surface, it believed it had actually touched the surface, even though it had not, 453 01:10:08.200 --> 01:10:13.890 Michael Carbin: and so it turned off its engines and therefore fell into the surface. 454 01:10:13.900 --> 01:10:23.769 Michael Carbin: Now, if you look at that report, what's interesting there is that what happened was a failure to account for the uncertainty in the sensors. So what happened 455 01:10:23.820 --> 01:10:38.750 Michael Carbin: is that the sensor they have, detecting the distance between the craft and the surface, could actually be influenced by deploying the landing gear; essentially, what you get are these false positives. 456 01:10:38.760 --> 01:10:57.590 Michael Carbin: Now, in designing the system, this is something they knew was a problem, or, let's say, a source of uncertainty in the sensor design. They had written, essentially, an analysis of how the system should operate: what should be done, how to account for the uncertainty, the fact that these false positives would happen. But in the software implementation, they actually implemented it incorrectly, 457 01:10:57.600 --> 01:10:59.900 Michael Carbin: and so this case wasn't handled appropriately. 458 01:11:00.030 --> 01:11:11.389 Michael Carbin: Now, in this line of work that we have, what we've done is we've come up with a modeling language for someone to write down this sensor uncertainty, essentially saying: okay, this sensor produces this altitude; however, 459 01:11:11.400 --> 01:11:23.989 Michael Carbin: if we deploy the landing gear, then it's possible that this sensor, or this altitude reading, has been corrupted, right? And so you can actually write this down, essentially, as a program.
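As an illustration of writing such a sensor model down as a program, here is a toy, interval-based sketch in the spirit of tracking sets of possible states. It is not the actual belief-programming language from the talk; the touchdown threshold, the corruption bound, and all names are invented for the example.

```python
# Belief-style tracking with sets (here, intervals) instead of point values:
# the controller only trusts "touchdown" when every altitude consistent
# with the sensor model agrees.

TOUCHDOWN_ALT = 0.0
GEAR_GLITCH = 500.0  # made-up bound on how far gear deployment can corrupt a reading

def altitude_belief(reading, gear_deploying):
    """Return the interval of altitudes consistent with the sensor model."""
    if gear_deploying:
        # The reading may have been corrupted downward during gear deployment.
        return (reading, reading + GEAR_GLITCH)
    return (reading, reading)

def should_cut_engines(reading, gear_deploying):
    lo, hi = altitude_belief(reading, gear_deploying)
    # Cut engines only if *every* possible altitude says we've touched down.
    return hi <= TOUCHDOWN_ALT

# False positive during gear deployment: the reading says 0, but we might
# still be hundreds of feet up, so the belief interval keeps the engines on.
print(should_cut_engines(0.0, gear_deploying=True))   # the belief blocks cutoff
print(should_cut_engines(0.0, gear_deploying=False))  # an uncorrupted reading is trusted
```

Because the decision consults the whole set of states consistent with the sensor model, the gear-deployment false positive cannot, by itself, cut the engines.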
460 01:11:24.000 --> 01:11:28.590 Michael Carbin: And then what we do is we have a system that can take this uncertainty model 461 01:11:28.600 --> 01:11:42.290 Michael Carbin: and actually dynamically track the uncertainty. It can tell you: is it possible, right, that there's an error with this sensor at this point in time? And if so, you can make a different decision. So what it does is actually automate some of the 462 01:11:42.300 --> 01:12:01.469 Michael Carbin: work that they were attempting to do there in the software implementation, but had failed to implement correctly. And so, when you think about uncertainty, at least at that safety-critical level, I would say there are multiple sources. Yes, there's the reliability issue, thinking about hardware reliability, and then the natural unreliability of being in a sensor-based environment. And in that environment, 463 01:12:01.480 --> 01:12:13.589 Michael Carbin: in many instances, there's a manual process for trying to manage that uncertainty, and we think that there's a rise of new languages, things like probabilistic programming; like I said, belief programming is what we had talked about, which 464 01:12:13.600 --> 01:12:21.190 Michael Carbin: is a similar concept where, instead of tracking probabilities of the events and states that you may be in, it's going to track sets, 465 01:12:21.290 --> 01:12:30.749 Michael Carbin: and then having systems that dynamically allow you to interrogate that uncertainty to then make decisions, right, without you needing to encode that sort of uncertainty yourself. 466 01:12:32.360 --> 01:12:33.190 Dilma Da Silva: It's... 467 01:12:33.200 --> 01:12:44.389 Dilma Da Silva: that's quite interesting. And I think I'm going now to... I'm not sure if I'm going to ask a question or give you a top-of-my-head, you know, core 468 01:12:44.400 --> 01:12:49.209 Dilma Da Silva: reaction to this whole story. So, you know: 469 01:12:49.870 --> 01:13:00.189 Dilma Da Silva: people who have debugged GCC
errors that only appear when you have a thousand cores, or something like that: 470 01:13:00.380 --> 01:13:28.590 Dilma Da Silva: it's been, you know, months and months trying to find the fault that, of course, the designers didn't think about; it was, you know, a corner case. So I'm thinking now that, for my next project, I may decide to use a compiler from your group that is, you know, generated. And I'm thinking now, on this one: how can we explore the behavior and try to do that debugging that we did before? Because before, we understood it exactly; line by line, we could try to follow 471 01:13:28.600 --> 01:13:51.889 Dilma Da Silva: the interpretability by doing tracing, as you mentioned in your talk. So that part of problem determination becomes really hard. When systems became really large and distributed systems exploded, you would look at the top conferences in systems, OSDI, SOSP, and there were so many papers on helping people to find problems in those systems; 472 01:13:51.900 --> 01:13:58.520 Dilma Da Silva: it became really hard. Now, when I look at it: if we do go, you know, surrogate, 473 01:13:58.530 --> 01:14:28.099 Dilma Da Silva: as part of the methodology, and I'm not even saying that I heard you say that everything would be replaced, but let's say that now a software developer develops me a new file system. Then I will have a piece that is a surrogate to capture some things, and it will have other things that may come from, you know, POSIX semantics, or whatever, you know, legacy you need, and I'm going to marry them. The hard part is: what is the story for problem determination? Who are the people who get the call when there is a problem, and it's really high-priority, and we need a SWAT team to find and fix it, 474 01:14:28.110 --> 01:14:49.699 Dilma Da Silva: and how are we going to work on it? And I wonder, when you're thinking about programming with these models, whether you're thinking about that as you put pieces together,
if there are some breadcrumbs or some traces that you can give to help people find the problems, because that may be really hard. Now, another part of it 475 01:14:49.710 --> 01:14:51.870 Dilma Da Silva: is that, uh, 476 01:14:51.880 --> 01:15:21.810 Dilma Da Silva: if you look at things like file systems, most files are small and die soon, right? So a lot of our file systems are inefficient if you go to the worst-possible-case scenario to guarantee correctness of file systems. But in your approach, I was trying to see: finding the distribution, the input data, that you said is a hard problem. How you build those data sets in order to get to some notion of reliability, how that all fits together, since we may not 477 01:15:21.820 --> 01:15:41.119 Dilma Da Silva: see those corner cases very commonly. So the reliability, I'm repeating your last slide, just saying that I think this reliability story will be a big challenge, Mike. But you probably have more answers about it; you probably don't see as many 478 01:15:41.130 --> 01:15:45.150 Dilma Da Silva: obstacles as I see right now. 479 01:15:45.160 --> 01:15:48.789 Michael Carbin: Uh, no, I see lots of obstacles, right. 480 01:15:48.800 --> 01:15:56.459 Michael Carbin: As you pointed out, on the DiffTune side, you know, this was part of our inspiration: how can we make this interpretable? How can we still 481 01:15:56.470 --> 01:16:13.930 Michael Carbin: use the power of data, but give a description that is essentially fully interpretable? So the output of DiffTune is just a tuned version of llvm-mca, where you can still interrogate essentially all the decisions that are being made. So the final output is not something where you need to understand our system. 482 01:16:14.030 --> 01:16:17.359 Michael Carbin: Right. Now, more broadly than that, 483 01:16:17.380 --> 01:16:18.910 Michael Carbin: it is,
484 01:16:19.120 --> 01:16:24.419 Michael Carbin: it's going to be a fundamentally interesting future, I have to say. 485 01:16:25.560 --> 01:16:31.480 Michael Carbin: I think the forces of software development are moving in a direction where, 486 01:16:32.530 --> 01:16:35.999 Michael Carbin: to access new capabilities, these types of 487 01:16:36.050 --> 01:16:40.690 Michael Carbin: perhaps naturally uninterpretable components are going to be integrated. Right? 488 01:16:40.700 --> 01:16:41.780 Michael Carbin: So 489 01:16:42.100 --> 01:16:59.990 Michael Carbin: I think the challenges you're outlining are ones that we have to face and be prepared to face, right? And think of ways of enabling people to use this type of technology in ways that overall give a good understanding of the system. 490 01:17:00.000 --> 01:17:05.690 Michael Carbin: I don't know if I'm hopeful; I think it's unavoidable, 491 01:17:05.700 --> 01:17:13.709 Michael Carbin: right? I think the power of these tools put in people's hands is such that they can do things that they couldn't do before, and so they will do that. 492 01:17:14.700 --> 01:17:25.809 Michael Carbin: And so that's where we're going to end up. And so what comment are we going to make? We can tell people not to do it, but I don't think that's really going to help us. 493 01:17:25.820 --> 01:17:51.190 Dilma Da Silva: In your talk, there's a question here, there was an example of a program that was mixing code and English, you know, saying that this is becoming a common way of developing software. And I think you showed an example, but it went by too fast to read. So do you want to tell us more about an example of the programs people are writing now where they have natural language as a piece of it? 494 01:17:52.010 --> 01:17:57.399 Michael Carbin: So there was an example that I was seeing where someone was trying to write...
495 01:17:57.970 --> 01:18:03.259 Michael Carbin: Essentially, actually, there are a few things. So 496 01:18:03.450 --> 01:18:13.389 Michael Carbin: one example is actually a student of mine who was working on a search, essentially a semantic search, a semantic abstract search, of articles. 497 01:18:13.400 --> 01:18:32.449 Michael Carbin: So he's got this program that's largely okay: there are a couple of keywords that I want to put together, and here, suddenly, there's a prompt out to this large language model that says, I want you to, you know, search for a whole bunch of papers that, let's say, have synonyms of all these words that I have here, and then bring those back. 498 01:18:32.460 --> 01:18:35.890 Michael Carbin: So "synonyms of this word," 499 01:18:35.900 --> 01:18:45.779 Michael Carbin: where, you know, the word is something that's been written, this is something that's not coming as input, but "synonyms of" is an algorithm that is now being interpreted by a large language model. 500 01:18:46.380 --> 01:18:54.690 Michael Carbin: Short of encoding all possible synonyms 501 01:18:54.700 --> 01:19:10.660 Michael Carbin: of the words that we have as expected inputs, this is not something that's going to be programmed manually, but it can be written in English, given an intuitive understanding, and can be interpreted by a large language model, to then go and retrieve results that are related to that. 502 01:19:10.670 --> 01:19:16.669 Michael Carbin: So these are the types of computations that, I think, are 503 01:19:16.830 --> 01:19:36.489 Michael Carbin: in the short term being integrated, where we can give natural-language specifications of what they do. But the programming implementation, I mean, I guess you could go and try to develop a massive thesaurus and do this, or perhaps there's an available source that's out there, but this can just be done automatically.
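[Editor's sketch of the kind of mixed code-and-English program described above: the synonym expansion "algorithm" is an English instruction handed to a language model rather than code. The `ask_llm` function, the prompt wording, and the search index are hypothetical placeholders, not any real API or the speaker's actual program.]

```python
# Hypothetical sketch: ordinary code around a natural-language step.
# `ask_llm` and `index_search` are illustrative stand-ins only.

def ask_llm(prompt: str) -> str:
    # In a real program this would call a large language model endpoint.
    raise NotImplementedError

def synonym_prompt(keywords: list[str]) -> str:
    # The expansion "algorithm" is not written in code at all:
    # it is an English sentence interpreted by the model.
    return "List synonyms of each of these words, one per line: " + ", ".join(keywords)

def search_papers(keywords: list[str], llm=ask_llm, index_search=None):
    # Expand the query with model-generated synonyms, then search.
    synonyms = [s.strip() for s in llm(synonym_prompt(keywords)).splitlines() if s.strip()]
    return index_search(keywords + synonyms)
```

With a stubbed model that answers "film\nmovie", `search_papers(["cinema"], ...)` would query the index with `["cinema", "film", "movie"]`: the deterministic code is fully inspectable, and only the synonym step is delegated to the model.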
504 01:19:36.500 --> 01:19:37.630 Michael Carbin: It's... 505 01:19:38.300 --> 01:19:41.119 Dilma Da Silva: That's okay. That's reasonable. 506 01:19:41.130 --> 01:19:59.479 Dilma Da Silva: Okay, I think we don't have more questions. For the NSF people: we'll be talking to Mike again at 3 PM Eastern for a little bit, if we have, you know, people to discuss more. But thank you so much. Thank you to the people who attended.