WEBVTT 1 00:00:08.849 --> 00:00:24.450 Rance Cleaveland: Okay, let's go ahead and get started. So, um, welcome everybody to the next installment of the Distinguished Lecture Series sponsored by the Computer and Information Science and Engineering Directorate at NSF. 2 00:00:24.990 --> 00:00:35.460 Rance Cleaveland: My name is Rance Cleaveland, and I'm the Director of the Computing and Communication Foundations Division within CISE, and it's my great pleasure to introduce our speaker today, 3 00:00:36.630 --> 00:00:39.540 Rance Cleaveland: Professor Nancy Lynch from MIT. 4 00:00:40.740 --> 00:00:46.320 Rance Cleaveland: I debated how to frame my introduction, because I have to admit that 5 00:00:47.190 --> 00:00:55.770 Rance Cleaveland: Nancy's been a fixture of the research community as long as I've been aware of computing research, and so I thought I would start with maybe a little 6 00:00:56.670 --> 00:01:06.030 Rance Cleaveland: personal note in that regard. So I started graduate school, and I'm hesitant to admit this, in 1982. The year is important because 7 00:01:06.540 --> 00:01:15.330 Rance Cleaveland: in 1982 Nancy and her co-authors published what I think is the most influential paper on the theory of distributed systems. 8 00:01:15.870 --> 00:01:24.300 Rance Cleaveland: So I was a young graduate student, sort of trying to work my way through the research literature; I was reading a lot of papers and understanding almost none of them. 9 00:01:24.750 --> 00:01:42.180 Rance Cleaveland: And I remember coming across this paper, in probably 1983 or so, and reading it and not really understanding it, but understanding that I had better understand it, because there was something deep and fundamental in the impossibility result for consensus that was proven in that paper.
10 00:01:43.410 --> 00:01:44.850 Rance Cleaveland: Well, little did I know that 11 00:01:46.230 --> 00:01:57.720 Rance Cleaveland: this paper would lead to other papers, and other papers, and other papers, a lot of them by Nancy, that 12 00:01:58.440 --> 00:02:11.160 Rance Cleaveland: laid the firm mathematical foundations for understanding distributed computation. So Nancy is, as I said, a professor at MIT; she also got her PhD at MIT, in mathematics. 13 00:02:11.820 --> 00:02:23.100 Rance Cleaveland: Before that, her bachelor's, a BS also in mathematics, was from Brooklyn College, and she's been at MIT for a number of years, 14 00:02:25.590 --> 00:02:28.200 Rance Cleaveland: superbly supervising students and conducting truly 15 00:02:29.280 --> 00:02:38.160 Rance Cleaveland: fundamental and important research in the area of distributed systems. She's won every award that I can think of in the 16 00:02:38.940 --> 00:02:47.730 Rance Cleaveland: theory of computing community. If I were to list all of these, it would use up most of the hour, and I know you're here to hear her talk, not to hear me talk. 17 00:02:48.150 --> 00:02:59.580 Rance Cleaveland: I will say that she's won at least one of these awards twice. She's also an ACM Fellow and a member of the National Academies of both Science and Engineering. 18 00:03:01.980 --> 00:03:07.710 Rance Cleaveland: I think I will conclude my introduction with another sort of little aside. 19 00:03:09.660 --> 00:03:19.020 Rance Cleaveland: From time to time in my career, I've heard Nancy referred to as a force of nature. And this is a testament to her truly astonishing 20 00:03:20.130 --> 00:03:25.140 Rance Cleaveland: record, for years and years and years and years, of fundamental 21 00:03:26.400 --> 00:03:35.460 Rance Cleaveland: results in a variety of the theoretical areas involving distributed systems.
However, I sometimes think that that should be flipped around a little bit. 22 00:03:36.900 --> 00:03:42.570 Rance Cleaveland: Instead of referring to Nancy as a force of nature, I like to think of nature as being a force of Nancy, 23 00:03:43.950 --> 00:03:56.010 Rance Cleaveland: and again, in recognition of her truly remarkable record and the very important research that she's done. So without further ado, I'll turn the floor over to Nancy Lynch. 24 00:03:57.990 --> 00:04:05.070 Nancy Lynch: Well, thank you. Excuse me. Thank you for that introduction, Rance, and thank you all for inviting me for this talk. 25 00:04:06.540 --> 00:04:07.560 Nancy Lynch: Can you hear me OK? 26 00:04:09.660 --> 00:04:24.480 Nancy Lynch: I assume yes. Okay. So this talk is going to be on a theoretical view of distributed systems. It's a very broad talk about what I have worked on over decades with many collaborators and students. 27 00:04:25.620 --> 00:04:31.350 Nancy Lynch: So yeah, so basically we've been involved in developing the theory for distributed systems. 28 00:04:32.760 --> 00:04:37.260 Nancy Lynch: We're trying to understand distributed systems in mathematical terms. That includes 29 00:04:38.310 --> 00:04:42.000 Nancy Lynch: what they do, what they are able to do, and what the limitations are. 30 00:04:42.630 --> 00:04:55.260 Nancy Lynch: And specifically, what has that involved? Well, it has involved defining mathematical models for problems that are solved in distributed systems, and for the algorithms that are used to solve them. 31 00:04:56.010 --> 00:05:03.000 Nancy Lynch: Sometimes it's involved developing new algorithms, and sometimes studying algorithms developed by others. 32 00:05:03.810 --> 00:05:15.570 Nancy Lynch: It's involved producing proofs of correctness, and also of performance and fault tolerance properties, and then, as Rance mentioned, we've had a lot of work on proving impossibility results and lower bounds.
33 00:05:17.040 --> 00:05:22.650 Nancy Lynch: So these express inherent limitations of distributed systems for solving problems. 34 00:05:23.970 --> 00:05:27.180 Nancy Lynch: And along the way, we've had to do a lot of work on developing 35 00:05:28.410 --> 00:05:33.840 Nancy Lynch: basically concurrency-theory-type foundations for modeling and analyzing distributed systems. 36 00:05:34.590 --> 00:05:42.210 Nancy Lynch: So what kinds of systems? We started out working on distributed data management systems, mostly, and more recently we're working on 37 00:05:42.720 --> 00:05:57.900 Nancy Lynch: communication systems, wired and wireless, and very recently my group has focused on biological systems, which are systems like insect colonies, developmental biology, and brain networks. 38 00:05:59.580 --> 00:06:06.390 Nancy Lynch: Okay. So in this talk. So, does anybody keep their camera on? No, looks like all the cameras are dark. 39 00:06:07.710 --> 00:06:11.520 Nancy Lynch: Oh well, if somebody can keep a camera on, that would be nice. It's 40 00:06:12.540 --> 00:06:14.160 Nancy Lynch: nice to get a little reaction. 41 00:06:15.960 --> 00:06:26.490 Nancy Lynch: Okay. So in this talk, it's a very broad-brush talk, I'm going to talk about work we've done on algorithms for traditional distributed systems, 42 00:06:27.300 --> 00:06:39.300 Nancy Lynch: talk about impossibility results, and then very, very briefly talk about the foundations work and some of the newer work on algorithms for new types of distributed systems. 43 00:06:40.920 --> 00:06:54.480 Nancy Lynch: Okay, so for traditional distributed systems. Well, we began our work in this area with a study of mutual exclusion, the costs of achieving mutual exclusion in shared-memory multiprocessor settings. 44 00:06:55.080 --> 00:07:00.630 Nancy Lynch: This is work with Mike Fischer, Jim Burns, and others.
And then we moved from there into 45 00:07:01.260 --> 00:07:19.320 Nancy Lynch: studying some abstract distributed network problems, including consensus, approximate consensus, and clock synchronization. And so these two papers listed here have some new algorithms for fault-tolerant clock synchronization and fault-tolerant approximate agreement. 46 00:07:21.420 --> 00:07:30.750 Nancy Lynch: There are a lot of references that you're going to see on the slides. So, assuming that we can post the slides later, you might want to browse some of these papers later on. 47 00:07:31.590 --> 00:07:41.070 Nancy Lynch: The first example I'm going to go into in some detail is a paper with Cynthia Dwork and Larry Stockmeyer on consensus in the presence of partial synchrony. 48 00:07:42.150 --> 00:07:56.850 Nancy Lynch: Okay, so this addresses the problem of exact agreement in the presence of failures, not just approximate agreement. It's kind of a counterpoint to the Fischer-Lynch-Paterson result that Rance was talking about. 49 00:07:58.590 --> 00:08:09.000 Nancy Lynch: I'll go over that result later in the talk, but it shows the impossibility of reaching fault-tolerant consensus in asynchronous networks. 50 00:08:10.080 --> 00:08:19.110 Nancy Lynch: Whereas this paper, by Dwork, Lynch, and Stockmeyer, presents an approach to dealing with the limitation from the FLP paper. 51 00:08:20.100 --> 00:08:27.180 Nancy Lynch: It also presents several algorithms that use this approach and tolerate different types of failures and timing uncertainties. 52 00:08:27.870 --> 00:08:35.010 Nancy Lynch: Okay. Um, so yeah. Just as an aside, this paper has been very well recognized lately and has won 53 00:08:35.520 --> 00:08:47.580 Nancy Lynch: the Dijkstra Prize, but it wasn't very well received at the time we wrote it, because it was written as a purely theoretical paper. It's been kind of rediscovered lately because it's relevant for things like blockchain algorithms.
54 00:08:48.270 --> 00:08:57.270 Nancy Lynch: Okay. So I'll start with some excerpts from the award citation for the Dijkstra Prize. This was written by Dahlia Malkhi. 55 00:08:58.950 --> 00:09:06.810 Nancy Lynch: So the idea is that this paper introduced a bunch of models for partial synchrony, between synchronous and asynchronous. 56 00:09:07.980 --> 00:09:12.120 Nancy Lynch: And these are models in which the consensus problem is solvable. 57 00:09:13.260 --> 00:09:20.580 Nancy Lynch: The claim in the citation is that this provided practitioners with the right tool for building fault-tolerant systems. 58 00:09:21.690 --> 00:09:29.550 Nancy Lynch: The key is that the safety properties are maintained at all times, despite the impossibility of consensus. 59 00:09:30.030 --> 00:09:48.060 Nancy Lynch: The thing that gets weakened is the progress condition: progress is guaranteed not always, but just during periods when the system is stable. Okay. And then she goes on to say that these are the pillars on which modern fault-tolerant systems are built. 60 00:09:49.620 --> 00:10:05.760 Nancy Lynch: Okay, so just a quick review of distributed consensus: you have a bunch of processors in a distributed network that have to agree on a value. Every processor starts with some initial preference, and they have to agree on some value, but some of the processors might be faulty. 61 00:10:07.440 --> 00:10:19.350 Nancy Lynch: People have studied different kinds of failures. The key conditions are: everybody has to agree, and if all the processors happen to start with the same initial value, that's the only allowed decision value. 62 00:10:21.360 --> 00:10:23.760 Nancy Lynch: Yeah, I mean, this condition is sometimes 63 00:10:25.020 --> 00:10:32.550 Nancy Lynch: strengthened to say that any decision, in any case, must be somebody's initial value.
And of course, we also want the algorithm to 64 00:10:33.210 --> 00:10:47.340 Nancy Lynch: terminate; we'd like all the non-faulty processors to actually succeed in reaching agreement. So this problem arose originally in two places: the database commit problem, studied by Jim Gray way back, 65 00:10:48.660 --> 00:10:51.180 Nancy Lynch: and the Byzantine agreement problem, 66 00:10:52.200 --> 00:10:57.510 Nancy Lynch: where the agreement was on aircraft altimeter readings that get passed back and forth. 67 00:10:59.700 --> 00:11:19.860 Nancy Lynch: Okay, so what does the paper do? It considers a variety of partial synchrony models and a variety of failure models, and gives algorithms that always guarantee the agreement and validity properties, but where termination is guaranteed only in the case when the system's behavior stabilizes. 68 00:11:21.060 --> 00:11:28.380 Nancy Lynch: Okay, so if the system stabilizes for long enough, then we guarantee that agreement is actually reached. 69 00:11:29.340 --> 00:11:43.830 Nancy Lynch: So I'll actually go into some detail. The key algorithmic idea is that different processors try to take charge of reaching agreement, and in our paper they do this in a rotating fashion; this is a paradigm called the rotating coordinator 70 00:11:45.390 --> 00:11:47.130 Nancy Lynch: paradigm, and you have to 71 00:11:48.210 --> 00:11:57.270 Nancy Lynch: do some work to reconcile what happens if different coordinators try to coordinate decisions; you have to make sure they don't contradict each other. 72 00:11:58.710 --> 00:12:02.880 Nancy Lynch: Okay, so here's just a sketch of how it works. 73 00:12:04.230 --> 00:12:16.710 Nancy Lynch: Let's look at the simplest case: we have synchronous rounds and just simple stopping failures. So we have messages that could be lost, but after some time we assume that all the messages get delivered in a timely fashion. 74 00:12:18.090 --> 00:12:30.210 Nancy Lynch: Okay, so we have phases, each one of which
some coordinator is in charge of; we can determine the coordinator for a phase by taking the phase number mod n. 75 00:12:31.650 --> 00:12:41.130 Nancy Lynch: So a processor can lock a value with a certain phase number. That means it thinks that the coordinator might decide that value at that phase. 76 00:12:42.120 --> 00:12:52.500 Nancy Lynch: And here's what happens during phase k, under the control of some coordinator. First, everybody sends the coordinator what they think are acceptable decision values. 77 00:12:53.190 --> 00:13:07.440 Nancy Lynch: Acceptable means they don't hold a lock on something different, and it's known to be somebody's initial value. So everybody sends their acceptable values, and then the coordinator picks a value that is acceptable 78 00:13:08.790 --> 00:13:16.530 Nancy Lynch: to a majority of processors, broadcasts that value, and all the recipients lock that value for the current phase. 79 00:13:17.280 --> 00:13:23.730 Nancy Lynch: Okay. Now, anybody who locked the value sends an acknowledgement to the coordinator, and remember, these messages could get lost. 80 00:13:24.330 --> 00:13:34.650 Nancy Lynch: But if the coordinator receives a majority of acknowledgments, then it's able to decide on that value, and then round four is just a bunch of cleanup. 81 00:13:35.820 --> 00:13:43.500 Nancy Lynch: Okay, so that's basically the flavor of all these protocols; it just repeats again and again, and this is depicting the communication pattern. 82 00:13:45.510 --> 00:13:54.540 Nancy Lynch: Okay, so some of these ideas come from Skeen's three-phase commit; this was inspired by Dale Skeen's earlier work on three-phase commit. 83 00:13:55.290 --> 00:14:04.170 Nancy Lynch: And then, somewhat later, Lamport's Paxos consensus protocol uses very similar ideas to reach agreement. 84 00:14:04.890 --> 00:14:12.690 Nancy Lynch: Okay, I should say that there's a difference.
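Going back to the rotating-coordinator phase just described, a minimal sketch of one phase might look as follows. This is an illustration only, not code from the paper: the names are hypothetical, and it simplifies the protocol (no message loss within the phase, stopping failures only, and round four's cleanup is omitted).

```python
def run_phase(k, values, locks, alive, n):
    """Run one phase k among processors 0..n-1.

    values: each processor's initial value; locks: processor -> (value, phase)
    or None; alive: set of non-faulty processors. Mutates locks and returns
    the coordinator's decision, or None if the phase does not decide."""
    coord = k % n                        # coordinator rotates by phase number
    if coord not in alive:
        return None                      # a stopped coordinator decides nothing
    # Round 1: everybody sends the coordinator its acceptable values,
    # i.e. values not contradicted by a lock it currently holds.
    acceptable = {p: ({locks[p][0]} if locks[p] else set(values.values()))
                  for p in alive}
    # Round 2: the coordinator picks a value acceptable to a majority of the
    # n processors and broadcasts it; recipients lock it for phase k.
    for v in sorted(set(values.values())):
        if sum(1 for p in alive if v in acceptable[p]) > n // 2:
            for p in alive:
                locks[p] = (v, k)
            # Round 3: lockers acknowledge; a majority of acks lets the
            # coordinator decide.
            acks = sum(1 for p in alive if locks[p] == (v, k))
            return v if acks > n // 2 else None
    return None
```

Note how the majority checks give the agreement property: two coordinators can only get majorities that intersect, so conflicting decisions are reconciled through the locks.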
But the important difference in the Paxos protocol is that it has more concurrency. 85 00:14:13.500 --> 00:14:25.920 Nancy Lynch: It doesn't use a strictly rotating coordinator, but allows different coordinators to make attempts concurrently; it's still a majority-checking discipline that's used to manage the conflicts. 86 00:14:26.940 --> 00:14:27.390 Okay. 87 00:14:29.490 --> 00:14:31.860 Nancy Lynch: Alright, so that's the first example. 88 00:14:33.630 --> 00:14:40.530 Nancy Lynch: The next two contributions I'm going to talk about are mostly about modeling and proofs for algorithms that 89 00:14:41.250 --> 00:14:48.030 Nancy Lynch: had arisen in the design of distributed systems, not so much about new algorithms like the previous example. 90 00:14:48.720 --> 00:14:54.900 Nancy Lynch: So both of these deal with distributed data management, which was the main focus of our work for many years. 91 00:14:55.650 --> 00:15:03.630 Nancy Lynch: After that, I'll look at one more algorithmic contribution that I like; it's called the Rambo algorithm, which is designed to implement 92 00:15:04.560 --> 00:15:12.930 Nancy Lynch: atomic shared memory on a dynamically changing platform. And that can be regarded as another precursor to blockchain-style algorithms. 93 00:15:13.680 --> 00:15:21.960 Nancy Lynch: Okay, so let's look at concurrency control algorithms for nested transactions. Okay, so 94 00:15:22.890 --> 00:15:41.880 Nancy Lynch: yeah, so my collaborators and I spent several years, mostly in the 80s, trying to use theoretical approaches to understand, generalize, and prove correctness of concurrency control algorithms for implementing atomic transactions. It's the central problem of distributed data processing. 95 00:15:43.050 --> 00:15:52.920 Nancy Lynch: Mostly that work had been done by systems people, and there was not all that much theory yet. So, some more background on transactions. 96 00:15:54.540 --> 00:16:07.170 Nancy Lynch: Okay, well, a transaction,
you probably know it; you can think of it as a sequence of operations, typically just read and write operations, that are supposed to appear as if they're executing as one indivisible unit. 97 00:16:08.040 --> 00:16:13.350 Nancy Lynch: And then a collection of transactions is supposed to appear to execute in some sequential order. 98 00:16:14.400 --> 00:16:28.740 Nancy Lynch: Concurrency control just refers to the algorithms that are used for implementing atomic transactions. The early leadership in this field: it began with Turing Award winner Jim Gray, and Bernstein and Goodman followed on, 99 00:16:29.760 --> 00:16:44.640 Nancy Lynch: beginning work on an abstract, formal theory for some types of transactions. Okay. Then there were a lot of systems papers that presented implemented algorithms, with little theory. 100 00:16:48.660 --> 00:16:51.210 Nancy Lynch: Oh yeah, I didn't mention, but the 101 00:16:52.590 --> 00:17:09.990 Nancy Lynch: extension of transactions to the nested case was done by Barbara Liskov, another Turing Award winner. She extended the idea of transactions to allow them to have a nested tree structure, with transactions and child transactions, all the way down to leaf transactions. 102 00:17:11.430 --> 00:17:26.070 Nancy Lynch: Any transaction could abort during an execution; its parent could learn about the abort and react by, for example, trying something else. And the key condition there is that every set of sibling transactions should appear to execute in some order.
So this is a very flexible programming discipline that list got included in her artists distributed programming language so that work seem really interesting, but it was complicated 104 00:17:44.070 --> 00:18:03.300 Nancy Lynch: The new Nessa transaction concept and all the algorithms implementing it had no theory, just the descriptive systems papers and the implementations. So our work involves trying to model these things on the transaction requirements and the algorithms mathematically. 105 00:18:04.500 --> 00:18:19.530 Nancy Lynch: Along the way we ended up generalizing some existing algorithms from single level transactions to nested case. And then we were actually able to push this the point of proven correctness of these algorithms. 106 00:18:21.600 --> 00:18:25.800 Nancy Lynch: Okay. So we wrote a book that included all of these all of this work. 107 00:18:27.150 --> 00:18:35.580 Nancy Lynch: Is based on a whole series of research papers. So we did was develop a general theory for nested transactions. It included 108 00:18:36.330 --> 00:18:44.820 Nancy Lynch: An ethnicity theorem that provided one method for proven correctness of concurrency control algorithms, kind of a compositional method. 109 00:18:45.750 --> 00:18:52.260 Nancy Lynch: And then we did all this, these types of algorithms, there's lock based algorithms which is what was used in the 110 00:18:52.950 --> 00:19:02.580 Nancy Lynch: The artist system. But there's also algorithms that are based do concurrency control based on timestamps on a combination of the two optimistic algorithms. 111 00:19:03.390 --> 00:19:19.500 Nancy Lynch: Orphan management algorithms deal with what happens to children, child transactions of aborted transactions and this algorithms that involved replicated data. We did all this in terms of a mathematical framework called Iowa Tamika so we developed 112 00:19:21.030 --> 00:19:35.850 Nancy Lynch: A model for asynchronous systems with interacting components. 
And this was being developed at around the same time, so we were able to use it as the basis for all of this modeling and verification. 113 00:19:36.510 --> 00:19:49.290 Nancy Lynch: What I've learned is that recently the effort of verifying at least safety properties of complex systems of this sort has become quite popular in systems communities. 114 00:19:50.430 --> 00:20:00.240 Nancy Lynch: Some of that work involves not mathematical proofs of the sort that we did, but code-level verification using interactive theorem provers. The way our work is different 115 00:20:00.720 --> 00:20:09.330 Nancy Lynch: is that it's based on abstract models, and our proofs are mostly by hand. Of course, abstract proofs could also be done with theorem provers, and we did a few of those. 116 00:20:11.940 --> 00:20:12.240 Okay. 117 00:20:13.560 --> 00:20:14.010 Nancy Lynch: All right. 118 00:20:15.120 --> 00:20:24.510 Nancy Lynch: So, following the same style as for our nested transactions work, we studied other algorithms from the distributed systems and database communities, 119 00:20:25.590 --> 00:20:34.950 Nancy Lynch: again trying to understand how they work in a mathematical way and prove their key properties. So here's the next work that we 120 00:20:35.640 --> 00:20:51.900 Nancy Lynch: did: we tackled the notion of distributed shared memory, which is basically a shared memory abstraction implemented in a distributed network. We focused on a system called Orca, which was developed by Bal, Kaashoek, and Tanenbaum. 121 00:20:54.420 --> 00:21:09.930 Nancy Lynch: The Orca system included some interesting algorithms to achieve distributed shared memory, but the algorithms were kind of complicated; it wasn't obvious that they were correct, or why they were correct. So it seemed like a good subject for theoretical work. 122 00:21:11.820 --> 00:21:19.920 Nancy Lynch: Okay, so what did we do? We considered an algorithm to implement sequentially consistent read/update shared memory,
123 00:21:20.610 --> 00:21:39.120 Nancy Lynch: using a basic communication service that had a combination of broadcast and point-to-point communication. This was as in the Amoeba operating system. So this algorithm is based on the Orca system, which was used for writing applications for clusters of workstations. 124 00:21:40.380 --> 00:21:52.200 Nancy Lynch: So Orca divided up the work into two parts. First, it defined an abstract multicast channel, which has very strong ordering and causality properties. 125 00:21:52.860 --> 00:22:01.980 Nancy Lynch: And then it implemented sequentially consistent memory over an arbitrary multicast channel, using a partial replication strategy. 126 00:22:02.820 --> 00:22:20.520 Nancy Lynch: You have copies of the data at some of the nodes; you can read any copy, but you have to update all copies. It also implemented the multicast channel over the basic broadcast and point-to-point communication services, using a protocol that's based on sequence numbers. 127 00:22:23.250 --> 00:22:31.080 Nancy Lynch: Okay, so what did we do? Well, we specified the message delivery and ordering requirements of the multicast channel mathematically. Okay. 128 00:22:32.190 --> 00:22:51.420 Nancy Lynch: Then we defined an algorithm like theirs, a sequence-number-based algorithm, to implement the multicast channel over the basic communication services, and we tried to prove the algorithm correct. But it didn't work: we discovered an algorithmic error in the Orca implementation. 129 00:22:52.890 --> 00:23:00.870 Nancy Lynch: It was something subtle: they didn't piggyback certain needed sequence numbers on some response messages. 130 00:23:01.980 --> 00:23:15.330 Nancy Lynch: Okay, the error was fixed in the algorithm, and the actual system also had to be repaired. In fact, the system developers were very surprised that this bug was found using theory.
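The flavor of sequence-number-based ordered delivery can be sketched as follows. This is an illustrative fragment of my own, not the actual Orca/Amoeba protocol, and it omits the piggybacking of sequence numbers on response messages that was at issue in the bug: each message carries a sequence number, and a receiver delivers strictly in sequence order, buffering any early arrivals.

```python
class OrderedReceiver:
    """Deliver messages to the application in sequence-number order."""

    def __init__(self):
        self.next_seq = 0      # next sequence number we expect to deliver
        self.pending = {}      # early arrivals, keyed by sequence number
        self.delivered = []    # messages delivered in order

    def receive(self, seq, msg):
        self.pending[seq] = msg
        # Deliver as long as the next expected message is available.
        while self.next_seq in self.pending:
            self.delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
```

For example, if message 1 arrives before message 0, it is buffered, and both are delivered in order once message 0 arrives. A proof of the ordering property has to account for every path a sequence number can travel, which is exactly the kind of case analysis that exposed the missing piggybacked numbers.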
131 00:23:16.890 --> 00:23:26.820 Nancy Lynch: They were kind of skeptical of the theoretical efforts at the time. But yes, trying to do a proof unearthed the error, and then, after the fix, of course we could complete the proof. 132 00:23:28.560 --> 00:23:28.800 Okay. 133 00:23:30.360 --> 00:23:35.700 Nancy Lynch: So, all right. So this is what I just said: we defined this algorithm and proved it correct. 134 00:23:36.390 --> 00:23:49.080 Nancy Lynch: And then, for the higher level, we defined a version of the partial replication algorithm to implement the sequentially consistent memory over the multicast channel, which is a generalization of the Orca algorithm. 135 00:23:50.190 --> 00:23:55.590 Nancy Lynch: And we developed a somewhat new proof technique for proving sequential consistency. 136 00:23:56.010 --> 00:24:10.440 Nancy Lynch: And that was all based on I/O automata composition and abstraction techniques; the things that we were studying in terms of basic foundations for concurrency were exactly what we were using for proving these real algorithms. Okay. 137 00:24:11.760 --> 00:24:26.820 Nancy Lynch: Alright, so a summary of just these last two parts: what did we do? We looked at a bunch of existing distributed systems; we used I/O automata and these formal methods to model and verify many such systems, 138 00:24:28.140 --> 00:24:35.580 Nancy Lynch: especially the kind that had strong consistency requirements for the data. We specified the properties that were required, 139 00:24:36.840 --> 00:24:52.560 Nancy Lynch: defined abstract versions of the systems, and proved the algorithms correct. That meant we had to clarify ambiguities, and we found and fixed some errors, the ones I told you about, and also at least one other in an existing system. 140 00:24:54.300 --> 00:24:54.810 Nancy Lynch: Okay. 141 00:24:57.390 --> 00:25:10.860 Nancy Lynch: Alright, so one more algorithm example: the work on reconfigurable atomic memory.
So here I'm going to talk about this last paper, by Gilbert, Lynch, and Shvartsman, 142 00:25:11.340 --> 00:25:21.690 Nancy Lynch: on Rambo, which is a reconfigurable atomic memory service that works in a dynamic network. So this is just a picture from a news story on this, 143 00:25:23.160 --> 00:25:29.940 Nancy Lynch: which shows a centralized memory brain being implemented on a bunch of distributed mobile devices. 144 00:25:33.690 --> 00:25:39.810 Nancy Lynch: Okay, so our goal is to implement atomic read/write shared memory in a dynamic network setting. 145 00:25:40.530 --> 00:25:53.160 Nancy Lynch: So atomic memory means it's supposed to look like a centralized shared memory that everybody accesses. Dynamic means that the participants can join and leave and fail during the computation; that's a dynamic network. 146 00:25:54.240 --> 00:25:57.870 Nancy Lynch: Examples would be mobile networks and peer-to-peer networks. 147 00:26:00.150 --> 00:26:11.880 Nancy Lynch: We want high availability and low latency, and we want to maintain atomicity in all cases. But, you know, we'd like good performance when there are some limits on the changes. 148 00:26:13.650 --> 00:26:23.700 Nancy Lynch: So examples that inspired this work are soldiers in a military operation who don't have any fixed infrastructure that they're using, like in the first 149 00:26:25.080 --> 00:26:42.810 Nancy Lynch: war in Afghanistan, where you have a bunch of soldiers going into an area and they don't have any communication infrastructure. So they have to either bring their own on a truck, or else there ought to be some techniques that can run just on the mobile devices. 150 00:26:45.720 --> 00:26:52.980 Nancy Lynch: Okay.
The starting point for this work was work on static networks by Attiya, Bar-Noy, and Dolev. 151 00:26:54.120 --> 00:27:03.000 Nancy Lynch: So basically, their protocol for maintaining atomic memory in a static network involved quorums of processors; you can think of majorities, if you like, but 152 00:27:03.600 --> 00:27:12.810 Nancy Lynch: they basically have some read quorums and some write quorums, and the key condition, as for majorities, is that every read quorum has to intersect every write quorum. 153 00:27:15.750 --> 00:27:17.880 Nancy Lynch: They replicated the objects everywhere, 154 00:27:19.080 --> 00:27:21.420 Nancy Lynch: with tags indicating versions. 155 00:27:22.470 --> 00:27:30.270 Nancy Lynch: And then their algorithm to read was: contact a read quorum, determine the latest version, propagate it to a write quorum, and return it. 156 00:27:31.500 --> 00:27:39.870 Nancy Lynch: To write, you contact a read quorum to determine the latest tag, choose a bigger tag, and write your value with that tag to a write quorum. 157 00:27:41.280 --> 00:27:51.810 Nancy Lynch: Okay, so these are both sort of two-phase protocols; you can optimize so that you don't always need the propagation phase for a read, and that saves some 158 00:27:53.130 --> 00:27:54.750 Nancy Lynch: communication and time. 159 00:27:57.630 --> 00:28:09.570 Nancy Lynch: So in this algorithm, the nice thing is that the operations can all proceed concurrently, and they can interleave at a very fine granularity, just the individual sends and receives, 160 00:28:10.920 --> 00:28:14.370 Nancy Lynch: and yet you still get atomicity out of the entire algorithm. 161 00:28:15.990 --> 00:28:19.500 Nancy Lynch: OK, so now the Rambo algorithm, inspired by this one. 162 00:28:20.670 --> 00:28:31.230 Nancy Lynch: Okay, first of all, Rambo stands for Reconfigurable Atomic Memory for Basic Objects, which means read/write objects. It also suggests a soldier
163 00:28:32.970 --> 00:28:34.680 Nancy Lynch: in a, you know, military setting. But 164 00:28:35.970 --> 00:28:51.480 Nancy Lynch: anyway, what Rambo is going to use is going to be similar, but now it's going to use separate configurations. Each configuration has a set of members, which are just a subset of the processors, and has read quorums and write quorums. 165 00:28:53.040 --> 00:29:14.760 Nancy Lynch: Okay. So instead of just having one configuration, as in the Attiya-Bar-Noy-Dolev algorithm, now you have a set of possible configurations. Okay, the objects are replicated, now not everywhere, but just at all members of a configuration. The reads and writes just access quorums of that configuration. 166 00:29:15.990 --> 00:29:37.530 Nancy Lynch: And that works the way it did in the ABD algorithm, and that's fine for handling small and transient changes. But if you want to have more permanent changes, you have to reconfigure to a new configuration, which means moving the object copies to members of the new configuration. 167 00:29:39.690 --> 00:29:55.140 Nancy Lynch: OK. So the algorithm at a high level is divided into the main algorithm, which does all the reads and writes and handles the new configurations, and a reconfiguration service, which basically determines the new configurations. 168 00:29:56.160 --> 00:30:01.230 Nancy Lynch: The reconfiguration service just supplies a consistent series of configurations. 169 00:30:02.340 --> 00:30:18.900 Nancy Lynch: And it can be triggered by, you know, some external administrator who decides now it's time to start a new configuration. The main algorithm does all the other work: it handles the reading and writing of the objects, removes the old configurations, and then 170 00:30:20.370 --> 00:30:27.780 Nancy Lynch: the trick is that the reads and writes use all the currently active configurations. 171 00:30:28.650 --> 00:30:41.040 Nancy Lynch: Okay.
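The quorum-based reads and writes that Rambo generalizes can be sketched, in the single-configuration ABD setting, roughly as follows. This is an illustrative sketch with hypothetical names, not code from either paper; a tag is a (sequence number, writer id) pair, and atomicity rests on every read quorum intersecting every write quorum.

```python
class Replica:
    """One copy of the object, holding the latest (tag, value) it has seen."""

    def __init__(self):
        self.tag, self.value = (0, 0), None

    def query(self):                     # phase 1: report current (tag, value)
        return self.tag, self.value

    def update(self, tag, value):        # phase 2: keep only newer tags
        if tag > self.tag:
            self.tag, self.value = tag, value

def write(replicas, read_q, write_q, writer_id, value):
    # Phase 1: learn the latest tag from a read quorum.
    max_seq = max(replicas[i].query()[0][0] for i in read_q)
    tag = (max_seq + 1, writer_id)       # choose a strictly bigger tag
    # Phase 2: write (tag, value) to a write quorum.
    for i in write_q:
        replicas[i].update(tag, value)

def read(replicas, read_q, write_q):
    # Phase 1: find the latest (tag, value) at a read quorum.
    tag, value = max(replicas[i].query() for i in read_q)
    # Phase 2: propagate it to a write quorum before returning.
    for i in write_q:
        replicas[i].update(tag, value)
    return value
```

In Rambo, each phase would run against quorums of every currently active configuration rather than a single fixed one.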
So instead of just trying to synchronize one configuration at a time, you might have some overlap, and during the period of overlap you simply have the reads and writes do a little extra work by using both 172 00:30:42.120 --> 00:30:43.320 Nancy Lynch: Configurations. 173 00:30:45.450 --> 00:30:53.040 Nancy Lynch: OK, so the reads and writes do a two-phase protocol that's like the ABD protocol: collect object information from read-quorums, 174 00:30:53.760 --> 00:31:07.410 Nancy Lynch: But now it's not just one configuration, it's all the currently active configurations. And then in phase two, it propagates the latest object information to the write-quorums of all the known active configurations. 175 00:31:09.540 --> 00:31:13.770 Nancy Lynch: And you can have lots of reads and writes going on concurrently, just as in ABD. 176 00:31:14.880 --> 00:31:16.650 Nancy Lynch: You get atomicity as before. 177 00:31:17.670 --> 00:31:31.350 Nancy Lynch: And the termination is a kind of fixed-point condition which involves a quorum for every known active configuration. So while you're doing this, you can learn about new configurations, and then you have to use them too. 178 00:31:32.370 --> 00:31:38.640 Nancy Lynch: But at some point, you have a fixed point where you have satisfied the condition for every configuration you know about. 179 00:31:41.340 --> 00:31:49.230 Nancy Lynch: And then you have this protocol to remove old configurations. Turns out that's also a two-phase protocol. It's kind of cute. 180 00:31:50.670 --> 00:31:51.660 Nancy Lynch: In phase one, 181 00:31:52.830 --> 00:32:00.420 Nancy Lynch: For every old configuration, everything but the latest that you know about, you tell a write-quorum of the old configuration about the new one, 182 00:32:01.470 --> 00:32:11.820 Nancy Lynch: And you collect the latest object information from the old configuration. In phase two,
You just propagate that latest object information to a write-quorum of the new configuration. 183 00:32:13.440 --> 00:32:27.000 Nancy Lynch: OK, so now you have reads and writes and garbage collection all going on concurrently, interleaving at a fine granularity, and you still get atomicity. So it's kind of a neat algorithm. 184 00:32:28.230 --> 00:32:33.990 Nancy Lynch: Okay. And then what do you do about implementing reconfiguration? You can use consensus. 185 00:32:35.040 --> 00:32:38.580 Nancy Lynch: Just use consensus to determine the succession of configurations. 186 00:32:40.380 --> 00:32:55.020 Nancy Lynch: That's a heavyweight mechanism, but we don't use it on every operation; we just use it for occasional changes of configuration. So you're not delaying the read and write operations. 187 00:32:56.070 --> 00:32:56.400 Okay. 188 00:32:57.990 --> 00:33:03.690 Nancy Lynch: And you can implement consensus using the DLS paper or Paxos. 189 00:33:04.920 --> 00:33:15.240 Nancy Lynch: And all this used timed I/O automata, actually a timed version of the I/O automaton model that takes into account time delays. 190 00:33:17.070 --> 00:33:30.900 Nancy Lynch: And there's a nice method, if you have to prove atomicity for a distributed system, a nice method based on partial orders; it's described there and used in many other places. Okay. 191 00:33:33.210 --> 00:33:41.790 Nancy Lynch: Alright, so that's what I wanted to say about algorithms. It is odd not to have feedback and pauses for questions. 192 00:33:42.900 --> 00:33:46.860 Nancy Lynch: But I guess you can't do that in this format. Okay, well, I'll go on. 193 00:33:48.450 --> 00:33:56.520 Nancy Lynch: In the next part, we're going to talk about the impact of some selected impossibility results; I'll try to give you the flavor of these results.
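The quorum-based protocol described above, majority-style read- and write-quorums over tagged replicas with two-phase reads and writes, can be sketched in a few lines. This is a toy, single-threaded illustration of the idea, not the paper's formal model; the names (`Replica`, `abd_read`, `abd_write`) are my own:

```python
from itertools import combinations

def majority_quorums(n):
    """All subsets of n replicas of size floor(n/2)+1; any two such sets intersect."""
    return [set(c) for c in combinations(range(n), n // 2 + 1)]

def quorums_intersect(read_quorums, write_quorums):
    """The key condition: every read-quorum must intersect every write-quorum."""
    return all(r & w for r in read_quorums for w in write_quorums)

class Replica:
    """One copy of the object, with a tag indicating its version."""
    def __init__(self):
        self.tag, self.value = 0, None

def abd_write(replicas, read_q, write_q, value):
    # Phase 1: contact a read-quorum to determine the latest tag.
    new_tag = max(replicas[i].tag for i in read_q) + 1
    # Phase 2: write the value, with the bigger tag, to a write-quorum.
    for i in write_q:
        replicas[i].tag, replicas[i].value = new_tag, value

def abd_read(replicas, read_q, write_q):
    # Phase 1: contact a read-quorum and determine the latest version.
    latest = max((replicas[i] for i in read_q), key=lambda r: r.tag)
    # Phase 2: propagate the latest version to a write-quorum, then return it.
    for i in write_q:
        if replicas[i].tag < latest.tag:
            replicas[i].tag, replicas[i].value = latest.tag, latest.value
    return latest.value
```

Because the writer's write-quorum intersects every later read-quorum, a read is guaranteed to see the latest tag, which is what the quorum-intersection condition buys you.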
194 00:33:57.900 --> 00:34:16.320 Nancy Lynch: The starting point here is that distributed algorithms have very strong inherent limitations. Why? Because they have to work in settings where processors only know what's happening locally, there's all this uncertainty about what's happening when, and there can be failures. 195 00:34:18.330 --> 00:34:25.530 Nancy Lynch: So once you have theoretical models, as we do for the algorithms, you can actually prove that these limitations hold. 196 00:34:26.730 --> 00:34:41.130 Nancy Lynch: So the starting point for me in this area was a nice little paper by Cremers and Hibbard way back in 1976. As far as I know, that's the first impossibility result for distributed computing. 197 00:34:43.110 --> 00:34:50.610 Nancy Lynch: So these were my colleagues at the University of Southern California at the time. The work was in a shared memory setting. 198 00:34:52.440 --> 00:34:58.830 Nancy Lynch: They looked at a setting with Boolean shared variables and arbitrary operations, not just reads and writes. 199 00:34:59.520 --> 00:35:10.980 Nancy Lynch: And the problem they considered was fair mutual exclusion: processors can come in wanting the resource, and you want this to be fair, so everybody that requests the resource eventually gets it. 200 00:35:11.700 --> 00:35:29.220 Nancy Lynch: Turns out, and they proved, that this is unsolvable for two processors with a single Boolean shared variable. No matter how powerful an operation you allow on this variable, two processors can't arbitrate 201 00:35:31.050 --> 00:35:35.550 Nancy Lynch: To get fair access to a resource using a Boolean variable. 202 00:35:37.740 --> 00:35:38.310 Nancy Lynch: So, 203 00:35:40.380 --> 00:35:49.140 Nancy Lynch: Yeah. So the proof is based on constructing a collection of executions and showing that one of these has to lead to an incorrect result.
204 00:35:49.740 --> 00:35:58.650 Nancy Lynch: They use the fact that the processors will behave the same in two executions in which their own states and the shared memory contents are the same, 205 00:35:59.760 --> 00:36:03.330 Nancy Lynch: Regardless of the internal states of the other processes. 206 00:36:04.710 --> 00:36:07.590 Nancy Lynch: So this is kind of a, you know, a light bulb 207 00:36:08.970 --> 00:36:16.170 Nancy Lynch: Going off. Oh, you could prove this type of limitation based on this limitation of locality. 208 00:36:17.190 --> 00:36:27.600 Nancy Lynch: I should apologize because I'm mixing the terms process and processor, but it's mixed in the research literature. So I'll mostly say process now, 209 00:36:28.770 --> 00:36:30.360 Nancy Lynch: With the impossibility results. 210 00:36:31.590 --> 00:36:39.540 Nancy Lynch: Okay, so I'm going to do a quick tour of a little lower bound on the number of shared variables needed for mutual exclusion. 211 00:36:40.800 --> 00:36:54.480 Nancy Lynch: And then I'll look at a lower bound on the time to reach consensus, and that's warming up to the FLP result, the impossibility of distributed consensus in asynchronous systems. 212 00:36:55.800 --> 00:37:03.000 Nancy Lynch: Okay. And then if there's time, I'll talk briefly about the CAP theorem, which is somewhat more recent. 213 00:37:05.370 --> 00:37:20.280 Nancy Lynch: Okay, so first, our work on mutual exclusion. When we started working on mutual exclusion, first we focused on the size of the shared memory and tried to generalize from Cremers and Hibbard's results. We got results about n processes, instead of two processes. 214 00:37:22.260 --> 00:37:39.930 Nancy Lynch: So that's the first paper on the slide. The second one is the one I'll talk about now. This gives a bound not on the size of a shared variable,
But on the number of shared variables, and now we're looking at shared variables that just support reads and writes. 215 00:37:41.160 --> 00:37:46.020 Nancy Lynch: So this appeared in 1993, but it was actually proved way back, 216 00:37:47.700 --> 00:38:05.250 Nancy Lynch: 10 years before. So this actually gives an important limitation on the power of read/write shared variables compared to variables with more powerful operations. If you try to solve the same problem with more powerful read-modify-write operations, one variable is enough. 217 00:38:07.620 --> 00:38:07.980 Okay. 218 00:38:09.240 --> 00:38:18.240 Nancy Lynch: So here we go: mutual exclusion for n processes using read/write shared memory requires at least n separate shared variables. 219 00:38:18.540 --> 00:38:31.440 Nancy Lynch: And this is true even if you don't require fairness. So, you know, the Cremers and Hibbard result doesn't apply here. You just require the system to keep making progress: if there's someone who wants the resource, someone should get it. 220 00:38:32.700 --> 00:38:40.650 Nancy Lynch: We allow every process to read and write all the variables, and the variables now can be not just Boolean but of unbounded size. 221 00:38:41.880 --> 00:38:43.560 Nancy Lynch: So this is a very 222 00:38:44.670 --> 00:38:53.880 Nancy Lynch: Permissive setting. You just have to try to make progress, everybody can read and write all variables, the variables have unbounded size, and still 223 00:38:54.600 --> 00:39:15.660 Nancy Lynch: You can't solve the problem. And I'll just show an example of why you can't do it for two processes and only one shared variable, okay? Suppose you have two processes, p1 and p2, and they solve mutual exclusion with progress, using just one read/write shared variable called x. 224 00:39:17.100 --> 00:39:24.600 Nancy Lynch: Okay, so let's look at this scenario. This is how these proofs go: you construct scenarios that have contradictory requirements.
225 00:39:25.260 --> 00:39:36.150 Nancy Lynch: Suppose process one arrives and nobody else is around. Process one comes in by itself, wanting the resource. So by the progress requirement, it has to be able to get it. 226 00:39:36.780 --> 00:39:55.470 Nancy Lynch: Somebody wants the resource, so somebody has to get it, and only process one is around. Okay. Along the way, I claim that the process must write to the shared memory. Why? Because if it didn't write anything, the other process wouldn't know it was there, wouldn't know that it advanced from 227 00:39:56.580 --> 00:40:13.020 Nancy Lynch: Outside the system into the critical region. So if process two then came in, it would be able to get the resource, because it thinks it's alone, and then they would both have the resource; that would contradict mutual exclusion. 228 00:40:14.250 --> 00:40:23.040 Nancy Lynch: Okay, so here's a picture: process one runs, it has to write to the shared variable, and then it has to eventually get the resource. 229 00:40:24.120 --> 00:40:34.350 Nancy Lynch: OK, so now let's consider an execution in which process one pauses right here, just before it was ready to write; it's in a state where it's about to write. 230 00:40:35.730 --> 00:40:40.740 Nancy Lynch: At that point, let's let process two arrive, and 231 00:40:42.150 --> 00:40:48.540 Nancy Lynch: It arrives just while process one is paused. Process one hasn't written, so process two thinks it's alone. 232 00:40:49.830 --> 00:40:59.460 Nancy Lynch: And it gets the resource, because it can't see process one at all. Along the way, it has to write the shared variable, for the same reason as before. 233 00:41:00.720 --> 00:41:08.730 Nancy Lynch: Okay, so it comes in, gets the resource, and now at this point, so we're just looking at this branch of the execution, 234 00:41:09.960 --> 00:41:12.000 Nancy Lynch: At this point, let's let 235 00:41:13.440 --> 00:41:23.190 Nancy Lynch: Process one resume. The first thing it does is to write the shared variable.
But what does that do? It overwrites whatever process two wrote there. 236 00:41:24.510 --> 00:41:29.010 Nancy Lynch: So now process two's presence isn't visible any longer. 237 00:41:30.360 --> 00:41:41.850 Nancy Lynch: Okay, so it overwrites whatever process two wrote there, and then process one can't tell process two is there, so it will behave exactly the way it did in the original execution, 238 00:41:43.020 --> 00:41:52.410 Nancy Lynch: Advancing and getting the resource. It will overwrite, and then it gets the resource. And at this point, you have two processes 239 00:41:53.460 --> 00:41:58.320 Nancy Lynch: In the critical section, and you've violated the mutual exclusion condition. 240 00:41:59.970 --> 00:42:03.030 Nancy Lynch: Okay, at this point I usually look to see if people 241 00:42:04.650 --> 00:42:08.070 Nancy Lynch: Are more or less following, but I can't see anyone. So, oh well. 242 00:42:09.300 --> 00:42:17.580 Nancy Lynch: Okay, so it's a contradiction. If there are more processes, you need more variables, and it's the same kind of argument. 243 00:42:18.570 --> 00:42:28.920 Nancy Lynch: For n processes, you need n shared variables. The construction is more intricate, but it uses the same ideas: writing to a shared variable overwrites the previous contents, 244 00:42:29.340 --> 00:42:39.270 Nancy Lynch: And a process can see only its own state and the values that it actually reads from the shared variables. So using those facts, you get this extended 245 00:42:40.320 --> 00:42:41.190 Nancy Lynch: Impossibility. 246 00:42:44.850 --> 00:42:46.680 Nancy Lynch: OK, so 247 00:42:48.240 --> 00:42:49.110 Nancy Lynch: Moving along, 248 00:42:51.870 --> 00:42:51.960 I'm 249 00:42:53.430 --> 00:43:08.700 Nancy Lynch: Going to look at two results on consensus. I'm listing several papers; the first three are for the synchronous model. The first one is the lower bound on time for consensus in the synchronous model.
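The two-process scenario just walked through can be replayed as a tiny script. This is only an illustration of the key step in the proof sketch, that a resumed writer destroys the evidence that anyone else was there, with invented names and values, not an actual mutual exclusion algorithm:

```python
class Register:
    """A single read/write shared variable: a write overwrites the old contents."""
    def __init__(self, contents):
        self.contents = contents
    def read(self):
        return self.contents
    def write(self, v):
        self.contents = v

x = Register("empty")

# Process 1 arrives alone and is about to announce itself in x,
# but pauses just before its first write.
p1_pending = "p1 is here"

# Process 2 arrives while process 1 is paused.  The register still says
# "empty", so process 2 thinks it's alone, announces itself, and takes
# the resource.
assert x.read() == "empty"
x.write("p2 is here")
p2_has_resource = True

# Process 1 resumes.  Its first step is the write it was about to do,
# which overwrites process 2's announcement.
x.write(p1_pending)

# The register now looks exactly as it would have if process 1 had run
# alone, so process 1 also takes the resource: mutual exclusion fails.
p1_has_resource = (x.read() == "p1 is here")
```

The point is the irreversibility of `write`: unlike a read-modify-write operation, a plain write cannot depend on, or preserve, what was there before.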
250 00:43:09.750 --> 00:43:28.770 Nancy Lynch: The next paper actually extends that to k-set consensus, where you can agree on k values instead of one, and this one is a good paper that unifies many impossibility proofs for distributed consensus problems. And then I'll talk about the FLP result. 251 00:43:33.000 --> 00:43:42.360 Nancy Lynch: Okay, this is just the same slide as before, defining the consensus problem. Recall everyone starts with a value; they have to agree on a value in the set. 252 00:43:43.770 --> 00:43:50.940 Nancy Lynch: And they have to satisfy the agreement property, and a validity property that says if they all start with the same value, that's the only decision value. 253 00:43:54.180 --> 00:43:59.670 Nancy Lynch: OK, so the lower bound on the number of rounds for consensus in synchronous systems. 254 00:44:00.960 --> 00:44:05.400 Nancy Lynch: Okay, the starting point here was to notice that all the known algorithms for 255 00:44:06.420 --> 00:44:15.060 Nancy Lynch: fault-tolerant agreement in synchronous systems had used at least f plus one rounds, where f is the number of possible failures. 256 00:44:16.620 --> 00:44:29.580 Nancy Lynch: And so you wonder, why is that? Does it have to be the case? Turns out that's inherent: the f plus one rounds are actually needed in the worst case, even if you're only talking about simple stopping failures. 257 00:44:31.260 --> 00:44:31.650 Nancy Lynch: Okay. 258 00:44:32.790 --> 00:44:42.600 Nancy Lynch: So, all right, brief idea: assume for contradiction that you have an f-round agreement algorithm that tolerates f failures. 259 00:44:45.180 --> 00:44:51.780 Nancy Lynch: And we'll assume, to make things simple, but this is all without loss of generality, that you have an n-node complete graph. 260 00:44:52.800 --> 00:45:05.100 Nancy Lynch: The decision set is just zero and one, decisions are reached at the end, and everybody sends something to everyone else at every round.
Okay, so these are just for simplicity. 261 00:45:07.800 --> 00:45:25.530 Nancy Lynch: Okay, so let's just look at the base case, and I'll just give a hint at the next case. So the base case is just: you can't have an n-process, one-fault stopping agreement algorithm that always decides at the end of round one. 262 00:45:27.390 --> 00:45:43.110 Nancy Lynch: Okay, so you need at least two rounds to tolerate one failure. And we know this. Okay, proof by contradiction: suppose you have an algorithm for n processes that only works in one round and tolerates one failure. 263 00:45:44.850 --> 00:45:47.010 Nancy Lynch: Okay, so this is a new kind of argument. 264 00:45:48.150 --> 00:45:52.890 Nancy Lynch: You construct a whole chain of executions to get the contradiction. 265 00:45:54.300 --> 00:46:02.340 Nancy Lynch: So, I mean, in the last examples you just saw particular executions, but here we're going to systematically construct a whole chain of executions. 266 00:46:03.150 --> 00:46:15.570 Nancy Lynch: Each execution in the chain has at most one failure. The first execution will be designed so that it has to have decision value zero, and the last one will have to have decision value one. 267 00:46:16.740 --> 00:46:32.070 Nancy Lynch: But any two consecutive executions in the chain are going to look the same to some process that's non-faulty in both of the executions. So what does that mean? It means that process has to decide the same in both of those executions. 268 00:46:33.210 --> 00:46:50.100 Nancy Lynch: So those two executions must have the same unique decision value. So you're creeping your way from a guaranteed zero decision to a guaranteed one decision, but at every step along the way you have to maintain the same decision; you see the contradiction that way. 269 00:46:52.320 --> 00:47:08.670 Nancy Lynch: How might you do this?
Okay, start with an execution where everyone has input zero and there are no failures, and end with an execution where everyone has input one and there are no failures. Okay, so here's everybody with zero 270 00:47:10.020 --> 00:47:16.680 Nancy Lynch: initial values. Everyone succeeds in communicating with everyone, and we know the final decisions have to be zero. 271 00:47:19.200 --> 00:47:30.330 Nancy Lynch: So I'm going to try to work my way to a similar execution where they all start with one. Okay. So I'll start with this initial execution. 272 00:47:31.080 --> 00:47:44.250 Nancy Lynch: And now I'm going to creep slowly toward the other end of the chain. My first step in the chain will involve removing the one message from process one to process two; so that's missing here. 273 00:47:45.750 --> 00:47:58.560 Nancy Lynch: Okay, so these two executions look the same to everybody except processes one and two, so the others are going to decide the same in both executions. So again, here zero is the only possible decision value. 274 00:48:01.200 --> 00:48:05.520 Nancy Lynch: Okay, next execution. All right, now you can remove the message from one to three. 275 00:48:06.720 --> 00:48:07.620 Nancy Lynch: You still have 276 00:48:08.640 --> 00:48:12.480 Nancy Lynch: Processes that can't tell the difference between these two, 277 00:48:13.590 --> 00:48:15.360 Nancy Lynch: And so they will decide in the same way. 278 00:48:17.700 --> 00:48:17.940 Right. 279 00:48:19.050 --> 00:48:21.810 Nancy Lynch: And finally, you remove the message from one to four. 280 00:48:22.950 --> 00:48:25.950 Nancy Lynch: Still indistinguishable to some non-faulty process. 281 00:48:27.450 --> 00:48:30.210 Nancy Lynch: And you just keep going, but 282 00:48:31.440 --> 00:48:46.860 Nancy Lynch: Once you've removed all of process one's messages, you have to do something else. You change process one's input from zero to one. And you can do that because process one isn't sending any messages.
283 00:48:48.120 --> 00:48:54.090 Nancy Lynch: So no one else can tell the difference whether process one started with zero or started with one. 284 00:48:57.060 --> 00:49:05.280 Nancy Lynch: Okay, so in one step we change the value. So now we can say, now I want to change the value for process two, but I can't just start removing messages from two. 285 00:49:05.880 --> 00:49:15.630 Nancy Lynch: Why? Because we're only allowed to have one failure in each execution, and already process one has failed, because all its messages are missing. 286 00:49:16.590 --> 00:49:29.190 Nancy Lynch: So we can't just let process two fail as well. So instead, we first replace the missing messages one at a time, until we get this kind of setup, where process one isn't faulty anymore, 287 00:49:31.170 --> 00:49:39.180 Nancy Lynch: But starts with a one, and everyone else starts with zero. So now I think you can see the pattern: we play the same game to change the initial value 288 00:49:39.870 --> 00:49:57.900 Nancy Lynch: Of process two, and then do process three, and eventually we reach the end of the chain, where everybody has input one. The tiny changes at every step in this chain maintain the decision value, because consecutive executions look the same to someone. 289 00:49:59.610 --> 00:50:00.000 Okay. 290 00:50:02.820 --> 00:50:15.360 Nancy Lynch: Alright, so the way this continues is that you have a second theorem that says there's no n-process, two-fault stopping agreement algorithm in which everybody decides at the end of round two. 291 00:50:16.860 --> 00:50:25.650 Nancy Lynch: And you just have a more complicated chain. We're going to have a chain of executions; now we're allowed two failures at each execution in the chain.
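The one-round chain just described can be generated mechanically. A small sketch, using my own encoding of a one-round execution (a vector of inputs plus a matrix of which messages are delivered), which builds the chain and checks that each adjacent pair of executions looks the same to some process:

```python
def view(inputs, delivered, j):
    """What process j sees after one round: its own input plus the
    round-one messages it actually received."""
    n = len(inputs)
    return (inputs[j], tuple(inputs[i] if delivered[i][j] else None for i in range(n)))

def build_chain(n):
    """The chain from the all-zeros execution to the all-ones execution.
    For each process p in turn: remove p's messages one at a time, flip
    p's input while p is silent, then restore its messages one at a time,
    so each execution silences at most one process."""
    inputs, delivered = [0] * n, [[True] * n for _ in range(n)]
    chain = []
    def snap():
        chain.append(([*inputs], [row[:] for row in delivered]))
    snap()
    for p in range(n):
        for j in range(n):
            if j != p:
                delivered[p][j] = False   # remove one message
                snap()
        inputs[p] = 1                     # p sends nothing, so no one can tell
        snap()
        for j in range(n):
            if j != p:
                delivered[p][j] = True    # replace one message
                snap()
    return chain

def neighbors_indistinguishable(chain):
    """Every consecutive pair must look the same to some process."""
    n = len(chain[0][0])
    return all(
        any(view(i1, d1, j) == view(i2, d2, j) for j in range(n))
        for (i1, d1), (i2, d2) in zip(chain, chain[1:])
    )
```

For n = 4 the chain starts at everyone-zero with no failures and ends at everyone-one with no failures, so a one-round algorithm would be forced to decide the same value at both ends, which is the contradiction in the proof.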
292 00:50:26.370 --> 00:50:38.790 Nancy Lynch: Again, we start with all zeros and no failures, end with all ones and no failures, and maintain the property that each pair of consecutive executions is indistinguishable to some non-faulty process. 293 00:50:40.080 --> 00:50:46.200 Nancy Lynch: Okay. So, for example, how would we change process one's initial value from zero to one? 294 00:50:47.940 --> 00:50:55.590 Nancy Lynch: Well, okay, we want to work toward killing process one at the beginning of the execution, so we can change its initial value. 295 00:50:57.480 --> 00:50:58.110 Nancy Lynch: Okay. 296 00:51:00.750 --> 00:51:16.470 Nancy Lynch: All right, well, how are we going to remove all of process one's messages? We can remove all of its round-two messages just the way we did before, one by one. But we have this little problem: we can't then just start removing its round-one messages. Because suppose, 297 00:51:17.700 --> 00:51:25.740 Nancy Lynch: For example, we removed the round-one message from process one to process two. It wouldn't look the same to anybody. 298 00:51:26.880 --> 00:51:37.440 Nancy Lynch: Those two executions would not look the same to anybody, because removing this one message still allows process two to tell everybody about the failure. 299 00:51:39.210 --> 00:51:50.100 Nancy Lynch: Okay, so you have this picture: either the message is there and process two can tell everyone it got the message, or the message is not there and process two can tell everybody that it didn't get the message. 300 00:51:51.150 --> 00:52:03.060 Nancy Lynch: So it's more complicated. And the trick here is just to use more steps to remove the round-one message from one to two, and in those steps you have to fail both process one and process two. 301 00:52:04.230 --> 00:52:10.350 Nancy Lynch: So part of this is removing all of process two's round-two messages one by one, and replacing them one by one.
302 00:52:12.690 --> 00:52:19.020 Nancy Lynch: Yeah, I'm not going to go through any more details like that, but it's a more complicated chain involving two failures at each step. 303 00:52:20.700 --> 00:52:21.120 Nancy Lynch: Okay. 304 00:52:22.710 --> 00:52:33.690 Nancy Lynch: So, all right. Finally, I get to the most famous result, the one Rance mentioned, about reaching consensus in asynchronous systems. 305 00:52:37.590 --> 00:52:38.400 Nancy Lynch: Excuse me. 306 00:52:39.900 --> 00:52:43.230 Nancy Lynch: So the theorem is: in an asynchronous distributed system 307 00:52:44.850 --> 00:52:56.040 Nancy Lynch: In which at most one process may fail, by just simply stopping without any warning, it's impossible for the non-faulty processes to reach agreement reliably. 308 00:52:59.220 --> 00:53:07.830 Nancy Lynch: So this impossibility holds even if at most one process ever fails, and the failed process just stops. This is 309 00:53:08.370 --> 00:53:16.410 Nancy Lynch: Quite surprising. It seems counterintuitive, because you have so many processes, and we know that at most one is going to fail. 310 00:53:16.920 --> 00:53:34.320 Nancy Lynch: Then couldn't all the rest of them manage to ignore the process that might have failed, agree, and then later tell the remaining process what the answer is, in case it hasn't actually failed? It certainly seems so, but that doesn't work. 311 00:53:36.660 --> 00:53:53.040 Nancy Lynch: And the proof goes like this, by contradiction. Suppose you have a one-fault-tolerant asynchronous algorithm that solves consensus. And again, we just use the problem requirements to show that something has to go wrong. Okay, the value set can still be just binary. 312 00:53:54.630 --> 00:54:13.620 Nancy Lynch: Okay, so let's look at an execution. Now it's not proceeding round by round.
It's asynchronous, and an execution now is a sequence of steps where, in one step, one process gets one message, updates its state, and sends out a finite number of other messages, which go onto the channels 313 00:54:15.840 --> 00:54:22.050 Nancy Lynch: But aren't delivered yet. So it's just: somebody receives a message and sends a bunch of resulting messages. 314 00:54:24.960 --> 00:54:31.980 Nancy Lynch: Okay, we can assume that every message that's sent eventually gets delivered, but we don't have any time bounds here. This is asynchronous. 315 00:54:34.650 --> 00:54:43.620 Nancy Lynch: Okay, so this execution produces a sequence of configurations of the system, global configurations of all the processes and all the channels. 316 00:54:45.330 --> 00:54:52.650 Nancy Lynch: Okay, if you have an execution in which everybody starts with zero, you know the only allowable decision is zero. And the same for one. 317 00:54:54.300 --> 00:54:58.290 Nancy Lynch: If you have mixed inputs, you're allowed to have either decision. 318 00:55:00.450 --> 00:55:11.640 Nancy Lynch: And the core of the argument is to prove that the algorithm has to have a certain kind of pattern of configurations where a decision can get made. 319 00:55:12.570 --> 00:55:27.540 Nancy Lynch: A pattern consisting of four configurations. Let's call them C0, C1, D0, and D1. D0 results from C0 just by having a particular process i receive a particular message m. 320 00:55:29.340 --> 00:55:39.180 Nancy Lynch: Okay, configuration D1 results from configuration C1 in the same way: the same process i receiving the same message m. 321 00:55:41.400 --> 00:55:52.590 Nancy Lynch: And now the relationship between C0 and C1: C1 follows from C0 in one step, where some process j receives a message m-prime. 322 00:55:54.660 --> 00:55:55.170 Nancy Lynch: Okay.
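The case analysis that follows turns on a commutativity fact: two steps at two different processes can be applied in either order and yield the same global configuration. Here is a toy model of that fact, with an invented encoding of states and channels, just to make the commuting concrete:

```python
def step(config, proc, msg):
    """One asynchronous step: `proc` consumes `msg` from the channel,
    updates only its own local state, and puts a response message back."""
    states, channel = config
    new_states = dict(states)
    new_states[proc] = (states[proc], msg)            # fold msg into local state
    new_channel = (channel - {(proc, msg)}) | {("resp", proc, msg)}
    return new_states, frozenset(new_channel)

# A configuration: two processes with fresh state, two undelivered messages.
c0 = ({"pi": 0, "pj": 0}, frozenset({("pi", "m"), ("pj", "m'")}))

# Deliver m to pi and then m' to pj, or in the other order.
one_order = step(step(c0, "pi", "m"), "pj", "m'")
other_order = step(step(c0, "pj", "m'"), "pi", "m")
```

Since each step touches only its own process's state and its own message, the two orders produce identical configurations; so if one order were 0-valent and the other 1-valent, that would be a contradiction, which is how the "different processes" case gets ruled out below.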
323 00:55:56.220 --> 00:55:56.880 Nancy Lynch: So, 324 00:55:58.170 --> 00:56:21.780 Nancy Lynch: We have this funny pattern of configurations, and D0 determines that the final decision will be zero. It's called zero-valent: the only decision that's possible after this configuration is zero. And the only decision that's possible after D1 is one. 325 00:56:23.190 --> 00:56:33.750 Nancy Lynch: So somehow a decision is getting made in this very simple way, localizing the decision to a particular pattern of configurations. 326 00:56:34.740 --> 00:56:51.750 Nancy Lynch: Turns out, to get to this point, you make an argument that says if you never have this kind of pattern, you could make the algorithm continue executing forever, all processes continuing to take steps and no one ever deciding, which would contradict the termination requirement. 327 00:56:53.190 --> 00:56:55.650 Nancy Lynch: So yeah, eventually you get down to this. 328 00:56:57.180 --> 00:57:00.300 Nancy Lynch: And now you just have to consider two simple cases. 329 00:57:01.530 --> 00:57:05.610 Nancy Lynch: Suppose these are two different processes: i and j are two different processes. 330 00:57:06.750 --> 00:57:16.650 Nancy Lynch: Well, okay, then you're looking at a situation where a different process receives some message than the one process i received. 331 00:57:17.910 --> 00:57:19.920 Nancy Lynch: What happens if you let 332 00:57:21.150 --> 00:57:25.920 Nancy Lynch: Process j receive the message m-prime after D0? 333 00:57:27.990 --> 00:57:34.740 Nancy Lynch: Well, since it's after D0, you still have to be zero-valent, because you already determined that the decision has to be zero. 334 00:57:36.420 --> 00:57:38.730 Nancy Lynch: So we have: if we deliver 335 00:57:40.230 --> 00:57:47.220 Nancy Lynch: m to process i and then m-prime to process j, you get zero-valence. But if you do it in the other order, you get one-valence.
336 00:57:49.680 --> 00:57:55.920 Nancy Lynch: But these are two steps that happen at completely different processes, so the relative order can't possibly matter. 337 00:57:56.700 --> 00:58:05.700 Nancy Lynch: You know, one is receiving a message here and one is receiving another message there, and in either case they just put some output messages on the channels; it doesn't matter in what order that happens. 338 00:58:06.840 --> 00:58:23.220 Nancy Lynch: So it can't be two different processes. So now you're down to a case where they must be the same process, i equals j, and the only thing that's happening here is process i could be receiving one message m or a different message m-prime, and somehow that's going to make the difference. 339 00:58:24.300 --> 00:58:36.990 Nancy Lynch: But suppose you started from C0. So, another key is, we have a failure to play with; we're trying to tolerate a single failure. So what you can do is just fail process i. 340 00:58:38.100 --> 00:58:56.670 Nancy Lynch: Everybody else can take steps from this configuration. And we know that, since you can tolerate the failure of this one process i, they eventually have to reach a decision. But if you apply that same execution from D0 or D1, 341 00:58:57.690 --> 00:58:59.790 Nancy Lynch: You'll still get the same decision. 342 00:59:01.560 --> 00:59:08.430 Nancy Lynch: But we know that anything after D0 is supposed to reach a decision of zero, and after D1 is supposed to reach a decision of one. 343 00:59:09.690 --> 00:59:14.520 Nancy Lynch: The other processes can't distinguish which of these cases they're running in, because 344 00:59:16.440 --> 00:59:31.560 Nancy Lynch: We're not looking at the internals of process i; we're just looking at everybody else.
So this yields a contradiction, because they can't distinguish a situation where they have to decide zero from a situation where they have to decide one. 345 00:59:33.690 --> 00:59:40.110 Nancy Lynch: Okay, so as I suggested, this looks kind of mysterious. This is a paper you probably want to take a look at. 346 00:59:41.310 --> 00:59:44.970 Nancy Lynch: Okay, now, I'm supposed to finish precisely at 12? 347 00:59:48.090 --> 00:59:56.760 Rance Cleaveland: That's bizarre. But I think we have the Zoom, as it were, as opposed to the room, until 12:30, but I'd like to leave some time for questions. 348 00:59:57.720 --> 00:59:58.890 Nancy Lynch: Well, alright, so 349 00:59:59.820 --> 01:00:16.530 Nancy Lynch: I think I'll skip the CAP theorem part and probably the wireless network part, and just skip to something near the end of the talk. So, here, consensus. The upshot of this is that it's a very important problem in practice, 350 01:00:17.850 --> 01:00:26.130 Nancy Lynch: As I said, for database commit, for example, and the FLP result shows limitations on the kinds of algorithms you can hope to find. 351 01:00:27.540 --> 01:00:35.370 Nancy Lynch: Now, this problem has to be solved in practice. So what could you do? You can rely on timing assumptions, 352 01:00:36.090 --> 01:00:49.050 Nancy Lynch: Or you can very carefully weaken the problem requirements, so that you always have agreement and validity, but only require termination once the system behavior stabilizes. 353 01:00:49.890 --> 01:01:00.570 Nancy Lynch: So that's what has in fact been done. Okay, so there are lots of other impossibility results that have been proved. I'm going to skip over the CAP theorem. 354 01:01:02.250 --> 01:01:11.910 Nancy Lynch: Okay. Oh, foundations. Well, all of this is based on I/O automata and things like that, so 355 01:01:13.620 --> 01:01:14.010 Nancy Lynch: Yeah.
356 01:01:15.420 --> 01:01:15.900 Nancy Lynch: Um, 357 01:01:17.370 --> 01:01:22.590 Nancy Lynch: Not sure what to say about that. Maybe I've already said enough. These are papers to look at, to 358 01:01:24.570 --> 01:01:27.060 Nancy Lynch: Review the I/O automata framework. 359 01:01:28.290 --> 01:01:46.200 Nancy Lynch: And this is just an example of a system described using interacting I/O automata implementing a higher-level abstract component, also an automaton. This illustrates composition and abstraction relations for automata. 360 01:01:47.460 --> 01:01:51.630 Nancy Lynch: But we also did timed versions of this. There's a book about that. 361 01:01:52.680 --> 01:01:58.470 Nancy Lynch: And probabilistic versions. One was just recognized at CONCUR last year. 362 01:01:59.880 --> 01:02:04.740 Nancy Lynch: This is work mainly by Roberto Segala on probabilistic automata. 363 01:02:06.060 --> 01:02:12.540 Nancy Lynch: Okay, so the algorithms for new distributed systems are basically 364 01:02:14.460 --> 01:02:28.680 Nancy Lynch: things we've been working on more recently. The difference is that in these new types of systems, noise, uncertainty, and change predominate. So we're looking particularly at wireless networks and biological systems. 365 01:02:30.090 --> 01:02:33.090 Nancy Lynch: So, but it's the same type of research again. 366 01:02:34.290 --> 01:02:51.600 Nancy Lynch: Again, we're looking at abstract models for problems and algorithms, new algorithms, ways to prove correctness and performance properties, and impossibility results. And there's still work needed on general foundations for these very new types of systems. 367 01:02:53.520 --> 01:02:54.270 Nancy Lynch: So, 368 01:02:56.130 --> 01:03:02.820 Nancy Lynch: I'm going to skip the wireless networks entirely. It's really fun stuff, but there's not enough time, so 369 01:03:04.140 --> 01:03:11.640 Nancy Lynch: Hopefully we can post the slides.
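[Editorial note: the composition idea behind I/O automata mentioned above can be sketched minimally. This is a hypothetical toy API of my own, not the actual IOA/TIOA toolkit: actions are partitioned into inputs and outputs, and composing automata means an output action of one is performed jointly with the matching input action of the others.]

```python
# Minimal sketch of the I/O-automaton idea: each automaton has input and
# output actions and a deterministic transition function; matching
# actions of composed automata are taken together.

class Automaton:
    def __init__(self, name, state, inputs, outputs, transition):
        self.name, self.state = name, state
        self.inputs, self.outputs = set(inputs), set(outputs)
        self.transition = transition  # (state, action) -> new state

    def step(self, action):
        # An automaton moves only on actions in its signature.
        if action in self.inputs | self.outputs:
            self.state = self.transition(self.state, action)

def run(automata, actions):
    # Composition: each action is taken jointly by the automaton that
    # outputs it and every automaton that has it as an input.
    for a in actions:
        for aut in automata:
            aut.step(a)

# Example: a sender whose output action "send" is the receiver's input.
sender = Automaton("sender", 0, inputs=[], outputs=["send"],
                   transition=lambda s, a: s + 1)
receiver = Automaton("receiver", 0, inputs=["send"], outputs=[],
                     transition=lambda s, a: s + 1)
run([sender, receiver], ["send", "send"])
```

In the full framework an abstraction relation would then relate such a composition to a higher-level automaton; this sketch only shows the synchronization-by-shared-actions part.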
So if people are interested, they can take a look. Okay, for biological systems, I'll just say a few words. 370 01:03:12.420 --> 01:03:22.320 Nancy Lynch: The kinds of distributed algorithms we're looking at are in insect colonies, a little bit in developing organisms, and quite a bit more now in brain networks. 371 01:03:24.360 --> 01:03:34.380 Nancy Lynch: These are very simple models. I mean, the components are rather simple; the algorithms tend to be fairly simple, not elaborate. 372 01:03:35.580 --> 01:03:45.510 Nancy Lynch: The goals of this work are both to try to help understand biological systems in a new way, as algorithms, but also 373 01:03:46.020 --> 01:03:53.580 Nancy Lynch: to feed back and go the other way: once you understand how the biological systems operate, sometimes you can get new ideas for engineering algorithms, 374 01:03:53.880 --> 01:04:05.130 Nancy Lynch: like machine learning algorithms and results we've gotten from our understanding of brain operation. Okay, these are some references on our insect colony work. 375 01:04:07.320 --> 01:04:22.680 Nancy Lynch: Problems we've looked at have included task allocation, searching problems, a very interesting consensus problem of agreeing on a new nest, and problems where insect colonies have to estimate how many other insects there 376 01:04:24.540 --> 01:04:25.350 Nancy Lynch: actually are. 377 01:04:26.460 --> 01:04:29.610 Nancy Lynch: We've been publishing some of this work in biology journals. 378 01:04:31.680 --> 01:04:32.610 Nancy Lynch: And then our 379 01:04:33.960 --> 01:04:36.570 Nancy Lynch: recent work on brains uses 380 01:04:37.800 --> 01:04:48.570 Nancy Lynch: a stochastic spiking neural network model. We've studied problems like winner-take-all, which is a kind of leader election, used for focus of attention in the brain, 381 01:04:49.530 --> 01:04:59.940 Nancy Lynch: ways of coding concepts in neural networks, and ways of learning. And here's just a sampling of some of our recent papers on that.
382 01:05:01.380 --> 01:05:10.530 Nancy Lynch: OK, so the conclusions are that we've worked on theory for distributed systems, to help understand their capabilities and limitations. 383 01:05:11.580 --> 01:05:26.760 Nancy Lynch: These are the different types of work that have been involved, and this work has covered many kinds of systems over the years, starting with distributed data management systems, moving through communication systems, and currently different kinds of biological systems. 384 01:05:27.810 --> 01:05:35.700 Nancy Lynch: It's the same kind of aims, the same algorithmic lens, that we're applying to all of these different areas. 385 01:05:37.050 --> 01:05:44.310 Nancy Lynch: And there's still a lot more to do, for those of you who are interested in research, especially in these newer areas. 386 01:05:45.630 --> 01:05:51.270 Nancy Lynch: And I have to thank my literally hundreds and hundreds of collaborators on all this work. 387 01:05:52.440 --> 01:05:53.100 Nancy Lynch: And thank you. 388 01:05:55.080 --> 01:06:08.100 Rance Cleaveland: All right. Thank you very much, Nancy. And indeed, it is unusual giving a talk in this new-normal situation, as it were, for us. Normally we would all be clapping; I have to clap for everybody, on behalf of everybody else, 389 01:06:08.910 --> 01:06:15.300 Rance Cleaveland: including not only the people who are on the Zoom meeting, but also the people who are attending via the 390 01:06:16.440 --> 01:06:34.320 Rance Cleaveland: webinar mechanism, as well as on YouTube; this is broadcast on YouTube. So I'll open the floor for questions now. Those who are attending: if you have a question, please use the Q&A feature in Zoom, and I'll moderate the questions and read them out. 391 01:06:35.400 --> 01:06:44.040 Rance Cleaveland: I'll maybe get things started, though, while people start marshaling their questions.
I'm curious, in the 392 01:06:44.520 --> 01:06:57.990 Rance Cleaveland: in particular, in the sort of biological settings that you talked about, about the adaptations you had to make to the basic theoretical models of distributed computation. 393 01:07:00.090 --> 01:07:05.220 Nancy Lynch: Okay, so we've pretty much moved to just using synchronous models in those cases. 394 01:07:07.590 --> 01:07:09.000 Nancy Lynch: The models are stochastic. 395 01:07:10.980 --> 01:07:18.150 Nancy Lynch: For insect colonies, you basically have a sort of population-protocol-style model, where you have a bunch of insects that 396 01:07:19.170 --> 01:07:21.780 Nancy Lynch: may bump into each other, but we 397 01:07:23.760 --> 01:07:35.790 Nancy Lynch: need to have a simple kind of state that can put them in different kinds of modes, and the modes allow them to make certain decisions about the next action with certain probabilities. Their inputs 398 01:07:38.580 --> 01:07:45.810 Nancy Lynch: correspond to what they see in their environment, as well as interactions with other insects. 399 01:07:46.470 --> 01:07:59.130 Nancy Lynch: So yeah, I mean, it's a sort of agent-based model with some differences, but we stuck pretty much to keeping everything synchronous, just because insects and biological systems have really good notions of time. 400 01:08:00.390 --> 01:08:05.220 Nancy Lynch: So there's no reason to think about, you know, arbitrary race conditions as we did before. 401 01:08:06.750 --> 01:08:16.920 Nancy Lynch: The kinds of models for brain systems we're using are stochastic spiking neural networks.
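[Editorial note: the synchronous, population-protocol-style insect model described above can be sketched as a toy. This is my own illustration, not the model from the papers: each insect is a tiny state machine with a "mode", and in each synchronous round random pairs bump into each other; here an "informed" insect simply informs its partner.]

```python
import random

# Toy synchronous pairing model: one insect starts "informed"; in each
# round the colony is randomly paired up, and any pair containing an
# informed insect ends the round with both members informed.

def spread(n, rounds, seed=0):
    rng = random.Random(seed)
    mode = ["informed"] + ["idle"] * (n - 1)
    for _ in range(rounds):
        order = list(range(n))
        rng.shuffle(order)
        for i, j in zip(order[::2], order[1::2]):   # random pairing
            if "informed" in (mode[i], mode[j]):
                mode[i] = mode[j] = "informed"
    return mode.count("informed")
```

The count of informed insects is nondecreasing round by round, which is the kind of simple progress property these models make easy to state and prove.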
So those are 402 01:08:17.910 --> 01:08:35.040 Nancy Lynch: networks where you have firing that occurs based on accumulated potential from neighboring neurons, and for us it's stochastic, so the firing isn't guaranteed to happen when you reach a threshold, but the incoming potential 403 01:08:36.120 --> 01:08:37.080 Nancy Lynch: helps to 404 01:08:38.790 --> 01:08:40.770 Nancy Lynch: determine the probability of firing. 405 01:08:42.360 --> 01:08:51.660 Nancy Lynch: And the probability automatically makes the algorithms noise tolerant, because if you're going to work with that type of probabilistic noise, 406 01:08:53.490 --> 01:08:58.320 Nancy Lynch: yeah, you're going to be able to handle quite a lot of noise and uncertainty. 407 01:08:59.370 --> 01:08:59.850 Okay. 408 01:09:00.930 --> 01:09:09.240 Rance Cleaveland: So we have a question that came up on the Q&A, from Eugene: have you discovered any errors in biological networks? 409 01:09:11.880 --> 01:09:14.190 Nancy Lynch: In real biology? No. 410 01:09:15.600 --> 01:09:29.280 Nancy Lynch: Biology seems to work okay. I mean, there are certainly biology papers that show bizarre behavior on the part of real insect colonies, like ants that start walking around in a circle forever until they die. 411 01:09:31.410 --> 01:09:35.460 Nancy Lynch: Yeah, I mean, these are simple state machines, and maybe they have bugs. 412 01:09:38.250 --> 01:09:41.550 Rance Cleaveland: Yeah. Yes, yes, yes. Well, in fact, 413 01:09:43.140 --> 01:09:47.370 Nancy Lynch: If we find mistakes in a biological system, they are what they are, I guess. 414 01:09:49.020 --> 01:09:56.310 Rance Cleaveland: Right. And so there's a follow-up saying that, indeed, he was interested in these sorts of bizarre behaviors at large. 415 01:09:56.940 --> 01:10:13.620 Nancy Lynch: No, well, you can analyze that kind of thing.
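[Editorial note: the stochastic firing rule described above can be sketched in a few lines. This is my own simplified version, not the exact model from the papers: accumulated potential does not force a spike but sets the firing probability, here via a sigmoid; `bias` and `temperature` are parameters I introduce for the sketch.]

```python
import math
import random

def fire_prob(potential, bias=-1.0, temperature=1.0):
    # Sigmoid of (potential + bias): reaching the "threshold" only makes
    # firing likely, never certain.
    return 1.0 / (1.0 + math.exp(-(potential + bias) / temperature))

def step(weights, spikes, rng):
    # One synchronous round: each neuron i accumulates weighted incoming
    # spikes from its neighbors, then fires with probability fire_prob.
    n = len(weights)
    potentials = [sum(weights[i][j] * spikes[j] for j in range(n))
                  for i in range(n)]
    return [1 if rng.random() < fire_prob(p) else 0 for p in potentials]
```

Because every spike is probabilistic, an algorithm analyzed in this model is already analyzed under noise, which is the noise-tolerance point Nancy makes.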
We haven't done that much, but yeah, I mean, by describing the insects as simple state machines, you can try to reproduce the behavior. We're doing a lot of simulations now, 416 01:10:14.910 --> 01:10:22.740 Nancy Lynch: rather than plunging right into proving theorems. We're testing things out, because there's a lot going on, and you also want to play with the parameter values. 417 01:10:25.020 --> 01:10:28.500 Nancy Lynch: Right, right. Once we get observed behavior, we try to 418 01:10:28.500 --> 01:10:29.190 Nancy Lynch: prove it. 419 01:10:31.440 --> 01:10:40.440 Rance Cleaveland: Right. I guess the other thing, too, in line with that, is that in the case of biology you also have this sort of great 420 01:10:41.700 --> 01:10:46.320 Rance Cleaveland: I don't know what the right word is, debugging engine, in evolution, in that 421 01:10:47.880 --> 01:10:52.500 Rance Cleaveland: if too many ants start following each other in a circle, then the species is imperiled. 422 01:10:52.770 --> 01:10:53.760 Rance Cleaveland: Right, um, 423 01:10:55.350 --> 01:10:55.590 Rance Cleaveland: some 424 01:10:56.910 --> 01:10:57.420 Rance Cleaveland: sort of 425 01:10:59.370 --> 01:11:01.890 Nancy Lynch: Behavior like that. That's true, yeah. 426 01:11:02.160 --> 01:11:03.240 Rance Cleaveland: Yeah, no. 427 01:11:04.320 --> 01:11:04.830 Rance Cleaveland: Um, 428 01:11:06.030 --> 01:11:13.860 Rance Cleaveland: While we wait for more questions to come in, I had another question. This relates to something you mentioned at the beginning of your talk, about the 429 01:11:15.210 --> 01:11:22.260 Rance Cleaveland: 2007 Dijkstra award paper that was, I guess, published in '88.
Is that right? Am I remembering that 430 01:11:25.200 --> 01:11:29.850 Rance Cleaveland: And you alluded to the fact that the paper was sort of 431 01:11:30.900 --> 01:11:46.770 Rance Cleaveland: published and was sitting dormant, from a practical perspective, until, you know, a decade or so later, when it was suddenly picked up. What led to that? How did that happen? 432 01:11:47.910 --> 01:11:49.500 Rance Cleaveland: What's the story behind that? 433 01:11:50.430 --> 01:11:55.830 Nancy Lynch: People needed to build certain kinds of systems. Cloud storage is a big thing, and they needed to 434 01:11:57.510 --> 01:11:58.860 Nancy Lynch: you know, try to maintain 435 01:12:00.150 --> 01:12:05.190 Nancy Lynch: correctness in the face of failures of some number of servers. I'm guessing, you 436 01:12:05.760 --> 01:12:06.090 Nancy Lynch: know, I'm 437 01:12:06.780 --> 01:12:07.710 Wondering, because 438 01:12:08.730 --> 01:12:20.610 Rance Cleaveland: when something sits for a while, a lot of times people just, you know, rediscover it without realizing it, or rediscover sort of 439 01:12:21.240 --> 01:12:35.340 Rance Cleaveland: an erroneous version. It's really intriguing, and also encouraging to me, that the systems people in this case were able to uncover and put to good use the results that you'd proven. 440 01:12:37.110 --> 01:12:49.350 Nancy Lynch: Because there are people like Dahlia Malkhi and Lamport who stand between theory and the systems, so they know the theory, and when they see a systems problem they can recall some of the theory they've learned and apply it. 441 01:12:49.980 --> 01:12:51.660 Rance Cleaveland: Right, right, right. 442 01:12:56.460 --> 01:13:12.720 Rance Cleaveland: There was another comment in the Q&A,
picking up the theme of evolution, saying it would be weird if every species we observed wasn't at an evolutionary equilibrium, with no behavioral bugs. 443 01:13:14.100 --> 01:13:30.360 Rance Cleaveland: Of course, one evolutionary equilibrium is for the species to die out, I guess. And quite frankly, if you are in an equilibrium, that's probably the equilibrium you'd wind up in, because the environmental conditions will surely continue to change, even if the species doesn't. 444 01:13:31.980 --> 01:13:37.440 Nancy Lynch: Well, certainly. Yeah, there are species like fruit flies that have evolved over many, many generations. 445 01:13:38.670 --> 01:13:48.600 Nancy Lynch: And you might think they're in kind of a stable configuration as far as the mechanisms that they use. Some work by Saket Navlakha is worth looking at there. 446 01:13:49.980 --> 01:13:55.260 Rance Cleaveland: Yeah, yeah. So I have another question that just came in, from Funda, 447 01:13:56.280 --> 01:13:57.810 Rance Cleaveland: on the, on the 448 01:13:59.640 --> 01:14:15.600 Rance Cleaveland: Let's see, I'll just read you the question: how do the goals of distributed decision-making translate in insect colonies? They tend to be more complex; are partial solutions more acceptable? For instance, perhaps you don't need all, or that many, insects to come to an agreement. 449 01:14:17.070 --> 01:14:18.810 Rance Cleaveland: Does that enable any simplifications? 450 01:14:23.760 --> 01:14:35.130 Nancy Lynch: I would say yes, but you get algorithms with certain probabilities of failure. So I'm thinking of an example that we're working on currently, and this is 451 01:14:36.480 --> 01:14:39.660 Nancy Lynch: protocols that ants use to find a new nest.
452 01:14:40.170 --> 01:14:54.660 Nancy Lynch: So this is a consensus problem, but it's a consensus problem involving random searching: you decide what you think is the best nest and try to persuade others, and once you decide, you force others to follow you. 453 01:14:55.140 --> 01:15:03.330 Nancy Lynch: Now, in the ant experiments, you observe a certain probability that this doesn't succeed within a short amount of time. 454 01:15:04.260 --> 01:15:22.650 Nancy Lynch: So, you know, that kind of failure probability may be unacceptable in an engineering system. But also, a lot of those errors are self-correcting: if you wait a day, all the ants will wind up in the same nest, whereas the initial execution of the algorithm didn't accomplish that. 455 01:15:23.970 --> 01:15:37.200 Nancy Lynch: So yeah, I mean, they are noise tolerant and self-correcting, but there are new aspects, like: in a short timeframe it succeeds with a certain probability; in a much longer timeframe, 456 01:15:38.850 --> 01:15:40.500 Nancy Lynch: it succeeds with a higher probability. 457 01:15:42.600 --> 01:15:42.990 Nancy Lynch: So we 458 01:15:43.020 --> 01:15:51.240 Nancy Lynch: We've been working with insect biologists on this project, and we just submitted a paper to a biology journal. 459 01:15:52.260 --> 01:16:04.710 Nancy Lynch: It's based on a simulator which acts just like real ants of a certain species, and you can mimic the observed behavior very closely, matching the experimental biologists' observations. 460 01:16:06.510 --> 01:16:10.560 Nancy Lynch: But now we get to play with the simulations and see what else we can make the species do. 461 01:16:13.980 --> 01:16:24.300 Rance Cleaveland: Just out of curiosity, what causes these decisions to be made, that they need a new nest? Is there some sort of external environmental stressor?
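[Editorial note: the "short timeframe succeeds with some probability, longer timeframe with higher probability" point above can be illustrated with a toy simulation. This is my own crude stand-in, not the detailed Temnothorax house-hunting model from the papers: ants hold nest opinions and repeatedly adopt the choice of a randomly encountered nest-mate, a rough proxy for recruitment.]

```python
import random

def trial(n_ants, n_nests, deadline, rng):
    # Each ant starts favoring a random nest; in each round every ant
    # adopts the choice of a randomly encountered nest-mate (a crude
    # stand-in for persuasion/recruitment). Success = full consensus
    # on one nest within the deadline.
    opinion = [rng.randrange(n_nests) for _ in range(n_ants)]
    for _ in range(deadline):
        opinion = [opinion[rng.randrange(n_ants)] for _ in range(n_ants)]
        if len(set(opinion)) == 1:
            return True
    return False

def success_rate(deadline, trials=200, seed=0):
    rng = random.Random(seed)
    return sum(trial(10, 2, deadline, rng) for _ in range(trials)) / trials

short, long_run = success_rate(5), success_rate(80)
```

Consensus here eventually happens with probability approaching one, but any fixed short deadline succeeds only with some probability, which mirrors the self-correcting behavior Nancy describes in the ant experiments.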
462 01:16:24.300 --> 01:16:24.780 Rance Cleaveland: Yeah, it's 463 01:16:25.980 --> 01:16:28.470 Nancy Lynch: Kind of like: the experimentalist wrecks the nest. 464 01:16:29.340 --> 01:16:31.740 Rance Cleaveland: Okay. Okay, so then the ants 465 01:16:32.100 --> 01:16:34.020 Nancy Lynch: are forced to go find a new nest. 466 01:16:34.200 --> 01:16:37.140 Rance Cleaveland: Okay, so they're highly motivated to do so, presumably. 467 01:16:37.710 --> 01:16:38.190 Nancy Lynch: Yeah. 468 01:16:38.790 --> 01:16:39.180 Mm hmm. 469 01:16:40.800 --> 01:16:43.920 Nancy Lynch: They do this in the lab. They also do it in the desert; we visited. 470 01:16:45.090 --> 01:16:53.490 Nancy Lynch: Yeah, that's another story. But we looked at some activities of ants in the desert. The people in Arizona 471 01:16:54.600 --> 01:17:00.840 Nancy Lynch: study ants because they have quite a few locally available for study. 472 01:17:03.000 --> 01:17:03.600 Rance Cleaveland: Sort of the 473 01:17:04.800 --> 01:17:12.240 Rance Cleaveland: Are these algorithms typically fully decentralized, or is there something like leader election that goes on in these ant 474 01:17:12.240 --> 01:17:13.560 Nancy Lynch: Nothing is centralized. 475 01:17:13.950 --> 01:17:15.900 Nancy Lynch: Nothing centralized in that. 476 01:17:16.650 --> 01:17:24.120 Nancy Lynch: Well, I mean, in the brain algorithms, yeah, there could be some chemical that's flooding; dopamine can flood neurons, and that's kind of a 477 01:17:24.990 --> 01:17:37.170 Nancy Lynch: global thing. But in the insect colonies, no, there isn't any central direction of any kind. You might think there's a queen, and the queen would tell the others what to do, but she does not. 478 01:17:37.890 --> 01:17:38.460 Okay. 479 01:17:39.510 --> 01:17:40.680 Rance Cleaveland: I was wondering about that.
480 01:17:41.790 --> 01:17:51.450 Rance Cleaveland: And so does that mean that bees and ants are different? Or are bees also sort of decentralized, and the myth of the sort of all-powerful queen is sort of not right? 481 01:17:53.430 --> 01:17:57.900 Nancy Lynch: No, it's not that different; it's pretty much the same. Bees are very similar. 482 01:17:58.440 --> 01:17:58.830 Nancy Lynch: Right. 483 01:17:59.340 --> 01:18:07.020 Nancy Lynch: Bees are much smarter, though; they have more elaborate behaviors. They can indicate directions, for example, 484 01:18:09.150 --> 01:18:10.440 Nancy Lynch: by dancing in a certain way. 485 01:18:12.180 --> 01:18:18.480 Nancy Lynch: Fascinating behavior; maybe there are limitations on how much of that contributes to engineering, 486 01:18:19.980 --> 01:18:36.870 Nancy Lynch: because the insects work under a lot of constraints that many computers don't have. But on the other hand, they also have strategies for resilience and robustness; they're highly fault tolerant and noise tolerant, and we can learn from their strategies. 487 01:18:37.260 --> 01:18:38.520 Rance Cleaveland: Right, right. 488 01:18:42.660 --> 01:18:45.660 Rance Cleaveland: Okay. Well, I think we've exhausted the questions, so 489 01:18:47.670 --> 01:18:57.630 Rance Cleaveland: why don't we go ahead and conclude for today. I'll just remind the NSF attendees that there is a session from, I think, three to four today, 490 01:18:59.100 --> 01:19:04.650 Rance Cleaveland: for a sort of more one-on-one interaction with Nancy. 491 01:19:05.910 --> 01:19:15.150 Rance Cleaveland: In the meantime, Nancy, thanks for a fabulous talk. It was, on the one hand, great to revisit those results that were so influential, 492 01:19:16.230 --> 01:19:36.630 Rance Cleaveland: that have been so influential on me throughout my career. And I have to admit, I think we could spend another hour just talking about the insect and brain work, too; there's so much interesting going on there.
So, um, so thank you again, and we'll reconvene at three o'clock, and 493 01:19:37.980 --> 01:19:43.530 Rance Cleaveland: I guess everybody, at least on the East Coast, can now go grab a sandwich or something and have some lunch. 494 01:19:44.850 --> 01:19:46.110 Nancy Lynch: At the same location? 495 01:19:47.190 --> 01:19:49.050 Rance Cleaveland: I think it's a different link. 496 01:19:49.980 --> 01:19:51.390 Rance Cleaveland: So I hope it's 497 01:19:51.810 --> 01:19:55.260 Rance Cleaveland: Do you have that link? I'll make sure that you get it. 498 01:19:56.700 --> 01:19:59.040 Nancy Lynch: Maybe send that to me; I don't remember it now. 499 01:19:59.550 --> 01:20:07.230 Rance Cleaveland: Yeah, yeah. So, Blaine, if you're online, if you could just make sure that Professor Lynch gets the new link, that'd be great. 500 01:20:08.400 --> 01:20:11.400 Rance Cleaveland: Otherwise, we'll make sure that you get it, Nancy; I'll make sure it's emailed to you. 501 01:20:11.550 --> 01:20:13.050 Rance Cleaveland: Okay. All right. 502 01:20:13.230 --> 01:20:14.070 Rance Cleaveland: Thank you very much. 503 01:20:16.200 --> 01:20:16.560 Rance Cleaveland: All right. Bye bye.