WEBVTT

1
00:00:08.849 --> 00:00:24.450
Rance Cleaveland: Okay, let's go ahead and get started. So, um, welcome everybody to the next installment of the Distinguished Lecture Series sponsored by the Computer and Information Science and Engineering Directorate at NSF

2
00:00:24.990 --> 00:00:35.460
Rance Cleaveland: My name is Rance Cleaveland and I'm the Director of the Computing and Communication Foundations Division within CISE, and it's my great pleasure to introduce our speaker today.

3
00:00:36.630 --> 00:00:39.540
Rance Cleaveland: Professor Nancy Lynch from MIT.

4
00:00:40.740 --> 00:00:46.320
Rance Cleaveland: I debated how to frame my introduction, because I have to admit that

5
00:00:47.190 --> 00:00:55.770
Rance Cleaveland: Nancy's been a fixture of the research community for as long as I've been aware of computing research, and so I thought I would start with maybe a little

6
00:00:56.670 --> 00:01:06.030
Rance Cleaveland: personal note in that regard. So I started graduate school, and I'm hesitant to admit this, but in 1982. The year is important because

7
00:01:06.540 --> 00:01:15.330
Rance Cleaveland: in 1982 Nancy and her co-authors published what I think is the most influential paper on the theory of distributed systems.

8
00:01:15.870 --> 00:01:24.300
Rance Cleaveland: So I was a young graduate student, sort of trying to work my way through the research literature; I was reading a lot of papers and understanding almost none of them.

9
00:01:24.750 --> 00:01:42.180
Rance Cleaveland: And I remember coming across this paper and probably 1983 or so and reading it and not really understanding it, but understanding that I better understand it because there was something deep and fundamental in the impossibility result of consensus that was proven in that paper.

10
00:01:43.410 --> 00:01:44.850
Rance Cleaveland: Well, little did I know that

11
00:01:46.230 --> 00:01:57.720
Rance Cleaveland: that this paper would lead to other papers, and other papers, and other papers, a lot of them by Nancy, that

12
00:01:58.440 --> 00:02:11.160
Rance Cleaveland: laid the firm mathematical foundations for understanding distributed computation. So, Nancy is, as I said, a professor at MIT; she also got her PhD at MIT, in mathematics.

13
00:02:11.820 --> 00:02:23.100
Rance Cleaveland: Before that, her bachelor's, a BS, also in mathematics, from Brooklyn College, and she's been at MIT for a number of years,

14
00:02:25.590 --> 00:02:28.200
Rance Cleaveland: superbly supervising students and conducting truly

15
00:02:29.280 --> 00:02:38.160
Rance Cleaveland: fundamental and important research in the area of distributed systems. She's won every award that I can think of in the

16
00:02:38.940 --> 00:02:47.730
Rance Cleaveland: in the theory of computing community. If I were to list all of these, we would use up most of the hour, and I know you're here to hear her talk, not hear me talk.

17
00:02:48.150 --> 00:02:59.580
Rance Cleaveland: I will say that she's won at least one of these awards twice. She's also an ACM Fellow and a member of the National Academies of both Science and Engineering.

18
00:03:01.980 --> 00:03:07.710
Rance Cleaveland: I think I will conclude my introduction with another sort of little aside.

19
00:03:09.660 --> 00:03:19.020
Rance Cleaveland: From time to time in my career, I've heard Nancy referred to as a force of nature. And this is a testament to her truly astonishing

20
00:03:20.130 --> 00:03:25.140
Rance Cleaveland: Record for years and years and years and years of fundamental

21
00:03:26.400 --> 00:03:35.460
Rance Cleaveland: results in a variety of theoretical areas involving distributed systems. However, I sometimes think that that should be flipped around a little bit.

22
00:03:36.900 --> 00:03:42.570
Rance Cleaveland: Instead of referring to Nancy as a force of nature, I like to think of nature as being a force of Nancy.

23
00:03:43.950 --> 00:03:56.010
Rance Cleaveland: And again, in recognition of her truly remarkable record and the very important research that she's done. So without further ado, I'll turn the floor over to Nancy Lynch.

24
00:03:57.990 --> 00:04:05.070
Nancy Lynch: Well, thank you. Excuse me. Thank you for that introduction, Rance, and thank you all for inviting me for this talk.

25
00:04:06.540 --> 00:04:07.560
Nancy Lynch: Can you hear me OK.

26
00:04:09.660 --> 00:04:24.480
Nancy Lynch: I assume. Yes. Okay. So this talk is going to be on a theoretical view of distributed systems. It's a very broad talk about what I have worked on over decades with many collaborators and students.

27
00:04:25.620 --> 00:04:31.350
Nancy Lynch: So yeah, so basically we've been involved in developing the theory for distributed systems.

28
00:04:32.760 --> 00:04:37.260
Nancy Lynch: We're trying to understand distributed systems in mathematical terms. That includes

29
00:04:38.310 --> 00:04:42.000
Nancy Lynch: What they do what they are able to do and what the limitations are

30
00:04:42.630 --> 00:04:55.260
Nancy Lynch: And specifically, what is that? Well, it has involved defining mathematical models for problems that are solved in distributed systems and for the algorithms that are used to solve them.

31
00:04:56.010 --> 00:05:03.000
Nancy Lynch: Sometimes it's involved developing new algorithms, and sometimes studying algorithms developed by others.

32
00:05:03.810 --> 00:05:15.570
Nancy Lynch: It's involved producing proofs of correctness, also performance and fault tolerance properties, and then, as Rance mentioned, we've had a lot of work on proving impossibility results and lower bounds.

33
00:05:17.040 --> 00:05:22.650
Nancy Lynch: So these are expressing inherent limitations of distributed systems for solving problems.

34
00:05:23.970 --> 00:05:27.180
Nancy Lynch: And along the way, we've had to do a lot of work on developing

35
00:05:28.410 --> 00:05:33.840
Nancy Lynch: Basically concurrency theory type foundations for modeling and analyzing distributed systems.

36
00:05:34.590 --> 00:05:42.210
Nancy Lynch: So, what kinds of systems? We started out working on distributed data management systems, mostly, and more recently, we're working on

37
00:05:42.720 --> 00:05:57.900
Nancy Lynch: communication systems, both wired and wireless, and very recently, my group has focused on biological systems, which are systems like insect colonies, developmental biology, and brain networks.

38
00:05:59.580 --> 00:06:06.390
Nancy Lynch: Okay. So in this talk. So it does anybody keep their camera on no looks like all the cameras are dead.

39
00:06:07.710 --> 00:06:11.520
Nancy Lynch: Oh, well, if somebody can keep a camera on. That would be nice. It's

40
00:06:12.540 --> 00:06:14.160
Nancy Lynch: Nice to get a little reaction.

41
00:06:15.960 --> 00:06:26.490
Nancy Lynch: Okay. So in this talk, which is a very broad-brush talk, I'm going to talk about work we've done on algorithms for traditional distributed systems,

42
00:06:27.300 --> 00:06:39.300
Nancy Lynch: talk about impossibility results, and then very, very briefly talk about the foundations work and some of the newer work on algorithms for new types of distributed systems.

43
00:06:40.920 --> 00:06:54.480
Nancy Lynch: Okay, so for traditional distributed systems. Well, we began our work in this area with a study of mutual exclusion, the costs of achieving mutual exclusion in shared-memory multiprocessor settings.

44
00:06:55.080 --> 00:07:00.630
Nancy Lynch: This is work with Mike Fischer, Jim Burns, and others. And then we moved from there into

45
00:07:01.260 --> 00:07:19.320
Nancy Lynch: studying some abstract distributed network problems, including consensus, approximate consensus, and clock synchronization. And so these two papers listed here have some new algorithms for fault-tolerant clock synchronization and fault-tolerant approximate agreement.

46
00:07:21.420 --> 00:07:30.750
Nancy Lynch: There's a lot of references that you're going to see on the slides. So assuming that we can post the slides later, you might want to browse some, some of these papers later on.

47
00:07:31.590 --> 00:07:41.070
Nancy Lynch: The first example I'm going to go into in some detail is a paper with Cynthia Dwork and Larry Stockmeyer on consensus in the presence of partial synchrony.

48
00:07:42.150 --> 00:07:56.850
Nancy Lynch: Okay, so this addresses the problem of exact agreement in the presence of failures, not just approximate agreement. It's kind of a counterpoint to the Fischer–Lynch–Paterson result that Rance was talking about.

49
00:07:58.590 --> 00:08:09.000
Nancy Lynch: I'll go over that result later in the talk, but that shows the impossibility of reaching fault-tolerant consensus in asynchronous networks.

50
00:08:10.080 --> 00:08:19.110
Nancy Lynch: Whereas this paper by Dwork, Lynch, and Stockmeyer presents an approach to dealing with the limitation from the FLP paper.

51
00:08:20.100 --> 00:08:27.180
Nancy Lynch: It also presents several algorithms that use this approach and tolerate different types of failures and timing uncertainties.

52
00:08:27.870 --> 00:08:35.010
Nancy Lynch: Okay. So yeah, just as an aside, this paper's been very well recognized lately and has won

53
00:08:35.520 --> 00:08:47.580
Nancy Lynch: the Dijkstra Prize, but it wasn't very well known at the time we wrote it, because it was written as a purely theoretical paper. It's been kind of rediscovered lately because it's relevant for things like blockchain algorithms.

54
00:08:48.270 --> 00:08:57.270
Nancy Lynch: Okay. So I'll start with some excerpts from the award citation for the Dijkstra Prize. This was written by Dahlia Malkhi.

55
00:08:58.950 --> 00:09:06.810
Nancy Lynch: So the idea is that this paper is introducing a bunch of models for partial synchrony between synchronous and asynchronous.

56
00:09:07.980 --> 00:09:12.120
Nancy Lynch: And these are models in which the consensus problem is solvable.

57
00:09:13.260 --> 00:09:20.580
Nancy Lynch: The claim in the citation is that this provided practitioners with the right tool for building fault tolerant systems.

58
00:09:21.690 --> 00:09:29.550
Nancy Lynch: The key is that safety, the safety properties are maintained at all times. Despite the impossibility of consensus.

59
00:09:30.030 --> 00:09:48.060
Nancy Lynch: And the thing that gets weakened is the progress condition; that's guaranteed not always, but just during periods when the system is stable. Okay. And then she goes on to say that these are the pillars on which modern fault-tolerant systems are built.

60
00:09:49.620 --> 00:10:05.760
Nancy Lynch: Okay, so just a quick review of distributed consensus: you have a bunch of processors in a distributed network that have to agree on a value. Every processor starts with some initial preference, and they have to agree on some value, but some of the processors might be faulty.

61
00:10:07.440 --> 00:10:19.350
Nancy Lynch: People studied different kinds of failures. The key conditions are: everybody has to agree, and if all the processors happened to start with the same initial value, that's the only allowed decision value.

62
00:10:21.360 --> 00:10:23.760
Nancy Lynch: Yeah, I mean, this condition is sometimes

63
00:10:25.020 --> 00:10:32.550
Nancy Lynch: strengthened to say that any decision in any case must be somebody's initial value. And of course, we also want the algorithm to

64
00:10:33.210 --> 00:10:47.340
Nancy Lynch: terminate; we'd like all the non-faulty processors to actually succeed in reaching agreement. So this problem arose originally in two places: the database commit problem studied by Jim Gray, way back,

65
00:10:48.660 --> 00:10:51.180
Nancy Lynch: and the Byzantine Agreement problem,

66
00:10:52.200 --> 00:10:57.510
Nancy Lynch: where the agreement was on aircraft altimeter readings that get sent back and forth.

67
00:10:59.700 --> 00:11:19.860
Nancy Lynch: Okay, so what does the paper do? It considers a variety of partial synchrony models and a variety of failure models, and gives algorithms that always guarantee the agreement and validity properties, but termination is guaranteed only in the case when the system's behavior stabilizes.

68
00:11:21.060 --> 00:11:28.380
Nancy Lynch: Okay, so if the system stabilizes for long enough, then we guarantee that agreement is actually reached

69
00:11:29.340 --> 00:11:43.830
Nancy Lynch: So I'll actually go into some detail. The key algorithmic idea is that different processors try to take charge of reaching agreement, and in our paper, they do this in a rotating fashion; this is a paradigm called the rotating coordinator

70
00:11:45.390 --> 00:11:47.130
Nancy Lynch: Paradigm, and you have to

71
00:11:48.210 --> 00:11:57.270
Nancy Lynch: do some work to reconcile what happens if different coordinators try to coordinate decisions; you have to make sure they don't contradict each other.

72
00:11:58.710 --> 00:12:02.880
Nancy Lynch: Okay, so here's just a sketch of how it works.

73
00:12:04.230 --> 00:12:16.710
Nancy Lynch: Let's look at the simplest case we have synchronous rounds and just simple stopping failures. So we have messages that could be lost. But after some time we assume that all the messages get delivered in a timely fashion.

74
00:12:18.090 --> 00:12:30.210
Nancy Lynch: Okay, so we have phases, each of which some coordinator is in charge of, and we can determine the coordinator by the phase number: the coordinator for a phase is the phase number, mod n.

75
00:12:31.650 --> 00:12:41.130
Nancy Lynch: So a processor can lock a value with a certain phase number. That means it thinks that the coordinator might decide that value at that phase.

76
00:12:42.120 --> 00:12:52.500
Nancy Lynch: And here's what happens during phase k, under the control of some coordinator. First, everybody sends the coordinator what they think are acceptable decision values.

77
00:12:53.190 --> 00:13:07.440
Nancy Lynch: That means they don't hold a lock on something different, and it's actually known to be somebody's initial value. So everybody sends their acceptable values, and then the coordinator picks a value that is acceptable

78
00:13:08.790 --> 00:13:16.530
Nancy Lynch: to a majority of processors, broadcasts that value, and all the recipients lock that value for the current phase.

79
00:13:17.280 --> 00:13:23.730
Nancy Lynch: Okay. Now, anybody who locked the value sends an acknowledgment to the coordinator, and remember, these messages could get lost.

80
00:13:24.330 --> 00:13:34.650
Nancy Lynch: But if the coordinator receives a majority of acknowledgments, then it's able to decide on that value, and then round four is just a bunch of cleanup.
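To make the phase structure concrete, here is a small sketch, in Python, of one phase of a rotating-coordinator protocol of this flavor. This is not the exact DLS algorithm, just an illustration of the rounds described above; the function name, the use of a lossless run, and the representation of locks as (phase, value) pairs are my own simplifications.

```python
import random

def run_phase(k, n, prefs, locks, lossy=False):
    """One phase of a rotating-coordinator consensus sketch.
    k: phase number; n: number of processors; prefs: initial preferences;
    locks: per-processor (phase, value) lock or None.
    Returns (decision, locks); decision is None if the phase fails."""
    coord = k % n  # the coordinator rotates with the phase number, mod n
    deliver = lambda: not lossy or random.random() < 0.5  # simulated message loss

    # Round 1: everyone reports a value it finds acceptable
    # (its locked value if it holds a lock, else its own preference).
    reports = [locks[i][1] if locks[i] else prefs[i] for i in range(n) if deliver()]

    # Coordinator picks a value acceptable to a majority, if one exists.
    candidates = [v for v in set(reports) if reports.count(v) > n // 2]
    if not candidates:
        return None, locks
    v = min(candidates)

    # Round 2: coordinator broadcasts v; recipients lock (k, v).
    # Round 3: lock holders acknowledge back to the coordinator.
    acks = 0
    for i in range(n):
        if deliver():
            locks[i] = (k, v)
            if deliver():
                acks += 1

    # Coordinator decides only if a majority acknowledged.
    decision = v if acks > n // 2 else None
    return decision, locks
```

With no message loss, a single phase already decides; with loss, phases repeat until the network stabilizes, which mirrors the "termination only during stable periods" guarantee.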

81
00:13:35.820 --> 00:13:43.500
Nancy Lynch: Okay, so that's basically the flavor of all these protocols; it just repeats this again and again. And this is depicting the communication pattern.

82
00:13:45.510 --> 00:13:54.540
Nancy Lynch: Okay, so some of these ideas come from Skeen's three-phase commit; this is inspired by Dale Skeen's earlier work on three-phase commit.

83
00:13:55.290 --> 00:14:04.170
Nancy Lynch: And then somewhat later, Lamport's Paxos consensus protocol uses very similar ideas to reach agreement.

84
00:14:04.890 --> 00:14:12.690
Nancy Lynch: Okay, I should say that there's a difference. The important difference in the Paxos protocol is that it has more concurrency.

85
00:14:13.500 --> 00:14:25.920
Nancy Lynch: It doesn't use a strictly rotating coordinator, but allows different coordinators to make attempts concurrently; but it's still the majority-checking discipline that's used to manage the conflicts.

86
00:14:26.940 --> 00:14:27.390
Okay.

87
00:14:29.490 --> 00:14:31.860
Nancy Lynch: Alright, so that's the first example.

88
00:14:33.630 --> 00:14:40.530
Nancy Lynch: The next two contributions I'm going to talk about are mostly about modeling and proofs for algorithms that

89
00:14:41.250 --> 00:14:48.030
Nancy Lynch: had arisen in the design of distributed systems, not so much about new algorithms like the previous example.

90
00:14:48.720 --> 00:14:54.900
Nancy Lynch: So both of these deal with distributed data management, which was the main focus of our work for many years.

91
00:14:55.650 --> 00:15:03.630
Nancy Lynch: After that, I'll look at one more algorithmic contribution that I like; it's called the Rambo algorithm, which is designed to implement

92
00:15:04.560 --> 00:15:12.930
Nancy Lynch: atomic shared memory on a dynamically changing platform. And that can be regarded as another precursor to blockchain-style algorithms.

93
00:15:13.680 --> 00:15:21.960
Nancy Lynch: Okay, so let's look at concurrency control algorithms for nested transactions. Okay, so

94
00:15:22.890 --> 00:15:41.880
Nancy Lynch: Yeah, so my collaborators and I spent several years, mostly in the 80s, trying to use theoretical approaches to understand, generalize, and prove correctness of concurrency control algorithms for implementing atomic transactions. It's the central problem of distributed data processing.

95
00:15:43.050 --> 00:15:52.920
Nancy Lynch: Mostly that work had been done by systems people, and there was not all that much theory yet. So, some more background on transactions.

96
00:15:54.540 --> 00:16:07.170
Nancy Lynch: Okay, well, a transaction, you probably know, you can think of as a sequence of operations, typically just read and write operations, that are supposed to appear as if they're executing as one indivisible unit.

97
00:16:08.040 --> 00:16:13.350
Nancy Lynch: And then a collection of transactions are supposed to appear to execute in some sequential order.

98
00:16:14.400 --> 00:16:28.740
Nancy Lynch: Concurrency control just refers to algorithms that are used for implementing atomic transactions. The early leader in this field was Turing Award winner Jim Gray, and Bernstein and Goodman followed on,

99
00:16:29.760 --> 00:16:44.640
Nancy Lynch: beginning work on an abstract, formal theory for some types of transactions. Okay. Then there were a lot of systems papers that had implemented algorithms but little theory.

100
00:16:48.660 --> 00:16:51.210
Nancy Lynch: Oh yeah, I didn't mention but the

101
00:16:52.590 --> 00:17:09.990
Nancy Lynch: extensions of transactions to the nested case. That was done by Barbara Liskov, another Turing Award winner. She extended the idea of transactions to allow them to have a nested tree structure, with transactions and child transactions, all the way down to leaf transactions.

102
00:17:11.430 --> 00:17:26.070
Nancy Lynch: Any transaction could abort during an execution, and its parent could learn about the abort and react by, for example, trying something else. And the key condition there is that every set of sibling transactions should appear to execute in some order.

103
00:17:28.680 --> 00:17:42.600
Nancy Lynch: Each sibling should appear atomic. So this is a very flexible programming discipline that Liskov included in her Argus distributed programming language. That work seemed really interesting, but it was complicated.

104
00:17:44.070 --> 00:18:03.300
Nancy Lynch: The new nested transaction concept and all the algorithms implementing it had no theory, just descriptive systems papers and implementations. So our work involved trying to model these things, the transaction requirements and the algorithms, mathematically.

105
00:18:04.500 --> 00:18:19.530
Nancy Lynch: Along the way we ended up generalizing some existing algorithms from single-level transactions to the nested case. And then we were actually able to push this to the point of proving correctness of these algorithms.
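As a concrete picture of the nested structure being described, here is a toy Python sketch of a transaction tree where aborting a node implicitly orphans its whole subtree. The class name, fields, and the ancestor-walk definition of "orphan" are my own illustrative choices, not the formal model from the book.

```python
class Txn:
    """Toy nested transaction: a tree node that can abort.
    A descendant of an aborted transaction is an orphan."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.aborted = False
        if parent:
            parent.children.append(self)

    def abort(self):
        # The parent can observe this and react, e.g. by trying a sibling.
        self.aborted = True

    def is_orphan(self):
        # Walk up the tree: an orphan has some aborted ancestor.
        t = self.parent
        while t:
            if t.aborted:
                return True
            t = t.parent
        return False
```

Orphan management algorithms, mentioned below, are about what to do with such orphaned subtrees so they cannot affect the visible outcome.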

106
00:18:21.600 --> 00:18:25.800
Nancy Lynch: Okay. So we wrote a book that included all of this work.

107
00:18:27.150 --> 00:18:35.580
Nancy Lynch: It's based on a whole series of research papers. What we did was develop a general theory for nested transactions. It included

108
00:18:36.330 --> 00:18:44.820
Nancy Lynch: an atomicity theorem that provided one method for proving correctness of concurrency control algorithms, kind of a compositional method.

109
00:18:45.750 --> 00:18:52.260
Nancy Lynch: And then we did all these types of algorithms: there's lock-based algorithms, which is what was used in the

110
00:18:52.950 --> 00:19:02.580
Nancy Lynch: Argus system. But there's also algorithms that do concurrency control based on timestamps, or on a combination of the two, optimistic algorithms,

111
00:19:03.390 --> 00:19:19.500
Nancy Lynch: orphan management algorithms, which deal with what happens to child transactions of aborted transactions, and algorithms that involve replicated data. We did all this in terms of a mathematical framework called I/O automata. So we developed

112
00:19:21.030 --> 00:19:35.850
Nancy Lynch: a model for asynchronous systems with interacting components. This was being developed at around the same time, so we were able to use it as the basis for all of this modeling and verification.

113
00:19:36.510 --> 00:19:49.290
Nancy Lynch: So what I've learned is that recently the effort of verifying at least safety properties of complex systems of this sort has become quite popular in systems communities.

114
00:19:50.430 --> 00:20:00.240
Nancy Lynch: Some of that work involves not mathematical proofs of the sort that we did, but code-level verification using interactive theorem provers. So our work is different

115
00:20:00.720 --> 00:20:09.330
Nancy Lynch: in that it's based on abstract models, and our proofs are mostly by hand. Of course, abstract proofs could also be done with theorem provers, and we did a few of those.

116
00:20:11.940 --> 00:20:12.240
Okay.

117
00:20:13.560 --> 00:20:14.010
Nancy Lynch: All right.

118
00:20:15.120 --> 00:20:24.510
Nancy Lynch: So following the same style as for our nested transactions work, we studied other algorithms in the distributed data systems and database community,

119
00:20:25.590 --> 00:20:34.950
Nancy Lynch: again trying to understand how they work in a mathematical way and prove their key properties. So here's the next work that we

120
00:20:35.640 --> 00:20:51.900
Nancy Lynch: did. We tackled the notion of distributed shared memory, which is basically a shared memory abstraction implemented in a distributed network. We focused on a system called Orca, which was developed by Bal, Kaashoek, and Tanenbaum.

121
00:20:54.420 --> 00:21:09.930
Nancy Lynch: The Orca system included some interesting algorithms to achieve distributed shared memory, but the algorithms were kind of complicated; it wasn't obvious that they were correct, or why they were correct. So it seemed like a good subject for theoretical work.

122
00:21:11.820 --> 00:21:19.920
Nancy Lynch: Okay, so what did we do? We considered an algorithm to implement sequentially consistent read/update shared memory,

123
00:21:20.610 --> 00:21:39.120
Nancy Lynch: using a basic communication service that had a combination of broadcast and point-to-point communication. This was as in the Amoeba operating system. So this algorithm is based on the Orca system, which was used for writing applications for clusters of workstations.

124
00:21:40.380 --> 00:21:52.200
Nancy Lynch: So Orca divided up the work into two parts. First, it defined an abstract multicast channel, which has very strong ordering and causality properties.

125
00:21:52.860 --> 00:22:01.980
Nancy Lynch: And then it implemented sequentially consistent memory over an arbitrary multicast channel, using a partial replication strategy.

126
00:22:02.820 --> 00:22:20.520
Nancy Lynch: You had copies of the data at some of the nodes; you can read any copy, but you have to update all copies. They also implemented the multicast channel over the basic broadcast and point-to-point communication services, using a protocol that's based on sequence numbers.

127
00:22:23.250 --> 00:22:31.080
Nancy Lynch: Okay, so what did we do? Well, we specified the message delivery and ordering requirements of the multicast channel mathematically.

128
00:22:32.190 --> 00:22:51.420
Nancy Lynch: Then we defined an algorithm, like their sequence-number-based algorithm, to implement the multicast channel over the basic communication services, and we tried to prove the algorithm correct. But it didn't work. We discovered an algorithmic error in the Orca implementation.

129
00:22:52.890 --> 00:23:00.870
Nancy Lynch: It was something subtle. They didn't piggyback certain needed sequence numbers on some response messages.
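To illustrate why per-sender sequence numbers matter here, this is a minimal Python sketch of a sequence-numbered multicast receiver that buffers out-of-order messages and delivers them in send order. It is not the Orca protocol itself (and does not reproduce the piggybacking bug); the class and method names are my own.

```python
import heapq
from collections import defaultdict

class FifoMulticast:
    """Sketch of a sequence-number-based multicast receiver: messages from
    each sender are delivered in send order even if the network reorders them."""
    def __init__(self):
        self.expected = defaultdict(int)   # next sequence number per sender
        self.pending = defaultdict(list)   # out-of-order buffer (a heap) per sender
        self.delivered = []                # (sender, message) in delivery order

    def receive(self, sender, seq, msg):
        heapq.heappush(self.pending[sender], (seq, msg))
        # Deliver every buffered message that is now next in sequence.
        heap = self.pending[sender]
        while heap and heap[0][0] == self.expected[sender]:
            _, m = heapq.heappop(heap)
            self.delivered.append((sender, m))
            self.expected[sender] += 1
```

The subtlety in a real system is that *every* message that can advance a receiver's state, including responses, must carry the sequence numbers the receiver needs; dropping them on one message type is exactly the kind of error a correctness proof forces into the open.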

130
00:23:01.980 --> 00:23:15.330
Nancy Lynch: Okay, the error was fixed in the algorithm, and the actual system also had to be repaired. In fact, the system developers were very surprised that this bug was found using theory.

131
00:23:16.890 --> 00:23:26.820
Nancy Lynch: They were kind of skeptical of the theoretical efforts at the time. But yes, trying to do a proof unearthed the error. And then of course we could complete the proof.

132
00:23:28.560 --> 00:23:28.800
Okay.

133
00:23:30.360 --> 00:23:35.700
Nancy Lynch: So, all right. So this is what I just said: we defined this algorithm and proved it correct.

134
00:23:36.390 --> 00:23:49.080
Nancy Lynch: And then for the higher level, we defined a version of the partial replication algorithm to implement the sequentially consistent memory over the multicast channel, which is a generalization of the Orca algorithm.

135
00:23:50.190 --> 00:23:55.590
Nancy Lynch: And we developed a somewhat new proof technique for proving sequential consistency.

136
00:23:56.010 --> 00:24:10.440
Nancy Lynch: And that was all based on I/O automata composition and abstraction techniques; the things that we were studying in terms of basic foundations for concurrency were exactly what we were using for proving these real algorithms. Okay.

137
00:24:11.760 --> 00:24:26.820
Nancy Lynch: All right, so a summary of just these last two parts: what did we do? We looked at a bunch of existing distributed systems and used I/O automata, these formal methods, to model and verify many such systems,

138
00:24:28.140 --> 00:24:35.580
Nancy Lynch: especially the kind that had strong consistency requirements for the data. We specified the properties that were required,

139
00:24:36.840 --> 00:24:52.560
Nancy Lynch: defined abstract versions of the systems, and proved the algorithms correct. That meant we had to clarify ambiguities, and we found and fixed some errors, the one I told you about, and also at least one other in an existing system.

140
00:24:54.300 --> 00:24:54.810
Nancy Lynch: Okay.

141
00:24:57.390 --> 00:25:10.860
Nancy Lynch: All right, so one more algorithm example is the work on reconfigurable atomic memory. So here I'm going to talk about this last paper, by Gilbert, Lynch, and Shvartsman,

142
00:25:11.340 --> 00:25:21.690
Nancy Lynch: on Rambo, which is a reconfigurable atomic memory service that works in dynamic networks. So this is just a picture from a news story on this,

143
00:25:23.160 --> 00:25:29.940
Nancy Lynch: which shows a centralized memory being implemented on a bunch of distributed mobile devices.

144
00:25:33.690 --> 00:25:39.810
Nancy Lynch: Okay, so our goal is to implement atomic read/write shared memory in a dynamic network setting.

145
00:25:40.530 --> 00:25:53.160
Nancy Lynch: So atomic memory means it's supposed to look like a centralized shared memory that everybody accesses. The participants can join and leave and fail during the computation; that's a dynamic network.

146
00:25:54.240 --> 00:25:57.870
Nancy Lynch: Examples would be mobile networks and peer to peer networks.

147
00:26:00.150 --> 00:26:11.880
Nancy Lynch: We want high availability and low latency, and we want to maintain atomicity in all cases. But, you know, we'd like good performance when there are some limits on the changes.

148
00:26:13.650 --> 00:26:23.700
Nancy Lynch: So examples that inspired this work are soldiers in a military operation who don't have any fixed infrastructure that they're using, like in the

149
00:26:25.080 --> 00:26:42.810
Nancy Lynch: war in Afghanistan, where you have a bunch of soldiers going into an area and they don't have any communication infrastructure. So they have to either bring their own on a truck, or else there ought to be some techniques that can run just on the mobile devices.

150
00:26:45.720 --> 00:26:52.980
Nancy Lynch: Okay. The starting point for this work was work on static networks by Attiya, Bar-Noy, and Dolev.

151
00:26:54.120 --> 00:27:03.000
Nancy Lynch: So basically the protocol for maintaining atomic memory in a static network involved quorums of processors, basically majorities, if you like.

152
00:27:03.600 --> 00:27:12.810
Nancy Lynch: They basically have some read quorums and some write quorums, and the key condition, as for majorities, is every read quorum has to intersect every write quorum.

153
00:27:15.750 --> 00:27:17.880
Nancy Lynch: They replicated the objects everywhere.

154
00:27:19.080 --> 00:27:21.420
Nancy Lynch: With tags indicating versions.

155
00:27:22.470 --> 00:27:30.270
Nancy Lynch: And then their algorithm to read was: contact a read quorum, determine the latest version, propagate it to a write quorum, and return it.

156
00:27:31.500 --> 00:27:39.870
Nancy Lynch: To write, you contact a read quorum to determine the latest tag, choose a bigger tag, and write your value with that tag to a write quorum.

157
00:27:41.280 --> 00:27:51.810
Nancy Lynch: Okay, so these are both sort of two-phase protocols. You can optimize so that you don't always need the propagate phase for a read; that saves some

158
00:27:53.130 --> 00:27:54.750
Nancy Lynch: Communication and time.
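The two-phase read and write just described can be sketched in a few lines of Python. This is a single-process toy, not the message-passing ABD protocol: replicas are just local (tag, value) pairs, and the quorum is fixed as the first majority of replicas; those simplifications, and the class name, are my own.

```python
class MajorityRegister:
    """Sketch of an ABD-style atomic register. Every replica holds a
    (tag, value) pair; any majority serves as both a read quorum and a
    write quorum, so any two quorums intersect."""
    def __init__(self, n):
        self.n = n
        self.replicas = [(0, None)] * n  # (tag, value) at each replica

    def _quorum(self):
        # Any majority would do; for the sketch, the first n//2 + 1 replicas.
        return range(self.n // 2 + 1)

    def write(self, value):
        # Phase 1: query a read quorum for the highest tag seen so far.
        tag = max(self.replicas[i][0] for i in self._quorum()) + 1
        # Phase 2: install (tag, value) at a write quorum.
        for i in self._quorum():
            if tag > self.replicas[i][0]:
                self.replicas[i] = (tag, value)

    def read(self):
        # Phase 1: collect from a read quorum; take the latest version.
        tag, value = max(self.replicas[i] for i in self._quorum())
        # Phase 2: propagate it to a write quorum before returning,
        # so later reads cannot see an older value.
        for i in self._quorum():
            if tag > self.replicas[i][0]:
                self.replicas[i] = (tag, value)
        return value
```

The quorum intersection condition is what makes this work: a read's quorum always overlaps the quorum of the most recent write, so the read is guaranteed to see the latest tag.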

159
00:27:57.630 --> 00:28:09.570
Nancy Lynch: So in this algorithm, the nice thing is that the operations can all proceed concurrently, and they can interleave at a very fine granularity, just the individual sends and receives,

160
00:28:10.920 --> 00:28:14.370
Nancy Lynch: and yet you still get atomicity out of the entire algorithm.

161
00:28:15.990 --> 00:28:19.500
Nancy Lynch: OK, so now the Rambo algorithm, inspired by this one.

162
00:28:20.670 --> 00:28:31.230
Nancy Lynch: Okay, first of all, Rambo stands for Reconfigurable Atomic Memory for Basic Objects, which is read/write objects. It also suggests a soldier,

163
00:28:32.970 --> 00:28:34.680
Nancy Lynch: You know, military setting but

164
00:28:35.970 --> 00:28:51.480
Nancy Lynch: Anyway, what Rambo does is going to be similar, but now it's going to use separate configurations. Each configuration has a set of members, which is just a subset of the processors, and has read quorums and write quorums.

165
00:28:53.040 --> 00:29:14.760
Nancy Lynch: Okay. So instead of just having one configuration, as in the Attiya, Bar-Noy, Dolev algorithm, now you have a set of possible configurations. Okay, the objects are replicated, now not everywhere, but just at all members of a configuration. The reads and writes just access quorums of that configuration.

166
00:29:15.990 --> 00:29:37.530
Nancy Lynch: And that works the way it did in the ABD algorithm, and that's fine for handling small and transient changes. But if you want to have more permanent changes, you have to reconfigure to a new configuration, which means moving the object copies to the members of the new configuration.

167
00:29:39.690 --> 00:29:55.140
Nancy Lynch: OK. So the algorithm at a high level is just divided into the main algorithm, which does all the reads and writes and handles the new configurations, and a reconfiguration service that basically determines the new configurations.

168
00:29:56.160 --> 00:30:01.230
Nancy Lynch: The reconfiguration service just supplies a consistent series of configurations

169
00:30:02.340 --> 00:30:18.900
Nancy Lynch: And it can be triggered by, you know, some external administrator who decides now it's time to start a new configuration. The main algorithm does all the other work: it handles the reading and writing of the objects, removes the old configurations, and then

170
00:30:20.370 --> 00:30:27.780
Nancy Lynch: The reads and writes the trick is that the reads and writes us all the currently active configurations

171
00:30:28.650 --> 00:30:41.040
Nancy Lynch: Okay. So instead of just trying to synchronize one configuration at a time, you might have some overlap, during which period of overlap you simply have the reads and writes do a little extra work by using both

172
00:30:42.120 --> 00:30:43.320
Nancy Lynch: Configurations

173
00:30:45.450 --> 00:30:53.040
Nancy Lynch: OK, so the reads and writes to a two phase protocol that's like the VD protocol collect object information from report comes

174
00:30:53.760 --> 00:31:07.410
Nancy Lynch: But no, it's not just one configuration. It's all the currently active configurations and then in phase two it propagates the latest object information to the right course of all the known active configurations

175
00:31:09.540 --> 00:31:13.770
Nancy Lynch: And you can have a lots of reads and writes going on. Concurrently, just as an add

176
00:31:14.880 --> 00:31:16.650
Nancy Lynch: You get Adam a city is before

177
00:31:17.670 --> 00:31:31.350
Nancy Lynch: And the termination is a kind of a fixed point condition which involves a quorum for every known active configuration. So while you're doing this new configurations. Can you can learn about new configurations and then you have to use them to

178
00:31:32.370 --> 00:31:38.640
Nancy Lynch: But at some point, you have a fixed point where you have satisfied the condition for every configuration you know about

179
00:31:41.340 --> 00:31:49.230
Nancy Lynch: And then you have this protocol to remove all configurations. Turns out that's also a two phase protocol. It's kind of cute.

180
00:31:50.670 --> 00:31:51.660
Nancy Lynch: In phase one.

181
00:31:52.830 --> 00:32:00.420
Nancy Lynch: For every old configuration everything but the latest that you know about you tell a right quorum of the old configuration about the new one.

182
00:32:01.470 --> 00:32:11.820
Nancy Lynch: And you collect the latest object information from the old configuration phase two. You just propagate that latest object information to a right corner of the new configuration.

183
00:32:13.440 --> 00:32:27.000
Nancy Lynch: OK, so now you have reads and writes and garbage collection all going on concurrently interweaving at a fine granularity, and you still get out in the city. So it's kind of a neat algorithm.

184
00:32:28.230 --> 00:32:33.990
Nancy Lynch: Okay. And then what what do you do about implementing reconfiguration so that you can use consensus.

185
00:32:35.040 --> 00:32:38.580
Nancy Lynch: Just use consensus to determine the success of configurations

186
00:32:40.380 --> 00:32:55.020
Nancy Lynch: That's a heavy weight mechanism, but we don't use it on every operation we just use it for occasional changes of configuration. So you're not you're not delaying the read and write operations.

187
00:32:56.070 --> 00:32:56.400
Okay.

188
00:32:57.990 --> 00:33:03.690
Nancy Lynch: And you can implement consensus using the DNS paper or taxes.

189
00:33:04.920 --> 00:33:15.240
Nancy Lynch: And all this used Iowa time it again actually timed version of the IO atomic model that took into account time delays takes into account time delays.

190
00:33:17.070 --> 00:33:30.900
Nancy Lynch: And there's a nice method. If you have to prove out of necessity for data system is a nice method based on partial orders, it's described there and used in many other places. Okay.

191
00:33:33.210 --> 00:33:41.790
Nancy Lynch: Alright, so that's what I wanted to say about algorithms. She is odd, not to have feedback and pauses for questions.

192
00:33:42.900 --> 00:33:46.860
Nancy Lynch: But I guess you can't do that in this format. Okay, well, I'll go on.

193
00:33:48.450 --> 00:33:56.520
Nancy Lynch: The next part. We're going to talk about the impact some of the selected impossibility results, try to give you the flavor of these results.

194
00:33:57.900 --> 00:34:16.320
Nancy Lynch: Starting point here is that distributed algorithms have very strong inherent limitations. Why, because they have to work in settings where processors only know what's happening locally and there's all this uncertainty about what's happening when and it can be failures.

195
00:34:18.330 --> 00:34:25.530
Nancy Lynch: So once you have theoretical models as we do for the algorithms, you can actually prove that these limitations hold

196
00:34:26.730 --> 00:34:41.130
Nancy Lynch: So the starting point for me in this area was a nice little paper by tremors and Hibbard way back in 1976 as far as I know, that's the first impossibility result for distributed computing

197
00:34:43.110 --> 00:34:50.610
Nancy Lynch: So these were my colleagues at the University of Southern California. At the time, the work was in a shared memory setting.

198
00:34:52.440 --> 00:34:58.830
Nancy Lynch: They looked at a setting with Boolean shared variables arbitrary operations, not just reads and writes

199
00:34:59.520 --> 00:35:10.980
Nancy Lynch: And the problem may consider was fair mutual exclusion processors can come in, they want the resource and you want this to be fair, so everybody that request the resource it eventually get it.

200
00:35:11.700 --> 00:35:29.220
Nancy Lynch: Turns out that and they prove that it's impossible for this to work unsolvable for two processors with a single Boolean shared variable, no matter how powerful and operation you did on the on this variable two processors camp arbitrate

201
00:35:31.050 --> 00:35:35.550
Nancy Lynch: To get fair access to a resource using a Boolean variable.

202
00:35:37.740 --> 00:35:38.310
Nancy Lynch: So,

203
00:35:40.380 --> 00:35:49.140
Nancy Lynch: Yeah. So, the proof. This is based on constructing a collection of executions and showing that one of these has to lead to an incorrect results.

204
00:35:49.740 --> 00:35:58.650
Nancy Lynch: They use the fact that the processors will behave the same into executions in which their own states and the shared memory contents are the same.

205
00:35:59.760 --> 00:36:03.330
Nancy Lynch: Regardless of the internal states of the other processes.

206
00:36:04.710 --> 00:36:07.590
Nancy Lynch: So this is kind of a, you know, a light bulb.

207
00:36:08.970 --> 00:36:16.170
Nancy Lynch: Going off. Oh, you could prove this type of limitation based on this limitation of locality.

208
00:36:17.190 --> 00:36:27.600
Nancy Lynch: I should apologize because I'm mixing the terms process and processor, but it's mixed in the research literature. So I'm mostly say process now.

209
00:36:28.770 --> 00:36:30.360
Nancy Lynch: With the impossibility results.

210
00:36:31.590 --> 00:36:39.540
Nancy Lynch: Okay, so I'm going to do a quick tour of a little lower bound on the number of shared variables needed for mutual exclusion.

211
00:36:40.800 --> 00:36:54.480
Nancy Lynch: And then I'll look at lower bound on the time to reach consensus and that's warming up to the NFL P result or the impossibility of distributed consensus and asynchronous systems.

212
00:36:55.800 --> 00:37:03.000
Nancy Lynch: Okay. And then if there's time I'll talk briefly about the CAP theorem, which is some more recent

213
00:37:05.370 --> 00:37:20.280
Nancy Lynch: Okay, so first of our work on mutual exclusion, we started working on mutual exclusion. First we focused on the size of the shared memory and try to generalize from creditors and Hubert's results. We got results about in processes, instead of two processes.

214
00:37:22.260 --> 00:37:39.930
Nancy Lynch: So that's the first paper on the slide. The second one is the one I'll talk about now. This gives you about not on the size of a shared variable. But on the number of shared variables and now we're looking at shared variables that just support reads and writes

215
00:37:41.160 --> 00:37:46.020
Nancy Lynch: So this was appeared in 1993 but actually proved way back

216
00:37:47.700 --> 00:38:05.250
Nancy Lynch: 10 years before. So this actually gives an important limitation on the power of read rideshare variables compared to variables with more powerful operations. If you try to solve the same problem with more powerful read modify right operations one variable is enough.

217
00:38:07.620 --> 00:38:07.980
Okay.

218
00:38:09.240 --> 00:38:18.240
Nancy Lynch: So here we go, mutual exclusion for in processes using rewrite shared memory requires at least in separate shared variables.

219
00:38:18.540 --> 00:38:31.440
Nancy Lynch: And this is true even if you don't require fairness. So, you know, the members and Hibbert result doesn't apply here. You just require the system to keep making progress. If there's someone who wants that resource. Someone should get it.

220
00:38:32.700 --> 00:38:40.650
Nancy Lynch: We allow every process to read and write all the variables and the variables. Now, can be not just Boolean but unbounded size.

221
00:38:41.880 --> 00:38:43.560
Nancy Lynch: So this is a very

222
00:38:44.670 --> 00:38:53.880
Nancy Lynch: Permissive setting. You just have to try to make progress. Everybody can read and write all variables unbounded size variables and still

223
00:38:54.600 --> 00:39:15.660
Nancy Lynch: You can solve the problem. And I'll just show an example why you can't do it two processes and only one shared variable, okay. Suppose you have to process these p one, p two. And they sell mutual exclusion with with progress, using just one rewrite shared variable called x

224
00:39:17.100 --> 00:39:24.600
Nancy Lynch: Okay, so let's look at this scenario. This is how these proofs go you construct scenarios that had contradictory requirements.

225
00:39:25.260 --> 00:39:36.150
Nancy Lynch: Suppose process one arrives. Nobody's around process one then comes in, by itself, wanting the resource. So by the progress requirements, it has to be able to get it.

226
00:39:36.780 --> 00:39:55.470
Nancy Lynch: Somebody wants to resource. Somebody has to get it only process. One is around. Okay. Along the way, I claim that the process must write to the shared memory. Why, because it didn't write anything. The other process wouldn't know it was there wouldn't know that it announced from

227
00:39:56.580 --> 00:40:13.020
Nancy Lynch: Outside the system into the critical region. So if process to then came in, it should be able to get the resource because it thinks it's alone, and then they both have the resource that that would contradict mutual exclusion.

228
00:40:14.250 --> 00:40:23.040
Nancy Lynch: Okay, so here's a picture process. Want to write it has to write to the shared variable and then it has to eventually get the resource.

229
00:40:24.120 --> 00:40:34.350
Nancy Lynch: OK, so now let's consider an execution which process one pauses right here. Just before was ready to write, it's in a state where it's about to right

230
00:40:35.730 --> 00:40:40.740
Nancy Lynch: At that point, let's let process to arrive and

231
00:40:42.150 --> 00:40:48.540
Nancy Lynch: It arrives, just one process. One is part of process one hasn't written the process to fix. It's alone.

232
00:40:49.830 --> 00:40:59.460
Nancy Lynch: And it gets the resource because it can't see process one at all along the way it has to write the shared variable same reason as before.

233
00:41:00.720 --> 00:41:08.730
Nancy Lynch: Okay, so it comes in, gets the resource and now at this point. So we're just looking at this branch of the execution.

234
00:41:09.960 --> 00:41:12.000
Nancy Lynch: At this point, let's let

235
00:41:13.440 --> 00:41:23.190
Nancy Lynch: Process one resume. The first thing it does is to write the shared variable. But what does it do it overrides whatever process to wrote there.

236
00:41:24.510 --> 00:41:29.010
Nancy Lynch: So now process to his presence isn't visible, any longer.

237
00:41:30.360 --> 00:41:41.850
Nancy Lynch: Okay, so it overrides whatever process to wrote there and then process, one can tell process to is there so it will behave exactly the way it did in the original execution.

238
00:41:43.020 --> 00:41:52.410
Nancy Lynch: Advancing and getting the resource, it will overwrite and then it gets the resource. And at this point, you have two processes.

239
00:41:53.460 --> 00:41:58.320
Nancy Lynch: In the critical section and you violated the mutual exclusion condition.

240
00:41:59.970 --> 00:42:03.030
Nancy Lynch: Okay, at this point, I usually look to see if people

241
00:42:04.650 --> 00:42:08.070
Nancy Lynch: Are more or less following that I can't see anything. So, oh well.

242
00:42:09.300 --> 00:42:17.580
Nancy Lynch: Okay, so it's a contradiction. If more processes you need more variables. And it's the same kind of argument.

243
00:42:18.570 --> 00:42:28.920
Nancy Lynch: For in processes, you need to ensure variables. It's more intricate the construction, but it uses the same ideas writing to a shared variable overrides the previous contents.

244
00:42:29.340 --> 00:42:39.270
Nancy Lynch: And then press can see only its own state and the values that it actually reads from the shared variables. So using those facts you get this extended

245
00:42:40.320 --> 00:42:41.190
Nancy Lynch: Impossibility.

246
00:42:44.850 --> 00:42:46.680
Nancy Lynch: OK, so

247
00:42:48.240 --> 00:42:49.110
Nancy Lynch: Moving along

248
00:42:51.870 --> 00:42:51.960
I'm

249
00:42:53.430 --> 00:43:08.700
Nancy Lynch: Going to look at two results on consensus on listing several papers. The first three or four the synchronous model. The first one is the lower bound and time for consensus in the synchronous model.

250
00:43:09.750 --> 00:43:28.770
Nancy Lynch: The next paper actually extends that two cases consensus, where you can agree on K values, instead of one, and this one is a good paper that gives unifies many impossibilities proofs are distributed consensus problem and then I'll talk about the NLP result.

251
00:43:33.000 --> 00:43:42.360
Nancy Lynch: Okay, this is just the same slides before defining the consensus problem. Recall everyone starts to the value they have to agree on a value in the set.

252
00:43:43.770 --> 00:43:50.940
Nancy Lynch: And they have to satisfy the agreement property and validity property that says they all start with the same value. That's the only decision value.

253
00:43:54.180 --> 00:43:59.670
Nancy Lynch: OK, so the lower bound on the number of rounds for consensus and synchronous systems.

254
00:44:00.960 --> 00:44:05.400
Nancy Lynch: Okay, the starting point here was to notice that all the known algorithms for

255
00:44:06.420 --> 00:44:15.060
Nancy Lynch: fault tolerant agreement and synchronous systems had used at least f plus one rounds, where there's F possible failures.

256
00:44:16.620 --> 00:44:29.580
Nancy Lynch: And so, you wonder why is that it doesn't have to be the case. So turns out that's inherent the F plus one rounds are actually needed in the worst case, even if you're only talking about simple stopping failures.

257
00:44:31.260 --> 00:44:31.650
Nancy Lynch: Okay.

258
00:44:32.790 --> 00:44:42.600
Nancy Lynch: So all right, brief idea assume that for contradiction that you have an F round agreement algorithm that tolerates F failures.

259
00:44:45.180 --> 00:44:51.780
Nancy Lynch: And will assume to make things simple, but this is all without loss of generality that you have an end node complete graph.

260
00:44:52.800 --> 00:45:05.100
Nancy Lynch: The decision said is just zero and one decisions are reached at the end. And everybody send something to everyone else, that every round. Okay, so these are just for simplicity

261
00:45:07.800 --> 00:45:25.530
Nancy Lynch: Okay, so let's just look at the base case. And I'll just give a hint at the next case. So the base case is just, you can't have an end process one fault stopping agreement algorithm that always decides at the end of round one.

262
00:45:27.390 --> 00:45:43.110
Nancy Lynch: Okay, so you need at least two rounds to tolerate one failures, the same. And we know this. Okay, prove a contradiction. Suppose you have an algorithm for and processes that only works in one round and tolerance one failure.

263
00:45:44.850 --> 00:45:47.010
Nancy Lynch: Okay, so this is a new kind of argument.

264
00:45:48.150 --> 00:45:52.890
Nancy Lynch: You construct a whole chain of executions, to get the contradiction.

265
00:45:54.300 --> 00:46:02.340
Nancy Lynch: So, I mean, in the last examples. You just saw particular executions, but here is going to systematically construct a whole chain of executions.

266
00:46:03.150 --> 00:46:15.570
Nancy Lynch: Each execution in the chain has at most one failure. The first execution will be designed so that it has to have decision day zero and the last one would have to have a decision, though, you want

267
00:46:16.740 --> 00:46:32.070
Nancy Lynch: But any two consecutive executions in the chain are going to look the same to some process that's non faulty in both of the executions. So what does that mean, it means that process has to decide the same in both of those executions.

268
00:46:33.210 --> 00:46:50.100
Nancy Lynch: So those two executions must have the same unique decision downloads. So you're creeping your way from all zero from guaranteed zero decision to guarantee one decision, but at every step along the way you have to maintain the same decision see a contradiction that way.

269
00:46:52.320 --> 00:47:08.670
Nancy Lynch: How might you do this. Okay, start with an execution where everyone is input zero and there are no failures and ends with an execution where everybody is in one and there are no failures. Okay, so here's everybody with zero

270
00:47:10.020 --> 00:47:16.680
Nancy Lynch: initial values. Everyone succeeds in communicating with everyone and we know the final decisions have to be zero.

271
00:47:19.200 --> 00:47:30.330
Nancy Lynch: So I'm going to try to work my way to a similar execution, where they all start with one. Okay. So I'll start with this initial execution.

272
00:47:31.080 --> 00:47:44.250
Nancy Lynch: And now I'm going to creep slowly toward the other end of the chain. My first execution in the chain will involve removing the one message from process one to process to so that's missing here.

273
00:47:45.750 --> 00:47:58.560
Nancy Lynch: Okay, so these two executions look the same to everybody except presence minor process to so they're going to decide the same in both executions. So again here zero is the only possible decision value.

274
00:48:01.200 --> 00:48:05.520
Nancy Lynch: Okay, next execution. All right, now you can remove the message from 123

275
00:48:06.720 --> 00:48:07.620
Nancy Lynch: You still have

276
00:48:08.640 --> 00:48:12.480
Nancy Lynch: Processes that can tell the difference between. Now these two

277
00:48:13.590 --> 00:48:15.360
Nancy Lynch: And so they will decide in the same way.

278
00:48:17.700 --> 00:48:17.940
Right.

279
00:48:19.050 --> 00:48:21.810
Nancy Lynch: And finally, you remove the message from one to four.

280
00:48:22.950 --> 00:48:25.950
Nancy Lynch: Still indistinguishable. Some non faulty process.

281
00:48:27.450 --> 00:48:30.210
Nancy Lynch: And you just keep going but

282
00:48:31.440 --> 00:48:46.860
Nancy Lynch: Once you've removed all of process ones messages you want. You have to do something else. You change process one input from zero to one. So you can do that because process one isn't sending me messages.

283
00:48:48.120 --> 00:48:54.090
Nancy Lynch: So no one else can tell the difference with their process one started with zero or started with the one

284
00:48:57.060 --> 00:49:05.280
Nancy Lynch: Okay, so in one step, we change the value. So now we can say, now I want to change the value for two, but I can't just start removing messages from to

285
00:49:05.880 --> 00:49:15.630
Nancy Lynch: Why, because we're only allowed to have one failure in each execution and already processed one has failed because all the messages are missing.

286
00:49:16.590 --> 00:49:29.190
Nancy Lynch: So now we can't just say, what's the process to fail as well. So instead we first replace the missing messages, one at a time until we get this kind of setup reprocess one is faulty anymore.

287
00:49:31.170 --> 00:49:39.180
Nancy Lynch: But starts with a one and everyone else starts with zero. So now I think you can see the pattern we do the same play the same game to change the initial value.

288
00:49:39.870 --> 00:49:57.900
Nancy Lynch: Of process to and then do process three and eventually we reached the end of the chain where everybody has input one tiny, tiny changes that every step in this chain maintain the decision value because consecutive executions look the same to someone

289
00:49:59.610 --> 00:50:00.000
Okay.

290
00:50:02.820 --> 00:50:15.360
Nancy Lynch: Alright, so the discontinued is that you have a second theorem that says there's no process and process to fault stopping agreement algorithm in which everybody decides at the end of room to

291
00:50:16.860 --> 00:50:25.650
Nancy Lynch: And you just have a more complicated chain. We're going to have a chain of executions. Now we're allowed to failures at each execution in the chain.

292
00:50:26.370 --> 00:50:38.790
Nancy Lynch: Again, we start with all zeros and no failures and with all ones and no failures and maintain the property to each pair of consecutive executions is indistinguishable to some non fault the process.

293
00:50:40.080 --> 00:50:46.200
Nancy Lynch: Okay. So, for example, how would we change the process ones initial value from zero to one.

294
00:50:47.940 --> 00:50:55.590
Nancy Lynch: Well, okay, we want to work toward killing process one at the beginning of execution. So we can change its initial value.

295
00:50:57.480 --> 00:50:58.110
Nancy Lynch: Okay.

296
00:51:00.750 --> 00:51:16.470
Nancy Lynch: All right, well, we can, how are we going to remove all of process once messages we can remove all this one it's around two messages. Just the way we did before, one by one, but we have this little problem, we can then start moving it's round one messages, because suppose

297
00:51:17.700 --> 00:51:25.740
Nancy Lynch: For example, we removed the round one message from process one to process to it wouldn't look the same to anybody.

298
00:51:26.880 --> 00:51:37.440
Nancy Lynch: Those two executions would not look the same to anybody because removing this one message still allows process to to tell everybody about the failure.

299
00:51:39.210 --> 00:51:50.100
Nancy Lynch: Okay, so you have this picture, either the message is there and process to can tell everyone you got the message. When the message is not there and process to can tell everybody that he didn't get the message.

300
00:51:51.150 --> 00:52:03.060
Nancy Lynch: So it's more complicated. And the trick here is just to use more steps to remove the round one message from one to two and then those steps you have to feel both process one and process to

301
00:52:04.230 --> 00:52:10.350
Nancy Lynch: So part of this is removing all of the tutors round two messages, one by one and replacing them one by one.

302
00:52:12.690 --> 00:52:19.020
Nancy Lynch: Yeah, not going to go through any more details like that, but it's a more complicated chain involving two failures at each step.

303
00:52:20.700 --> 00:52:21.120
Nancy Lynch: Okay.

304
00:52:22.710 --> 00:52:33.690
Nancy Lynch: So, all right. Finally, I get to the most famous resulted rounds mentioned reaching consensus and asynchronous systems.

305
00:52:37.590 --> 00:52:38.400
Nancy Lynch: Excuse me.

306
00:52:39.900 --> 00:52:43.230
Nancy Lynch: So the theorem is an asynchronous distributed system.

307
00:52:44.850 --> 00:52:56.040
Nancy Lynch: In which at most one process may fail by just simply stopping without any warning, it's impossible for the non faulty processes to reach agreement reliably

308
00:52:59.220 --> 00:53:07.830
Nancy Lynch: So this is a possibility hold, even if this most one process that ever fails and the fail process just stops. This is

309
00:53:08.370 --> 00:53:16.410
Nancy Lynch: Quite surprising. It seems counterintuitive because you have so many processes and we know that it most, one is going to fail.

310
00:53:16.920 --> 00:53:34.320
Nancy Lynch: Then couldn't all the rest of them managed to ignore that process that might have failed a brief and then later tell the remaining process with the answer is in case he hasn't actually fail. I'm certainly steam. So, but that doesn't work.

311
00:53:36.660 --> 00:53:53.040
Nancy Lynch: And the proof goes like this. I contradiction. Suppose you have a one fault tolerant asynchronous algorithm itself consensus. And again, we just use the problem requirements to show that something has to go wrong. Okay, the value said can still be just binary

312
00:53:54.630 --> 00:54:13.620
Nancy Lynch: Okay, so let's look at an execution. Now it's not proceeding round by round. They synchronous and execute execution knows a sequence of steps were in one step one process gets one message updated state and sends out a finite number of other messages which go on to the channels.

313
00:54:15.840 --> 00:54:22.050
Nancy Lynch: That are delivered yet to just somebody receives a message and sends a bunch of resulting messages.

314
00:54:24.960 --> 00:54:31.980
Nancy Lynch: Okay, we can assume that every message that sent eventually gets delivered, but we don't have any time bounce here. This is a synchronous

315
00:54:34.650 --> 00:54:43.620
Nancy Lynch: Okay, so this this execution produces a sequence of configurations of the system global configurations of all processes and all the channels.

316
00:54:45.330 --> 00:54:52.650
Nancy Lynch: Okay, if you have an execution which everybody starts with zero, you know, the allowable decision is zero. And the same for one

317
00:54:54.300 --> 00:54:58.290
Nancy Lynch: If you have mixed inputs. You're allowed to have either decision.

318
00:55:00.450 --> 00:55:11.640
Nancy Lynch: And the core of the argument is to prove that the algorithm has to have a certain kind of pattern of configurations, where a decision can get made.

319
00:55:12.570 --> 00:55:27.540
Nancy Lynch: A pattern consisting of four configurations. Let's call them C zero C 131 so d zero results from see one just by having a particular process I receiving a particular message me

320
00:55:29.340 --> 00:55:39.180
Nancy Lynch: Okay configuration d one results from configuration see one in the same way. Same process I receiving the same message me

321
00:55:41.400 --> 00:55:52.590
Nancy Lynch: And now the relationship between zero and c one c one follows from C zero and one step where some process PJ receives a message m prime

322
00:55:54.660 --> 00:55:55.170
Nancy Lynch: Okay.

323
00:55:56.220 --> 00:55:56.880
Nancy Lynch: So,

324
00:55:58.170 --> 00:56:21.780
Nancy Lynch: We have this this funny configuration instead of configurations and d zero determines that the final decision will be zero. It's called zero balance, the only decision that's possible after this to figuration is zero, the only decision that's possible after this configuration is one

325
00:56:23.190 --> 00:56:33.750
Nancy Lynch: So somehow a decision is getting made in this very simple, very simple way localizing a decision to a particular pattern of configurations

326
00:56:34.740 --> 00:56:51.750
Nancy Lynch: Turns out to get to this point you make an argument that says if not if you never have this kind of a pattern. You could make the algorithm continue executing forever all processes continuing to take steps and no one ever deciding which would contradict the termination requirement.

327
00:56:53.190 --> 00:56:55.650
Nancy Lynch: So yeah, eventually you get down to this.

328
00:56:57.180 --> 00:57:00.300
Nancy Lynch: And now you just have to consider too simple cases.

329
00:57:01.530 --> 00:57:05.610
Nancy Lynch: Supposed to suppose these are two different processes I NJ are two different processes.

330
00:57:06.750 --> 00:57:16.650
Nancy Lynch: Well, okay then you're looking at a situation where a different process receive some message process I received some message.

331
00:57:17.910 --> 00:57:19.920
Nancy Lynch: What happens if you let

332
00:57:21.150 --> 00:57:25.920
Nancy Lynch: Process j receive a message in prime after d zero

333
00:57:27.990 --> 00:57:34.740
Nancy Lynch: Well, since it's after the zero, you still have to be zero balance because you really determine that the decision has to be zero.

334
00:57:36.420 --> 00:57:38.730
Nancy Lynch: So we have if we deliver

335
00:57:40.230 --> 00:57:47.220
Nancy Lynch: MTI and then and prior to PJ, you get zero balance. But if you do in the other order, you get one balance.

336
00:57:49.680 --> 00:57:55.920
Nancy Lynch: But these are two steps that happened at completely different processes. So the relative order can possibly matter.

337
00:57:56.700 --> 00:58:05.700
Nancy Lynch: You know I'm receiving a message here and receiving another message there and either case, I just put some output messages on the channels, doesn't matter what order that happens.

338
00:58:06.840 --> 00:58:23.220
Nancy Lynch: So it can't be two different processes. So now you're down to a case where they must be the same process i equals j and the only thing that's happening here is I could be receiving one message em or different message in prime and somehow that's going to make the difference.

339
00:58:24.300 --> 00:58:36.990
Nancy Lynch: But suppose you started from C zero. So, another key is, we have failure to play with. We're, we're trying to tolerate a single failure. So what you can do is just say fail process nine

340
00:58:38.100 --> 00:58:56.670
Nancy Lynch: Everybody else can take steps from this configuration. And we know that the since you can tolerate and failure of this one process I they eventually have to reach a decision, but if you apply that same execution from t zero or d one

341
00:58:57.690 --> 00:58:59.790
Nancy Lynch: You'll still get the same decision.

342
00:59:01.560 --> 00:59:08.430
Nancy Lynch: But we know that anything after the zero as opposed to reach a decision of zero. And after the one supposed to reach a decision of one

343
00:59:09.690 --> 00:59:14.520
Nancy Lynch: Other processes can distinguish which of these cases they're running in because

344
00:59:16.440 --> 00:59:31.560
Nancy Lynch: They're not we're not looking at the internals of process I were just looking at everybody else. So this yields a contradiction because they can't distinguish a situation where they have to decide zero from a situation where they have to decide one

345
00:59:33.690 --> 00:59:40.110
Nancy Lynch: Okay, so I suggested this looks kind of mysterious. This is a paper, you probably want to take a look at

346
00:59:41.310 --> 00:59:44.970
Nancy Lynch: Okay, now I'm supposed to finish precisely at 12

347
00:59:48.090 --> 00:59:56.760
Rance Cleaveland: That's bizarre. But I think we have the, we have the Zoom, as it were, as opposed to the room, until 12:30, but I'd like to leave some time for questions.

348
00:59:57.720 --> 00:59:58.890
Nancy Lynch: Well alright so

349
00:59:59.820 --> 01:00:16.530
Nancy Lynch: I think I'll skip the the CAP theorem part and probably the wireless network march and just skip to something near the end of the talk. So here the consensus. The upshot of this is very important problem and practice.

350
01:00:17.850 --> 01:00:26.130
Nancy Lynch: As I said, For database permit, for example, and the FOP results shows and limitations on the kinds of algorithms, you can hope to find

351
01:00:27.540 --> 01:00:35.370
Nancy Lynch: Now, this problem has to be solved in practice. So what could you do? You could rely on timing assumptions.

352
01:00:36.090 --> 01:00:49.050
Nancy Lynch: Or you can very carefully weaken the problem requirements, so that you always have agreement and validity, but only require termination once the system behavior stabilizes.
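
The shape of that weakened requirement can be made concrete with a toy round-based protocol. This is a sketch of my own for illustration, not any particular published algorithm, and it models message loss rather than the process crashes of FLP: agreement and validity hold in every execution, but termination is only guaranteed once the network becomes reliable, here modeled by a "global stabilization round" `gst`.

```python
import random

def consensus(inputs, gst, seed=0, max_rounds=100):
    """Rotating-coordinator consensus over a lossy network that becomes
    reliable at round gst. Agreement and validity always hold; termination
    is guaranteed only after stabilization."""
    rng = random.Random(seed)
    n = len(inputs)
    est = list(inputs)            # each process's current estimate
    decided = [None] * n
    for r in range(max_rounds):
        ok = lambda: r >= gst or rng.random() < 0.5  # lossy before gst
        v = est[r % n]            # this round's coordinator proposes
        adopted = [p for p in range(n) if ok()]
        for p in adopted:
            est[p] = v            # adopters switch their estimate to v
        if len(adopted) == n:     # every estimate is now v: safe to decide
            for p in range(n):
                if ok():
                    decided[p] = v
        if all(d is not None for d in decided):
            return decided, r
    return decided, max_rounds

vals, rnd = consensus([0, 1, 1], gst=5, seed=1)
assert len(set(vals)) == 1 and vals[0] in (0, 1)  # agreement + validity
```

Once some coordinator's value is adopted by everyone, all later proposals equal that value, so decisions never disagree; and the first round at or after `gst` delivers everything, so everyone decides by then.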

353
01:00:49.890 --> 01:01:00.570
Nancy Lynch: So that's what's in fact been done. Okay, so there are lots of other impossibility results that have been proved. I'm going to skip through the CAP theorem.

354
01:01:02.250 --> 01:01:11.910
Nancy Lynch: Okay. Oh, foundations. Well, all of this is based on I/O automata and things like that, so

355
01:01:13.620 --> 01:01:14.010
Nancy Lynch: Yeah.

356
01:01:15.420 --> 01:01:15.900
Nancy Lynch: I'm

357
01:01:17.370 --> 01:01:22.590
Nancy Lynch: Not sure what to say about that. Maybe I've already said enough. These are papers to look at to

358
01:01:24.570 --> 01:01:27.060
Nancy Lynch: Review the I/O automata framework.

359
01:01:28.290 --> 01:01:46.200
Nancy Lynch: And this is just an example of a system described using interacting I/O automata implementing a higher-level abstract component, also an automaton. This illustrates composition and abstraction relations for automata.

360
01:01:47.460 --> 01:01:51.630
Nancy Lynch: But we also did timed versions of this. There's a book about that.

361
01:01:52.680 --> 01:01:58.470
Nancy Lynch: And probabilistic versions. One was just recognized at CONCUR last year.

362
01:01:59.880 --> 01:02:04.740
Nancy Lynch: This is work mainly by Roberto Segala on probabilistic automata.

363
01:02:06.060 --> 01:02:12.540
Nancy Lynch: Okay, so the algorithms for new distributed systems are basically

364
01:02:14.460 --> 01:02:28.680
Nancy Lynch: things we've been working on. More recently, the difference is that these new types of systems noise and uncertainty and change predominate. So we're looking particularly at wireless networks and biological systems.

365
01:02:30.090 --> 01:02:33.090
Nancy Lynch: So, but it's the same type of research again.

366
01:02:34.290 --> 01:02:51.600
Nancy Lynch: Again, we're looking at abstract models for problems and algorithms new algorithms ways to prove correctness and performance properties in possibility results and they're still work needed on general foundations for these very new types of assistance.

367
01:02:53.520 --> 01:02:54.270
Nancy Lynch: So,

368
01:02:56.130 --> 01:03:02.820
Nancy Lynch: I'm going to skip the wireless networks entirely. It's really fun stuff, but there's not enough time, so

369
01:03:04.140 --> 01:03:11.640
Nancy Lynch: Hopefully we can post the slides, so if people are interested they can take a look. Okay, for biological systems, I'll just say a few words.

370
01:03:12.420 --> 01:03:22.320
Nancy Lynch: The kinds of distributed algorithms we're looking at are insect colonies, a little bit on developing organisms, and quite a bit more now on brain networks.

371
01:03:24.360 --> 01:03:34.380
Nancy Lynch: These are very simple models. I mean, the components are rather simple; the algorithms tend to be fairly simple, not elaborate.

372
01:03:35.580 --> 01:03:45.510
Nancy Lynch: The goals of this work are both to try to help understand biological systems in a new way. As algorithms, but also

373
01:03:46.020 --> 01:03:53.580
Nancy Lynch: To feed back and go the other way: once you understand how the biological systems operate, sometimes you can get new ideas for engineering algorithms,

374
01:03:53.880 --> 01:04:05.130
Nancy Lynch: Like machine learning algorithms and results we've gotten from our understanding of brain operation. Okay, these are some references on our insect colony work.

375
01:04:07.320 --> 01:04:22.680
Nancy Lynch: Problems we've looked at have included task allocation, searching problems, a very interesting consensus problem (agreeing on a new nest), and problems where insect colonies have to estimate how many other insects are

376
01:04:24.540 --> 01:04:25.350
Nancy Lynch: Actually there.

377
01:04:26.460 --> 01:04:29.610
Nancy Lynch: We've been publishing some of this work in biology journals.

378
01:04:31.680 --> 01:04:32.610
Nancy Lynch: And then our

379
01:04:33.960 --> 01:04:36.570
Nancy Lynch: Recent work on brains uses

380
01:04:37.800 --> 01:04:48.570
Nancy Lynch: A stochastic spiking neural network model. We've studied problems like winner-take-all, which is a kind of leader election, for the focus of attention in the brain.

381
01:04:49.530 --> 01:04:59.940
Nancy Lynch: Ways of coding concepts in neural networks, and ways of learning. And here's just a sampling of some of our recent papers on that.

382
01:05:01.380 --> 01:05:10.530
Nancy Lynch: OK, so the conclusions are that we've worked on the theory for distributed systems to help understand capabilities and limitations.

383
01:05:11.580 --> 01:05:26.760
Nancy Lynch: These are the different types of work that have been involved, and this work has covered many kinds of systems over the years, starting with distributed data management systems and moving through communication systems and, currently, different kinds of biological systems.

384
01:05:27.810 --> 01:05:35.700
Nancy Lynch: It's the same kinds of aims, the same algorithmic lens, that we're applying to all of these different areas.

385
01:05:37.050 --> 01:05:44.310
Nancy Lynch: And there's still a lot more to do. For those of you who are interested in research, especially in these newer areas.

386
01:05:45.630 --> 01:05:51.270
Nancy Lynch: And I have to thank my literally hundreds and hundreds of collaborators on all this work.

387
01:05:52.440 --> 01:05:53.100
Nancy Lynch: And thank you.

388
01:05:55.080 --> 01:06:08.100
Rance Cleaveland: All right. Thank you very much, Nancy. And indeed, it is unusual giving a talk in this new normal situation, as it were, for us. Normally we would all be clapping; I have to clap for everybody, on behalf of everybody else,

389
01:06:08.910 --> 01:06:15.300
Rance Cleaveland: And including not only the people who are on the zoom meeting, but also the people who are attending via

390
01:06:16.440 --> 01:06:34.320
Rance Cleaveland: Webinar mechanism as well as on on YouTube. This is broadcast on YouTube, so I'll open the floor for questions. Now if you those who are attending. If you have a question, please use the Q AMP a feature in in zoom and I'll moderate the questions and read them out.

391
01:06:35.400 --> 01:06:44.040
Rance Cleaveland: I'll maybe get things started, though a lot of people start marshaling there of marshaling their questions. I'm, I'm curious in the in the

392
01:06:44.520 --> 01:06:57.990
Rance Cleaveland: In particular, in the sort of biological settings that you talked about before, but I'm adaptations, you had to make to the basic theoretical models of of have distributed computation.

393
01:07:00.090 --> 01:07:05.220
Nancy Lynch: Okay, so we've pretty much moved to just using synchronous models in those cases.

394
01:07:07.590 --> 01:07:09.000
Nancy Lynch: The models are stochastic

395
01:07:10.980 --> 01:07:18.150
Nancy Lynch: For insect colonies, you basically have a sort of population-protocol-style model, where you have a bunch of insects that

396
01:07:19.170 --> 01:07:21.780
Nancy Lynch: May bump into each other, but we

397
01:07:23.760 --> 01:07:35.790
Nancy Lynch: Need to have a simple kind of stage that could put it in different kinds of modes and modes would allow them to make certain decisions of the next action with certain probability their inputs.

398
01:07:38.580 --> 01:07:45.810
Nancy Lynch: correspond to what they see in their environment as well as interactions with other insects.

399
01:07:46.470 --> 01:07:59.130
Nancy Lynch: So yeah, I mean it's a sort of agent based model with some differences, but we stuck. Pretty much to keeping everything synchronous just because insects and biological systems have really good notions of time.

400
01:08:00.390 --> 01:08:05.220
Nancy Lynch: So there's no reason to think about, you know, arbitrary race conditions, as we did in the asynchronous models.

401
01:08:06.750 --> 01:08:16.920
Nancy Lynch: The kind of network models we're using for brain systems are stochastic spiking neural networks. So those are

402
01:08:17.910 --> 01:08:35.040
Nancy Lynch: Networks, where you have firing that occurs based on accumulated potential from neighboring neurons and for us stochastic so the firing isn't guaranteed to happen when you reach a threshold, but the incoming potential

403
01:08:36.120 --> 01:08:37.080
Nancy Lynch: Helps to

404
01:08:38.790 --> 01:08:40.770
Nancy Lynch: Determine the probability of firing.
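
One synchronous round of such a network can be sketched as follows (the bias and temperature values are illustrative assumptions of mine, not parameters from any particular paper): each neuron accumulates weighted potential from the neurons that spiked last round, and fires with a probability given by a sigmoid of that potential rather than a hard threshold.

```python
import math
import random

def firing_prob(potential, bias=-2.0, temperature=1.0):
    """Sigmoid firing probability: higher potential makes a spike more
    likely, but never certain -- this is the built-in noise tolerance."""
    return 1.0 / (1.0 + math.exp(-(potential + bias) / temperature))

def step(weights, spikes, rng):
    """One synchronous round of a stochastic spiking network.
    weights[j][i] is the synapse strength from neuron j to neuron i;
    spikes is the 0/1 firing vector from the previous round."""
    n = len(spikes)
    potentials = [sum(weights[j][i] * spikes[j] for j in range(n))
                  for i in range(n)]
    return [1 if rng.random() < firing_prob(p) else 0 for p in potentials]
```

With mutual inhibition (negative weights between competitors and self-excitation), repeated application of `step` is the kind of dynamics used to analyze winner-take-all behavior.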

405
01:08:42.360 --> 01:08:51.660
Nancy Lynch: So the probability automatically makes the algorithms noise tolerant, because if you're going to work with that type of probabilistic noise,

406
01:08:53.490 --> 01:08:58.320
Nancy Lynch: Yeah, you're going to be able to handle quite a lot of noise and uncertainty.

407
01:08:59.370 --> 01:08:59.850
Okay.

408
01:09:00.930 --> 01:09:09.240
Rance Cleaveland: So we have a question that came up on the Q&A, from Eugene: have you discovered any errors in biological networks?

409
01:09:11.880 --> 01:09:14.190
Nancy Lynch: In real biology? No.

410
01:09:15.600 --> 01:09:29.280
Nancy Lynch: Biology seems to work okay. I mean, there are certainly biology papers that show bizarre behavior on the part of real insect colonies, like ants that start walking around in a circle forever until they die.

411
01:09:31.410 --> 01:09:35.460
Nancy Lynch: Yeah, I mean, these are simple state machines and maybe they have bugs.

412
01:09:38.250 --> 01:09:41.550
Rance Cleaveland: Yeah. Yes, yes, yes. Well, in fact,

413
01:09:43.140 --> 01:09:47.370
Nancy Lynch: If we do find mistakes in a biological system, they are what they are, I guess.

414
01:09:49.020 --> 01:09:56.310
Rance Cleaveland: Right. And so there's a follow-up saying that, indeed, he was interested in these sort of bizarre behaviors at large.

415
01:09:56.940 --> 01:10:13.620
Nancy Lynch: No, well, you can analyze that kind of thing. We haven't done that, but, um, yeah, I mean, by describing the insects as simple state machines, you can try to reproduce the behavior. We're doing a lot of simulations now.

416
01:10:14.910 --> 01:10:22.740
Nancy Lynch: Rather than plunging right into proving theorems. We testing out because there's a lot going on. And you also want to play with the parameter values.

417
01:10:25.020 --> 01:10:28.500
Nancy Lynch: Right, right. Once we get once we get observed behavior. We try to

418
01:10:28.500 --> 01:10:29.190
Nancy Lynch: Prove it.

419
01:10:31.440 --> 01:10:40.440
Rance Cleaveland: Right. I guess the other thing, too, in line with that, is that in the case of biology you also have this sort of great

420
01:10:41.700 --> 01:10:46.320
Rance Cleaveland: I don't know what the right word is debugging engine and evolution that

421
01:10:47.880 --> 01:10:52.500
Rance Cleaveland: If too many and start following each other in a circle, then the species is imperiled

422
01:10:52.770 --> 01:10:53.760
Rance Cleaveland: Right, I'm

423
01:10:55.350 --> 01:10:55.590
Rance Cleaveland: Some

424
01:10:56.910 --> 01:10:57.420
Rance Cleaveland: Sort of

425
01:10:59.370 --> 01:11:01.890
Nancy Lynch: Behavior like that. That's true, yeah.

426
01:11:02.160 --> 01:11:03.240
Rance Cleaveland: Yeah, no.

427
01:11:04.320 --> 01:11:04.830
Rance Cleaveland: Um,

428
01:11:06.030 --> 01:11:13.860
Rance Cleaveland: I had another word for questions have come in. I had another another question to this relates to something you mentioned at the beginning of your talk about the

429
01:11:15.210 --> 01:11:22.260
Rance Cleaveland: The 2007 Dykstra award paper that was, I guess, published in 88. Is that right, am I remembering that

430
01:11:25.200 --> 01:11:29.850
Rance Cleaveland: And you you alluded to the fact that the paper was sort of

431
01:11:30.900 --> 01:11:46.770
Rance Cleaveland: Published and AND WAS SATIN dormant from a practical perspective, until, you know, a decade or so later when it was suddenly picked up, you know what, what led the how. How about happened.

432
01:11:47.910 --> 01:11:49.500
Rance Cleaveland: What the story was behind

433
01:11:50.430 --> 01:11:55.830
Nancy Lynch: People needing to build certain kinds of systems. Cloud storage is a big thing, and they needed to

434
01:11:57.510 --> 01:11:58.860
Nancy Lynch: You know, try to maintain

435
01:12:00.150 --> 01:12:05.190
Nancy Lynch: Correctness. And the number of failures of some number of servers. I'm not. I'm guessing you

436
01:12:05.760 --> 01:12:06.090
Nancy Lynch: Know I'm

437
01:12:06.780 --> 01:12:07.710
Wondering, because

438
01:12:08.730 --> 01:12:20.610
Rance Cleaveland: It's something sits for a while. A lot of times people just you know rediscover AND or OR without realizing it, or you know we discover something wrong, that are rediscover sort of

439
01:12:21.240 --> 01:12:35.340
Rance Cleaveland: An error erroneous version. I'm just really intriguing also encouraging to me that the system of people in this case we're able to uncover and put to good use the results that you've proven

440
01:12:37.110 --> 01:12:49.350
Nancy Lynch: Because there are people, like Dahlia Malkhi, who stand between theory and the systems. They know the theory, and when they see a systems problem they can call on some of the theory they've learned and apply it.

441
01:12:49.980 --> 01:12:51.660
Rance Cleaveland: Right, right, right.

442
01:12:56.460 --> 01:13:12.720
Rance Cleaveland: There was another comment in the Q&A, picking up this theme of evolution, saying it would be weird if any species we observed wasn't in an evolutionary equilibrium with no behavioral bugs. And

443
01:13:14.100 --> 01:13:30.360
Rance Cleaveland: Is what's taken, of course, one what evolutionary equilibrium is from the species to die out, I guess. And quite frankly, if you are in an equilibrium. That's probably the liberty to wind up that because the environmental conditions will ensure continue to change. Even the species doesn't

444
01:13:31.980 --> 01:13:37.440
Nancy Lynch: Well, certainly. Yeah, there are species, like fruit flies, that have evolved over many, many generations.

445
01:13:38.670 --> 01:13:48.600
Nancy Lynch: And you might think they're in kind of a stable configuration as far as the mechanisms that they use some work by socket and a blocker is worth looking at their

446
01:13:49.980 --> 01:13:55.260
Rance Cleaveland: Yeah, yeah. So another question just came in, from funder.

447
01:13:56.280 --> 01:13:57.810
Rance Cleaveland: On the on the

448
01:13:59.640 --> 01:14:15.600
Rance Cleaveland: Let's see. I'll just read you the question, how to goals distributed decision making translated in set colonies. They tend to be more complex or partial solutions more acceptable for instance, perhaps, you don't need all or that many in fact insects to come to an agreement.

449
01:14:17.070 --> 01:14:18.810
Rance Cleaveland: Enable any simple vacations.

450
01:14:23.760 --> 01:14:35.130
Nancy Lynch: I would say yes, but you get algorithms with certain probabilities of failure. So I'm thinking of an example that we're working on currently, and this is

451
01:14:36.480 --> 01:14:39.660
Nancy Lynch: protocols that ants use to find a new nest.

452
01:14:40.170 --> 01:14:54.660
Nancy Lynch: So this is a consensus problem, but it's a consensus problem involving random searching and trying to persuade you decide what you think is the best nest and trying to persuade others. And once you decide and forcing others to to follow you.

453
01:14:55.140 --> 01:15:03.330
Nancy Lynch: Know, observing in the ant experiments you observed a certain probability that this doesn't exceed doesn't succeed. Within a short amount of time.

454
01:15:04.260 --> 01:15:22.650
Nancy Lynch: Um, so, you know, that kind of probability. Maybe unacceptable in an engineering system also a lot of those errors are self correcting like if you wait a day, all we have will wind up in the same nest, whereas the initial execution of the algorithm didn't accomplish that.

455
01:15:23.970 --> 01:15:37.200
Nancy Lynch: So yeah, I mean they are noise tolerant and self correcting but there's there's new aspects like short timeframe succeeds with a certain probability much longer timeframe.

456
01:15:38.850 --> 01:15:40.500
Nancy Lynch: it succeeds with a higher probability.
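
The time/probability trade-off described here has a simple back-of-the-envelope form if you treat each short timeframe as an independent attempt with success probability p. That independence is a simplification of mine; real emigrations are not independent trials.

```python
def success_within(t, p=0.3):
    """Probability that at least one of t independent attempts succeeds:
    1 minus the probability that all t attempts fail."""
    return 1.0 - (1.0 - p) ** t

# A short timeframe succeeds with some probability; a longer one with more.
assert success_within(1) < success_within(5) < success_within(50) < 1.0
```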

457
01:15:42.600 --> 01:15:42.990
Nancy Lynch: So we

458
01:15:43.020 --> 01:15:51.240
Nancy Lynch: We've been working with insect biologists done on this project and we just submitted paper to our own biology, the journal

459
01:15:52.260 --> 01:16:04.710
Nancy Lynch: Is based on on a simulator which acts just like real ounce of a certain species and you can mimic the observed behavior very closely with the experiment experimental biologists and thumbs.

460
01:16:06.510 --> 01:16:10.560
Nancy Lynch: But now we get to play with the simulations, see what else we can make the species do

461
01:16:13.980 --> 01:16:24.300
Rance Cleaveland: Just a curiosity: what causes these decisions to be made, that they need a new colony? Is there some sort of external environmental stressor?

462
01:16:24.300 --> 01:16:24.780
Rance Cleaveland: Yeah, it's

463
01:16:25.980 --> 01:16:28.470
Nancy Lynch: Kind of like the experimentalists Rex the nest.

464
01:16:29.340 --> 01:16:31.740
Rance Cleaveland: Okay. Okay, so then the ants

465
01:16:32.100 --> 01:16:34.020
Nancy Lynch: Are forced to go find a new nest.

466
01:16:34.200 --> 01:16:37.140
Rance Cleaveland: Okay, so they're highly motivated to do so presumably

467
01:16:37.710 --> 01:16:38.190
Nancy Lynch: Yeah.

468
01:16:38.790 --> 01:16:39.180
Mm hmm.

469
01:16:40.800 --> 01:16:43.920
Nancy Lynch: They do this in the lab; they also do it in the desert. We visited.

470
01:16:45.090 --> 01:16:53.490
Nancy Lynch: Yeah, that's another story. But we looked at some activities announced in in the desert. The people in Arizona.

471
01:16:54.600 --> 01:17:00.840
Nancy Lynch: Study ants because they have quite a few local ants available for study.

472
01:17:03.000 --> 01:17:03.600
Rance Cleaveland: Sort of the

473
01:17:04.800 --> 01:17:12.240
Rance Cleaveland: Are these algorithms typically fully decentralized, or is there sort of a leader? Is there something like leader election that goes on in these ant colonies?

474
01:17:12.240 --> 01:17:13.560
Nancy Lynch: Nothing is centralized

475
01:17:13.950 --> 01:17:15.900
Nancy Lynch: Nothing centralized in that

476
01:17:16.650 --> 01:17:24.120
Nancy Lynch: Well, I mean, in the brain algorithms. Yeah, I mean, there's, there could be some chemical that's flooding double mean can flood neurons. And that's kind of a

477
01:17:24.990 --> 01:17:37.170
Nancy Lynch: Little thing, but in the insect colonies. No, there isn't any central direction of any come. Okay. Okay. You might think there's a queen and the queen would tell the others what to do, but she does not

478
01:17:37.890 --> 01:17:38.460
Okay.

479
01:17:39.510 --> 01:17:40.680
Rance Cleaveland: I was wondering about that.

480
01:17:41.790 --> 01:17:51.450
Rance Cleaveland: And so does that mean that like bees and the answer, different or or B is also sort of decentralized in the midst of the sort of all powerful queen is sort of not

481
01:17:53.430 --> 01:17:57.900
Nancy Lynch: Bees, that's a different... it's pretty much the same. Bees are very similar.

482
01:17:58.440 --> 01:17:58.830
Nancy Lynch: Right.

483
01:17:59.340 --> 01:18:07.020
Nancy Lynch: So much smarter have more elaborate behaviors that can indicate directions, for example.

484
01:18:09.150 --> 01:18:10.440
Nancy Lynch: Dancing in a certain way.

485
01:18:12.180 --> 01:18:18.480
Nancy Lynch: Fascinating behavior, maybe has limitations on how much that contributes to engineering

486
01:18:19.980 --> 01:18:36.870
Nancy Lynch: Because this the insects work under a lot of constraints that many computers don't have. But on the other hand, they also have strategies for resilience robustness highly fault tolerant noise tolerant and we can learn from from their strategies.

487
01:18:37.260 --> 01:18:38.520
Rance Cleaveland: Right, right.

488
01:18:42.660 --> 01:18:45.660
Rance Cleaveland: Okay. Well, I think we've exhausted the questions so

489
01:18:47.670 --> 01:18:57.630
Rance Cleaveland: Why don't we go ahead and conclude for today. So I'll just remind the NSF attendees, that there is a session from I think it's from three to four today.

490
01:18:59.100 --> 01:19:04.650
Rance Cleaveland: For sort of a more one on one interaction with with with Nancy

491
01:19:05.910 --> 01:19:15.150
Rance Cleaveland: In the meantime, Nancy. Thanks for a fabulous talk. It was on the one hand, great to hear revisit those results that were so influential me

492
01:19:16.230 --> 01:19:36.630
Rance Cleaveland: I've been so influential on me throughout my career. And I have to admit, I think we could spend another hour just talking about the the the insect and brain work to there so so much interesting going on there. So, um, so thank you again and we'll reconvene at that at three o'clock and

493
01:19:37.980 --> 01:19:43.530
Rance Cleaveland: I guess everybody at least on the East Coast now can go grab a sandwich or something and have some lunch.

494
01:19:44.850 --> 01:19:46.110
Nancy Lynch: Is it the same location?

495
01:19:47.190 --> 01:19:49.050
Rance Cleaveland: It will... I think it's a different link.

496
01:19:49.980 --> 01:19:51.390
Rance Cleaveland: It. So I hope it's

497
01:19:51.810 --> 01:19:55.260
Rance Cleaveland: Do you have a do you have that link. I'll make sure that you get it.

498
01:19:56.700 --> 01:19:59.040
Nancy Lynch: Maybe send that to me. I don't remember it now.

499
01:19:59.550 --> 01:20:07.230
Rance Cleaveland: Yeah, yeah. So, Blaine, if you're online if you could just make sure that that Professor lunch gets the, the new link, that'd be great.

500
01:20:08.400 --> 01:20:11.400
Rance Cleaveland: Otherwise, we'll make sure that you get it, Nancy. I'll make sure it's emailed to you.

501
01:20:11.550 --> 01:20:13.050
Rance Cleaveland: Okay. All right.

502
01:20:13.230 --> 01:20:14.070
Rance Cleaveland: Thank you very much.

503
01:20:16.200 --> 01:20:16.560
Rance Cleaveland: All right. Bye bye.