WEBVTT 00:00:00.000 --> 00:00:16.000 Okay, let me make sure. I think, yeah, I think that goes. Hello, we're going to give it just another half a minute to make sure our numbers are stable. 00:00:16.000 --> 00:00:20.000 People are joining. 00:00:20.000 --> 00:00:27.000 I can't see the Q&A, so you can just handle that on your end. Yeah. Okay. 00:00:27.000 --> 00:00:40.000 All right, okay. Well, thank you everyone for joining us here in person. 00:00:40.000 --> 00:00:55.000 It's great to see people in the room despite last-minute changes in the rooms. 00:00:55.000 --> 00:01:08.000 And thank you everyone for joining us online. 00:01:08.000 --> 00:01:15.000 And we apologize for a couple of minutes' delay. We had some technical difficulties. You would think by now we would know how to do Zoom, but things come up. 00:01:18.000 --> 00:01:42.000 So thank you very much for your patience. Today we're very, very excited to have the distinguished lecturer Rebecca Willett, who is a professor at the University of Chicago and holds a courtesy appointment at the Toyota Technological Institute at Chicago as well. 00:01:42.000 --> 00:01:54.000 Some of her accomplishments I want to highlight: she's a fellow of the IEEE. 00:01:54.000 --> 00:02:02.000 She's also a fellow of SIAM. And her research has been recognized with very distinguished awards, such as the CAREER award from NSF, which of course is the best. 00:02:02.000 --> 00:02:18.000 And a Young Investigator award from AFOSR. So I think, without taking more time on the introduction, I just wanted to 00:02:18.000 --> 00:02:35.000 hand it over to you. 00:02:35.000 --> 00:03:19.000 I do wanna also highlight, we're excited to see that not just only... 00:04:55.000 --> 00:04:59.000 ...the medical imaging data: the way that we take that data and form an image for somebody to interpret, or the way that computers are used to help interpret those images. 00:04:59.000 --> 00:05:14.000 It's going to affect the way surgeons plan and execute surgeries, and even affect the pharmaceuticals that are prescribed post-operation. 00:05:14.000 --> 00:05:26.000 And I think the impacts of AI and machine learning go beyond healthcare. So we've heard people talking about the potential of machine learning to help. 00:05:26.000 --> 00:05:26.000 So I'm just getting a notification here from Zoom. Sorry about that.
00:05:26.000 --> 00:05:35.000 We've heard people talking about the potential of AI and machine learning to impact all kinds of things that the National Science Foundation cares about: 00:05:35.000 --> 00:05:47.000 developing a new understanding of the rules of life or the laws of nature; accelerating affordable drug development, 00:05:47.000 --> 00:06:00.000 especially for, you know, somewhat rare diseases or underserved communities; engineering green materials; building quantum computers; and even developing sustainable climate policies. 00:06:00.000 --> 00:06:12.000 And when you hear these lists, I think it's tempting to think that we've just developed great machine learning and AI technologies. 00:06:12.000 --> 00:06:23.000 We see evidence of that regularly. And then all that's left to do, perhaps, is to take those tools and figure out how to plug them into different domains. 00:06:23.000 --> 00:06:36.000 But what I'd really like to emphasize today is that that is not the state of the world, and that it's absolutely vital that we invest in fundamental AI and machine learning research. 00:06:36.000 --> 00:07:01.000 And in particular, I would like to claim that if we try to develop applied machine learning without really understanding the mathematical... 00:07:01.000 --> 00:07:12.000 So with that in mind, today I wanna cover 2 core areas. 00:07:12.000 --> 00:07:23.000 One is just trying to do a retrospective look at some of the areas where the machine learning foundations that have been developed by the community and supported by the NSF have really had a major impact on the field and on practice. 00:07:23.000 --> 00:07:43.000 And then second, I want to talk about some of the emerging and future directions within the AI and machine learning communities, the role that foundational research is expected to play in those areas, and what some of the major open questions are. 00:07:43.000 --> 00:07:47.000 Okay, so first let's just talk about some examples where foundational research has really been impactful and where we needed that understanding. 00:07:47.000 --> 00:08:00.000 And I think maybe the first example that would come to mind for many of us is in the context of optimization theory. 00:08:00.000 --> 00:08:08.000 So when we're performing machine learning, we set up a model, and that model has got parameters, or in the context of a neural network we've got weights on all of the different nodes. 00:08:08.000 --> 00:08:17.000 And what we do is we take training data and figure out how to set those parameters or weights so that we make good predictions on the training data. 00:08:17.000 --> 00:08:23.000 And we do this by solving an optimization problem.
00:08:23.000 --> 00:08:25.000 So we're going to compute gradients of some loss function at our current values of the parameters and update them based on the gradients of the loss. 00:08:25.000 --> 00:08:39.000 One foundational innovation that's relatively recent that I'd like to highlight is, for instance, AdaGrad. 00:08:39.000 --> 00:08:49.000 And what AdaGrad does is it takes these gradients of the losses and adapts the updates based on past gradient information. 00:08:49.000 --> 00:08:56.000 And this ultimately can accelerate training. It means that we can find good parameter estimates with many fewer iterations. 00:08:56.000 --> 00:09:11.000 And this kind of theoretical, foundational algorithm is one of the key ideas that underpins, for instance, the Adam algorithm, which is widely used across industry, academia, national labs, etc., 00:09:11.000 --> 00:09:22.000 for training machine learning models. Another example tied to optimization is distributed optimization across a collection of computers, 00:09:22.000 --> 00:09:31.000 say, nodes in a cluster. So the expectation is that if I were to invest money and double the size of my cluster, then I should be able to train my machine learning method twice as fast. 00:09:31.000 --> 00:09:44.000 But in the early days of machine learning practice, this expectation was not realized. We very quickly hit a point of diminishing returns, where I maybe would double my investment in computational infrastructure 00:09:44.000 --> 00:10:14.000 and only get a tiny little improvement in the speed of my training. And so foundational, fundamental research in optimization and distributed optimization helped us understand why this was occurring, and that in turn led to new algorithms. Just one example of this is the Hogwild! algorithm that was developed at the University of Wisconsin, where I used to work, where they used some of these theoretical insights to develop asynchronous distributed optimization methods that 00:10:23.000 --> 00:10:28.000 had this expected property, right? If I were to double the size of my computational infrastructure, then I would come close to doubling the speed of my training. 00:10:28.000 --> 00:10:46.000 And this led to many other innovations in terms of distributed optimization that were fundamental to training some of the large-scale tools that we see prevalent today, like ChatGPT. 00:10:46.000 --> 00:10:55.000 I'd also like to highlight privacy guarantees. So when we're dealing with machine learning data, and it reflects data about humans, 00:10:55.000 --> 00:11:08.000 we want to make sure that their privacy is protected. One standard that people use to say whether their algorithm is protecting people's privacy is something called k-anonymity. 00:11:08.000 --> 00:11:29.000 So k-anonymity says that I'm going to transform my data just enough to make sure that any individual in my data set is going to be indistinguishable from k other individuals. 00:11:29.000 --> 00:11:41.000 And this standard is encoded into law, right? This is considered sufficient for fulfilling laws like HIPAA requirements in the US or GDPR. But recently, one of my colleagues at UChicago, Aloni Cohen, showed that this is actually a vulnerable standard.
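For illustration only, here is a minimal sketch of what the k-anonymity standard just described asks of a released table: every combination of quasi-identifier values has to appear at least k times. The record fields and generalizations below are hypothetical and are not from the talk.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """Check whether every combination of quasi-identifier values
    appears at least k times in the released records."""
    combos = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in combos.values())

# Toy released table: ages generalized to decades, ZIP codes truncated.
released = [
    {"age": "30-39", "zip": "537**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "537**", "diagnosis": "asthma"},
    {"age": "40-49", "zip": "606**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "606**", "diagnosis": "diabetes"},
]

print(is_k_anonymous(released, ["age", "zip"], k=2))  # True
print(is_k_anonymous(released, ["age", "zip"], k=3))  # False
```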
00:11:41.000 --> 00:11:52.000 He said, you know, when people actually apply this, they don't want to change their data too much, because they want to learn from it as much as possible. 00:11:52.000 --> 00:12:05.000 And so they would redact the minimum possible amount of information to satisfy this requirement. And so that gives a malicious actor some additional information about that data set. 00:12:05.000 --> 00:12:17.000 So initially this work got a somewhat negative reaction. People said, well, this is sort of like a theoretical bound, but I don't know how you could use that in a practical sense to risk someone's privacy. 00:12:17.000 --> 00:12:30.000 But then Aloni and his collaborators actually came up with an algorithm based on these theoretical, foundational insights that demonstrated how you could leverage this insight to violate people's privacy. 00:12:30.000 --> 00:12:39.000 So I think this is a really key example of how foundational research is absolutely essential. And sort of analogously, we've had a lot of work, supported by the NSF, in areas like differential privacy, which gives us an alternative mechanism 00:12:39.000 --> 00:13:04.000 for trying to safeguard privacy by adding random perturbations to the data. And these are coming with, you know, statistical guarantees that were absent from the kind of k-anonymity standards that we were just considering. 00:13:04.000 --> 00:13:19.000 Another area where foundational research has really impacted machine learning is in quantifying uncertainty. So we often think about machine learning tools as taking in an input, say a feature vector, and producing a label, a prediction. 00:13:19.000 --> 00:13:27.000 But in many, many settings, we don't only want sort of a point prediction. We want some uncertainty associated with that prediction. 00:13:27.000 --> 00:13:37.000 And this is essential in all kinds of contexts: climate analysis, model predictive control, automatic translation, and a variety of other settings. 00:13:37.000 --> 00:13:51.000 And people have thought about uncertainty quantification for a very long time. But classical methods might require a simple model, like we're just going to do linear regression, in which case neural networks would have no guarantees at all, or they would require very strong prior knowledge, like we know exactly what the distribution underlying our data is, 00:13:51.000 --> 00:14:06.000 which in general we would never know. But recent efforts have focused on ideas like conformal prediction. 00:14:06.000 --> 00:14:08.000 And these tools are allowing machine learning people to assess the uncertainty of predictions with theoretical guarantees and really minimal assumptions. 00:14:08.000 --> 00:14:26.000 These can work with black-box models: whatever new fancy neural network you come up with, with all the bells and whistles, you can still use the ideas from this toolbox. 00:14:26.000 --> 00:14:46.000 And this has had really broad impact. It's been used in contexts that range all the way from object pose estimation to biomolecular design to analyzing polls for elections to even clinical medical sciences.
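To make the conformal prediction idea a bit more concrete, here is a minimal sketch of split conformal prediction for regression. It assumes a held-out calibration set and any black-box model with a predict method; the names and the finite-sample quantile correction are illustrative, not taken from the talk.

```python
import numpy as np

def split_conformal_interval(model, X_cal, y_cal, X_new, alpha=0.1):
    """Split conformal prediction: wrap any black-box regressor with a
    prediction interval that covers the truth roughly (1 - alpha) of the
    time, assuming calibration and test points are exchangeable."""
    # Nonconformity scores on held-out calibration data.
    scores = np.abs(y_cal - model.predict(X_cal))
    n = len(scores)
    # Finite-sample-corrected quantile of the scores (conservative rounding).
    q_level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(scores, q_level, method="higher")
    preds = model.predict(X_new)
    return preds - q, preds + q

# Usage sketch with a placeholder model exposing .predict(), e.g. any
# fitted scikit-learn regressor or a wrapped neural network:
# lower, upper = split_conformal_interval(model, X_cal, y_cal, X_test, alpha=0.1)
```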
00:14:46.000 --> 00:15:11.000 Okay, so the next example, oh sorry, not the final one, second to last, where we've seen foundational research really have a big impact in the machine learning context is in allocating data collection resources. So one of the giant bottlenecks that we experience is not necessarily with the training, but just the initial collection of the data, or the curating and annotating of that data. 00:15:11.000 --> 00:15:24.000 So tools like bandit algorithms, active learning, Bayesian optimization, all things that are foundational, fundamental research, can be used to guide data collection and labeling. 00:15:24.000 --> 00:15:35.000 And these are not just sort of theoretical algorithms that nobody uses in practice. People are actively using these kinds of ideas in industry, for instance for ad placement. 00:15:35.000 --> 00:15:43.000 Okay, so now one final example. I want to return to this medical imaging context that we started off with. 00:15:43.000 --> 00:15:49.000 And I've got a video here that highlights how a CAT scanner works, a CT scanner. 00:15:49.000 --> 00:15:57.000 So basically you lie in the scanner and it shoots X-rays through your body, and on the other end you measure what's called the absorption profile: 00:15:57.000 --> 00:15:57.000 how much of that X-ray energy was absorbed along each of the different X-ray paths. 00:15:57.000 --> 00:16:06.000 And what I'm gonna show here on the right is that absorption profile. Each column is gonna correspond to a different absorption profile. 00:16:06.000 --> 00:16:27.000 White will mean high absorption and dark will mean low absorption. So this whole apparatus rotates around your body, and we measure the absorption profile for each rotation, and it forms a different column in this set of observations. 00:16:27.000 --> 00:16:32.000 So this thing on the right is what we would actually collect with a scanner. It's called a sinogram. 00:16:32.000 --> 00:16:40.000 And the core next task is to map that sinogram to an image that a radiologist can actually look at. 00:16:40.000 --> 00:16:48.000 So this is a difficult process. This is an ill-posed inverse problem. It's really sensitive to noise and other kinds of challenges. 00:16:48.000 --> 00:16:53.000 And so what people started asking is, well, can I somehow leverage machine learning to improve this image reconstruction process? 00:16:53.000 --> 00:17:09.000 And so initial early efforts took collections of sinograms and images and some off-the-shelf neural network architecture and tried to train the network to actually perform this reconstruction. 00:17:09.000 --> 00:17:14.000 And if you just had enormous quantities of data, then this worked okay. It was a great proof of concept and spurred a lot of interest. 00:17:14.000 --> 00:17:30.000 But it also just ignores everything we know about the data collection process. Because in this kind of simple paradigm, we're forcing the network to learn 2 things. 00:17:30.000 --> 00:17:38.000 First, it has to learn something about the geometry and the structure of the images that we want to reconstruct, which is something that hopefully training data can give us insight into. 00:17:38.000 --> 00:17:48.000 But second, it's having to learn something about the relationship between that image and the sinogram, which is something that we absolutely know, right? 00:17:48.000 --> 00:17:57.000 Because we engineered our CT scanners. And so this is where, you know, foundational research really comes to bear. 00:17:57.000 --> 00:18:01.000 We say, well, can we design a neural network that reflects our knowledge of this underlying physics? And the answer is yes.
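As a reference point, here is a toy sketch of how a sinogram like the one just described can be formed, assuming a simple parallel-beam geometry and using scipy's image rotation as a stand-in for the scanner rotating around the object; the phantom and angles are made up for illustration and are not from the talk.

```python
import numpy as np
from scipy.ndimage import rotate

def toy_sinogram(image, angles_deg):
    """Toy parallel-beam CT forward model: for each gantry angle, rotate the
    object and sum attenuation along the beam direction. Each angle produces
    one column of the sinogram, matching the 'one column per rotation' picture."""
    columns = []
    for theta in angles_deg:
        rotated = rotate(image, theta, reshape=False, order=1)
        columns.append(rotated.sum(axis=0))  # line integrals along the rays
    return np.stack(columns, axis=1)  # shape: (detector bins, num angles)

# Toy object: a square "phantom" with a denser inclusion in the middle.
phantom = np.zeros((64, 64))
phantom[20:44, 20:44] = 1.0
phantom[28:36, 28:36] = 3.0

sino = toy_sinogram(phantom, angles_deg=np.linspace(0.0, 180.0, 60))
print(sino.shape)  # (64, 60)
```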
00:18:01.000 --> 00:18:09.000 And to figure out the best way to do it, we build upon foundational research in inverse problem theory, 00:18:09.000 --> 00:18:15.000 data assimilation, signal processing, optimization, and other fields as well. So let me just give you a little bit of nuance for this one particular example. 00:18:15.000 --> 00:18:26.000 So our goal is to reconstruct an image that I'm going to call x from a sinogram that I'll call y. 00:18:26.000 --> 00:18:35.000 And we think about y as being some function H times x plus epsilon, where H is basically telling us what the physics of this forward model is. 00:18:35.000 --> 00:18:48.000 And in a classical setting, before we bring machine learning into the mix, the way we might approach this is to search over all possible images and find one that, first of all, is a good fit to the data. 00:18:48.000 --> 00:18:56.000 And when we're measuring how good of a fit it is to the data, we take this forward model, this model of how the CT scanner works, into account. 00:18:56.000 --> 00:19:04.000 But second, we recognize that this is really sensitive to noise and measurement error and other things. And so we also try to make sure that whatever image we come up with is a reasonable image. 00:19:04.000 --> 00:19:10.000 And classically we would say we prefer to have images that are smooth, maybe sparse in some wavelet basis, or something like that. 00:19:10.000 --> 00:19:27.000 Mathematicians would sit around and figure out a good choice of this regularization function. And once we had this set up, we could use an optimization routine to find this good image. 00:19:27.000 --> 00:19:35.000 So we would generally alternate between 2 steps. In the first step, we would take our current estimate and nudge it a little bit to be closer to our observed data. 00:19:35.000 --> 00:19:49.000 And this is where we take our physical model of the CT scanner into account. And then second, we would apply some regularization step where we would say, I'm going to take my intermediate estimate and just clean it up a little bit, 00:19:49.000 --> 00:19:50.000 make sure it's a little bit smoother, that I've reduced some of the errors in it. 00:19:50.000 --> 00:20:01.000 And we would alternate between these, and you can even represent this as a block diagram. So now we can say, all right, now I wanna use machine learning. 00:20:01.000 --> 00:20:19.000 So I've talked about the naive machine learning approach. I've talked about the classical inverse problems approach. And a nice hybrid turns out to be to replace this regularization step, which before, you know, mathematicians and signal processors would dream up, with a trained component, a neural network that we can actually set the weights of. 00:20:19.000 --> 00:20:34.000 And so with what's called the deep unrolling framework, we try to set the weights of this neural network so that the final output, after some number of blocks, is a faithful reconstruction of the image. 00:20:34.000 --> 00:20:37.000 We use our training data to set that. So first of all, this works really nicely in practice. 00:20:37.000 --> 00:20:55.000 So in the context of MRI, for instance, it allows us to get really accurate reconstructions even when we've got, say, a factor of 6 fewer measurements than we have pixels we want to reconstruct. 00:20:55.000 --> 00:21:03.000 So we're figuring out how to basically fill in the blanks very accurately using these machine learning tools.
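Here is a minimal sketch of that alternating structure: a data-consistency step that uses the known forward model H, followed by a "clean-up" step that deep unrolling would replace with a trained network. The soft-thresholding denoiser, step size, and toy problem below are placeholders for illustration, not the specific architecture from the talk.

```python
import numpy as np

def unrolled_reconstruction(y, H, denoiser, num_blocks=10, step=0.1):
    """Sketch of a deep-unrolling-style reconstruction. Each block does:
    (1) a data-consistency step using the known physics H, and
    (2) a regularization step (here a placeholder denoiser; in deep
        unrolling this would be a neural network trained from data)."""
    x = H.T @ y  # crude initialization from the measurements
    for _ in range(num_blocks):
        # Step 1: nudge the estimate toward agreement with the observed data y.
        x = x - step * H.T @ (H @ x - y)
        # Step 2: clean up the intermediate estimate.
        x = denoiser(x)
    return x

# Placeholder "denoiser": simple shrinkage toward zero, standing in for a
# trained network in this sketch.
soft_threshold = lambda x, t=0.02: np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

# Toy problem: undersampled linear measurements of a sparse signal.
rng = np.random.default_rng(0)
x_true = np.zeros(128)
x_true[rng.choice(128, size=8, replace=False)] = 1.0
H = rng.standard_normal((64, 128)) / np.sqrt(64)
y = H @ x_true + 0.01 * rng.standard_normal(64)

x_hat = unrolled_reconstruction(y, H, soft_threshold, num_blocks=200)
print(np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))  # relative error
```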
00:21:03.000 --> 00:21:10.000 And not only that, but we can use the networks that we learn in order to facilitate even faster data acquisition. 00:21:10.000 --> 00:21:17.000 So I could maybe have the number of samples be a factor of 8 smaller than the number of pixels that I want to reconstruct. 00:21:17.000 --> 00:21:39.000 But what I think is even more interesting and exciting about this is a little bit more subtle, perhaps. So we had this block diagram before, and we said we were going to train the neural networks within this block diagram. 00:21:39.000 --> 00:21:57.000 But we can think of it as a diagram where some of the blocks, these data consistency blocks, are fixed and determined by the physics and the setup that we're dealing with, 00:21:57.000 --> 00:22:25.000 and some of the blocks are unknown and have weights that we can learn using training data. And so this is an example, I think, of where incorporating physical models, foundational inverse problem theory, and optimization methods has led to new ways of thinking about how to design neural network architectures that go way beyond just taking an off-the-shelf architecture and plugging it in like the early work did. 00:22:25.000 --> 00:22:32.000 Okay, so overall, all of the advances that I've just described are the result of just decades of NSF 00:22:32.000 --> 00:22:32.000 investment in foundational research. These are not things that people just stumbled upon by running millions of experiments. 00:22:32.000 --> 00:22:46.000 They stemmed from really deep understanding of the issues at play. But like I said, that's not the end of the story. 00:22:46.000 --> 00:22:58.000 I think there still remain some really major foundational questions that we need to be answering. It's not a matter of taking the existing tools and plugging them in in different contexts. 00:22:58.000 --> 00:23:02.000 So one example, for instance, addresses a problem I think we're all aware of, with AI and machine learning tools 00:23:02.000 --> 00:23:18.000 having some really serious societal implications that are concerning: biases arising, or these tools being used for malicious purposes. 00:23:18.000 --> 00:23:29.000 And we really should worry about that when these tools are being incorporated into health care, deciding who gets a mortgage, or other kinds of financial tools. 00:23:29.000 --> 00:23:34.000 So because of this, a number of people have said, well, we really need to design regulations and certifications of ML algorithms. 00:23:34.000 --> 00:23:40.000 And I think this is really well intentioned, but this is something where I think foundational research is absolutely essential. 00:23:40.000 --> 00:24:01.000 And to see why, just think back to this earlier example that we talked about with privacy, right, where we developed laws without really understanding all of the technical underpinnings, without really understanding how these things might work or where they might fail. 00:24:01.000 --> 00:24:05.000 And we ended up with laws that really did not satisfy the spirit of what we were trying to accomplish, right? 00:24:05.000 --> 00:24:25.000 People's privacy wasn't actually preserved.
And so if we want to come up with technical regulations and certification policies, we really need to understand how these methods work to ensure that these regulations will be efficacious. 00:24:25.000 --> 00:24:34.000 In addition, many of you are probably aware that when we train these super large scale systems, it's really not climate-friendly. 00:24:34.000 --> 00:24:52.000 There's an enormous carbon footprint associated with training large-scale models, and the data that we use to fuel these models is stored in big data warehouses that have to be kept cool with water, even in states with major water shortages. 00:24:52.000 --> 00:24:57.000 And I think there's a real desire to figure out how we can get all the power and benefits of these tools without destroying the climate. 00:24:57.000 --> 00:25:12.000 And I would argue that understanding the mathematical, statistical, and computational foundations of machine learning is going to help us optimize architectures and training efficiency. 00:25:12.000 --> 00:25:27.000 So we're going to be able to figure out how to compress these models, or train them with fewer steps, or develop low-power implementations that become much more efficient. 00:25:27.000 --> 00:25:35.000 And to kind of see how we might use foundations to address this, I want to highlight some broader questions that are related to this. 00:25:35.000 --> 00:25:45.000 Things like, how much data do I really need to train my model for a given problem? Is it going to be robust, or what do I need to do to make sure it's going to be robust? 00:25:45.000 --> 00:26:16.000 After I train it, if I use it in a slightly different context, is it still going to work? How can I make these things sustainable? And for specific architectures, like say the transformers that underlie ChatGPT, what makes them work? So to get at these kinds of questions, 00:26:16.000 --> 00:26:27.000 some people in the community are thinking about neural networks as functions, right? The neural network takes in an input and produces an output that's a function of the input. 00:26:27.000 --> 00:26:42.000 And with that perspective, we can suddenly say, well, are there ideas from computational harmonic analysis or functional analysis or signal processing that can give us insight into how well we can estimate these functions, or understand how they're going to perform in different settings? 00:26:42.000 --> 00:26:50.000 So to give you a flavor of that, for instance, what I'm showing here are 2 different functions. 00:26:50.000 --> 00:27:02.000 They're both functions in 2D, so they've got inputs x1 and x2, and the color is indicating the value of the function. 00:27:02.000 --> 00:27:08.000 And there are also some kind of hard-to-see little dots here. Those would be training samples. So both of these functions fit the training samples exactly. 00:27:08.000 --> 00:27:21.000 These are interpolating functions. So when we train a neural network, if we're able to train it to exactly fit our training data, which happens many, many times,
00:27:21.000 --> 00:27:29.000 and is, I would say, the most common mode of operation, then we're essentially finding an interpolating function. 00:27:29.000 --> 00:27:50.000 But as this example illustrates, there are many possible interpolating functions. So which is the one that we ultimately find, and how do things like our training algorithm or our choice of architecture or other features affect what function we ultimately fit to the data? 00:27:50.000 --> 00:27:56.000 And what implications does that have for things like robustness? 00:27:56.000 --> 00:28:03.000 Okay, so, a little bit more application oriented: I started off talking about these moonshot problems in the sciences that the NSF, I think, cares about very deeply. 00:28:03.000 --> 00:28:17.000 And I think that machine learning is really going to fundamentally change the way that we approach these problems. So one aspect of that is sort of obvious, right? 00:28:17.000 --> 00:28:20.000 We're gonna take data from experiments, and then we're going to try to analyze that data using machine learning methods. 00:28:20.000 --> 00:28:35.000 But in addition to that, there are opportunities for machine learning to affect hypothesis generation, or to affect how we design experiments, or the way that we handle simulations. 00:28:35.000 --> 00:28:48.000 It's really going to affect every aspect of the scientific method. So people are exploring these ideas, but in many cases it's sort of on a case-by-case basis: 00:28:48.000 --> 00:29:02.000 let me think about experimental design in biology; let me think about simulations in physics. But I really believe that there are some deep foundational questions that we can think about that span all the different application domains, 00:29:02.000 --> 00:29:14.000 and that making investments in those areas and building up those foundations will be deeply impactful in just thinking about machine learning and the scientific discovery process. 00:29:14.000 --> 00:29:19.000 So those 4 kind of prevailing themes, and I'm gonna elaborate on all of them, are uncovering new laws of nature, 00:29:19.000 --> 00:29:29.000 AI-guided scientific measurement, physics-informed machine learning, and advancing machine learning frontiers. So, uncovering new laws of nature. 00:29:29.000 --> 00:29:43.000 What I mean by this is we want to be able to take observations of some sort of physical system and then use AI or machine learning to uncover the governing physical laws. 00:29:43.000 --> 00:29:46.000 So just as an example, here I've got a video that's illustrating 2 particles 00:29:46.000 --> 00:30:01.000 moving according to these 3 equations. This is called the Lorenz 63 system, and it's a chaotic system, so you can see, as time progresses, these particles can be pretty far apart. 00:30:01.000 --> 00:30:09.000 And it's a nice model for things like atmospheric convection and studying various processes. 00:30:09.000 --> 00:30:25.000 So in this particular case, the data was generated by a particular system of equations. But in many scientific settings, we just get observations, and what we would like to be able to do is to figure out a good descriptive series of equations. 00:30:25.000 --> 00:30:34.000 And that's going to help us understand how to extrapolate our observations to new settings and new contexts, hopefully. 00:30:34.000 --> 00:30:45.000 So the question is, can we use machine learning or AI to do this mapping?
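For reference, the 3 equations behind that video are the standard Lorenz 63 system. Here is a minimal sketch of simulating it, using the textbook parameter values and a simple Euler integrator (details not taken from the talk), with two nearby starting points to show the chaotic divergence.

```python
import numpy as np

def lorenz63_deriv(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """The three Lorenz 63 equations:
       dx/dt = sigma * (y - x)
       dy/dt = x * (rho - z) - y
       dz/dt = x * y - beta * z"""
    x, y, z = state
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def simulate(state0, dt=0.01, steps=5000):
    """Simple forward-Euler integration, enough to visualize the chaotic behavior."""
    traj = [np.asarray(state0, dtype=float)]
    for _ in range(steps):
        traj.append(traj[-1] + dt * lorenz63_deriv(traj[-1]))
    return np.array(traj)

# Two nearby initial conditions diverge over time, the hallmark of chaos.
a = simulate([1.0, 1.0, 1.0])
b = simulate([1.0, 1.0, 1.0 + 1e-6])
print(np.linalg.norm(a[-1] - b[-1]))  # far apart despite nearly identical starts
```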
And it sounds a little bit magical, but I'm going to give you a hint at how some of these methods might be achievable. 00:30:45.000 --> 00:31:08.000 But first of all, I just want to emphasize, if we can figure out foundational tools for solving this problem, it could be enormously impactful: from trying to figure out the biophysical forces of cell development, to condensed matter and polymer physics so we can design better materials, to trying to understand the dynamics of microbial communities so we can figure out, for instance, how to better break down plastics, or even studying 00:31:08.000 --> 00:31:25.000 phenomena revealed by agent-based models, such as how people respond to different pandemic policies, for instance. Okay, so why is this not just pie-in-the-sky thinking? 00:31:25.000 --> 00:31:30.000 So if we return to the Lorenz 63 example, remember there were 3 variables, x, y, and z, for the 3D location of each particle. 00:31:30.000 --> 00:31:53.000 So let's just think about how the x location is evolving over time. We assume we don't know that equation, but we're going to write it down as a weighted sum of things like the x location, the y location, the z location, the x location squared, etc. 00:31:53.000 --> 00:32:08.000 And then we're going to use our observations to estimate what these weights are. And so using sparse estimation routines, what we would uncover is that all of these blue-green w's are 0 and the red w's are non-zero. 00:32:08.000 --> 00:32:19.000 And then that gives us a nice simple expression for the temporal evolution of the x variable. We can do the same for the y's and the z's and uncover those dynamics. 00:32:19.000 --> 00:32:32.000 So this kind of simple idea has been the catalyst of a lot of interesting work. People have demonstrated how they can recover orbital mechanics using these kinds of tools. 00:32:32.000 --> 00:32:40.000 Tools like AI Feynman have tried to take these ideas and generalize them in pretty exciting ways. 00:32:40.000 --> 00:32:51.000 But ultimately, we want the equations that come out of this to be trustworthy, right? 00:32:51.000 --> 00:33:09.000 There's always room for error, because the space of equations is enormous. And so in order to do that, we really need to again invest in foundational research into how we can use AI and machine learning tools to ensure that this is useful.
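To make that sparse-regression recipe concrete, here is a minimal sketch in the spirit of sparse dynamics-discovery methods: simulate Lorenz 63 data, build a library of candidate terms for the weighted sum, and use thresholded least squares to keep only the terms with large weights. The library, the threshold, and the finite-difference derivatives are illustrative choices, not the specific algorithm from the talk.

```python
import numpy as np

# Simulate a Lorenz 63 trajectory (forward Euler, as in the earlier sketch).
sigma, rho, beta, dt = 10.0, 28.0, 8.0 / 3.0, 0.01
traj = [np.array([1.0, 1.0, 1.0])]
for _ in range(5000):
    x, y, z = traj[-1]
    traj.append(traj[-1] + dt * np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z]))
traj = np.array(traj)

# Approximate dx/dt from the data with finite differences.
dxdt = (traj[1:, 0] - traj[:-1, 0]) / dt
x, y, z = traj[:-1, 0], traj[:-1, 1], traj[:-1, 2]

# Candidate library: the unknown equation written as a weighted sum of simple terms.
library = np.column_stack([np.ones_like(x), x, y, z, x * x, x * y, x * z, y * z])
names = ["1", "x", "y", "z", "x^2", "xy", "xz", "yz"]

# Sequentially thresholded least squares: fit weights, zero out small ones, refit.
w = np.linalg.lstsq(library, dxdt, rcond=None)[0]
for _ in range(10):
    small = np.abs(w) < 0.5
    w[small] = 0.0
    if (~small).any():
        w[~small] = np.linalg.lstsq(library[:, ~small], dxdt, rcond=None)[0]

# Only the x and y terms should survive, recovering dx/dt ≈ -10 x + 10 y.
print({n: round(c, 2) for n, c in zip(names, w) if c != 0.0})
```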
00:33:09.000 --> 00:33:19.000 Okay, so the second problem of how AI and machine learning are affecting science is in AI-guided scientific measurement. 00:33:19.000 --> 00:33:26.000 And what I mean here is to use AI or machine learning to design better experiments, simulations, and sensors. 00:33:26.000 --> 00:33:42.000 So just as an example, let's say that I want to design a microbial community for a specific purpose, say to break down plastics, or as a therapeutic for somebody whose microbiome is out of whack or something like that. 00:33:42.000 --> 00:33:49.000 So what I know is that the efficacy of the community is going to be a function of all different kinds of things. 00:33:49.000 --> 00:33:59.000 It's going to depend on what strains are present in what densities, what nutrients are present, what antimicrobial peptides are present, and what the environmental conditions are. 00:33:59.000 --> 00:34:12.000 So we don't know this function. We just know that somehow it exists, and we want to figure out what's the best way to set all these different variables in order to maximize the fitness. 00:34:12.000 --> 00:34:19.000 The problem is that there are just so many possibilities that we can't possibly try them all out. We only have finite experimental resources. 00:34:19.000 --> 00:34:24.000 And if we were to sample at random, the risk is that we end up evaluating a lot of different communities that are nowhere near the maximum that we're looking for. 00:34:24.000 --> 00:34:42.000 And so this is where some of the ideas that I mentioned earlier related to active learning and bandit methods, or even uncertainty quantification, are really essential to help us guide these sequences of measurements. 00:34:42.000 --> 00:34:56.000 And so investing in these foundational areas is kind of fueling this work. And we've seen that people are really trying to scale these ideas up. Different labs, 00:34:56.000 --> 00:35:14.000 both in academia and industry, are building self-driving laboratories for things like materials design, and even in the pharmaceutical industry, when people talk about the design-make-test-analyze cycle of designing new drugs, they are starting to incorporate these kinds of ideas. 00:35:14.000 --> 00:35:27.000 Okay, so the third problem here is physics-informed machine learning. And here what I mean is that we want to optimally leverage physical models along with experimental or observational data. 00:35:27.000 --> 00:35:32.000 So I already kind of hinted at this when we had our discussion about CT and MRI. 00:35:32.000 --> 00:35:52.000 But really, there are just tons of settings where we have both physical knowledge as well as data: large-scale physics experiments like photon sources or synchrotrons or the LIGO experiments for measuring gravitational waves, things like molecular structure estimation, or even climate forecasting. 00:35:52.000 --> 00:36:01.000 We have decades of foundational research in those areas that inform what's happening under the hood and can complement our data. 00:36:01.000 --> 00:36:11.000 And so the question really becomes, well, how do we leverage it? So one thing people have thought about is trying to use that physical knowledge to build large, complex simulations. 00:36:11.000 --> 00:36:17.000 And then you might say, okay, I'm going to use that simulation data in order to train a machine learning model. 00:36:17.000 --> 00:36:23.000 And then hopefully that's going to give me good predictions about what's going to happen in the real world. 00:36:23.000 --> 00:36:53.000 And I think that hope is possible, but I also think that in order to make this succeed, we have to understand things about distribution drift (because the distribution of my simulated data is not the distribution of real-life data), and about transfer learning, data assimilation, and reduced-order models. Moreover, I think, broadly speaking, there are lots of areas where physical models can help inform machine learning. 00:36:59.000 --> 00:37:07.000 We've talked about leveraging simulations.
There's work in areas like physics-informed neural networks, where people are trying to accelerate the simulations using machine learning tools. 00:37:07.000 --> 00:37:18.000 I'm going to talk a tiny bit about that later, or even upscaling molecular dynamics simulations. 00:37:18.000 --> 00:37:29.000 And I would claim that figuring out the best way to incorporate this physics information is absolutely essential to making sure that these methods are robust, that they're going to work when you have relatively little training data. 00:37:29.000 --> 00:37:48.000 And, really critical in the sciences, we need this physical knowledge in order to help us extrapolate beyond the domain of the training data, right? 00:37:48.000 --> 00:37:55.000 This is what science is all about. And if we don't incorporate our physical models, then we risk really getting a new understanding that only holds in a very limited domain. 00:37:55.000 --> 00:38:08.000 Okay, so finally I wanna talk about advancing machine learning frontiers: really fundamental machine learning and AI research areas that are broadly applicable in the sciences. 00:38:08.000 --> 00:38:22.000 So one area that's received quite a bit of attention lately is learned emulators. So for example, say I've got a large-scale climate simulation that can only run on a supercomputer and takes a very long time. 00:38:22.000 --> 00:38:28.000 What I'd like to be able to do is to form a training data set using past simulations and use that to train 00:38:28.000 --> 00:38:38.000 a machine learning model, say a neural network, that's going to mimic that simulator, but at a tiny fraction of the computational cost. 00:38:38.000 --> 00:38:42.000 So that then, if I want to run a future simulation, instead of needing my supercomputer, maybe I can use my desktop. 00:38:42.000 --> 00:38:55.000 So this is the goal, and there have been some anecdotal successes that are really exciting, including one paper that was highlighting just orders of magnitude of acceleration using machine learning tools. 00:38:55.000 --> 00:39:08.000 And I would characterize this line of research as being an example of a broader class of problems that I would call generative models for science. 00:39:08.000 --> 00:39:16.000 So generative models include things like ChatGPT for text or DALL-E for language, I'm sorry, for images. 00:39:16.000 --> 00:39:28.000 But we could imagine in the sciences wanting to generate new climate scenarios or wanting to generate new proteins. 00:39:28.000 --> 00:39:42.000 We cannot just take, say, the ChatGPT architecture and plug it into scientific data and all of a sudden have a good, useful, scientifically rigorous generative model of the science. 00:39:42.000 --> 00:39:50.000 And there are a number of reasons for this. First of all, we just have way less scientific data than ChatGPT or DALL-E. 00:39:50.000 --> 00:40:00.000 Even when we've got huge volumes of data, a lot of times that can be a small number of samples of really high-dimensional data, and so it's still really just not comparable. 00:40:00.000 --> 00:40:05.000 In addition, we just care about different things in the sciences, right? Like, we might have rare events or chaotic dynamics.
00:40:05.000 --> 00:40:19.000 So if you're training ChatGPT, maybe you can ignore rare events, because ChatGPT is supposed to be generating kind of plausible, typical language, and if you fail to reproduce rare events it's not really a problem. 00:40:19.000 --> 00:40:27.000 In the sciences, if I develop a generative model for climate and it can never reproduce a hurricane, 00:40:27.000 --> 00:40:48.000 then I've got a real problem on my hands. In addition, some of my own work on chaotic dynamics for generative models has shown how tools developed for computer vision have been able to help us, for instance, capture chaotic dynamics and make sure that these generative models have the right chaotic attractors. 00:40:48.000 --> 00:40:59.000 So again, foundational research is playing a really key role here. We also see in the sciences that there are just huge variations in scales and resolutions, right, 00:40:59.000 --> 00:41:06.000 that play a critical role and that we just don't experience in sort of more commercial-grade applications. 00:41:06.000 --> 00:41:16.000 We also have the opportunity to incorporate things like physical models, constraints, and symmetries in order to somehow mitigate these challenges. 00:41:16.000 --> 00:41:25.000 So as we think about using generative models in the sciences, I thought, okay, how much can we trust these things? 00:41:25.000 --> 00:41:35.000 So I asked ChatGPT to give me 3 reasons that football is safer than badminton, and it produced some caveats; 00:41:35.000 --> 00:41:42.000 it wasn't totally irresponsible by any means. But then it said things like, well, there's a lot more protective gear in football, so people are more protected. 00:41:42.000 --> 00:41:54.000 And there's really strict refereeing in football, and football players are much more physically fit. 00:41:54.000 --> 00:42:06.000 And I guess these things are perhaps plausible, but they really are missing a core essential truth here. And I think in the sciences we don't necessarily know what that core essential truth is a priori, and so we can't really detect things that are maybe vacuous or faulty reasoning. 00:42:06.000 --> 00:42:19.000 And so we really need to think about how we can take generative models, which are trained to produce plausible results, 00:42:19.000 --> 00:42:28.000 and figure out what kind of foundational work is going to make them really trustworthy and useful in scientific contexts. 00:42:28.000 --> 00:42:35.000 Okay, uncertainty quantification we touched on earlier, but I think this is not something that's done. 00:42:35.000 --> 00:42:47.000 This is something that's ongoing and absolutely critical. And evidence of this is, for instance, Amazon developing a library for uncertainty quantification in the context of machine learning. 00:42:47.000 --> 00:42:54.000 And while this library is being put out by Amazon, it is absolutely incorporating foundational ideas being developed in academia by people supported by the NSF.
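As one simple illustration of what practical uncertainty quantification can look like (a generic sketch, not tied to the library mentioned above, which the talk does not name), an ensemble of independently trained models can be used to flag inputs where predictions disagree.

```python
import numpy as np

def ensemble_predict(models, X):
    """Simple ensemble-based uncertainty: train several models on the same task
    (different seeds or data subsets), then use the spread of their predictions
    as a rough per-input uncertainty estimate."""
    preds = np.stack([m.predict(X) for m in models])  # (num_models, num_points)
    return preds.mean(axis=0), preds.std(axis=0)

# Usage sketch with placeholder models exposing .predict(), e.g. several
# independently trained regressors:
# mean, spread = ensemble_predict([model_a, model_b, model_c], X_test)
# A large spread flags inputs where the models disagree and the prediction is
# less trustworthy; unlike conformal prediction, this carries no formal guarantee.
```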
00:42:54.000 --> 00:43:04.000 And hopefully that work will continue, but this is just an illustration of the impact of that kind of work. 00:43:04.000 --> 00:43:10.000 And then finally, there are areas like graphs, where we can use graphs as a fundamental representation of scientific data, 00:43:10.000 --> 00:43:21.000 all the way from materials to networks to things like embodied physics. And I'm not going to go into a lot of detail here, but as we also deal with human-centric data, 00:43:21.000 --> 00:43:34.000 figuring out foundational methods to ensure trustworthiness in the context of privacy, transparency, fairness, and accountability is absolutely critical. 00:43:34.000 --> 00:43:43.000 Okay, so we talked about these 4 different areas where, in the sciences, in areas that the NSF really cares about, machine learning foundations are absolutely vital. 00:43:43.000 --> 00:43:56.000 This is not just focusing on individual scientific applications. This is thinking about foundational mathematical, statistical, and computational challenges. 00:43:56.000 --> 00:43:59.000 And so it's only by building up those foundations that we'll be able to realize these ambitions. 00:43:59.000 --> 00:44:15.000 And not just realize them, but make sure that the results are high quality and reproducible. And I think there's evidence that right now we're not succeeding. 00:44:15.000 --> 00:44:30.000 There have been a number of articles that have highlighted how people are trying to use machine learning in the context of the sciences and getting misleading results, or results that are not repeatable, not reproducible. 00:44:30.000 --> 00:44:43.000 And I think it's only by really investing in the foundations, in foundational research, and investing in training mechanisms that we're going to be able to get away from this current mode of operation. 00:44:43.000 --> 00:45:00.000 And so before I conclude here, I just wanna highlight how vital different training programs are to these endeavors, to making sure that students, even students who maybe are coming from different application domains, have a deep understanding of foundations and potential failure modes. 00:45:00.000 --> 00:45:19.000 And I think these are things that are difficult to get if you're just watching online webinars, and that, you know, students really benefit from being able to pose questions to panels, or talk to each other at poster sessions, or interact in really meaningful ways. 00:45:19.000 --> 00:45:28.000 I can't tell you how many times my students have gone to workshops and come back to my group meeting and told me that some idea I had was totally not going to work. 00:45:28.000 --> 00:45:40.000 Excellent. That was time-saving. And so the summer schools, workshops, and cross-disciplinary collaborations that are supported by the NSF really catalyze groundbreaking research and accelerate workforce development. 00:45:40.000 --> 00:45:48.000 In fact, all of these pictures are from an AI and science summer school supported by the NSF. 00:45:48.000 --> 00:46:00.000 The NSF is also doing things I think are really important in terms of training. One example is institutes like the Institute for Pure and Applied Mathematics, or IPAM, at UCLA. 00:46:00.000 --> 00:46:08.000 When I was a graduate student, I was part of one of their long programs in 2004, made possible by the NSF. 00:46:08.000 --> 00:46:12.000 And this just had a huge impact on my career. It really shaped the way that I thought about a lot of different problems.
00:46:12.000 --> 00:46:31.000 It helped me think about important ideas for my CAREER proposal, and it just had an enormous impact. And additional programs that the NSF has supported include other math institutes, like IMSI, 00:46:31.000 --> 00:46:42.000 and the TRIPODS program, which is really focused on the foundations of data science. 00:46:42.000 --> 00:46:51.000 Some of the AI institutes have a foundations focus, and even programs like the National Institute for Theory and Mathematics in Biology have a very strong foundations focus. 00:46:51.000 --> 00:47:02.000 And I think that the training opportunities that these kinds of programs provide are just absolutely essential to building up this infrastructure and ecosystem. 00:47:02.000 --> 00:47:09.000 Okay, so, with that, there are 2 key things that I would like everybody to take away from today. 00:47:09.000 --> 00:47:23.000 The first, and this is really important: I do not have brain damage. And second: developing applied machine learning without understanding the mathematical, statistical, and computational foundations is basically like trying to develop biotech without understanding 00:47:23.000 --> 00:47:29.000 biology. To really accelerate innovation and promote trustworthiness, we need to invest in foundations. 00:47:29.000 --> 00:47:42.000 So thank you very much, and I'm more than happy to answer any questions. Thank you very much.