WEBVTT 101:15:53.000 --> 101:15:59.000 We're here when I did those. They were in Arlington at the old place, where security is much easier. 101:15:59.000 --> 101:16:04.000 Oh, but just a 101:16:04.000 --> 101:16:06.000 There's some message on the topic. Maybe just a quick, you know: I spent my years in academia and the last 5 years in industry. 101:16:06.000 --> 101:16:18.000 One thing I learned about industry is we start almost every talk or presentation with a bio slide. 101:16:18.000 --> 101:16:32.000 Your journey and how you got there. And so I have a slide, and I even have the soccer: Bayer Leverkusen, which just won, for the first time in our 100-plus-year history, the German championship in football, slash soccer. 101:16:32.000 --> 101:16:34.000 Which is really cool. So I did my graduate work at Wisconsin, among a number of other things working on chromosome biology. 101:16:34.000 --> 101:16:45.000 Cytogenetics, but within a plant breeding program. 101:16:45.000 --> 101:16:52.000 So my PhD is in plant breeding. I did a postdoc in Minnesota; I've got the football or the mascot for all the places I've been up there. 101:16:52.000 --> 101:17:00.000 So I was a Gopher for 2 years. For those that haven't lived in Minneapolis, it's cold in the winter. Very cool. 101:17:00.000 --> 101:17:15.000 And there I worked on wild rice. I was part of a lab that worked on rice species, relatives of cultivated rice, and how you might use information from those uncultivated relatives to improve rice. 101:17:15.000 --> 101:17:22.000 I got a faculty position at Purdue in 2001, and started the exact same day as Cliff. 101:17:22.000 --> 101:17:33.000 I know, he looks much older. And they hired me to work on soybean. The only thing I knew about soybean when I went there is that you drove down the highway and there was corn, and the short stuff
101:17:33.000 --> 101:17:40.000 was probably soybean, and that is literally all I knew about soybean. And they took a chance and hired me, and I spent the next 101:17:40.000 --> 101:17:49.000 15-plus years working on soy and other legumes. And one interesting aspect of this is the early days of sequencing. 101:17:49.000 --> 101:17:57.000 We were part of the group that helped sequence soybean. That ties into some discussions we've been having this morning around workforce training and machine learning and AI. 101:17:57.000 --> 101:18:02.000 Bringing that to bear on the biological questions and problems, as we were generating all this genomic data. 101:18:02.000 --> 101:18:10.000 Biologists were generating it, but didn't know what to do with it. So how do you get the mathematicians and computer scientists and data scientists to have an interest in this biological problem? 101:18:10.000 --> 101:18:18.000 So we went through the same learning curve, but 20 years ago. I went to Georgia, became a Bulldog in 2011 after a year here, 101:18:18.000 --> 101:18:28.000 my first stint as a rotating program officer. And again, I was at a place that won a national championship with their football team. 101:18:28.000 --> 101:18:34.000 And I worked on peanut and other legumes while I was there. Correct, UGA. 101:18:34.000 --> 101:18:44.000 I joined Bayer 5 years ago this August, and I think my bio is a little bit off. I actually lead the North America soybean and cotton pipelines now, as of a year and a half ago. 101:18:44.000 --> 101:18:57.000 So on the R&D scale, I'm more on the development side of it now. I spend a lot of time with commercial partners and growers talking about our products, what it is they need, and how we're going to deliver those from a genetics perspective. 101:18:57.000 --> 101:19:03.000 So that's my academic career in one slide: 19 years, a bunch of students and postdocs. 101:19:03.000 --> 101:19:10.000 Not all of them, but, my training is in plant breeding.
My passion was chromosome biology. 101:19:10.000 --> 101:19:18.000 As we started sequencing genomes, genomics was being able to tie DNA sequence to understanding how genomes function and what their structure is. 101:19:18.000 --> 101:19:23.000 And then one of my other passions was polyploidy, which is prevalent in plants. So we did a lot of sequencing of polyploid plants. 101:19:23.000 --> 101:19:38.000 Looking at structural changes, duplicated genes, the fate of genes in polyploids, and other aspects of how polyploids evolve, and then trying to use that information to understand how to improve crops in a more efficient way. 101:19:38.000 --> 101:19:51.000 Hmm. And this is getting to the purpose of the talk today. 101:19:51.000 --> 101:19:52.000 A background in plant breeding and genomics for a number of years, then hired into Bayer. 101:19:52.000 --> 101:20:01.000 And when I was first hired at Bayer, I was leading a group in R&D focused on how we use genomic information. 101:20:01.000 --> 101:20:11.000 Where do we generate that genomic information? How do we use it more efficiently? What tools do we build on top of that to make better decisions in breeding pipelines, to get the products to our growers that they want? 101:20:11.000 --> 101:20:26.000 But I very quickly realized the scale, the scope, the pace of everything that happens in industry is dramatically different. When you think about a genetic experiment in academia, it's 3 reps, 3 locations, 3 years. 101:20:26.000 --> 101:20:37.000 We don't talk about those numbers at all. You know, we're talking 60, 80 reps a year, and tens of thousands of genetic entities within those reps. 101:20:37.000 --> 101:20:49.000 And having genetic information on all of those. And so what we have is a massive pipeline pushing millions to hundreds of thousands of progeny through on an annual basis, 101:20:49.000 --> 101:20:55.000 at various steps of that pipeline.
And if you think about breeding, basically it's a large funnel. 101:20:55.000 --> 101:21:00.000 You create a bunch of progeny, and you take them through various cycles of testing to get down to the very few that you want. 101:21:00.000 --> 101:21:04.000 Yeah. That's sort of like looking for a needle in a haystack. You create a huge pile; you want to find that one winner. 101:21:04.000 --> 101:21:19.000 So you spend the next 10 years after you create this huge pile trying to figure out which of these hundreds of thousands of progeny you created is going to be the one that's going to be a successful variety or hybrid. 101:21:19.000 --> 101:21:26.000 And we generate a lot of data along the way: genotyping things, sequencing things, collecting phenotypic data. 101:21:26.000 --> 101:21:39.000 And you can begin to build automation and tools around that to connect things together, and be able to impute genetic information and infer what the phenotype might be based on relatives and progeny, the parents and grandparents of that entity. 101:21:39.000 --> 101:21:48.000 So we've built a lot of resources. We've hired a lot of data scientists and computer scientists to help build this infrastructure, these models that tie these things together. 101:21:48.000 --> 101:21:53.000 But at the end of the day, we're still looking for that needle. We get a little bit more efficient using these things to find the needle. 101:21:53.000 --> 101:21:58.000 We're still making hundreds and hundreds and hundreds of thousands of progeny. We're genotyping them. 101:21:58.000 --> 101:22:04.000 We're testing them, trying to get down to those few needles that we want to move forward. 101:22:04.000 --> 101:22:14.000 So maybe just on this slide here: that thing that looks like a cross-section of a brain is actually a representation of our maize germplasm based on genetic information. 101:22:14.000 --> 101:22:24.000 And it looks like 2 lobes of a brain.
Those are the male and female heterotic pools; we create hybrids, and those are the 2 pools that we breed within. 101:22:24.000 --> 101:22:34.000 So as you can imagine, over the past 20 years we built and scaled this infrastructure to try and find these needles in this massive number of entities that we generate. 101:22:34.000 --> 101:22:44.000 We created a lot of automation to collect the data we need, everything from the genetics all the way down to how they perform in the field. 101:22:44.000 --> 101:23:02.000 And so we have centers where the seeds are sent. The seeds are chipped, meaning they take a small section out of a seed and genotype that section, and then we move that seed forward either into a waste can if we don't want to plant it, or into a greenhouse or a field, based on the genetic information that we get from that chip. 101:23:02.000 --> 101:23:08.000 And this is all automated, in central lab facilities. 101:23:08.000 --> 101:23:18.000 Once we go from the millions of seeds that we chip annually and get genetic information on, to knowing which are the hundreds of thousands that we actually want to plant, 101:23:18.000 --> 101:23:25.000 those get sent to a central packaging facility, which looks a lot like an Amazon warehouse. It's conveyor belts, it's automation. 101:23:25.000 --> 101:23:34.000 These things come in, they get packaged into what we call cassettes. The cassettes get sent out to planting centers around the world. 101:23:34.000 --> 101:23:42.000 And they're planted in the field. We know where every plot, every seed is, the genotype of everything in that field. 101:23:42.000 --> 101:23:46.000 And we know where it is geographically. And then we collect data throughout the season. So how does it perform? 101:23:46.000 --> 101:23:59.000 How does it perform under stress? How does it perform with various disease pressures? We fly UAVs, or drones, to collect that.
When does it flower? When does it mature? 101:23:59.000 --> 101:24:03.000 When is it setting seed? All these other things, all this data. So we start with millions of progeny genotyped, plant hundreds of thousands, and start collecting 101:24:03.000 --> 101:24:16.000 fantastic information. And over the next 7, 8, 9 years, we winnow those hundreds of thousands down to the 10 or 20 that we move forward as commercial products. 101:24:16.000 --> 101:24:24.000 It's an expensive process. It generates lots and lots and lots of data. A lot of this is automated within large greenhouses. 101:24:24.000 --> 101:24:31.000 So the one in Marana, 5 or 10 acres, I can't remember. Maize, 10 acres under glass. Yeah. 101:24:31.000 --> 101:24:43.000 All automated, so we can start cycling populations more rapidly, moving the genetics of a population through multiple cycles per year rather than one cycle per year planting in a field. 101:24:43.000 --> 101:24:49.000 So we can move the genetics of a population more quickly and then move them out into the field for testing. 101:24:49.000 --> 101:24:59.000 So if you think about breeding over time: going back to domestication thousands of years ago, where people were picking things whose seeds didn't fall on the ground, so we got non-shattering. 101:24:59.000 --> 101:25:17.000 Those were sort of the major changes. In breeding in the early 1900s we started pulling in statistical models; hybrid seed was first developed in the 1920s and 1930s and went commercial in the 1940s and 1950s; we started applying 101:25:17.000 --> 101:25:27.000 modern harvesting tools, capturing yield as it came off the harvester. We started using molecular markers in the nineties, and really full blast in the 2000s. 101:25:27.000 --> 101:25:34.000 And those are sort of the evolutions in how we've done plant improvement. At Bayer, Monsanto; 101:25:34.000 --> 101:25:38.000 Bayer bought Monsanto 5 years ago, so it's Bayer.
We sort of break it into breeding 1.0, 2.0, 3.0. 101:25:38.000 --> 101:25:50.000 Breeding 1.0: they acquired a lot of genetics and germplasm, seed companies, to get the genetics and get those tools to start creating those winning varieties. 101:25:50.000 --> 101:26:00.000 Breeding 2.0 and 3.0 are really about increasing the precision: knowing where you're planting things, predicting where you want to plant them, based on what's expected. 101:26:00.000 --> 101:26:09.000 Breeding 3.0 is really around digital enablement: all the automation around seed chipping, getting genetic information on all the millions of progenies at the very beginning, 101:26:09.000 --> 101:26:20.000 to know which ones you want to plant in those initial stages of testing. And the phase we're in now, and this is where Ethan's going to take over in a minute, 101:26:20.000 --> 101:26:30.000 is really thinking more about design. So can we flip this breeding strategy from creating millions of progeny and trying to get down to those 10 that are going to be the winners? 101:26:30.000 --> 101:26:34.000 Can we think more intentionally about how we create these populations at the beginning, knowing what our growers need? 101:26:34.000 --> 101:26:40.000 And can we design the genetics more intentionally, using modern tools and all the data that we've generated over the past 10 years, 101:26:40.000 --> 101:26:45.000 to reduce the haystack and get those needles that are going to be the winners in the growers' fields? 101:26:45.000 --> 101:27:01.000 So with that, I'm going to turn it over to Ethan. Okay. Alright. Thanks. I'm really excited to be here. 101:27:01.000 --> 101:27:04.000 I've written a number of, you know, proposals and things like that, and seen this 101:27:04.000 --> 101:27:14.000 all over the place, and having the opportunity to actually do it is so nice. Oh thanks, that's going to help a lot.
101:27:14.000 --> 101:27:38.000 Let's see, I think we have a couple of slides to push through here. I just want to quickly acknowledge: I get to lead an AI genomics research team right now, and a number of different PhD researchers have done this work over the last year or 2, so I just want to make sure I mention them. So, a little bit about myself, since we always have these timelines 101:27:38.000 --> 101:27:40.000 and Scott gave a little bit of background, so I'll do it as well. 101:27:40.000 --> 101:27:49.000 Even though we're about the same age, mine's a lot more abbreviated in time. 101:27:49.000 --> 101:27:58.000 And something here real fast. So my youth was actually in agriculture. 101:27:58.000 --> 101:28:06.000 I grew up on a vegetable farm in Ohio. 101:28:06.000 --> 101:28:35.000 I really enjoyed it a lot, but I started to recognize that biology was, for me, very unpredictable, complex; it's going all over in different directions. 101:28:35.000 --> 101:28:47.000 But some of the machines that we were using, 101:28:47.000 --> 101:28:57.000 and kind of the engineering that was around agriculture, was much more predictable. Then you can start designing for it, very intentionally.
101:28:57.000 --> 101:29:05.000 And then I had the unpredictable move: I got a call one day about a position at Bayer, and whether or not I'd be interested in going back into these messy, complex biological problems that are not so predictable. 101:29:05.000 --> 101:29:14.000 So it's been a very uncomfortable jump into the unpredictable aspect, but it's been a lot of fun. 101:29:14.000 --> 101:29:27.000 And so, one of the things about jumping into the biological domain: one of the questions that I get very often, it's consistent and I have to wrestle with it every day. 101:29:27.000 --> 101:29:36.000 I'll get the question, can you interpret your model? Can you give us the interpretation of your model? 101:29:36.000 --> 101:29:42.000 And generally, that answer today is going to be no. It's a non-linear AI model. 101:29:42.000 --> 101:29:55.000 We can't do an interpretation, not today. But that's not necessarily the purpose. It's for prediction, not necessarily interpretation. So I'm going to make a couple of arguments about why that's particularly important here. 101:29:55.000 --> 101:30:06.000 So, with a background of playing around in physics for a long time and being very interested in physics and calculus, I think it's interesting to look back at how physics changed over time and how it developed. 101:30:06.000 --> 101:30:27.000 For most of history, physics was a field of philosophy. There were 3 branches: you had physics, you had ethics, and you had logic, and if you were to propose anything in physics, you had to reason it out against ethics, 101:30:27.000 --> 101:30:36.000 logic, and human experience. And so you were not able to propose something unless you could interpret it and explain it within all 3 parts of the field. 101:30:36.000 --> 101:30:54.000 And so this was a very qualitative, rather than quantitative, approach to how physics was described.
That was the case for 2 millennia, starting with Aristotle and Aristotelian physics, all the way up until Copernicus and Galileo, who started to change some things. 101:30:54.000 --> 101:31:05.000 But it was really Newton and Leibniz, when they introduced calculus. And calculus absolutely transformed the way physics moved forward and how things were designed. 101:31:05.000 --> 101:31:14.000 But calculus was not seen as necessarily a golden tool; it wasn't perfect right off the bat. 101:31:14.000 --> 101:31:22.000 So, like neural networks and AI, this lack of interpretability also plagued calculus when it was originally introduced. 101:31:22.000 --> 101:31:33.000 And I really like this quote here, that calculus is often taught as if it is a pristine thing, emerging, you know, complete and whole from the head of its creators. 101:31:33.000 --> 101:31:42.000 It is not. It took over 200 years for us to actually create the foundations of modern calculus, and there was a lot of concern about how it works. 101:31:42.000 --> 101:31:48.000 So in particular, Newton and Leibniz said, hey, here's a tool. It predicts particularly accurately. 101:31:48.000 --> 101:31:55.000 It works very well, and it works very effortlessly. But they couldn't articulate or explain or interpret this to the various philosophers 101:31:55.000 --> 101:32:07.000 and physicists of the seventeenth century. And so a lot of people pushed back on this, and really what Newton and Leibniz said back was, well, that wasn't exactly our goal.
101:32:07.000 --> 101:32:34.000 But the engineers (and I'm an engineer, so I really like this kind of approach) said, whatever, that's fine. We don't necessarily care about the interpretability, but if we can predict accurately, we can design, and that's going to be really nice, and we can move forward. And it was the interaction with those new designs, those new steps that engineers took, that provided a lot of the data that essentially 101:32:34.000 --> 101:32:42.000 created the foundations of calculus, which took about 200 years before we had the modern framework that we work with now. 101:32:42.000 --> 101:32:55.000 So the purpose here is really to mention, and this is a provocative statement, that interpretability is not necessarily the goal of what we're trying to do with AI; prediction is the goal. 101:32:55.000 --> 101:33:04.000 And here's where I believe neural networks, or AI, will be to biology what calculus was to physics. 101:33:04.000 --> 101:33:11.000 It provides us a way to start predicting, from some input variables, some downstream output variables. 101:33:11.000 --> 101:33:30.000 And there's a particular reason why neural networks, I think, are uniquely useful for biology, versus calculus for physics. Physics has a ton of classical laws, and it's relatively simple: the universe is always seeking equilibrium. 101:33:30.000 --> 101:33:35.000 So it's kind of this ball that's rolling downhill the entire time. It's relatively elegant. 101:33:35.000 --> 101:33:39.000 And calculus is also very elegant. When we look at biology, we don't have all these laws. And I really like this, 101:33:39.000 --> 101:33:53.000 something pulled out of a 2022 dissertation from a graduate student: 101:33:53.000 --> 101:34:11.000 that life perpetuates its existence out of equilibrium,
against the will of the second law. And I think that aspect, against the will of what thermodynamics wants to do, is why biology is so complex, and why we've had such a hard time understanding it with 101:34:11.000 --> 101:34:15.000 tools like calculus: because it's fighting. 101:34:15.000 --> 101:34:19.000 And if you've ever seen a fight, it's never elegant. It's always something crazy going on in this 101:34:19.000 --> 101:34:32.000 inclined march. So for this complexity, neural nets and AI, I think, are the right tool for us to start predicting. 101:34:32.000 --> 101:34:38.000 So now this kind of gets more into the general motivation of why we're doing this in agriculture. 101:34:38.000 --> 101:34:44.000 We know that agriculture must adapt faster than ever. We have a number of different pressures going on. 101:34:44.000 --> 101:34:54.000 We have a massive population increase that is going to require a 60% increase in agricultural production. We have ever-changing growing conditions that we have to deal with. 101:34:54.000 --> 101:35:06.000 We have larger spreads of disease due to globalization. We need to keep up with regulations and with the societal demands for how food is produced. 101:35:06.000 --> 101:35:13.000 And finally, we have to do all of this, the 60% increase and all those other constraints, without blowing up the planet. 101:35:13.000 --> 101:35:19.000 And when we look generally at what we have with respect to data in agriculture,
101:35:19.000 --> 101:35:32.000 I think we have a really great opportunity to start accelerating even faster in how we design, because of all the different data sets that are popping up across the planet and the different opportunities 101:35:32.000 --> 101:35:41.000 that we can hopefully pull from that data. Scott was mentioning this, and so I'll go somewhat quickly here. 101:35:41.000 --> 101:35:46.000 When we look at agricultural data, we see it increasing in a number of different ways: not only in scale, but in resolution, source, and type. 101:35:46.000 --> 101:35:53.000 And so this demands that we have likewise advancements in modeling capabilities, particularly on the AI side of things. 101:35:53.000 --> 101:36:02.000 This is kind of a nice example of the genomic resolutions that you can get at scale at a big company. 101:36:02.000 --> 101:36:22.000 And we're very close to seeing the ability to look at full assemblies. 101:36:22.000 --> 101:36:52.000 And so that traditionally would be a problem, because we have base pairs here, we have time-series weather data, we have categorical management approaches. 101:36:57.000 --> 101:37:06.000 We have scalar variables that we see in the soil. All of these are very different data sources. 101:37:06.000 --> 101:37:15.000 And so AI provides a really unique, flexible opportunity: you can start synthesizing all these different multimodal data streams into one particular architecture 101:37:15.000 --> 101:37:21.000 to help you design. Today I'll show a couple of quick examples, just focusing on the G part.
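The multimodal fusion just described, where marker data, time-series weather, and categorical management feed one architecture, can be sketched roughly as follows. All names, sizes, and the random "trained" weights are illustrative assumptions, not the actual pipeline from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
n_markers = 500        # SNP markers as 0/1/2 allele dosages
n_weather_steps = 120  # daily weather readings over one season
n_mgmt_classes = 4     # categorical management practice (e.g. tillage type)

def encode_genotype(g, W):
    """Project a marker dosage vector to a dense embedding."""
    return np.tanh(g @ W)

def encode_weather(w, W):
    """Summarize a time series (here simply mean-pooled, then projected)."""
    return np.tanh(w.mean(axis=0) @ W)

def encode_management(m, E):
    """Look up an embedding row for a categorical class."""
    return E[m]

# Randomly initialized projections stand in for trained encoder weights.
Wg = rng.normal(0, 0.1, (n_markers, 16))
Ww = rng.normal(0, 0.1, (3, 8))          # 3 weather variables per day
Em = rng.normal(0, 0.1, (n_mgmt_classes, 4))

g = rng.integers(0, 3, n_markers).astype(float)  # one genotype
w = rng.normal(size=(n_weather_steps, 3))        # temp, rain, radiation
m = 2                                            # one management class

# Fuse all modalities into one feature vector for a downstream head.
fused = np.concatenate([encode_genotype(g, Wg),
                        encode_weather(w, Ww),
                        encode_management(m, Em)])
print(fused.shape)  # (28,)
```

A real system would learn these encoders jointly, but the design choice is the same: each modality gets its own encoder, and a shared head consumes the concatenated embeddings.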
101:37:21.000 --> 101:37:46.000 So we're just going to focus on the genomics, and what we can do modeling genomics to phenotype, with observations like, say, yield, height, disease resistance, and we'll be using a genotype vector of some sort, at some resolution, to do that. 101:37:46.000 --> 101:37:55.000 And there are 4 pieces of this approach of going from genotype to phenotype that we'll care about. 101:37:55.000 --> 101:37:59.000 The first one is the architecture of an AI model. The architecture is the bones: it gives the structure, and most of the properties that we can expect 101:37:59.000 --> 101:38:13.000 out of a model will be embedded in the design of the architecture. And we'll show a kind of cool, well, I think it's cool, 101:38:13.000 --> 101:38:25.000 approach where we start putting biologically informed components into our architecture to make it predict at increased accuracy. 101:38:25.000 --> 101:38:29.000 The second is loss functions. Loss functions are the learning criteria that you use for your AI model. 101:38:29.000 --> 101:38:35.000 And they're very important because they define the design question that you care about. 101:38:35.000 --> 101:38:45.000 And so we should make sure that our learning and our loss functions align with those. And then I'll show 2 quick other approaches here. 101:38:45.000 --> 101:38:54.000 Active learning is an idea in AI that's very similar to genomic selection, 101:38:54.000 --> 101:39:00.000 where you have an AI model and you have your system, and you're going to allow them to interact with each other. 101:39:00.000 --> 101:39:13.000 So they get to talk and they get to update, and continue to progress towards some downstream goal. And then we'll say a couple of quick things about large language models and their applications right now. 101:39:13.000 --> 101:39:25.000 So, jumping into the architecture.
So one of the questions that we wanted to answer was, could we start embedding domain knowledge into our models? 101:39:25.000 --> 101:39:40.000 First, when we look on the left side at the data that we have at scale at Bayer: we have tens of millions of phenotypes, these being yield, disease, etc., and we have perhaps over a hundred thousand unique genotypes at marker-level resolution. 101:39:40.000 --> 101:39:47.000 So we have very coarse information. It might only be 10,000 base pairs, or something along those lines. 101:39:47.000 --> 101:39:54.000 So we're missing a lot of what's really going on in the genotypes that we care about. 101:39:54.000 --> 101:40:02.000 Now, on the other hand, when we look at domain knowledge, things like, say, gene regulatory networks or gene ontology terms, 101:40:02.000 --> 101:40:11.000 these provide some really high-fidelity information: things that we clearly know, or at least at this point in time believe, are particularly important. 101:40:11.000 --> 101:40:16.000 Those are really high-fidelity pieces of information, but the problem is that we have very little data to build a model. 101:40:16.000 --> 101:40:22.000 If you have gene expression data, typically we might only have a couple of different genotypes. 101:40:22.000 --> 101:40:26.000 So you can't even make a decent model from that. So we said, well, what if you could combine those 2? 101:40:26.000 --> 101:40:42.000 You could take the general structure of a neural net, with all of these parameters, and embed that domain knowledge in the center of it, making the model learn to predict through this particular graph. 101:40:42.000 --> 101:40:51.000 And so I'll give a couple more reasons why we think this is a good idea, not only from a biological standpoint but from a mathematical standpoint: 101:40:51.000 --> 101:41:03.000 graphs are very attractive for this. One problem with off-the-shelf AI models, and we see a lot of off-the-shelf AI models being used,
101:41:03.000 --> 101:41:11.000 and that's a bit of a concern, I would say; we want to be very particular about how we're using our AI models, 101:41:11.000 --> 101:41:14.000 is that they're massively over-parameterized. 101:41:14.000 --> 101:41:18.000 Now, if we build a graph, we can reduce that complexity 101:41:18.000 --> 101:41:27.000 substantially. The other problem with off-the-shelf AI models, pretty much all AI models, is that they struggle with understanding very long-range interactions. 101:41:27.000 --> 101:41:36.000 So if we know that we have some gene on, say, chromosome 1 and another gene on chromosome 10, they're billions of base pairs away. 101:41:36.000 --> 101:41:37.000 Generally, a model is never going to be able to pick that up. 101:41:37.000 --> 101:41:41.000 It's never going to be able to understand that. If we have a graph, we can call out those known interactions very quickly and very explicitly. 101:41:41.000 --> 101:41:54.000 And so that provides a very big advantage. So, here's an example of building one of these 101:41:54.000 --> 101:42:14.000 bio-informed GNNs. We call it the bio-informed GNN, and this is all open-source data, actually. We built the graph from the Gene Ontology resource: we asked, okay, here are the various genes that we have in the maize genome; 101:42:14.000 --> 101:42:26.000 build a graph of all the different interactions. And then we took that graph and linked it up to the marker sets that we have, so that markers within a certain distance are linked to that gene. 101:42:26.000 --> 101:42:33.000 And then there were some that were just really far away. We didn't necessarily need to do this, but they're really far away, so we put them into their own little neural net.
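One common way to implement this kind of biology-informed wiring is to mask a layer's weight matrix so each marker can only feed the gene node it is annotated to. The mapping below is a made-up toy standing in for the Gene Ontology graph; it is a minimal sketch of the masking mechanism, not the model from the talk:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setup: 8 markers assigned to 3 "gene" nodes (hypothetical mapping;
# in practice this assignment would come from annotation data).
n_markers, n_genes = 8, 3
marker_to_gene = [0, 0, 0, 1, 1, 2, 2, 2]

# Binary mask: marker i may only connect to its linked gene node.
mask = np.zeros((n_markers, n_genes))
for i, gene in enumerate(marker_to_gene):
    mask[i, gene] = 1.0

W1 = rng.normal(0, 0.1, (n_markers, n_genes))  # learnable weights
w2 = rng.normal(0, 0.1, n_genes)               # gene nodes -> phenotype

def predict(x):
    # Element-wise masking removes every connection the graph disallows,
    # shrinking 24 free connections down to 8 here.
    hidden = np.tanh(x @ (W1 * mask))
    return hidden @ w2

x = rng.integers(0, 3, n_markers).astype(float)  # one genotype vector
y_hat = predict(x)
```

During training the gradient would be multiplied by the same mask so pruned connections stay at zero; distant markers without an annotation could go into a separate unmasked sub-network, as described above.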
101:42:33.000 --> 101:42:40.000 This was using the Genomes to Fields data set, and we were able to see somewhere around 15-20% improvements 101:42:40.000 --> 101:42:48.000 in our root mean squared error, with yield being the phenotype here. 101:42:48.000 --> 101:42:53.000 What I'm most excited about with this approach is that it's organism-agnostic. 101:42:53.000 --> 101:43:06.000 So there's a ton of other gene ontology graphs, perhaps, that you could build for a number of other data sets that exist out there, and start to continuously learn across other organisms what these graphs could look like. 101:43:06.000 --> 101:43:14.000 These graphs are not unique; there's most likely not one silver-bullet graph. But you could tune them to explicit questions that you care about. 101:43:14.000 --> 101:43:35.000 Here we cared about yield, so we kind of just have to have everything. But if we cared about something much more specific, say flowering time, we could build a graph that's very explicitly defined for flowering time, and we don't really care about a number of other interactions, perhaps. 101:43:35.000 --> 101:43:36.000 So now to the loss functions. This is going to be the most mathematical component of this. 101:43:36.000 --> 101:43:51.000 I'll go a little bit quickly through it, given how much time we have. So, to talk about loss functions, which are the learning functions: the general goal of creating a loss function, whenever you have any model, 101:43:51.000 --> 101:43:58.000 is that you want your observed values to align with your predicted values. So you want to be along this perfect-prediction line. 101:43:58.000 --> 101:44:08.000 Anything above this line is over-predicted, anything below this line is under-predicted, and the goal is that you want to push these as close together as possible.
101:44:08.000 --> 101:44:13.000 So typically when we train a model, we'll use something like mean squared error or mean absolute error. 101:44:13.000 --> 101:44:26.000 And generally, this is just going to take all the points and try to squish them together. But look at a lot of the data that we work with, and the design goal that we care about for genomic selection and crop improvement. 101:44:26.000 --> 101:44:32.000 If we look at all the data that we have, we're typically trying to, this is yield, 101:44:32.000 --> 101:44:43.000 we're trying to improve yield. And most of our data does not sit anywhere near the upper bounds of the things that we really care about, the products that we want to design. 101:44:43.000 --> 101:45:04.000 So what can this lead to? It can lead to very poor tail predictions. The reason is that with mean-based losses we tend to emphasize where most of the data points exist, and if those are anti-correlated in any way with the tail events, the tails will just spread out. 101:45:04.000 --> 101:45:11.000 And that means that for what we're trying to design for, we're not going to be very good at predicting. 101:45:11.000 --> 101:45:18.000 There's a second case of this, and it's one I don't see discussed too often, but I observe it all the time, especially in agricultural data. 101:45:18.000 --> 101:45:31.000 And I'd argue this is perhaps even worse. This is compression, where we have observed data that extends over a pretty long span and our model is only able to predict over a much shorter span. 101:45:31.000 --> 101:45:37.000 So it doesn't understand the edges whatsoever, in either of those tails, whether that be yield 101:45:37.000 --> 101:45:44.000 or, say, disease resistance. So here's what we can do if we're thinking about this from a design perspective.
101:45:44.000 --> 101:45:54.000 We can actually create loss functions that prioritize, not exclusively, but prioritize learning about the tails, 101:45:54.000 --> 101:45:59.000 while at the same time giving up a little bit on the mean. So there's no free lunch, but you're allowed to 101:45:59.000 --> 101:46:05.000 pivot yourself towards what you actually want to design for. And there's some interesting work 101:46:05.000 --> 101:46:20.000 that comes out of MIT, a lab we're working with, on extreme events: how do you tease out extreme and rare events from different systems with AI? And one of the ways is to build these loss functions out. 101:46:20.000 --> 101:46:26.000 So I'm going to jump past some of this. I had a proof, but we'll pass on the proof. 101:46:26.000 --> 101:46:40.000 It's a very elegant way to build in known constraints. 101:46:40.000 --> 101:46:51.000 Okay, exactly. Yeah. If I'm choosing the loss function, I'm just trying to understand what you're telling us. 101:46:51.000 --> 101:47:18.000 One option would be just to ignore half of the data and only focus on the data that's in the tails, or weight it somehow so that it's treated as more important, more heavily weighted. The thing is, the data in the middle is very useful for understanding the extremes. 101:47:18.000 --> 101:47:25.000 But sometimes fitting that data comes at the expense of understanding those extremes. So that's why we weight it that way. 101:47:25.000 --> 101:47:36.000 And we weight it in this very particular way to make sure it's a continuous distribution. That way, in the case where everything is perfectly correlated, 101:47:36.000 --> 101:47:49.000 it still works great across the entire span. So, yeah, we still don't want to throw data away by just completely ignoring it; we'd likely be missing a lot of information.
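One hedged way to realize the continuous tail-prioritizing weighting described here (the exact formulation from the MIT collaboration isn't given in the talk, so this is an illustrative stand-in): weight each squared error by the inverse of a smooth density estimate of the observed value, so rare tail observations count more while the mid-range data still contributes.

```python
# Hypothetical sketch of a tail-prioritizing loss: weight each squared
# error by the inverse of an estimated density of the observed value, so
# rare (tail) observations count more than abundant mid-range ones. A
# Gaussian-kernel density keeps the weighting a smooth, continuous
# function of y, as described in the talk. All details are illustrative.
import numpy as np

def density_weights(y, bandwidth=1.0):
    # simple Gaussian kernel density estimate of p(y), evaluated at each y
    diffs = (y[:, None] - y[None, :]) / bandwidth
    p = np.exp(-0.5 * diffs**2).mean(axis=1) / (bandwidth * np.sqrt(2 * np.pi))
    w = 1.0 / p                 # rare values -> low density -> high weight
    return w / w.mean()         # normalize so the average weight is 1

def tail_weighted_mse(y_true, y_pred, bandwidth=1.0):
    w = density_weights(y_true, bandwidth)
    return np.mean(w * (y_true - y_pred) ** 2)

rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(0.0, 1.0, 200), [6.0]])  # one rare tail point
pred_flat = np.full_like(y, y.mean())                    # "compressed" model
# the tail point is penalized far more heavily than under plain MSE
print(tail_weighted_mse(y, pred_flat), np.mean((y - pred_flat) ** 2))
```

A model trained under this loss is pushed to resolve the tail rather than collapse everything to the mean, which is exactly the compression failure the talk describes.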
101:47:49.000 --> 101:48:02.000 So here's an example of using that for disease. Disease is a great target: any problematic disease, by definition, means that resistance is going to be rare. 101:48:02.000 --> 101:48:05.000 If it weren't problematic, resistance wouldn't be rare and we wouldn't really care so much. 101:48:05.000 --> 101:48:11.000 And in all these cases, resistance is over here in this tail, with very limited data. 101:48:11.000 --> 101:48:14.000 Mostly everything here is susceptible. And if you use a standard genomic selection model, you get this compression effect. 101:48:14.000 --> 101:48:25.000 So you see everything being compressed to the mean, the average value. 101:48:25.000 --> 101:48:33.000 And so your model is just giving you tons of average values out, left and right. But you can start pulling this apart. 101:48:33.000 --> 101:48:44.000 You can start pulling out and teasing out these resistance components of the genetics by adding in one of these loss functions. 101:48:44.000 --> 101:48:51.000 And then, it's really hard to see with the green here, but this ends up removing the compression and gives you more of a diagonal line on your predicted versus observed. 101:48:51.000 --> 101:48:56.000 And so here we don't have this compression anymore. So now we're telling our model that it needs to focus 101:48:56.000 --> 101:49:13.000 very explicitly on what makes things rare, which in this case is disease resistance.
101:49:13.000 --> 101:49:16.000 This means that we can now move much faster when we screen. 101:49:16.000 --> 101:49:25.000 One other part here on genomic selection that's very useful for teasing that out: we can start implementing some ideas from active learning. 101:49:25.000 --> 101:49:45.000 I think many of you are probably familiar with genomic selection, but typically we'll go test some set of genetics, observe the phenotypes, then train some model, and then try to use that model to choose the next set of genotypes to put out in the field. 101:49:45.000 --> 101:49:46.000 Now, this has traditionally taken this approach, where this is what we call the acquisition function. 101:49:46.000 --> 101:49:54.000 It tells you which new genetics you want to put out in the field. And traditionally we just exploited. 101:49:54.000 --> 101:50:07.000 So the model says this is the best, let's put that out there. But that doesn't allow the model to ever learn about other interesting ideas that are out there. 101:50:07.000 --> 101:50:21.000 So we need to make sure we start embedding some exploratory terms, so that, put this way, we're not biasing our model toward just one particular solution but allowing it to search the space much more dynamically. 101:50:21.000 --> 101:50:27.000 And we've done a bit of analysis on various different genomic datasets, and really all this gif is trying to show is that this is a blue blob 101:50:27.000 --> 101:50:35.000 representing all the data, and the model is picking out all these red points, the high performers. It can do it at extremely efficient levels 101:50:35.000 --> 101:50:47.000 because it has an exploration term. And this uses perhaps maybe one tenth of the data. So there are massive accelerations that we could 101:50:47.000 --> 101:50:57.000 potentially see if we do appropriate exploration and active learning techniques.
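The exploit-plus-explore acquisition idea above can be sketched with an upper-confidence-bound rule; the actual acquisition function used in the talk isn't specified, so the trade-off parameter `kappa` and the numbers below are purely illustrative.

```python
# Hypothetical sketch of an explore/exploit acquisition function for
# genomic selection: score = predicted mean + kappa * predictive
# uncertainty (an upper-confidence-bound rule). With kappa = 0 the rule
# purely exploits; larger kappa pushes the next field trial toward
# candidates the model is still unsure about.
import numpy as np

def select_next(pred_mean, pred_std, n_select, kappa=1.0):
    """Pick the n_select candidate indices with the best UCB score."""
    score = pred_mean + kappa * pred_std
    return np.argsort(score)[::-1][:n_select]   # highest scores first

mu = np.array([10.0, 9.0, 7.0, 6.0])      # predicted yield per candidate
sd = np.array([0.1, 0.2, 4.0, 0.3])       # model uncertainty per candidate
print(select_next(mu, sd, 2, kappa=0.0))  # pure exploitation: [0 1]
print(select_next(mu, sd, 2, kappa=1.0))  # exploration pulls in candidate 2
```

With exploration on, the uncertain candidate gets a field slot even though its predicted mean is lower, which is how the model keeps learning about the rest of the genetic space.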
And then the final thing, with large language models; I have to mention it because everyone's doing it. 101:50:57.000 --> 101:51:09.000 So one of the things that we're interested in there is that you have a massive genome and you need to find what the interesting regions are for us to go and edit. 101:51:09.000 --> 101:51:22.000 And so we've been using some of the large language models to find segments that have unmethylated regions, conserved noncoding sequences, and transcription factor binding sites. 101:51:22.000 --> 101:51:32.000 And so we use these models to try to figure out where that is, say that's a good or high-value editing target, and then we go try to validate that. 101:51:32.000 --> 101:51:34.000 Now, we have a nice advantage in that we have a ton of data on the very specific germplasm that we want to make specific edits in. 101:51:34.000 --> 101:51:44.000 So that really helps with building some of these models. 101:51:44.000 --> 101:52:10.000 And just to start wrapping up here: I talked all about genetics, but there's so much opportunity in the soil, weather, and management components here as well, along with imaging, to either measure some of these things like weather or management practices, or to give you much better, higher-resolution phenotyping to go with the genotypes, data that we have yet to even start modeling. 101:52:10.000 --> 101:52:18.000 And those will all fit very nicely into the AI architectures. I like showing this slide that we have. 101:52:18.000 --> 101:52:23.000 I didn't build this slide, but somebody did, and it's kind of nice to show the progress of what we've done. 101:52:23.000 --> 101:52:33.000 But I think even though we've come a very long way, the next steps aren't just going to be about efficiency, but about other things. 101:52:33.000 --> 101:52:39.000 How can we make sure that we meet the livelihoods of farmers and also the regulations and societal pressures of how food is produced, and other sustainability metrics?
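The editing-target search mentioned a moment ago (unmethylated, conserved noncoding, transcription-factor-binding regions) could be sketched as scoring windows across the genome. Here three boolean tracks stand in for whatever signals a DNA language model would actually produce, so every name and number is hypothetical.

```python
# Hypothetical sketch of ranking candidate edit sites: slide a window
# across annotated positions and prefer regions that are unmethylated,
# conserved noncoding, and near transcription-factor binding sites. In
# the talk, large language models supply these signals; here the three
# boolean tracks are made-up stand-ins for those predictions.
def rank_edit_windows(tracks, window=5):
    # tracks: dict of signal name -> list of 0/1 per basepair position
    n = len(next(iter(tracks.values())))
    scores = []
    for start in range(0, n - window + 1):
        # score a window by how many favorable signals it accumulates
        s = sum(sum(tr[start:start + window]) for tr in tracks.values())
        scores.append((s, start))
    scores.sort(reverse=True)
    return scores[:3]           # top candidate windows as (score, start)

tracks = {
    "unmethylated":        [0, 0, 1, 1, 1, 1, 1, 0, 0, 0],
    "conserved_noncoding": [0, 0, 0, 1, 1, 1, 1, 1, 0, 0],
    "tf_binding":          [0, 0, 0, 0, 1, 1, 1, 0, 0, 0],
}
best_score, best_start = rank_edit_windows(tracks)[0]
print(best_score, best_start)   # the window where all three signals overlap
```

The highest-scoring window is the "high-value editing target" the talk describes handing off for validation; a real system would score millions of windows from model outputs rather than ten hand-written positions.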
101:52:39.000 --> 101:52:53.000 And I think being able to synthesize all these different data streams is going to be very critical, and going beyond the traditional. Okay. 101:52:53.000 --> 101:53:06.000 So, last couple of comments here about where I think maybe the educational and training side must shift to recognize the opportunity of these data-driven models. 101:53:06.000 --> 101:53:14.000 I would say, first, that AI in ag requires a bit of a perspective change. And this is 101:53:14.000 --> 101:53:30.000 around interpretability and explainability, which are important things that we should continue to ask questions about, but they should not undermine the capability of these models. Sometimes we see that because something can't be interpreted, we don't move forward with it. 101:53:30.000 --> 101:53:37.000 But for prediction and design we don't necessarily need interpretability and explainability, at least not today. We'll give it some time. 101:53:37.000 --> 101:53:50.000 The next part is formalizing quantitative design goals, really making sure that our design goals are aligning perfectly with what we're doing with the tools that we have. And that's more of an engineering perspective, of trying to teach these creative solutions 101:53:50.000 --> 101:54:06.000 with clear assumptions and hypotheses and boundaries that we want to operate in. And the third one is that we still need to make sure we identify problems from deep biological domain knowledge. 101:54:06.000 --> 101:54:26.000 I think one of the most interesting things over the last two years being at Bayer has been the very critical conversations that I've had with a lot of career biologists; they've been amazing in terms of figuring out what other problems we can solve. 101:54:26.000 --> 101:54:32.000 So this deep biological domain knowledge can't go away in this discussion of these three items 101:54:32.000 --> 101:54:41.000 going forward.
Maybe if I leave one last thing: this is kind of how I see it. This is going to be a work of art of some sort, and I think the engineering mindset really comes in building the frame, 101:54:41.000 --> 101:54:59.000 setting the boundary conditions and the design goal. AI is really the tool, and biology provides all the different colors and interesting components that we can use to start painting this picture. 101:54:59.000 --> 101:55:11.000 So with that, I think that was the end of what we had. 101:55:11.000 --> 101:55:12.000 Yeah. Thank you, Scott. Thank you, Ethan. 101:55:12.000 --> 101:55:15.000 So we now have a little bit of time for some questions. 101:55:15.000 --> 101:55:24.000 Do we have any questions in the room? 101:55:24.000 --> 101:55:34.000 Thanks. Do we need a microphone down here? So people can hear. 101:55:34.000 --> 101:55:45.000 Here it comes. Yeah, I hope that my phone is working. 101:55:45.000 --> 101:55:54.000 Yeah. Hi, and thanks. I'm Chris Erques. I'm a plant physiologist. 101:55:54.000 --> 101:56:04.000 So my question to you is: I imagine that in your data you're looking at yield and disease resistance, but I assume you're also integrating data from the environment. 101:56:04.000 --> 101:56:14.000 And I imagine that you guys have amazing sensors and measurements of all the differences in environmental conditions during the day, during the seasons. 101:56:14.000 --> 101:56:17.000 So how hard is it to integrate all this into yield and disease resistance, or just yield? And is it better, with the precision that I assume you guys have, 101:56:17.000 --> 101:56:30.000 either in greenhouses or fields, to look at things very specifically, or is it better to look at 101:56:30.000 --> 101:56:35.000 all the changes, all the complex changes in the environment?
In a way, is it better to look at all the noise at once, or is it better to be very specific? 101:56:35.000 --> 101:56:48.000 So it will depend on your design goal. In the case where we want a germplasm that operates really well in a very select region, then we can be very specific for that. 101:56:48.000 --> 101:57:00.000 If we want this to go on many broad acres, then we're no longer going for a very specific performance, but now a distribution of performances. 101:57:00.000 --> 101:57:06.000 So we want to make sure that that germplasm is going to operate in a number of different environments. 101:57:06.000 --> 101:57:12.000 And so that changes your design goal. And you're still being specific, it's just a different set: now you're specific over a wide range of environments, 101:57:12.000 --> 101:57:27.000 whereas before you were specific over a smaller range. So, yeah. Whenever you're training these models, they have a finite capacity. 101:57:27.000 --> 101:57:33.000 Given your architecture and your data, there's a finite amount of learning that can be achieved. 101:57:33.000 --> 101:57:43.000 And you have to choose exactly where you want that learning to explicitly go. And so I think it brings a lot more 101:57:43.000 --> 101:57:54.000 to the table if you define that very clearly. But on the concept of just more data coming through on the environment side, there's a little bit of a caveat to that one. 101:57:54.000 --> 101:58:10.000 For example, I did a lot more fluid mechanics in my PhD and postdoc, and those are really complex systems. If you look at the last 30 years of weather, 30 years of weather is nowhere near enough to really understand how weather is operating. 101:58:10.000 --> 101:58:22.000 So we need a lot more data on the environmental side to be particularly accurate or high fidelity with what's going on.
101:58:22.000 --> 101:58:37.000 So I think it's great that we're continuously getting more information about the environment, but for the total weather scenarios, we probably still have to box those in just a little bit more. 101:58:37.000 --> 101:58:42.000 Okay. 101:58:42.000 --> 101:58:53.000 So just wondering: are you introducing anything new into the equation along the way, you know, with synthetic biology, synthetic genes? 101:58:53.000 --> 101:59:07.000 I ask because what occurs to me is, you're bringing all these, you know, million-dollar technologies, all this information, but preceding you has been millions of years of evolution and 4,000 years of farming. 101:59:07.000 --> 101:59:16.000 And I wonder, if you're just using the same set of genes, how much design space there is to actually move into 101:59:16.000 --> 101:59:31.000 with all these, you know, high-tech approaches. And are you generating new, I suppose not new biology, but new types of different species? 101:59:31.000 --> 101:59:52.000 Or are you sacrificing, in terms of, for example, taste, because you don't have a new design space to move into? Yeah, so, defining that problem, will we sacrifice? There's a potential; you might have a better answer for this one. 101:59:52.000 --> 101:59:53.000 So yeah, if we only care about yield and that's the only thing that we're measuring and that's what the model is going after, 101:59:53.000 --> 102:00:05.000 then it's not guaranteed that everything else goes away, but it is definitely a risk that everything else goes away. 102:00:05.000 --> 102:00:19.000 Now, we typically have a lot more than just yield that we're designing for. There are a number of other metrics that exist, and all those go into the calculation of a multi-objective design principle. 102:00:19.000 --> 102:00:27.000 Maybe you were getting at a different point there, about: have we pretty much seen most of the genomic variation?
102:00:27.000 --> 102:00:48.000 Have we squeezed everything out of there? I don't think it's true, but we could say, from a traditional breeding standpoint, let's assume that is true. I think editing just by itself, and what we can do there, is going to completely change that and introduce a whole new set of variation that is going to continue 102:00:48.000 --> 102:00:55.000 to move the boundaries. So even if that were the case, I think the new technology is going to do that. 102:00:55.000 --> 102:01:01.000 We tend to think that genomes are very static. Yeah, I mean, they're not. They continue to evolve, even within breeding programs. 102:01:01.000 --> 102:01:10.000 So you get, you know, new recombinants, you get gene duplications; the genome is dynamic, transposons are moving, changing how genes work. 102:01:10.000 --> 102:01:16.000 And that continues to drive the variation, which these models are going to have to continue capturing, because it continues to evolve over time. 102:01:16.000 --> 102:01:22.000 I'm reminded of a paper back in the late nineties from my postdoc advisor, who was chief science officer at USDA for a while. 102:01:22.000 --> 102:01:29.000 There's a barley breeding program in Minnesota. They've had the same genetics for, I think, sixty-some years, and they continue to make yield improvements. 102:01:29.000 --> 102:01:35.000 And the question was, where's it coming from? From your model it should stop, but it keeps moving. 102:01:35.000 --> 102:01:42.000 So there are all these other processes. It's a dynamic genome. There are things happening. 102:01:42.000 --> 102:01:47.000 Hmm. 102:01:47.000 --> 102:01:55.000 So first I would just have to say, I loved the talks all the way through, and I'm so happy I get to have dinner with you so I can pick your brain.
102:01:55.000 --> 102:02:08.000 Narrowing it to one question is difficult, but there was one thing in one of your slides I thought was really interesting as it combines with what you show on this slide, which is that importance around 102:02:08.000 --> 102:02:36.000 computational thinking, engineering thinking, and biology thinking. There's one more piece that I think you're in a really cool spot for, which is genetics thinking, and specifically with maize researchers, because you work with folks that have to plan experiments years in advance and have this really limited number of iteration cycles, which doesn't constrain the data or computational thinking or the engineering thinking as much. 102:02:36.000 --> 102:02:45.000 But you had this number up there: 26. We have 26 more years before we hit 2050. 102:02:45.000 --> 102:03:00.000 And all of these dire warnings that come out there. And I was just sort of wondering how you think about what you can do within that time frame as you start today, and specifically around how you set yourself up for the most success in the future. 102:03:00.000 --> 102:03:06.000 And whether you have any predictions about, you know, where you'll be in 26 years, where we'll be in 26 years, as we think forward on both the technology and the programs that are in play right now. 102:03:06.000 --> 102:03:18.000 Just what are your thoughts and how do you think about that? Yeah, I'll take the 26-year piece. 102:03:18.000 --> 102:03:23.000 This was one of my concerns coming to Bayer originally: oh, we have to deal with this. 102:03:23.000 --> 102:03:31.000 I was used to experiments where we had simulations that we would have the AI work with, 102:03:31.000 --> 102:03:38.000 and so you'd get results back within a couple of minutes. And now, now you go, okay, well, we're going to,
102:03:38.000 --> 102:03:43.000 if we had two inbred lines ready to go and put that in the field as a hybrid, you have to wait a whole year to get that data back. 102:03:43.000 --> 102:03:44.000 Now, if you want to design a new inbred, you have to cross it and go through a number of different processes. 102:03:44.000 --> 102:04:01.000 The shortest time frame, I think, is three years to get a hybrid data point. So if we were to start something today, we're not going to get that data point for three years. 102:04:01.000 --> 102:04:09.000 So I think that's a really interesting question, and I think it really underlines the aspect of: 102:04:09.000 --> 102:04:24.000 we have to look really far downstream and ask the question, are we exploring enough? Because, and this is something that maybe the public sector will be better at, we get to points where we have to be able to provide a certain set of products that are going to be high performing, and sometimes we just have to exploit. 102:04:24.000 --> 102:04:38.000 We have to take the things that we have right now, and sometimes we can't look downstream. So I think it's a big question of risk 102:04:38.000 --> 102:04:41.000 that I hope we can solve, but yeah, there are a lot of things. Asking someone to make a decision three, 102:04:41.000 --> 102:04:54.000 four, five years downstream is tough. I don't know if I answered that in any useful way. So, I generally think of, you know, AI modeling 102:04:54.000 --> 102:05:00.000 as mostly being able to make predictions where you don't extrapolate. 102:05:00.000 --> 102:05:06.000 You can look within your dataset, but you're not really able to predict far outside of your dataset. 102:05:06.000 --> 102:05:21.000 Is that true, given the constraints and the loss functions
that you're incorporating in the models? So if I'm really looking for something new that's going to enable me to, you know, do agriculture in 2050, 102:05:21.000 --> 102:05:29.000 will I be able to find that, or do I need to really, how do I push those models so I get that extrapolation? 102:05:29.000 --> 102:05:35.000 Yeah. So when you do this active learning approach, the goal is to be going to the edge. 102:05:35.000 --> 102:05:41.000 You're trying to find the edges continuously. And so you are trying to extrapolate. 102:05:41.000 --> 102:05:50.000 And one of the things that has definitely been hard for me to discuss multiple times is that your model, in extrapolation, is going to be wrong so many times. 102:05:50.000 --> 102:06:10.000 If you're working on extreme events, they're so rare that 99.5% of the time you're going to be wrong, and a lot of people don't like that answer, but it's a reality: you're going to be wrong most of the time. But if you run the statistics and you run a number of different simulations, you can see that being wrong is worth 102:06:10.000 --> 102:06:20.000 it, because you're gaining understanding and data assets that are much more interesting, because they're diverse and they are answering a question 102:06:20.000 --> 102:06:36.000 in the space of genetics, essentially. So I think what pairs well with that question is that most of the time in extrapolation the models are wrong, and that's a really hard discussion to have, but it's a useful wrong, versus 102:06:36.000 --> 102:06:56.000 being right and not moving anything. So it takes three years to prove that you were wrong. Okay. 102:06:56.000 --> 102:07:10.000 So, fantastic talk. Thank you so much. So, you know, you're very fortunate in having the luxury of all of these data and years and years, decades of research on maize.
102:07:10.000 --> 102:07:18.000 So given your experience of working with maize, and recognizing that as the climate continues to change, 102:07:18.000 --> 102:07:25.000 we're going to have to bring in additional crops, you know, be they orphan crops, be they new crops: 102:07:25.000 --> 102:07:40.000 what recommendations would you give to researchers who are working on some of these orphan or new crops that we're now developing, to be able to leverage AI to the best extent possible and improve them as quickly as possible? 102:07:40.000 --> 102:07:59.000 Oh, I'd say maybe the first point, on this kind of active learning question, is being willing to spread out your data and allow the model to paint the lines in between it. 102:07:59.000 --> 102:08:05.000 That's going to mean that the limited resources you have to work with are going to be spent as efficiently as possible toward building something that can predict well. 102:08:05.000 --> 102:08:26.000 So I think that's something, if you're starting from scratch and you have that opportunity: have a mindset that data is an asset, and take a risk-analysis approach to really get the best data. If you do that, for most of the datasets that exist out there, if you do a kind of historical analysis on them, you only need about 5 to 10% of the data that's out 102:08:26.000 --> 102:08:34.000 there to get the accuracy, and you could throw away the rest. So there are a ton of experiments, a ton of wasted data. 102:08:34.000 --> 102:08:41.000 So taking an approach like that, I think, is very critical from the AI side. Okay. To some extent this is how it plays out. 102:08:41.000 --> 102:08:51.000 So obviously corn is, for Bayer, the most profitable crop, and most resources go into it. So a lot of things get developed in corn, maize, and then propagated
102:08:51.000 --> 102:09:00.000 into soy, and then rice and wheat, and even into the vegetables; we have a veg division that breeds something like 70 different vegetables. 102:09:00.000 --> 102:09:08.000 So over time these things, you know, we first introduced all the genotyping and seed chipping, which started in corn and then made its way down to other crops. 102:09:08.000 --> 102:09:15.000 Same with all the genotyping and genomic resources. And if you think about orphan crops, 102:09:15.000 --> 102:09:23.000 cover crops and things like that, which we're investing in and other companies are investing in, those technologies that he's talking about are also making their way into those 102:09:23.000 --> 102:09:26.000 orphan crops as well. So yeah, it's happening slowly. All right, well, thank you. 102:09:26.000 --> 102:09:41.000 It was a wonderful presentation, and I'm sure there'll be additional opportunities to meet with NSF staff for the rest of the day. 102:09:41.000 --> 102:09:42.000 Thank you. Okay. Yep. 102:09:42.000 --> 102:09:42.000 I can