WEBVTT 101:15:53.000 --> 101:15:59.000 We're here when I did those. They were in Arlington at the old place, where security is much easier. 101:15:59.000 --> 101:16:04.000 Oh, but just a 101:16:04.000 --> 101:16:06.000 There's some message on the topic. Maybe just a quick, you know: I spent my years in academia and the last 5 years in industry. 101:16:06.000 --> 101:16:18.000 One thing I learned about industry is we start almost every talk or presentation with a bio slide. 101:16:18.000 --> 101:16:32.000 Your journey and how you got there. And so I have a slide, and I even have the soccer: Bayer Leverkusen, which just won, for the first time in our 100-plus-year history, the German championship in football, slash soccer. 101:16:32.000 --> 101:16:34.000 Which is really cool. So I did my graduate work at Wisconsin, among a number of other things working on chromosome biology. 101:16:34.000 --> 101:16:45.000 Cytogenetics, but within a plant breeding program. 101:16:45.000 --> 101:16:52.000 So my PhD is in plant breeding. I did a postdoc in Minnesota; I've got the football or the mascot for all the places I've been up there. 101:16:52.000 --> 101:17:00.000 So I was a Gopher for 2 years. For those that haven't lived in Minneapolis, it's cold in the winter. Very cool. 101:17:00.000 --> 101:17:15.000 And there I worked on wild rice. I was part of a lab that worked on rice species, relatives of cultivated rice, and how you might use information from those uncultivated relatives to improve rice. 101:17:15.000 --> 101:17:22.000 I got a faculty position at Purdue in 2001, and started the exact same day as Cliff. 101:17:22.000 --> 101:17:33.000 I know, he looks much older. And they hired me to work on soybean. The only thing I knew about soybean when I went there is that you drove down the highway and there was corn, and the short stuff
101:17:33.000 --> 101:17:40.000 was probably soybean, and that is literally all I knew about soybean. And they took a chance and hired me, and I spent the next 101:17:40.000 --> 101:17:49.000 15-plus years working on soy and other legumes. And one interesting aspect of this is the early days of sequencing. 101:17:49.000 --> 101:17:57.000 We were part of the group that helped sequence soybean. That ties into some discussions we've been having this morning around workforce training and machine learning and AI. 101:17:57.000 --> 101:18:02.000 Bringing that to bear on the biological questions and problems, as we were generating all this genomic data. 101:18:02.000 --> 101:18:10.000 Biologists were generating it, but didn't know what to do with it. So how do you get the mathematicians and computer scientists and data scientists to have an interest in this biological problem? 101:18:10.000 --> 101:18:18.000 So we went through the same learning curve, but 20 years ago. I went to Georgia, became a Bulldog in 2011 after a year here, 101:18:18.000 --> 101:18:28.000 my first stint as a rotating program officer. And again, I was at a place that won a national championship with their football team. 101:18:28.000 --> 101:18:34.000 And I worked on peanut and other legumes while I was there. Correct, UGA. 101:18:34.000 --> 101:18:44.000 I joined Bayer 5 years ago this August, and I think my bio is a little bit off. I actually lead the North America soybean and cotton pipelines now, as of a year and a half ago. 101:18:44.000 --> 101:18:57.000 So on the R&D scale, I'm more on the development side of it now. I spend a lot of time with commercial partners and growers talking about our products, what it is they need, and how we're going to deliver those from a genetics perspective. 101:18:57.000 --> 101:19:03.000 So that's my academic career in one slide: 19 years, a bunch of students and postdocs. 101:19:03.000 --> 101:19:10.000 Not all of them, but, my training is in plant breeding.
My passion was chromosome biology. 101:19:10.000 --> 101:19:18.000 As we started sequencing genomes, genomics was being able to tie DNA sequence to understanding how genomes function and what their structure is. 101:19:18.000 --> 101:19:23.000 And then one of my other passions was polyploidy, which is prevalent in plants. So we did a lot of sequencing of polyploid plants. 101:19:23.000 --> 101:19:38.000 Looking at structural changes, duplicated genes, the fate of genes in polyploids, and other aspects of how polyploids evolve, and then trying to use that information to understand how to improve crops in a more efficient way. 101:19:38.000 --> 101:19:51.000 Hmm. And this is getting to the purpose of the talk today. 101:19:51.000 --> 101:19:52.000 A background in plant breeding and genomics for a number of years, then hired into Bayer. 101:19:52.000 --> 101:20:01.000 And when I was first hired at Bayer, I was leading a group in R&D focused on how we use genomic information. 101:20:01.000 --> 101:20:11.000 Where do we generate that genomic information? How do we use it more efficiently? What tools do we build on top of that to make better decisions in breeding pipelines, to get the products to our growers that they want? 101:20:11.000 --> 101:20:26.000 But I very quickly realized the scale, the scope, the pace of everything that happens in industry is dramatically different. When you think about a genetic experiment in academia, it's 3 reps, 3 locations, 3 years. 101:20:26.000 --> 101:20:37.000 We don't talk about those numbers at all. You know, we're talking 60, 80 reps a year, and tens of thousands of genetic entities within those reps. 101:20:37.000 --> 101:20:49.000 And having genetic information on all of those. And so what we have is a massive pipeline pushing millions to hundreds of thousands of progeny through on an annual basis, 101:20:49.000 --> 101:20:55.000 at various steps of that pipeline.
And if you think about breeding, basically it's a large funnel. 101:20:55.000 --> 101:21:00.000 You create a bunch of progeny, and you take them through various cycles of testing to get down to the very few that you want. 101:21:00.000 --> 101:21:04.000 Yeah. That's sort of like looking for a needle in a haystack. You create a huge pile; you want to find that one winner. 101:21:04.000 --> 101:21:19.000 So you spend the next 10 years after you create this huge pile trying to figure out which of these hundreds of thousands of progeny you created is going to be the one that's going to be a successful variety or hybrid. 101:21:19.000 --> 101:21:26.000 And we generate a lot of data along the way: genotyping things, sequencing things, collecting phenotypic data. 101:21:26.000 --> 101:21:39.000 And you can begin to build automation and tools around that to connect things together, and be able to impute genetic information and infer what the phenotype might be based on relatives and progeny, the parents and grandparents of that entity. 101:21:39.000 --> 101:21:48.000 So we've built a lot of resources. We've hired a lot of data scientists and computer scientists to help build this infrastructure, these models that tie these things together. 101:21:48.000 --> 101:21:53.000 But at the end of the day, we're still looking for that needle. We get a little bit more efficient using these things to find the needle. 101:21:53.000 --> 101:21:58.000 We're still making hundreds and hundreds and hundreds of thousands of progeny. We're genotyping them. 101:21:58.000 --> 101:22:04.000 We're testing them, trying to get down to those few needles that we want to move forward. 101:22:04.000 --> 101:22:14.000 So maybe just on this slide here: that thing that looks like a cross-section of a brain is actually a representation of our maize germplasm based on genetic information. 101:22:14.000 --> 101:22:24.000 And it looks like 2 lobes of a brain.
Those are the male and female heterotic pools; we create hybrids, and those are the 2 pools that we breed within. 101:22:24.000 --> 101:22:34.000 So as you can imagine, over the past 20 years we built and scaled this infrastructure to try and find these needles in this massive number of entities that we generate. 101:22:34.000 --> 101:22:44.000 We created a lot of automation to collect the data we need, everything from the genetics all the way down to how they perform in the field. 101:22:44.000 --> 101:23:02.000 And so we have centers where the seeds are sent. The seeds are chipped, meaning they take a small section out of a seed and genotype that section, and then we move that seed forward either into a waste can if we don't want to plant it, or into a greenhouse or a field, based on the genetic information that we get from that chip. 101:23:02.000 --> 101:23:08.000 And this is all automated, in central lab facilities. 101:23:08.000 --> 101:23:18.000 Once we go from the millions of seeds that we chip annually and get genetic information on, to knowing which are the hundreds of thousands that we actually want to plant, 101:23:18.000 --> 101:23:25.000 those get sent to a central packaging facility, which looks a lot like an Amazon warehouse. It's conveyor belts, it's automation. 101:23:25.000 --> 101:23:34.000 These things come in, they get packaged into what we call cassettes. The cassettes get sent out to planting centers around the world. 101:23:34.000 --> 101:23:42.000 And they're planted in the field. We know where every plot, every seed is, the genotype of everything in that field. 101:23:42.000 --> 101:23:46.000 And we know where it is geographically. And then we collect data throughout the season. So how does it perform? 101:23:46.000 --> 101:23:59.000 How does it perform under stress? How does it perform with various disease pressures? We fly UAVs, or drones, to collect that.
When does it flower? When does it mature? 101:23:59.000 --> 101:24:03.000 When is it setting seed? All these other things, all this data. So we start with millions of progeny genotyped, plant hundreds of thousands, and start collecting 101:24:03.000 --> 101:24:16.000 fantastic information. And over the next 7, 8, 9 years, we winnow those hundreds of thousands down to the 10 or 20 that we move forward as commercial products. 101:24:16.000 --> 101:24:24.000 It's an expensive process. It generates lots and lots and lots of data. A lot of this is automated within large greenhouses. 101:24:24.000 --> 101:24:31.000 So the one in Marana, 5 or 10 acres, I can't remember. Maize, 10 acres under glass. Yeah. 101:24:31.000 --> 101:24:43.000 All automated, so we can start cycling populations more rapidly, moving the genetics of a population through multiple cycles per year rather than one cycle per year planting in a field. 101:24:43.000 --> 101:24:49.000 So we can move the genetics of a population more quickly and then move them out into the field for testing. 101:24:49.000 --> 101:24:59.000 So if you think about breeding over time: going back to domestication thousands of years ago, where people were picking things whose seeds didn't fall on the ground, so we got non-shattering. 101:24:59.000 --> 101:25:17.000 Those were sort of the major changes. In breeding in the early 1900s we started pulling in statistical models; hybrid seed was first developed in the 1920s and 1930s and went commercial in the 1940s and 1950s; we started applying 101:25:17.000 --> 101:25:27.000 modern harvesting tools, capturing yield as it came off the harvester. We started using molecular markers in the nineties, and really full blast in the 2000s. 101:25:27.000 --> 101:25:34.000 And those are sort of the evolutions in how we've done plant improvement. At Bayer, Monsanto; 101:25:34.000 --> 101:25:38.000 Bayer bought Monsanto 5 years ago, so it's Bayer.
We sort of break it into breeding 1.0, 2.0, 3.0. 101:25:38.000 --> 101:25:50.000 Breeding 1.0: they acquired a lot of genetics and germplasm, seed companies, to get the genetics and get those tools to start creating those winning varieties. 101:25:50.000 --> 101:26:00.000 Breeding 2.0 and 3.0 are really about increasing the precision: knowing where you're planting things, predicting where you want to plant them, based on what's expected. 101:26:00.000 --> 101:26:09.000 Breeding 3.0 is really around digital enablement: all the automation around seed chipping, getting genetic information on all the millions of progenies at the very beginning, 101:26:09.000 --> 101:26:20.000 to know which ones you want to plant in those initial stages of testing. And the phase we're in now, and this is where Ethan's going to take over in a minute, 101:26:20.000 --> 101:26:30.000 is really thinking more about design. So can we flip this breeding strategy from creating millions of progeny and trying to get down to those 10 that are going to be the winners? 101:26:30.000 --> 101:26:34.000 Can we think more intentionally about how we create these populations at the beginning, knowing what our growers need? 101:26:34.000 --> 101:26:40.000 And can we design the genetics more intentionally, using modern tools and all the data that we've generated over the past 10 years, 101:26:40.000 --> 101:26:45.000 to reduce the haystack and get those needles that are going to be the winners in the growers' fields? 101:26:45.000 --> 101:27:01.000 So with that, I'm going to turn it over to Ethan. Okay. Alright. Thanks. I'm really excited to be here. 101:27:01.000 --> 101:27:04.000 I've written a number of, you know, proposals and things like that, and seen this 101:27:04.000 --> 101:27:14.000 all over the place, and having the opportunity to actually do it is so nice. Oh thanks, that's going to help a lot.
101:27:14.000 --> 101:27:38.000 Let's see, I think we have a couple of slides to push through here. I just want to quickly acknowledge: I get to lead an AI genomics research team right now, and a number of different PhD researchers have done this work over the last year or 2, so I just want to make sure I mention them. So, a little bit about myself, since we always have these timelines 101:27:38.000 --> 101:27:40.000 and Scott gave a little bit of background, so I'll do it as well. 101:27:40.000 --> 101:27:49.000 Even though we're about the same age, mine's a lot more abbreviated in time. 101:27:49.000 --> 101:27:58.000 And something here real fast. So my youth was actually in agriculture. 101:27:58.000 --> 101:28:06.000 I grew up on a vegetable farm in Ohio. 101:28:06.000 --> 101:28:35.000 I really enjoyed it a lot, but I started to recognize that biology was, for me, very unpredictable, complex; it's going all over in different directions. 101:28:35.000 --> 101:28:47.000 But some of the machines that we were using, 101:28:47.000 --> 101:28:57.000 and kind of the engineering that was around agriculture, was much more predictable. Then you can start designing for it, very intentionally.
101:28:57.000 --> 101:29:05.000 And then I had the unpredictable move: I got a call one day about a position at Bayer, and whether or not I'd be interested in going back into these messy, complex biological problems that are not so predictable. 101:29:05.000 --> 101:29:14.000 So it's been a very uncomfortable jump into the unpredictable aspect, but it's been a lot of fun. 101:29:14.000 --> 101:29:27.000 And so, one of the things about jumping into the biological domain: one of the questions that I get very often, it's consistent and I have to wrestle with it every day. 101:29:27.000 --> 101:29:36.000 I'll get the question, can you interpret your model? Can you give us the interpretation of your model? 101:29:36.000 --> 101:29:42.000 And generally, that answer today is going to be no. It's a non-linear AI model. 101:29:42.000 --> 101:29:55.000 We can't do an interpretation, not today. But that's not necessarily the purpose. It's for prediction, not necessarily interpretation. So I'm going to make a couple of arguments about why that's particularly important here. 101:29:55.000 --> 101:30:06.000 So, with a background of playing around in physics for a long time and being very interested in physics and calculus, I think it's interesting to look back at how physics changed over time and how it developed. 101:30:06.000 --> 101:30:27.000 For most of history, physics was a field of philosophy. There were 3 branches: you had physics, you had ethics, and you had logic, and if you were to propose anything in physics, you had to reason it out against ethics, 101:30:27.000 --> 101:30:36.000 logic, and human experience. And so you were not able to propose something unless you could interpret it and explain it within all 3 parts of the field. 101:30:36.000 --> 101:30:54.000 And so this was a very qualitative, rather than quantitative, approach to how physics was described.
That was the case for 2 millennia, starting with Aristotle and Aristotelian physics, all the way up until Copernicus and Galileo, who started to change some things. 101:30:54.000 --> 101:31:05.000 But it was really Newton and Leibniz, when they introduced calculus. And calculus absolutely transformed the way physics moved forward and how things were designed. 101:31:05.000 --> 101:31:14.000 But calculus was not seen as necessarily a golden tool; it wasn't perfect right off the bat. 101:31:14.000 --> 101:31:22.000 So, like neural networks and AI, this lack of interpretability also plagued calculus when it was originally introduced. 101:31:22.000 --> 101:31:33.000 And I really like this quote here, that calculus is often taught as if it is a pristine thing, emerging, you know, complete and whole from the head of its creators. 101:31:33.000 --> 101:31:42.000 It is not. It took over 200 years for us to actually create the foundations of modern calculus, and there was a lot of concern about how it works. 101:31:42.000 --> 101:31:48.000 So in particular, Newton and Leibniz said, hey, here's a tool. It predicts particularly accurately. 101:31:48.000 --> 101:31:55.000 It works very well, and it works very effortlessly. But they couldn't articulate or explain or interpret this to the various philosophers 101:31:55.000 --> 101:32:07.000 and physicists of the seventeenth century. And so a lot of people pushed back on this, and really what Newton and Leibniz said back was, well, that wasn't exactly our goal.
101:32:07.000 --> 101:32:34.000 But the engineers (and I'm an engineer, so I really like this kind of approach) said, whatever, that's fine. We don't necessarily care about the interpretability, but if we can predict accurately, we can design, and that's going to be really nice, and we can move forward. And it was the interaction with those new designs, those new steps that engineers took, that provided a lot of the data that essentially 101:32:34.000 --> 101:32:42.000 created the foundations of calculus, which took about 200 years before we had the modern framework that we work with now. 101:32:42.000 --> 101:32:55.000 So the purpose here is really to mention, and this is a provocative statement, that interpretability is not necessarily the goal of what we're trying to do with AI; prediction is the goal. 101:32:55.000 --> 101:33:04.000 And here's where I believe neural networks, or AI, will be to biology what calculus was to physics. 101:33:04.000 --> 101:33:11.000 It provides us a way to start predicting, from some input variables, some downstream output variables. 101:33:11.000 --> 101:33:30.000 And there's a particular reason why neural networks, I think, are uniquely useful for biology, versus calculus for physics. Physics has a ton of classical laws, and it's relatively simple: the universe is always seeking equilibrium. 101:33:30.000 --> 101:33:35.000 So it's kind of this ball that's rolling downhill the entire time. It's relatively elegant. 101:33:35.000 --> 101:33:39.000 And calculus is also very elegant. When we look at biology, we don't have all these laws. And I really like this, 101:33:39.000 --> 101:33:53.000 something pulled out of a 2022 dissertation from a graduate student: 101:33:53.000 --> 101:34:11.000 that life perpetuates its existence out of equilibrium,
against the will of the second law. And I think that aspect, against the will of what thermodynamics wants to do, is why biology is so complex, and why we've had such a hard time understanding it with 101:34:11.000 --> 101:34:15.000 tools like calculus: because it's fighting. 101:34:15.000 --> 101:34:19.000 And if you've ever seen a fight, it's never elegant. It's always something crazy going on in this 101:34:19.000 --> 101:34:32.000 inclined march. So for this complexity, neural nets and AI, I think, are the right tool for us to start predicting. 101:34:32.000 --> 101:34:38.000 So now this kind of gets more into the general motivation of why we're doing this in agriculture. 101:34:38.000 --> 101:34:44.000 We know that agriculture must adapt faster than ever. We have a number of different pressures going on. 101:34:44.000 --> 101:34:54.000 We have a massive population increase that is going to require a 60% increase in agricultural production. We have ever-changing growing conditions that we have to deal with. 101:34:54.000 --> 101:35:06.000 We have larger spreads of disease due to globalization. We need to keep up with regulations and with the societal demands for how food is produced. 101:35:06.000 --> 101:35:13.000 And finally, we have to do all of this, the 60% increase and all those other constraints, without blowing up the planet. 101:35:13.000 --> 101:35:19.000 And when we look generally at what we have with respect to data in agriculture,
101:35:19.000 --> 101:35:32.000 I think we have a really great opportunity to start accelerating even faster in how we design, because of all the different data sets that are popping up across the planet and the different opportunities 101:35:32.000 --> 101:35:41.000 that we can hopefully pull from that data. Scott was mentioning this, and so I'll go somewhat quickly here. 101:35:41.000 --> 101:35:46.000 When we look at agricultural data, we see it increasing in a number of different ways: not only in scale, but in resolution, source, and type. 101:35:46.000 --> 101:35:53.000 And so this demands that we have likewise advancements in modeling capabilities, particularly on the AI side of things. 101:35:53.000 --> 101:36:02.000 This is kind of a nice example of the genomic resolutions that you can get at scale at a big company. 101:36:02.000 --> 101:36:22.000 And we're very close to seeing the ability to look at full assemblies. 101:36:22.000 --> 101:36:52.000 And so that traditionally would be a problem, because we have base pairs here, we have time-series weather data, we have categorical management approaches. 101:36:57.000 --> 101:37:06.000 We have scalar variables that we see in the soil. All of these are very different data sources. 101:37:06.000 --> 101:37:15.000 And so AI provides a really unique, flexible opportunity: you can start synthesizing all these different multimodal data streams into one particular architecture 101:37:15.000 --> 101:37:21.000 to help you design. Today I'll show a couple of quick examples, just focusing on the G part.
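The multimodal fusion just described, where marker data, time-series weather, and categorical management feed one architecture, can be sketched roughly as follows. All names, sizes, and the random "trained" weights are illustrative assumptions, not the actual pipeline from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
n_markers = 500        # SNP markers as 0/1/2 allele dosages
n_weather_steps = 120  # daily weather readings over one season
n_mgmt_classes = 4     # categorical management practice (e.g. tillage type)

def encode_genotype(g, W):
    """Project a marker dosage vector to a dense embedding."""
    return np.tanh(g @ W)

def encode_weather(w, W):
    """Summarize a time series (here simply mean-pooled, then projected)."""
    return np.tanh(w.mean(axis=0) @ W)

def encode_management(m, E):
    """Look up an embedding row for a categorical class."""
    return E[m]

# Randomly initialized projections stand in for trained encoder weights.
Wg = rng.normal(0, 0.1, (n_markers, 16))
Ww = rng.normal(0, 0.1, (3, 8))          # 3 weather variables per day
Em = rng.normal(0, 0.1, (n_mgmt_classes, 4))

g = rng.integers(0, 3, n_markers).astype(float)  # one genotype
w = rng.normal(size=(n_weather_steps, 3))        # temp, rain, radiation
m = 2                                            # one management class

# Fuse all modalities into one feature vector for a downstream head.
fused = np.concatenate([encode_genotype(g, Wg),
                        encode_weather(w, Ww),
                        encode_management(m, Em)])
print(fused.shape)  # (28,)
```

A real system would learn these encoders jointly, but the design choice is the same: each modality gets its own encoder, and a shared head consumes the concatenated embeddings.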
101:37:21.000 --> 101:37:46.000 So we're just going to focus on the genomics, and what we can do modeling genomics to phenotype, with observations like, say, yield, height, disease resistance, and we'll be using a genotype vector of some sort, at some resolution, to do that. 101:37:46.000 --> 101:37:55.000 And there are 4 pieces of this approach of going from genotype to phenotype that we'll care about. 101:37:55.000 --> 101:37:59.000 The first one is the architecture of an AI model. The architecture is the bones: it gives the structure, and most of the properties that we can expect 101:37:59.000 --> 101:38:13.000 out of a model will be embedded in the design of the architecture. And we'll show a kind of cool, well, I think it's cool, 101:38:13.000 --> 101:38:25.000 approach where we start putting biologically informed components into our architecture to make it predict at increased accuracy. 101:38:25.000 --> 101:38:29.000 The second is loss functions. Loss functions are the learning criteria that you use for your AI model. 101:38:29.000 --> 101:38:35.000 And they're very important because they define the design question that you care about. 101:38:35.000 --> 101:38:45.000 And so we should make sure that our learning and our loss functions align with those. And then I'll show 2 quick other approaches here. 101:38:45.000 --> 101:38:54.000 Active learning is an idea in AI that's very similar to genomic selection, 101:38:54.000 --> 101:39:00.000 where you have an AI model and you have your system, and you're going to allow them to interact with each other. 101:39:00.000 --> 101:39:13.000 So they get to talk and they get to update, and continue to progress towards some downstream goal. And then we'll say a couple of quick things about large language models and their applications right now. 101:39:13.000 --> 101:39:25.000 So, jumping into the architecture.
So one of the questions that we wanted to answer was, could we start embedding domain knowledge into our models? 101:39:25.000 --> 101:39:40.000 First, when we look on the left side at the data that we have at scale at Bayer: we have tens of millions of phenotypes, these being yield, disease, etc., and we have perhaps over a hundred thousand unique genotypes at marker-level resolution. 101:39:40.000 --> 101:39:47.000 So we have very coarse information. It might only be 10,000 base pairs, or something along those lines. 101:39:47.000 --> 101:39:54.000 So we're missing a lot of what's really going on in the genotypes that we care about. 101:39:54.000 --> 101:40:02.000 Now, on the other hand, when we look at domain knowledge, things like, say, gene regulatory networks or gene ontology terms, 101:40:02.000 --> 101:40:11.000 these provide some really high-fidelity information: things that we clearly know, or at least at this point in time believe, are particularly important. 101:40:11.000 --> 101:40:16.000 Those are really high-fidelity pieces of information, but the problem is that we have very little data to build a model. 101:40:16.000 --> 101:40:22.000 If you have gene expression data, typically we might only have a couple of different genotypes. 101:40:22.000 --> 101:40:26.000 So you can't even make a decent model from that. So we said, well, what if you could combine those 2? 101:40:26.000 --> 101:40:42.000 You could take the general structure of a neural net, with all of these parameters, and embed that domain knowledge in the center of it, making the model learn to predict through this particular graph. 101:40:42.000 --> 101:40:51.000 And so I'll give a couple more reasons why we think this is a good idea, not only from a biological standpoint but from a mathematical standpoint: 101:40:51.000 --> 101:41:03.000 graphs are very attractive for this. One problem with off-the-shelf AI models, and we see a lot of off-the-shelf AI models being used,
101:41:03.000 --> 101:41:11.000 and that's a bit of a concern, I would say; we want to be very particular about how we're using our AI models, 101:41:11.000 --> 101:41:14.000 is that they're massively over-parameterized. 101:41:14.000 --> 101:41:18.000 Now, if we build a graph, we can reduce that complexity 101:41:18.000 --> 101:41:27.000 substantially. The other problem with off-the-shelf AI models, pretty much all AI models, is that they struggle with understanding very long-range interactions. 101:41:27.000 --> 101:41:36.000 So if we know that we have some gene on, say, chromosome 1 and another gene on chromosome 10, they're billions of base pairs away. 101:41:36.000 --> 101:41:37.000 Generally, a model is never going to be able to pick that up. 101:41:37.000 --> 101:41:41.000 It's never going to be able to understand that. If we have a graph, we can call out those known interactions very quickly and very explicitly. 101:41:41.000 --> 101:41:54.000 And so that provides a very big advantage. So, here's an example of building one of these 101:41:54.000 --> 101:42:14.000 bio-informed GNNs. We call it the bio-informed GNN, and this is all open-source data, actually. We built the graph from the Gene Ontology resource: we asked, okay, here are the various genes that we have in the maize genome; 101:42:14.000 --> 101:42:26.000 build a graph of all the different interactions. And then we took that graph and linked it up to the marker sets that we have, so that markers within a certain distance are linked to that gene. 101:42:26.000 --> 101:42:33.000 And then there were some that were just really far away. We didn't necessarily need to do this, but they're really far away, so we put them into their own little neural net.
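One common way to implement this kind of biology-informed wiring is to mask a layer's weight matrix so each marker can only feed the gene node it is annotated to. The mapping below is a made-up toy standing in for the Gene Ontology graph; it is a minimal sketch of the masking mechanism, not the model from the talk:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setup: 8 markers assigned to 3 "gene" nodes (hypothetical mapping;
# in practice this assignment would come from annotation data).
n_markers, n_genes = 8, 3
marker_to_gene = [0, 0, 0, 1, 1, 2, 2, 2]

# Binary mask: marker i may only connect to its linked gene node.
mask = np.zeros((n_markers, n_genes))
for i, gene in enumerate(marker_to_gene):
    mask[i, gene] = 1.0

W1 = rng.normal(0, 0.1, (n_markers, n_genes))  # learnable weights
w2 = rng.normal(0, 0.1, n_genes)               # gene nodes -> phenotype

def predict(x):
    # Element-wise masking removes every connection the graph disallows,
    # shrinking 24 free connections down to 8 here.
    hidden = np.tanh(x @ (W1 * mask))
    return hidden @ w2

x = rng.integers(0, 3, n_markers).astype(float)  # one genotype vector
y_hat = predict(x)
```

During training the gradient would be multiplied by the same mask so pruned connections stay at zero; distant markers without an annotation could go into a separate unmasked sub-network, as described above.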
101:42:33.000 --> 101:42:40.000 This was using the Genomes to Fields data set, and we were able to see somewhere around 15-20% improvements 101:42:40.000 --> 101:42:48.000 in our root mean squared error, with yield being the phenotype here. 101:42:48.000 --> 101:42:53.000 What I'm most excited about with this approach is that it's organism-agnostic. 101:42:53.000 --> 101:43:06.000 So there's a ton of other gene ontology graphs, perhaps, that you could build for a number of other data sets that exist out there, and start to continuously learn across other organisms what these graphs could look like. 101:43:06.000 --> 101:43:14.000 These graphs are not unique; there's most likely not one silver-bullet graph. But you could tune them to explicit questions that you care about. 101:43:14.000 --> 101:43:35.000 Here we cared about yield, so we kind of just have to have everything. But if we cared about something much more specific, say flowering time, we could build a graph that's very explicitly defined for flowering time, and we don't really care about a number of other interactions, perhaps. 101:43:35.000 --> 101:43:36.000 So now to the loss functions. This is going to be the most mathematical component of this. 101:43:36.000 --> 101:43:51.000 I'll go a little bit quickly through it, given how much time we have. So, to talk about loss functions, which are the learning functions: the general goal of creating a loss function, whenever you have any model, 101:43:51.000 --> 101:43:58.000 is that you want your observed values to align with your predicted values. So you want to be along this perfect-prediction line. 101:43:58.000 --> 101:44:08.000 Anything above this line is over-predicted, anything below this line is under-predicted, and the goal is that you want to push these as close together as possible.
101:44:08.000 --> 101:44:13.000 So typically when we train a model, we'll use something like mean squared error or mean absolute error. 101:44:13.000 --> 101:44:26.000 And generally, this is just going to take all the points and try to squish them together. But look at a lot of the data that we work with, and the design goal that we care about for genomic selection and crop improvement. 101:44:26.000 --> 101:44:32.000 If we look at all the data that we have, we're typically trying to, this is yield, 101:44:32.000 --> 101:44:43.000 we're trying to improve yield. And most of our data does not sit anywhere near the upper bounds of the things that we really care about, the products that we want to design. 101:44:43.000 --> 101:45:04.000 So what can this lead to? It can lead to very poor tail predictions. The reason is that with mean-based losses we tend to emphasize where most of the data points exist, and if those are anti-correlated in any way with the tail events, the tails will just spread out. 101:45:04.000 --> 101:45:11.000 And that means that for what we're trying to design for, we're not going to be very good at predicting. 101:45:11.000 --> 101:45:18.000 There's a second case of this, and it's one I don't see discussed too often, but I observe it all the time, especially in agricultural data. 101:45:18.000 --> 101:45:31.000 And I'd argue this is perhaps even worse. This is compression, where we have observed data that extends over a pretty long span and our model is only able to predict over a much shorter span. 101:45:31.000 --> 101:45:37.000 So it doesn't understand the edges whatsoever, in either of those tails, whether that be yield 101:45:37.000 --> 101:45:44.000 or, say, disease resistance. So here's what we can do if we're thinking about this from a design perspective.
101:45:44.000 --> 101:45:54.000 We can actually create loss functions that prioritize, not exclusively, but prioritize learning about the tails, 101:45:54.000 --> 101:45:59.000 while at the same time giving up a little bit on the mean. So there's no free lunch, but you're allowed to 101:45:59.000 --> 101:46:05.000 pivot yourself towards what you actually want to design for. And there's some interesting work 101:46:05.000 --> 101:46:20.000 that comes out of MIT, a lab we're working with, on extreme events: how do you tease out extreme and rare events from different systems with AI? And one of the ways is to build these loss functions out. 101:46:20.000 --> 101:46:26.000 So I'm going to jump past some of this. I had a proof, but we'll pass on the proof. 101:46:26.000 --> 101:46:40.000 It's a very elegant way to build in known constraints. 101:46:40.000 --> 101:46:51.000 Okay, exactly. Yeah. If I'm choosing the loss function, I'm just trying to understand what you're telling us. 101:46:51.000 --> 101:47:18.000 One option would be just to ignore half of the data and only focus on the data that's in the tails, or weight it somehow so that it's treated as more important, more heavily weighted. The thing is, the data in the middle is very useful for understanding the extremes. 101:47:18.000 --> 101:47:25.000 But sometimes fitting that data comes at the expense of understanding those extremes. So that's why we weight it that way. 101:47:25.000 --> 101:47:36.000 And we weight it in this very particular way to make sure it's a continuous distribution. That way, in the case where everything is perfectly correlated, 101:47:36.000 --> 101:47:49.000 it still works great across the entire span. So, yeah, we still don't want to throw data away by just completely ignoring it; we'd likely be missing a lot of information.
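One hedged way to realize the continuous tail-prioritizing weighting described here (the exact formulation from the MIT collaboration isn't given in the talk, so this is an illustrative stand-in): weight each squared error by the inverse of a smooth density estimate of the observed value, so rare tail observations count more while the mid-range data still contributes.

```python
# Hypothetical sketch of a tail-prioritizing loss: weight each squared
# error by the inverse of an estimated density of the observed value, so
# rare (tail) observations count more than abundant mid-range ones. A
# Gaussian-kernel density keeps the weighting a smooth, continuous
# function of y, as described in the talk. All details are illustrative.
import numpy as np

def density_weights(y, bandwidth=1.0):
    # simple Gaussian kernel density estimate of p(y), evaluated at each y
    diffs = (y[:, None] - y[None, :]) / bandwidth
    p = np.exp(-0.5 * diffs**2).mean(axis=1) / (bandwidth * np.sqrt(2 * np.pi))
    w = 1.0 / p                 # rare values -> low density -> high weight
    return w / w.mean()         # normalize so the average weight is 1

def tail_weighted_mse(y_true, y_pred, bandwidth=1.0):
    w = density_weights(y_true, bandwidth)
    return np.mean(w * (y_true - y_pred) ** 2)

rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(0.0, 1.0, 200), [6.0]])  # one rare tail point
pred_flat = np.full_like(y, y.mean())                    # "compressed" model
# the tail point is penalized far more heavily than under plain MSE
print(tail_weighted_mse(y, pred_flat), np.mean((y - pred_flat) ** 2))
```

A model trained under this loss is pushed to resolve the tail rather than collapse everything to the mean, which is exactly the compression failure the talk describes.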
101:47:49.000 --> 101:48:02.000 So here's an example of using that for disease. Disease is a great target: any problematic disease, by definition, means that resistance is going to be rare. 101:48:02.000 --> 101:48:05.000 If it weren't problematic, resistance wouldn't be rare and we wouldn't really care so much. 101:48:05.000 --> 101:48:11.000 And in all these cases, resistance is over here in this tail, with very limited data. 101:48:11.000 --> 101:48:14.000 Mostly everything here is susceptible. And if you use a standard genomic selection model, you get this compression effect. 101:48:14.000 --> 101:48:25.000 So you see everything being compressed to the mean, the average value. 101:48:25.000 --> 101:48:33.000 And so your model is just giving you tons of average values out, left and right. But you can start pulling this apart. 101:48:33.000 --> 101:48:44.000 You can start pulling out and teasing out these resistance components of the genetics by adding in one of these loss functions. 101:48:44.000 --> 101:48:51.000 And then, it's really hard to see with the green here, but this ends up removing the compression and gives you more of a diagonal line on your predicted versus observed. 101:48:51.000 --> 101:48:56.000 And so here we don't have this compression anymore. So now we're telling our model that it needs to focus 101:48:56.000 --> 101:49:13.000 very explicitly on what makes things rare, which in this case is disease resistance.
101:49:13.000 --> 101:49:16.000 This means that we can now move much faster when we screen. 101:49:16.000 --> 101:49:25.000 One other part here on genomic selection that's very useful for teasing that out: we can start implementing some ideas from active learning. 101:49:25.000 --> 101:49:45.000 I think many of you are probably familiar with genomic selection, but typically we'll go test some set of genetics, observe the phenotypes, then train some model, and then try to use that model to choose the next set of genotypes to put out in the field. 101:49:45.000 --> 101:49:46.000 Now, this has traditionally taken this approach, where this is what we call the acquisition function. 101:49:46.000 --> 101:49:54.000 It tells you which new genetics you want to put out in the field. And traditionally we just exploited. 101:49:54.000 --> 101:50:07.000 So the model says this is the best, let's put that out there. But that doesn't allow the model to ever learn about other interesting ideas that are out there. 101:50:07.000 --> 101:50:21.000 So we need to make sure we start embedding some exploratory terms, so that, put this way, we're not biasing our model toward just one particular solution but allowing it to search the space much more dynamically. 101:50:21.000 --> 101:50:27.000 And we've done a bit of analysis on various different genomic datasets, and really all this gif is trying to show is that this is a blue blob 101:50:27.000 --> 101:50:35.000 representing all the data, and the model is picking out all these red points, the high performers. It can do it at extremely efficient levels 101:50:35.000 --> 101:50:47.000 because it has an exploration term. And this uses perhaps maybe one tenth of the data. So there are massive accelerations that we could 101:50:47.000 --> 101:50:57.000 potentially see if we do appropriate exploration and active learning techniques.
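The exploit-plus-explore acquisition idea above can be sketched with an upper-confidence-bound rule; the actual acquisition function used in the talk isn't specified, so the trade-off parameter `kappa` and the numbers below are purely illustrative.

```python
# Hypothetical sketch of an explore/exploit acquisition function for
# genomic selection: score = predicted mean + kappa * predictive
# uncertainty (an upper-confidence-bound rule). With kappa = 0 the rule
# purely exploits; larger kappa pushes the next field trial toward
# candidates the model is still unsure about.
import numpy as np

def select_next(pred_mean, pred_std, n_select, kappa=1.0):
    """Pick the n_select candidate indices with the best UCB score."""
    score = pred_mean + kappa * pred_std
    return np.argsort(score)[::-1][:n_select]   # highest scores first

mu = np.array([10.0, 9.0, 7.0, 6.0])      # predicted yield per candidate
sd = np.array([0.1, 0.2, 4.0, 0.3])       # model uncertainty per candidate
print(select_next(mu, sd, 2, kappa=0.0))  # pure exploitation: [0 1]
print(select_next(mu, sd, 2, kappa=1.0))  # exploration pulls in candidate 2
```

With exploration on, the uncertain candidate gets a field slot even though its predicted mean is lower, which is how the model keeps learning about the rest of the genetic space.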
And then the final thing, with large language models; I have to mention it because everyone's doing it. 101:50:57.000 --> 101:51:09.000 So one of the things that we're interested in there is that you have a massive genome and you need to find what the interesting regions are for us to go and edit. 101:51:09.000 --> 101:51:22.000 And so we've been using some of the large language models to find segments that have unmethylated regions, conserved noncoding sequences, and transcription factor binding sites. 101:51:22.000 --> 101:51:32.000 And so we use these models to try to figure out where that is, say that's a good or high-value editing target, and then we go try to validate that. 101:51:32.000 --> 101:51:34.000 Now, we have a nice advantage in that we have a ton of data on the very specific germplasm that we want to make specific edits in. 101:51:34.000 --> 101:51:44.000 So that really helps with building some of these models. 101:51:44.000 --> 101:52:10.000 And just to start wrapping up here: I talked all about genetics, but there's so much opportunity in the soil, weather, and management components here as well, along with imaging, to either measure some of these things like weather or management practices, or to give you much better, higher-resolution phenotyping to go with the genotypes, data that we have yet to even start modeling. 101:52:10.000 --> 101:52:18.000 And those will all fit very nicely into the AI architectures. I like showing this slide that we have. 101:52:18.000 --> 101:52:23.000 I didn't build this slide, but somebody did, and it's kind of nice to show the progress of what we've done. 101:52:23.000 --> 101:52:33.000 But I think even though we've come a very long way, the next steps aren't just going to be about efficiency, but about other things. 101:52:33.000 --> 101:52:39.000 How can we make sure that we meet the livelihoods of farmers and also the regulations and societal pressures of how food is produced, and other sustainability metrics?
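The editing-target search mentioned a moment ago (unmethylated, conserved noncoding, transcription-factor-binding regions) could be sketched as scoring windows across the genome. Here three boolean tracks stand in for whatever signals a DNA language model would actually produce, so every name and number is hypothetical.

```python
# Hypothetical sketch of ranking candidate edit sites: slide a window
# across annotated positions and prefer regions that are unmethylated,
# conserved noncoding, and near transcription-factor binding sites. In
# the talk, large language models supply these signals; here the three
# boolean tracks are made-up stand-ins for those predictions.
def rank_edit_windows(tracks, window=5):
    # tracks: dict of signal name -> list of 0/1 per basepair position
    n = len(next(iter(tracks.values())))
    scores = []
    for start in range(0, n - window + 1):
        # score a window by how many favorable signals it accumulates
        s = sum(sum(tr[start:start + window]) for tr in tracks.values())
        scores.append((s, start))
    scores.sort(reverse=True)
    return scores[:3]           # top candidate windows as (score, start)

tracks = {
    "unmethylated":        [0, 0, 1, 1, 1, 1, 1, 0, 0, 0],
    "conserved_noncoding": [0, 0, 0, 1, 1, 1, 1, 1, 0, 0],
    "tf_binding":          [0, 0, 0, 0, 1, 1, 1, 0, 0, 0],
}
best_score, best_start = rank_edit_windows(tracks)[0]
print(best_score, best_start)   # the window where all three signals overlap
```

The highest-scoring window is the "high-value editing target" the talk describes handing off for validation; a real system would score millions of windows from model outputs rather than ten hand-written positions.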
101:52:39.000 --> 101:52:53.000 And I think being able to synthesize all these different data streams is going to be very critical, and going beyond the traditional. Okay. 101:52:53.000 --> 101:53:06.000 So, last couple of comments here about where I think maybe the educational and training side must shift to recognize the opportunity of these data-driven models. 101:53:06.000 --> 101:53:14.000 I would say, first, that AI in ag requires a bit of a perspective change. And this is 101:53:14.000 --> 101:53:30.000 around interpretability and explainability, which are important things that we should continue to ask questions about, but they should not undermine the capability of these models. Sometimes we see that because something can't be interpreted, we don't move forward with it. 101:53:30.000 --> 101:53:37.000 But for prediction and design we don't necessarily need interpretability and explainability, at least not today. We'll give it some time. 101:53:37.000 --> 101:53:50.000 The next part is formalizing quantitative design goals, really making sure that our design goals are aligning perfectly with what we're doing with the tools that we have. And that's more of an engineering perspective, of trying to teach these creative solutions 101:53:50.000 --> 101:54:06.000 with clear assumptions and hypotheses and boundaries that we want to operate in. And the third one is that we still need to make sure we identify problems from deep biological domain knowledge. 101:54:06.000 --> 101:54:26.000 I think one of the most interesting things over the last two years being at Bayer has been the very critical conversations that I've had with a lot of career biologists; they've been amazing in terms of figuring out what other problems we can solve. 101:54:26.000 --> 101:54:32.000 So this deep biological domain knowledge can't go away in this discussion of these three items 101:54:32.000 --> 101:54:41.000 going forward.
Maybe if I leave one last thing: this is kind of how I see it. This is going to be a work of art of some sort, and I think the engineering mindset really comes in building the frame, 101:54:41.000 --> 101:54:59.000 setting the boundary conditions and the design goal. AI is really the tool, and biology provides all the different colors and interesting components that we can use to start painting this picture. 101:54:59.000 --> 101:55:11.000 So with that, I think that was the end of what we had. 101:55:11.000 --> 101:55:12.000 Yeah. Thank you, Scott. Thank you, Ethan. 101:55:12.000 --> 101:55:15.000 So we now have a little bit of time for some questions. 101:55:15.000 --> 101:55:24.000 Do we have any questions in the room? 101:55:24.000 --> 101:55:34.000 Thanks. Do we need a microphone down here? So people can hear. 101:55:34.000 --> 101:55:45.000 Here it comes. Yeah, I hope that my phone is working. 101:55:45.000 --> 101:55:54.000 Yeah. Hi, and thanks. I'm Chris Erques. I'm a plant physiologist. 101:55:54.000 --> 101:56:04.000 So my question to you is: I imagine that in your data you're looking at yield and disease resistance, but I assume you're also integrating data from the environment. 101:56:04.000 --> 101:56:14.000 And I imagine that you guys have amazing sensors and measurements of all the differences in environmental conditions during the day, during the seasons. 101:56:14.000 --> 101:56:17.000 So how hard is it to integrate all this into yield and disease resistance, or just yield? And is it better, with the precision that I assume you guys have, 101:56:17.000 --> 101:56:30.000 either in greenhouses or fields, to look at things very specifically, or is it better to look at 101:56:30.000 --> 101:56:35.000 all the changes, all the complex changes in the environment?
In a way, is it better to look at all the noise at once, or is it better to be very specific? 101:56:35.000 --> 101:56:48.000 So it will depend on your design goal. In the case where we want a germplasm that operates really well in a very select region, then we can be very specific for that. 101:56:48.000 --> 101:57:00.000 If we want this to go on many broad acres, then we're no longer going for a very specific performance, but now a distribution of performances. 101:57:00.000 --> 101:57:06.000 So we want to make sure that that germplasm is going to operate in a number of different environments. 101:57:06.000 --> 101:57:12.000 And so that changes your design goal. And you're still being specific, it's just a different set: now you're specific over a wide range of environments, 101:57:12.000 --> 101:57:27.000 whereas before you were specific over a smaller range. So, yeah. Whenever you're training these models, they have a finite capacity. 101:57:27.000 --> 101:57:33.000 Given your architecture and your data, there's a finite amount of learning that can be achieved. 101:57:33.000 --> 101:57:43.000 And you have to choose exactly where you want that learning to explicitly go. And so I think it brings a lot more 101:57:43.000 --> 101:57:54.000 to the table if you define that very clearly. But on the concept of just more data coming through on the environment side, there's a little bit of a caveat to that one. 101:57:54.000 --> 101:58:10.000 For example, I did a lot more fluid mechanics in my PhD and postdoc, and those are really complex systems. If you look at the last 30 years of weather, 30 years of weather is nowhere near enough to really understand how weather is operating. 101:58:10.000 --> 101:58:22.000 So we need a lot more data on the environmental side to be particularly accurate or high fidelity with what's going on.
101:58:22.000 --> 101:58:37.000 So I think it's great that we're continuously getting more information about the environment, but for the total weather scenarios, we probably still have to box those in just a little bit more. 101:58:37.000 --> 101:58:42.000 Okay. 101:58:42.000 --> 101:58:53.000 So just wondering: are you introducing anything new into the equation along the way, you know, with synthetic biology, synthetic genes? 101:58:53.000 --> 101:59:07.000 I ask because what occurs to me is, you're bringing all these, you know, million-dollar technologies, all this information, but preceding you has been millions of years of evolution and 4,000 years of farming. 101:59:07.000 --> 101:59:16.000 And I wonder, if you're just using the same set of genes, how much design space there is to actually move into 101:59:16.000 --> 101:59:31.000 with all these, you know, high-tech approaches. And are you generating new, I suppose not new biology, but new types of different species? 101:59:31.000 --> 101:59:52.000 Or are you sacrificing, in terms of, for example, taste, because you don't have a new design space to move into? Yeah, so, defining that problem, will we sacrifice? There's a potential; you might have a better answer for this one. 101:59:52.000 --> 101:59:53.000 So yeah, if we only care about yield and that's the only thing that we're measuring and that's what the model is going after, 101:59:53.000 --> 102:00:05.000 then it's not guaranteed that everything else goes away, but it is definitely a risk that everything else goes away. 102:00:05.000 --> 102:00:19.000 Now, we typically have a lot more than just yield that we're designing for. There are a number of other metrics that exist, and all those go into the calculation of a multi-objective design principle. 102:00:19.000 --> 102:00:27.000 Maybe you were getting at a different point there, about: have we pretty much seen most of the genomic variation?
102:00:27.000 --> 102:00:48.000 Have we squeezed everything out of there? I don't think it's true, but we could say, from a traditional breeding standpoint, let's assume that is true. I think editing just by itself, and what we can do there, is going to completely change that and introduce a whole new set of variation that is going to continue 102:00:48.000 --> 102:00:55.000 to move the boundaries. So even if that were the case, I think the new technology is going to do that. 102:00:55.000 --> 102:01:01.000 We tend to think that genomes are very static. Yeah, I mean, they're not. They continue to evolve, even within breeding programs. 102:01:01.000 --> 102:01:10.000 So you get, you know, new recombinants, you get gene duplications; the genome is dynamic, transposons are moving, changing how genes work. 102:01:10.000 --> 102:01:16.000 And that continues to drive the variation, which these models are going to have to continue capturing, because it continues to evolve over time. 102:01:16.000 --> 102:01:22.000 I'm reminded of a paper back in the late nineties from my postdoc advisor, who was chief science officer at USDA for a while. 102:01:22.000 --> 102:01:29.000 There's a barley breeding program in Minnesota. They've had the same genetics for, I think, sixty-some years, and they continue to make yield improvements. 102:01:29.000 --> 102:01:35.000 And the question was, where's it coming from? From your model it should stop, but it keeps moving. 102:01:35.000 --> 102:01:42.000 So there are all these other processes. It's a dynamic genome. There are things happening. 102:01:42.000 --> 102:01:47.000 Hmm. 102:01:47.000 --> 102:01:55.000 So first I would just have to say, I loved the talks all the way through, and I'm so happy I get to have dinner with you so I can pick your brain.
102:01:55.000 --> 102:02:08.000 Narrowing it to one question is difficult, but there was one thing in one of your slides I thought was really interesting as it combines with what you show on this slide, which is that importance around 102:02:08.000 --> 102:02:36.000 computational thinking, engineering thinking, and biology thinking. There's one more piece that I think you're in a really cool spot for, which is genetics thinking, and specifically with maize researchers, because you work with folks that have to plan experiments years in advance and have this really limited number of iteration cycles, which doesn't constrain the data or computational thinking or the engineering thinking as much. 102:02:36.000 --> 102:02:45.000 But you had this number up there: 26. We have 26 more years before we hit 2050. 102:02:45.000 --> 102:03:00.000 And all of these dire warnings that come out there. And I was just sort of wondering how you think about what you can do within that time frame as you start today, and specifically around how you set yourself up for the most success in the future. 102:03:00.000 --> 102:03:06.000 And whether you have any predictions about, you know, where you'll be in 26 years, where we'll be in 26 years, as we think forward on both the technology and the programs that are in play right now. 102:03:06.000 --> 102:03:18.000 Just what are your thoughts and how do you think about that? Yeah, I'll take the 26-year piece. 102:03:18.000 --> 102:03:23.000 This was one of my concerns coming to Bayer originally: oh, we have to deal with this. 102:03:23.000 --> 102:03:31.000 I was used to experiments where we had simulations that we would have the AI work with, 102:03:31.000 --> 102:03:38.000 and so you'd get results back within a couple of minutes. And now, now you go, okay, well, we're going to,
102:03:38.000 --> 102:03:43.000 if we had two inbred lines ready to go and put that in the field as a hybrid, you have to wait a whole year to get that data back. 102:03:43.000 --> 102:03:44.000 Now, if you want to design a new inbred, you have to cross it and go through a number of different processes. 102:03:44.000 --> 102:04:01.000 The shortest time frame, I think, is three years to get a hybrid data point. So if we were to start something today, we're not going to get that data point for three years. 102:04:01.000 --> 102:04:09.000 So I think that's a really interesting question, and I think it really underlines the aspect of: 102:04:09.000 --> 102:04:24.000 we have to look really far downstream and ask the question, are we exploring enough? Because, and this is something that maybe the public sector will be better at, we get to points where we have to be able to provide a certain set of products that are going to be high performing, and sometimes we just have to exploit. 102:04:24.000 --> 102:04:38.000 We have to take the things that we have right now, and sometimes we can't look downstream. So I think it's a big question of risk 102:04:38.000 --> 102:04:41.000 that I hope we can solve, but yeah, there are a lot of things. Asking someone to make a decision three, 102:04:41.000 --> 102:04:54.000 four, five years downstream is tough. I don't know if I answered that in any useful way. So, I generally think of, you know, AI modeling 102:04:54.000 --> 102:05:00.000 as mostly being able to make predictions where you don't extrapolate. 102:05:00.000 --> 102:05:06.000 You can look within your dataset, but you're not really able to predict far outside of your dataset. 102:05:06.000 --> 102:05:21.000 Is that true, given the constraints and the loss functions
that you're incorporating in the models? So if I'm really looking for something new that's going to enable me to, you know, do agriculture in 2050, 102:05:21.000 --> 102:05:29.000 will I be able to find that, or do I need to really, how do I push those models so I get that extrapolation? 102:05:29.000 --> 102:05:35.000 Yeah. So when you do this active learning approach, the goal is to be going to the edge. 102:05:35.000 --> 102:05:41.000 You're trying to find the edges continuously. And so you are trying to extrapolate. 102:05:41.000 --> 102:05:50.000 And one of the things that has definitely been hard for me to discuss multiple times is that your model, in extrapolation, is going to be wrong so many times. 102:05:50.000 --> 102:06:10.000 If you're working on extreme events, they're so rare that 99.5% of the time you're going to be wrong, and a lot of people don't like that answer, but it's a reality: you're going to be wrong most of the time. But if you run the statistics and you run a number of different simulations, you can see that being wrong is worth 102:06:10.000 --> 102:06:20.000 it, because you're gaining understanding and data assets that are much more interesting, because they're diverse and they are answering a question 102:06:20.000 --> 102:06:36.000 in the space of genetics, essentially. So I think what pairs well with that question is that most of the time in extrapolation the models are wrong, and that's a really hard discussion to have, but it's a useful wrong, versus 102:06:36.000 --> 102:06:56.000 being right and not moving anything. So it takes three years to prove that you were wrong. Okay. 102:06:56.000 --> 102:07:10.000 So, fantastic talk. Thank you so much. So, you know, you're very fortunate in having the luxury of all of these data and years and years, decades of research on maize.
102:07:10.000 --> 102:07:18.000 So given your experience of working with maize, and recognizing that as the climate continues to change, 102:07:18.000 --> 102:07:25.000 we're going to have to bring in additional crops, you know, be they orphan crops, be they new crops: 102:07:25.000 --> 102:07:40.000 what recommendations would you give to researchers who are working on some of these orphan or new crops that we're now developing, to be able to leverage AI to the best extent possible and improve them as quickly as possible? 102:07:40.000 --> 102:07:59.000 Oh, I'd say maybe the first point, on this kind of active learning question, is being willing to spread out your data and allow the model to paint the lines in between it. 102:07:59.000 --> 102:08:05.000 That's going to mean that the limited resources you have to work with are going to be spent as efficiently as possible toward building something that can predict well. 102:08:05.000 --> 102:08:26.000 So I think that's something, if you're starting from scratch and you have that opportunity: have a mindset that data is an asset, and take a risk-analysis approach to really get the best data. If you do that, for most of the datasets that exist out there, if you do a kind of historical analysis on them, you only need about 5 to 10% of the data that's out 102:08:26.000 --> 102:08:34.000 there to get the accuracy, and you could throw away the rest. So there are a ton of experiments, a ton of wasted data. 102:08:34.000 --> 102:08:41.000 So taking an approach like that, I think, is very critical from the AI side. Okay. To some extent this is how it plays out. 102:08:41.000 --> 102:08:51.000 So obviously corn is, for Bayer, the most profitable crop, and most resources go into it. So a lot of things get developed in corn, maize, and then propagated
102:08:51.000 --> 102:09:00.000 into soy, and then rice and wheat, and even into the vegetables; we have a veg division that breeds something like 70 different vegetables. 102:09:00.000 --> 102:09:08.000 So over time these things, you know, we first introduced all the genotyping and seed chipping, which started in corn and then made its way down to other crops. 102:09:08.000 --> 102:09:15.000 Same with all the genotyping and genomic resources. And if you think about orphan crops, 102:09:15.000 --> 102:09:23.000 cover crops and things like that, which we're investing in and other companies are investing in, those technologies that he's talking about are also making their way into those 102:09:23.000 --> 102:09:26.000 orphan crops as well. So yeah, it's happening slowly. All right, well, thank you. 102:09:26.000 --> 102:09:41.000 It was a wonderful presentation, and I'm sure there'll be additional opportunities to meet with NSF staff for the rest of the day. 102:09:41.000 --> 102:09:42.000 Thank you. Okay. Yep. 102:09:42.000 --> 102:09:42.000 I can