WEBVTT 1 00:00:11.920 --> 00:00:14.390 Okay, getting ready to start the webinar now. 2 00:01:07.780 --> 00:01:26.769 Michael Littman: All right, it looks like maybe the numbers are leveling off a little bit, so I think we'll get started. Welcome, everyone. Thanks a lot for coming. This is the Distinguished Lecture series for the NSF Directorate for Computer and Information Science and Engineering, 3 00:01:26.830 --> 00:01:31.839 Michael Littman: led by Margaret Martonosi. My name is Michael Littman. I'm one of the 4 00:01:32.080 --> 00:01:50.389 Michael Littman: four division directors in the Directorate. My own area is artificial intelligence and machine learning, and my division is called IIS, Information and Intelligent Systems. And if you look on the web, which I just did, to see what it is that my division does, 5 00:01:50.400 --> 00:02:07.909 Michael Littman: it says Information and Intelligent Systems studies the interrelated roles of people, computers, and information to increase the ability to understand data, as well as mimic the hallmarks of intelligence in computational systems. And so I'm really excited about today's speaker, Fernanda Viégas 6 00:02:07.920 --> 00:02:37.890 Michael Littman: from Harvard University. Well, you can see from the title: harnessing the power of data visualization, from insight and storytelling to AI explanation and data art. So she is working right at that boundary between people and data, information and computation, and when we were soliciting within the division for possible speakers and her name came up, the response was overwhelming. Her work is just beloved here in our division, and I'm very, very excited to hear 7 00:02:37.900 --> 00:02:49.400 Michael Littman: what she's got to say on these topics. Oh, and I also forgot to mention this sooner; I probably should have mentioned this differently and sooner: we'd like, if you could, for you to say a little bit about 8 00:02:49.410 --> 00:03:08.639 Michael Littman: your own journey, how you came to be a researcher working on these things. This is something that we've done with all our distinguished lecture guests, and people really resonate with it. It's a great way of helping them see themselves in this kind of work that goes on. Scientists can be, of course, very, you know, 9 00:03:08.650 --> 00:03:25.579 Michael Littman: precise and official, but we're really all people, and I think, Fernanda, there's no doubt that you would get into that, especially using words like data art. You're very much in touch with your human side. But if you can tell us a little bit about yourself, that would be lovely. So, ladies and gentlemen, Fernanda Viégas. 10 00:03:25.590 --> 00:03:53.809 Fernanda Viégas: Thank you so much. First of all, thanks for the invitation; thanks for having me here. I was very excited and looking forward to this, and Michael, I'm so glad that you asked about my career, because I even have a slide about that. I definitely do not have a traditional path to where I am today. In fact, I would never have guessed that I would do what I do today. I am from Brazil. 11 00:03:53.820 --> 00:04:01.759 Fernanda Viégas: I grew up in Rio, and when it came time to decide what major, what university, 12 00:04:02.060 --> 00:04:22.790 Fernanda Viégas: I had a very hard time.
I could not decide what I wanted to do. And in Brazil, if you go to university for, say, computer science, and then you change your mind and decide you want to do something else, you have to leave the university and try to get in again for another major. 13 00:04:22.800 --> 00:04:30.049 Fernanda Viégas: I did this three times, and three times I failed to stick to whatever major I was in. 14 00:04:30.060 --> 00:04:57.159 Fernanda Viégas: I was like, I don't think university is for me, and then I got a scholarship to come to the US. The main reason I came to the US is that I could be an undecided major here; that was the whole point of coming to the US. I thought that was an amazing idea. I ended up going to the University of Kansas. My background is in graphic design and art history, so again, nothing to do with technology. 15 00:04:57.170 --> 00:05:03.410 Fernanda Viégas: But then, when I was about to graduate from graphic design and art history, 16 00:05:03.420 --> 00:05:18.599 Fernanda Viégas: I started looking around, and I became familiar with the Media Lab at MIT, and I noticed that they were welcoming people from different backgrounds, graphic design included. So that's where I ended up doing my graduate studies, 17 00:05:18.610 --> 00:05:24.060 and that's where I learned about data visualization. And to my mind, data visualization is a great 18 00:05:24.140 --> 00:05:53.800 Fernanda Viégas: connection between graphic design and computation. So I had to learn how to program. It was really hard, but I loved the field. And then from two thousand and three on, I started collaborating with Martin Wattenberg, who is a mathematician by training, and everything you're going to see today that I'm going to talk about is work that I've done with Martin. We've been working together for almost twenty years now. So 19 00:05:53.810 --> 00:06:05.590 Fernanda Viégas: it's been great. All right, so now let's jump in. I want to give a lot of demos, and then I want to give a broad overview of the field of data visualization. 20 00:06:05.600 --> 00:06:19.200 Fernanda Viégas: So let's start with this little project Martin and I did many years ago. It's called Web Seer. This was before we joined Google. We became very interested in the fact that when you start typing something on Google, 21 00:06:19.330 --> 00:06:36.189 Fernanda Viégas: Google gives you a bunch of suggestions to finish whatever text string you've started. So here I typed "will Brazil," and then Google gave me a bunch of suggestions. And Google does that for utilitarian reasons. It says, oh, if you're typing "will Brazil," 22 00:06:36.200 --> 00:06:50.360 Fernanda Viégas: chances are you're going to ask "will Brazil win the World Cup" or something like that, right? It's trying to save you time; it's showing you the most popular endings to your string. 23 00:06:50.370 --> 00:07:10.400 Fernanda Viégas: But Martin and I took a look at this, and we thought, oh, wait a second, this is not only utilitarian, it's also a little peek into the public psyche, right? It's what people come to Google for. So can we visualize this? So this is what we did. We created a little visualization, and here I'm going to 24 00:07:10.410 --> 00:07:16.929 Fernanda Viégas: show you how it works. I can type "will Brazil."
Oh, 25 00:07:17.570 --> 00:07:29.540 Fernanda Viégas: if I spell it correctly, "will Brazil," and then I am literally just visualizing the same stream of completions that Google was giving me. But now, because I am 26 00:07:29.590 --> 00:07:37.820 Fernanda Viégas: visualizing this, I can start playing little games, so I can compare the completions for "will Brazil" to 27 00:07:37.830 --> 00:07:49.619 Fernanda Viégas: "will the US," and boom, I have the places where they come together. So people are very interested in whether Brazil or the US will win the World Cup, 28 00:07:49.630 --> 00:08:18.829 Fernanda Viégas: in both cases, which is very hopeful. But then another completion that is the same between the two of them is kind of the opposite: will there be a civil war in either one of these countries? And you can also see how the different completions go for each one of the countries. So you start to see it's kind of like a Venn diagram, if you will, of these things. So this is very World Cup related right now, because of the moment we're in. But 29 00:08:18.840 --> 00:08:26.730 Fernanda Viégas: I can do things like "why doesn't he" versus "why doesn't she," 30 00:08:26.740 --> 00:08:45.750 Fernanda Viégas: and you can see what people are curious about: will he text me, will she love me, and so forth. I can also do things like "is my son" versus "is my daughter," 31 00:08:45.920 --> 00:09:13.940 Fernanda Viégas: and you can see what parents are coming to Google to ask, and this is interesting, but to me it's also kind of a gut punch. People are feeling quite vulnerable; they are asking vulnerable questions to this search engine, right? And I think that's interesting. And one of the other things that I think data visualization does, which you can start to see, hopefully, here, 32 00:09:14.300 --> 00:09:15.700 Fernanda Viégas: is that 33 00:09:15.900 --> 00:09:20.980 Fernanda Viégas: to me a lot of times it makes data seem less 34 00:09:20.990 --> 00:09:39.270 Fernanda Viégas: cold and statistical and official, and more humane, and kind of like, oh wow, really, people are coming here with these questions, and you can start to put yourself in their shoes. So, "is my husband" versus "is my wife," 35 00:09:39.370 --> 00:09:48.289 Fernanda Viégas: and you can see people wonder if their partners are cheating on them, or attracted to them, and so forth. 36 00:09:48.300 --> 00:09:59.140 Fernanda Viégas: Don't worry, not everything is negative or a downer. You can also ask "is coffee" versus "is chocolate," 37 00:09:59.300 --> 00:10:10.679 Fernanda Viégas: two things I really love. So, is coffee good for you? Is chocolate good for you? And what is up with dogs, right? It's interesting. Actually, let's go there. I 38 00:10:10.690 --> 00:10:15.610 Fernanda Viégas: can say "cats are" versus 39 00:10:15.620 --> 00:10:31.789 Fernanda Viégas: "dogs are," and the good news is that everybody agrees they're all the best, which is very good and positive. So I will leave it on this note here. But this is a live public demo; anyone can play with it. 40 00:10:31.800 --> 00:10:33.579 Fernanda Viégas: Okay. 41 00:10:33.880 --> 00:10:48.590 Fernanda Viégas: One of the reasons why I wanted to start with this is the fact that sometimes there is this notion that data visualization is mainly for visualizing numbers, and a lot of it is, and it's very useful that way.
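The comparison behind Web Seer boils down to a little set arithmetic over two suggestion lists. Below is a minimal Python sketch of that idea; it assumes you already have the autocomplete strings from somewhere, and `get_suggestions` is a hypothetical placeholder, not a real API.

```python
# Minimal sketch of the Web Seer comparison idea: take the autocomplete
# suggestions for two prefixes and split their endings into shared vs. unique,
# which is what the Venn-style side-by-side view is drawing.
# get_suggestions() is a hypothetical stand-in for however you obtain them.

def get_suggestions(prefix: str) -> list[str]:
    """Hypothetical helper: return full autocomplete strings for a prefix."""
    raise NotImplementedError

def compare_completions(prefix_a: str, prefix_b: str) -> dict[str, set[str]]:
    # Strip each prefix so "will brazil win the world cup" and
    # "will the us win the world cup" both reduce to "win the world cup".
    endings_a = {s[len(prefix_a):].strip() for s in get_suggestions(prefix_a)}
    endings_b = {s[len(prefix_b):].strip() for s in get_suggestions(prefix_b)}
    return {
        "shared": endings_a & endings_b,   # completions both prefixes have in common
        "only_a": endings_a - endings_b,   # endings unique to the first prefix
        "only_b": endings_b - endings_a,   # endings unique to the second prefix
    }

# Example: compare_completions("will brazil", "will the us")
```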
42 00:10:48.600 --> 00:11:04.290 Fernanda Viégas: But I think one of the things that we now have the ability to do is to visualize a lot of different kinds of data, text being one of them, and so you're going to see some of that today in this talk. So part of the point I want to make here is that visualization 43 00:11:04.300 --> 00:11:05.890 Fernanda Viégas: is not just for numbers, 44 00:11:05.900 --> 00:11:12.749 Fernanda Viégas: and it's not just for individuals. Usually we think about one person in front of a computer; 45 00:11:12.760 --> 00:11:32.499 Fernanda Viégas: it's not like that, actually. Visualization tends to be a very social thing, and it's not just for experts. For the longest time, visualizations were built by experts for experts, and I think today we're moving beyond this, which is very exciting, because it helps with things like 46 00:11:33.160 --> 00:11:41.520 Fernanda Viégas: number literacy and statistics literacy for society at large. So we'll talk a little bit about that. 47 00:11:42.330 --> 00:12:11.780 Fernanda Viégas: I also want to take a step back and say that a lot of what you're going to see here today, and a lot of what we do in data visualization, we've been doing for a while, truth be told. We've been mapping things for a very, very long time. These are some of the earliest maps we have: the one on the left here is a carved Babylonian map; the one on the right, very interesting, is a Ptolemaic 48 00:12:11.790 --> 00:12:35.180 Fernanda Viégas: world map from the second century. And the really cool thing about this map is that it's the first time we see the use of longitude and latitude. So they came up with this concept, they came up with this standard for mapping things around the globe, and we use it to this day. And I think these are really interesting firsts. 49 00:12:35.190 --> 00:12:37.350 Fernanda Viégas: Then we move away from 50 00:12:37.490 --> 00:12:43.789 Fernanda Viégas: things that have shapes. So, for instance, if I go back to the maps, these things have shapes in the physical world 51 00:12:43.800 --> 00:12:48.940 Fernanda Viégas: around us, right? When we move from those things to 52 00:12:48.950 --> 00:13:05.259 Fernanda Viégas: abstract information that really doesn't have a shape, how can we visualize those things that are invisible? And this is where people like William Playfair come in. So this is literally one of the first: he invented the line chart, 53 00:13:05.270 --> 00:13:25.449 Fernanda Viégas: and he also invented the pie chart and the bar chart. I mean, how amazing is that, that the same person invented all of these techniques to visualize data? We have someone like Florence Nightingale, who also invented a chart type, this kind of, we call it a rose diagram. And 54 00:13:25.460 --> 00:13:39.560 Fernanda Viégas: to me the really interesting thing here is that she was both a nurse and a statistician, and she used charts for activism. The whole point of creating this chart you see on the left, which, by the way, 55 00:13:39.570 --> 00:13:57.160 Fernanda Viégas: shows the deaths of soldiers in the Crimean War, was to make the point that by far most soldiers were dying of wounds in the hospital, not on the battlefield. 56 00:13:57.240 --> 00:14:14.219 Fernanda Viégas: And so all of the blue
you see, those are deaths at the hospital, and the red are deaths on the battlefield, and you can see just how many more there are. And so the point she was trying to make is, we need to change the way our hospitals work. And so this chart 57 00:14:14.230 --> 00:14:39.859 Fernanda Viégas: convinced the British Parliament to enact sanitation reforms in hospitals, and thanks to those, we do things in better ways to this day, because of the kind of data and visualization she created, which is wonderful. W. E. B. Du Bois was really interested in visualizing the 58 00:14:39.870 --> 00:14:56.899 Fernanda Viégas: condition of Black Americans after slavery. He was saying, look, we're not slaves anymore, but this is not working, people. And so he and a team of sociologists created a whole series of charts that are quite 59 00:14:56.910 --> 00:15:13.430 Fernanda Viégas: aesthetically striking about the condition in society of Black Americans, so very influential. All of this is to say, this is where a lot of the work that you're going to see today comes from. It has a lot of historical context. 60 00:15:13.440 --> 00:15:32.389 Fernanda Viégas: Okay, so not just for numbers. Let's go back to that idea that I talked about in the beginning. One of the things that you may think about immediately when you think about visualizing text is tag clouds or word clouds, and those are fine, but they take away a lot of the context 61 00:15:32.400 --> 00:15:51.829 Fernanda Viégas: of what is being said. So one of the things Martin and I were interested in is, how can we visualize trends in text but still keep context available? So we created the word tree. Imagine you have something like Romeo and Juliet, the entire play, and you do a search for a string like "if love." 62 00:15:51.910 --> 00:16:11.880 Fernanda Viégas: What I want to do after this is find all the completions, so it's kind of like the Google thing, all the completions that come after "if love." So "if love be rough with you," or "if love be blind," all of these things. Now I want to visualize those completions. So I'm going to do kind of a tree; it's kind of like a suffix tree, 63 00:16:11.890 --> 00:16:34.099 Fernanda Viégas: and I will show you the big trends, but also the context. I will show you all the sentences at once. And so, for instance, "I have a dream," the speech by Martin Luther King: we can visualize it as a tree to understand all the different places in the speech where that phrase comes up. 64 00:16:34.110 --> 00:16:44.780 Fernanda Viégas: This is a visualization of the Bible: what are all of the places in the Bible where "love" shows up? 65 00:16:44.790 --> 00:17:07.059 Fernanda Viégas: And I will show you a very quick little demo. This is a word tree iteration done by Jason Davies, based on the algorithm that Martin and I created, and it's available online; anyone can play with this. We're going to look very quickly at Steve Jobs' commencement speech at Stanford. 66 00:17:07.069 --> 00:17:15.830 Fernanda Viégas: So here, what I'm doing, and you can see that this is interactive, is I have looked for the string "life," 67 00:17:15.890 --> 00:17:21.769 Fernanda Viégas: but one of the things I've done is to reverse the tree. I want everything that ends in "life," 68 00:17:21.960 --> 00:17:36.810 Fernanda Viégas: and I can now click on, for instance, "my life."
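Before the demo continues, here is a minimal sketch of the word-tree structure just described: find every occurrence of the root phrase, collect the word sequences that follow it, and merge them into a tree so shared beginnings become shared branches. This is only an illustration of the data structure, not the published algorithm; a real word tree also keeps counts and the full sentences for context.

```python
import re

def build_word_tree(text: str, root: str) -> dict:
    """Nested dict that branches on each word following `root`.

    A sketch of the word-tree idea: shared continuations share a path,
    so frequent phrasings show up as thick branches.
    """
    root_words = root.lower().split()
    tree: dict = {}
    for sentence in re.split(r"[.!?]", text.lower()):
        words = sentence.split()
        for i in range(len(words) - len(root_words) + 1):
            if words[i:i + len(root_words)] == root_words:
                node = tree
                for w in words[i + len(root_words):]:
                    node = node.setdefault(w, {})
    return tree

# Example: build_word_tree(romeo_and_juliet_text, "if love")
# might yield {"be": {"rough": {...}, "blind": {...}}, ...}
```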
And now I'm zooming in on that part, that branch of the tree, and I can keep doing this. I am going to reverse this and search for just "my," 69 00:17:37.010 --> 00:18:06.639 Fernanda Viégas: and you can see all the places in his commencement speech where the word "my" comes up. He's talking about his life, he's talking about his parents, his biological mother; he's even talking about his cancer. And so one of the things that to me is really powerful about this is that it gives you a very fast way of navigating a large body of text without, again, losing the context of what is being said. 70 00:18:07.010 --> 00:18:15.620 Fernanda Viégas: So again, ideas around visualizing hard data sets like text. 71 00:18:15.630 --> 00:18:18.030 Fernanda Viégas: Another way of visualizing 72 00:18:18.050 --> 00:18:20.730 Fernanda Viégas: activity around text is 73 00:18:20.800 --> 00:18:35.970 Fernanda Viégas: a project Martin and I did back in two thousand and three. In fact, this is the very first project I ever worked on with Martin, and this is what it is. Back then, nobody understood wikis, 74 00:18:35.980 --> 00:18:43.400 Fernanda Viégas: and everybody was skeptical of Wikipedia. They were like, how can this work? This doesn't make any sense. Anyone can edit this thing. 75 00:18:43.410 --> 00:18:50.250 Fernanda Viégas: And so we got very interested in understanding the dynamics: how are people collaborating around Wikipedia articles? 76 00:18:50.320 --> 00:18:59.160 Fernanda Viégas: This is a vintage screenshot of an article from two thousand and three that happens to be on chocolate, again one of the things I really love, 77 00:18:59.300 --> 00:19:02.689 Fernanda Viégas: and we learned that behind each article 78 00:19:02.700 --> 00:19:22.579 Fernanda Viégas: you have an activity log, a history log of all the times that article had been edited. It has timestamps, it has who did what, and so this is the data we decided to visualize. We created a technique to visualize this. 79 00:19:22.590 --> 00:19:33.999 Fernanda Viégas: So imagine you have three people who are going to work together on an article: Mary, Suzanne, and Martin. I'm going to give a different color to each one of those people, 80 00:19:34.010 --> 00:19:45.439 Fernanda Viégas: and I'm going to color each version by whoever was active in that version. So version one was all written by Mary. 81 00:19:45.500 --> 00:19:50.910 Fernanda Viégas: The length of the line for version one is the length of the article, 82 00:19:50.920 --> 00:19:53.350 Fernanda Viégas: and then you can see that in version two 83 00:19:53.360 --> 00:20:11.430 Fernanda Viégas: that text stays, but there is a little blue line at the bottom. That's the paragraph that Suzanne added to the end of the article. And then in version three it gets shrunk again, because Martin came and deleted a piece of the initial 84 00:20:11.440 --> 00:20:31.400 Fernanda Viégas: orange text and inserted a little piece of his own. And this goes on. And to make these things even clearer to see, we connect all the text that survives. Okay? So you can start to see holes, and things growing and shrinking. 85 00:20:31.480 --> 00:20:44.479 Fernanda Viégas: Another thing we can do, now that we have this visualization, is to visualize the same data in real time, so I can start to see rhythms.
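A minimal sketch of the bookkeeping behind the history flow layout just described: each version is a list of (author, text) pieces, each piece becomes a band colored by its author, and the total height of a version is the article length at that point. The sentence-level diffing that connects surviving text across versions is the harder part and is left out here; the names and colors are only illustrative.

```python
# Sketch of the history flow layout described above. Each version of the
# article is a list of (author, text) pieces in document order; every piece
# becomes a colored band whose height is proportional to its length.
# (Real history flow also diffs consecutive versions to connect surviving text.)

AUTHOR_COLORS = {"Mary": "orange", "Suzanne": "blue", "Martin": "green"}

def layout_version(version):
    """Return (author, color, start, end) bands for one version of the article."""
    bands, offset = [], 0
    for author, text in version:
        bands.append((author, AUTHOR_COLORS.get(author, "gray"),
                      offset, offset + len(text)))
        offset += len(text)
    return bands

versions = [
    [("Mary", "Design is a plan for constructing an object.")],
    [("Mary", "Design is a plan for constructing an object."),
     ("Suzanne", " It also covers processes and systems.")],
    [("Mary", "Design is a plan"), ("Martin", " or specification."),
     ("Suzanne", " It also covers processes and systems.")],
]
for i, version in enumerate(versions, start=1):
    print(f"version {i}:", layout_version(version))
```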
So, oh yeah, between version one and version two a long time passed, 86 00:20:44.490 --> 00:20:54.830 Fernanda Viégas: it took a long time, but then version three came right after. I can also highlight different versions of this text, and I can actually see the 87 00:20:54.840 --> 00:21:07.269 Fernanda Viégas: raw text displayed with the color of its author. So now let's actually see a demo of this. This project is called history flow, 88 00:21:07.280 --> 00:21:22.940 Fernanda Viégas: and you can see a big diagram in the middle. This is the evolution of the article on design on Wikipedia. On the left here I have all the people who have edited this article; on the right I have the 89 00:21:22.950 --> 00:21:29.789 Fernanda Viégas: article itself, and I also have a little wand that I can move around, and I can see the article changing over time. 90 00:21:29.800 --> 00:21:30.700 Fernanda Viégas: Okay, 91 00:21:30.840 --> 00:21:33.800 Fernanda Viégas: now let's look at something 92 00:21:34.250 --> 00:21:38.199 Fernanda Viégas: different. Oh, I'm going to need access to a little 93 00:21:38.620 --> 00:21:39.689 Fernanda Viégas: drop-down menu. 94 00:21:39.700 --> 00:21:45.550 Fernanda Viégas: Okay, now let's look at a different article, the article on cats. So first off, 95 00:21:45.990 --> 00:22:14.059 Fernanda Viégas: many more people on the Internet like cats than design. I'm not surprised. And it's a long article, and there are interesting things going on here. One is this stripy pattern, and here's what it is: I can go back to the beginning of the pattern, and I can see that someone added a table with the kingdom and the class and the order and the family of cats, and this survives. I can see that it survives forever. So that's great. 96 00:22:14.310 --> 00:22:28.460 Fernanda Viégas: There's one thing that's different here: there's this antenna at the bottom that doesn't go anywhere. What's going on there? If I come over here I can see that someone added a whole bunch of paragraphs about the Unix command cat, 97 00:22:28.750 --> 00:22:45.019 Fernanda Viégas: and so for those of you who know that command, it's funny, right? It's like, oh, does that fit in the page about cats on Wikipedia, or does it not? Because it's gone in the next 98 00:22:45.040 --> 00:23:05.049 Fernanda Viégas: edit, in the next version. So if I go there, I actually see that it's not completely gone. Someone just created a new page called "cat (Unix)" and redirected all of that content there. So we were starting to see: what are the dynamics of collaboration here? 99 00:23:05.060 --> 00:23:14.559 Fernanda Viégas: I'll show you a different page. We're going to see the page on abortion now, and again lots of people 100 00:23:14.570 --> 00:23:43.949 Fernanda Viégas: contributing to this, a very long article, but there are a couple of interesting things here. There are a couple of gashes that happen here in the middle of the diagram. If I come over here I can see that the entire article was deleted. So this is vandalism; it's called a mass deletion. If I come over here I can see that not only did someone delete it, they wrote "abortion is great," and then they added "abortion is good," and then it got reverted. 101 00:23:44.130 --> 00:23:48.020 Fernanda Viégas: Okay, so vandalism exists on Wikipedia. 102 00:23:48.030 --> 00:23:58.549 Fernanda Viégas: But here's the thing that's interesting.
If we look at when this happened, the very first vandalism incident happened on December the seventeenth at 4:06, 103 00:23:58.560 --> 00:24:06.289 Fernanda Viégas: and it got reverted on the same day at 4:07. So it took a minute for them to fix this, 104 00:24:06.300 --> 00:24:21.529 Fernanda Viégas: and we kept seeing this over and over again, and we were like, how are they doing this? So we got in touch with Wikipedians, and we learned about something called the watchlist. You can have a watchlist of all the articles you care about, and you get notifications when someone makes an edit. 105 00:24:21.540 --> 00:24:41.290 Fernanda Viégas: If that person is an IP you've never seen before, or a new user, you may want to go check. And this is how they police a lot of what happens on Wikipedia. In fact, if I show this in real time, you don't even see the deletions, because they are fixed so quickly. 106 00:24:41.300 --> 00:24:46.729 Fernanda Viégas: The last thing I want to show here is chocolate. So this is the 107 00:24:46.770 --> 00:24:54.630 Fernanda Viégas: article on chocolate, and if I visualize it by versions, you see this beautiful zigzag. 108 00:24:54.720 --> 00:24:58.099 Fernanda Viégas: What this is, is an edit war, 109 00:24:58.110 --> 00:25:14.209 Fernanda Viégas: and I'll show you what it is. Back here, this person, Daniel C. Boyer, added this little white paragraph that says extremely rarely, melted chocolate has been used to make a kind of surrealist sculpture called a collage. 110 00:25:15.010 --> 00:25:21.780 Fernanda Viégas: The white stripe survives for a while, and then someone says, "removing Boyer invention." 111 00:25:22.280 --> 00:25:27.309 Fernanda Viégas: Daniel C. Boyer comes back and says, "reverting: collage is not a Boyer invention." 112 00:25:27.660 --> 00:25:31.199 Fernanda Viégas: "Google search for chocolate collage finds only Boyer; 113 00:25:31.210 --> 00:25:32.190 Fernanda Viégas: reverting." 114 00:25:32.200 --> 00:25:43.959 Fernanda Viégas: "Leave your humbug out; reverting," and so forth. So this is really a fight, and it happens on Wikipedia, and it's too bad, because Daniel C. Boyer gets tired and leaves, 115 00:25:43.970 --> 00:26:06.980 Fernanda Viégas: and it's unfortunate, because Martin and I did a search for chocolate collage, and it does exist. But such is life on Wikipedia. The last thing I want to show about this visualization is that we also have a mode where we get rid of all the author colors and just color the text based on how old it is: the older it is, the darker it is. 116 00:26:07.310 --> 00:26:31.109 Fernanda Viégas: It was really nice to see that there is a lot of old text on Wikipedia, and one of the things that's interesting to us is that we're thinking, in this case, about old text as a proxy for quality. Because if you have a community where nobody touches a piece of text, chances are it's high-quality text. 117 00:26:31.120 --> 00:26:38.259 Fernanda Viégas: Okay. So, another example of visualizing text and visualizing activity around text. 118 00:26:38.810 --> 00:26:42.590 Fernanda Viégas: Now let's turn to something very, very different.
119 00:26:42.600 --> 00:26:57.109 Fernanda Viégas: We're going to go back to numbers, but we're going to go back to numbers in terms of massively high-dimensional spaces, and we're going to jump to the present and talk about machine learning and how visualization can help. 120 00:26:57.340 --> 00:27:01.189 Fernanda Viégas: I only have time to focus on one area today, 121 00:27:01.200 --> 00:27:06.550 Fernanda Viégas: and it's going to be embeddings. 122 00:27:06.560 --> 00:27:24.850 Fernanda Viégas: And so, as a warm-up, we're going to talk about one of the "hello world" data sets of machine learning, which is MNIST, which you can see here. It's nothing more than handwritten digits. And the idea is to get a system that can 123 00:27:24.860 --> 00:27:28.490 Fernanda Viégas: separate the zeros from the ones from the twos, 124 00:27:28.500 --> 00:27:37.890 Fernanda Viégas: no matter how bad your handwriting is. And the way we do this is that we can turn images into vectors: 125 00:27:37.900 --> 00:27:49.169 Fernanda Viégas: we literally go pixel by pixel, for each one of these images, and we give each pixel a value based on its color. 126 00:27:49.180 --> 00:27:58.479 Fernanda Viégas: So if the pixel is black, the value is zero; if the pixel is white, the value is one; and if the pixel is somewhere in between, 127 00:27:58.490 --> 00:28:17.189 Fernanda Viégas: the value is somewhere in between. I do that for literally every pixel, and I end up with a vector, and in this case it's a vector of seven hundred and eighty-four dimensions, because I had seven hundred and eighty-four pixels. 128 00:28:17.200 --> 00:28:23.370 Fernanda Viégas: Okay, so far so good. I do this for literally every digit in my data set, 129 00:28:23.550 --> 00:28:47.499 Fernanda Viégas: and the good news is that now I can work in math, right? I've gone from images to math. I can actually just look at how similar these vectors are, and once I have that, I basically have data that I can map. And the demo I want to show you now is a visualization we created 130 00:28:47.510 --> 00:29:05.690 Fernanda Viégas: called the Embedding Projector. It is open source and publicly available, and what it does is project these high-dimensional vectors into a 3D space. 131 00:29:05.900 --> 00:29:24.010 Fernanda Viégas: And what I have here is the same MNIST data set that we were looking at. They are all digits. I know the ground truth; in other words, humans have gone through each one of those digits and said, this is a zero, this is a seven, and that is how I am coloring 132 00:29:24.020 --> 00:29:29.829 Fernanda Viégas: the images that you see here. So the colors are the truth, 133 00:29:30.090 --> 00:29:49.870 Fernanda Viégas: and the clusters are my system doing its best to try to put things in what it thinks are the right places. You know, the way I think about it is when we do brainstorming exercises, and everybody has a Post-it note, and then afterwards you try to cluster the Post-it notes.
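Here is a minimal NumPy sketch of the two steps just described: flattening a 28 by 28 digit into a 784-dimensional vector with values between zero and one, and then comparing vectors by similarity, which is what drives the nearest-neighbor lookups in the Embedding Projector demo that follows. The random arrays are stand-ins for real MNIST images and labels.

```python
import numpy as np

# Stand-in for a 28x28 grayscale MNIST digit (0 = black, 255 = white).
image = np.random.randint(0, 256, size=(28, 28), dtype=np.uint8)

# Flatten to a 784-dimensional vector, scaling black to 0.0 and white to 1.0.
vector = image.astype(np.float32).reshape(-1) / 255.0
assert vector.shape == (784,)

def nearest_neighbors(query: np.ndarray, vectors: np.ndarray, k: int = 5) -> np.ndarray:
    """Indices of the k rows of `vectors` most similar to `query` (cosine similarity)."""
    q = query / np.linalg.norm(query)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    return np.argsort(-(v @ q))[:k]

# One flattened vector per digit, plus its human-assigned label (the ground truth).
all_vectors = np.random.rand(1000, 784).astype(np.float32)
labels = np.random.randint(0, 10, size=1000)
idx = nearest_neighbors(all_vectors[0], all_vectors)
print("labels of nearest neighbors:", labels[idx])
```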
134 00:29:49.880 --> 00:29:53.490 Fernanda Viégas: That's what this system is trying to do: it's trying to cluster these Post-it notes, 135 00:29:54.090 --> 00:30:18.490 Fernanda Viégas: and I can see that it's doing a pretty good job sometimes, and sometimes not so much. So, for instance, this is a cluster of sixes. I can click on one, and I can see the nearest neighbors; they happen to all be sixes, which is great. But here it's getting a little confused between five and three. So if I click on an eight, 136 00:30:18.500 --> 00:30:43.979 Fernanda Viégas: if I click on this eight, look, the nearest neighbors are five and eight and nine, and it's kind of a mess. So this starts to tell me how well or not my system is able to cluster things that are similar, and this is really important for me to try to improve my system, to try to debug my system. And so this is one way in which data visualization can help. 137 00:30:44.430 --> 00:30:45.790 Fernanda Viégas: Okay. 138 00:30:45.800 --> 00:31:01.579 Fernanda Viégas: Now let's talk about a more real-world situation. When we were at Google, one of the really interesting things that happened a number of years ago was multilingual translation. 139 00:31:01.780 --> 00:31:12.519 Fernanda Viégas: So, as you know, there's Google Translate, and Google was interested in starting to use machine learning for translation. 140 00:31:12.530 --> 00:31:26.200 Fernanda Viégas: And one of the things that had always been the case is that you would have to train a specific model for one pair of languages. So you have a model for English to French, back and forth. 141 00:31:26.360 --> 00:31:36.029 Fernanda Viégas: You have a different model for English to Portuguese, and so on, many models, depending on the pairs of languages you want to translate. 142 00:31:36.200 --> 00:31:46.240 Fernanda Viégas: At one point researchers at Google started experimenting with a single model into which maybe they could put multiple languages, 143 00:31:46.250 --> 00:32:12.720 Fernanda Viégas: and they started to get interesting, good results. And they were kind of curious about the massive internal state of these languages in the system. How was the system thinking about and separating these languages? So let's talk a little bit about this. One of the things that happened with these systems is, imagine the system had been trained on 144 00:32:12.840 --> 00:32:16.390 Fernanda Viégas: pairs of sentences between English and Japanese, 145 00:32:16.400 --> 00:32:18.140 Fernanda Viégas: okay, back and forth, 146 00:32:18.320 --> 00:32:21.250 Fernanda Viégas: and English and Korean, back and forth. 147 00:32:21.260 --> 00:32:23.879 Fernanda Viégas: This is all the data the system had ever seen. 148 00:32:24.910 --> 00:32:30.530 Fernanda Viégas: Without ever having seen one sentence go from Japanese to Korean, 149 00:32:30.660 --> 00:32:49.340 Fernanda Viégas: it was able to do high-quality translations between Japanese and Korean, and this is what we call zero-shot: it never had training data that looked like that, and yet it was doing a good job, which is surprising. And so one of the things that the scientists 150 00:32:49.350 --> 00:33:08.409 Fernanda Viégas: came to us with, and were puzzling about, is: what does the embedding space look like for these multilingual systems? And what I'm showing you here, these are just abstract illustrations.
It's not real data, but it's kind of like the image that the scientists had in mind. So the image on the left: 151 00:33:08.420 --> 00:33:12.889 Fernanda Viégas: imagine each one of these dots is a sentence, 152 00:33:12.900 --> 00:33:24.050 Fernanda Viégas: and they are colored by language, so let's say Japanese is green and English is blue, and then Portuguese is yellow. 153 00:33:24.520 --> 00:33:38.349 Fernanda Viégas: What the scientists couldn't figure out is: is the system dividing the languages into different corners and then mapping between these different corners for translation? 154 00:33:38.360 --> 00:33:45.260 Fernanda Viégas: Or is it more like what you see on the right, where the system is bringing together multiple languages, 155 00:33:45.270 --> 00:33:54.609 Fernanda Viégas: and, despite the strings looking very different, like I have a string that says "home" and another string that says "casa," 156 00:33:55.330 --> 00:34:15.230 Fernanda Viégas: does it know that the semantic meaning is the same? Does it not care about the strings, about the fact that they come from different languages? So this was the question that they couldn't answer. And so we decided, well, let's try to visualize this space of embeddings and see what it looks like. 157 00:34:15.239 --> 00:34:26.860 Fernanda Viégas: So, because again we're dealing with sentences, imagine a sentence like "the stratosphere extends from ten kilometers to fifty kilometers in altitude," 158 00:34:27.070 --> 00:34:56.579 Fernanda Viégas: and what I'm going to do, up to an approximation, is create a little dot for each one of these words, and it may look something like this. Again, this is not real data, but just to give you a sense. The dots for the sentence will be scattered somehow, but things like "ten" and "fifty," because they're both numbers, will be closer together, because there's a certain similarity between them. And then I connect these dots, 159 00:34:56.590 --> 00:35:07.950 Fernanda Viégas: because that is the sequence of my sentence. Okay, so I have now visualized one sentence in the embedding space. When I translate that sentence to a different language, 160 00:35:07.960 --> 00:35:15.090 Fernanda Viégas: does it look like this, where English is red and Portuguese is blue, and they are separated? 161 00:35:15.100 --> 00:35:22.890 Fernanda Viégas: Or does it look like this, where I have English and Portuguese roughly together in a cluster? 162 00:35:22.900 --> 00:35:30.349 Fernanda Viégas: Okay, and this is what we tried to answer. So now let's take a look 163 00:35:30.360 --> 00:35:56.029 Fernanda Viégas: at the actual visualization. This is a visualization of a system that takes three languages: English, Japanese, and Korean. I am visualizing sentences, and I am visualizing them in different colors depending on their source language. So if the source language is English, it's going to be blue; if it's Japanese, it's going to be yellow; and so forth. Okay, 164 00:35:57.210 --> 00:35:59.390 Fernanda Viégas: complicated image. 165 00:35:59.400 --> 00:36:07.629 Fernanda Viégas: I don't know exactly what's going on. But the main point here is, remember those two pictures I showed you: 166 00:36:07.830 --> 00:36:19.329 Fernanda Viégas: do the colors separate into different corners? Do I have a corner of only English, only blue, and another corner of only Japanese?
167 00:36:19.340 --> 00:36:38.349 Fernanda Viégas: Or are these colors coming together and clustering? They are coming together and clustering. I don't have different neighborhoods here of only a single color. And so let's look at our example sentence, 168 00:36:38.400 --> 00:36:47.529 Fernanda Viégas: the one with the stratosphere. So I just highlighted "the stratosphere is in the range of ten kilometers to fifty kilometers," 169 00:36:47.540 --> 00:37:03.759 Fernanda Viégas: and my nearest neighbors, regardless of the language, are all in that same little cluster. So this was super interesting to us, because it was the first time we were seeing 170 00:37:05.140 --> 00:37:34.350 Fernanda Viégas: what looked like the initial signs of an interlingua, of a universal language: the system was able to bring together different languages based on the semantic meaning of these sentences. And so that was very exciting, and it was a real scientific insight for the team, who had not been able to resolve this question of what the space looks like. 171 00:37:34.360 --> 00:37:38.359 Fernanda Viégas: And you may be thinking, well, that's great, but why does that matter? 172 00:37:38.380 --> 00:37:51.190 Fernanda Viégas: Well, let's go see a sister system. This is the same kind of visualization, but of a system that takes Portuguese, Spanish, and English. And if you look at this, 173 00:37:51.200 --> 00:37:57.720 Fernanda Viégas: you will see a neighborhood of just red off to the side. 174 00:37:57.820 --> 00:38:19.299 Fernanda Viégas: It looks very different. We looked at this and thought, what is going on here? These are the same kinds of systems. What's going on? So we downloaded all the data, and we ran a statistical analysis of the quality of the translations, and we found that, sure enough, for this system the quality was low. 175 00:38:19.310 --> 00:38:34.630 Fernanda Viégas: So, in other words, the geometry of the space matters here. If your system is not able to bring these languages together in terms of their semantic meaning, 176 00:38:34.640 --> 00:38:55.080 Fernanda Viégas: you have a problem. Your system needs to be debugged, improved; maybe you need more training data. But this is a problem. And so this is another way in which visualization can help us get a little bit inside the inner workings of these systems. 177 00:38:56.100 --> 00:38:57.250 Fernanda Viégas: Okay. 178 00:38:57.380 --> 00:39:10.750 Fernanda Viégas: Now, all of these projections I was showing you here use a projection technique called t-SNE, which is non-linear. 179 00:39:10.840 --> 00:39:23.609 Fernanda Viégas: So again, remember, I have a massive number of dimensions, and now I am projecting all of those dimensions down to two or three, so I can try to understand what's happening. 180 00:39:23.870 --> 00:39:31.959 Fernanda Viégas: This comes at a price. These visualizations are heavily used in machine learning, 181 00:39:32.010 --> 00:39:36.140 Fernanda Viégas: but they are tricky. Let's talk a little bit about this. 182 00:39:36.160 --> 00:40:00.570 Fernanda Viégas: Things can be deceptive in these visualizations, and I'll show you a couple of very simple examples. So imagine I have an original synthetic data set; I'm showing you some synthetic data here, with three clusters. So I know this is the ground truth. Okay?
And then I run t-SNE, the same projection, on this data set, 183 00:40:01.030 --> 00:40:02.720 Fernanda Viégas: and it looks like this. 184 00:40:02.730 --> 00:40:32.580 Fernanda Viégas: So all of a sudden I have lost the global relationships between my clusters, because now they all look equidistant, which they are not. And this is a known problem with t-SNE: even though it's very useful, and it does a good job of keeping the local clusters, so you can still see there are three clusters, it's not lying about that, it doesn't do a good job of keeping the global structure 185 00:40:32.590 --> 00:40:54.659 Fernanda Viégas: of the original high-dimensional space. Okay? And there are other things, too. t-SNE has a number of parameters that we can play with. One of them is called perplexity. And again, if you look on the left, this is my original data set, this is the ground truth, 186 00:40:54.670 --> 00:41:08.589 Fernanda Viégas: and depending on the perplexity I choose, these clusters look very different. So which one is right? Which one is the truth? And so 187 00:41:08.650 --> 00:41:24.759 Fernanda Viégas: Martin and I wrote a whole article about how to use these in better ways, but these are used across the board, and in a sense they are kind of the new scatter plot 188 00:41:24.880 --> 00:41:36.160 Fernanda Viégas: of our age in terms of machine learning, and I think we still have a long way to go to understand how we work with the distortions that they create. 189 00:41:36.740 --> 00:41:45.740 Fernanda Viégas: So let's just pause there for a moment and think: is this useful at all, or are we lying to ourselves as we use this visualization technique? 190 00:41:46.030 --> 00:41:52.960 Fernanda Viégas: The truth is, we are already in a world where we use distortion all the time. Think about it: 191 00:41:53.100 --> 00:41:55.870 Fernanda Viégas: any time we have a map of the earth, 192 00:41:55.970 --> 00:41:57.670 Fernanda Viégas: we are distorting 193 00:41:57.680 --> 00:42:05.470 Fernanda Viégas: the data, right? Because it's basically impossible to show something that is 3D in 2D without distortion. 194 00:42:05.790 --> 00:42:21.220 Fernanda Viégas: But at this point we know enough about the distortions, we know enough about each projection technique, hopefully, that as we're looking at these maps we know how to take those distortions into account. So, 195 00:42:21.350 --> 00:42:23.860 Fernanda Viégas: again, because I grew up in Brazil, 196 00:42:24.080 --> 00:42:25.430 Fernanda Viégas: I 197 00:42:25.440 --> 00:42:53.329 Fernanda Viégas: would hate the way they would show the world map, because one of the most familiar and most popular projections is the Mercator projection, and it makes Brazil and the rest of the southern hemisphere tiny and the northern hemisphere huge, and I'm like, no, Brazil is just as big as the US. But you would never know if you didn't know the map was distorted. 198 00:42:54.110 --> 00:42:56.020 Fernanda Viégas: All I'm saying is, 199 00:42:56.240 --> 00:43:08.940 Fernanda Viégas: as long as we understand some of these distortions and we work critically with them, it is incredibly useful still to map this kind of high-dimensional data.
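The distortions discussed above are easy to reproduce. The sketch below, assuming scikit-learn is installed, builds three well-separated synthetic clusters in fifty dimensions and projects them with t-SNE at several perplexity values; the local clusters usually survive, but their apparent sizes and the distances between them change with the settings, which is exactly the caution being raised.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Ground truth: three well-separated clusters in a 50-dimensional space.
data = np.vstack([rng.normal(loc=center, scale=0.5, size=(100, 50))
                  for center in (0.0, 5.0, 10.0)])

# The same data projected at different perplexities can look quite different:
# cluster sizes and inter-cluster distances in the 2D picture are not reliable.
for perplexity in (5, 30, 100):
    projection = TSNE(n_components=2, perplexity=perplexity,
                      init="pca", random_state=0).fit_transform(data)
    print(f"perplexity={perplexity}: projected shape {projection.shape}")
```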
200 00:43:10.270 --> 00:43:28.399 Fernanda Viégas: Another way in which data visualization can be helpful in terms of machine learning is to help people probe these systems, and here I'm talking about non-experts, non-machine-learning scientists. So let's talk about one example. 201 00:43:28.410 --> 00:43:40.299 Fernanda Viégas: Everybody talks about fairness and bias in machine learning, in AI systems. And one of the things that I think is really important as we have these conversations is that 202 00:43:40.760 --> 00:43:58.119 Fernanda Viégas: I don't think these decisions should be left only to engineers and machine learning experts to make, as some of these systems hit real communities and societies. So how can we share the responsibility? 203 00:43:58.130 --> 00:44:03.210 Fernanda Viégas: And I think part of it is that people need to understand what some of the trade-offs are in 204 00:44:03.230 --> 00:44:06.729 Fernanda Viégas: these systems. And so how do you do that? 205 00:44:06.740 --> 00:44:17.789 Fernanda Viégas: I think visualization and simulation can be really powerful tools in that effort. So I'll show you one little example that Martin and I did. 206 00:44:17.800 --> 00:44:37.740 Fernanda Viégas: We're going to play a little game. We're going to pretend we're a bank, and we're going to decide who to give loans to. Each person who comes to our bank looks like a little circle, and they have a credit score. My credit score is a make-believe credit score; it goes from zero to one hundred, and 207 00:44:37.750 --> 00:44:53.889 Fernanda Viégas: higher is better. Anyone colored in the light blue color would default on our loan; anyone in a dark color would pay us back. So we definitely want to give loans to the dark-colored people. 208 00:44:53.900 --> 00:44:55.140 Fernanda Viégas: Okay, 209 00:44:55.300 --> 00:45:04.769 Fernanda Viégas: everybody comes to our bank, and we set a bank threshold: above that threshold we give loans, below it we deny loans, 210 00:45:04.780 --> 00:45:19.479 Fernanda Viégas: and we're all happy, except that real life isn't like this, right? No matter where I put my threshold, I'm going to make incorrect guesses. I'm going to give loans to people who don't pay me back, and I'm going to deny loans to people who would pay me back. 211 00:45:19.840 --> 00:45:21.189 Fernanda Viégas: Okay, 212 00:45:21.200 --> 00:45:31.509 Fernanda Viégas: this is still kind of made up, but it's what a real-world distribution starts to look like, and you can see that 213 00:45:31.640 --> 00:45:39.119 Fernanda Viégas: no matter where I put my threshold, I have a mix of people who pay me back and people who don't. 214 00:45:39.130 --> 00:45:40.319 Fernanda Viégas: Okay. 215 00:45:40.330 --> 00:45:42.490 Fernanda Viégas: Now, 216 00:45:42.520 --> 00:45:54.380 Fernanda Viégas: together with these things, I have indicators. I'm a bank; I care about profit. I care, hopefully, about how many people I am giving loans to. So I have things like positive rate: 217 00:45:54.390 --> 00:46:09.589 Fernanda Viégas: of all the people who come to my bank, what is the percentage of people who are getting loans? In this case it's fifty-two percent. Okay. Percentage of correct guesses, incorrect guesses, and so forth. True positive rate: 218 00:46:09.600 --> 00:46:20.490 Fernanda Viégas: of all the people who come to my bank and would pay me back,
what is the percentage of people I'm giving loans to? Good news for me: I'm giving loans to most of them, eighty-six percent. 219 00:46:20.500 --> 00:46:32.090 Fernanda Viégas: Okay. Now let's think about something broader. I have two populations that come to my bank, and they have different distributions, and I know that. 220 00:46:32.330 --> 00:46:47.289 Fernanda Viégas: How can I be fair to them? How can I be fair in deciding who gets a loan? So we created this visualization so that people could play with it online, and I'll show you what this looks like right now. 221 00:47:00.110 --> 00:47:07.500 Fernanda Viégas: All right. So this is the visualization, and I'm going to make it a little bit smaller so that it fits here. 222 00:47:07.920 --> 00:47:31.790 Fernanda Viégas: So I have my distributions. I have the blue people and the orange people who come to my bank. You can see that I can change the threshold. If I don't care at all about fairness, and all I want to do is maximize my profit, I have a preset up here; I'm going to click on it. Yep, this is my maximum profit, given the distribution of people. 223 00:47:31.800 --> 00:47:32.720 Fernanda Viégas: Okay. 224 00:47:33.180 --> 00:47:35.470 Fernanda Viégas: But if I decide to be fair, 225 00:47:35.490 --> 00:47:38.210 Fernanda Viégas: how can I be fair? Maybe I say, you know, 226 00:47:38.470 --> 00:47:54.760 Fernanda Viégas: I don't care if you're blue or if you're orange; you come to my bank, I treat you the same. So this is called group unaware. I have the same threshold for everybody; I treat everybody the same. That's what fairness means to me. 227 00:47:54.770 --> 00:47:59.989 Fernanda Viégas: Okay. Well, once you do that, let's look at your positive rates here. 228 00:48:00.000 --> 00:48:13.830 Fernanda Viégas: Of the blue people coming to my bank, I am giving loans fifty-two percent of the time; of the orange people coming to my bank, I'm giving loans only thirty percent of the time. Is that fair? 229 00:48:14.610 --> 00:48:37.049 Fernanda Viégas: Maybe, maybe not. So let's try harder. Maybe we can do something called demographic parity. So I'm going to click on this button next, and now I'm optimizing for the positive rate. Now I have the same percentage of blue people and orange people receiving loans from my bank. What happened, though, is that I had to set different thresholds. 230 00:48:37.060 --> 00:48:41.289 Fernanda Viégas: Maybe that's okay. Maybe that's what fairness means to me. 231 00:48:41.380 --> 00:48:47.389 Fernanda Viégas: Now look at the true positive rate, though, the people who would pay me back: 232 00:48:47.540 --> 00:48:53.989 Fernanda Viégas: I'm giving loans seventy-one percent of the time to the 233 00:48:54.000 --> 00:49:07.290 Fernanda Viégas: good orange people, but only sixty-four percent of the time am I giving loans to the good blue people. Okay, is that fair? So there is something called equal opportunity. I can click on that, and now I'm optimizing for that parameter. 234 00:49:07.300 --> 00:49:08.240 Fernanda Viégas: Now, 235 00:49:08.460 --> 00:49:10.160 Fernanda Viégas: long story short, 236 00:49:10.170 --> 00:49:26.239 Fernanda Viégas: we created this visualization and put it online as an accompaniment to an academic publication that had all the math; the paper had all the math, and we decided to visualize it.
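The arithmetic behind the loan simulation is small enough to sketch directly. The code below, with entirely made-up scores and outcomes, computes the two indicators named above for each group at a chosen threshold; "group unaware" means using one threshold for everyone, demographic parity means tuning thresholds until the positive rates match, and equal opportunity means matching the true positive rates instead.

```python
import numpy as np

def loan_rates(scores: np.ndarray, repaid: np.ndarray, threshold: float):
    """Positive rate and true positive rate for one group at a loan threshold."""
    approved = scores >= threshold
    positive_rate = approved.mean()               # share of applicants given loans
    true_positive_rate = approved[repaid].mean()  # share of would-be repayers given loans
    return positive_rate, true_positive_rate

rng = np.random.default_rng(1)
# Made-up credit scores (0..100) for two groups with different distributions.
groups = {
    "blue":   rng.normal(55, 15, 1000).clip(0, 100),
    "orange": rng.normal(45, 15, 1000).clip(0, 100),
}
for name, scores in groups.items():
    repaid = rng.random(scores.size) < scores / 100     # higher score, more likely to repay
    pr, tpr = loan_rates(scores, repaid, threshold=50)  # "group unaware": same threshold
    print(f"{name}: positive rate {pr:.0%}, true positive rate {tpr:.0%}")
```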
237 00:49:26.790 --> 00:49:46.869 Fernanda Viégas: The whole point was to start to bring a broader set of stakeholders into this discussion around fairness. And what we saw once we put this online is that people resonated with it, and they started having conversations on Twitter, being like, oh, I always thought demographic parity was the way to go, but now I can see the trade-offs. 238 00:49:46.880 --> 00:49:54.210 Fernanda Viégas: And make no mistake: at this point there are hundreds of different metrics for fairness, 239 00:49:54.550 --> 00:49:56.790 Fernanda Viégas: and part of the point here is that 240 00:49:56.800 --> 00:50:00.419 Fernanda Viégas: none of them are perfect. There is no perfect metric; 241 00:50:00.430 --> 00:50:18.810 Fernanda Viégas: all of them are a set of trade-offs. And so, when you are choosing whatever metric makes sense to you, understand that you are always going to be making a set of trade-offs. And one of the things that was really nice about doing this work was that we started getting emails. For instance, we got an email from 242 00:50:18.820 --> 00:50:48.599 Fernanda Viégas: a criminal justice department in one of the states here in the US, and they were asking us, you know, obviously we're not a bank, we're not giving loans, but we are using algorithms, and we are making trade-offs in our choices here. Do you think a visualization like this could help people in the department better understand the choices they're making? And we're like, yes, we definitely think so. So, in fact, we created a whole tool at Google that is open source and 243 00:50:48.610 --> 00:51:07.070 Fernanda Viégas: publicly available, so that you can actually do these kinds of visualizations and change thresholds and understand what choices you're making and how those choices impact the different people, or maybe not people, maybe products, or whatever it is you're dealing with. 244 00:51:07.080 --> 00:51:11.629 Fernanda Viégas: So broadening the sense of, 245 00:51:12.050 --> 00:51:21.980 Fernanda Viégas: you know, broadening the number of people who can think critically about these mathematical trade-offs can sometimes be really empowering. 246 00:51:22.200 --> 00:51:31.090 Fernanda Viégas: I want to talk very briefly about different perspectives on and uses of data visualization today, because this is interesting. 247 00:51:31.100 --> 00:51:55.829 Fernanda Viégas: There has been, especially in academia, a lot of attention given to people like Edward Tufte. The book you see here on the left is by Edward Tufte. He's one of the popes of data visualization, so to speak, and has done amazing work to educate people about how to make 248 00:51:55.910 --> 00:51:58.789 Fernanda Viégas: useful, effective data visualization. 249 00:51:58.800 --> 00:52:07.810 Fernanda Viégas: He also has a very specific take on things: your visualization should be neutral, care about the data-ink ratio. 250 00:52:07.830 --> 00:52:14.729 Fernanda Viégas: But just by bringing up this notion of a neutral point of view, 251 00:52:15.240 --> 00:52:31.589 Fernanda Viégas: he tends to overlook a whole lot of choices that people make when they create visualizations. So, in contrast to that, today there is a whole set of different voices talking about data visualization.
What you see here is Data Feminism on the right, 252 00:52:31.760 --> 00:52:33.189 Fernanda Viégas: where people are saying 253 00:52:33.200 --> 00:52:50.599 Fernanda Viégas: nothing we do with data visualization is neutral. There is no neutral. Even when you decide to collect data, let alone visualize it, you are making a series of choices, and it's important to acknowledge those things. So there is an ongoing discussion about 254 00:52:50.610 --> 00:53:03.519 Fernanda Viégas: the importance of emotion: how do we study emotion in relation to data visualization and the responses that people have? And the reason why I bring this up right now, even though I don't have a lot of time to talk about it, 255 00:53:03.530 --> 00:53:33.239 Fernanda Viégas: is because I think, especially in an era where we are faced with so much misinformation, it is really important for us to think about data visualization as a rhetorical device. It is a communication device, and like any other communication device, you can manipulate it; you can use it in whichever way. And I think it's really important that we realize that there are different points of view that people are using when they create 256 00:53:33.280 --> 00:53:39.260 Fernanda Viégas: data visualizations. Because I'm running out of time, I'm going to skip ahead very quickly, 257 00:53:39.270 --> 00:54:01.690 Fernanda Viégas: and I'm going to finish off with data visualization as art. So many years ago Martin and I were confronted with a question. Boston Magazine came to us and said, can you visualize Boston? And this is a print magazine; it's not interactive. We were like, oh my gosh, how do you visualize a city? 258 00:54:01.700 --> 00:54:14.130 Fernanda Viégas: So we started thinking about what's unique for us about Boston, and one of the things we thought about is the seasonality. The seasons are very well marked here, and we wondered, is there any way we can capture them? 259 00:54:14.690 --> 00:54:32.009 Fernanda Viégas: So we decided to look at a public collection of user-generated images of the Boston Common. The Boston Common is the biggest public garden in Boston; it's kind of like the central part of Boston, and 260 00:54:32.020 --> 00:54:51.249 Fernanda Viégas: there was something called Flickr back then, when we were doing this project, and people would post public photos and label them, and a lot of those photos were licensed under Creative Commons. And so what we decided to do was download a whole year's worth of pictures 261 00:54:51.260 --> 00:55:01.389 Fernanda Viégas: of the Boston Common, and then we binned those pictures: all the pictures from January, all the pictures from February, March, and so forth. 262 00:55:01.400 --> 00:55:03.670 Fernanda Viégas: Okay, that was our data. 263 00:55:04.330 --> 00:55:11.599 Fernanda Viégas: Then we did the following: we decided to kind of squeeze the color from those pictures, 264 00:55:11.810 --> 00:55:40.279 Fernanda Viégas: and with this we started counting colored pixels. That's what we did. So imagine you have a color cube, a color space, which is what I'm showing you here, using very pure hues for each one of the little color cubes there, and you just fill it up: you make each cube bigger the more pixels of that color there are, and some of the other colors are going to be smaller.
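The color-squeezing step just described amounts to a coarse 3D color histogram per month. Here is a minimal NumPy sketch of that counting, with a random array standing in for a real downloaded photo:

```python
import numpy as np

def color_histogram(image_rgb: np.ndarray, bins_per_channel: int = 4) -> np.ndarray:
    """Count pixels in each cell of a coarse RGB color cube (4x4x4 = 64 bins)."""
    step = 256 // bins_per_channel
    q = (image_rgb // step).reshape(-1, 3)  # quantize each channel to a few levels
    flat = (q[:, 0] * bins_per_channel + q[:, 1]) * bins_per_channel + q[:, 2]
    return np.bincount(flat, minlength=bins_per_channel ** 3)

# One histogram per monthly bin: add up the histograms of that month's photos.
# The random image below is a stand-in for a real Boston Common photo.
january_photos = [np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)]
january_histogram = sum(color_histogram(photo) for photo in january_photos)
print(january_histogram.shape)  # (64,) pixel counts, one per color-cube cell
```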
265 00:55:40.550 --> 00:55:47.999 Fernanda Viégas: And then what we decided to do, remember we had a bin per month, 266 00:55:48.240 --> 00:55:57.409 Fernanda Viégas: so we decided to connect those months in terms of color. Imagine a ribbon of color that gets thicker 267 00:55:58.630 --> 00:56:03.759 Fernanda Viégas: depending on the month of the year. And so this is what the visualization looks like. 268 00:56:03.810 --> 00:56:10.519 Fernanda Viégas: So this is a whole year's worth of colors in the Boston Common, 269 00:56:10.530 --> 00:56:40.119 Fernanda Viégas: and the year starts at the very bottom, with January at the bottom, and you can see just the amount of white that exists there. That's our winter: lots and lots of snow here. But still lots of blue sky, too; the blue never leaves us, which I'm very grateful for. Then, as it moves counterclockwise, if you keep going up to the left, you start seeing 270 00:56:40.130 --> 00:56:47.229 Fernanda Viégas: a thin line of fuchsia, this magenta. 271 00:56:47.400 --> 00:57:16.680 Fernanda Viégas: Those are the flowers starting to grow in the park during spring, and then you have a bulge of green around June, and those are, you know, the leaves coming out. If you go all the way to the bottom right, like October and November, that's when you start to see the earth tones of the famous fall foliage here in Boston. And so you can really see how these things are changing, 272 00:57:16.690 --> 00:57:35.729 Fernanda Viégas: kind of ebbing and flowing over time. And the reason we decided that this should be a circular display is also because time and seasons are cyclical, so to us it made a lot of sense to go with a circular display here. 273 00:57:35.740 --> 00:57:38.490 Fernanda Viégas: I want to finish with another 274 00:57:38.500 --> 00:58:04.819 Fernanda Viégas: art project that Martin and I did, based on data. We were interested in visualizing the wind, and this was again during a very cold winter here, so who knows what's going to happen this winter? Maybe it's not so cold and we won't come up with a visualization. But we were looking at the wind; we wanted to understand: is there anything we could do with wind, a way we could visualize the wind? And 275 00:58:05.070 --> 00:58:06.620 Fernanda Viégas: truth is, 276 00:58:06.630 --> 00:58:31.820 Fernanda Viégas: wind has been visualized, in fact, for hundreds of years. What you're seeing here are two screenshots of vector fields, where you aggregate and then use these arrows to show direction, and the length of the arrow to show the strength of the wind. And these work fine, but we wanted to make sure we could do 277 00:58:31.910 --> 00:58:47.829 Fernanda Viégas: something more dynamic. We could use the web in its dynamic form, and also, because the wind is dynamic, we wanted movement. And so, what ended up happening I will show you now, if I can go there. 278 00:58:48.310 --> 00:58:53.799 Fernanda Viégas: Let me. It's the wind map. So this is the wind map. 279 00:58:53.890 --> 00:59:02.510 Fernanda Viégas: It's a public website. It's live, and it shows the wind. 280 00:59:02.660 --> 00:59:17.250 Fernanda Viégas: It's actually stuck a little bit in the past right now. I'm two hours back; this was the wind two hours ago in the US.
And what you can do is start to zoom in, 281 00:59:17.260 --> 00:59:22.990 Fernanda Viégas: and I can see that Boston, where I am right now, has a lot of wind coming 282 00:59:23.000 --> 00:59:45.080 Fernanda Viégas: kind of from the southeast, which is good news to us, because it means it's going to be warmer wind this time of year than otherwise. It also has really broad patterns, like all this wind going towards the east at the top of the country, and it has very delicate patterns here 283 00:59:45.090 --> 01:00:00.080 Fernanda Viégas: because of the mountains. So if I start to zoom in around Denver, Colorado, you can start to see how the wind kind of parts and comes together because of 284 01:00:00.470 --> 01:00:02.459 Fernanda Viégas: because of the terrain. 285 01:00:02.470 --> 01:00:31.579 Fernanda Viégas: Another thing we started seeing when we created this visualization was just how much the wind changes day after day. So, for instance, there are days like these, and I'm going to bring up a different day, when, you know, Canada steals all of our air, and not even the mountains can keep the air here. There are days like this, 286 01:00:32.060 --> 01:00:46.779 Fernanda Viégas: and this was the first time after we created this visualization that we were seeing a hurricane make landfall in the US. This is Hurricane Isaac in 2012. And 287 01:00:46.790 --> 01:01:01.330 Fernanda Viégas: again, because this is a real-time visualization, we started getting emails from people. I remember one email we got was from someone in New Orleans saying, I'm here in New Orleans, I'm hunkering down, praying that this thing 288 01:01:01.420 --> 01:01:06.180 Fernanda Viégas: passes over, and I'm looking at your visualization. And it was a really, 289 01:01:06.350 --> 01:01:34.370 Fernanda Viégas: you know, powerful thing to get that email at that moment, as this was happening. In fact, later on, this was Hurricane Sandy making landfall, and for a change Boston was kind of in the path. It was not, you know, in the eye of the hurricane or anything like that, but we were also looking at the visualization and hoping that things would be okay. 290 01:01:34.380 --> 01:01:36.060 Um. And so 291 01:01:36.350 --> 01:01:37.549 Fernanda Viégas: it's... 292 01:01:38.550 --> 01:01:41.390 Fernanda Viégas: visualizing data in real time 293 01:01:41.400 --> 01:01:53.650 Fernanda Viégas: has this added effect and power that we had not seen before. In fact, just one note: this is the same data that you see here. 294 01:01:53.660 --> 01:02:18.790 Fernanda Viégas: So it's interesting to me just how much a different technique can change what you see in the data, and the fact that there is a lot more aggregation happening in these vector fields than there is here, and yet our eyes can resolve these images so easily, is, I think, an unintuitive but interesting point. 295 01:02:19.230 --> 01:02:35.600 Fernanda Viégas: But the other thing about the wind map is just the response it started to get. We put it up as an art piece. If you think about it, we're not labeling much of anything. We don't show state boundaries. We don't show capitals. We don't show many things.
We don't even use color, 296 01:02:35.610 --> 01:02:49.070 Fernanda Viégas: which is one of the major dimensions you have in data visualization, just because all we wanted was to see the shape of the wind. And so we thought about this very much as an art piece, 297 01:02:49.080 --> 01:03:17.750 Fernanda Viégas: and not utilitarian in any way. And yet the response was very interesting. Farmers started using the map to decide when to spray their crops, for instance. Scientists would look at the map to better understand bird migration and butterfly migration. Teachers would sit with the kids, look at the wind map, and try to figure out weather forecasts. 298 01:03:17.760 --> 01:03:27.689 Fernanda Viégas: Pilots. We started getting emails from pilots, both commercial pilots and military pilots, saying that they were using the wind map, at which point we're like, 299 01:03:27.700 --> 01:03:36.400 Fernanda Viégas: oh, no, no, no, please, no! This visualization uses surface wind, which is from zero to twelve meters. 300 01:03:36.410 --> 01:03:48.069 Fernanda Viégas: It's definitely not where you fly an airplane, and the winds up there are very different, and the pilots know this, and we had a very clear disclaimer saying this is surface wind. 301 01:03:48.080 --> 01:03:54.789 Fernanda Viégas: And yet we kept getting emails and emails, to the point where we decided to put a disclaimer on our website. We said, 302 01:03:54.800 --> 01:04:02.710 Fernanda Viégas: please do not use the map or its data to fly a plane, sail a boat, or fight wildfires, 303 01:04:02.830 --> 01:04:06.680 Fernanda Viégas: because we felt responsible. We kept getting emails. 304 01:04:06.690 --> 01:04:20.820 Fernanda Viégas: And then, after we put up the disclaimer, we started getting emails like this, where they're like, yeah, yeah, I see your disclaimer, but please respect the power of this visualization in promoting the prevention of wildfires. So 305 01:04:21.180 --> 01:04:32.739 Fernanda Viégas: it was interesting, and it told us a couple of things. Again, when you make this amount of data 306 01:04:32.850 --> 01:04:52.450 Fernanda Viégas: very easily understandable to people, because again, this is data that is already out there, it's government data, which I'm very grateful for, it just takes on a life of its own, and people appropriate it and use it in whichever ways they want, which is also really 307 01:04:52.460 --> 01:04:58.990 Fernanda Viégas: interesting and powerful to see. In fact, the technique ended up influencing 308 01:04:59.000 --> 01:05:22.420 Fernanda Viégas: most media outlets in the way they show weather, and wind specifically. We did not obfuscate our code, on purpose, and today, when you see maps of wind in all these different places, it actually comes from the wind map, which is really cool, and these are professionals. We were 309 01:05:22.430 --> 01:05:43.069 Fernanda Viégas: doing something for art, and yet, you know, it ended up there. It's also part of the permanent collection at MoMA, and it's actually being shown right now in New York City. So I just want to end on this note to say that hopefully, 310 01:05:43.080 --> 01:06:03.790 Fernanda Viégas: you know, after all these demos, I've been able to impress upon you that data visualization is much more than visualizing numbers only. It's not just for individuals. It is definitely not just for experts.
And I personally think about this as a very broad and expressive medium that invites a 311 01:06:03.800 --> 01:06:13.679 Fernanda Viégas: big variety of stakeholders into the world of data. And so, with that, thank you so much. I would love to take questions, if there are any. 312 01:06:14.760 --> 01:06:23.839 Michael Littman: You can't hear everyone clapping, but we peaked at about two hundred and five people out there listening to what you had to say. Fantastic job. Thank you for sharing 313 01:06:23.850 --> 01:06:40.500 Michael Littman: the story of the work that you do; the actual visuals are eye-popping. This is great. We do have a bunch of questions that I'd be happy to share with you as we go through. So I'm going to do them in 314 01:06:40.510 --> 01:06:44.669 Michael Littman: chronological order, so we're going to be harkening back to the beginning of the talk. 315 01:06:44.700 --> 01:06:50.189 Michael Littman: You were talking about visualizing the search term completions. 316 01:06:50.200 --> 01:06:55.690 Michael Littman: Yes, what do they call it, the Google Suggest? 317 01:06:55.700 --> 01:07:20.570 Michael Littman: Yeah, but there's a group, is it Wired that does this? They actually turn these into interviews for people. They ask, "what would Fernanda have to say about..." and then, you know, "is Fernanda really blank," and then they'd actually ask the person the questions that Google has. Um, have you tried more than two search terms? You had this beautiful way of visualizing the Venn diagram of two. What happens when you do more than two? 318 01:07:20.580 --> 01:07:34.709 Fernanda Viégas: Oh, that's interesting. We have not tried that. Two seemed like the obvious thing for us to do. Venn diagrams are hard, so once you get out of two or three, it's... 319 01:07:34.780 --> 01:07:38.919 Fernanda Viégas: It's a good question. I don't have an answer. No, we have not tried that. 320 01:07:38.930 --> 01:07:50.110 Michael Littman: Okay, all right, fantastic. And I was noticing that some of the arrows criss-crossed, and I didn't understand why they were doing that. They're kind of changing the order from where the arrows were coming out to where they landed. 321 01:07:50.120 --> 01:07:55.349 Michael Littman: Oh, it was doing that? Let me see. Yeah, yeah, 322 01:07:55.360 --> 01:08:11.329 Fernanda Viégas: that's interesting. So one of the things we try to do, and this could be why: there is an order in which Google, the API, gives you the most popular to the least popular, if you will. So there's an order there 323 01:08:11.340 --> 01:08:24.930 Fernanda Viégas: for both of those terms, right, that you're querying. Now, because we want to emphasize the places where the two queries have the same completion, it could be that 324 01:08:24.939 --> 01:08:34.689 Fernanda Viégas: the order gets messed up, and the arrows may end up putting something on the outside when it was actually more central before. But 325 01:08:34.700 --> 01:08:40.589 Michael Littman: yeah, I get that, and I saw that. But it was also before you put in the second query. One time there was an end. 326 01:08:40.600 --> 01:08:42.490 Michael Littman: Oh, it was getting this? Really? 327 01:08:42.500 --> 01:08:50.430 Michael Littman: I didn't know that. Okay, all right.
So maybe that was it. Maybe it had a guess as to what your second query was going to be, and it was pre-loaded. 328 01:08:50.560 --> 01:08:59.729 Fernanda Viégas: Oh, no, no, no, I'm looking, I'm looking. Okay, there is one thing that happens with text, one of the challenges of text, which is, 329 01:08:59.810 --> 01:09:04.869 Fernanda Viégas: imagine I have an ordered list, which I do in this case. 330 01:09:05.220 --> 01:09:28.609 Fernanda Viégas: It also matters how long the text is. So if I have, let's say, "will Brazil win the World Cup," that's a long string, or "will Brazil," whatever it is, "will Brazil nuts hurt me." But if I have a very short completion that happens to be very popular, 331 01:09:28.620 --> 01:09:45.890 Michael Littman: that arrow is going to be longer than the other ones. So it looks like they cross, but it's because we literally have different lengths of strings. Oh, wow! And we are 332 01:09:45.899 --> 01:09:46.889 Michael Littman: right. Yeah, 333 01:09:46.899 --> 01:09:48.390 Michael Littman: we are centering the text. 334 01:09:48.399 --> 01:09:54.540 Michael Littman: Yeah, thanks for clarifying. All right. And that question was from anargia dos, so I'm gonna... 335 01:09:54.780 --> 01:10:06.009 Michael Littman: I guess that was answered live. All right. So, the Wikipedia thing was very cool. You had said that the older text was black, but we were thinking maybe the older text was white. 336 01:10:06.020 --> 01:10:12.289 Fernanda Viégas: No, the older the text, the darker it becomes, 337 01:10:12.300 --> 01:10:16.790 Michael Littman: and so new additions will be white, the bright color. Yeah, 338 01:10:16.800 --> 01:10:23.399 Michael Littman: yeah. Is that a confusing thing with respect to visualization? Because when there was missing text, that was also black. 339 01:10:23.980 --> 01:10:30.150 Fernanda Viégas: Oh, yes, that's a good point. Yeah, so it's 340 01:10:30.200 --> 01:10:37.109 Fernanda Viégas: it's the kind of thing, let me put it this way: we built that visualization so that Martin and I, 341 01:10:37.400 --> 01:10:54.940 Fernanda Viégas: as curious scientists, could understand those editing dynamics. And so we knew exactly how to read those colors. But to your point, and that's a very good, very important point, if we had released this as a tool 342 01:10:54.950 --> 01:11:23.659 Michael Littman: for anyone, we would definitely have to have a very clear color legend and annotate things, so that there would be no confusion about how we're using color. So yeah, very good point. We were creating this little microscope for ourselves, but yes, when you release, you have to really pay attention to annotations and legends. Okay, so that observation was from Glen Langston. 343 01:11:23.670 --> 01:11:26.410 Michael Littman: Uh, let's see. So, um, 344 01:11:27.270 --> 01:11:28.429 Michael Littman: I don't... 345 01:11:29.070 --> 01:11:43.349 Michael Littman: Just so that everybody knows, we have this slot until 12:30 Eastern time, so that's another twenty minutes. So I'm going to prioritize a little bit with the questions. Let's see, um, 346 01:11:46.240 --> 01:11:48.989 Michael Littman: all right. So, in Drakeiki, 347 01:11:49.000 --> 01:12:02.570 Michael Littman: who, I think, is one of my program managers, said the problem with t-SNE,
so he recognized that you were using it, excuse me, that before you had said, is that a small change in one of the parameters could yield vastly different visualizations. 348 01:12:02.580 --> 01:12:16.119 Michael Littman: Yeah. And looking at this data, it's easy to fiddle with the parameters, and the results make sense. However, when data is abstract, one could not know what the clusters have clustered to. And this relates to something that you actually said in the talk as well, which is: sure, distortions happen, 349 01:12:16.130 --> 01:12:22.600 Michael Littman: but as long as you know what the distortions are, you can adjust for them, and that's fine. But, as we know with the Mercator projection, 350 01:12:22.660 --> 01:12:31.500 Michael Littman: I don't think most people know that there is a distortion, and I know that there's a distortion, but I actually don't remember what it is. It's something about 351 01:12:31.510 --> 01:12:51.790 Michael Littman: kind of where you are and the thickness of the... anyway. So I don't know how to do that. But with t-SNE, I also, I've used it, but I don't know what it's trying to optimize, what it is trying to capture, so I can't counter the distortion at all. What do you say to that? Is there a way to use visualization to understand how to understand the visualization? 352 01:12:51.800 --> 01:13:00.400 Fernanda Viégas: So this is one of the things that I think should be a very active area of research. 353 01:13:00.440 --> 01:13:09.120 Fernanda Viégas: My sense, my gut feeling, is that there are a couple of things we could start doing right now. So one is 354 01:13:09.400 --> 01:13:15.120 Fernanda Viégas: to combine data visualization and statistics. Right? So 355 01:13:15.510 --> 01:13:21.260 Michael Littman: Of course those are both things that are hard for people, probably more so the statistics. 356 01:13:21.270 --> 01:13:26.399 Fernanda Viégas: Yeah. Yeah. So yes, okay, let's take a step back. 357 01:13:26.440 --> 01:13:29.409 Fernanda Viégas: Of all the visualizations I showed today, 358 01:13:29.990 --> 01:13:42.390 Fernanda Viégas: the t-SNE kind of visualization to me is the least accessible to regular people. Right? It is the kind of visualization that you're going to have your experts 359 01:13:42.400 --> 01:13:43.380 Fernanda Viégas: look at. 360 01:13:43.730 --> 01:13:59.349 Fernanda Viégas: Having said that, I also believe we're now in a world where you're going to have cross-functional teams. For instance, let's say you are trying to launch a product, or you are trying to decide whether your machine learning system is doing a good job. 361 01:13:59.520 --> 01:14:28.809 Fernanda Viégas: Chances are you're gonna have a situation where your software engineers who know the system will be looking at t-SNE. You may have a research scientist who knows the data and hopefully understands how to interpret t-SNE plots looking at t-SNE. But then you may want to show some of that to your project manager, or even to your executive, and I think that's where a layer of translation, and also 362 01:14:28.820 --> 01:14:34.930 Fernanda Viégas: maybe some checks with statistics, are really important. So you can do things like, 363 01:14:34.940 --> 01:15:04.610 Fernanda Viégas: okay, we see this cluster. How robust is this cluster? Can we do things like: what is the centroid?
How cohesive is this cluster? When we look at all the metadata about those data points, do we see things that to us look like a cohesive cluster? So I think more analysis is important: one, to make sure you are not projecting things that aren't there, and two, so that you can have 364 01:15:04.690 --> 01:15:14.740 Fernanda Viégas: a real critical conversation with other stakeholders who are not going to be as versed as you in how these plots are distorted. 365 01:15:15.330 --> 01:15:29.589 Michael Littman: Yeah. And so, I have found this to be really relevant in my still somewhat new job as a division director. So, Julie Bums bombs I, I think from NSF, said: I see an analogy. So this is now looking at your bank loan visualizations, 366 01:15:29.600 --> 01:15:47.720 Michael Littman: an analogy between this example of bank loans and our peer review process in terms of fairness and inherent bias. This is something that we think about a lot, and at NSF we really want to try to do the right thing. But we've got lots of data, we've got lots of sort of confusing signals that we're all trying to integrate together, we've got lots of 367 01:15:47.740 --> 01:15:57.610 Michael Littman: stakeholders that are pulling us in different directions, and I know that since I've arrived, people have presented me with visualizations of, for example, these are the proposals that are coming in, and 368 01:15:57.620 --> 01:16:08.390 Michael Littman: I don't know if I'm dumb, or if I'm too sophisticated in my use of visualizations, but I look at them, and I say, I need to know how these were generated, because I don't know how to interpret these blobs of color. 369 01:16:08.400 --> 01:16:27.449 Michael Littman: It's becoming common practice for data visualization to be accessible, because I'm being bombarded with it. But it doesn't seem to be common practice yet to say, oh, and by the way, here's what it means. Right? It's just hitting us in the eyes, and we're reacting to it, but I don't know if we're 370 01:16:27.460 --> 01:16:42.390 Michael Littman: reacting appropriately. So do you have, like, best practices? I know in your artwork maybe it's less significant, but when we're actually trying to make decisions based on this data, how should we be informed about how it was generated in the first place? 371 01:16:42.400 --> 01:16:47.050 Fernanda Viégas: Yeah. So I think, to your point, there is 372 01:16:47.480 --> 01:16:57.290 Fernanda Viégas: a big difference with, like, traditional data visualization, bar charts, things like those. There I think we know we are on solid ground. 373 01:16:57.300 --> 01:16:59.790 Michael Littman: It's very literal, in a sense, right, 374 01:16:59.800 --> 01:17:00.990 Michael Littman: the interpretation. Also, 375 01:17:01.000 --> 01:17:10.730 Fernanda Viégas: I feel like we've come a long way with things like: label your axes, label your units, have a title, explain to me where this is. So 376 01:17:10.740 --> 01:17:22.670 Fernanda Viégas: all of those practices should somehow translate.
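As a concrete version of the centroid and cohesion checks she describes, here is a minimal sketch under assumptions: it uses scikit-learn's digits dataset as a stand-in, labels the apparent clusters in a t-SNE embedding, and then scores them back in the original feature space, where distances still mean something. This is not the speakers' code; it only illustrates the kind of statistical sanity check being suggested.

```python
# Check whether clusters seen in a 2-D embedding hold up in the original space.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X, _ = load_digits(return_X_y=True)

# The 2-D embedding is what people look at...
embedding = TSNE(n_components=2, random_state=0).fit_transform(X)

# ...but label the apparent clusters and score them on the original features.
labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(embedding)
print("silhouette in original feature space:", silhouette_score(X, labels))

# Per-cluster cohesion: mean distance to the centroid, in the original space.
for k in range(10):
    members = X[labels == k]
    spread = np.linalg.norm(members - members.mean(axis=0), axis=1).mean()
    print(f"cluster {k}: size={len(members)}, mean distance to centroid={spread:.1f}")
```

A low silhouette or a very diffuse cluster is a signal that the tidy blob in the embedding may be an artifact of the projection rather than structure in the data.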
But when you get to a place like t-SNE or UMAP, or those things, the axes don't mean anything, right? And so 377 01:17:22.930 --> 01:17:52.570 Fernanda Viégas: it needs to be clear to data scientists that if the axes don't mean anything, you better have more justification, or you better have more data around whether or not the clusters or the relationships you're looking at really stand up to any statistical test. I feel like those are the things that should be brought to the front, 378 01:17:52.580 --> 01:18:07.289 Fernanda Viégas: you know, as you're presenting these things to the people who didn't do the data analysis that you did. Right? So to your point, like, you are the director: it shouldn't just be a screenshot saying, this is the cluster. 379 01:18:07.300 --> 01:18:10.490 Michael Littman: That's the starting point. That's the starting point only. 380 01:18:10.500 --> 01:18:26.089 Michael Littman: Yeah. And so we should be able to poke at it a little bit and see if it's brittle, right? Because, exactly, it's very pretty and it's evocative, but let's get real. So I like that you are suggesting that additional statistics should be available. It shouldn't just be the dead snapshot. 381 01:18:26.100 --> 01:18:27.769 Michael Littman: That's great. 382 01:18:27.780 --> 01:18:52.189 Michael Littman: Somewhat related to this, Andy Rubin asks, and this is in the context of the overlapping distributions that you were showing and playing with the thresholds for the different subgroups, the blue group and the orange group. That was just a beautiful exercise. I feel like the New York Times has been doing things like this sometimes, letting people play with it. They should do yours. That was really... 383 01:18:52.200 --> 01:18:57.740 Michael Littman: It makes it very vivid that it's trade-offs, it's just trade-offs. It's not like 384 01:18:57.920 --> 01:19:11.230 Michael Littman: fair versus unfair; it's like, well, it's fair in this way, and it's unfair in this other way, so what matters to you? But if we were talking about real data, like real bank loan data or real proposal data at NSF, 385 01:19:11.240 --> 01:19:19.460 Michael Littman: isn't it true that in the real world we don't know the actual distributions? In particular, the missing piece is the counterfactual. 386 01:19:19.720 --> 01:19:33.380 Michael Littman: I guess the one critical counterfactual is: if we had given this person the loan, would they have paid us back? So for the negative people we don't get to find out; the positive people we do, and they can be labeled appropriately. But the negative people, 387 01:19:33.390 --> 01:19:38.289 Michael Littman: we don't get to directly visualize that. So we 388 01:19:38.300 --> 01:19:43.690 Fernanda Viégas: we don't have the god's-eye view of the whole thing, right? It's true, 389 01:19:44.210 --> 01:19:51.090 Fernanda Viégas: but even so, the work that you saw with the bank loan, 390 01:19:51.100 --> 01:20:00.909 Fernanda Viégas: it was a very small simulation visualization, and then from there, and from the reaction we got, we built an actual tool called the What-If Tool 391 01:20:01.000 --> 01:20:18.850 Fernanda Viégas: that, again, is available. It's online, it's open source, and you can visualize your data and try to understand what your system is doing to your data.
So, to your point, one of the things, let's say you're talking about proposals, right? 392 01:20:18.860 --> 01:20:32.850 Fernanda Viégas: The idea would be that in this tool you would be able to play with different thresholds, but then you would also be able to do things like, now let me facet. So, instead of just having one threshold, I want to facet these proposals by 393 01:20:32.860 --> 01:20:51.450 Fernanda Viégas: whatever, institutions, or by the gender of the PI, or by whatever, and now I want to see what happens if I play with these thresholds in a more sophisticated way. Right? So yes, to your point, you don't have all the counterfactuals all the time. 394 01:20:51.590 --> 01:20:54.920 Fernanda Viégas: But I think one of the things that we are really missing 395 01:20:54.940 --> 01:20:57.310 Fernanda Viégas: is to give people 396 01:20:57.430 --> 01:21:00.900 Fernanda Viégas: good ways of developing an intuition 397 01:21:01.000 --> 01:21:08.480 Fernanda Viégas: for how these choices affect things in the real world, and I think that's a little bit of what 398 01:21:08.550 --> 01:21:27.660 Fernanda Viégas: visualization tools can get you closer to. It's developing this sense of "this does not look right to me." I don't know what the ideal is, or if there is a perfect solution here; chances are, no, 399 01:21:27.800 --> 01:21:31.970 Fernanda Viégas: but I feel like we don't even have the intuition right now. 400 01:21:31.980 --> 01:21:40.089 Michael Littman: Right, and sometimes, once you develop that intuition, at least if something is off, just really off, it hits you between the eyes, and you can act on it, 401 01:21:40.100 --> 01:21:51.890 Fernanda Viégas: or you can come out of the session and say, oh my gosh, the one thing we should be asking for in these proposals is this other piece of information, 402 01:21:52.500 --> 01:21:54.190 Fernanda Viégas: and we're missing that. 403 01:21:54.200 --> 01:22:10.250 Michael Littman: So for the next round, can we have an additional piece of information, right? So it's kind of like shining a light on these different dimensions of your data that might be useful. That's great. 404 01:22:10.280 --> 01:22:25.089 Michael Littman: All right. So the Boston color visualization got people very excited, so maybe we could do this as a quick call and response, with very brief answers to a couple of these questions. Jonathan Madison asked: are there other cities that you have done this with? 405 01:22:25.100 --> 01:22:32.789 Fernanda Viégas: Oh, yes, we did Rio, where I grew up, and we did a couple, but we haven't shown them to anyone. 406 01:22:32.800 --> 01:22:37.889 Michael Littman: Okay, I think there's some hunger out there. Okay, good. 407 01:22:37.900 --> 01:22:43.280 Michael Littman: Um, Catalina Arkansas asked: 408 01:22:43.290 --> 01:23:03.289 Michael Littman: making the loop made a lot of sense, but because of global climate change the years are actually changing over time. Would there be any interest in extending that and using time as the x-axis, and seeing whether we can actually see the variations changing or the seasons shifting? 409 01:23:03.300 --> 01:23:08.540 Fernanda Viégas: Oh, I would love that. That is such a good idea. One of the challenges there 410 01:23:08.760 --> 01:23:18.970 Fernanda Viégas: is that for this project we were relying on literally a public collection of photos, so the signal is noisy.
411 01:23:18.980 --> 01:23:22.490 Fernanda Viégas: But yeah, I really like that idea. 412 01:23:22.500 --> 01:23:38.090 Michael Littman: You have bias in all kinds of ways, like people are going to be more likely to take pictures of the grass. And just kind of wrapping that up, Sandra Handy asked: okay, this is great, you pointed us to a bunch of public examples; are the URLs collected someplace? Can we share them? 413 01:23:38.100 --> 01:23:44.990 Fernanda Viégas: Oh, yes. If you just go to my website, here, I'll put it here. 414 01:23:45.000 --> 01:23:46.200 Fernanda Viégas: It's 415 01:23:46.210 --> 01:23:48.949 Michael Littman: So Blaine might have to copy it to everybody else. 416 01:23:48.960 --> 01:23:50.589 Fernanda Viégas: FernandaViegas dot com. 417 01:23:50.600 --> 01:23:53.949 Fernanda Viégas: You can see all of this work there. 418 01:23:54.190 --> 01:23:58.640 Michael Littman: All right. So it's FernandaViegas dot com. No space, no accent. Yes, dot com. 419 01:23:58.720 --> 01:24:03.090 Michael Littman: That's it, all right. And Blaine copied it for everybody. All right, fantastic. 420 01:24:03.100 --> 01:24:18.989 Michael Littman: Yeah, I just think it's wonderful how, I want to say, tactile you've made the visuals, which I guess is a kind of sense crossing, but it becomes a thing when you're manipulating it. It makes it so much more embodied and relatable. It's just great. And so I know people are going to want to play. 421 01:24:19.000 --> 01:24:20.690 Michael Littman: Oh, good. Good. 422 01:24:20.700 --> 01:24:26.690 Fernanda Viégas: Yeah. Web Seer, for instance, you can spend hours there. It's a time sink. 423 01:24:26.700 --> 01:24:29.990 Michael Littman: Yeah. So, okay, the wind visualization. 424 01:24:30.000 --> 01:24:47.489 Michael Littman: First of all, this is another example of me as a viewer not really understanding what it is that you were showing. So how do the wind readings, which I guess are made with little spinning cups at specific places, get turned into lines that are shown across an entire region? 425 01:24:47.500 --> 01:24:54.710 Fernanda Viégas: Yeah, it's a great question. So it turns out the US government relies on a huge network of sensors, 426 01:24:54.720 --> 01:25:23.660 Fernanda Viégas: and it's even more interesting than that. Each state has its own network of sensors, and they all send these readings to be collected by NOAA. And so NOAA makes this data set available. It's also a forecast data set, in the following sense: NOAA tries to forecast the wind every three hours, and then we show that forecast. And so, 427 01:25:23.670 --> 01:25:31.069 Fernanda Viégas: the closer you are to that three-hour mark, the closer you are going to be to a new reading from NOAA. Right? 428 01:25:31.080 --> 01:25:46.379 Fernanda Viégas: And literally all it is, it's the direction of the wind, it's the force, the speed of the wind. There are other dimensions, too, but we chose to literally use just speed and direction. 429 01:25:46.390 --> 01:26:08.469 Fernanda Viégas: And all we're doing, the way we visualize this, it's like a particle system where we let the particles leave little traces, and so they leave their trace, and then they finish their life, and then they are born again.
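A minimal sketch of the particle idea just described, not the wind map's actual code: particles are advected by a 2-D wind field, deposit a fading trace, and are recycled when they expire or leave the screen. The toy wind function, the canvas size, and the particle count are assumptions standing in for the NOAA forecast grid and the real renderer.

```python
# Toy 2-D particle system: drift with the wind, leave a fading trace, get reborn.
import numpy as np

W, H, N_PARTICLES, LIFETIME = 400, 300, 2000, 60

def wind(x, y):
    """Assumed toy field: returns (u, v) velocity components at each point."""
    return np.cos(y / 40.0), np.sin(x / 40.0)

rng = np.random.default_rng(0)
pos = rng.random((N_PARTICLES, 2)) * (W, H)
age = rng.integers(0, LIFETIME, N_PARTICLES)

def step(trace):
    global pos, age
    u, v = wind(pos[:, 0], pos[:, 1])
    pos += np.stack([u, v], axis=1)            # advect particles along the field
    age += 1
    dead = (age >= LIFETIME) | (pos[:, 0] < 0) | (pos[:, 0] >= W) \
         | (pos[:, 1] < 0) | (pos[:, 1] >= H)   # recycle expired or off-screen particles
    pos[dead] = rng.random((dead.sum(), 2)) * (W, H)
    age[dead] = 0
    trace *= 0.96                               # old traces fade out
    xi, yi = pos[:, 0].astype(int), pos[:, 1].astype(int)
    trace[yi, xi] = 1.0                         # particles deposit their trace
    return trace

canvas = np.zeros((H, W))
for _ in range(200):
    canvas = step(canvas)
print("non-empty trace pixels:", int((canvas > 0.01).sum()))
```

The fading-and-redepositing loop is what produces the flowing streaks; a real renderer would draw the trace buffer each frame instead of just counting pixels.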
And so literally, that's what's happening with those lines that you see moving. 430 01:26:08.800 --> 01:26:13.890 Michael Littman: So it is a somewhat complex computation that you're doing to aggregate these readings, right? 431 01:26:13.900 --> 01:26:17.289 Fernanda Viégas: It is, but NOAA does that. 432 01:26:17.300 --> 01:26:24.940 Fernanda Viégas: NOAA does that for us, and it's funny, because it doesn't happen very often, but every once in a while you will see 433 01:26:24.990 --> 01:26:43.929 Fernanda Viégas: that the readings from a state are a little off on the border. So, for instance, you will see Texas, like the wind in Texas, looks different. It's not because the wind is really different. It's just that the sensors and the bringing together, the weaving of these things, is a little off. 434 01:26:44.270 --> 01:26:45.420 I see. 435 01:26:45.460 --> 01:26:48.490 Michael Littman: All right. That's super interesting. Um, 436 01:26:48.500 --> 01:27:17.510 Michael Littman: yeah, data can be noisy, and it's dirty, is that the right word? You want us to clean that? All right, that's really interesting. So, let's see, this is a long question, so I'm not sure what to pull out of it. Well, let me ask a different question while I read this, which is related to this question of aggregating the data and then making it available the way that you were showing it. It not only was animating over an entire country-scale 437 01:27:17.520 --> 01:27:24.640 Michael Littman: area, but then you could zoom in, and it was updating as you were doing that. How much... 438 01:27:24.650 --> 01:27:38.620 Michael Littman: You see, a lot of your research is concerned with the visual display of the information: thinking about what to display, how to display it, what colors to use, what smoothness to use. It's beautiful stuff. It's definitely, like, considered. 439 01:27:38.640 --> 01:27:51.390 Michael Littman: But then, to make all this happen in real time, I would think there'd be some fairly sophisticated algorithm design. Are you leveraging other people's work to do that? Or is that also part of what you're responsible for, because you know the data in new ways? 440 01:27:51.400 --> 01:28:04.099 Fernanda Viégas: So, I mean, as much as we can reuse existing things, we will happily reuse them. But for the wind map specifically we did everything from scratch, because 441 01:28:04.340 --> 01:28:06.490 Fernanda Viégas: the tricky thing with the wind... So, 442 01:28:06.500 --> 01:28:22.860 Fernanda Viégas: even though in retrospect the way we decided to visualize wind may seem very obvious, we were stuck on this project for like two months, and we almost gave up. We have a whole talk just about the failures around the wind map. 443 01:28:22.870 --> 01:28:41.380 Fernanda Viégas: And so one of the tricky things about this kind of visualization is, you also have visualizations of fluid dynamics that kind of look like the wind map, but those are 3D. They're much more sophisticated than what we're doing. We just needed something to be 2D 444 01:28:41.390 --> 01:29:02.790 Fernanda Viégas: and to kind of leave these traces we're talking about. So we did it all from scratch, and yes, to your point, we had to make sure things were performant.
We had to do things like recycling the particles and not drawing anything that's off screen, and all of those tricks that you do. Yeah. 445 01:29:03.550 --> 01:29:05.350 Michael Littman: Okay. So that's 446 01:29:05.520 --> 01:29:08.000 Michael Littman: that's what it looked like. And it's fantastic. 447 01:29:08.010 --> 01:29:36.260 Michael Littman: Just getting it right on so many of those levels is obviously a tremendous amount of work. Let's see, Maya Plankington. First she wrote "clap, clap," because again, people all over the world today were clapping and you couldn't hear them, but Maya figured out a clever way of getting it across. She says: amazing work. I see this work being used in criminal justice for creating better and more equitable models predicting recidivism. Is there anything about data visualization that you would like to caution people about 448 01:29:36.270 --> 01:29:50.490 Michael Littman: in terms of potential misuse, especially as it pertains to understanding bias? So the more that people use this stuff, the more it could potentially mislead people, either accidentally or sometimes even on purpose. How can you caution us? 449 01:29:50.500 --> 01:29:59.920 Fernanda Viégas: Yes. So this is a little bit of the note I was talking about, with different voices being brought up. 450 01:30:01.160 --> 01:30:10.349 Fernanda Viégas: I think always having in the back of your mind that data visualization is not neutral; just because you're looking at data, it doesn't mean it's neutral in any way, 451 01:30:10.360 --> 01:30:28.050 Fernanda Viégas: and it is used to persuade. In fact, it is used all the time to persuade. So if people are presenting data visualizations to you, I mean, sometimes you just have insights, and it's wonderful, and that's what I look forward to, 452 01:30:28.060 --> 01:30:57.899 Fernanda Viégas: but also be skeptical. Be skeptical and ask questions: ask questions about the data, about the framing of the data, about what scales people are using. And again, this is going to depend a lot on what technique is being presented to you. We know how to do these things much more effectively at this point for simple graphs. So, for instance, one of the things that's very hard for people to understand, for 453 01:30:57.910 --> 01:31:10.099 Fernanda Viégas: lay users to understand, is differences in scale. You have a linear scale, then you have a log scale. People have no idea how to read those, so be mindful of that. 454 01:31:11.010 --> 01:31:14.750 Fernanda Viégas: At the same time, I feel like, hopefully, data visualization 455 01:31:14.870 --> 01:31:39.540 Fernanda Viégas: gives people more confidence to ask questions about the data and about what it is that they are seeing. I really hope that this is one of the things that we get from data visualization. So it's going to depend on the data you're looking at and on the visualization technique, but be in the mindset of being curious and asking questions of the person who created the visualization. 456 01:31:40.150 --> 01:32:07.540 Michael Littman: Okay. Well, that is fantastic advice. And I'm going to answer the last question myself. Okay, just to be clear, there have been a lot of other questions that I skipped. I apologize to the people whose questions I skipped; hopefully
you still got a lot out of the answers that we did get to talk about. And the last question is: will this recording be available? Blaine had actually posted that, and it will be posted right now. The link, which is not live yet, will have a copy of the recording, so you can find 457 01:32:07.550 --> 01:32:10.409 Michael Littman: this talk there. And 458 01:32:10.850 --> 01:32:23.489 Michael Littman: yeah, fantastic. It's been great getting a chance to talk to you. The NSF people, we're going to grab another hour of Fernanda's time at three o'clock today, so we're going to have a more informal chat just amongst 459 01:32:23.500 --> 01:32:32.389 Michael Littman: us folks. So please join us for that. And, yeah, my last question for you, of course, Fernanda, is: will Brazil win the World Cup? 460 01:32:32.400 --> 01:32:33.460 Fernanda Viégas: And 461 01:32:34.060 --> 01:32:35.289 Fernanda Viégas: I am hopeful, 462 01:32:35.300 --> 01:32:42.089 Fernanda Viégas: but I have to be very humble after that final game we had with Germany many years ago. 463 01:32:42.100 --> 01:32:45.690 Fernanda Viégas: I am humbly hoping so. 464 01:32:45.700 --> 01:32:53.890 Michael Littman: All the best on that. All right, I think that's it. I don't believe there's anything else we need to do to close this down. I think that's it, 465 01:32:53.980 --> 01:33:02.789 Michael Littman: all right. Well, thank you, everybody, for coming, and they're starting to disappear. Fernanda, it was really just delightful. 466 01:33:02.800 --> 01:33:05.109 Fernanda Viégas: Ah, thank you so much. All right. I'll see you later.