Transcript for Vladimir Vapnik: Statistical Learning
SPEAKER_02
00:00 - 01:03
The following is a conversation with Vladimir Vapnik. He's the co-inventor of the support vector machine, support vector clustering, VC theory, and many foundational ideas in statistical learning. He was born in the Soviet Union and worked at the Institute of Control Sciences in Moscow. Then, in the United States, he worked at AT&T, NEC Labs, and Facebook Research, and now is a professor at Columbia University. His work has been cited over 170,000 times. He has some very interesting ideas about artificial intelligence and the nature of learning, especially on the limits of our current approaches and the open problems in the field. This conversation is part of the MIT course on artificial general intelligence and the Artificial Intelligence podcast. If you enjoy it, please subscribe on YouTube or rate it on iTunes or your podcast provider of choice, or simply connect with me on Twitter or other social networks at Lex Fridman, spelled F-R-I-D. And now, here's my conversation with Vladimir Vapnik.
SPEAKER_01
01:19 - 01:27
Einstein famously said that God doesn't play dice. You have studied the world through the eyes of statistics.
SPEAKER_02
01:27 - 01:35
So let me ask you, in terms of the nature of reality, the fundamental nature of reality, does God play dice?
SPEAKER_00
01:37 - 03:10
We don't know some factors, and because we don't know some factors which could be important, it looks like God plays dice, but we're not sure. In philosophy, they distinguish between two positions: the position of instrumentalism, where you create a theory for prediction, and the position of realism, where you try to understand what God did. For example, if you have some mechanical laws, what is that? Is it a law which is true always and everywhere, or is it a law which allows you to predict the position of a moving element? What do you believe? Do you believe that it is God's law, that God created a world which obeys this physical law, or is it just a law for predictions? If you believe that this is a law of God and it is always true everywhere, that means that you are a realist. You are trying to really understand God's thought.
SPEAKER_01
03:10 - 03:13
So the way you see the world is as an instrumentalist?
SPEAKER_00
03:15 - 04:22
You know, I'm working on some models, models of machine learning. So in these models, we consider a setting, and we try to solve the setting, to solve the problem. And you can do it in two different ways. From the point of view of instrumentalists, and that's what everybody does now, they say the goal of machine learning is to find the rule for classification. That is true, but it is an instrument for prediction. But I can say the goal of machine learning is to learn about conditional probability, so how God plays dice: what is the probability for one, what is the probability for another, given the situation. But for prediction, I don't need this; I need the rule. For understanding, I need conditional probability.
SPEAKER_01
04:23 - 05:10
So let me just step back a little bit first to talk about, you mentioned, which I read last night, parts of the 1960 paper by Eugene Wigner on the unreasonable effectiveness of mathematics in the natural sciences. It's such a beautiful paper, by the way. It made me feel, to be honest, to confess, that my own work in the past few years on deep learning is heavily applied. It made me feel that I was missing out on some of the beauty of nature in the way that math can uncover. So let me just step away from the poetry of that for a second. How do you see the role of math in your life? Is it a tool? Is it poetry?
SPEAKER_00
05:10 - 06:22
Where does it sit? And does math, for you, have limits of what it can describe? Some people say that math is the language which God uses. Speaks to God, or uses? Uses. So I believe that this article about effectiveness, the unreasonable effectiveness of math, is saying that if you're looking at mathematical structures, they know something about reality. And most scientists from natural science, they are looking at equations and trying to understand reality. So it is the same in machine learning. If you try very carefully to look at all the equations which define conditional probability, you can understand something about reality, more than from your fantasy.
SPEAKER_01
06:23 - 06:28
So math can reveal the simple underlying principles of reality, perhaps.
SPEAKER_00
06:28 - 07:40
You know what simple means? It is very hard to discover them. But then when you discover them and look at them, you see how beautiful they are. And it is surprising why people did not see it before. You're looking at the equation and deriving it from the equations. For example, I talked yesterday about the least squares method. And people had a lot of fantasies about how to improve least squares. But if you go step by step, solving some equations, suddenly you get some term which, after thinking, you understand describes the position of the observation point. In the least squares method we throw out a lot of information. We don't look at the position of the observation point; we look only at the residuals. When you understand that, it's a very simple idea, but it's not so simple to understand, and you can derive it just from the equations.
SPEAKER_01
07:40 - 07:49
So some simple algebra, a few steps, will take you to something surprising that you then think about.
SPEAKER_00
07:49 - 08:03
And that is a proof that human intuition is not that rich and is very primitive, and it does not see very simple situations.
SPEAKER_01
08:03 - 08:31
So let me take a step back. In general, yes. But what about human intuition, ingenuity, the moments of brilliance? Do you have to be so hard on human intuition? Are there moments of brilliance in human intuition that can leap ahead of math, and then the math will catch up?
SPEAKER_00
08:32 - 08:58
I don't think so. I think that the best human intuition is put into axioms, and then it is technical: see where the axioms take you. But if you take the axioms correctly... the axioms are polished during generations of scientists, and this is integral wisdom. We use them.
SPEAKER_01
08:58 - 09:24
So that's beautifully put. But when you think of Einstein and special relativity, what is the role of imagination coming first there, in the moment of discovery of an idea? There's obviously a mix of math and out-of-the-box imagination there.
SPEAKER_00
09:25 - 10:13
That is true. But whatever I did, I excluded any imagination, because whatever I saw in machine learning that came from imagination, like features, like deep learning, is not relevant to the problem. When you are looking very carefully, from mathematical equations you derive a very simple theory, which goes far beyond, theoretically, whatever people can imagine. Because it is not good fantasy. It is just interpretation, it is just fantasy, but it is not what you need. You don't need any imagination to derive, say, the main principle of machine learning.
SPEAKER_01
10:15 - 10:37
When you think about learning and intelligence, maybe thinking about the human brain and trying to describe mathematically the process of learning, something like what happens in the human brain, do you think we have the tools currently, do you think we will ever have the tools, to try to describe that process of learning?
SPEAKER_00
10:37 - 12:27
It is not a description of what's going on. It is interpretation. It is your interpretation, and your interpretation can be wrong. You know, when the guy who invented the microscope, Leeuwenhoek, first got this instrument and nobody else had it, he kept the microscope secret. But he wrote reports to the London Academy of Science. In his reports, when he looked at the blood, he looked at everything: at the water, at the blood, at the sperm. But he described blood like a fight between queen and king. He saw blood cells, red cells, and he imagined that it was an army fighting each other. And it was his interpretation of the situation. He sent this report to the Academy of Science. They looked at it very carefully, because they believed that he was right, that he saw something. Yes. But he gave the wrong interpretation. And I believe the same can happen with the brain. The most important part... you know, I believe in human language. In some proverbs there is so much wisdom. For example, people say that one day with a great teacher is better than a thousand days of diligent studies. But if you ask what the teacher does, nobody knows, and that is intelligence. But we know from history, and now from math and machine learning, that a teacher can do a lot.
SPEAKER_01
12:27 - 12:32
So what, from a mathematical point of view, is the great teacher?
SPEAKER_00
12:32 - 13:04
I don't know, but we can say what a teacher can do. He can introduce some invariants, some predicates for creating invariants. How does he do it? I don't know, because the teacher knows reality and can describe from this reality predicates, invariants. But you know that when you're using invariants, you can decrease the number of observations a hundred times.
SPEAKER_01
13:04 - 13:37
But maybe let's try to pull that apart a little bit. I think you mentioned a piano teacher saying to the student, play like a butterfly, right? I played piano, I played guitar for a long time. Maybe it's romantic, poetic, but it feels like there's a lot of truth in that statement, there's a lot of instruction in that statement. So can you pull that apart? What is that? The language itself may not contain this information.
SPEAKER_00
13:37 - 13:44
It's not blah, blah, blah. It's what? It affects you, it affects your playing.
SPEAKER_01
13:44 - 13:56
Yes, it does, but it's not the language. It feels like... what is the information being exchanged there? What is the nature of that information? What is the representation of that information?
SPEAKER_00
13:57 - 15:11
I believe that it is sort of a predicate, but I don't know. That is exactly what intelligence in machine learning should be about, because the rest is just mathematical technique. I think that what was discovered recently is that there exist two mechanisms of learning: one is called the strong convergence mechanism, and one the weak convergence mechanism. Before, people used only one. In the weak convergence mechanism, you can use predicates. That is what 'play like a butterfly' is, and it will immediately affect your playing. You know, there is the saying: if it looks like a duck, swims like a duck, and quacks like a duck, then it is probably a duck. But let's look carefully at these predicates. 'Looks like a duck,' what does it mean? You saw many ducks; that is your training data. So you have a description of how ducks look in general.
SPEAKER_01
15:11 - 15:13
Yeah, the visual characteristics of a duck.
SPEAKER_00
15:13 - 16:46
Yeah, but you want more. And you have a model for recognition now. So you would like the theoretical description of the model to coincide with the empirical description, which you saw in the training data. So 'looks like a duck' is general. But what about 'swims like a duck'? You should know how ducks swim. You could say it plays chess like a duck; okay, a duck does not play chess, and it is a completely legal predicate, but it is useless. So how can the teacher recognize a predicate that is not useless? Up to now, we don't use these predicates in existing machine learning; we just use elements of the data. So you can't deny the fact that 'swims like a duck' and 'quacks like a duck' has humor in it, has ambiguity. Let's talk about 'swims like a duck.' It does not say 'jumps like a duck.' Why? Because it's not relevant. But that means that you know ducks, you know different ducks, you know animals, and you derive from this that it is relevant to say 'swims like a duck.'
SPEAKER_01
16:47 - 17:16
So underneath, in order for us to understand 'swims like a duck,' it feels like we need to know millions of other little pieces of information we pick up along the way. You don't think so? There doesn't need to be this knowledge base? Those statements carry some rich information that helps us understand the essence of a duck. How far are we from integrating predicates?
SPEAKER_00
17:16 - 18:18
You know, consider the complete picture of machine learning. What it does: you have a lot of functions, and then you're told it looks like a duck. You see your training data. From the training data, you recognize how you expect a duck to look. Then you remove all the functions which do not look the way you think it should look from the training data. So you decrease the set of functions from which you will pick up one. Then you give a second predicate, and again decrease the set of functions, and after that you pick up the best function you can find. It is standard machine learning. So why do you not need too many examples?
SPEAKER_01
18:18 - 18:24
Because your predicates are very good?
SPEAKER_00
18:24 - 18:30
Yes, because every predicate is invented to decrease the admissible set of functions.
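[As an aside for readers following the mechanics here: the procedure just described, filter a large set of candidate functions with predicates and only then pick the best one on the data, can be sketched in a few lines. The snippet below is a toy illustration with invented names and data, not code from Vapnik's work, and the "invariant" test it applies is deliberately crude.]

```python
# Toy sketch: shrink a large candidate function set with a predicate-style
# filter, then pick the best survivor on a tiny training set.
import numpy as np

rng = np.random.default_rng(0)

# Candidate functions: threshold classifiers f(x) = sign(w.x - b),
# standing in for a much richer function class.
candidates = [(rng.normal(size=2), rng.normal()) for _ in range(5000)]

# Tiny labeled training set (hypothetical 2D "duck vs not-duck" features).
X = np.array([[2.0, 1.0], [1.5, 0.5], [-1.0, -0.5], [-2.0, -1.5]])
y = np.array([1, 1, -1, -1])

def predict(f, X):
    w, b = f
    return np.sign(X @ w - b)

def satisfies_invariant(f, X, y, tol=0.51):
    # Crude "invariant": the average prediction on the data should match the
    # average label, mimicking how a predicate removes functions that do not
    # "look like" the training data.
    return abs(predict(f, X).mean() - y.mean()) <= tol

admissible = [f for f in candidates if satisfies_invariant(f, X, y)]
admissible = admissible or candidates  # fall back if the filter is too strict

# Finally, pick the best function from the reduced set by empirical accuracy.
best = max(admissible, key=lambda f: (predict(f, X) == y).mean())
print(len(candidates), "->", len(admissible), "candidates;",
      "best accuracy:", (predict(best, X) == y).mean())
```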
SPEAKER_01
18:32 - 18:39
So you talk about the admissible set of functions, and you talk about good functions. What makes a good function?
SPEAKER_00
18:39 - 18:52
So the admissible set of functions is a set of functions which has small capacity, or small diversity, small VC dimension, and which contains a good function inside.
SPEAKER_01
18:52 - 19:06
So, by the way, for people who don't know, VC: you're the V in the VC. How would you describe to a person what VC theory is? How would you describe the VC dimension?
SPEAKER_00
19:09 - 20:15
So a machine is capable of picking up one function from the admissible set of functions. But the set of admissible functions can be big: it could contain all continuous functions, and that is useless; you don't have so many examples to pick up a function. But it can be small. Small, we call it capacity, but maybe it is better to call it diversity: not very different functions in the set. It can be an infinite set of functions, but not very diverse, so it has a small VC dimension. When the VC dimension is small, you need a small amount of training data. So the goal is to create an admissible set of functions which has a small VC dimension and contains good functions. Then you will be able to pick up the function using a small amount of observations.
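[To make the VC dimension concrete: a standard textbook fact is that linear classifiers in the plane can shatter three points in general position, that is, realize all eight labelings, but cannot shatter four XOR-arranged points, so their VC dimension is three. The brute-force check below is an illustrative sketch of that fact; the random search for a separating line is good enough for this toy case but is not an exact test.]

```python
# Sketch: VC dimension of linear classifiers in 2D is 3.
import itertools
import numpy as np

def linearly_separable(points, labels):
    """Brute-force check: does some line w.x + b separate +1 from -1 labels?"""
    rng = np.random.default_rng(0)
    X = np.asarray(points, dtype=float)
    y = np.asarray(labels, dtype=float)
    for _ in range(20000):               # random search; fine for 3-4 points
        w = rng.normal(size=2)
        b = rng.normal()
        if np.all(np.sign(X @ w + b) == y):
            return True
    return False

def shattered(points):
    """A set is shattered if every +/-1 labeling is linearly separable."""
    n = len(points)
    return all(
        linearly_separable(points, labels)
        for labels in itertools.product([-1.0, 1.0], repeat=n)
    )

three = [(0, 0), (1, 0), (0, 1)]             # general position: shatterable
four_xor = [(0, 0), (1, 1), (0, 1), (1, 0)]  # XOR layout: not shatterable
print("3 points shattered:", shattered(three))       # True
print("4 XOR points shattered:", shattered(four_xor))  # False
```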
SPEAKER_01
20:17 - 20:31
So the task of learning is creating a set of admissible functions that has a small VC dimension, and then you've figured out a clever way of picking up...
SPEAKER_00
20:32 - 21:19
No, sorry, that is the goal of learning, which I formulated yesterday. Statistical learning theory does not involve creating the admissible set of functions. In classical learning theory, everywhere, one hundred percent of textbooks, the set of functions, the admissible set of functions, is given. But this is science about nothing, because the most difficult problem is to create the admissible set of functions: given, say, a continuum set of functions, create an admissible set of functions, which means that it has finite VC dimension, small VC dimension, and contains good functions. This was out of consideration.
SPEAKER_01
21:20 - 21:32
So what's the process of doing that? I mean, it's fascinating. What is the process of creating this admissible set of functions? That is important; that's the invariants. Can you describe invariants?
SPEAKER_00
21:32 - 23:10
Yeah, you're looking at properties of the training data. Properties means that you have some function, and you just count what is the average value of the function on the training data. You have a model, and there is the expectation of this function with respect to the model, and they should coincide. So the problem is how to pick up the functions. It could be any function; in fact, it is true for all functions. But because, as we were saying, a duck does not jump, you don't ask the question 'does it jump like a duck,' because it is trivial, it does not jump, and it doesn't help you to recognize a duck. But you know something, you know which questions to ask, and you ask whether it swims like a duck, whether it quacks like a duck. 'Looks like a duck' is the general situation. 'Looks like, say, a guy who has this illness, this disease,' that is also legal. So there is a general type of predicate, 'looks like,' and a special type of predicate, which is related to the specific problem. And that is the intelligence part of all this business, and that is where the teacher is involved.
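[To put the invariant he is describing in symbols, roughly and in notation of my own choosing, following the general form in Vapnik's papers on learning using statistical invariants: for each predicate $\psi_k$ chosen by the teacher, the predicate averaged against the model's outputs should match the predicate averaged against the observed labels,

$$\frac{1}{\ell}\sum_{i=1}^{\ell}\psi_k(x_i)\,f(x_i)\;\approx\;\frac{1}{\ell}\sum_{i=1}^{\ell}\psi_k(x_i)\,y_i,\qquad k=1,\dots,m,$$

where $(x_i, y_i)$ are the $\ell$ training examples and $f$ is the candidate conditional probability function. Each such equality cuts down the admissible set of functions before any fitting is done.]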
SPEAKER_01
23:11 - 23:34
Incorporating the specialized predicates. What do you think about deep learning, these neural networks, these arbitrary architectures, in helping accomplish some of the tasks you're thinking about? Their effectiveness or lack thereof, what are the weaknesses and what are the possible strengths?
SPEAKER_00
23:35 - 26:35
You know, I think that this is fantasy, everything like deep learning, like features. Let me give you this example. One of the greatest books is Churchill's book about the history of the Second World War. And he starts this book by describing that in old times, when a war was over, the great kings gathered together, and most of them were relatives, and they discussed what should be done, how to create peace. And they came to an agreement. And when it happened, for the first time, that the general public came to power, they were so greedy that they robbed Germany. And it was clear to everybody that it was not peace, that the peace would last only twenty years, because they were not professionals. And the same I see in machine learning. There are mathematicians who are looking at the problem from a very deep mathematical point of view, and there are computer scientists who mostly do not know mathematics. They just have an interpretation of it, and they invented a lot of blah, blah, blah interpretations, like deep learning. Why do you need deep learning? Mathematics does not know deep learning. Mathematics does not know neurons. It is just functions. If you like to say piecewise linear functions, say that, and work in the class of piecewise linear functions. But they invent something, and then they try to prove the advantage of that through interpretations, which are mostly wrong. And when that is not enough, they appeal to the brain, which they know nothing about; nobody knows what is going on in the brain. So I think it is more reliable to work with math. This is a mathematical problem; do your best to solve this problem. Try to understand that there is not only one way of convergence, the strong way of convergence; there is a weak way of convergence, which requires predicates. And if you go through all this stuff, you will see that you don't need deep learning. Even more, I would say one of the theorems, which is called the representer theorem, says that the optimal solution of the mathematical problem which describes learning is on a shallow network, not on a deep one.
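[To make the two modes of convergence he contrasts concrete, here is roughly how they are usually written; the notation is mine, following the form used in Vapnik's papers on learning using statistical invariants. Strong convergence of the approximations $f_\ell$ to the desired function $f_0$ is convergence in metric,

$$\lim_{\ell\to\infty}\int\big(f_\ell(x)-f_0(x)\big)^2\,dP(x)=0,$$

while weak convergence only requires agreement against test functions, the predicates $\psi$:

$$\lim_{\ell\to\infty}\int\big(f_\ell(x)-f_0(x)\big)\,\psi(x)\,dP(x)=0\quad\text{for all }\psi\text{ in the chosen set}.$$

Strong convergence implies weak convergence; the point of the predicates is that enforcing even a handful of weak-mode equalities already shrinks the search dramatically.]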
SPEAKER_01
26:36 - 27:20
On a shallow network. Yeah, the optimal solution is there. Absolutely. So in the end, what you're saying is exactly right. But the question is, do you see no value in throwing something on the table, playing with it, that is not math? Like a neural network, or, as you said, throwing something in the bucket, or the biological example of looking at the kings and queens of the cells with the microscope. You don't see value in imagining the cells as kings and queens and using that as inspiration and imagination for where the math will eventually lead you? You think that interpretation basically deceives you in a way that's not productive.
SPEAKER_00
27:21 - 27:41
I think that if you try to analyze the nature of learning, and especially the discussion about deep learning, it is a discussion about interpretation, not about things, not about what you can say about things.
SPEAKER_01
27:41 - 28:46
That's right, but I'm just surprised by the beauty of it, not the mathematical beauty, but the fact that it works at all. Or are you criticizing that very beauty, our human desire to interpret, to find our silly interpretations in these constructs? Let me ask you this. Are you surprised, does it inspire you, how do you feel about the success of a system like AlphaGo at beating the game of Go, using neural networks to estimate the quality of a board, the quality of a position? Yeah, yes. But it's not our interpretation. The fact is, a neural network system, it doesn't matter, a learning system that we don't, I think, mathematically understand that well, beats the best human player. It does something that was thought impossible.
SPEAKER_00
28:46 - 28:48
So we have empirically discovered that this is not a very difficult problem.
SPEAKER_01
28:57 - 29:03
That's true. So maybe... I can't argue.
SPEAKER_00
29:03 - 30:49
So, even more, I would say that if you use deep learning, it is not the most effective way from the point of view of learning theory. And usually, when people use deep learning, they are using zillions of training data. Yeah, but you don't need this. So I described a challenge: can we do some problems, which deep learning methods do well with a deep net, using a hundred times less training data? Even more, there are some problems deep learning cannot solve, because it does not necessarily create a good admissible set of functions. When you create a deep architecture, it means you create an admissible set of functions. You cannot say that you created a good admissible set of functions; that is not confirmed. But it is possible to create a good admissible set of functions because you have your training data. Actually, for mathematicians, when you consider invariants, you need to use the law of large numbers. When you do training in an existing algorithm, you need the uniform law of large numbers, which is much more difficult; it requires the VC dimension and all this stuff. But nevertheless, if you use both, the weak and the strong way of convergence, you can decrease the amount of training data a lot.
SPEAKER_01
30:49 - 30:54
Yeah, you could do the 'swims like a duck' and 'quacks like a duck.'
SPEAKER_00
30:54 - 30:54
Yeah, yeah.
SPEAKER_01
30:54 - 31:43
So let's step back and think about human intelligence in general. Clearly, that has evolved in a non-mathematical way. As far as we know, God, or whoever, didn't come up with a model and place in our brain an admissible set of functions; it kind of evolved. I don't know, maybe you have a view on this. But Alan Turing, in the fifties, in his paper, asked and rejected the question 'can machines think?' as not a very useful question. But can you briefly entertain this useless question: can machines think? Can you talk about intelligence and your view of it?
SPEAKER_00
31:43 - 33:22
I don't know. I know that Turing described imitation: if a computer can imitate a human being, let's call it intelligent. And he understood that it is not a thinking computer; he completely understood what he was doing. But he set up the problem of imitation. So now we understand that the problem is not in imitation. I'm not sure that intelligence is just inside of us. It may be also outside of us. I have several observations. When I prove some theorem that is very difficult, within a couple of years, in several places, people prove the same theorem. The same lemma: after we did it, another guy proved the same theorem. In the history of science, this happens all the time. For example, geometry: it happened simultaneously with Lobachevsky, then Gauss and Bolyai, and other guys, all within approximately a ten-year period of time. And I saw a lot of examples like that. And when mathematicians develop something, they develop something in general, which affects everybody. So maybe our model that intelligence is only inside of us is incorrect.
SPEAKER_01
33:22 - 33:24
It's our interpretation. Yeah.
SPEAKER_00
33:24 - 33:31
It might be that there exists some connection with a world intelligence. I don't know.
SPEAKER_01
33:31 - 34:04
You're almost like plugging in into... Yeah, exactly... and contributing to this network, into a big, maybe, neural network. No, no, no. On the flip side of that, maybe you can comment on big-O complexity and how you see classifying algorithms by worst-case running time in relation to their input. So that way of thinking about functions. Do you think P equals NP? Do you think that's an interesting question?
SPEAKER_00
34:04 - 35:45
Yeah, it is an interesting question. But let me talk about complexity and about the worst-case scenario. There is a mathematical setting. When I came to the United States in 1990, people did not know this theory; they did not know statistical learning theory. Because it is a mathematical tool, you can do only what you can do using mathematics, which has a clear understanding and a clear description. And for this reason, we introduce complexity, and we need it because, using this diversity, the VC dimension is one of them, you can prove some theorems. But we also created theory for the case when you know the probability measure, and that is the best case that can happen, this entropy theory. So from a mathematical point of view, you know the best possible case and the worst possible case. You can derive different models in the middle, but it's not so interesting.
SPEAKER_01
35:45 - 35:47
You think the edges are interesting?
SPEAKER_00
35:53 - 36:09
It is not so easy to get a good bound, an exact bound. It's not that in many cases you have a bound that is exact, but it reveals interesting principles, which are discovered by the math.
SPEAKER_01
36:09 - 36:39
Do you think it's interesting because it's challenging and reveals interesting principles that allow you to get those bounds? Or do you think it's interesting because it's actually very useful for understanding the essence of a function, of an algorithm? Because it's like me judging your life as a human being by the worst thing you did and the best thing you did, versus all the stuff in the middle. It seems not productive.
SPEAKER_00
36:40 - 37:06
I don't think so, because you cannot describe the situation in the middle, or it will not be general. So you can describe the edges, and it is clear it has some model, but you cannot describe a model for every new case. So you will never be accurate.
SPEAKER_01
37:08 - 37:32
But from a statistical point of view, the way you've studied functions and the nature of learning and the world, don't you think that the real world has a very long tail, that the edge cases are very far away from the mean, the stuff in the middle? Or no?
SPEAKER_00
37:34 - 38:11
I don't know. From my point of view, if you use formal statistics, you need the uniform law of large numbers. If you use this invariants business, you need just the law of large numbers. And there's a huge difference between the uniform law of large numbers and the law of large numbers.
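[The distinction he keeps returning to can be stated in one line each; these are the standard formulations, not a quote. The ordinary law of large numbers controls one fixed function,

$$\frac{1}{\ell}\sum_{i=1}^{\ell}\psi(x_i)\;\longrightarrow\;\mathbb{E}\,\psi(x)\quad\text{as }\ell\to\infty,$$

while training by empirical risk minimization needs the uniform law of large numbers over the whole admissible class $F$,

$$\sup_{f\in F}\Big|\frac{1}{\ell}\sum_{i=1}^{\ell}L\big(f(x_i),y_i\big)-\mathbb{E}\,L\big(f(x),y\big)\Big|\;\longrightarrow\;0,$$

which is where finite VC dimension of $F$ comes in: it is the condition under which this uniform convergence is guaranteed.]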
SPEAKER_01
38:11 - 38:16
Is it useful to describe that a little more, or should we just take it as it is?
SPEAKER_00
38:16 - 39:52
For example, when I was talking about the duck, I gave three predicates, and it was enough. But if you try to do a formal distinguishing, you will need a lot of observations. So that means that the information about 'looks like a duck' contains a lot of bits of information, formal bits of information. So we don't know how many bits of information such things from artificial intelligence contain, and that is the subject of analysis of all this business. I don't like how people consider artificial intelligence. They consider it as some code which imitates the activity of a human being. It is not science, it is applications. You would like to imitate? Go ahead, it is very useful, and it is a good problem. But you need to learn something more: how people can develop, say, a predicate like 'swims like a duck' or 'play like a butterfly,' or something like that. It is not what the teacher tells you; it is how it came into his mind, how he chooses the image. That is the process.
SPEAKER_01
39:52 - 40:00
That is the problem of intelligence. And you see that as connected to the problem of learning? Absolutely.
SPEAKER_00
40:00 - 40:11
They are, because you immediately give this predicate, a specific predicate, 'swims like a duck' or 'quacks like a duck.' It was chosen somehow.
SPEAKER_01
40:12 - 40:27
So what is the line of work, would you say, if you were to formulate a set of open problems, that will take us there, to 'play like a butterfly'? What will get a system to be able to do that?
SPEAKER_00
40:27 - 40:59
Let's separate two stories. One is the mathematical story, that if you have the predicate, you can do something. And the other story is how to get the predicate. That is the intelligence problem, and people have not even started understanding intelligence. Because to understand intelligence, first of all, try to understand what teachers do, how they teach, why one teacher is better than another one.
SPEAKER_01
40:59 - 41:05
Yeah, so you think we really haven't even started on the journey of generating the predicates.
SPEAKER_00
41:05 - 41:38
We don't understand; we even don't understand that this problem exists. Because, did you think about it? No, you just know the name. We don't understand why one teacher is better than another, and what the effect of the teacher on the student is. It is not because he is repeating the problem which is in the textbook. He makes some remarks. He makes some philosophy of reasoning.
SPEAKER_01
41:38 - 41:48
That's beautiful. So it is the formulation of a question that is the open problem: why is one teacher better than another?
SPEAKER_00
41:48 - 41:51
Right. What does he do better?
SPEAKER_01
41:52 - 41:57
Yeah. What, why, at every level?
SPEAKER_00
41:57 - 41:58
Uh, people?
SPEAKER_01
41:58 - 42:04
How do they get better? What does it mean to be better? The whole thing.
SPEAKER_00
42:04 - 42:24
Yeah. From whatever model I have, one teacher can give a very good predicate. One teacher can say 'swims like a duck,' and another one can say 'jumps like a duck,' and 'jumps like a duck' carries zero information.
SPEAKER_01
42:24 - 42:32
So what is the most exciting problem in statistical learning you've ever worked on or are working on now?
SPEAKER_00
42:32 - 43:15
I just finished this invariants story, and I'm happy that, I believe, it is the ultimate learning story. At least I can show that there are no other mechanisms, only two mechanisms. But they separate the statistical part from the intelligent part, and I know nothing about the intelligent part. And if we will know this intelligent part, it will help us a lot in teaching, in learning.
SPEAKER_01
43:15 - 43:16
So we'll know it when we see it.
SPEAKER_00
43:17 - 44:01
So, for example, in my talk, the last slide was a challenge. You have, say, the MNIST digit recognition problem, and deep learning claims that they did it very well, say 99.5 percent correct answers. But they use 60,000 observations. Can you do the same using a hundred times less? But incorporating invariants, what does it mean? You know digits one, two, three. Just looking at them, explain to me which invariants I should keep, to use a hundred times fewer examples and do the same job.
SPEAKER_01
44:03 - 44:16
Yeah, that last slide, unfortunately you went through it kind of quickly, but that last slide was a powerful open challenge and a formulation of the essence here.
SPEAKER_00
44:16 - 46:31
That is the exact problem of intelligence, because everybody, when machine learning started, recognized that we use many more training examples than a human being needs. But now again we came to the same story: how to decrease the number of examples. That is the problem of learning. It is not like in deep learning, where they use zillions of training data, because maybe zillions are not enough if you don't have good invariants; maybe you will never collect some number of observations. But now the question is for intelligence: how to do that. Because the statistical part is ready: as soon as you supply us with a predicate, we can do a good job with a small amount of observations. And the very first challenge is: we know digit recognition, and you know digits. So please tell me the invariants. I thought about that, and I can say that for digit three, I would introduce the concept of horizontal symmetry. The digit three has more horizontal symmetry than, say, digit two, or something like that. And as soon as I get the idea of horizontal symmetry, I can ask about vertical symmetry, diagonal symmetry, whatever, once I have the idea of symmetry. But then, looking at digits, I see that there are also metapredicates, which are not about shape: something like symmetry, like how dark the whole picture is, something like that, which can serve as a predicate.
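[As a small illustration of the kind of predicate he means, here is a "horizontal symmetry" score for digit images such as 28x28 MNIST digits: compare the top half of the image with the vertically flipped bottom half. This is my own sketch, not code from Vapnik's challenge, and the fake random image at the end is only a stand-in for a real digit.]

```python
# Horizontal-symmetry predicate for a grayscale digit image.
import numpy as np

def horizontal_symmetry(image: np.ndarray) -> float:
    """Return a score in [0, 1]; 1 means the top half mirrors the bottom half."""
    img = image.astype(float)
    top = img[: img.shape[0] // 2]
    bottom_flipped = np.flipud(img)[: img.shape[0] // 2]
    diff = np.abs(top - bottom_flipped).mean()
    scale = img.max() - img.min() + 1e-9   # normalize by the intensity range
    return 1.0 - diff / scale

# Usage: the predicate value averaged over training images should match its
# expectation under the model; that is the invariant the statistical part uses.
rng = np.random.default_rng(0)
fake_digit = rng.random((28, 28))          # stand-in for a real MNIST image
print(round(horizontal_symmetry(fake_digit), 3))
```

A digit three would typically score higher on this predicate than a digit two, which is exactly the kind of coarse, human-supplied distinction the challenge asks for.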
SPEAKER_01
46:31 - 47:19
You think such a predicate could arise? I don't know, something that's not general, meaning... It feels like, for me to be able to understand the difference between a two and a three, I would need to have had a childhood of ten to fifteen years playing with kids, going to school, being yelled at by parents, all of that, walking, jumping, looking at ducks, and only then would I be able to generate the right predicate for telling the difference between a two and a three. Or do you think there's a more efficient way?
SPEAKER_00
47:20 - 49:00
I don't know. I know for sure that you must know something more than digits. Yes. That's a powerful statement. But maybe there are several languages of description of these elements of digits, with symmetry, with some properties of geometry, with something abstract. I don't know that. But this is the problem of intelligence. In one of our articles, it is trivial to show that every example can carry not more than one bit of information in reality, because when you show an example and you say 'this is a one,' you can remove, say, the functions which do not say it is a one. The best strategy, if you can do it perfectly, is to remove half of them. But when you use one predicate, like 'looks like a duck,' you can remove much more than half of the functions, and that means that it contains a lot of information from a formal point of view. But when you have a general picture of what you want to recognize, and a general picture of the world, can you invent this predicate? And that predicate carries a lot of information.
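[A back-of-the-envelope version of the counting argument he refers to, as a sketch rather than the article's proof: if the admissible set contains $N$ functions and a single binary label can at best eliminate the half that disagree with it, then after $k$ examples the remaining set satisfies

$$N_k \;\ge\; \frac{N}{2^k},$$

so each example supplies at most one bit, $\log_2(N/N_k)\le k$. A predicate that removes all but a fraction $p\ll\tfrac12$ of the functions supplies about $\log_2(1/p)$ bits in one step, which is why one well-chosen predicate can be worth many training examples.]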
SPEAKER_01
49:03 - 49:35
Maybe it's just me, but in all the math you show, in your work, which is some of the most profound mathematical work in the fields of learning, AI, and just math in general, I hear a lot of poetry and philosophy. You really kind of talk about philosophy of science. There's a poetry and music to a lot of the work you're doing and the way you're thinking about it. So where does that come from? Do you escape to poetry? Do you escape to music, or not?
SPEAKER_00
49:35 - 49:39
There exists ground truth.
SPEAKER_01
49:39 - 49:40
There's ground truth?
SPEAKER_00
49:40 - 50:12
Yeah, and it can be seen everywhere. The smart guys, the philosophers, sometimes I am surprised how deeply they see. Sometimes I see that some of them are completely out of the subject. But the ground truth I see in music. Music is the ground truth. And in poetry, many poets, they believe that...
SPEAKER_01
50:16 - 50:32
So what piece of music, as a piece of empirical evidence, gave you a sense that they are touching something in the ground truth? Is it the structure? The structure of it?
SPEAKER_00
50:32 - 50:36
You have the same feeling.
SPEAKER_01
50:51 - 51:30
Yeah. And if you look back at your childhood, you grew up in Russia, you were born, maybe, as a researcher in Russia, you developed as a researcher in Russia, and you came to the United States and a few places. If you look back, what were some of your happiest moments as a researcher? Some of the most profound moments, not in terms of their impact on society, but in terms of their impact on how damn good you felt that day, and you remember that moment.
SPEAKER_00
51:30 - 52:07
You know, every time when you find something, it is great when it comes alive, every simple thing. But my general feeling is that most of my time I was wrong. You should go again and again and again, and try to be honest in front of yourself, not to go with your interpretation, but to try to understand how it relates to ground truth, that it is not my blah, blah, blah interpretation or something like that.
SPEAKER_01
52:07 - 52:14
But you're allowed to get excited at the possibility of discovery. You have to double-check it, but...
SPEAKER_00
52:16 - 52:56
No, but how does it relate to ground truth: is it just temporary, or is it forever? You know, you always have a feeling, when you find something, how big it is. So twenty years ago, when we discovered statistical learning theory, nobody believed it, except for one guy, Dudley, from MIT. And then, in twenty years, it became fashion. And the same with support vector machines, the kernel machines.
SPEAKER_01
52:56 - 53:15
So with support vector machines and learning theory, when you were working on it, you had a sense, you had a sense of the profundity of it, that this seems to be right, this seems to be powerful.
SPEAKER_00
53:15 - 54:12
Right, absolutely, immediately. I recognized that it will last forever. And now, when I found this invariants story, I have a feeling that it is completed, because I have proved that there are no other mechanisms. There are some cosmetic improvements you can make, but in terms of invariants, they and statistical learning theory have to work together. But also, I'm happy that we can formulate what intelligence is with that, and separate it from the technical part. That is completely different.
SPEAKER_01
54:12 - 54:16
Absolutely. Well, Vladimir, thank you so much for talking today. Thank you. It's been an honor.