Transcript for Vladimir Vapnik: Predicates, Invariants, and the Essence of Intelligence

SPEAKER_03

00:00 - 03:38

The following is a conversation with Vladimir Vapnik, part two, the second time we spoke on the podcast. He's the co-inventor of support vector machines, support vector clustering, VC theory, and many foundational ideas in statistical learning. He was born in the Soviet Union, worked at the Institute of Control Sciences in Moscow, then in the US worked at AT&T, NEC Labs, and Facebook AI Research, and now is a professor at Columbia University. His work has been cited over 200,000 times. The first time we spoke on the podcast was just over a year ago, one of the early episodes. This time, we spoke after a lecture he gave titled Complete Statistical Theory of Learning, as part of the MIT series of lectures on deep learning and AI that I organized. I'll release the video of the lecture in the next few days. This podcast and lecture are independent from each other, so you don't need one to understand the other. The lecture is quite technical and math-heavy, so if you do watch both, I recommend listening to this podcast first, since the podcast is probably a bit more accessible. This is the Artificial Intelligence podcast. If you enjoy it, subscribe on YouTube, give it 5 stars on Apple Podcast, support it on Patreon, or simply connect with me on Twitter at Lex Fridman, spelled F-R-I-D-M-A-N. As usual, I'll do one or two minutes of ads now and never any ads in the middle that can break the flow of the conversation. I hope that works for you and doesn't hurt the listening experience. This show is presented by Cash App, the number one finance app in the App Store. When you get it, use code LexPodcast. Cash App lets you send money to friends, buy Bitcoin, and invest in the stock market with as little as one dollar. Brokerage services are provided by Cash App Investing, a subsidiary of Square, and member SIPC. Since Cash App allows you to send and receive money digitally, peer to peer, and security in all digital transactions is very important, let me mention the PCI data security standard, PCI DSS Level 1, that Cash App is compliant with.
I'm a big fan of standards for safety and security, and PCI DSS is a good example of that, where a bunch of competitors got together and agreed that there needs to be a global standard around the security of transactions. Now, we just need to do the same for autonomous vehicles and AI systems in general. So again, if you get Cash App from the App Store or Google Play and use the code LexPodcast, you get $10, and Cash App will also donate $10 to FIRST, one of my favorite organizations that is helping to advance robotics and STEM education for young people around the world. And now, here's my conversation with Vladimir Vapnik. You and I talked about Alan Turing yesterday a little bit, and that he, as the father of artificial intelligence, may have instilled in our field an ethic of engineering and not science, seeking more to build intelligence rather than to understand it. What do you think is the difference between these two paths of engineering intelligence and the science of intelligence?

SPEAKER_00

03:38 - 04:50

It's a completely different story. Engineering is the imitation of human activity. You have to make a device which behaves as a human behaves, which has all the functions of a human. It does not matter how you do it. But to understand what intelligence is about is quite a different problem. So I think, I believe, that it's somehow related to predicates; we talked yesterday about it. Because look at Vladimir Propp's idea. He just found 31 predicates, he called them units, which can explain human behavior, at least in Russian tales. He looked at Russian folk tales and derived them from that. And then people realized that it is wider than Russian tales. It is in TV, in movie series, and so on and so on.

SPEAKER_03

04:51 - 05:49

So you're talking about Vladimir Propp, who in 1928 published the book Morphology of the Folktale, describing 31 predicates that have this kind of sequential structure that a lot of story narratives follow, in Russian folklore and other content. We'll talk about it. I'd like to talk about predicates in a focused way. But, if you allow me, let me stay zoomed out on our friend Alan Turing. You know, he inspired a generation with the imitation game. Yes. If we can linger on it a little bit longer, do you think learning to imitate intelligence can get us closer to the science, to understanding intelligence? Why do you think imitation is so far from understanding?

SPEAKER_00

05:49 - 07:22

I think that it is different because you have different goals. Your goal is to create something, something useful. And that is great. And you can see how much was done, and I believe that even more will be done: self-driving cars and all this business. It is great. And it was inspired by Turing's vision. But understanding is very difficult. It's more or less a philosophical category: what it means to understand the world. I believe in a scheme which starts from Plato, that there exists a world of ideas. I believe that intelligence is a world of ideas, but it is a world of pure ideas. And when you combine them with reality, with things, it creates, as in my case, invariants, which are very specific. And that, I believe, is intelligence: the combination of ideas in a way to construct invariants. But first of all, predicates. If you know the predicates, and hopefully not too many predicates exist. For example, 31 predicates for human behavior.

SPEAKER_03

07:22 - 07:49

Not a lot. Vladimir Propp used 31, you can even call them predicates, 31 predicates, to describe story narratives. Do you think human behavior, how much of human behavior, how much of our world, our universe, all the things that matter in our existence, can be summarized in predicates of the kind that Propp was working with?

SPEAKER_00

07:49 - 08:35

I think that we have a lot of forms of behavior, but I think the predicates are much fewer, because even in the examples which I gave you yesterday, you saw that one predicate can construct many different invariants depending on your data. They are applied to different data and they give different invariants. But pure ideas, maybe not so many. I don't know about that. But my guess, I hope, that's why the challenge is about digit recognition. How much you need to

SPEAKER_03

08:36 - 08:54

I think we'll talk about computer vision and 2D images a little bit, and your challenge. That's exactly about, you know, it hopes to be exactly about the spirit of intelligence in the simplest possible way.

SPEAKER_00

08:54 - 08:59

Absolutely. You should start with a simple story; otherwise you're not able to do it.

SPEAKER_03

08:59 - 09:08

Well, there's an open question whether starting at MNIST digit recognition is a step towards intelligence or whether it's an entirely different thing.

SPEAKER_00

09:09 - 09:17

I think that to beat records using 100, 200 times fewer examples, you need intelligence.

SPEAKER_03

09:17 - 09:34

You need intelligence. So let's, because you use this term and it'll be nice. I'd like to ask simple, maybe even dumb questions. Let's start with a predicate. In terms of terms and how you think about it, what is a predicate?

SPEAKER_00

09:34 - 09:49

I don't know. I have a feeling that formally they exist. But I believe there are predicates for 2D images. One of them is symmetry.

SPEAKER_03

09:49 - 10:17

Hold on a second. Sorry to interrupt and pull you back. At the simplest level, we're not being profound currently. A predicate is a statement of something that is true. Yes. Do you think of predicates as somehow probabilistic in nature, or are they binary, truly constraints, logical statements about the world?

SPEAKER_00

10:17 - 10:28

In my definition, the simplest predicate is a function, and you can use this function to make an inner product. That is a predicate.

SPEAKER_03

10:28 - 10:30

What's the input and what's the output of the function?

SPEAKER_00

10:31 - 11:20

The input is x, something which is the input in reality. Say, if you consider digit recognition, it is pixel space. Yes, input. But it is a function in pixel space, and it can be any function in pixel space. And you choose, and I believe that there are several functions which are important for understanding images. One of them is symmetry. It's not so simple a construction, as I described with the Lie derivative, with all this stuff. But another, I believe, I don't know how many, is how well structurized the picture is.
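
To make the idea of "a function on pixel space used in an inner product" concrete, here is a minimal sketch. It assumes the inner product is approximated by an average over example images; the predicate `left_ink_fraction` and the function `is_bright` are hypothetical choices made up for illustration, not predicates proposed in the conversation.

```python
import numpy as np

def left_ink_fraction(image):
    """Hypothetical predicate psi(x): fraction of total intensity in the left half."""
    total = image.sum()
    if total == 0:
        return 0.0
    return float(image[:, : image.shape[1] // 2].sum() / total)

def is_bright(image):
    """Hypothetical function f(x): 1.0 if mean intensity exceeds 0.5, else 0.0."""
    return 1.0 if image.mean() > 0.5 else 0.0

def empirical_inner_product(psi, f, images):
    """Approximate the inner product of psi and f by (1/l) * sum_i psi(x_i) * f(x_i)."""
    return float(np.mean([psi(x) * f(x) for x in images]))

# Three toy 4x4 "images" in pixel space.
images = [np.ones((4, 4)), np.zeros((4, 4)), np.eye(4)]
value = empirical_inner_product(left_ink_fraction, is_bright, images)
print(value)
```

The point of the sketch is only that a predicate takes an image and returns a number, and that pairing it with another function over a set of images yields a single scalar.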

SPEAKER_03

11:20 - 11:21

Structurized?

SPEAKER_00

11:21 - 11:21

Yeah.

SPEAKER_03

11:21 - 11:23

What do you mean by structurized?

SPEAKER_00

11:24 - 11:38

It is a formal definition. Say, something heavy in the left corner, not so heavy in the middle, and so on. You describe in general the concept of what you see.

SPEAKER_03

11:38 - 11:42

Concepts, some kind of universal concepts.

SPEAKER_00

11:42 - 11:45

Yeah. But I don't know how to formalize this.

SPEAKER_03

11:46 - 12:03

Do you? So this is the thing. There's a million ways we can talk about this. I'll keep bringing it up. But we humans have such concepts. When we look at digits, but it's hard to put them, just like you're saying now, it's hard to put them into words.

SPEAKER_00

12:03 - 12:38

You know, that is an example. When critics in music try to describe music, they use predicates, and not too many predicates, but in different combinations. They have some special words for describing music, and the same should be for images. But maybe there are critics who understand the essence of what this image is about.

SPEAKER_03

12:38 - 13:13

Do you think there exist critics who can summarize the essence of images, human beings? I hope so, yes. But who can explicitly state them on paper? The fundamental question I'm asking is: do you think there exists a small set of predicates that will summarize images? It feels to our mind like there does, that the concept of what makes a two and a three and a four?

SPEAKER_00

13:13 - 13:29

No, no, it's not on this level. It should not describe two, three, four. It describes some construction which allows you to create invariants.

SPEAKER_03

13:29 - 13:33

And invariants, sorry to stick on this, but terminology?

SPEAKER_00

13:33 - 14:37

It is a property of your image. I can say, looking at my image, it is more or less symmetric, and I can give you a value of symmetry, say a level of symmetry, using this function which I gave yesterday. And you can describe that your image has these characteristics exactly in the way musical critics describe music. But this is an invariant applied to specific data, to specific music, to something. I strongly believe in this Plato idea that there exists a world of predicates and a world of reality; predicates and reality are somehow connected, and you have to.

SPEAKER_03

14:39 - 15:12

Let's talk about Plato a little bit. So you draw a line from Plato to Hegel to Wigner to today. So Plato has forms, the theory of forms. There's a world of ideas and a world of things, as you talk about, and there's a connection. Presumably the world of ideas is very small, and the world of things is arbitrarily big. But they're all what Plato calls, like, a shadow. It's a shadow from the world of forms.

SPEAKER_00

15:12 - 15:31

Yeah, you have a projection of the world of ideas. Yeah, right. In reality, you can realize this projection using these invariants, because it is a projection onto specific examples, which creates specific features of specific objects.

SPEAKER_03

15:35 - 15:44

So the essence of intelligence is, while only being able to observe the world of things, to try to come up with the world of ideas.

SPEAKER_00

15:44 - 15:51

Exactly. Like in this music story: intelligent music critics know all this world, and they have a real feeling about what they like.

SPEAKER_03

15:51 - 16:07

I feel like that's a contradiction, intelligent music critics, but I think music is to be enjoyed in all its forms. The notion of critic like a food critic.

SPEAKER_00

16:07 - 16:08

No, I don't want that too much.

SPEAKER_03

16:09 - 16:35

That's an interesting question. There are certain elements of human psychology, of the human experience, which seem to almost contradict intelligence and reason, like emotion, like fear, like love. Are all of those things not connected in a way to the space of ideas?

SPEAKER_00

16:39 - 16:45

I just want to be concentrated on a very simple story, on digit recognition.

SPEAKER_03

16:45 - 16:49

So you don't think you have to love and fear death in order to recognize digits?

SPEAKER_00

16:49 - 17:46

I don't know, because it's so complicated. It involves a lot of stuff which I never considered. But I know about digit recognition. And I know that for digit recognition, to get a record from a small number of observations, you need predicates: not special predicates for this problem, but universal predicates, which understand the world of images. But on the first step, they understand, say, the world of handwritten digits, or characters, or something simple. And I think one of the predicates is related to symmetry.

SPEAKER_03

17:46 - 18:01

The level of symmetry. Okay, degree of symmetry. So you think symmetry at the bottom is a universal notion, and there are degrees of a single kind of symmetry, or are there many kinds of symmetries?

SPEAKER_00

18:01 - 18:32

Many kinds of symmetries. There is symmetry and anti-symmetry. Say, the letter S: it has vertical anti-symmetry. And it could be diagonal symmetry, vertical symmetry. So when you cut the letter S vertically, the upper part and the lower part go in different directions.

SPEAKER_03

18:33 - 18:39

Along the y-axis. But that's just one example of symmetry, isn't it?

SPEAKER_00

18:39 - 19:21

Right, but there is a degree of symmetry. If you apply all this Lie derivative stuff, tangent distance, whatever I described, you can have a degree of symmetry. And that is what describes the region of the image. It is the same as how you would describe this image. The same with the letter S: it has anti-symmetry. A digit is symmetric, more or less. Look for symmetry.

SPEAKER_03

19:21 - 19:43

Do you think such concepts like symmetry, predicates like symmetry, are a hierarchical set of concepts, or are these independent, distinct predicates that we want to discover, as some set of... No, there is an idea of symmetry.

SPEAKER_00

19:43 - 20:18

And you can make this idea of symmetry very general, like the degree of symmetry. The degree of symmetry can be zero, no symmetry at all, or the degree of symmetry can be, say, more or less symmetrical. But you have one of these descriptions, and symmetry can be different: as I told, horizontal, vertical, diagonal. And anti-symmetry is also a concept of symmetry.
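
One simple way to put a number on a "degree of symmetry" is sketched below. This is an illustrative stand-in, not the construction described in the lecture (which is said to involve tangent distance): it scores an image by its normalized correlation with a flipped copy of itself. Treating a 180-degree rotation as the S-like "anti-symmetry" is an assumption made here, as are all the names in the code.

```python
import numpy as np

# Each transform maps an image to the copy it should match if the symmetry holds.
TRANSFORMS = {
    "mirror_left_right": np.fliplr,
    "mirror_up_down": np.flipud,
    "diagonal": np.transpose,                    # meaningful for square images only
    "half_turn": lambda img: np.rot90(img, 2),   # assumed stand-in for S-like anti-symmetry
}

def symmetry_degree(image, kind):
    """Normalized correlation of an image with its transformed copy.

    Returns 1.0 for an exactly symmetric image and smaller values otherwise;
    for non-negative pixel values the score lies between 0 and 1.
    """
    image = np.asarray(image, dtype=float)
    energy = float((image * image).sum())
    if energy == 0.0:
        return 0.0
    return float((image * TRANSFORMS[kind](image)).sum() / energy)

# A tiny S-like pattern: unchanged by a half turn, but not by a mirror flip.
s_like = np.array([[0, 1, 1],
                   [0, 1, 0],
                   [1, 1, 0]])

degrees = {kind: symmetry_degree(s_like, kind) for kind in TRANSFORMS}
print(degrees)
```

Averaging such a score over a class of training digits would give one number per symmetry type, which is the kind of quantity the conversation goes on to use as an invariant.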

SPEAKER_03

20:18 - 20:25

What about shape in general? I mean, symmetry is a fascinating notion, but... No, I'm talking about digit.

SPEAKER_00

20:25 - 20:31

I would like to concentrate on this. I would like to know the predicates for digit recognition.

SPEAKER_03

20:31 - 20:36

Yes, but symmetry is not enough for digit recognition, right?

SPEAKER_00

20:36 - 21:22

It is not necessarily for digit recognition. It helps to create invariants which will be used when you have examples for digit recognition. You have the regular problem of digit recognition. You have examples of the first and the second class. Plus, you know that there exists a concept of symmetry. And when you're looking for a decision rule, you apply the concept of symmetry, the level of symmetry which you estimate from the data. So let's talk. Everything comes from weak convergence.

SPEAKER_03

21:23 - 21:33

What is convergence? What is weak convergence? What is strong convergence? I'm sorry, I'm going to do this to you. What are we converging from and to?

SPEAKER_00

21:33 - 21:47

What you are converging to: you would like to have a function. Say, an indicator function which indicates your digit 5, for example.

SPEAKER_03

21:47 - 21:48

A classification task?

SPEAKER_00

21:48 - 21:50

Let's talk only about classification.

SPEAKER_03

21:50 - 21:59

So classification means you will say whether this is a 5 or not or say which of the 10 digits it is.

SPEAKER_00

21:59 - 23:41

I would like to have these functions. Then I have some examples. I can consider a property of these examples, say, symmetry. And I can measure the level of symmetry for every digit. And then I can take the average from my training data, and I will consider only functions of conditional probability, which I am looking for as my decision rule, which when applied to the digits give me the same average as I observe on the training data. So actually this is a different level of description of what you want. You show not just one digit. This predicate shows a general property of all the digits which you have in mind. If you have in mind the digit 3, it gives you a property of the digit 3, and you select as the admissible set of functions only functions which keep this property. You will not consider other functions. So you are immediately looking in a smaller subset of functions.
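
One way to write down the condition described here, that a candidate function reproduces the same average of a property as the training data, is the following. The notation is assumed for illustration: $\psi$ is the predicate (for example, a level of symmetry), $f$ is the candidate conditional probability function, $(x_i, y_i)$ are the $\ell$ training pairs, and $y_i \in \{0, 1\}$ marks membership in the class of interest.

```latex
\frac{1}{\ell}\sum_{i=1}^{\ell} \psi(x_i)\, f(x_i)
\;=\;
\frac{1}{\ell}\sum_{i=1}^{\ell} \psi(x_i)\, y_i
```

A function $f$ would then count as admissible with respect to $\psi$ only if this equality holds, at least approximately, on the training data.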

SPEAKER_03

23:41 - 23:48

That's what you mean by admissible functions. Admissible functions, exactly. Which is still a pretty large set, for the number three.

SPEAKER_00

23:48 - 24:39

It's pretty large, but if you have one predicate. But according to the theory, there is strong and weak convergence. Strong convergence is convergence in functions. You're looking for one function, and you have another function, and the squared difference between them should be small. If you take the difference at every point, square it, and take the integral, it should be small. That is convergence in functions. Suppose you have some function, any function. So I would say some function converges to this function if the integral of the squared difference between them is small.

SPEAKER_03

24:39 - 24:45

That's the definition of strong convergence. That definition of strong convergence. Two functions, integral of the difference.
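
In symbols, the definition just given can be written as follows, where the names are assumed for illustration: $f_\ell$ is the sequence of estimates, $f_0$ is the function being sought, and $\mu$ is the measure over inputs.

```latex
\lim_{\ell \to \infty} \int \bigl(f_\ell(x) - f_0(x)\bigr)^2 \, d\mu(x) = 0
```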

SPEAKER_00

24:45 - 27:25

Yes. It is convergence in functions. But you have a different convergence, convergence in functionals. You take some function phi, and take the inner product of this function with the f0 function which you want to find. And that gives you some value. So you say that a set of functions converges in inner product to this function if the value of the inner product converges to the value for f0. That is for one phi. But weak convergence requires that it converge for any function of the Hilbert space. If it converges for any function of the Hilbert space, then you say that this is weak convergence. You can think of it this way: when you take the integral, that is an integral property of the function. For example, if you take sine or cosine, it is a coefficient of, say, the Fourier expansion. So if it converges for all coefficients of the Fourier expansion, then under some conditions it converges to the function you're looking for. But weak convergence means any property: convergence not pointwise, but of integral properties of the function. So weak convergence means integral properties of functions. When I am talking about predicates, I would like to formulate which integral properties I would like to have for convergence. And if I take one predicate, a function with which I measure a property, and say I will consider only functions which give me the same value with this predicate, I am selecting a set of functions which is admissible in the sense that the function which I am looking for is in this set of functions, because, checking on the training data, it gives the same value.
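
The difference between the two modes of convergence can be seen numerically in a standard textbook example, sketched here under stated assumptions: on the interval from 0 to 2π, the functions sin(nx) converge weakly to the zero function but not strongly. The code checks a single test function `phi`, whereas weak convergence requires the same behaviour for every function in the space, so this is an illustration and not a proof. All names are chosen for this sketch.

```python
import numpy as np

# Uniform grid on [0, 2*pi], fine enough to resolve sin(100 x).
x = np.linspace(0.0, 2.0 * np.pi, 200_001)
dx = x[1] - x[0]

def integrate(values):
    """Trapezoidal rule on the uniform grid x."""
    return float(dx * (values.sum() - 0.5 * (values[0] + values[-1])))

def f(n):
    """The n-th function in the sequence, sampled on the grid."""
    return np.sin(n * x)

phi = x  # one fixed test function; weak convergence needs this for every phi

# Inner products <phi, f_n>: these shrink towards 0 (analytically -2*pi/n).
inner_products = {n: integrate(phi * f(n)) for n in (1, 10, 100)}

# Squared L2 distance between f_n and the zero function: stays at pi for every n.
squared_distances = {n: integrate(f(n) ** 2) for n in (1, 10, 100)}

print(inner_products)
print(squared_distances)
```

The integral property (the inner product with `phi`) settles down to its limit even though the functions themselves never get close to the limit in the squared-difference sense.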

SPEAKER_03

27:25 - 27:29

Yes, it always has to be connected to the training data in terms of.

SPEAKER_00

27:29 - 27:38

Yeah, but the properties you can know independently of the training data. Like this guy Propp.

SPEAKER_02

27:38 - 27:39

Yeah.

SPEAKER_00

27:39 - 27:59

So there are formal properties, 31 properties. A fairy tale, a Russian fairy tale. But the Russian fairy tale is not so interesting. More interesting is that people applied them to movies, to theater, to different things, and the same works. They are universal.

SPEAKER_03

27:59 - 28:11

Well, so I would argue that there's a little bit of a difference between the kinds of things they were applied to, which are essentially stories, and digit recognition.

SPEAKER_00

28:11 - 28:13

It is the same story.

SPEAKER_03

28:13 - 28:16

You're saying digits, there's a story within the digit.

SPEAKER_00

28:16 - 28:50

Yeah. But my point is why I hope that it is possible to beat the record using not 60,000 examples, but say 100 times fewer: because instead you will give predicates. And you will select your decision not from a wide set of functions, but from a set of functions which keeps these predicates. But the predicates are not related just to digit recognition. Right.

SPEAKER_01

28:50 - 28:54

So, like in Plato's case.

SPEAKER_03

28:54 - 29:23

Do you think it's possible to automatically discover the predicates? So you basically said that the essence of intelligence is the discovery of good predicates. Yeah. Now the natural question is, You know, that's what Einstein was good at doing in physics. Can we make machines do these kinds of discovery of good predicates or is this ultimately a human endeavor?

SPEAKER_00

29:23 - 30:08

Because according to the theory of weak convergence, any function from Hilbert space can be a predicate. So you have an infinite number of predicates, and beforehand you don't know which predicate is good and which is not. But what Propp showed, and why people call it a breakthrough, is that there are not too many predicates which cover most of the situations that happen in the world.

SPEAKER_03

30:08 - 30:18

So there's a sea of predicates, and only a small number are useful for the kinds of things that happen in the world.

SPEAKER_00

30:18 - 30:26

I think that, I would say, only a small part of the predicates are very useful. Useful, all of them.

SPEAKER_03

30:28 - 30:44

Only very few are what we should call, let's say, good predicates. Very good predicates. Very good predicates. So, can we linger on it? What's your intuition? Why is it hard for a machine to discover good predicates?

SPEAKER_00

30:44 - 30:52

Even in my talk, I described how to do predicates, how to find new predicates. I'm not sure that it is very good.

SPEAKER_03

30:52 - 30:53

What did you propose in your talk?

SPEAKER_00

30:53 - 31:47

Well, in my talk, I gave an example for diabetes. When we achieve some percent, then we look for an area where some sort of predicate, which I formulate, does not keep the invariant. If it doesn't keep it, I retrain on my data: I select only functions which keep this invariant. And when I did it, I improved my performance. I can look for such predicates. I know technically how to do that. And you can, of course, do it using a machine, but I'm not sure that we will construct the smartest predicates.

SPEAKER_03

31:48 - 32:11

But allow me to linger on it, because that's the essence, that's the challenge of artificial intelligence. The human-level intelligence that we seek is the discovery of these good predicates. You've talked about deep learning in a way that says the predicates they use and the functions are mediocre, and that we can find better ones.

SPEAKER_00

32:12 - 32:31

Let's talk about deep learning. Sure. I know only Yann LeCun's convolutional network. And what else? I don't know. It's a very simple convolution. There's not much else to know. I can do it like that. This is one predicate.

SPEAKER_03

32:31 - 32:33

It is. Convolution is a single predicate.

SPEAKER_00

32:33 - 32:48

It's a single predicate. Yes, it is exactly that: you take the derivative for translation, and the predicate is that this should be kept.

SPEAKER_03

32:48 - 32:52

So that's a single predicate, but humans discovered that one, or at least...

SPEAKER_00

32:52 - 33:19

Note that there are not too many predicates like this. And that is a big story, because Yann did it 25 years ago, and nothing so clear has been added to deep networks since. And then I don't understand why we should talk about deep networks instead of talking about piecewise linear functions which keep these predicates.

SPEAKER_03

33:24 - 34:40

Maybe the number of predicates necessary to solve general intelligence, say in the space of images, doing efficient recognition of handwritten digits, is very small. And so we shouldn't be so obsessed about finding them. We'll find other good predicates, like convolution, for example. There have been other advancements. If you look at the work with attention, there are attentional mechanisms, especially used in natural language, focusing the network's ability to learn which part of the input to look at. The thing is, there are other things besides predicates that are important for the actual engineering mechanism of showing how much you can really do given these predicates. I mean, that's essentially the work of deep learning: constructing architectures that are able, given the training data, to converge towards a function that can generalize well.

SPEAKER_00

34:40 - 35:02

It's an engineering problem. I understand. But let's talk not on the emotional level, but on the mathematical level. You have a set of piecewise linear functions. It is all possible neural networks. It's just piecewise linear functions, with many, many pieces.
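
The claim that a neural network is a piecewise linear function can be checked directly on a toy case. The sketch below assumes ReLU activations (the statement does not hold for smooth activations such as tanh) and uses a hypothetical one-input network with three hidden units whose weights were picked by hand for illustration.

```python
import numpy as np

# Hypothetical hand-picked weights for a 1-input, 3-hidden-unit, 1-output ReLU network.
W1 = np.array([1.0, -1.0, 2.0])   # input-to-hidden weights
b1 = np.array([0.0, 0.5, -3.0])   # hidden biases; units switch at x = 0, 0.5, 1.5
w2 = np.array([1.0, 0.5, -1.0])   # hidden-to-output weights
b2 = 0.1                          # output bias

def relu_net(x):
    """Evaluate the network on a 1-D array of inputs."""
    hidden = np.maximum(0.0, np.outer(x, W1) + b1)  # shape (len(x), 3)
    return hidden @ w2 + b2

x = np.linspace(-2.0, 2.0, 401)
y = relu_net(x)

# On a uniform grid, the second difference of a linear function is zero,
# so it can only be non-zero next to a point where the slope changes.
second_diff = y[2:] - 2.0 * y[1:-1] + y[:-2]
bend_points = int(np.sum(np.abs(second_diff) > 1e-9))
print(bend_points)
```

Each hidden unit contributes at most one point where the slope changes, so this network is linear everywhere except near three inputs. Deeper and wider networks have many more pieces, but each piece is still linear.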

SPEAKER_03

35:02 - 35:04

Large, large number of pieces.

SPEAKER_00

35:04 - 35:17

Exactly, but very large. Very large. It's still simpler than, say, a reproducing kernel Hilbert space, which has a Hilbert set of functions.

SPEAKER_01

35:17 - 35:20

What's Hilbert space?

SPEAKER_00

35:20 - 35:48

It's a space with an infinite number of coordinates, say a Fourier expansion, something like that. So it's much richer. So when I am talking about a closed-form solution, I talk about this set of functions, not the piecewise linear set, which is a particular case. It's a small part of the space you're talking about. A small set of functions.

SPEAKER_03

35:48 - 35:48

Let me take it.

SPEAKER_00

35:57 - 37:19

But it is fine. It is fine. I don't want to discuss whether it is small or big. So you have some set of functions. So now, when you're trying to create an architecture, you would like to create an admissible set of functions, with all your tricks, to use not all functions but some subset of this set of functions. Say, when you're introducing a convolutional net, it is a way to make this subset useful for you. But from my point of view, convolution is something where you want to keep some invariants, say translation invariance. But now, if you understand this, and you cannot explain on the level of ideas what a neural network does, you should agree that it is much better to have a set of functions and say: this set of functions should be admissible, it must keep this invariant and that invariant. You know that as soon as you incorporate a new invariant, the set of functions becomes smaller and smaller and smaller.

SPEAKER_03

37:19 - 37:22

But all the invariants are specified by you, the human.

SPEAKER_00

37:23 - 38:30

Yeah, but what I hope is that there exist such predicates, like Propp showed. That is what I want to find for digit recognition. If we start, it is a completely new area: what intelligence is about, on the level starting from Plato's idea, what the world of ideas is. And I believe that there are not too many. But you know, it is amusing that mathematicians are doing something in neural networks, in general functions, but people from literature, from art, they use this all the time. It is great how people describe music. We should learn from that, something on this level. And that is why Vladimir Propp, who was just a theoretician who studied theoretical literature, found that.

SPEAKER_03

38:30 - 39:01

You know, let me throw that right back at you, because there's something a little less mathematical and more emotional, philosophical, about Vladimir Propp. I mean, he wasn't doing math. And you just said another emotional statement, which is, you believe that this Platonic world of ideas is small. I hope. I hope. What's your intuition, though, if we can linger on it?

SPEAKER_00

39:01 - 39:44

Yeah, you know, not just small or big. I know exactly: when I am introducing some predicate, I decrease the set of functions. But my goal is to decrease the set of functions as much as possible. A good predicate is one which does this. Then I should choose the next predicate which decreases the set as much as possible. So the set of good predicates is such that they decrease the amount of admissible functions.

SPEAKER_03

39:44 - 39:51

So if each good predicate significantly reduces the set of admissible functions, then there naturally should not be that many good predicates.

SPEAKER_00

39:51 - 40:08

Yeah. But if you reduce it very well, the VC dimension of the admissible set of functions is small, and you do not need too much training data to do well.
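
The way each predicate shrinks the admissible set can be illustrated with a toy enumeration. The sketch below is an assumption-laden simplification: it takes every possible 0/1 labeling of six training points as the candidate set and keeps only labelings that reproduce the same predicate sum as the true labels. The labels `y` and the two predicate vectors are hypothetical, and counting a finite set is only a rough stand-in for capacity, not a VC dimension computation.

```python
from itertools import product

import numpy as np

# Hypothetical training labels for six examples.
y = np.array([1, 1, 0, 1, 0, 0])

# Hypothetical predicates, given by their values psi(x_i) on the six examples.
predicates = [
    np.array([1, 1, 1, 1, 1, 1]),  # constant predicate: fixes the class frequency
    np.array([1, 1, 1, 0, 0, 0]),  # e.g. an indicator of "symmetric-looking" examples
]

# Candidate set: every possible 0/1 labeling of the six examples (2**6 functions).
candidates = [np.array(bits) for bits in product([0, 1], repeat=len(y))]

def keeps_invariant(f, psi):
    """True if sum_i psi(x_i) f(x_i) equals sum_i psi(x_i) y_i."""
    return int(psi @ f) == int(psi @ y)

counts = [len(candidates)]
admissible = candidates
for psi in predicates:
    admissible = [f for f in admissible if keeps_invariant(f, psi)]
    counts.append(len(admissible))

print(counts)
```

In this toy setting the candidate set drops from 64 to 20 to 9 functions as the two invariants are imposed, which is the narrowing effect described in the conversation.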

SPEAKER_03

40:10 - 40:14

And VC dimension, by the way, is a measure of the capacity of this set of functions.

SPEAKER_00

40:14 - 40:55

Right. Roughly speaking, how many functions are in this set. So you're decreasing, decreasing, and it becomes easier for you to find the function you're looking for. So the most important part is to create a good admissible set of functions. And probably there are many ways, but good predicates are such that they can do that. So for this duck, you should know a little bit about ducks, because... What are the three fundamental laws of ducks? Looks like a duck, swims like a duck, and quacks like a duck.

SPEAKER_03

40:55 - 40:58

You should know something about ducks to be... Of course.

SPEAKER_00

40:58 - 41:27

Not necessarily. Looks like, say, a horse; it's also good. So it generalizes from the duck. Yes. And talks like, makes sounds like a horse, or something. And runs like a horse, and moves like a horse. It is general. It is a general predicate that is applied to the duck. But for the duck, you can say: plays chess like a duck.

SPEAKER_03

41:27 - 41:38

You cannot say plays chess. Why not? So you're saying you can, but it would not be a good... No, you will not reduce a lot of functions. Yeah, you would not reduce the set of functions.

SPEAKER_00

41:38 - 41:56

So the story, the formal story, the mathematical story, is that you can use any function you want as a predicate. But some of them are good and some of them are not, because some of them reduce the admissible set of functions a lot, and some of them do not.

SPEAKER_03

41:56 - 42:09

But the question is, and I'll probably keep asking this question: how do we find them? What's your intuition in handwritten recognition? How do we find the answer to your challenge?

SPEAKER_00

42:09 - 42:41

Yeah, I understand it like that. I understand what to find. What does it mean, a new predicate? Like a guy who understands music can say these are the words which he uses to describe it when he is listening to music. He understands music. He uses not too many different words. Or you can do like Propp: you can make a collection of what he is talking about in music, about this, about that. It's not too many different situations he describes.

SPEAKER_03

42:42 - 43:37

Because we mentioned Vladimir Propp a bunch, let me just mention there's a sequence of 31 structural notions that are common in stories. And I think he called them units. And I think they resonate. It starts, just to give an example, with absentation: a member of the hero's community or family leaves the security of the home environment. Then it goes to interdiction: a forbidding edict or command is passed upon the hero. Don't go there, don't do this. The hero is warned against some action. Then, step three, violation of the interdiction: you know, break the rules, break out on your own. Then reconnaissance: the villain makes an effort to attain knowledge needed to fulfill their plan. And so on. It goes on like this and ends in a wedding, number 31. Happily ever after.

SPEAKER_00

43:37 - 43:57

No, he just gave a description of all situations. He understands this world of folk tales. Yeah, of folk stories. And these stories are not just in folk tales. These stories are in detective serials as well.

SPEAKER_03

43:57 - 44:00

And probably in our lives, we probably live.

SPEAKER_00

44:00 - 44:15

At the end he wrote that these predicates are good for different situations: for movies, for theater.

SPEAKER_03

44:15 - 44:27

By the way, there's also criticism, right? There's another way to interpret narratives, from Claude Lévi-Strauss.

SPEAKER_00

44:27 - 44:29

I don't know. I'm not in this business.

SPEAKER_03

44:29 - 44:32

No, no, it's theoretical literature, but it's looking at paradigms.

SPEAKER_00

44:32 - 44:48

It's always the... It's not too many units that can describe, but they try to lead us to the units.

SPEAKER_03

44:48 - 44:51

Exactly, another service itself.

SPEAKER_00

44:51 - 44:57

We need a set of predicates. It's not possible. But they exist, probably.

SPEAKER_03

44:58 - 45:16

My question is whether given those units, whether without our human brains to interpret these units, they would still hold as much power as they have, meaning are those units enough when we give them to an alien species?

SPEAKER_00

45:16 - 45:34

Let me ask you: do you understand digit images? No, I don't understand. No, no, no. When you can recognize these digit images, it means you understand. You understand characters, you understand.

SPEAKER_03

45:34 - 45:46

No, no, no, no. It's the imitation versus understanding question because I don't understand the mechanism by which I am.

SPEAKER_00

45:46 - 46:00

I'm not talking about that, I'm talking about predicates. You understand that it involves symmetry, maybe structure, maybe something else. I cannot formulate it; I was just able to find symmetries.

SPEAKER_03

46:00 - 46:43

That's really good, so this is a good line. I feel like I understand the basic elements of what makes a good handwritten recognition system on my own. Like, symmetry connects with me. It seems like that's a very powerful predicate. My question is, is there a lot more going on that we're not able to introspect? Maybe I need to be able to understand a huge amount in the world of ideas, thousands of predicates, millions of predicates, in order to do handwritten recognition. I don't think so. So that's both your hope and your intuition.

SPEAKER_00

46:43 - 47:13

No, I understand. Very frankly. You're using digits, you're using examples as well. So what the theory says is that if you use all possible functions from Hilbert space, all possible predicates, you don't need training data. You will just have an admissible set of functions which contains one function.

SPEAKER_03

47:13 - 47:26

Yes. So the trade-off is: when you're not using all predicates, when you're only using a few good predicates, you need to have some training data. Yes, exactly. The more good predicates you have, the less training data you need.

SPEAKER_00

47:26 - 47:29

Exactly. That is intelligent learning.
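
The trade-off described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not something stated in the conversation: the candidate set is a toy grid of interval classifiers on [0, 1], the sample has eight labelled points, and the two predicates (`constant` and `position`) are hypothetical choices. A candidate f stays admissible only if, for every predicate psi, the sample average of psi(x)·f(x) matches the sample average of psi(x)·y within a tolerance.

```python
# Toy illustration (all names and numbers are assumptions): each predicate
# adds one invariant, and each invariant can only shrink the admissible set.

# Candidate functions: f(x) = 1 if a <= x < b else 0, for a < b on a 0.1 grid.
grid = [i / 10 for i in range(11)]
candidates = [(a, b) for a in grid for b in grid if a < b]

def classify(interval, x):
    a, b = interval
    return 1 if a <= x < b else 0

# Small labelled sample; the rule generating the labels is x >= 0.5.
xs = [0.05, 0.2, 0.35, 0.45, 0.55, 0.7, 0.85, 0.95]
ys = [1 if x >= 0.5 else 0 for x in xs]

def mean(values):
    return sum(values) / len(values)

def keeps_invariant(interval, psi, tol=0.01):
    """True if mean(psi(x) * f(x)) and mean(psi(x) * y) agree within tol."""
    lhs = mean([psi(x) * classify(interval, x) for x in xs])
    rhs = mean([psi(x) * y for x, y in zip(xs, ys)])
    return abs(lhs - rhs) <= tol

def admissible(predicates):
    """Candidates that keep the invariant of every given predicate."""
    return [c for c in candidates
            if all(keeps_invariant(c, psi) for psi in predicates)]

def constant(x):   # hypothetical predicate: matches the frequency of class 1
    return 1.0

def position(x):   # hypothetical predicate: matches the average position of class 1
    return x

no_predicates = admissible([])
one_predicate = admissible([constant])
two_predicates = admissible([constant, position])

print(len(no_predicates), len(one_predicate), len(two_predicates))
print(two_predicates)
```

In this toy setting the second predicate leaves a single candidate, so no further selection from the data is needed. With fewer or weaker predicates, more candidates survive and the training examples have to do the remaining work.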

SPEAKER_03

47:30 - 47:54

Still, okay. I'm gonna keep asking the same dumb question. Handwritten recognition: to solve the challenge, you kind of propose a challenge that says we should be able to get state-of-the-art MNIST error rates by using very few, 60, maybe fewer examples per digit. What kind of predicates do you think you'll... That is the challenge.

SPEAKER_00

47:54 - 47:58

So, people who will solve this problem, they will answer. They will answer them.

SPEAKER_03

47:58 - 48:03

Do you think they'll be able to answer it in a human-explainable way?

SPEAKER_00

48:03 - 48:06

It is just: you write a function.

SPEAKER_03

48:06 - 48:21

That's it. But, so can that function be written, I guess, by an automated reasoning system? Whether we're talking about a neural network learning a particular function or another mechanism?

SPEAKER_00

48:22 - 48:34

No, I'm not against neural networks. I'm against the admissible set of functions which a neural network creates. You don't do it by invariants, by predicates, by reasoning.

SPEAKER_03

48:41 - 49:25

But neural networks can then do the reverse step of helping you find a function. The task of a neural network is to find a disentangled representation, for example. To find that one predicate function that really captures some kind of essence. Not the entire essence, but one very useful essence of this particular visual space. Do you think that's possible? Listen, I'm grasping, hoping there's an automated way to find good predicates, right? So the question is, what are the mechanisms of finding good predicates that you think we should pursue?

SPEAKER_00

49:25 - 49:54

Are you on grad school listening? I give an example. So find a situation where the predicates which you're suggesting don't create an invariant. It's like in physics: find a situation where the existing theory cannot explain it.

SPEAKER_03

49:54 - 49:59

Find a situation where the existing theory cannot explain this. So you're finding contradictions.

SPEAKER_00

49:59 - 50:14

Find a contradiction. And then remove this contradiction. But in my case, what does contradiction mean? You find a function such that, if you use this function, you're not keeping the invariant.

SPEAKER_03

50:14 - 50:18

This is really the process of discovering contradictions.

SPEAKER_00

50:18 - 50:54

Yeah. It is like in physics: find a situation where you have a contradiction for one of the properties, for one of the predicates. Then include this predicate, making an invariant. And solve the problem again; now you don't have the contradiction. But it is probably not the best way, I don't know, to look for predicates. That's just one way. Okay. No, no, it is the brute-force way. The brute-force way.

SPEAKER_03

50:54 - 51:22

What about the ideas of, uh, the big umbrella term of symbolic AI? There's what was done in the 80s with expert systems, sort of logic-reasoning-based systems. Is there hope there to find some sort of deductive reasoning to find good predicates?

SPEAKER_00

51:22 - 51:28

I don't think so. I think that just logic is not enough.

SPEAKER_03

51:29 - 51:42

It's kind of a compelling notion though, you know, that when smart people sit in a room and reason through things, it seems compelling and making our machines do the same is also compelling.

SPEAKER_00

51:42 - 52:44

So everything is very simple. When you have an infinite number of predicates, you can choose the function you want. You have invariants and you can use the function you want. But you have to have not too many invariants to solve the problem. So you have to select, from an infinite number of functions, a finite number, and hopefully a small number of functions, which is good enough to extract a small set of admissible functions. They will be admissible, that's for sure, because every function just decreases the set of functions, leaving it admissible. But it will be small.

SPEAKER_03

52:44 - 53:14

But why do you think logic-based systems can't help? Is it intuition? No, because you should know reality, you should know life. This guy, like Propp, he knows something, and he tried to put his understanding into invariants. That's the human, yeah. You see, you're putting too much value into Vladimir Propp knowing something.

SPEAKER_00

53:15 - 53:21

No, it is my decision. What means you know life?

SPEAKER_03

53:21 - 53:24

What do you mean, you know common sense?

SPEAKER_00

53:24 - 53:30

No, no, you know something common sense, it is some rules.

SPEAKER_02

53:30 - 53:31

You think so?

SPEAKER_03

53:31 - 53:54

Common sense is simply rules? Common sense is, it's mortality, it's fear of death. It's love, it's spirituality. It's happiness and sadness. All of it is tied up into understanding gravity, which is what we think of as common sense.

SPEAKER_00

53:54 - 54:02

I don't think it's so broad. I want to discuss understanding of digit recognition.

SPEAKER_02

54:02 - 54:08

Any time I bring up love and death, you bring it back to digit recognition.

SPEAKER_00

54:08 - 54:21

No, you know, it is doable because there is a challenge. Yeah, I see how to solve it. If I have a student concentrating on this work, I will suggest something to solve.

SPEAKER_03

54:21 - 54:27

You mean handwritten recognition? Yeah, it's a beautifully simple elegant and yet.

SPEAKER_00

54:27 - 54:47

I think that I know invariants which will solve this. You do? I think so. But it is not... It is maybe... I want some universal invariants which are good not only for digit recognition, but for image understanding.

SPEAKER_03

54:47 - 55:16

So let me ask, how hard do you think 2D image understanding is? So if we can kind of intuit handwritten recognition, how big of a step, leap, journey is it from that? If I solve your challenge for handwriting recognition, how long would my journey then be from that to understanding more general natural images?

SPEAKER_00

55:16 - 55:57

Immediately, you will understand this. As soon as you make a record, because this is not for free. As soon as you create several invariants which help you get the same performance that the best neural net achieved, using a hundred, maybe more than a hundred, times fewer examples, you have to have something smart to do that. And you're saying that that is an invariant? It is a predicate, because you should put in some idea of how to do that. But okay.

SPEAKER_03

55:58 - 56:38

Let me just pause, maybe it's a trivial point, maybe not. But handwritten recognition feels like a 2D, two-dimensional problem. And it seems like it is much more complicated by the fact that most images are projections of a three-dimensional world onto a 2D plane. It feels like for a three-dimensional world, we need to start understanding common sense in order to understand an image. It's no longer visual shape and symmetry. It's having to start to understand concepts, to understand life.

SPEAKER_00

56:38 - 57:14

And potentially a much larger number. You know, maybe, but let's start from simple. Well, yeah, but you said that... I cannot think about things which I don't understand. This I understand, but I'm sure that I don't understand everything there. That's the difference. Do as simple as possible, but not simpler. And that is exactly the case with handwritten digits.

SPEAKER_03

57:16 - 58:06

Yeah, but that's the difference between you and I. I welcome and enjoy thinking about things that I completely don't understand. To me, it's a natural extension, without having solved handwritten recognition, to wonder how difficult the next step is, of understanding 2D and 3D images. Because ultimately, while the science of intelligence is fascinating, it's also fascinating to see how that maps to the engineering of intelligence, and recognizing handwritten digits may not help you with the problem of general intelligence. I would like to make a remark. I start not from a very primitive problem.

SPEAKER_00

58:06 - 58:30

I start with a very general problem, with Plato. So you understand, it comes from Plato to digit recognition.

SPEAKER_03

58:30 - 58:44

So you basically took Plato and the world of forms and ideas and mapped and projected into the clearest simplest formulation of that big world.

SPEAKER_00

58:44 - 59:02

I would say that I did not understand Plato until recently, until I considered weak convergence and then predicates, and then, oh, this is what Plato told.

SPEAKER_03

59:02 - 59:09

Can you linger on that? Like, how do you think about this world of ideas and world of things in Plato?

SPEAKER_00

59:09 - 59:13

No, it is metaphor. It is metaphor for sure.

SPEAKER_03

59:13 - 59:16

It's a poetic and a beautiful, but what can you

SPEAKER_00

59:17 - 59:46

But it is the way you should try to understand how to attack ideas in the world. So from my point of view, it is very clear, but it is a line; all the time people were looking for that. Say, Plato, then Hegel: whatever is reasonable exists, whatever exists is reasonable. I don't know what he had in mind by reasonable.

SPEAKER_03

59:47 - 59:48

Right, there's philosophers again.

SPEAKER_00

59:48 - 01:00:08

No, no, no.

SPEAKER_03

01:00:11 - 01:00:20

There are abstractions, ideas that represent our world. And we should always try to reach into that.

SPEAKER_00

01:00:20 - 01:00:34

But you should make a projection onto reality. But understanding is abstract ideas. You have in your mind several abstract ideas, which you can apply to reality.

SPEAKER_03

01:00:34 - 01:01:33

And reality, in this case, if you look at machine learning, is data. Let me put this on you because I'm an emotional creature. I'm not a mathematical creature like you. I find compelling the idea, forget the space, the sea of functions, that there's also a sea of data in the world. And I find compelling that there might be, like you said, with a teacher, small examples of data that are most useful for discovering good predicates or good functions; that the selection of data may be a powerful journey, a useful mechanism. You know, coming up with a mechanism for selecting good data might be useful too. Do you find this idea of finding the right data set interesting at all? Or do you kind of take the data set as a given?

SPEAKER_00

01:01:34 - 01:02:48

I think that it is, you know, my scheme is very simple. You have a huge set of functions and not too much data. If you pick a function which describes this data, you will not do very well. You will randomly pick one up? Yeah, it will be overfitting. So you should decrease the set of functions from which you're picking one. So you should go somehow to an admissible set of functions. And this is what weak convergence is about. But from another point of view, to make an admissible set of functions, you just need a function with which you take the inner product, with which you measure a property of your function. And that is how it works.

SPEAKER_03

01:02:48 - 01:02:51

No, I get, I get an understanding of it. Do you, the reality?

SPEAKER_00

01:02:51 - 01:03:16

But let's think about examples. You have a huge set of functions and you have several examples. If you just try to pick a function which satisfies these examples, you will still overfit. You need to decrease it; you need an admissible set of functions.

SPEAKER_03

01:03:16 - 01:03:36

Absolutely. But what if, say, you have more data than functions? I mean, maybe not more data than functions, because that's impossible. But I was trying to be poetic for a second. I mean, you have a huge amount of data, a huge amount of examples.

SPEAKER_00

01:03:36 - 01:03:40

But the set of functions can be even bigger. I understand.

SPEAKER_03

01:03:42 - 01:04:02

There's always a bigger one. Full Hilbert space. Oh, I got you. But, okay. But you don't find the world of data to be an interesting optimization space. Like, the optimization should be in the space of functions.

SPEAKER_00

01:04:02 - 01:04:24

Creating an admissible set of functions. No, you know, even from the classical story, from structural risk minimization, you should organize functions in the way that they will be useful for you.

SPEAKER_03

01:04:24 - 01:04:44

And the way you're thinking about useful is: you're given a small set of functions which contains the function you're looking for. Yeah, but you're looking for it based on the empirical set of small examples.

SPEAKER_00

01:04:44 - 01:05:15

Yeah, but that is another story. I don't touch it because I believe that this small number of examples is not too small. So with 60 per class, the law of large numbers works. I don't need the uniform law. The story is that in statistics there are two laws: the law of large numbers and the uniform law of large numbers. So I want to be in a situation where I use the law of large numbers but not the uniform law of large numbers.

SPEAKER_03

01:05:15 - 01:05:18

So 60 is law of large numbers?

SPEAKER_00

01:05:18 - 01:05:47

I hope. No, it still needs some evaluations, some bounds. But the deal is the following: if you trust that, say, this average gives you something close to the expectation, then you can talk about this predicate. And that is the basis of human intelligence.

SPEAKER_03

01:05:47 - 01:05:52

Right. The discovery of good predicates is the basis of human intelligence.

SPEAKER_00

01:05:52 - 01:06:44

It is discovery of your understanding of the world, of your methodology of understanding the world. Because you have several functions which you will apply to reality. You have several functions, but they are abstract. Then you apply them to reality, to your data, and you create in this way a predicate which is useful for your task. But the predicates are not related specifically to your task, to this set of tasks. They are abstract functions which can be applied to many tasks that you might be interested in. It might be many tasks, I don't know.

SPEAKER_03

01:06:44 - 01:06:48

On different tasks. Well, there should be many tasks, right?

SPEAKER_00

01:06:48 - 01:06:57

Yeah, it is like in Propp's case. It was for fairy tales, but it's okay to interpret it.

SPEAKER_03

01:06:57 - 01:07:04

Okay, so we talked about images a little bit. Can we talk about Noam Chomsky for a second? I don't know him.

SPEAKER_00

01:07:04 - 01:07:15

I don't know him. Not personally, I don't know. His ideas...

SPEAKER_03

01:07:15 - 01:07:31

So let me just say, do you think language, human language, is essential to expressing ideas, as Noam Chomsky believes? So language is at the core of our formation of predicates, the human language?

SPEAKER_00

01:07:31 - 01:07:53

For me, language and all the story of language is very complicated. I don't understand this. And I'm not, I thought about nobody. I'm not ready to work on that because it's so huge. It is not for me and I believe not for our century.

SPEAKER_03

01:07:53 - 01:07:54

It's that 21st century.

SPEAKER_00

01:07:54 - 01:08:02

Not for the 21st century. So we should learn something, a lot of stuff, from a simple task like digit recognition.

SPEAKER_03

01:08:02 - 01:08:47

So you think digit recognition, a 2D image? How would you more abstractly define digit recognition? It's 2D image symbol recognition, essentially. I'm trying to get a sense, sort of thinking about it now, having worked with MNIST forever: how small of a subset is this of the general vision recognition problem and the general intelligence problem? Is it... Yeah. Is it a giant subset? Is it not? And how far away is language?

SPEAKER_00

01:08:47 - 01:09:14

You know, let me refer to Einstein: take the simplest problem, as simple as possible, but not simpler. And this challenge is a simple problem. It's simple by idea, but not simple to get it. When you do this, you'll find some predicates which help to do it.

SPEAKER_03

01:09:14 - 01:09:24

Well, yeah, I mean, what I understand is you can look at general relativity, but that doesn't help you with quantum mechanics.

SPEAKER_00

01:09:24 - 01:09:28

Who's left? That is another story; you don't have any universal instrument.

SPEAKER_03

01:09:28 - 01:10:06

Yes, so I'm trying to wonder which space we're in, whether handwritten recognition is like general relativity, and then language is like quantum mechanics, so that you're still going to have to do a lot of mess to universalize it. I'm trying to see what's your intuition for why handwritten recognition is easier than language. I think a lot of people would agree with that, but if you could elucidate sort of the intuition of why.

SPEAKER_00

01:10:06 - 01:10:43

I don't know. No, I don't think in this direction. I just think in the direction that this is a problem which, if you solve it, will create some abstract understanding of images. Maybe not all images. I would like to talk to the guys who are doing real images at Columbia University.

SPEAKER_03

01:10:43 - 01:10:46

What kind of images unreal? Real images. Real images.

SPEAKER_00

01:10:46 - 01:11:13

Yeah. What's their idea? Is there a predicate? What can be a predicate? I think symmetry will still play a role in real-life images, in any real-life 2D images. Let's talk about 2D images, because that's what we know, and neural networks were created for 2D images.

SPEAKER_03

01:11:13 - 01:11:34

So the people I know in vision science, for example, the people who study human vision, that they usually go to the world of symbols and like handwritten recognition, but not really, it's other kinds of symbols to study our visual perception system. As far as I know, not much predicate type of thinking is understood about our vision system.

SPEAKER_01

01:11:34 - 01:11:36

They do not think in this direction.

SPEAKER_03

01:11:36 - 01:11:40

They don't, yeah, they, but how do you even begin to think in that direction?

SPEAKER_00

01:11:40 - 01:11:56

That's what I'd like to discuss with them. Yeah. Because if we are able to show that it is working, and the theoretical scheme, it's not so bad.

SPEAKER_03

01:11:57 - 01:12:14

So the unfortunate... So if we compare to language, language has, like, letters, a finite set of letters and a finite set of ways you can put together those letters. So it feels more amenable to a kind of analysis. With natural images, there are so many pixels.

SPEAKER_00

01:12:14 - 01:12:36

No, no, not letters. Language is much, much, much more complicated. It's a world, a lot of different stuff. It's not just understanding of a very simple class of tasks. I would like to see a list of tasks where language is involved.

SPEAKER_03

01:12:37 - 01:13:00

Yes, so there's a lot of nice benchmarks now in natural language processing, from the very trivial, like understanding the elements of a sentence, to question answering, to much more complicated ones where you talk about open-domain dialogue. The natural question is whether handwritten recognition is really the first step of understanding visual information.

SPEAKER_00

01:13:03 - 01:13:13

But even our record shows that we go in the wrong direction, because we need 60,000 digits.

SPEAKER_03

01:13:13 - 01:13:20

So even this first step, forget about talking about the full journey, this first step should be taken in the right direction.

SPEAKER_00

01:13:20 - 01:13:24

No, no, in the wrong direction, because 60,000 is unacceptable.

SPEAKER_03

01:13:24 - 01:13:28

No, I'm saying it should be taken in the right direction, the 60,000 is not acceptable.

SPEAKER_00

01:13:30 - 01:13:35

If you can talk it's great, we have half percent of it.

SPEAKER_03

01:13:35 - 01:13:48

And hopefully the step from doing handwritten recognition using very few examples is a step towards what babies do when they crawl and understand their first environment. I know you don't know about babies.

SPEAKER_00

01:13:48 - 01:14:11

If you do it from a very small number of examples, you will find principles which are different from what we're using now. And theoretically, it's more or less clear: that means that you will use weak convergence, not just strong convergence.

SPEAKER_03

01:14:11 - 01:14:18

Do you think these principles will naturally be human interpretable?

SPEAKER_00

01:14:18 - 01:14:19

Oh, yeah.

SPEAKER_03

01:14:19 - 01:14:30

So, like, will we be able to explain them and have a nice presentation to show what those principles are, or are they going to be very abstract kinds of functions?

SPEAKER_00

01:14:31 - 01:14:39

For example, I talked here today about symmetry. Yes. And I gave a very simple example. So the same will be.

SPEAKER_03

01:14:39 - 01:14:41

You gave like a predicate of a basic for.

SPEAKER_00

01:14:41 - 01:14:42

For symmetry.

SPEAKER_03

01:14:42 - 01:14:45

Yes, for different symmetries and you have.

SPEAKER_00

01:14:45 - 01:14:55

For degree of symmetry. Exactly. This is important. Not just whether symmetry exists or doesn't exist: the degree of symmetry.
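
As a minimal sketch of what a graded symmetry predicate could look like: the score below is one illustrative definition chosen for this example, not the formula from the lecture. It assumes a small image given as a list of rows with non-negative pixel values, and the names `mirror` and `symmetry_degree` are hypothetical.

```python
# Illustrative "degree of symmetry" score (an assumption, not the lecture's
# definition): 1.0 for a perfectly left-right symmetric image, lower otherwise.

def mirror(image):
    """Reflect a 2D image (list of rows) left to right."""
    return [list(reversed(row)) for row in image]

def symmetry_degree(image):
    """Return a score in [0, 1] for images with non-negative pixel values."""
    flipped = mirror(image)
    pairs = [(a, b)
             for row, flipped_row in zip(image, flipped)
             for a, b in zip(row, flipped_row)]
    difference = sum(abs(a - b) for a, b in pairs)
    total = sum(a + b for a, b in pairs)
    if total == 0:          # an empty image is treated as fully symmetric
        return 1.0
    return 1.0 - difference / total

# A ring-like shape (symmetric) and a "7"-like shape (asymmetric).
ring = [[0, 1, 1, 0],
        [1, 0, 0, 1],
        [1, 0, 0, 1],
        [0, 1, 1, 0]]
seven = [[1, 1, 1, 1],
         [0, 0, 0, 1],
         [0, 0, 1, 0],
         [0, 1, 0, 0]]

print(symmetry_degree(ring), symmetry_degree(seven))
```

The point of a graded score rather than a yes/no test is that it yields a real-valued function of the image, which can then be averaged over a sample like any other predicate.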

SPEAKER_03

01:14:55 - 01:14:57

Yeah, for handwritten recognition.

SPEAKER_00

01:14:58 - 01:15:04

No, it's not for handwritten, it's for any images. But I would like to apply it to handwritten.

SPEAKER_03

01:15:04 - 01:15:42

Right, in theory it's more general. OK. So a lot of things we've been talking about, we've been talking about philosophy a little bit, but also about mathematics and statistics. A lot of it falls into this idea, a universal idea of a statistical theory of learning. What is the most beautiful and sort of powerful or essential idea you've come across, even just for yourself personally, in the world of statistics or the statistical theory of learning?

SPEAKER_00

01:15:42 - 01:15:50

Probably uniform convergence, which we did with Alexey Chervonenkis.

SPEAKER_03

01:15:50 - 01:15:53

Can you describe uniform convergence?

SPEAKER_00

01:15:53 - 01:18:07

You have the law of large numbers. So for any function, the average of the function converges to the expectation of the function. But if you have a set of functions, for any single function it is true, but it should converge simultaneously for the whole set of functions. And for learning, you need uniform convergence. Just convergence is not enough, because when you pick the one which gives the minimum, you can pick a function for which it does not converge, and it will give you the best answer for this function. So you need uniform convergence to guarantee learning. So learning does not rely on the trivial law of large numbers. But the idea of weak convergence has existed in statistics for a long time. But it is interesting that, as I think about myself, how stupid I was for 50 years: I didn't see weak convergence. I worked only on strong convergence. But now I think that the most powerful is weak convergence, because it makes the admissible set of functions. And even in old proverbs, when people try to understand recognition, about the duck law, looks like a duck and so on, they use weak convergence. People in language, they understand this. But when we try to create artificial intelligence, we went in a different way. We just consider strong convergence.
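
For reference, the two laws contrasted above can be written as follows. The notation is an assumption introduced here, not something spelled out in the conversation: $z_1, \dots, z_\ell$ is an i.i.d. sample, $\mathcal{F}$ is the set of functions, and convergence is in probability.

```latex
% Law of large numbers: holds for each fixed function f separately.
\frac{1}{\ell}\sum_{i=1}^{\ell} f(z_i)
  \;\xrightarrow[\ell \to \infty]{P}\; \mathbb{E}\, f(z)

% Uniform law of large numbers: the worst deviation over the whole set
% \mathcal{F} must vanish, i.e. convergence is simultaneous for all f.
\sup_{f \in \mathcal{F}}
  \left| \frac{1}{\ell}\sum_{i=1}^{\ell} f(z_i) - \mathbb{E}\, f(z) \right|
  \;\xrightarrow[\ell \to \infty]{P}\; 0
```

The second statement is the one that matters when a function is selected by minimizing the empirical average, since the selected function depends on the sample and the first statement says nothing about it.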

SPEAKER_03

01:18:07 - 01:18:18

So reducing the set of admissible functions, you think there should be effort put into understanding the properties of weak convergence.

SPEAKER_00

01:18:18 - 01:19:25

You know, in classical mathematics, in Hilbert space, there are only two forms of convergence, strong and weak. Now we can use both. That means that we did everything. And it so happens that when we use Hilbert space, which is a very rich space, the space of continuous functions which are square integrable, we can apply weak and strong convergence for learning and have a closed-form solution. So it is computationally simple. For me, it is a sign that it is the right way, because you don't need any heuristics, just do whatever you want. But now, the only thing left is this concept of what is a predicate. But it is not statistics.
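
The two modes of convergence mentioned above have standard definitions in a Hilbert space $H$, sketched here with notation that is an assumption of this note ($f_\ell$ is the sequence of estimates, $f_0$ the target, $\langle \cdot, \cdot \rangle$ the inner product). The last line is a hedged reading of how a predicate $\psi$ turns weak convergence into an empirical invariant on training pairs $(x_i, y_i)$; it follows the description given in this conversation rather than a formula quoted from it.

```latex
% Strong convergence: convergence in norm.
\lim_{\ell \to \infty} \| f_\ell - f_0 \| = 0

% Weak convergence: convergence of inner products with every test function.
\lim_{\ell \to \infty} \langle f_\ell - f_0, \psi \rangle = 0
  \quad \text{for all } \psi \in H

% Sketch of an empirical invariant for one chosen predicate \psi
% (assumed form): the candidate f must reproduce the statistic of the labels.
\frac{1}{\ell}\sum_{i=1}^{\ell} \psi(x_i)\, f(x_i)
  \;\approx\; \frac{1}{\ell}\sum_{i=1}^{\ell} \psi(x_i)\, y_i
```

Strong convergence implies weak convergence, while the converse fails in infinite-dimensional spaces, which is why requiring the invariant for only a few well-chosen $\psi$ is a weaker but still informative constraint.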

SPEAKER_03

01:19:25 - 01:19:35

By the way, I like the fact that you think the heuristics are a mess that should be removed from the system. So closed form solution is the ultimate.

SPEAKER_00

01:19:35 - 01:19:45

No, it so happens that when you're using the right instrument, you have a closed-form solution.

SPEAKER_03

01:19:45 - 01:19:59

Do you think intelligence, human-level intelligence, when we create it, will have something like a closed-form solution?

SPEAKER_00

01:19:59 - 01:20:56

You know, I'm looking at bounds, which I gave, bounds for convergence. And when I look at the bounds, I think about what the most appropriate kernel for this bound would be. So we know that in, say, all our business, we use the radial basis function. But looking at the bound, I think that I start to understand that maybe we need to make corrections to the radial basis function, to be closer to what is better for this bound. So I'm again trying to understand what type of kernel has the best approximation, not approximation, best...

SPEAKER_03

01:21:00 - 01:21:10

Sure, so there's a lot of interesting work that can be done in discovering better functions than radial basis functions for... Yeah, good bounds.

SPEAKER_00

01:21:10 - 01:21:17

It still comes from, you're looking to math and trying to understand what?

SPEAKER_03

01:21:17 - 01:21:20

From your own mind, looking at the, yeah, but I don't know.

SPEAKER_00

01:21:20 - 01:21:27

Then I try to understand what, what will be good for that?

SPEAKER_03

01:21:28 - 01:21:43

Yeah, but to me, there's still a beauty, again, maybe I'm a descendant of Alan Turing, to heuristics. To me, ultimately, intelligence will be a mess of heuristics. And that's the engineering answer, I guess.

SPEAKER_00

01:21:43 - 01:22:33

Absolutely. When you're doing, say, self-driving cars, the great guy who will do this, it doesn't matter what theory is behind that, is the one who has a better feeling of how to apply it. But by the way, it is the same story about predicates, because you cannot create a rule for every situation; there are many more situations than you have rules for. But maybe you can have more abstract rules; then there will be fewer rules. It is the same story about ideas, and ideas applied to specific cases.

SPEAKER_03

01:22:33 - 01:23:40

But still, you should... You cannot avoid this. Yes, of course. But you should still reach for the ideas to understand science. Let me kind of ask, do you think neural networks or functions can be made to reason? Talking about intelligence, in this idea of reasoning there's an element of sequentially disassembling, interpreting the images. So when you think of handwritten recognition, we kind of think that there's an input and an output. There's no recurrence. What do you think about the idea of recurrence, of going back to memory and thinking through this, sequentially mangling the different representations over and over until you arrive at a conclusion? Or can all of that ultimately be wrapped up into a function?

SPEAKER_00

01:23:40 - 01:24:19

Well, you're suggesting: let us use this type of algorithm. When I start thinking, I first start to understand what I want. Can I write down what I want? And then I try to formalize. And when I do that, I think about how to solve this problem. Till now, I did not see a situation where you need...

SPEAKER_03

01:24:19 - 01:24:58

You need recurrence. Very good. But do you observe human beings? Yeah. Do you try to, it's the imitation question, right? It seems that human beings reason, this kind of sequentially. So does that inspire a new thought that we need to add that into our intelligence systems. You're saying, okay, you've kind of answered saying, until now I haven't seen a need for it. And so because of that, you don't see a reason to think about it.

SPEAKER_00

01:24:58 - 01:25:28

You know, most of these things I don't understand. Reasoning in humans, it is for me too complicated. For me, the most difficult part is to ask questions, good questions: how it works, how people are asking questions. I don't know.

SPEAKER_03

01:25:28 - 01:25:52

You said that machine learning is not only about technical things, speaking of questions, but it's also about philosophy. So what role does philosophy play in machine learning? We talked about Plato, but generally thinking in this philosophical way. How do philosophy and math fit together in your mind?

SPEAKER_00

01:25:52 - 01:26:57

So, first ideas, and then their implementation. It's like predicates, like, say, the admissible set of functions. It comes together, everything, because the first iteration of the theory was done 50 years ago. So everything is there. If you have data, and your set of functions does not have big capacity, so low VC dimension, you can do that. You can do structural risk minimization and control capacity. But you were not able to make an admissible set of functions. Now, when we suddenly realized that we did not use another idea of convergence, which we can, everything comes together.

SPEAKER_03

01:26:58 - 01:27:08

But those are mathematical notions. Philosophy plays the role of simply saying that we should be swimming in the space of ideas.

SPEAKER_00

01:27:08 - 01:27:55

Let's talk about philosophy. Philosophy means understanding of life. So, understanding of life: people like Plato, they understand life on a very high abstract level. So whatever I do is just implementation of my understanding of life. But every new step, it is very difficult. For example, to find the idea that we need weak convergence was not simple for me.

SPEAKER_03

01:27:57 - 01:28:06

So that required thinking about life a little bit. Hard to trace, but there was some thought process.

SPEAKER_00

01:28:06 - 01:29:47

You know, I have been thinking about the same problem for 60 years somehow. Again, again, again, again. I try to be honest, and that is very important: not to be very enthusiastic, but to concentrate on whatever we were not able to achieve, for example, and understand why. And now I understand that, because I believe in math, I believe in Wigner's idea. But now, when I see that there are only two ways of convergence, and we're using both, that means that we must do as well as people are doing. But now, exactly as in philosophy, what we know about predicates, how we understand life, can be described as a predicate. I thought about that, and what is more or less obvious is the level of symmetry. But next, I have a feeling it's something about structures. But I don't know how to formulate it, how to measure structure and all this stuff. And the guy who will solve this challenge problem, when we look at how he did it, probably symmetry alone is not enough.

SPEAKER_03

01:29:49 - 01:29:51

But something like symmetry will be there.

SPEAKER_00

01:29:51 - 01:31:27

Oh, I'm sure it will be there. Oh, I'm sure it will be there. Level of symmetry will be there. And level of symmetry: antisymmetry, diagonal, vertical. And I even don't know how you can use it in different directions; the idea of symmetry is very general. But it will be there. I think that people are very sensitive to the idea of symmetry. But there are other ideas like symmetry, which I would like to learn. But you cannot learn just by thinking about that. You should do challenging problems and then analyze why you were able to solve them, and then you will see. Very simple things, it's not easy to find. Even when talking about this every time, I also try to understand: do people describe in language a strong convergence mechanism for learning? I didn't see, I don't know. But weak convergence, this duck story, and stories like that: when you explain it, you will use the weak convergence argument. It looks like a duck, it does like a duck. But when you try to formalize, you're just ignoring this. Why? Why for 50 years, from the start of machine learning?

SPEAKER_03

01:31:27 - 01:31:28

And that's the role of last.

SPEAKER_00

01:31:28 - 01:31:58

I think that might be. I don't know. Maybe this is theory also; we should be blamed for that, because of empirical risk minimization and all this stuff. And if you read textbooks now, they are just about bounds for empirical risk minimization. They don't look at another problem, like the admissible set.

SPEAKER_03

01:31:58 - 01:32:19

But on the topic of life, perhaps you could talk in Russian for a little bit. What's your favorite memory from childhood? [repeats the question in Russian] Oh, music. How about, can you try to answer in Russian?

SPEAKER_00

01:32:19 - 01:32:23

Music. Not below, what you've done, what you've done.

SPEAKER_01

01:32:25 - 01:32:31

This music is a classic music. It was a great composer.

SPEAKER_00

01:32:31 - 01:32:54

At first it was an idea that it was possible. And then when I was in Bach, I was like, well, it's a shame. By the way, I don't think that this is a predicate of the structure in Bach, but of course.

SPEAKER_01

01:32:54 - 01:32:58

Because there is just a sense of the structure.

SPEAKER_00

01:32:58 - 01:33:04

I don't think that... Now, they were talking about Bach.

SPEAKER_03

01:33:31 - 01:33:35

Let's switch back to English because I like Beethoven and Chopin.

SPEAKER_00

01:33:35 - 01:33:39

Chopin is another music story.

SPEAKER_03

01:33:39 - 01:33:48

If we talk about predicates, Bach probably has the most sort of well-defined predicates that underlie it.

SPEAKER_00

01:33:48 - 01:34:45

You know, it is very interesting to read what critics are writing about Bach, which words they are using. They are trying to describe predicates. And then Chopin: it is a very different vocabulary, very different predicates. And I think that if you make a collection of that, then maybe from this you can derive predicates for digit recognition. Well, from Bach and Chopin? No, no, not from Bach and Chopin. From the critics' interpretation of the music. When they try to explain music, what do they use? They describe high-level ideas, Plato's ideas, what is behind this music.

SPEAKER_03

01:34:45 - 01:34:56

That's brilliant. So art is not self-explanatory in some sense. So you have to try to convert it into ideas.

SPEAKER_00

01:34:56 - 01:35:20

It is an ill-posed problem. When you go from ideas to the representation, it is the easy way. But when you're trying to go back, it is an ill-posed problem. Nevertheless, I believe that when you're looking from that, even from art, you will be able to find predicates for digit recognition.

SPEAKER_03

01:35:20 - 01:35:31

That's such a fascinating and powerful notion. Do you ponder your own mortality? Do you think about it? Do you fear it? Do you draw insight from it?

SPEAKER_00

01:35:34 - 01:35:38

No, yeah.

SPEAKER_03

01:35:38 - 01:35:42

Are you afraid of that?

SPEAKER_00

01:35:42 - 01:36:28

Not too much. Not too much. It is a pity that I will not be able to do something which I think I have a feeling to do. For example, I would be very happy to work with nice theoreticians from music to write this collection of descriptions: how they describe music, what they use, and from art as well. Then take what is in common and try to understand the predicate which is absolute for everything, and try to use that for visual recognition.

SPEAKER_01

01:36:28 - 01:36:29

There's still time, we got time.

SPEAKER_00

01:36:35 - 01:36:42

You've got time. It takes years and years and years.

SPEAKER_03

01:36:42 - 01:36:52

It's a long way. Well, see, you've got the patient mathematician's mind. I think it could be done very quickly and very beautifully. I think it's a really elegant idea.

SPEAKER_00

01:36:52 - 01:37:06

Yeah, but also it takes many years. Yes, you know, the most time is not to make this collection, but to understand what is in common, to think about that once again and again and again.

SPEAKER_03

01:37:06 - 01:37:34

Again and again, but I think sometimes, especially just when you state this idea now, even just putting together the collection and looking at the different sets of data, language trying to interpret music, critique music, and images, I think there will be sparks of ideas that will come. Of course, again and again, you'll come up with better ideas, but even just that notion is a beautiful notion.

SPEAKER_00

01:37:34 - 01:39:26

I even have some example. So I have a friend who was a specialist in Russian poetry. She is a professor of Russian poetry. She did not write poems, but she knows a lot of stuff. She made several books, and one of them is a collection of Russian poetry. She has images of Russian poetry; she collected all the images of Russian poetry. And I asked her to do the following: you have MNIST digit recognition, and we take 100 digits, I don't remember, maybe 50 digits. And try, from a poetical point of view, to describe every image you see using only the words and images of Russian poetry. And she did it. And then we tried, I call it learning using privileged information. I call it privileged information. You have two languages. One language is just the image of the digit, and the other language is the poetic description of this image. And this is privileged information. And there is an algorithm: when you're working using privileged information, you're doing better, much better.

SPEAKER_03

01:39:26 - 01:39:29

So there's something there. Something there.

SPEAKER_00

01:39:29 - 01:39:43

And there is in NEC, she, unfortunately, direct, the collection of digits and poetic descriptions of these digits.

SPEAKER_03

01:39:46 - 01:39:49

There's something there, in that poetic description.

SPEAKER_00

01:39:49 - 01:39:57

But I think that the abstract idea is at Plato's level of ideas.

SPEAKER_03

01:39:57 - 01:40:01

Yeah, that they're there, that they could be discovered. And music seems to be a good answer.

SPEAKER_00

01:40:01 - 01:40:06

But as soon as we start this, this is a challenge problem.

SPEAKER_03

01:40:07 - 01:40:09

The challenge problem. No, listen.

SPEAKER_00

01:40:09 - 01:40:12

You immediately connected to all this stuff.

SPEAKER_03

01:40:12 - 01:40:46

Especially with your talk and this podcast, and I'll do whatever I can to advertise it. It's such a clean, beautiful, Einstein-like formulation of the challenge before us. Right. Let me ask another absurd question. We talked about mortality. We talked about philosophy of life. What do you think is the meaning of life? What's the predicate for our mysterious existence here on Earth?

SPEAKER_00

01:40:46 - 01:41:32

I don't know. It's very interesting. We have in Russia, I don't know if you know, the guys Strugatsky. They are, I think, fiction writers, thinking about humans, what's going on. And they have an idea that two types of people are developing: common people and very smart people. It has just started. And these two branches of people will go in different directions very soon. So that's what they're thinking about.

SPEAKER_03

01:41:35 - 01:41:43

So the purpose of life is to create two paths of human society.

SPEAKER_00

01:41:43 - 01:41:47

Yes, simple people and more complicated.

SPEAKER_03

01:41:47 - 01:41:51

Which do you like best? The simple people or the complicated ones?

SPEAKER_00

01:41:51 - 01:43:34

I don't know, that is just their fantasy. But you know, every week we have a guy who is just a writer and also a theoretician of literature. And he explains how he understands literature and human relationships, how he sees life. And I understood that I am just a small kid compared to him. He is a very smart guy in understanding life. He knows these predicates, he knows big blocks of life. I am amazed every time when I listen to him. And he is just talking about literature. And I think that I was surprised. So the managers in big companies, most of them are guys who studied English language and English literature. So why? Because they understand life. They understand models. And among them, maybe there are many talented critics who are just analyzing this. And this is big science, like Propp did. These are the blocks. Yes, there is.

SPEAKER_03

01:43:34 - 01:43:39

It amazes me that you are and continue to be humbled by the brilliance of others.

SPEAKER_00

01:43:39 - 01:43:45

I'm very modest about myself. I see so many smart guys around.

SPEAKER_03

01:43:45 - 01:45:03

Well, let me be immodest for you. You're one of the greatest mathematicians, statisticians of our time. It's truly an honor. It is not. Yeah, I know my limits. Let's talk again when your challenge is taken on and solved by a grad student, especially when, uh, using scripting. Maybe music will be involved. Vladimir, thank you so much. Thank you very much. Thanks for listening to this conversation with Vladimir Vapnik, and thank you to our presenting sponsor, Cash App. Download it, use code LexPodcast, you'll get $10, and $10 will go to FIRST, an organization that inspires and educates young minds to become science and technology innovators of tomorrow. If you enjoy this podcast, subscribe on YouTube, give it five stars on Apple Podcast, support it on Patreon, or simply connect with me on Twitter at Lex Fridman. And now, let me leave you with some words from Vladimir Vapnik: When solving a problem of interest, do not solve a more general problem as an intermediate step. Thank you for listening. I hope to see you next time.