Strachey Lecture: Integrating Logic, Probability and Neuro-Symbolic Reasoning using Probabilistic So

00:00

So welcome to this strategy lecture. This is a terribly serious of distinguished lectures named for Professor Christopher Strange Chris. The strategy founded Oxford Programming Research Group in 1965, and with Dana Scott, he founded the Field of Occasional Semantics, which provides a fun mathematical foundation to programming languages. Before introducing today, Speaker, I'd like to thank Oxford Asset Management for who generously support this great few lectures.

00:36

I have been supporting this theory since 2015 and without the support we would be unable to host this series of exciting lectures. There'll be coffee and doughnuts after the talk, and this'll give students and researchers the opportunity to meet representatives of Oxford Asset Management. You'll be able to find you near the registration desk. Okay. So with respect to today's talk, it's my very great pleasure to introduce today's speaker, who is Professor Lisa Ghetto.

01:12

Lisa Guitar is a distinguished professor in the University of California, Santa Cruz, Department of Computer Science and Engineering, and a founding director of the UCSC Data Science Research Centre. Lisa has been the recipient of many honours and awards in recognition of her distinguished research contributions, including fellowships of the ICI and I Triple A, Triple II, numerous paper awards and many invitations to give keynote talks and distinguished lectures such as this one today.

01:52

Lisa is perhaps best known for her work on statistical relational learning, which is a subfield of II that combines logic and machine learning to deal with data that exhibits both complex structure and inherent uncertainty. She's also made important contributions in fields as diverse as data management, visual analytics and social network analysis, as well as being a passionate advocate for fairness.

02:19

And I think in today's Cool Place that's going to tell us about her work on probabilistic Soft Logic, which is an SRO framework for large scale probabilistic reasoning and relational domains. So please join me and giving this a very warm welcome today. First. It's a huge honour to be here to give this talk. And Ian didn't give the back story that this talk has been in the making for quite a while. I think we first started talking in 2019 and I was supposed to give the talk in May of 2020.

03:03

Just when things were kind of shutting down. And I'm really happy to be here and actually be here in 3D rather than 2D. And I'm hoping that my topic, which is about kind of mixing logic, probability and neural reasoning, I think is a really good fit to the kind of research that's being done and the computer science department here. So I hope you'll agree by the end. And then finally, I am going to talk about actually a programming language, a probabilistic programming language.

03:39

So I hope that's kind of in honour of Christopher is fitting. But before I get into the kind of meat of the talk, I want to kind of pop things out a little bit and put a bigger picture on what I'm doing. And so, if you will, a subtitle or super title for my talk is The Power of Relational or Statistical Relational Thinking. And you know, what do I mean by relational thinking?

04:14

And relational thinking is interesting because there's all these different fields that use the term, but they mean kind of different things in these different areas. So let's start with psychology. In psychology, there's this notion that it's about being able to talk about objects and attributes of objects and their relations,

04:40

and that's part of intelligence. And within sociology, not surprisingly, it's about the ties between individuals and the memberships in organisations and those kinds of structures within mathematics. It's interesting, it has a kind of different interpretation and it's much more about the ability to go from the specific to the more abstract. So oftentimes the example given is a distinction between just doing calculations versus, you know, understanding how to manipulate things using algebra.

05:25

And then finally, in philosophy A, the term is used in a lot of different ways. One of them is around a can of even notions of reality, of a relational ism around arguing against non duality between mind and body and subjectivity and objectivity. But. What does this mean from a perspective of computer science or machine learning? Ah, I, I'm going to argue that there's kind of two components. So one component is not surprisingly dealing with structure.

06:12

So dealing with structure either in the inputs. So, you know, oftentimes in machine learning, you just have this flat attribute value vectors and you're trying to learn things from them. But, you know, so many things are much richer. Like even a graph or a multi relational graph and so on as common. And then also for the outputs, the outputs, the decision variables that one's trying to predict.

06:46

Oftentimes there's a fair bit of structure in them and there's kind of dependencies between them, and you need to reason jointly about them. So the most obvious examples are natural language processing when you're doing something like parsing or computer vision, when you're doing scene understanding. So there's a structure that we want to be able to make use of.

07:11

But then there's also uncertainty. And again, there's uncertainty in the inputs, you know, noise and uncertainty and the outputs, you know, quantifying the uncertainty over various different predictions that one's making. So what I argue is that we need Method Z are scalable and are able to deal with both structure and uncertainty. And in this talk, what I want to do is provide you with some kind of patterns and some methods for doing relational thinking.

07:57

And these methods are going to use a mix of. Logic and probability or kind of hard and soft reasoning and also neural and symbolic reasoning. So the talk is divided into kind of three different components. The first part is I'm just going to go through some patterns, and these patterns are actually pretty basic, but we've found them over and over again to be super useful in building AI and machine learning systems. Then the bulk of the talk is going to be the tool section.

08:44

It's going to go into a little more technical detail. And then, you know, we're computer scientists, so we always like to talk about scaling. So there'll be a section on getting this to work at scale. So for patterns, the way that I'm going to be representing these patterns is as logical rules. And so these have advantages. You know, they're good at representing structure. They tend to be interpretable.

09:15

In about 10 minutes into the lecture, I'm going to back off on that and talk about also some probabilistic interpretations for them. And the. Canonical pattern said I'm going to go through are these structured prediction patterns that are super simple but still powerful. And the first ones, the simplest the simplest one is just collective classification. So you're doing classification problem where you're trying to predict labels for nodes in a graph.

09:59

You know, you can't get much simpler than that. And, and the thing that I want to highlight is how to represent the reasoning pattern behind this. And the reasoning pattern is captured in these two simple rules. So one is, okay, I'm going to just use local information at the node. So attributes of the node to try and infer a label. But then I'm also going to use things about what the labels of its neighbours are. And the interesting thing to note here is the thing that I'm trying to infer label.

10:40

You see this one rule where label is on both sides of the rule. So that's the thing that gets this dependence between the decision variables, the fact that I have unknowns potentially on both sides of the rule. So I need to give a little toy example of this. The one that I usually give is I have a social network and I'm trying to infer political parties. Okay, so these are the US political parties, so I'm going to switch them and keep the colours the same and the British parties.

11:24

But so again, this is a toy example, but the idea is most often what happens is you're given a network where you observe the labels for some of the nodes and then it's just these other ones where you're trying to infer them. And so how do we instantiate that little pattern that I gave a minute or two ago? Well, I'm going to have a collection of local rules where the local rules say things like, oh, you know, if they donate to a particular party, they're likely to vote for that party.

12:09

If they tweet about something, maybe this isn't a Labour thing that people tweet, but they're likely to be at that party. But then the more interesting rules are these collective rules where the collective rules take into account, you know, if. Their friends vote for a party. They're likely to vote for the party. If their spouse votes for a party, they're likely to vote for a party. And again, this is something that allows you to propagate the information.

12:47

And using something like this, you can go through and end up doing a labelling. There's a bunch of different algorithms. I'm going to tell you one way that we do it at scale in the middle of the talk. Now, there is an important aside here, which is related to privacy. So it's exactly this issue that you can infer so much about an individual based on their network. That's important. I'm going to talk a little bit more about that towards the end of the lecture.

13:27

So the next pattern again is that was about nodes. This is about edges. So inferring the existence of edges in a graph. And again, we have this super simple pattern that says if there's a edge between X and Y, there's a Z that's similar to Y, then there's an edge between X and Z. And because I have this thing that I'm going to infer, the link is what I'm trying to predict on both sides of the rule. That's the thing that's going to allow this dependence to happen.

14:06

So what's a pattern for an example of this? Well, one of the things that there's a lot of work in is recommender systems. And recommender systems. Exactly. You're trying to predict what items someone will like. There's these kind of structural patterns that one can do where you can say, okay, if a user likes one item, there's another item that's similar to that item, then the user will like the second item.

14:42

Also you can say, okay, if there's a user that likes an item, there's another user that's similar to that user, then the second user will like that item. And so, you know, this is a simple way of capturing what's done and a lot of recommender systems. The key challenge, not surprisingly, and note that these solid lines are the things that are observed. And then I'm making this prediction as a test. One of the challenges.

15:17

How to represent the similarity. And then the kind of third important sub problem is entity resolution. And so what is entity resolution? This is something that's ubiquitous across computer science where you are essentially trying to determine when you have two references and they're really referring to the same underlying thing. Again, we can set this up where we use local information.

15:57

So, you know, if they have similar names, they're likely to refer to the same, let's say, person, if they're similar, linked to similar things. But then the kind of collective pattern that is. Important is if X and Y are the same and Y and Z are the same, then Excellency are the same. So this is a transitivity that allows you to infer additional same as links.

16:36

And this is important, but there are some domains where you actually have a different kind of rule and understanding your domain to know when you do this kind of clustering versus some domains. What happens is let's. What happens is it doesn't go black. You have a more of a linking kind of problem. So it's said if X and Y are the same. And Z is not the same as Y, then X can't link to Z.

17:18

So think about this. When you have two databases that you're matching across and they've been duplicated and you're only allowed to match across one, and a distinction of knowing when you're in the clustering setting versus when you're in the mutual exclusion setting as important. But now those are all these kind of micro pattern that my favourite pattern is one where you do all of these at the same time.

17:50

And so this is a problem of graph identification we first kind of introduces like ten years ago. And let me again kind of define it by example a. A setting where we have a kind of noisy input graph, but then we have to do all these different inferences to infer an output graph. And then this output graph is the thing that we then want to go on and do our analysis on or do our science on. And an example would be you have a communication network.

18:31

In this case, it's an email communication network, but you have email addresses and the edges are communications and you have some. Content information. And then what you're trying to do is infer out an organisational network or social network and here the nodes are different, they're not email addresses, they're actual people. The edges have a different semantics there. Who's in charge of who and the node labels maybe are, you know, what are their roles in the company?

19:07

And so, again, what does this problem look like? I have as input this graph I'm trying to get out this so I don't have this observed. What do I need to do? I need to, you know, first. Go through and do entity resolution where I figure out, you know, who are the people? Then I have to do the link prediction of, you know, who manages who. And then I have to do the inference of their kind of roles in the company. So I need to go through and do the labelling of this.

19:49

The thing that's kind of interesting about this, in my view, is how there's all these interesting dependencies. And most of the time people look at these problems in isolation. So they either do collective classification or they do link prediction or entity resolution, but trying to do them all at once where there's dependencies between. You know, obviously on the observed graph. But there's also what I was emphasising before that in tread dependence across the predictions in each of the task.

20:30

But then there's also this kind of. Interdependence across them. And it's really these kinds of complex reasonings with graphs that kind of motivates much of my work. So. Now let me get into methods for doing this. And I'm going to be talking about tools and. I'm first going to talk about tools that combine the statistical and relational, so the logic and probability. And then afterwards I'm going to talk about methods that combine the neural and symbolic.

21:17

So for the statistical relational, you know, this is an area, sub research area, as you mentioned, that combines logical representations, handles uncertainty, and is especially designed to do these kind of collective problems where these rich dependencies and there's been a lot of different languages that have been proposed. And we're going to add one to the mix. PSL. PSL stands for probabilistic soft logic.

22:01

It's an open source toolkit. There's a bunch of examples and tons of resources at this website. But at a high level, going back to the fact that I represented all these patterns as these logical rules. Well, logical rules have advantages, but they also have disadvantages. So in general, doing full reasoning with them is intractable. If there's inconsistencies, that's going to be a significant problem. And, you know, one of the things that I mentioned is dealing with similarity.

22:43

You know, they're not great at representing similarities. So what we try and do with PSL is basically turn these disadvantages into advantages and have a tractable method that can handle the inconsistencies and can represent. Similarities are a degree of similarity. So what does this look like? It's a probabilistic programming language for these collective problems, and it is represented as weighted rules.

23:30

The. Key thing with these weighted rules is the kind of ground atoms that we'll get to are going to be not standard zero one binary, but continuous value between zero and one. And a cell programme is just, you know, these rules plus some input data. So. We make reasoning scalable by converting this to a convex optimisation problem. And so what I want to do is show you how we come up with this particular optimisation problem.

24:15

And this is work very much by my former Ph.D. student, Stephen Buck, and Bert Wang, a post-doc that worked with me. And I'm going to flash up the particular relaxation that we introduce. And then I'm going to show you how we get to it under three different ways. So for now, I'm going to unpack this in a minute, but kind of see if you can kind of memorise this pattern. And. This pattern. The interesting thing is it comes up under three different interpretations for this way to logic programmes.

25:00

So one comes from randomised algorithms, theoretical computer science. One comes from probabilistic graphical models and one comes from soft logic. Let me do the first from randomised algorithms. So. We're going to take this way to logical rules, and I'm going to rewrite them so that we have non-negative weights. I'm going to write it in this normal form. And then I'm indexing over the positive literals and the negative literals. So it's a little. A messy bit. And then what I want to do is.

25:49

I want to. Find that assignments to the x variables that maximises this weighted sum. And so, you know, all of you guys go back to your algorithms class and you say, Professor Gitau, I remember that. That is why you did Max that and that is amp hard. So that doesn't seem like you got anywhere so far. But now what we're going to do is take this MP hard problem and.

26:27

We're going to convert it to a problem where we instead interpret the random variables as probabilities, rounding probabilities, so the probability they round up to one are down to zero. And doing this, we can make use of this old result of Gomez and Williamson. They actually have to really famous results from the same year. But this gives a way of bounding the expectation for that expression with a linear programme.

27:04

And you can go further and say that this gives you this three quarters optimal rounding guarantee for this relaxation. And I know everybody remembers what I put two slides ago. If you look at this main part of the term here, it has the form that I showed a minute ago. So this is one interpretation where I get a relaxation that is nice and is basically a linear programme. Now there's a second interpretation. And the second interpretation comes from graphical models.

27:45

And so what I can do with my weighted logical rules is I can basically form a big giant graphical model where I have my random variables and then I have these potential functions that are the rules. So the logical rules. And. They're kind of the potential functions in the graphical model. So they're kind of weird potential functions because they're just our logic. But still, I can do this.

28:26

And then the standard thing that people do in graphical models is then define a distribution where the distribution is just over all of the rules, the sum of the weighted potentials. And the classic problem is to then find the assignment of variables, then maximises this exponent. It will be the most probable. So at this point I've done nothing rather than just rename weighted max that into graphical models. But we can go from here when one's trying to find the assignment to the variables.

29:13

One of the things that one can do is to solve this is to find a globally consistent set of marginal distributions. Again. This is still hard because there's exponentially many constraints. So what people and graphical models have often done is to do these local consistency around relaxation. So rather than enforce all of the constraints, just enforce a few local ones. There's a whole cottage industry in different ways of doing this.

29:55

We can make use of this in our setting and introduce some kind of pseudo marginals over some joint potential states, but not all of them. Then we can take. Our expression, put in these constraints for consistency with just the marginals and the pseudo marginals. And then, you know, do a little bit of not terribly complex analysis to kind of rewrite this and get rid of the pseudo marginals projected onto the mews. And we can get out this expression.

30:46

And again, if you look at this expression, it has the same form as the max set relaxation. So we got to the same relaxation, but through to pretty significantly different interpretations. And this result actually between these two is interesting just in and of itself, because the people that were doing the local consistency, relaxations and graphical models would do them, but they didn't have any bounds. So now we can take this down from the max set relaxation and apply it.

31:26

But that's two of the three interpretations. So the third interpretation comes from soft logic, and this is where we just start off and say, you know, the random variables are continuous between zero and one, you know, interpreted as either similarity or degree of truth. And, you know, there's different soft logics. We use Fukushima's logic and then through doing some manipulation to find the exact max solution. So no approximations happening here at all.

32:13

You get this expression. And this expression. Now, you can kind of have a little bit of an interpretation for it of where it came from, because this is really like either the rule is satisfied so it's one or it's not satisfied. So I'm going to add up the truth values of the positive literals and the truth values of the negative literals. And talk about that is the degree of satisfaction. And again, if you look at this expression. It matches the other two. So we've gotten to.

32:54

The same relaxation through three different interpretations, which always seemed like a good sign of consistency. With pencil, we actually ended up going further than this. So we introduced a notion that we called hinge loss mark of random fields. And so the hinge, less marka random fields have this kind of same general form. But now. Rather than this expression that we had before, we're going to allow kind of arbitrary linear expressions here. This allows us to add in constraints as well.

33:44

And then, you know, what is a pencil programme? Well, a pencil programme, you know, this is an actual pencil programme. I guess you can't read it that well, but for that simple collective classification problem that we had before. So, you know, you have interpretable. And you have a cell programme, you have some data and you. Have a distribution, a conditional distribution that you can solve for. Now. How do you learn a cell programme from data?

34:21

There's kind of two issues. One is where do you get the weights from and where do you get the structures of the rules? So if I had 2 hours for the talk, I would tell you all about that that I don't. So you guys will have to trust me or invite me back or talk to me after the lecture about ways of learning this. But there are ways of doing them. But. Let's talk instead about. Well. Does it work? Empirically.

34:53

And I'm going to show some kind of simple examples of it working, but they're they're kind of representative. So the first example is an example of doing activity recognition in video. And in this setting, we compared with using some. Kind of different feature representations that at the time are commonly used. And then we'd add in these peers our rules to the features. And these rules were like super simple.

35:31

So they just captured little teeny patterns analogous to the ones that I gave at the beginning. But you see over and over again that they kind of give you a significant bump in performance. Another collection of results for some collective classification problems and a link prediction problem. Here we're comparing to a discrete MRF. And we see that we do a little bit better. But even on these small problems, we're significantly faster. I'm going to drill and expand on that shortly.

36:16

Then with a kind of state of the art approach that did a kind of similarity propagation so completely different kind of algorithm we were able to outperform them on. Kind of biological setting. And then a third setting where we're doing emotion recognition in dialogue, comparing to a variety of neural net approaches, including some pretty sophisticated neural net approaches. Even with a relatively simple PSR model, we were able to significantly outperform.

37:00

So. This was about combining logic and probability. Now let's talk a little bit about combining neuro and symbolic reasoning. So for neuro and symbolic reasoning, similarly, there's a whole community, the NSC community, that works in this area where they want to make use of the advantages of sub symbolic representations and include symbolic reasoning and do this in a way that integrates them properly. And just like before.

37:47

There's a ton of different languages out there that people have proposed and we're going to add one into the mix. New PSL. So. What is new. So neuro probabilistic, soft logic. And again, there are some foundations where the foundations I think one of our contributions here is to introduce a generalised form for a neuro symbolic reasoning. And then instantiate it in a particular way. So the we're going to do this by introducing neuro symbolic energy based models.

38:37

And it's a general family of energy based models. And. They highlight the boundary for where the kind of neural reasoning and representation is and where the symbolic is. In a nice way, I'm going to kind of illustrated by kind of cartoon where, you know, a standard energy based model is just can be represented kind of generically like this. And what we're going to do for a neuro symbolic, energy based model is we're going to break it down where we have the wires, which are the output variables?

39:23

We have the symbolic input variables. We have the neural input variables. We have the parameters of the symbolic system and we have the parameters of the neural system. So we're just kind of explaining the notation. And then for doing the integration, the boundary is really captured by this interface between the neural and the symbolic. And exactly how that feeds in will depend on the different language and interpretation. And, you know, this is pretty generic and it's able to represent.

40:21

I think more than this. But the ones that we've gone through and looked at, you know, deep problem logic tensor networks, deep stock plug, neural answer, set programming, all of these can be fit into this neural symbolic energy based model framework. How did we then do new so well for new PISA? It is basically that. We had this from before. And we are going to.

41:03

Kind of have this. Function, the potential function, the hinge loss have this form that matches the factorisation of across a neurone symbolic. So with this in mind we can go to our you know. Architecture. And, you know, the way that you do inference is you just kind of go feed. Forward through. These systems. If I point out the right thing to do that. Animation. And then to do learning. So to learn the parameters of both the symbolic and the.

41:54

The Neuro you go through and you do the inference and then you calculate the loss, and then you can propagate the loss through the system. So the nice thing about New Pixel is that it gives us this kind of expressive language. It has all of the things from pencil. It's scalable and as a general. So, you know, you can use it with a variety of different neural packages and we have support for this. Again, this is kind of pretty fresh work.

42:36

I'm going to show just a little experimental evaluation that I have a nicer one, but it takes longer to explain. So just a. It's one set of experiments where we were doing a collective classification and we're comparing to like a simple label propagation algorithm, existing NESI approaches and GCN. And we can clearly do way better than the other neuro symbolic approaches for the GCN. And yeah, we're doing a little bit better. But you know, the PSL model, the new PSL model is super simple.

43:22

So the simplicity is part of the attraction. And. Let me now turn to a topic of scaling. So. We're interested in doing these big graft problems, but we want to make sure that we can do it in a reasonable amount of time. And they're key. Problem in this is we have our way to logical rules. We have some data. There's two processes that happen. One is the instantiation of the model, and then there's doing the inference in the model. So if we want to scale this, we can scale two aspects of it.

44:24

One is the inference, which I'll talk about first, and the other is the grounding, which I also talk about. So inference. Remember, one of my claims was, you know, we went from a combinatorial optimisation problem to a convex optimisation problem. So already, yeah, that's pretty good in terms of scalability, but it's still potentially a huge linear programme. So what we can do is we can make use of the fine grained structure in these models and.

45:03

We exploit this by decomposing, and we've looked at a bunch of optimisation techniques, but the first one that we looked at that was really beneficial was Adam's alternating direction multi method of multiplier. So it's something that's used a lot in computer science or in machine learning. And the simple idea, it's very simple idea, but you decompose it, you solve the problems, and then you do some message passing. And the nice thing for a convex problem is it's guaranteed to converge.

45:40

The trick is how do you make sure the sub problems are fast? And this is the place where we have these hinge losses. And if you break it down and look at these hinges. A lot of them are just trivial. So you can just ignore them and you can get to a much smaller set of constraints and then solve that. And so the first set of results I'm going to show are just comparing to using an off the shelf convex optimiser. A commercial grade optimiser. And then our admin implementation.

46:24

And the first thing you see is for something with 150,000 potentials, you know, the convex optimiser took 3 hours, the commercial grade one took 3 minutes and it took a few seconds for something that was even larger with 800 million potentials, you know, so it could do it in a minute. But the others just, you know, can even handle it. So that was good. However.

47:00

Of the total time to solve so that 800 million potential one one minute was doing the optimisation and 5 minutes was doing the grounding of the problem. So. The next thing that we looked at was how do we speed up grounding? And this is very much work with my current Ph.D. student, Eric Augustine. And you can see why grounding is an issue. Because even.

47:35

In the simplest case where I'm potentially doing pairwise similarity, you know, if I have n things that I'm comparing, I'm going to get and squared and you know, that's just a simple setting. It can get much worse than that. So. What can we do to speed this up? Well, the first thing that one can do, and it's done in a lot of different areas and of course, goes by a lot of different terms.

48:05

But is blocking and blocking is the simple idea that if you have, you know, say, a simple case where you have the pairwise comparison to say, okay, I'm not going to do all comparisons. I'm going to somehow break it into blocks and only do the comparison across blocks. And, you know, within databases, there's approaches with in data mining, there's approaches within theoretical computer science, there's approaches, you know, we'll use these approaches.

48:40

But then in the implementation with PSR, we can implement that blocking as a blocking predicate. We can form all of these rules as conjunctive queries. Give them to the database and give a kind of this as a hint to the database for optimising. And this can make a huge, huge difference in performance. So. Here's a couple of different settings.

49:17

So first off, notice that this is a log scale where we're comparing performances and, you know, making use of this can give you, you know, significant speed ups. So this was the first thing that we looked at. And just to give you a little sense of the community and where the community and S.r.l. has been in terms of the scale of the problems that they've done, you know, some of the works from MLS ends in alchemy. In 2007, they were doing problems that had a million potentials and doing it.

49:59

And, you know, 400 minutes Tuffy came along, which was an implementation that sped things up quite a bit and scaled things. 2015 Fox Pixel, which was a distributed implementation of parcel, sped things up. And then, you know, in 2018 the results that use the admin were able to speed this up and do larger problems. This is version 2.1. Version 2.2 which this version the main thing was smarter memory management and kind of systems issues is able to do 150 million potentials in 10 minutes.

50:57

But we can go further and so we can go further by being even smarter about how we do our query generation for grounding. And this is nice work that makes use of ideas in the database community on multi query optimisation.

51:20

And this idea is, you know, we have all these different rules, we can generate potential query plans for them and then we can look for opportunities to reuse the queries and you know of the different set of queries that cover them, find the one that is expected to perform the best. Again, you know, finding that is a challenging computational problem.

51:51

And here's some representative results. I know these are too small to see, but one of the things that's kind of interesting is, you know, we just added a bunch of overhead of searching over different query plans. You would think that you might pay a price for doing that. And it turns out that price isn't so much. And then on the larger data sets, you get a significant speed ups. And you know, this is going to depend on the problem. It could end up being even more so from this.

52:29

We add a new line into our. You know, benchmarks. And we're able to do these 150 million potential problems in 2 minutes for grounding. But wait, there's more. So we can again make use of ideas from database theory to go beyond this and something that we called tandem inference and. Tandem inference. The idea is, you know, this picture that I showed. It really has this grounding problem that we're blocking on. It's like, first we have to ground it and then we start doing inference.

53:14

Well, what if we. Due streaming, grounding and streaming in France. So as we start grounding, we start solving the problem. Well, we can do that. And this black line is a tandem in France. And then there's two different inference methods the admin that I showed before, and another one stochastic gradient descent, but across a bunch of different canonical data sets, you know, we're able to get solutions before they've even started completed the grounding process.

53:57

So that's exciting. But in addition, we're to the point that we can now do problems that are too big to fit into main memory. So we're able to essentially take these huge problems. If you were to put it in two main memory, you give it a small blur amount of space. You page things in and out. And with this, we're able to now scale to things that have, you know, billions of potentials and do them in the same time, the things that were, you know, several orders of magnitude smaller or before.

54:45

So this whole progression we find exciting and we're pretty sure we're the people that are doing the largest problems of this kind in this complexity. Through, you know, people that we've talked to at various companies as well. So I am not going to talk about online PSL. We also have methods for kind of changing the models on the fly. What I do want to talk about briefly is.

55:22

Some of the applications that we've done, because I'm actually really proud of the work that my group has done using pencil on kind of a rich collection of problems. So we've done things on computational biology and health informatics. We have done a fair bit of work on natural language processing, not surprisingly, recommender systems. We've also done work in computer vision, energy and the environment. Computational social science. A fairness, which I'll mention a bit.

56:09

Information integration and extraction. And predicting malicious behaviour. The one of these that I want to go into a bit is the information integration, because I think it's kind of connects with research being done here and in kind of interesting ways is knowledge graph construction. And Knowledge Graph Construction also connects with the original graph identification pattern that I showed earlier where the idea is. Then you have a bunch of facts from these facts.

56:55

You're trying to extract out a knowledge graph. In order to do this, you need to perform a collective classification linked prediction entity resolution, but you also have to enforce ontological constraints. You potentially want to reason about the quality of the information and more. We've done this using KSL for a couple of different knowledge graph extraction problems, and the key takeaway from this is that we're able to do this. And all of the pieces of information are useful.

57:37

So the statistical features and the semantic features kind of putting those together really helps in another kind of knowledge graph task where we're trying to do, um, knowledge completion. So you're kind of predicting links. Uh, the key takeaway from this is. Well, the approach is when you have a lot of clean data, a lot of data, and the data is clean work. But when you have noisy data and it's sparse, things like PSL can really help.

58:20

So. I want to close by talking about kind of opportunities and challenges. And so. I'll start with the challenge. So. The perils of ignoring structure. And. I alluded to one of these before that privacy and privacy and growth. And I think this is a really important area. And a lot of the work in privacy still looks at this kind of independent setting. They're starting to be work more work in privacy and graphs.

59:04

And some of the work this was work of my former Ph.D. student, Elena Leiva was some of the earliest to look at how much information is leaked by the groups that you're a member of. Not surprisingly, it's a lot actually more than friendship links. Another important area is fairness. And with my former post deck garnish for Noddy and colleagues, we're pretty sure we're the first to look at relational fairness.

59:41

And because so much about fairness is taking into account structural inequities that if you don't have a fairness mechanism that can represent the structure, that's going to be problematic. And then finally, looking at causality and causality in graphs is important. And this is some joint work with colleagues in the database world where we developed a relational causal system. I really think that if this conjunction of all three.

01:00:27

Privacy, fairness and causality and understanding how they relate is important for methods that are able to kind of mitigate these impacts. And it's important to have the background and the social theory to be able to do the interpretation. In terms of opportunities. Yeah, all of the things that we look at these days are systems. So you think biological systems, ecosystems, social systems, things that are all these things combined. And this idea of kind of going from observational data.

01:01:17

Doing some sort of graph identification to infer out a structure, then doing, you know, metrograph ID or knowledge graph ID to infer something, but then to close the loop where you use that, to go back and extract more information out and potentially, you know, keep doing this is important and. To the extent that this ties to these four different interpretations that I had at the beginning.

01:01:56

So I think from their reasoning about entities and relationships and social ties, it's kind of obvious then this ability to go from specific to abstract and learning these things. And yeah, the philosophy was a little bit dicier, but, you know, it is, I think, going from a certain perspective sometimes in AI and machine learning, which is very atomistic to a more kind of not quite multi-agent but a systems view is important and. The last thing is the most important acknowledgements.

01:02:43

My students that's the best thing about being a professor is our awesome students get to work with. I also am fortunate to work with a variety of different companies and organisations and. Just to wrap it up to say, you know, I hope that somewhere between the more tutorial or technical part, you come away with some ideas and ideas for things that either integrate probabilistic and logical inference and are on symbolic or data driven and knowledge driven.

01:03:23

And I think there's tons of opportunities and a space that I think.

Transcript source: Provided by creator in RSS feed: download file

Strachey Lecture: Integrating Logic, Probability and Neuro-Symbolic Reasoning using Probabilistic Soft Logic

Episode description

Transcript