I think the kindest way of describing me in the context of this room is very much as a non-expert, at least when it comes to ethics. I am a statistician and a geneticist, and I have worked in the field of using genetics to understand human disease and to identify opportunities for new therapies, or to better predict where people are on their trajectory.
I've done that for many years, and over the last few years my role has shifted a bit, from purely being someone who was doing research to someone who has been thinking about the kinds of infrastructure and the kinds of communities that you need to build in order to make sure that this growing part of biomedical research is done well at the university level and translated well, and ethically,
of course, into practice. Behind that is very much my role as the director of the Big Data Institute, which is one of the recent, now not quite the latest and newest, centres to have appeared within the University of Oxford. For those of you who don't know us, we are a physical thing: we're a new institute in a new building up the hill on the Old Road Campus, and there are somewhere between 250 and 500 people within this institute,
depending on how you count, who are united by the desire to use a data-driven approach to understanding the causes of human disease and identifying routes to intervention. So we're entirely driven by that. We're entirely computational, as it were. And really, what we are about is creating this fuel for A.I.
Now, we've heard an awful lot about the ethical issues of how you actually use A.I. in context, and about understanding the regulatory or legal aspects around that. What we haven't heard quite so much about is the process by which you acquire the fundamental thing that has to go into that process, which is very much the data itself. Within the institute, just very briefly, there are four types of things that we do.
The first is about how we measure things: the measurement technologies. The second is about how we bring all those data together to create the research-ready, analysis-ready datasets that our researchers and others can come into and try to identify the structure that ultimately leads to these new insights. Third, we have people from statistics, computer science, engineering, epidemiology, genomics, et cetera, developing methods.
That is, if you like, the A.I. algorithms which are going to peer into these kinds of rich datasets. And then finally, and probably why I'm here, the fourth pillar of what we do within this institute is to think hard about the much wider societal aspects of this data-driven research: issues around consent, issues around privacy and security, issues around governance, issues around sharing, intellectual property and so on.
And we made a decision right at the start of putting this institute together that this was something that we wanted to go on actually in the building.
It's such an integral part of doing biomedical research these days, and the issues that come out of this kind of research are so deep, that if we don't train people in how they should think about conducting this kind of research, and if we don't build the right practices into how people are prosecuting their programmes, actually at the point of implementation, then you're starting on the wrong foot.
So we very much put that at the heart of the institute. Much of Ethox is based within the Big Data Institute, and very recently we got funding from the EPSRC to set up a new centre for doctoral training in health data science. One of the key pillars of that programme is that these data scientists, machine learners and so on will be trained, very much alongside all the other skills they need, in the skills to think about problems from that standpoint.
So it really, really is central to how we think. If anyone's interested in use cases, in coming to us and talking about the types of problems that we're working on and the types of dilemmas that we're faced with, then please do get in touch; we'd be more than happy to talk.
So I just wanted to say a couple of things about at least my personal perspective on why the types of research that we're doing now, which are very much within the tradition of biomedical or medical research, are a bit different, and why they're raising new challenges from the ethical perspective.
And I think a really important point, which is perhaps not so well understood, is that the growth of A.I. and machine learning technologies within biomedical research has really led to something of a shift in how medical research itself is conducted. This comes back to the question of how we get the data. It used to be that in medical science you had a hypothesis: you decided you wanted to test some particular question.
And off the back of that, you designed an experiment. That experiment gave you some data, you analysed the data, and on the back of that you drew some conclusions and maybe came up with a new hypothesis. Importantly, those data were collected specifically for that purpose. You went to people and explained why you were going to collect those data and what you hoped to learn. That's a very clean way of doing science, but clearly it's not massively scalable.
There's one question that you could ask of those data, and essentially only one. Now, about 10 or 15 years ago, biomedical research took a side step. It changed direction a bit in how it collected data, and a lot of that came out of the world of genomics, where people had been studying how genes affect diseases. They'd been studying their favourite gene and their favourite disease in a particular combination.
And the literature was full of incredibly bad results that never replicated and were massively underpowered. But 15 years ago, what happened was a change in technology. It was changes in technology that started things, that led to us being able to do experiments not just on one gene in a handful of individuals, but on the entire genome in tens of thousands of individuals.
And that led to this idea that, rather than going in with your specific hypothesis, actually the most powerful thing is to go in without a hypothesis. You go in and you just collect data, and you let the data tell you what the answer is. And that idea has very much percolated from just thinking about, well, let's study the whole genome and one disease
(the genome-wide association studies were essentially that idea) to the idea that you go in and you collect the genome, and you collect everything that you possibly can about an individual's health, environment, lifestyle and finances. You just collect everything you can, and later on you decide what your research question is.
Now, the success of this programme is made real by something called the UK Biobank, which many of you will probably know about. Somewhere between one and two percent of the adults within the UK have consented to have their entire medical data, their entire genome sequence and huge amounts of subsidiary information about them (their lifestyle, their cognition, their parents, sometimes their children),
huge amounts of information, made available to people like me and people like you and people in companies and people in China and people in the U.S. All you have to do is sign up to a very few restrictions about what you're going to do with the data. You have to say roughly what you're going to do with the data. You have to say that you're not going to try to identify these people. But beyond that, there's really not very much that you have to say that you're going to do.
And as a consequence of that, there are people all over the world probing the tiniest details, the most intimate information, of about half a million people within the UK, some probably indeed within this room. So it's an example of how our way of doing research is really shifting. This shift is exactly what enables the whole A.I. revolution in medicine and health care. But it, of course, brings up all sorts of questions about what it means to be informed about a research project which has no end.
What does it mean in terms of whether you can ever comprehend the sorts of things that I might learn about you if I bring together lots of sorts of information that you would never have had? And what would you like to know if I could, for example, predict whether you're going to get this disease in the next 10, 20 or 50 years? There are huge numbers of new challenges arising from it, and I think we're only just beginning to scratch the surface of them. I shall shut up.
