Hello, everybody. Welcome to this term's Strachey Lecture. We're going to wait just a few minutes, because I can see a lot of people joining; there was a technical difficulty at the beginning, so I'm sorry about that. Please, let's just wait a minute and let the people who are trying to join get in. OK. So, on behalf of the Department of Computer Science at Oxford University, I'd like to welcome all of you to this term's Strachey Lecture.
This is a series of distinguished lectures that we hold once a term in memory of Professor Christopher Strachey, who was the first Professor of Computer Science at Oxford and who founded the Oxford Programming Research Group in 1965. Together with Dana Scott, he founded the field of denotational semantics, which provided a firm mathematical foundation for programming languages. Before I get the pleasure of introducing today's speaker, I get another pleasure.
I'd like to very strongly thank Oxford Asset Management, who are generously supporting this series of lectures. They have been supporting the series since 2014, and without that support we wouldn't be able to bring in such a high-calibre series of speakers. So great thanks to them. It's a great pleasure today to welcome Professor Cecilia Mascolo.
Cecilia is a Professor of Mobile Systems in the Department of Computer Science and Technology at the University of Cambridge. At Cambridge, Cecilia is the head of the Mobile, Wearable Systems and Augmented Intelligence Group. She also currently holds an ERC Advanced Grant on the topic of audio-based mobile health diagnostics, which explains both her research area and the area of the talk.
Cecilia has a huge string of awards, so I had to shorten the list so that you would have time to hear from her. Before her ERC Advanced Grant, she held an Advanced Research Fellowship.
She has been a Fellow of the Turing Institute and has given a huge number of exciting keynote talks. Just looking at this year, I found an IEEE Healthcare Summit, IWC and SmartComp, and from last year ACM HotMobile. She also holds a number of best paper awards, including, recently, a 10-year impact award at ACM.
Her talk today is entitled "Mixed Signals: Audio and Wearable Data Analysis for Health Diagnostics", so we're really looking forward to that. Before I welcome Cecilia, I just want to make one technical comment, since we're still in pandemic mode: during the talk you can type questions into the chat, and at the end of the talk I will read those out so that Cecilia can answer some of them.
OK, Cecilia, it's a huge pleasure to welcome you virtually to Oxford.

Leslie, thank you very much. Before I share my slides, I would like to thank you... I guess you can hear me now? So, Leslie, thank you very much; I was muted for a while. I would like to thank you and your predecessor, Michael Wooldridge, for this invitation. It's a great honour to be here. I'm sorry that I can't meet you all in person and have the live interaction that we could have had.
But I will now share my slides, and hopefully some interaction will still happen. I will assume you can see my slides, and I'll start my talk. There is now, in our daily lives, a constellation of wearable devices that are sensing our behaviour and, perhaps in a more indirect way, our health.

So one would imagine, if you have some of these phones, watches and wearables, that this area is kind of done: that our health has been transformed and there is no research left for us in academia to do around it. Well, in this talk I would like to highlight that, at the moment, with the devices that go into consumers' hands, we are really playing with the sensing, and with the data that comes out of it, quite superficially.
We really need to go through a number of breakthroughs to truly transform health. So in this talk I will first talk about the challenges we're facing and the exciting opportunities this area could open up. These are only some of them: I will introduce these challenges briefly, and then I have two examples from my research in which I try to illustrate some of them.
The obvious first one is sensing modalities: of course, there are more and more new sensors, and my colleagues in engineering are coming up with new ways of sensing our behaviour. I have a colleague who works on EEG: portable EEG sensors are becoming smaller and smaller, and they are being woven into fabric, possibly even into tattoos.
Pills are being developed that can be ingested, and contactless communication between ingested sensors and external devices is possible, so that we can sense our health in a way that is less invasive and less disruptive to our activities. At the same time, though, the sensors already on the devices we wear generate amounts of data that we are not yet good at modelling for the final aims that we have.
This can be taken further: these devices often generate data at a granularity we haven't seen before and, because of the type of devices they are, they can be placed on parts of our body that we have sometimes never thought of sensing. Some interesting conversations I've been having with clinicians are often of the style: what if I could give you long-term, continuous sensing from, perhaps, your abdomen?
What kinds of things would you be able to do? This is so far from what they're used to seeing that even the conversation about what can be done with this sort of data is missing.
In my examples today I will talk about longitudinal sensing: it is now much easier to have fine-grained data not just continuously but also for a long time, which means that we can assess differences between past and present, and present and possibly future, and make predictions with this sort of longitudinal data. One important thing here is that the studies and techniques used until now have often been applied to small-scale trials with small cohorts, and the free-living aspect of the analysis is often missing.

In these studies you sometimes have more control over the labels of the data, and there is less ability to adapt to the unforeseen and noisy data that comes out of free living. And I'm sure that for the fourth bullet point I'm preaching to the choir in this department, but I will talk about it nevertheless.
Perhaps it leads to an interesting conversation when we are talking about clinical diagnostics: the aspect of uncertainty. The ability to go beyond the accuracy of a prediction is important, but the angle that is perhaps new to this department is one that I will show for mobile data.

In the case of mobile data, it is possible to weave this uncertainty into the pipeline of how the data is collected and re-collected, because the cost of re-recording and getting more data is so low: if a prediction on some data is uncertain, then maybe we can easily collect more data. I hope this is clear; if it is not, I have some examples on this later.

The last point is obviously one that, if I don't talk about it, I will be asked about at the end of the lecture, and it is related to privacy. My perspective is that privacy could be embedded in the process we develop, making sure that a lot of this can be done closer to the users.
As a systems researcher, I will show at the end some examples of how we can bring this closer to the users. Perhaps we develop models in a clinical trial, where the side effect is the lack of privacy for the users in the trial. But once we have developed these models, they can be deployed at scale, and the privacy of the users is respected because the models run on devices close to the users, on their data, locally.
There is more, but I wanted to give you a preview, because I know your attention, especially online, is limited. So I will start with the first example. As I said, I have two, on two different types of disease and the sensing data that we use for each.
The first one is about cardiorespiratory fitness. I don't know how many of you know this, and I was certainly surprised by it, but cardiorespiratory fitness is a very important factor that is inversely associated with cardiovascular disease and, interestingly enough, is much more indicative than cholesterol, diabetes, hypertension and even smoking. So it's very important to assess cardiorespiratory fitness, and this is a project we're doing with the MRC Epidemiology Unit.
If I were in a live lecture theatre, I would now ask you to raise your hand if you have ever done one of those strenuous tests in which you go on a treadmill or bike wearing a mask. This test is cumbersome and very strenuous, because they often push you to the limit of your abilities.
This is done to measure your cardiorespiratory fitness through your VO2 max, which is the maximum volume of oxygen per minute that you can take in, transport through the bloodstream and use in your muscles to produce energy. Now, as you can imagine, this test is actually not very scalable.
You need equipment, and it is strenuous. So what epidemiologists and people who study this sort of relationship have been doing so far is to proxy it with other measures, such as anthropometric and demographic measures (height, weight, BMI), as well as questionnaires about how often you exercise and what type of exercise you do. This is already a good proxy, but you probably know where I'm going now.
There is a lot of wearable data that could be used as a proxy for that sort of questionnaire data, and it turns out that resting heart rate, which can be measured quite easily (at least more easily than doing an exercise test), is also a very good, indicative proxy. So where are we with bringing wearables into the estimation of VO2 max?
Well, if you have one of the most modern devices, you know that some of them, when you tell them what exercise you're doing, already give you an estimate of VO2 max. So this is happening in real life. However, there are very few studies showing how effectively wearable data measuring activity and heart rate can serve as a proxy for VO2 max, and, most importantly, there are essentially none that do this in free-living conditions.
If you remember my first slide, free-living conditions were one of the important points, because you really don't want users to have to label the data so much. So the free-living aspect is very important. Now I have a few slides on the study we are doing. As this is a keynote, I hate to present research that we have already published, so I always push myself to present something we are currently working on.
So this is not out yet; it is something we are working on: the measurement of cardiorespiratory fitness through wearable data in free living. The way this works is with data that the MRC Epidemiology Unit has collected in a study called Fenland. The dataset has 11,000 participants in the first cohort and then, seven years later, another cohort.
In this particular case we're using a subset of that. All these people do VO2 max tests, so they do the exercise test. They also take anthropometric measures, as I said, demographics as well as height, weight and BMI, and they ask them questions. But they also ask them, and this is, I think, invaluable, to wear an accelerometer on their wrist, as well as an EKG chest strap to measure their heart, for six days, essentially continuously.
So this is a lot of data, and I remind everyone that this is free living: we know nothing about what they're doing on those days. Six continuous days is not that long either, but it still generates a lot of data. What we are doing with this is using their heart rate and movement data as input, on which we calculate a bunch of features to aggregate all this data.
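As a minimal sketch of this aggregation step, assuming per-minute heart-rate and wrist-acceleration samples and a hypothetical feature set (mean, standard deviation and 10th/50th/90th percentiles of each signal; the study's actual features are not listed in the talk), six days of free-living data can be collapsed into one fixed-length vector per participant:

```python
from statistics import mean, pstdev, quantiles


def summarise(signal, name):
    """Collapse one continuous signal into a few summary features (hypothetical set)."""
    # quantiles(n=10) returns the 9 decile cut points; keep the 10th, 50th and 90th.
    deciles = quantiles(signal, n=10)
    return {
        f"{name}_mean": mean(signal),
        f"{name}_std": pstdev(signal),
        f"{name}_p10": deciles[0],
        f"{name}_p50": deciles[4],
        f"{name}_p90": deciles[8],
    }


def aggregate_features(heart_rate, acceleration):
    """Build one participant's feature dictionary from their heart-rate and movement streams."""
    features = {}
    features.update(summarise(heart_rate, "hr"))
    features.update(summarise(acceleration, "acc"))
    return features
```

Each participant then contributes one such dictionary, whose values, taken in a fixed key order, form one input row for the model.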
We feed these features into a two-layer fully connected neural network, and we use it to do a few things; in the next slide I'll show you some results. The first thing is that we try to use this data to predict fitness levels, that is, the VO2 max test result. Because we have the ground truth, we can tell how well we're doing and which sensor, or which aspect of the data, is more important.
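A two-layer fully connected regressor of this kind can be sketched in plain Python. This is only an illustration: the hidden width, tanh activation, learning rate and plain stochastic gradient descent on squared error are assumptions, not the architecture or training setup used in the study, and in practice one would use a deep learning library.

```python
import math
import random


class TwoLayerRegressor:
    """Hidden tanh layer followed by a linear output unit (two weight layers)."""

    def __init__(self, n_inputs, n_hidden=8, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _hidden(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def predict(self, x):
        h = self._hidden(x)
        return sum(w * hi for w, hi in zip(self.w2, h)) + self.b2

    def fit(self, xs, ys, epochs=200, lr=0.01):
        """Per-sample SGD on 0.5 * (prediction - target)^2."""
        for _ in range(epochs):
            for x, y in zip(xs, ys):
                h = self._hidden(x)
                err = sum(w * hi for w, hi in zip(self.w2, h)) + self.b2 - y
                for j, hj in enumerate(h):
                    # Gradient through tanh, computed before w2[j] is updated.
                    grad_hidden = err * self.w2[j] * (1.0 - hj * hj)
                    self.w2[j] -= lr * err * hj
                    self.w1[j] = [w - lr * grad_hidden * xi for w, xi in zip(self.w1[j], x)]
                    self.b1[j] -= lr * grad_hidden
                self.b2 -= lr * err
        return self


def mse(model, xs, ys):
    """Mean squared error of the model over a dataset."""
    return sum((model.predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)
```

Here `xs` would be the standardised per-participant feature vectors and `ys` the measured VO2 max values (also standardised), so the treadmill-test ground truth supervises the network.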
We've also tried to see whether the models are robust enough: if we train on the original cohort, how does the model do on the later cohort? We're trying to understand these sorts of aspects, as well as looking at whether movement is also a good indicator of, for example, heart rate, and this is what you're going to see next.
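The robustness check described here means splitting by study phase rather than at random. A minimal sketch, assuming each record carries a hypothetical `cohort` field (1 for the first Fenland phase, 2 for the follow-up seven years later) and an `id` field; since the same participants can appear in both phases, a random split could place one person's two visits on opposite sides and overstate robustness over time:

```python
def split_by_cohort(records, train_cohort=1, test_cohort=2):
    """Train on one study phase and evaluate on a later one, instead of splitting at random."""
    train = [r for r in records if r["cohort"] == train_cohort]
    test = [r for r in records if r["cohort"] == test_cohort]
    return train, test


def shared_participants(train, test):
    """Participants measured in both phases, whom the model meets again years later."""
    return {r["id"] for r in train} & {r["id"] for r in test}
```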
Let's go straight to the figure, which is the easiest thing to interpret. On the x-axis you have the VO2 max of the users, and the two distributions are the predicted versus the ground-truth distribution. As you can see, they match reasonably well. We are still, I would say, under-predicting for a portion of them, as you can see from the purple coming out at the back there.
In the table, for those of you who like tables, we break down the results into the error metrics and the R squared, in the columns there. Then you see, if I can point to it, the variants: just using anthropometrics, then mixing in resting heart rate, and then adding the wearable data. As you can see, the wearable data is interesting: it does help the prediction.
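The talk does not name the exact error measures in the table, so as an assumption the sketch below computes three common ones for predicted versus measured VO2 max: root-mean-square error (RMSE), mean absolute error (MAE) and the coefficient of determination R², which is one minus the ratio of residual to total sum of squares.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE and R^2 for paired lists of measured and predicted values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    rmse = math.sqrt(sum(r * r for r in residuals) / n)
    mae = sum(abs(r) for r in residuals) / n
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return {"rmse": rmse, "mae": mae, "r2": r2}
```

For example, measured values of 30, 40 and 50 predicted as 32, 38 and 50 give R² = 1 − 8/200 = 0.96.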
While I was discussing these predictions with the epidemiologists, I always asked: is this improvement reasonable? Their answer is quite interesting, because they say it depends on what you're trying to do. Sometimes they really need even just this one extra point to be precise, so they are obviously striving to get the error down as much as possible.
As I said, this is really at the beginning, but we think it's the right direction, and it is interesting to look at. The thing to remember is that this data is from people who are not all athletes, unlike what you often find in some of the studies. These are normal people, for whom this information is very interesting and could lead to good outcome prediction.

Along the same lines, we are now also trying to see what we can do by feeding the wearable data into another machine learning framework. I have put the papers down at the bottom, and you can look on my web page if you want the technical details; I don't think this talk is particularly about the details of the neural network architectures used. We use the data from the wristwatch and try to see if we can forecast heart rate from it.
So this is the main task being performed, and the next slide has some results on where we are with that. But in the meantime, what I think is interesting about this technique is that the latent representation at the penultimate layer of the network, which the network learns from this activity data, is quite predictive of other clinically relevant information, such as BMI, age, sex and energy expenditure. So, all of a sudden, we have interesting relationships between what the network is learning and the characteristics of these individuals. And here are the numbers. Look at the first table here, if you can see my mouse; I'm not sure you can.
We have applied our technique using just acceleration data and the temporal features embedded in the data, and then additionally adding resting heart rate. Now, as I said at the beginning, resting heart rate is a measure that one can conceivably obtain in a reasonably discreet manner: it can be checked in a very quiet moment of your day, maybe when you lie down, and not so frequently, so it is conceivable as a measurement that doesn't cost much to add. As you can see, the error is substantially, I would say reasonably, decreasing when you start using resting heart rate in addition to acceleration. So clearly, I mean, epidemiologists know about this.
They know that this is an important feature of your fitness, and it is obviously very correlated with your heart rate in general. And here we have the acceleration, indicating the amount of activity, which is also indicative and can be a proxy for your heart rate variability and mobility level.
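The observation that the penultimate-layer representation predicts BMI, age and sex is commonly checked with a linear probe: freeze the trained network, take each participant's penultimate-layer activations, and fit a plain linear model from them to the clinical variable. The talk does not say how this was evaluated, so the least-squares probe below (with a tiny ridge term for numerical stability, solved by Gaussian elimination to stay dependency-free) is an illustrative assumption:

```python
def fit_linear_probe(embeddings, targets, ridge=1e-6):
    """Least-squares weights w (bias last) minimising ||[E, 1] w - y||^2 + ridge * ||w||^2."""
    # Append a constant 1 so the bias is learnt as an extra weight.
    rows = [list(e) + [1.0] for e in embeddings]
    d = len(rows[0])
    # Normal equations: (X^T X + ridge * I) w = X^T y
    a = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0) for j in range(d)]
         for i in range(d)]
    rhs = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(d)]
    # Forward elimination with partial pivoting.
    for col in range(d):
        pivot = max(range(col, d), key=lambda k: abs(a[k][col]))
        a[col], a[pivot] = a[pivot], a[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for k in range(col + 1, d):
            factor = a[k][col] / a[col][col]
            for j in range(col, d):
                a[k][j] -= factor * a[col][j]
            rhs[k] -= factor * rhs[col]
    # Back substitution.
    w = [0.0] * d
    for i in reversed(range(d)):
        w[i] = (rhs[i] - sum(a[i][j] * w[j] for j in range(i + 1, d))) / a[i][i]
    return w


def probe_predict(w, embedding):
    """Apply the probe to one embedding (bias is the last weight)."""
    return sum(wi * ei for wi, ei in zip(w, list(embedding) + [1.0]))
```

If such a linear map predicts BMI well on held-out participants, the frozen representation encodes that information, even though the network was only trained to forecast heart rate.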
In the other table we see the outcomes with various principal component analysis reductions of the features: we get reasonable prediction of some of the demographics, such as height and sex, somehow even age, BMI and weight. So, with this first part of the work, what I wanted to highlight, amongst the original bullets, are a few things. One: free-living data is more difficult to deal with and possibly more confusing. You often don't get beautiful results, but we need to work with it, because if you talk to epidemiologists or people who do this kind of large-scale work, they are interested in monitoring the population, and they cannot afford to do this in a very controlled setting. So we need to find techniques to do it. The second aspect is continuous and longitudinal data. Here, with these results, I have really just scratched the surface.
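The principal component analysis reductions behind the second table compress many correlated features into a few directions before predicting demographics. A dependency-free sketch using power iteration with deflation on the sample covariance matrix (a library routine would normally be used, and the number of components kept is a free choice):

```python
import random


def principal_components(data, k, iters=200, seed=0):
    """Top-k principal directions of `data` (list of rows) via power iteration with deflation."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centred = [[row[j] - means[j] for j in range(d)] for row in data]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = sum(x * x for x in w) ** 0.5
            v = [x / norm for x in w]
        eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
        components.append(v)
        # Deflate: remove the found direction so the next pass finds the next one.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)] for i in range(d)]
    return means, components


def project(row, means, components):
    """Coordinates of one feature row in the reduced principal-component space."""
    centred = [x - m for x, m in zip(row, means)]
    return [sum(c * x for c, x in zip(comp, centred)) for comp in components]
```

The projected coordinates, rather than the raw features, would then be the inputs to the demographic predictors.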
We have Fenland 1 and Fenland 2: people monitored seven years apart. The only thing we have done in that respect is to check that our models were robust across the two moments in time, but we think there is a lot more we can do. Now, for the rest of the talk: I will come back to this, but for the moment I will move on to another example and another sensor that we have been using quite a bit, and that's the microphone in our devices.

It's about the application of this to auscultation. Perhaps the auscultation you as an audience know best is heart auscultation or respiratory auscultation. What turns out, and what I have also been told in person, is that auscultation is very difficult for the human ear, and that junior doctors are often not skilled and, from what I hear, not so trained in it, because it can easily be replaced by other devices: for the heart, echocardiograms are substituting for auscultation with just the stethoscope. On the other hand, microphones are in our hands, and they are cheap.
Most importantly, they are with us all day, which means that, compared with the occasional auscultation a doctor could do on us, these devices can listen to us continuously. This has its challenges, but also opportunities. I will start with an example of audio that you might be familiar with.
And that's voice. In 2017, MIT Technology Review highlighted that voice could be indicative not just of what may seem to you more intuitive, psychiatric and psychological conditions (the fact that you can hear stress in the voice is something you might have heard of), but even of heart disease. The intuition behind that is that the vocal cords and the respiratory tract are very intertwined with the cardiovascular system.
So perhaps a hardening of the arteries might produce changes in your voice. That's my interpretation, as a computer scientist and in lay terms, of the situation. And it's not just about data that comes from our vocal and respiratory tract; it could also be data that comes from our heart. We already have EKGs in our watches.
There's a single-lead one in my Apple Watch, but there are pathologies that can only be heard, or seen through an echocardiogram. So auscultation is important. Datasets from digital stethoscopes that can be used for auscultation of heart pathologies are starting to be collected, and below, if you're interested, is a reference to one of the works; we're not the only ones working on how this can be done.
The problem in general is that, while for speech there are very many datasets available and people are really concentrating on the techniques, here there is very limited data, and in some cases really no data at all. I was talking to a colleague who is a respiratory clinician, and I asked how they train their doctors. She told me that the main technique is to listen to the same patient.
The consultant listens, the trainee listens, and that is how they learn to interpret respiratory sounds. But what about data banks? I'm sure you know where I'm going with this: the collection of this data is as important as its analysis. This is a review from 2017, one of many that can be found, so people are really clear that having data can be useful for creating models.
Here are examples of things that can be detected using this data: asthma, COPD and pneumonia are the three in this particular abstract. While I was mulling this over as part of my ERC Advanced Grant, COVID started to happen, and a couple of colleagues got in touch. They knew about my project, and we started collecting data through an app that we pushed out.
I could give a separate talk about how difficult it was, at pandemic time, to push out an app that collects sounds and has COVID in its name. This was March/April 2020. You can ask me at the end; I have thoughts about how this could be changed, because at the time we were really trying to do something useful.
The result of this collection (and I'll talk a bit more about what data we're collecting) is a large-scale dataset that we've just submitted to the new datasets track and that will be released shortly. We have already released subsets of the data. This data is private and very sensitive, so we are releasing it under data transfer agreements between institutions. I can say more if you want to ask about it at the end.
I'm spending a little more time on this because it's very timely, I think we've learnt a lot by doing it, and we're still working on it. In addition to recording demographics, medical history and symptoms, which many other apps also do, we are recording sounds: breathing sounds, cough sounds and voice.
Again, I could give another talk about how we decided on the sentence, shown on the third screen, that users need to read, and about what we perhaps should have asked them to read once I had talked to better experts; you learn by doing. Why do this? What's the holy grail here? Well, we have all these very cheap lateral flow tests.
We have more precise PCR tests, but we think that for these diseases, additional scalable, contactless, affordable and, I should add, sustainable ways of testing would be very valuable, even at lower precision. After working on this for more than a year, the conclusion I came to is that this is a very valuable tool for looking at respiratory disease progression, because listening through a digital device could be really valuable.
Again, because I like graphs and this is interesting, this is the data we have collected. We do ask the user for some ground truth: we ask them to report whether they have tested for COVID. All of this is crowdsourced, so they can lie; people have been miaowing into the app, so we have lots of noisy, dirty data that we have to clean and look at. Most of the data is, in fact, negative, as you would imagine, and we have some COVID-positive data as well.
We ask users where they're from. You can see bumps in the data collection when we did a press release or someone heard about our app. Age, gender and smoking status are also in these graphs. One thing I should say, because I'm sure you're asking yourselves this: why would you be able to hear this? Why is COVID different? Well, we don't have that information yet.
I would start by saying that other researchers have obviously been trying to do similar things, and I'll just point you to one piece of work that I thought was particularly useful, by Rita Singh's group at CMU, who have analysed the characteristics of COVID voices.
Since then, we've been contacted by many clinicians who are essentially saying, "I think I can hear it when a patient comes in": people on the ground, on the front line, thought they could hear something. The reality is that, depending on what data we have, we can make different sets of predictions. So, will we be able to distinguish COVID from flu?
Well, we have absolutely no data on flu, so this is something we are very interested in trying to understand. But let's not get ahead of ourselves. This is just one slide showing what we have done for the task: can we distinguish, from the sounds of a person, whether they are COVID positive?
We used a pre-trained model, VGGish, which was trained on large-scale audio datasets, extracted features with it, concatenated them and fed them into a classifier that was then used for the prediction. Then there are two tasks. One is really the diagnostic task, trying to say whether a sample is positive or negative; the other, which we're still working on, is more longitudinal, and I'll leave it for later.
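The pipeline described here (embed each recording with a pre-trained audio network, concatenate the per-modality features, then classify) can be sketched as follows. The embedding step itself is left out and replaced by already-computed vectors, and the classifier is plain logistic regression chosen for illustration; the talk does not say which classifier was used.

```python
import math


def concat_modalities(breathing, cough, voice):
    """Concatenate per-modality embedding vectors into one feature vector."""
    return list(breathing) + list(cough) + list(voice)


def predict_proba(w, x):
    """Logistic model: weights for each feature, with the bias stored last."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + w[-1]
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, epochs=500, lr=0.1):
    """Batch gradient descent on the mean logistic loss; labels are 1 (positive) or 0."""
    d = len(xs[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for x, y in zip(xs, ys):
            p = predict_proba(w, x)
            for j in range(d):
                grad[j] += (p - y) * x[j]
            grad[d] += p - y
        w = [wj - lr * g / len(xs) for wj, g in zip(w, grad)]
    return w
```

Concatenation keeps the modalities separate in the feature vector, so the classifier can weight breathing, cough and voice evidence differently.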
I'll give you only one piece of information here. These are three of our papers. The last one, which is under review, explores the realistic performance of audio-based digital testing, because we realised there was a lot of hype at some point, with people claiming performance of 90-plus percent, which we didn't really believe. We think a realistic tool, with data that does not yet cover colds and flu, could reach an AUC of around 0.7.
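Assuming the 0.7 figure is an area under the ROC curve (AUC), it has a direct reading: the probability that a randomly chosen positive recording is scored higher than a randomly chosen negative one, with ties counting one half. A minimal sketch of that definition (quadratic in the number of samples, which is fine for illustration):

```python
def roc_auc(labels, scores):
    """AUC as the Mann-Whitney probability that a positive outranks a negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 is chance level and 1.0 is perfect ranking, which is why claims of 90-plus percent invite scrutiny.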
However, as I said, every time we tried to integrate our dataset with other datasets covering other diseases, the machine learning framework was too smart and would detect the dataset rather than the disease. If you have questions on this, I'm happy to take them. We're still exploring what this sort of tool can be useful for, and having better ground truth and better data on other diseases is important.
One test we have done is to try the model on data we have from people with asthma, and you can see there that the model wasn't easily confused by that, but I'm sure it would be confused by other diseases. So deciding what this could be useful for is, at that point, more of a public health question than a methodological machine learning question. Now, the final important aspect here for me is the following.

After reflecting on this for a year, I think this sort of tool will become invaluable for keeping patients out of hospital and monitoring their onset, as well as progression and recovery. We ask our users to give data every couple of days, so we are starting to have longitudinal samples. One of the volunteers has given us more than 250 samples, and in every talk I give I thank them for this; it is very valuable data.

At the bottom here you see a graph that shows how we could possibly see the progression of someone with the disease before they test negative. The green part is where they have a negative test, and the other part is where our model's predicted probability is already starting to decline. This is not personalised or anything; it just uses a sequential modelling technique. But the idea is to get there, and this is possibly not just for COVID.
This is something that we're trying to think about more generally and scale up to other diseases. The last part of this is a reflection on how useful it is, especially in this case, to have a prediction that gives us one number, COVID or non-COVID. We looked around at uncertainty, and I found, for example, this paper on diabetic retinopathy images, which says that computing the uncertainty of the prediction allowed them to refer the subset of difficult cases for further inspection, perhaps back to the clinician. So this is obviously one use of uncertainty.

What I would like to highlight, and I will go through this complex graph in steps as I have time, is that with digital interventions uncertainty could be integrated into the process of deciding not just where the clinician comes in, but also where the need for more samples comes in. This is even more useful, and it has a very low entry point because the data is digital and easily sampled, at least in this particular case. So in the paper you see at the bottom, we are essentially solving two problems at once.
The first problem is generating uncertainty over the COVID prediction. We do that not with a single model but with multiple models, aggregating the variance of their predictions and, when there isn't enough certainty about a prediction, declaring it uncertain. Using ensembles also solved our problem that our data was mainly negative.
Most people declared that they had tested negative. So we use the one positive set and balance it against multiple negative sets, so that the different members of the ensemble see different subsets of the samples.
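The ensemble construction described here can be sketched as follows: shuffle the negatives, cut them into disjoint chunks the size of the positive set, train one model per chunk together with all positives, and use the spread of the members' predictions as the uncertainty. The nearest-centroid base learner and the standard-deviation threshold are hypothetical stand-ins chosen to keep the sketch self-contained; the study's base models and aggregation rule may differ.

```python
import math
import random
from statistics import mean, pstdev


def balanced_subsets(negatives, n_positive, seed=0):
    """Shuffle the negatives and cut them into disjoint chunks the size of the positive set."""
    shuffled = list(negatives)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i:i + n_positive]
            for i in range(0, len(shuffled) - n_positive + 1, n_positive)]


def centroid_model(positives, negatives):
    """Hypothetical base learner: probability from squared distances to the class centroids."""
    dim = len(positives[0])
    c_pos = [mean(x[j] for x in positives) for j in range(dim)]
    c_neg = [mean(x[j] for x in negatives) for j in range(dim)]

    def predict(x):
        d_pos = sum((a - b) ** 2 for a, b in zip(x, c_pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, c_neg))
        return 1.0 / (1.0 + math.exp(d_pos - d_neg))

    return predict


def ensemble_predict(positives, negatives, x, threshold=0.1, seed=0):
    """One model per balanced subset; return mean probability, spread and an 'uncertain' flag."""
    models = [centroid_model(positives, chunk)
              for chunk in balanced_subsets(negatives, len(positives), seed)]
    probs = [m(x) for m in models]
    spread = pstdev(probs)
    return mean(probs), spread, spread > threshold
```

Balancing each member keeps any single model from simply predicting "negative", and disagreement between members is what flags a prediction as uncertain.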
One interesting piece of information is in the graph at the bottom: we noted that uncertainty tended to be higher when we had a wrong prediction. We don't know whether this holds in general, but it is certainly an indication that, for wrong predictions, retaking the data might help: perhaps the data was noisy and was therefore predicted the way it was.
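Combining this with the point that re-recording is cheap, uncertainty can drive a simple decision rule: keep asking for a new recording while the prediction is uncertain, and refer to a clinician if it never becomes confident. The `get_prediction` callable, the uncertainty threshold and the retry limit are hypothetical:

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Decision:
    action: str            # "accept" (confident result) or "refer" (send to a clinician)
    probability: float     # probability from the last recording analysed
    recordings_used: int   # how many recordings were requested


def triage(get_prediction: Callable[[], Tuple[float, float]],
           max_recordings: int = 3,
           max_uncertainty: float = 0.1) -> Decision:
    """Request recordings until the model is confident, or give up and refer.

    `get_prediction` is a hypothetical callable that captures a new recording and
    returns (probability, uncertainty) for it.
    """
    probability = float("nan")
    for attempt in range(1, max_recordings + 1):
        probability, uncertainty = get_prediction()
        if uncertainty <= max_uncertainty:
            return Decision("accept", probability, attempt)
    return Decision("refer", probability, max_recordings)
```

For example, a first recording scored (0.55, uncertainty 0.30) triggers a retake; if the second is scored (0.82, 0.05), the result is accepted after two recordings.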
I will stop here on this, but if you want to read more, the paper is there. The last couple of slides relate to the privacy argument I made before. Machine learning on device is an open area of work. Many researchers have made strides in compressing models, using various techniques to make that happen.
We're still not there on a number of things, including perhaps training on device, but I think the agreement in the community is that, more than training on device, we are interested in incremental learning: having a model and then perhaps adapting it on device. That's more interesting. What I found particularly interesting is bringing in the idea of uncertainty: if we have uncertainty estimation in the models, can we then also bring that on device?
Again, if you want to read into this area, we are by no means alone in this quest, but that's one reference to work that we have been doing on this. And this is my last slide before the questions: as I said at the beginning, there are new types of devices on which we can start doing these things, and obviously having sensors around your head, in your ear, like the ones I'm wearing right now, is very interesting.
We found that we can perhaps already monitor breathing and heart rate with the inner microphone. Again, this is initial work that just does activity recognition using that microphone, but that's the direction we're going. So doing all this data collection, and perhaps then the analysis, on device is the general picture here. And then here is my slide of thanks.
Obviously, I didn't do this alone; in fact, I've done none of it. All of these people are the ones who have done it all. If you want to contact us, here are the details, and thank you very much for listening, even if just online. Thank you very much. Thank you so much, Cecilia, for a really stimulating lecture. I've got some questions that people have been asking here, so I'm going to read those out. People can also feel free to send more in as we go.
Actually, the first one I want to read is not a question but a comment. It just says, "Thanks for the lecture. Your work is so interesting." I think many people wanted to write this comment, so I thought I'd read it first. Thank you. OK, so then some more technical questions. The first one asks about the work on heart rate, and it says: why are we predicting heart rate rather than measuring it?
So what's going on there? OK, so it's a very good question, and we are also working on trying to measure it. But as researchers, we're also interested in trying to see what the right proxy is. There are cases where, at the moment, measuring heart rate is not precise. We have PPG sensors on these devices that have been shown to have all sorts of biases from movement on the wrist. I heard a talk from an expert at Google, in fact,
and they were saying the wrist is really the wrong place to have a heart rate sensor, because we move it so much for many other reasons. So we are trying to see what else we can do: you can measure it from here, in the ear, and you can measure it from other places. But this is another line of research, and, you know, I'm big on the question "why not?", and I guess that's how I would like to answer this question: why not? Challenging the idea that one sensor is always the best is something I like to do.
I hope that answers the question. Great, thanks. We have another person who's interested, and actually I'm sure many people are interested, in what types of neural nets you found most effective for making the predictions from the wearable data. That's a very good question. I mean, it depends. Of the two I presented, one of them is using a CNN plus GRUs,
the other one is using just two dense layers. In the one using the two dense layers, we were essentially condensing the features: we were using features instead of the raw accelerometer input, because that was essentially too much data.
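To make "condensing the features" concrete, here is a minimal sketch, assuming summary statistics of the acceleration magnitude as the features and hand-set weights (both hypothetical; the talk does not say which features or layer sizes were used). Each window of raw (x, y, z) samples shrinks to four numbers before it enters two dense layers, a ReLU hidden layer followed by a sigmoid output.

```python
import math
import statistics


def window_features(window):
    """Condense a window of (x, y, z) accelerometer samples into summary statistics of the magnitude."""
    mags = [math.sqrt(x * x + y * y + z * z) for x, y, z in window]
    return [statistics.fmean(mags), statistics.pstdev(mags), min(mags), max(mags)]


def dense(inputs, weights, biases, activation):
    """One fully connected layer: each output is activation(w . inputs + b)."""
    return [activation(sum(w * v for w, v in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def relu(v):
    return max(0.0, v)


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def two_layer_net(features, params):
    """Forward pass of two dense layers; params holds hypothetical weights w1, b1, w2, b2."""
    hidden = dense(features, params["w1"], params["b1"], relu)
    return dense(hidden, params["w2"], params["b2"], sigmoid)
```

A window of any length becomes a fixed four-element vector, which is what keeps the network small compared with feeding in every raw sample.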
So I'm not sure I'm answering this question the way you're looking for, but I can point you to literature. In fact, I know people at Georgia Tech have looked at the best techniques for accelerometer activity recognition data, and I think they were big on LSTMs, for example. So, yeah, I think the jury's out, and it really depends how much data you have and
what you're trying to do, whether you're trying to combine multiple sensors or not. I'm probably not even the right person to ask this; I would ask one of my students. Maybe you can send me an email and I'll put you in touch with the people on the ground working on this. OK. So another person's asking about the COVID predictions, and the question is: you've got a model that's trained on a population, so what about the accuracy?
How would the accuracy increase if you had a model that was trained per user, and is that even feasible? Could you do that over time? Well, that is, I think, the next step. The problem is that we're missing data. At the moment we use the general model, because that's where the data is; you mainly have only one sample per user. But if you were to start collecting personalised samples day after day... I even had people who said, we are so different.
Our voices are so different that I don't expect this model to be precise for the next person, because what you sound like is different from what I sound like. And then, for Mascolo, that is even more true. So, yeah, personalised models are really the way to go. And I think this is even more important for progression: when you're trying to monitor progression, knowing your own baseline is where we're going. And I think the lack of data is holding back all of this research at the moment.
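One simple way to use a personal baseline is to score each new measurement against the same person's earlier samples rather than against the population. This is a hypothetical sketch, assuming a single scalar voice feature per day; it is not the model from the talk, just an illustration of why the baseline matters.

```python
import statistics


def deviation_from_baseline(history, today):
    """Score today's value against this user's own history, in units of their own variability (a z-score)."""
    if len(history) < 2:
        raise ValueError("need at least two baseline samples")
    mu = statistics.fmean(history)
    sigma = statistics.stdev(history)  # sample standard deviation of the user's own history
    return 0.0 if sigma == 0 else (today - mu) / sigma
```

The same reading of 5.0 is unremarkable for a user whose history is `[4.0, 5.0, 6.0]` (score 0.0) but stands out for one whose history is `[1.0, 2.0, 3.0]` (score 3.0), which a single population-level threshold cannot capture.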
OK, actually, these questions are coming at a huge rate; you've obviously stimulated loads of people. They're coming in faster than I can even read them, which is excellent. The one I want to ask you next is about the difference between in-ear sensors and wrist sensors in terms of noise. So which is noisier? And then the question actually asks whether this is more sound-based, or... yeah, OK. Basically, that's the question: which is noisier?
OK, so for earable devices there is virtually no research on heart rate and respiratory sensing, and we are looking into it. We have nothing published at the moment; we only have the paper that I referred to, especially the one at the bottom, where we do activity recognition, and we are now monitoring heart rate. What I can say is that the head is a much more stable place to monitor activity and possibly even physiological signals.
So, you know, the wrist might not be the best place. There are very few comparisons of heart rate monitoring on earables at the moment. So I guess I'm not... yeah. Maybe next year we can talk about this with more data. It's certainly promising. And here's somebody who's asking about the difficulty that in the real world you don't have labelled data: what are some of the effective methods you can use to get around the lack of labels?
That's a very good question. This is something the community is really looking into. Obviously, transfer learning has been tried, and self-supervision has been tried, with transformations. People are trying to use ancillary tasks as well, in the way that we have also tried to. And I think there are techniques that have been applied to other data that can be tried here. But the problem of labelling in wearables is perhaps bigger than in other domains.
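A minimal sketch of how a transformation-based self-supervised pretext task can be built from unlabelled wearable windows: apply known transformations and use the transformation's identity as a free label. The particular transformations and the noise level below are illustrative assumptions, not the ones used in the work mentioned; a network pre-trained to recognise them would then be fine-tuned on the few real labels.

```python
import random


def _identity(signal, rng):
    return list(signal)


def _jitter(signal, rng):
    return [v + rng.gauss(0.0, 0.1) for v in signal]  # hypothetical noise level


def _negate(signal, rng):
    return [-v for v in signal]


def _reverse(signal, rng):
    return list(reversed(signal))


# Pretext label -> transformation; the model's job is to predict which one was applied.
TRANSFORMS = {0: _identity, 1: _jitter, 2: _negate, 3: _reverse}


def make_pretext_dataset(unlabelled_windows, seed=0):
    """Turn unlabelled windows into (transformed_window, transform_id) training pairs."""
    rng = random.Random(seed)
    data = []
    for window in unlabelled_windows:
        for label, transform in TRANSFORMS.items():
            data.append((transform(window, rng), label))
    return data
```

Every unlabelled window yields four labelled examples at no annotation cost, which is the appeal when ground-truth labels from wearables are scarce.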
So, yeah, I mentioned the techniques that we've been using, but if you're not working on this and you want to, it's definitely interesting. OK. And there's someone asking about something that you alluded to but maybe didn't quite have time to tell us enough about. This person asks: what were the biggest challenges with the app deployment and with converting the data into results?
You said you had some thoughts on how the process could be improved. What are they? So our problem was that we were blacklisted by Google and Apple for about a month, because the app had "COVID" in the title and it was considered a kind of exploitation of a large-scale event. So I had to plead with the head of public health in Cambridge to send a letter through the normal forms that Google has, saying that we are not playing around.
We are trying to do a research study on this; we have all the ethics approvals and all the data transfer agreements in place. So there seemed to be a missing path connecting academic research with this sort of large-scale deployment, for what are, I would say, maybe excluding mine, generally very important studies that can happen through the deployment and large-scale collection of this data.
Obviously, privacy is up there as a big banner, but it is already finding application that is published in...
And... well, yeah, I was alluding to this when I mentioned it, but definitely another important lesson is this: people have asked about our data, why we're not releasing it, not only the coughs, publicly to everyone, as some groups have. In consultation with experts in the university, we have decided that this data is actually more dangerous than people think.
We also had this conversation with the chairs of the NeurIPS Datasets and Benchmarks track, where we released our data, because their call for papers at the beginning was asking us to release the data publicly if we were to submit to that track. And I wrote to them and said, well, it is wrong to release this data publicly, because someone could re-engineer the identity behind someone's voice, or even coughs, just by correlating it with other publicly available data.
So this process has taken time, but I think we got that right. Thank you. I'm just going to ask one more question, and I'm just expressing, because we can't do it live, so much thanks from all of us listening. But let me ask one more question. This person says: could you say a couple of things about current research on mental health analysis using wearable data?
Do you have any general thoughts or directions that you might find interesting? So we worked on collecting data for mental health five, six, seven years ago. In fact, one of the papers that got the ten-year impact award was the one doing emotion detection from voice on device, on a very old Nokia phone that had a wonderful battery; that's why we could do it. We used Gaussian mixture models. So that was on the voice.
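As a rough illustration of the Gaussian-mixture classifier family mentioned here (not the original system), the sketch below scores a sequence of per-frame voice feature vectors under one diagonal-covariance GMM per emotion and picks the most likely class. The emotion labels, the two-dimensional features, and every mixture parameter are invented for the example; in practice each GMM would be fitted with expectation-maximisation on acoustic features such as MFCCs.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log-density of a Gaussian with diagonal covariance."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def gmm_log_likelihood(x, components):
    """components: list of (weight, mean, var); log-sum-exp keeps the sum numerically stable."""
    scores = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in components]
    top = max(scores)
    return top + math.log(sum(math.exp(s - top) for s in scores))


def classify(frames, models):
    """Sum per-frame log-likelihoods under each class's GMM and return the best-scoring class."""
    totals = {label: sum(gmm_log_likelihood(f, comps) for f in frames)
              for label, comps in models.items()}
    return max(totals, key=totals.get)


# Hypothetical hand-set models; real ones would be trained per emotion.
MODELS = {
    "neutral": [(1.0, [0.0, 0.0], [1.0, 1.0])],
    "stressed": [(0.5, [3.0, 3.0], [1.0, 1.0]), (0.5, [4.0, 2.0], [1.0, 1.0])],
}
```

Scoring is cheap (a few sums and exponentials per frame), which is part of why this kind of model could run on an old phone.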
We also collected data from the accelerometer and questionnaires, and then tried to correlate those with mood. We had mood reports, so we have a large dataset with that sort of information. I think these days these things are still ongoing. The finding that was striking at the time was that someone being active,
so having, as sensed on their phone, some sort of activity that didn't mean exercise, just that they were going somewhere while they were using the phone, was actually positively correlated with mood. So definitely, this is really important, but there are various aspects of mental health, expanding to Alzheimer's.
I have a project on monitoring memory and Alzheimer's through its correlation with the ability to navigate, which is apparently one of the first things that disappears; that is also another area where these devices could make a difference. I could go on for another talk, but Leslie would stop me. Okay, thank you so much, Cecilia, for a marvellous talk, and thanks again to our sponsors, Oxford Asset Management, and thanks to all of you for attending online.
So that's today's lecture. Thank you very much.
