Hello, and welcome to The Machine Learning Podcast, the podcast about going from idea to delivery with machine learning. Your host is Tobias Macey, and today I'm interviewing Ittai Dayan about using federated learning at Rhino Health to bring AI capabilities to the tightly regulated health care industry. So, Ittai, can you start by introducing yourself? Hi. Nice to meet you, Tobias. My name, as you said, is Ittai Dayan.
I'm a physician-scientist by background, and a management consultant who then turned AI development leader at Mass General Brigham. In my current role, I'm the co-founder and CEO of Rhino Health, a company for distributed compute, driving capabilities to the edge in healthcare, among them federated learning. And do you remember how you first got involved in the machine learning area? Well, when I was conducting clinical research, over 5 years ago, we were looking to quantify disease using digital biomarkers. And it seemed like a very attractive notion, especially in neuropsychiatric disorders, where it is often hard to really quantify and be sure if a treatment is helping the patient or isn't helping the patient, to compare treatments, and such. And that concept looked incredibly attractive.
The difficulty we had then was getting enough patients, which is still a very challenging thing in many cases, and being able to capture, record, and validate these biomarkers and, you know, take them out to the clinical world where others will be able to use them. And so I got really excited about the concept of not just doing 1 study that, you know, might take 3 or 4 years, then waiting another 5 years until the biomarker is used by somebody, and maybe another 5 years until it's actually propagated, but actually turbocharging the entire process and making all of that happen much faster and with many more algorithms. And that took me to Mass General Brigham, and later on to starting my company, Rhino Health, as a way to accelerate scientific creation and translation.
And in terms of what you're building at Rhino Health, can you give a bit more detail about kind of where your focus is and some of the story behind how it got started and why you decided that this was where you wanted to spend your time and energy?
Sure. So a few years ago, I led an international consortium called the EXAM consortium, with 20-some institutions worldwide, in order to train and validate an algorithm for predicting outcomes of patients who appear at the emergency department with suspected COVID. And the goal was to help physicians manage these patients by introducing another kind of clinical score, a kind of COVID biomarker, that could help predict whether these patients would need mechanical ventilation and what the outcome of a patient is going to, you know, potentially be. The reason we used federated learning for that consortium was that disseminating algorithms to different institutions takes a long time, and you lose control over the algorithm, so it isn't really a joint platform you can work on. And on the other hand, receiving data from institutions around the world was highly impractical.
You know, you have some institutions that are subject to GDPR, some to HIPAA, some to, you know, other regulations, and the barriers were just too high and the process too slow to centralize data. We ran the study, and the study actually happened much faster than I would have imagined.
I think we got all sites signed up in maybe 6 weeks, and a few weeks later we already had an experiment running, also thanks to our partners from NVIDIA; we were using their early platform for federated learning called Clara FL. And so clearly we were very excited about it. We were even more excited when the results came in and we could see that the algorithm itself actually generalized much better than any algorithm created in 1 institution and demonstrated capabilities across the board. Meaning, wherever we took the global algorithm, to whichever institution, it outperformed the locally trained algorithm, and that showed this was really a way to accelerate these kinds of collaborations. And so after this study, I went to NVIDIA and to others and said, you know, guys, maybe somebody should turn this into, like, a full-blown, feature-rich commercial platform. And, you know, people were telling me, yeah, you know, we like building SDKs and chips and infrastructure, you know, stuff like that, but this sounds like a much more, kind of, up-the-stack solution. And they said, maybe you should build it. And so I thought, yeah, maybe I should. And that was the early inception.
I'd never been a deep tech CEO before. I wanted to work with somebody who's, you know, done this before, and somebody who understands not just health care and life sciences and research, which are areas I'm fairly knowledgeable about, and not just, you know, development of AI, which is also an area I'm knowledgeable about, but who has actually built enterprise software and predictive analytics software.
And so I teamed up with my cofounder, Yuval, who was a group engineering manager at Google, driving the development of a conversational AI product at that time. And we raised the seed round for Rhino. That was kind of how things came together. Very cool. And you mentioned that
federated learning is 1 component of what you're building, and that more broadly, you're looking to build out a distributed compute platform for the health care industry. And I'm wondering if you can kind of talk through some of that nuance, and then we'll dig a bit more into kind of what federated learning is and how it's being applied here. Yeah. So, starting from kind of the smaller details of how the EXAM consortium worked, we can see that federated learning is definitely a very cool technology that can help you do stuff. But what you really need is a platform with which you can do more things. You can manage the data. You can harmonize the data. You can validate the data. You can pre-process the data. And you can also do federated learning with the data and create an algorithm off of it. After that, you have an algorithm; you wanna deploy the algorithm, you wanna monitor it, and so there's, like, a full end-to-end life cycle. And if you just solve the small, critical, but narrow problem of federated learning, you don't give users sufficient value. And so,
you know, it's kind of like people say, you know, garbage in, garbage out and all that kind of stuff. So you want to go much more upstream, and you also want to go more downstream, in order to make sure that the algorithms are continuously supported. And so that took us into developing a much broader platform than just federated learning. You know, people have attempted building federated learning platforms before in the last few years, and I think that's where they typically missed the mark. We decided, in contrast to that, to have a full-blown edge compute platform that takes you through the AI development journey end to end. It also ultimately felt like maybe the real name for something like this would be an edge cloud, meaning taking all the capabilities of cloud computing and putting them on an intricate system of edges, and that's what Rhino Health does.
We're a platform for builders. We're not, kind of, a no-code platform; you have to get your hands dirty in order to use it, similar to how it is using AWS or GCP or Azure, you know, to date. But we definitely lower the learning curve required in building distributed systems, substantially, to the level where, if you have an algorithm and you generally know what the pipeline should be and what, you know, what kind of data you need, you can convert the model into a federated learning framework, deploy it to Rhino, run the right data for it, and see results within a day or 2 from beginning your project. Anyone who's ever actually done a federated learning project, if it actually went beyond 1 institution and required managing infrastructure in multiple places and, kind of, doing a lot of engineering work which is not germane to the data scientist profession, necessarily, knows that's actually a fairly complicated process, which also runs the risk of being non-repeatable and non-sustainable.
And, you know, going a bit back to the EXAM consortium: while it was a very large achievement, it was not a sustainable achievement. We finished the project, everybody folded up their stuff, and, you know, it was a good project. The data today can't be made readily available to researchers who want to validate algorithms on a global, rich dataset. You know, it was an ad hoc project, like many science projects.
We endeavor to change that, and also to be able to turn some of these ad hoc consortia into something that actually propagates knowledge and propagates data. So you run it once, and you keep an edge in your data center, or on your workstation, or wherever, you know, your data and compute live, and you're able to tap into a vibrant ecosystem of developers, from industry as well as academia as well as hospitals, and, you know, jointly drive the future faster.
And digging a bit into the terminology of edge compute, it's probably also worth kind of detailing what that means in this context
of distributed compute for health care data, where, in some cases, edge compute means working on low power IoT style devices. In other cases, it's an edge where that edge is actually a full blown data center. I'm just wondering kind of what level of computational capability you're focused on targeting with Rhino Health for being able to push these capabilities out to these so called edge locations?
Yeah. So we benchmark the platform based on different model types and use cases. And today we have configurations from a MacBook Pro to an NVIDIA DGX server to bigger capabilities on cloud. We will fit whatever you have. The question is, if you want to train a convolutional neural net on half a terabyte of imaging data, maybe your laptop won't be the best apparatus for that.
And so now, digging into the federated learning aspect of what you're offering, I'm wondering if you can just start by giving a baseline of what federated learning is and some of the trade-offs that it introduces in terms of the types of ML algorithms that you can apply, the types of data that you're able to work with, and some of the kind of constraints and capabilities that it offers? So federated learning is, I guess, a meta-framework on top of machine learning frameworks.
It is an iterative process in which you train algorithms locally. You share the weights from these algorithms with a global orchestrator. The global orchestrator blends these weights, using averaging or other techniques. Then it in turn updates the local algorithms. You run an additional epoch, or however you've defined it, until you reach an optimized model. That's, in the simplest, I guess, way to put it for a technical audience. And maybe for a nontechnical audience: it's like an analysis of different local algorithms that extracts the best, most performant parts of them and creates a global model based on that. In the context of health care and pharmacological research, I'm wondering what are some of the motivations for using federated learning and some of the unique opportunities that it offers to this very tightly regulated and controlled space.
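The iterative loop described above, local training, sharing only weights, and averaging at a global orchestrator, is essentially the federated averaging (FedAvg) scheme. Here is a minimal sketch using a toy linear model and two simulated sites; everything in it is illustrative only, and none of it is Rhino Health's actual implementation:

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=1):
    """One site's local training: plain gradient descent on a linear model."""
    w = weights.copy()
    for _ in range(epochs):
        w -= lr * X.T @ (X @ w - y) / len(y)
    return w

def federated_round(global_weights, sites):
    """One round: every site trains locally, then the orchestrator
    averages the returned weights, weighted by site sample count."""
    local_ws = [local_update(global_weights, X, y) for X, y in sites]
    sizes = np.array([len(y) for _, y in sites], dtype=float)
    fractions = sizes / sizes.sum()
    return sum(f * lw for f, lw in zip(fractions, local_ws))

# Two simulated "hospitals"; their (X, y) arrays stand in for patient
# data and never leave the sites list; only weights move around.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
sites = []
for n in (50, 80):
    X = rng.normal(size=(n, 2))
    sites.append((X, X @ true_w))

w = np.zeros(2)
for _ in range(100):
    w = federated_round(w, sites)
# w now approximates true_w without either site's data being pooled
```

In a real deployment, the `local_update` calls would run on each site's edge node and only the `federated_round` averaging would happen centrally.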
So, first of all, when you do federated learning, you don't need to share data; you only need to share data insights. So that already alleviates a big part of the liabilities that come from data sharing. In general, federated learning is very helpful when you have very private data or very heavy data.
And so, unsurprisingly, some of the early users of federated learning have been in autonomous vehicles and with cell phones: the cases where the data is heavy and you can't really stream it quickly to a cloud environment, or where the data is private and, you know, you just don't wanna share whatever's on your phone. Healthcare data has both of these properties. It's very heavy and it's also private.
And using federated learning opens new collaborations between different hospitals, between hospitals and industry, and between different industry partners. And there are, you know, good examples for each 1 of these.
Between hospitals, things like the EXAM study, or another project we're supporting now for the detection of pancreatic cancer between different academic medical centers, are prominent examples of that. Another example is some projects that we do with pharma companies, who wanna collaborate with AMCs, hospitals, etcetera, in order to create biomarkers for disease prediction or detection. And another example is pharma-to-pharma consortia, with MELLODDY being 1 of the early examples. That's MELLODDY with 2 l's and 2 d's, if I'm not mistaken, in Europe, and many other pharma companies are setting up similar capabilities, or consortia as a service, so to speak, and we're supporting these efforts. And to the point of the data that you're working with, you also mentioned that some of the capabilities you're offering are some of the data prep, data cleaning, data validation, and
the federated learning aspects. You mentioned that you don't need to send any of the actual data back to the kind of core model, which is 1 of the reasons that it empowers the health care industry.
And I'm curious, what are some of the challenges that you have had in kind of building confidence with your customers, ensuring that all of the data that they're processing is being done so safely and according to whatever their local regulations are, as well as the fact that whatever data is processed and turned into features or vectors within the model space are properly de identified and robust to any kind of reidentification attacks or
ensuring that there isn't any sensitive information that is encoded within the model that can be extracted after the fact? Yeah. So federated learning is still a fairly new technique. It was envisioned by Google in 2016, so not too many years ago. People still need to, you know, understand how to work with it, similar to the general struggle that many have with AI in general and machine learning in general.
Part of this was around getting people comfortable to install an edge device within their firewall and making sure that no data leaks out; that, you know, we'll safely keep the data within the firewall. Which helps, because, you know, they still have full control over the data and full control over the compute environment. Some of the questions were, like: I send model weights from 1 place to another; can they be intercepted, and what technologies do we have to safeguard that? Another question was: once I've taken these model weights, can I use them to create a model that would then reproduce the data? And there's been a lot of research on that topic as well. Then there are, I guess, questions of how good is the data that was used to generate these weights, which is something that's part of our core platform offering: the ability to test the level of harmonization and quality of the data.
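The cross-site data quality and harmonization testing mentioned here can be illustrated with aggregate statistics alone, so that no raw records leave a site. The sketch below, with made-up thresholds and a made-up field layout, is a toy illustration of the idea, not the platform's actual method:

```python
import numpy as np

def site_summary(X):
    """Each site shares only aggregate statistics, never raw records."""
    return {"mean": X.mean(axis=0), "std": X.std(axis=0)}

def harmonization_flags(summaries, tol=0.5):
    """Flag features whose per-site means spread by more than `tol`
    pooled standard deviations; True means 'needs harmonization'."""
    means = np.stack([s["mean"] for s in summaries])
    pooled_std = np.stack([s["std"] for s in summaries]).mean(axis=0)
    spread = means.max(axis=0) - means.min(axis=0)
    return spread > tol * pooled_std

# Two simulated sites; the second feature (say, a lab value recorded
# differently at each hospital) has drifted between them.
rng = np.random.default_rng(1)
site_a = rng.normal(loc=[0.0, 10.0], scale=1.0, size=(200, 2))
site_b = rng.normal(loc=[0.0, 14.0], scale=1.0, size=(200, 2))

flags = harmonization_flags([site_summary(site_a), site_summary(site_b)])
print(flags)  # first feature consistent, second flagged for drift
```

A check like this can run before any training round, so a badly harmonized feature is caught before it corrupts the global model.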
So that solves for that. And I'd say, in terms of what the global model is going to be used for, and what it can be used for, there's a question of transparency: people knowing which models you're using their data to build, and based on that, you can really assess risk much better. There's a question of where the model itself sits. Can it be hacked into? That is, I guess, 1 of the reasons why our global orchestrator is in a cloud environment, which is encrypted with, kind of, the most state-of-the-art security. Ultimately, federated learning creates models, and AI developers also create models; the same conditions and issues apply to both of these groups.
There's definitely a certain element of educating the market about federated learning, and getting people to really understand what it is, what it isn't, and trying it out for themselves, and seeing how it can actually solve business problems rather than create new ones.
And so in terms of what you're building at Rhino Health, can you give a bit of detail as to some of the kind of design of the platform, some of the user experience aspects that you focused on to simplify the onboarding of customers, to empower them to be able to start building some of these federated models and understand what are the capabilities, what are the limitations, and just some of the overall kind of infrastructure that you've built up around this problem?
Yeah. So, 1 of the design principles we have is: keep the data and the compute together. And so it means that you never need to actually move your data outside your compute environment, which already gives the user much more confidence in working with us, and we don't persist any data outside the edge. So that's 1 design principle we've used. In addition to that, we've built the system fairly generic, kind of, using primitives.
So the purpose is ultimately not for us to create very, you know, customized end solutions, but rather to create tools that can be reused and repurposed and, you know, relevant to a broad range of users. And everything we do is available both in the graphic user interface and can be invoked via an SDK. And so, usually, as your user experience, you'd start from using the GUI, get used to it, understand the workflow, etcetera.
And then, often, users will switch to a programmatic interface where they can really have more power in getting the system to work for them: optimizing experiments, integrating with additional software, and stuff of that nature. Another principle is, we ourselves are not open source, and this is also, you know, a safety and privacy consideration, but we leverage a large host of open source software that can be used for building containers, running experiments, creating AI, and all that kind of stuff. So we typically don't require of a user a huge learning curve in terms of learning a new framework. We use FLARE under the hood for the federated learning part, and we use container orchestration so that you can run custom code for all the rest. And so you can bring your own experiments, you can bring your own code, your own software, and use that on Rhino.
So, you know, like every software there's a certain learning curve, but we don't require you to, kind of, like, start using a Rhino proprietary federated learning framework, which is only going to be relevant to you if you use Rhino. And this is actually a good place for a shout-out to the NVIDIA team, who have been great partners to us in terms of responding to needs and requests, providing more examples, and supporting, you know, software bugs and issues over time.
And, you know, we've been part of their considerations in open sourcing FLARE, because we recognize that the only way to really scale a framework and achieve broad public adoption is by having something where, on 1 hand, you have a big player supporting and pushing it, and on the other hand, the users can understand the software well enough to know when it doesn't work and iterate on it. And, in general, you know, open source communities today are a very powerful way of driving adoption and also of driving progress in development.
As far as the federated learning aspects, are there certain kind of baseline models that you are offering, so that your different customers are able to run those models in their locations and then contribute back to that core model, so that everybody's able to benefit from the distributed training and distributed learning that's happening?
Or is it a case of your customers are the ones driving the model development, and then, once they build a certain model, that gets pushed into Rhino Health and can then be applied to other customers' use cases? And I'm just wondering what are some of the kind of common data types or data formats that you're focused on supporting. So, we're pretty broad in terms of data
types. We're not single-modality. I actually find it curious; people always ask me that question, and it's like asking what kind of data you can use on Databricks or in an S3 bucket, stuff like that. A lot of kinds of data. We have added some tooling which is more medical-imaging relevant, like a medical imaging viewer, also based on open source software, in order to do that. And today, we actually allow you to deploy pretty much whatever kind of software you want as a data viewer. We have some stipulations: it needs to be dockerized and aligned with the security principles of our system. In regards to your question about whether we drive the development of foundational models, or models which we'd keep iterating on, we don't. We leave that to our users.
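As a toy illustration of the kind of site-local, dockerized pre-processing software a user might deploy, here is a minimal de-identification step. The field names and the salted-hash scheme are assumptions made up for this example, not part of Rhino Health's product:

```python
import hashlib

# Fields that should never leave the site, even in de-identified form.
DIRECT_IDENTIFIERS = {"name", "address", "phone"}

def deidentify(record, salt="site-local-secret"):
    """Drop direct identifiers and replace the patient ID with a salted
    hash, so records stay linkable within a site but are not trivially
    re-identifiable outside it."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    out["patient_id"] = hashlib.sha256(
        (salt + record["patient_id"]).encode()
    ).hexdigest()[:16]
    return out

record = {"patient_id": "MRN001", "name": "Jane Doe",
          "phone": "555-0100", "glucose_mg_dl": 98}
clean = deidentify(record)
print(sorted(clean))  # ['glucose_mg_dl', 'patient_id']
```

A step like this would typically run inside the site's container before any downstream federated computation touches the data; the salt stays local, so even the hashed IDs cannot be matched across sites.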
We do create tooling that makes that development easier. Like, we recently supported a large language model's conversion into a federated model. So we help with the aggregator and the setup of the experiment, and, you know, some of these toolings can be reused across different users. The actual model itself is proprietary, and we try not to get into that space too much, because you don't wanna ultimately become, or even be seen as, a competitor to your developers.
I'd also say, the medical world and the scientific world are filled with collaborations and consortia and big dreams and big aspirations and all that. There's a lot of people who can build these models. Our purpose is really to be the supporting platform for that, not to try and, kind of, like, be in the gold rush for medical AI intellectual property and things of that nature, if that makes sense.
Yeah. And in terms of your experience of going from idea to where you are today, I'm wondering what the overall evolution has been in terms of the goals and design of the system? So, we started focusing
on federated learning per se, and over time, we've expanded, as I think I said earlier, more upstream and downstream of that. And that's made us need to be able to work with multiple datasets at the same time, keep versioning of datasets, transform the datasets, and, kind of, do a lot more orchestration of that. We've also, over time, expanded our API capabilities in order to support enterprise software, where you really wanna use Rhino as middleware and you don't want to interact with Rhino end to end. So that's also made our design more open and more flexible to integrate with other solutions. And recently, we've also started expanding our ability to do integrations and data ingestion locally, since, ultimately, once the scale of users on Rhino started to go up, people said, okay, actually, I would wanna not just give you the fully curated dataset, but also use the platform end to end to QA that dataset. So integrations with DICOM servers, with SQL databases, with different routers using HL7, and stuff of that nature, have been a focus of ours recently, and we'll continue pushing on that. And then, as far as getting onboarded onto Rhino Health, sort of adopting it into the workflow of a given customer, you mentioned that, computationally,
you're very flexible and are able to operate on a broad set of runtime environments. But from an organizational capacity and capability standpoint, what are some of the common skills or team sizes or, kind of, just general technical knowledge that are necessary to be able to most effectively take advantage of what you're offering at Rhino?
So, from the project lead perspective, it's data science, and sometimes you need some supporting engineering, like localizations and some integrations with the existing systems. We offer that as well, you know, as an adjunct service. But ultimately, some corporates would like to do more of that on their own.
From a data contributor side, not really a lot. You need to have, like, an Ubuntu operating system, and to be able to scale your compute as much as needed for running the workloads. And in that regard, we today have fairly high flexibility. We have engines running on Azure, AWS, GCP, private clouds, some other smaller clouds, VMs on local hardware, and bare metal. So you need to have something. And if you have nothing, then we can resell you a server. And as far as the
kind of onboarding process, where somebody says, okay, Rhino Health solves this problem for me, I'm going to start using it, I'm curious what the overall workflow looks like going from kind of idea to initial implementation, and then the capabilities that you offer for ongoing development and maintenance of the particular kind of model development and training, and just kind of the ongoing support for being able to keep a project in motion? So, that's a big question.
Usually, we need to liaise with the information security team and clear the architecture with them. We need to work with your data center team, if that exists, in order to get the hardware set up, or with your cloud team if you need to get a cloud instance set up. The install itself is 7 to 10 minutes; it's not a big deal. But once you have installed, you need to be trained on the system: how you ingest data, how you export data, what, you know, all these things mean.
We've had some users, in the beginning, kind of not understanding why they were trying to export data but were not seeing it on their laptop, for example. Whereas we always export data to the same environment which we imported it from; you can't really use this as, like, a hose to pull data from 1 place to another. We actually keep data very local to where it was received. The workflow itself is, I would say, pretty simple to understand. There are some questions about, like, you know, the whole federated workflow, where you import data from multiple users, and then how do you check it across sites, and how do you know where all the data has been imported from, from the different, you know, contributors? Stuff of that nature, which sometimes, you know, users wanna understand a bit more. But I'd say most users become pretty independent within a few weeks of use.
And so, at that point, some of them get more creative and wanna say, hey, I'd like to integrate this solution with this, or I'd like to try to do this, or I'd like to change some models in different Dockers, or different this and that. And we usually support that. You know, we're still an early company, so it's cool for us to learn much more about our users, and how we can, you know, delight them in the best way, and how we can push that envelope and expand the uses of Rhino. We also have bigger customers which are less independent, just because they don't have to be. And they sometimes have bigger ambitions, and, kind of, set up a full-blown consortium from the get-go, and add all sorts of capabilities, and add more federated aggregators to support more things, or add more functions in order to get kind of a better quality of life for the hands-on user. And we also offer all of these things. And in general, if I've learned something from being an early stage deep tech CEO, it's that you need to be flexible. You need to do what your customers need, not what you want to do. And so, you know, we are very flexible and very accommodating. All of that comes with, you know, a certain price: we have to stay focused on things that are not just impactful but are also valuable and can be quantified by users.
That said, a lot of what we do today is academic collaboration support, which is really very cutting-edge, very exciting research for, you know, eradicating cancer, or for identifying patients at risk of an adverse outcome, stuff like that, which I'd say we still do very much as part of the benevolent mission, and not as a way to make money quickly. Because if we were just trying to do that, this would probably not be the right venue. But, I don't know, many of us are MDs, academics, and still very excited, very, you know, mission-driven. And so we continue doing that. Once in a while, we kind of need to remind ourselves, you know, we're also running an early stage startup and have to do things beyond that, but it's still very much in the DNA of the company.
Given your focus on health care and that industry, I'm wondering how much of the ethical and regulatory burden falls on you as a company operating in this space, and how much work you've had to do to be able to kind of be in compliance, or ensure that you are considering some of the ethical ramifications of the capabilities that you're offering, and how much of that is the responsibility of your customers as the owners and stewards of the actual data and the ones who are doing the actual processing and learning on that? So, we are HIPAA compliant and GDPR compliant, you know, we've complied with that. We're SOC 2 certified, ISO certified, and a bunch of other things certified. So we've definitely had a lot of that kind of burden.
We save a lot of other kinds of burden by the sheer fact that we don't move data anywhere. So we don't actually need to maintain data on another environment outside the hospital.
That's solving the hospital's ownership and privacy question. It's solving the company's ownership and privacy question. And it's enabling stuff that couldn't be done otherwise. Like, today, there are almost no collaborations that involve data from both the European Union and the United States, because of issues around GDPR compliance in the States. So that's a problem that we can solve. Same thing with PIPL in China, and pretty much every region today has its own regulatory scheme. And by being able to not move data, you solve a lot of that issue. In any other way, customers work the same as they work on their own data, and have to comply with whatever practices are required of them. And in your experience of building Rhino Health and working with your customers, you mentioned some of the ways that it's being applied that you have been personally excited about. I'm wondering if there are any other interesting or innovative or unexpected applications
of your platform and technology that you've seen. That's a really good question. I think we expected a broad use of the technology, and there is a broad use of the technology. We've seen that, over time, a lot of our users have really been building applications on top of the platform, and using us as a way to integrate data from multiple places, but not as, kind of, the only software at play.
So the American College of Radiology, for example, built distributed registries for mammography studies in order to promote research regarding women's health, and they really integrated us into their cloud environment, called DART, where they kind of control the workflow and user experience. They also integrated us with the ACR Connect software, which is more of, like, a local data management apparatus behind the hospital's firewall, and then we could turn that into 1 coherent product.
It was called the Rhino sandwich. I think that was pretty creative; we didn't expect to become a sandwich so quickly. So I guess that was an example of that. And given the fact that, as you mentioned, you and a number of your colleagues actually have medical backgrounds, I'm curious how you see that as having empowered you or enabled you to enter this particular industry so rapidly, and to build up the capabilities that are useful to your target customers?
I'd say as people generally operated in a clinical and academic environment, we have a better appreciation for what our how it is working with medical data, how it is working in a clinical environment, what matters more and what matters less. I've seen a lot of technologies and tools people have tried to bring into health care, which sometimes, were an overkill
in some areas and then underwhelming in others. For example, a lot of the blockchain implementations for health care, I felt, were a bit premature and not really providing as much value as the actors were looking for. I've also seen some of the encryption schemes that are kind of like, "Hey, move all my data to the cloud, but it's, like, super encrypted and you keep the key," and stuff like that.
I've seen those as a bit of an overkill in some ways, but also not really what the users are looking for in regards to something that's understandable by a user. A user can understand whether their data stays put. They can't fully analyze the level of rigor of a system that's supposed to protect their data while it's outside their reach. We also had an appreciation of the need to iterate and work with medical data in order to ascertain its quality. Whereas some of the other things I've seen in the market kind of assumed your data is ultra curated and pristine, and now you can do whatever you want with it. Which is nice, but it doesn't really solve the problem of scaling up collaborations between different stakeholders. Like, if I need to set up a consortium with a 100 health care providers, and I expect each and every 1 of them to clean, harmonize,
preprocess, and do whatever is needed for the data to reach a point at which I can work on it, that is not a sustainable model. That talent pool doesn't exist; you can't hire a 100 CROs or, I don't know, however many data processors in order to prepare the data. You need to have the ability to do much more of these things in a centrally orchestrated, not a centralized, but a centrally orchestrated fashion.
And so I think just having run projects like EXAM, and before that, other earlier federated learning efforts, and in general having driven collaborations between different sites and different geographies, gives me and many people on the team an understanding of what would be a good product with a good fit for the customers, I guess I'd say in summary.
And in your experience of building this business and working with these technologies and customers, what are some of the most interesting or challenging or unexpected lessons that you've learned personally? So I think a lesson we've learned is about the need to run the process end to end as much as we can. I think we realized we were going to have to have a compound product that solves a lot of problems. I don't think we realized how many problems would need to be solved with the product.
And I'd also say the idea of really working on bespoke hardware from different sites, and our ability to actually be multi cloud and truly hybrid from the get go. Some people thought that was an unfeasible option that was going to set us back too much time. And I've actually been pleasantly surprised with a lot of medical institutions today, and companies, of course, that use virtual private clouds.
We've been able to run and manage Kubernetes clusters on the edge and kind of solve a lot of, you know, problems of an unprecedented scale in a hospital environment. And the hospital IT and information security teams have actually ramped up and are really geared to meet these new and emerging needs for the use of health data, and have also been a good partner to learn from and work with.
And for people who are interested in these distributed training and distributed data operations capabilities, what are the instances where Rhino Health is the wrong choice? That's a good question. If you want to build your own aggregator with ultra custom code, only for a very specific use you have.
If you want to academically validate that the method works, but you don't have any aspirations of deploying it elsewhere or actually scaling it to different users, Rhino is probably overkill. You can just build your own framework, build your own aggregator, you know, separate your data into a few different folders on your computer or your workstation, and just do it. No need to do all the rest of the stuff.
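For readers curious what that "build your own aggregator" baseline actually looks like, here is a minimal sketch of federated averaging (FedAvg) in Python with NumPy, where each "site" is simply a local partition of the data. Everything here (the model, the sites, the parameters) is a hypothetical illustration of the do-it-yourself approach described above, not Rhino Health's implementation.

```python
import numpy as np

def local_update(w, X, y, lr=0.1, epochs=50):
    """One site's work: a few epochs of gradient descent for linear
    regression on that site's private data; only weights are returned."""
    w = w.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def fedavg(sites, rounds=10, dim=2):
    """The aggregator: average the locally updated weights, weighted by
    each site's sample count. Raw data never leaves a site."""
    w = np.zeros(dim)
    total = sum(len(y) for _, y in sites)
    for _ in range(rounds):
        local = [local_update(w, X, y) for X, y in sites]
        w = sum((len(y) / total) * lw for lw, (_, y) in zip(local, sites))
    return w

# Three hypothetical "hospitals", each holding its own data partition
# (standing in for the "few different folders" mentioned above).
rng = np.random.default_rng(0)
true_w = np.array([3.0, -1.5])
sites = []
for n in (40, 60, 80):
    X = rng.normal(size=(n, 2))
    sites.append((X, X @ true_w))

w = fedavg(sites)
print(w)  # converges close to true_w without pooling any raw data
```

Real frameworks add secure communication, straggler handling, and privacy mechanisms on top of this loop; the point is only that the bare aggregation step itself is small.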
And, in fact, the hospital based physician researchers who've worked with us have always been people who have, like, huge ambitions in terms of deploying algorithms for diagnosis elsewhere, in terms of scaling innovation, collaborating around the world, and kind of really changing the big health care system. Folks who have been much more on the seminal science side and wanted to just test out new methods and put them out and stuff like that have at times benefited from working with us, and at times have said, "Yeah, this solves, like, I don't know, 25% of my personal pain, but maybe not enough for me to become kind of a repeat customer."
And as you continue to build and iterate on the product and platform at Rhino Health, what are some of the things you have planned for the near to medium term, or any particular projects or capabilities that you're excited to dig into? So I'll keep my roadmap to myself right now. But in terms of areas that we're interested in actively pursuing, I'd say we are definitely interested in working more in the EU and working more in areas where we can solve big regulatory issues.
In North America, the Middle East, and APAC, we already have a pretty substantial presence. We have some presence in the EU, but we're planning to grow it. And in terms of solving problems, we started out working much more with hospital based researchers, because these were the guys we had, you know, empathy with and who had the most unmet needs. Our platform has grown a lot. Our user base has grown a lot, and we find ourselves working increasingly with the biopharma industry, which is also great, because these are the guys who are ultimately going to take a lot of this innovation and translate it into something that can scale and can affect patient care. And so a lot of growth in the biopharma space and, you know, solving biopharma customer pains.
I'd say most of our work in biopharma has been around R and D. And so there's actually quite a bit of similarity between our earlier customers and biopharma customers. But, definitely, as we continue to grow and work with more commercial and other, you know, groups, there'll be different challenges.
Many of them are around building business partnerships and integrating with additional source systems for data, in order to help really scale some of these collaborations into the hundreds and thousands range. Are there any other aspects of the work that you're doing at Rhino Health, or the specific applications of distributed compute and federated learning for the health care industry, that we didn't discuss yet that you'd like to cover before we close out the show?
There's a lot of exciting new AI and applications of AI. I think we're very excited about the use of AI as a way to detect rare diseases and kind of scan very large amounts of data, rather than going patient by patient, case by case, which is what most of the deployment of AI in hospital workflows has been to date. We're also excited about the increased leverage of multimodal real-world data for model creation.
And that really gives you the, you know, potential of bringing a true benefit to humans. That would be something that can be almost as good as a human; it can really give you more data points in making decisions. And I think, in general, the whole impact of cloud compute on the health care system is also something that's very interesting and meaningful to us. And we do find ourselves much more as a metacloud that connects between different clouds
rather than just a way to connect on prem hardware. We also recognize that on prem hardware is not going away, and we need to continue supporting it into the mid to even longer term.
Alright. Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And as the final question, I'd like to get your perspective on what you see as being the biggest barrier to adoption for machine learning today. Validation and understandability.
Not just validating the actual algorithm, but validating the impact of it on your clinical workflow, on your patient, on, you know, the outcomes of that patient, and commercial outcomes and all that. I think there's been a lot of really cool stuff going out to market. I think the actual ability to quantify the value of that has not progressed as fast as the business. And in a deeply regulated and deeply risky area like health care, it's not enough to show a good result. And given the fragmented nature of health care, it's also harder to prove that value. You change an algorithm on Amazon Marketplace, and you can see if more stuff is bought or isn't bought. You change an algorithm to predict the outcomes of cancer patients, or to help you pick the right medication, the right intervention, and getting to actually understand what ultimately happened is
not a trivial effort at all. Often it's just not done. If I look at some of the most impressive validation studies I've seen on AI with prospective data, in most cases it's been on a few dozen cases, and it's been with manual data curation. So, clearly, in terms of building integrated platforms and integrated capabilities in order to track that longitudinal view of a patient and of different outcomes, there's a long way to go. The other thing is transparency and explainability. Many algorithms are still black boxes. You don't understand exactly why one made a certain prediction, and I suspect it's part of the reason why much of AI today is restricted to operational use (and I'd include patient triage as kind of a semi-operational decision rather than, you know, an actual autonomous diagnostic) or to administrative purposes in hospitals, and is only now starting to scratch the surface of the true potential of AI in health care.
Definitely. Very interesting. Thank you for that. And thank you for taking the time today to join me and share the work that you're doing at Rhino Health and some of the ways that your platform is being applied in the health care industry.
Definitely great to see folks helping to empower those capabilities and helping to push that forward. So I appreciate all of the time and energy that you and your team are putting into that, and I hope you enjoy the rest of your day. Absolutely. Thank you so much for having me, Tobias. Thank you for listening. And don't forget to check out our other shows, the Data Engineering Podcast, which covers the latest in modern data management,
and Podcast.__init__, which covers the Python language, its community, and the innovative ways it is being used. You can visit the site at themachinelearningpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@themachinelearningpodcast.com
with your story. To help other people find the show, please leave a review on Apple Podcasts and tell your friends and coworkers.