Abstracts: September 30, 2024

Sep 30, 2024 · 19 min

Episode description

The personalizable object recognizer Find My Things was recently recognized for accessible design. Researcher Daniela Massiceti and software development engineer Martin Grayson talk about the research project’s origins and the tech advances making it possible.

The Find My Things story is an example of research at Microsoft enhancing Microsoft products and services. To try the Find My Things tool, download the free, publicly available Seeing AI app.

Transcript

AMBER TINGLE

Welcome to Abstracts, a Microsoft Research Podcast that puts the spotlight on world-class research in brief. I'm Amber Tingle. In this series, members of the research community at Microsoft offer a quick snapshot—or a podcast abstract—of their new and noteworthy papers and achievements.

[MUSIC FADES]

AMBER TINGLE

Our guests today are Daniela Massiceti and Martin Grayson. Daniela is a senior researcher at Microsoft, and Martin is a software development engineer with the company. They are members of a team creating technology that can be personalized to meet individual needs. Their research project called Find My Things enables people who are blind or have low vision to train an AI system to find their personal items based on a few examples of the objects. Find My Things has now shipped as a new feature within Seeing AI, which is a free app that narrates a person's surroundings, including nearby people, text, and objects. The team was also recently recognized by the US-based business media brand Fast Company as an Innovation by Design Awards finalist in both the accessible design and artificial intelligence categories. Daniela and Martin, congratulations and thank you so much for joining us today for Abstracts.

MARTIN GRAYSON

Pleasure, thank you.

DANIELA MASSICETI

Thanks very much, Amber. Nice to be here.

TINGLE

So, Daniela, let's start with a Find My Things overview. What is it, how does it work, and who's it for?

MASSICETI

I think the best way I can describe Find My Things is a personalizable object recognizer. So when we think about object recognizers in the past, they've, kind of, been what I would call generic object recognizers. So they can only really recognize generic things like maybe chairs, desks, tables. But for the blind and low-vision community, who are really key users of object recognition apps and technologies, for them, that's not quite enough. They need to be able to recognize all of their personal objects and items. So things like their sunglasses, their partner's sunglasses, um, perhaps their house keys. So a range of these really specific personal objects that generic object recognizers cannot recognize and help them find. And so Find My Things aims to tackle that by being a personalizable object recognizer. A user can essentially teach this object recognizer what their personal items look like, and then the personalized feature can then help them locate those objects at any point in the future. The experience is divided into two phases: a teaching phase and a finding phase. So in a teaching phase, a user would capture four short videos of each of their personal objects, and those videos are then ingested into the app, and the machine learning model that sits underneath that app learns what those objects actually look like. And then in the second, finding phase a user at any point in the future can, kind of, say, hey, I want to find my partner's sunglasses or my sunglasses. And that will initiate this 3D localization experience, which will help guide them with sound and touch cues to that specific object, wherever it is in the room around them.

TINGLE

I’ve heard Find My Things described as a teachable AI system. Daniela alluded to this, but, Martin, break it down a bit more for us. What do you and your collaborators mean when you use the term teachable AI?

GRAYSON

Something you can say about every person is that we're all unique. Unique in the things that we like, whether that's music, movies, food; the things we do, whether it's at home, at work, or in your hobbies; and of course, the things that we have and own and keep with us. The same applies to accessibility. Everyone has their own unique sets of skills and tools that help them get things done, and we have them set up in just the way that matches us. The other day, I, like, came into the office and I sat in my chair, and I realized immediately that it wasn't right. And of course, somebody had borrowed my desk the previous day and changed the height of my chair, but it was no problem because I could just re-personalize the chair back to my liking. When it comes to tools for accessibility, we think that people should have the same ability to personalize those tools to work the very best way for them. Typically, these have been settings like text size, speech, and color display, but AI has become a more and more important component in those tools. And one way we're really excited about how to enable that is through teachable AI. So I think for us, teachable AI means that we can take some already really smart AI technology that might have some great general skills, but with a tiny amount of time from a person, that AI can be taught what matters to them and what works for them and become an even better AI to help them get things done.

TINGLE

Describe the origins of this work for our listeners, Daniela. What influenced or inspired the Find My Things pursuit? And how does your work build on or differ from previous work in the accessible technology space?

MASSICETI

Yeah, great question. And this is going to require me to cast my mind back to around four years ago. Our team at Microsoft Research was developing a system called the PeopleLens. So this was a head-mounted camera device that could be worn by people who are blind or low vision, specifically children who are blind or low vision. And it would help them identify or it would describe to them all the people that are around them in their social scenario—where those people were, were those people looking at them. And I think the team realized very quickly that, as Martin was saying there, each person has a really unique need or a unique view of what they actually want described to them about the social environment around them. And so that got us thinking, well, actually, being able to personalize this system is really important. But in complex social environments, personalization is a really hard problem. And so that prompted the team to think, OK, well, we want to study this idea of personalization; let's try and find almost the simplest possible example of an AI technology with which we could actually deeply explore this space of personalization. And that led us to object recognizers. Object recognizers are, as I mentioned, a very commonly used technology across the blind and low-vision community, and we know that there is a need for personalization there. And so that really prompted or started this journey along personalizable, or teachable, object recognizers, which we then have been working on for the last three or four years to eventually get us to a point now where we're seeing this feature available in Seeing AI.

TINGLE

Your team identified few-shot learning and the availability of new datasets as keys to this work. Martin, how have those particular advances helped to make Find My Things possible? And are there other approaches you've incorporated to make sure that it's both practical and valuable for people who are blind or have low vision?

GRAYSON

So AI loves data. In fact, data is essential to make AI work. But for AI to work for everyone, it needs to learn from data that somehow represents everyone. The earliest challenge for Find My Things was that people who are blind or low vision don't often use their cameras to take lots of photos and videos. And this actually gives us two big data gaps. The first is that we don't have lots of image data that is representative of their own lives, their environments, and their things. And the second is that if you're someone who's blind, you may hold your phone differently, or you may use your camera in different ways. And that's, too, missing from the data, certainly in the established datasets that exist. So new datasets, like ORBIT, have collected thousands of images and videos by members of the blind and low-vision community, specifically of objects and environments that are really important to them. And this means that we've now addressed those two big data gaps. And the few-shot part is really important, too. Find My Things is not a general object recognizer. It's a find my things. We want Find My Things to be able to recognize anything you throw at it—whether it's your fluffy keyring, your colorful tote bag, or your favorite gadget or toy. However, traditional object detectors, they often need hundreds or thousands of images to learn how to recognize something accurately. Few-shot learning is a super-smart approach that means you only need to trouble our users for a couple of short five-second videos, and then our app will take it from there. Find My Things can use that tiny amount of data and still be able to spot your object from across the room.

Maybe one more thing we did, and this also became so important, was to build and try prototype experiences as soon as we possibly could. And we would try so many models and designs out and then iterate. The team has definitely seen so many videos of me trying to find things around my house. But it's actually one of the things we're most proud of in the project, is this, kind of, graveyard of interactive prototypes that have all led us to the final experience.
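
[EDITOR'S NOTE]

The episode does not say which few-shot method the app uses. As a rough illustration only, the sketch below assumes a nearest-prototype approach: each taught object is summarized by the average of a handful of feature vectors, and a new frame is matched to the closest average. The two-number feature vectors, the object labels, and the function names are all made up for this example; a real system would get its features from a trained vision model.

```python
import math


def mean_vector(vectors):
    """Average a list of equal-length feature vectors element by element."""
    count = len(vectors)
    return [sum(v[i] for v in vectors) / count for i in range(len(vectors[0]))]


def build_prototypes(examples):
    """Teaching phase (hypothetical): one averaged vector per taught object."""
    return {label: mean_vector(vecs) for label, vecs in examples.items()}


def classify(prototypes, query):
    """Finding phase (hypothetical): return the label of the nearest prototype."""
    return min(prototypes, key=lambda label: math.dist(prototypes[label], query))


# Made-up features standing in for a few frames from the teaching videos.
examples = {
    "my sunglasses": [[0.9, 0.1], [0.8, 0.2]],
    "house keys": [[0.1, 0.9], [0.2, 0.8]],
}
prototypes = build_prototypes(examples)
print(classify(prototypes, [0.7, 0.3]))
```

The point of the sketch is that teaching a new object needs only a few examples and no retraining of the underlying feature extractor, which is roughly the property Grayson describes.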

TINGLE

Daniela, what have you learned from the Find My Things journey that may help the broader research community create more inclusive and more human-centric AI experiences?

MASSICETI

The first one I would say is the importance of doing participatory research. And what that means is really working with the communities that you are developing technologies for. And the second is really learning how to balance this tension between developing something in a research environment and actually deploying that technology in a real-world environment. To jump to the first learning around participatory research, Martin mentioned the ORBIT dataset. The ORBIT dataset was collected in partnership with users who are blind or low vision across both the UK and Canada over the years 2020 to 2021. And it was really important for us to actually engage directly with users who are blind as we were collecting that dataset from them to really understand what they wanted from a personalizable object recognition technology, how they would use their cameras, how they would hold their phones, what kinds of objects they would use this technology to find. And all of that was really, really critical in helping us shape what that dataset ended up being. That dataset became such a pivotal part of the ultimate Find My Things experience. To the second point around this tension between building something in research and deploying something in the real world, I think often as a researcher, we don't really have to engage with real-world constraints. But of course, when you build a machine learning model or a machine learning system and you want to deploy it in the real world, suddenly those constraints really hit you in the face. And that was exactly the case with Find My Things. I remember quite distinctly in the model development process, we had a number of different models. They were, sort of, ranging up in size in terms of how much memory they would take on a phone to run. And of course, the larger the model was, the more accurate it was. But when we deployed these models of varying sizes onto a phone, we saw that they each had vastly different reactions to being on this phone. And I think if I recall from memory, some of our largest models ended up basically draining the phone's battery in a couple of minutes, which would mean that the experience would be totally unusable to the user. And so one of the key things we had to do there is really find this sweet spot, or this balance, between what is good enough performance that does not end up, kind of, degrading the actual experience of running this model on a phone.

TINGLE

You mentioned participatory research, and your team's version feels a little different from what we typically encounter. Talk a little bit more about the citizens who helped you build out this app.

MASSICETI

So these were a group of perhaps eight to 10 users who are blind or low vision who we hosted at Microsoft Research a number of times over the course of the development of the Find My Things experience. And they were … perhaps the best way I can describe them is they were co-designers; they were really helping us design—co-design—what the Find My Things experience ultimately turned out to be. We weren't coming to them as simply testers of our system. We, kind of, went to them with a blank slate and asked them, well, we have these ideas of what we want to build; what do you think? And from there, we, kind of, iterated upwards and ultimately crafted, co-crafted, the ultimate design of the Find My Things experience, both the teaching part and the finding part.

TINGLE

One of the members of that citizen design team, Karolina Pakėnaitė, visited the Microsoft Research Podcast back in December with your colleague Cecily Morrison. Martin, talk a bit more about how influential citizen designers like Karolina are to this effort.

GRAYSON

There were so many key ideas and innovations that came from the workshops with Karolina and the rest of the citizen design team. Maybe I can share a couple that have had the biggest impact on the app and the experience. So the first was the process of teaching an object. Our testing of AI models showed that collecting videos of objects from different sides and on different backgrounds was critical. So we developed this thing called the drawback technique, where we leaned on the phone's augmented reality capabilities to make it possible. We'd ask the user to start with the phone right next to their object and then slowly draw it away. This meant that we could track all of the different distances the images were, and the user could really comfortably create a video without leaving their seat. And what's more, you can do this so easily without even needing to look at the camera. It's really natural. The second big design innovation came later on when you were actually looking for the thing. We called it the last yard. So many of the lost-item scenarios that we learned about from the citizen designers … they shared with us that they had dropped something in a public space. Their wallet fell out of their pocket as they took their phone out, or they knocked their earbud off the table onto the floor of the train on their way to work. And in both of those moments, the last thing anyone wants to be doing is feeling around on the floor, especially on public transport. So we tested these early versions of Find My Things with the design team, and they would get close to their object, overstep it, and then reach down. And they'd still be feeling around the floor before they found their object, which mostly ended up back behind them. So our last yard design completely changed this. As the user got close to their object, within the last yard, we change the sounds, and the app actually tells them to move down. The phone then responds to the distance to the object exactly like a metal detector. And this meant that when they reached down just at the right moment, they found their object on the floor and it was much easier. No more overstepping. We spent lots of time exploring how the experience and the phone capabilities like AR and AI could work best together, and our citizen design team gave us all of the key insights that led to us coming to these approaches.
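
[EDITOR'S NOTE]

The "metal detector" behavior Grayson describes can be pictured as a mapping from distance to feedback rate. The sketch below is a guess at the general shape, not the app's implementation: the one-yard threshold comes from the name of the technique, while the beep intervals, the linear ramp, the mode names, and the function name are assumptions chosen for illustration.

```python
LAST_YARD_M = 0.9144  # one yard in meters; the threshold itself is an assumption


def feedback(distance_m):
    """Map distance to the object to a hypothetical audio feedback setting."""
    if distance_m > LAST_YARD_M:
        # Far away: ordinary guidance at a slow, fixed beep rate (assumed value).
        return {"mode": "guide", "beep_interval_s": 1.0}
    # Inside the last yard: beeps speed up linearly as the phone gets closer,
    # from one beep per second at the threshold to one every 0.1 s at contact.
    closeness = 1.0 - max(distance_m, 0.0) / LAST_YARD_M
    interval = 1.0 - 0.9 * closeness
    return {"mode": "last_yard", "beep_interval_s": round(interval, 2)}


for d in (2.0, 0.9, 0.45, 0.0):
    print(d, feedback(d))
```

A continuous signal like this gives the user a cue for when to stop and reach down, which is the overstepping problem the last yard design was meant to solve.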

TINGLE

So what's next for Find My Things? I'd like you to share a bit about the opportunities or even the obstacles that exist for more widespread adoption of the teachable AI approach.

GRAYSON

So Find My Things was such a great project to work on. It sat right in the center of the triangle of AI innovation, designing with your community, and of course product impact. We're taking so much of what we've learned during this project and building it into our research going forwards—how we build and evaluate AI, how to engage with the communities that we want to build for, and of course the value of building lots and lots of prototypes. Teachable AI, I think, is going to be a key approach in addressing the challenge for AI working equally well for everyone. The challenge is how do we ensure that we build these new fantastic models on data that gives representation to all that’ll use it. And so often, the people that might benefit the most from innovations in AI might have the smallest representation in data. And our work with people in the blind and low-vision community have really brought that into focus for us. AI can and will be transformational for them, so long as we can make it work just as well for everyone. And then that creates the opportunity: ensuring that these systems and technologies that we design can learn from and build in all of the diverse and wonderful uniqueness of being a human.

MASSICETI

I think one of the things I'm most excited about is unlocking this power of personalization. Hopefully, we've convinced you how impactful having personalized AI technologies would be for not only the blind and low-vision community, but for you and I. And so one of the things I'm most excited about is seeing how we can transplant some of these learnings and ideas that we've had in building Find My Things into now the generative AI era. And so, yeah, I think I'm really excited to, kind of, bring together these ideas of teachable AI with these new generative models to help really bring to life more useful AI technologies that service not just a small few but all the people across the user distribution.

TINGLE

Daniela and Martin, thank you so much for joining Abstracts today.

MASSICETI

Thank you, Amber.

GRAYSON

Thank you for having us.

[MUSIC]

TINGLE

And thanks to our listeners, too. If you'd like to learn more about Find My Things and teachable AI, visit aka.ms/TeachableAI. Thank you for tuning in. I'm Amber Tingle. Join us next time for more Abstracts.

[MUSIC FADES]
