
Building A Business Powered By Machine Learning At Assembly AI

Sep 09, 2022 | 59 min | Ep. 9

Episode description

Summary
The increasing sophistication of machine learning has enabled dramatic transformations of businesses and introduced new product categories. At Assembly AI they are offering advanced speech recognition and natural language models as an API service. In this episode founder Dylan Fox discusses the unique challenges of building a business with machine learning as the core product.
Announcements
  • Hello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.
  • Predibase is a low-code ML platform without low-code limits. Built on top of our open source foundations of Ludwig and Horovod, our platform allows you to train state-of-the-art ML and deep learning models on your datasets at scale. Our platform works on text, images, tabular, audio and multi-modal data using our novel compositional model architecture. We allow users to operationalize models on top of the modern data stack, through REST and PQL – an extension of SQL that puts predictive power in the hands of data practitioners. Go to themachinelearningpodcast.com/predibase today to learn more and try it out!
  • Your host is Tobias Macey and today I’m interviewing Dylan Fox about building and growing a business with ML as its core offering
Interview
  • Introduction
  • How did you get involved in machine learning?
  • Can you describe what Assembly is and the story behind it? 
    • For anyone who isn’t familiar with your platform, can you describe the role that ML/AI plays in your product?
  • What was your process for going from idea to prototype for an AI powered business? 
    • Can you offer parallels between your own experience and that of your peers who are building businesses oriented more toward pure software applications?
  • How are you structuring your teams?
  • On the path to your current scale and capabilities how have you managed scoping of your model capabilities and operational scale to avoid getting bogged down or burnt out?
  • How do you think about scoping of model functionality to balance composability and system complexity?
  • What is your process for identifying and understanding which problems are suited to ML and when to rely on pure software?
  • You are constantly iterating on model performance and introducing new capabilities. How do you manage prototyping and experimentation cycles? 
    • What are the metrics that you track to identify whether and when to move from an experimental to an operational state with a model?
    • What is your process for understanding what’s possible and what can feasibly operate at scale?
  • Can you describe your overall operational patterns and delivery process for ML?
  • What are some of the most useful investments in tooling that you have made to manage development experience for your teams?
  • Once you have a model in operation, how do you manage performance tuning? (from both a model and an operational scalability perspective)
  • What are the most interesting, innovative, or unexpected aspects of ML development and maintenance that you have encountered while building and growing the Assembly platform?
  • What are the most interesting, unexpected, or challenging lessons that you have learned while working on Assembly?
  • When is ML the wrong choice?
  • What do you have planned for the future of Assembly?
Contact Info
Parting Question
  • From your perspective, what is the biggest barrier to adoption of machine learning today?
Closing Announcements
  • Thank you for listening! Don’t forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@themachinelearningpodcast.com with your story.
  • To help other people find the show, please leave a review on iTunes and tell your friends and co-workers.
Links
The intro and outro music is from Hitman’s Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Transcript


Hello, and welcome to The Machine Learning Podcast. The podcast about going from idea to delivery with machine learning. Predibase is a low code ML platform without low code limits. Built on top of their open source foundations of Ludwig and Horovod, their platform allows you to train state of the art ML and deep learning models on your datasets at scale. The Predibase platform works on text, images, tabular, audio, and multimodal data using their novel compositional model architecture.

They allow users to operationalize models on top of the modern data stack through REST and PQL, an extension of SQL that puts predictive power in the hands of data practitioners. Go to themachinelearningpodcast.com/predibase today to learn more. That's Predibase. Your host is Tobias Macey. And today, I'm interviewing Dylan Fox about building and growing a business with ML as its core offering. So, Dylan, can you start by introducing yourself?

Yeah. Thanks for having me on. My name is Dylan. I'm the founder of a company called AssemblyAI, and we are building out an API platform for transcribing and understanding audio with AI models for tasks like speech recognition and speech understanding. Do you remember how you first got started working in machine learning? Yes. It goes back to when I first got into programming. I think we had talked about this on another episode too.

So forgive me for boring you. If I give you the full backstory, but back when I was in college, I got into web design and web development, and I was an econ major. So I wasn't a computer science major, but I had started doing web development and web design and gotten really into just programming. And then I kinda gravitated from there to learning Python. I got this free book. I think I might have paid for it. Learn Python the Hard Way,

which I don't know if that's still popular or not. But back then, it was great. And then I remember from there, I started learning about different web frameworks like Django at the time was super popular. So got into that. And from there, I think my curiosity kind of stayed really strong. And I wanted to keep learning new things because, like, web development, I felt like got pretty easy. So where I kinda

went to next was learning about NLP. So I bought this book. I think it was about the NLTK library and started digging into NLP. This is like a decade ago or more. This is a long time ago, but started getting into NLP tasks. And from there, started to dig into more classical machine learning concepts at the time, like support vector machines. And that really solidified, like, okay. I'm really interested in machine learning. It's a really interesting area of programming.

And it was really challenging, and there were a lot of new things to dig into. So from there, this is, like, you know, a few years after college and I had had a couple of different programming jobs, I joined a team. I was living in Washington DC at the time, and I moved out to San Francisco to join a machine learning group at Cisco, the large tech company, Cisco Systems. And on that machine learning group, I was a machine learning engineer.

And there, I got more into neural networks and worked on a lot of NLP and NLU problems, started working on some speech recognition related projects. I think at that time, neural networks were getting a lot more popular. I mean, you know, this is, like, 2015, 2016. Neural networks were getting a lot more popular. TensorFlow had come out recently, and I attended, like, TensorFlow's first developer conference down at Google's campus.

So it was all just really exciting, and I could totally see the potential. And I knew it was something that I wanted to just be really involved in because the ceiling looked like it was so like, you couldn't see it. You know? It was just continuing to open up. And every week almost, there was, like, a new paper coming out, and that's still happening. You know? That's still happening. Where on Reddit, you know, every day I kinda scroll through Reddit,

these different subreddits, like machine learning, and I'm I'm sure ones that you follow too. And there's always just so many new things coming out, interesting models, interesting papers. And for me, I find that just fascinating. I think I overlook this now because I'm, you know, just thinking about, like, the start up and growing it, but it's awesome to be able to work on, like, in this field full time. So

that's, like, the full backstory. But I kind of stumbled into it, like, a decade ago, and it really stuck with me. I'm interested in digging a bit into your perspective of the state of the industry and the state of the capabilities for machine learning from when you first got involved a decade ago to where we are now, and you're running a business where ML is this core offering. And we'll get into that piece a bit later, but I'm just curious what your sense is

about the kind of state of the ecosystem, state of the industry for machine learning. And from the perspective of as somebody who's an early adopter of a given technology and is 1 of the early contributors to a community, it can be very easy to have an outsized impact as that community expands and grows where, you know, for instance, with Python, if you're the first person who writes an HTTP library, then it becomes the most popular 1, and everybody knows your name after a while. But once you

are writing the 15th or the 30th HTTP library, then it's just okay. Great. There's another 1. And I'm wondering if you can with that framing, if you can talk a bit to where you see the machine learning ecosystem today as compared to when you got into it 10 years ago. Yeah. Yeah. It's like I always see these, like, on developer Twitter. I always see these new front end frameworks for, like,

you know, setting up your JavaScript run times faster. Your, like, build environments faster. And it's, like, you know, the 1000th one I've seen, so it doesn't stick. You know, when I first started getting into machine learning in, like, the early 2010s, it was like SciPy, NLTK. Those were popular libraries. And yeah. It's a good question. I mean, it's still just so early today. Right? Like, back in 2015, 2016, it was, like, TensorFlow. Now it's really moved over to PyTorch.

And now, like, JAX, I don't know how much you've dug into JAX or heard about JAX, but JAX is getting really popular. You know, they're attempting to kind of, like, reboot TensorFlow in some ways. So I still think it's so early, and there's a lot of companies trying to build more tooling, like, for ML practitioners. Like, Hugging Face is a great library that helps you bootstrap building models and is making

machine learning development more accessible. Because I would say PyTorch and TensorFlow, they still can seem a little bit scary to folks that are entering ML for the first time because there are these intimidating concepts that you come across. Like, what's a convolutional neural network and what's, you know, gradient descent and back prop. There's, like, these intimidating concepts.

And you're exposed to them at a low level because, you know, PyTorch and TensorFlow kinda make them accessible to you, because if you wanna build custom neural networks or complex neural networks, you need to be able to access, like,

all the raw primitives to be able to do that. And then I think there's platforms now or libraries now, like, you know, what Hugging Face is doing that are just making it more accessible so that it's easier to start building models and it's easier to, like, fine tune

base models for specific tasks and pull in datasets. So the state of the ecosystem, I would still say, is just, like, it's still so early, especially when you start considering, like, deploying these models and all of that. Like, for us, there's a ton of things that we are having to build in house for our experiment framework and platform to be able to train more models in parallel and across larger clusters of compute.

So it's still changing so quickly. But I would say, like, definitely, I'd say the switch from classical machine learning to deep learning has happened. Whereas back when I was getting into it, I mean, it was still just RNNs and convolutional neural networks. And then folks kinda graduated to LSTMs and transformers, and more powerful neural networks kinda made deep learning win out, but it's evolved a lot. But at the same time, it's still super early.
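To make the accessibility point above concrete, here is a minimal sketch of the kind of high-level interface the Hugging Face transformers library exposes. The speech recognition checkpoint named below is just one publicly available example, not anything Assembly-specific.

```python
# Illustration of the "more accessible" layer described above: the pipeline API
# hides tokenization, model loading, and inference details behind a single call.
# Requires: pip install transformers torch
from transformers import pipeline

# Sentiment analysis with the library's default pretrained model.
classifier = pipeline("sentiment-analysis")
print(classifier("Deep learning tooling has come a long way."))

# Automatic speech recognition is exposed the same way; wav2vec2-base-960h is
# one publicly available checkpoint, used here purely as an example.
asr = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h")
# print(asr("meeting_audio.wav"))  # path to a local audio file, if you have one
```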

In terms of what you're building at Assembly, can you share a bit about what it is that the company is focused on and why you decided to build a business with ML as 1 of the core capabilities for the product that you're offering? Definitely. So really simple way to think about it is that we are building an API where you can submit an audio or video file to our API.

And then we'll send you back a JSON response with an automatically produced transcription that our AI models for speech recognition produce. But then we have a number of other AI models that you can enable, like sentiment analysis and summarization and content moderation and 1 called auto chapters that will break up your audio file into different logical chapters, kind of like YouTube chapter style

as the topic of conversation is changing. We have a topic detection model. So you can enable a number of these models to also be run on your audio files so that in your JSON response, you get back not only an automatic transcription, but you get back topics. You get back a summary of what was discussed. You get back the entities that were detected, a bunch of other information that our models can classify or label or generate. And

product teams and developers are using this for a number of different use cases. There's some startups using our API to build, like, Zoom meeting analyzers that are summarizing Zoom meetings and providing detailed information about what's happening in a Zoom meeting automatically. There's contact center platforms that are building conversation intelligence features with

our API and the models. And then we have, like, video platforms and even podcasting platforms using our APIs to detect the topics that are being discussed in videos or podcasts to help with recommendations and in some cases, even advertising. So we have developers and product teams using our API for a number of different use cases

around audio. And where we really wanna get the company to is a place where we are this API platform for a number of different state of the art AI models, where some of them can operate on audio, some can operate on text, some are like prompt based, you know, for different domains beyond just audio and text. Because our goal is to build

a top tier AI company. You know? So we're like 65 people today at the company, and half of that team are research engineers or research scientists or machine learning engineers. Because our goal is to put ourselves in a position to be able to just create these state of the art AI models, research them, train them, and then deploy them into production in a way that is really easy for your everyday developer and also product teams at companies to consume those models through a simple API

and build them into their products or features that they're working on. So that's where we wanna get it to. And to your question about where this came from, back in 2017 or so, I was looking into speech recognition services at the time, and I was also looking at creating some of my own speech recognition models.

Most of them were still based on classical machine learning approaches with, like, hidden Markov models and Gaussian mixture models. Some of them had started to use basic deep neural networks as, like, parts of the pipeline. But the idea of, like, an end to end deep learning model for speech recognition was still pretty exotic. Like, there were a few papers that came out from Baidu, like DeepSpeech 1 and 2. I think they also published DeepSpeech

3. And there was also this loss function I forget what the full name is, but CTC is, you know, what it's referred to as that kind of unlocked this, like, DeepSpeech model architecture and made it possible in a lot of ways. But that really showed the potential for an end to end deep learning model to perform just as well as the classical models that had kinda plateaued in terms of the accuracy.
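For context, CTC stands for Connectionist Temporal Classification: it scores a model's per-frame character predictions against a transcript without needing a frame-by-frame alignment, which is what made end-to-end architectures like DeepSpeech trainable. A minimal sketch with PyTorch's built-in CTCLoss, using toy tensor shapes just to show the moving parts:

```python
# Toy sketch of the CTC loss that made end-to-end speech models practical: the
# network emits a per-frame distribution over characters (plus a "blank" symbol),
# and CTC sums over all alignments of those frames to the target transcript.
import torch
import torch.nn as nn

T, N, C = 50, 4, 29   # frames, batch size, classes (28 characters + blank at index 0)
S = 12                # length of each target transcript, in characters

# Stand-in for the acoustic model's output: per-frame log-probabilities.
logits = torch.randn(T, N, C, requires_grad=True)
log_probs = logits.log_softmax(dim=2)                      # shape (time, batch, classes)

targets = torch.randint(1, C, (N, S), dtype=torch.long)    # character indices, never blank
input_lengths = torch.full((N,), T, dtype=torch.long)      # frames per utterance
target_lengths = torch.full((N,), S, dtype=torch.long)     # characters per transcript

ctc = nn.CTCLoss(blank=0)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()   # differentiable, so the whole model trains end to end
print(loss.item())
```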

And this gets kind of in the weeds. But, like, the classical models for speech recognition, they had pretty much, like, plateaued in terms of the accuracy they could provide. And the accuracy they had plateaued at was not very good. I mean, it was okay. But if you think of it, it was, like, pre Alexa

days where speech recognition was still bad. Right? So this is, like, 2014, 2015, most commercial speech recognition was still based on these classical approaches where overall accuracy was pretty bad, and the way you would make it better would be through, like, adapting those models and customizing them for specific use cases. So if you're in a car, you would, like, narrow the vocabulary

to just what you would say in a car so that you essentially make the speech recognition problem much simpler. You limit the vocabulary. It's not like open domain, large vocabulary speech recognition, or you adapt the models to a specific acoustic environment, like in a car or whatever. But that was kinda like where the state of speech recognition was. And then deep learning based approaches, like, the DeepSpeech really inspired,

like, fully deep learning based approaches. They showed that they were just as good as those classical methods and had a lot higher ceiling. Like, there was a lot still to be uncovered and investigated and researched to make those end to end models a lot more accurate.

You could just tell back at the time if you were following what was happening in deep learning that you could see the potential because there was all this innovation happening in, like, convolutional neural networks and recurrent neural networks and LSTMs. And now, transformers came out, there was attention.

There's just all this exciting new research happening that you knew was gonna not only be able to improve the field of speech recognition, but also self driving cars, NLP tasks, like every single machine learning domain. And so the idea that we had when we started Assembly was, hey, what if we focus on researching and building state of the art AI models for speech recognition.

This is back in, like, 2017 when we started the company, and make those really, really simple for folks to consume through

a free API that you can just sign up for and pay if you actually use it in a real business or in a real product, like Twilio, Stripe style developer experience. Because also at the time, really the only way to get access to speech recognition models was through these like really heavy commercial sales processes that like you just couldn't do if you were just a developer at, like, a hackathon or trying to build a new company. So that was the genesis of the idea.
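As a rough illustration of that sign-up-and-call developer experience, the flow might look something like the sketch below; the base URL, field names, and polling pattern are illustrative assumptions drawn from this conversation, not the documented API.

```python
# Hypothetical sketch of the "submit an audio file, get JSON back" flow described
# above. Endpoint paths, parameter names, and response fields are assumptions for
# illustration only -- check the provider's docs for the real API.
import time
import requests

API_KEY = "your-api-key"                      # placeholder
BASE_URL = "https://api.example-speech.com"   # placeholder base URL
headers = {"authorization": API_KEY}

# 1. Submit a publicly reachable audio URL and enable a few of the extra models.
job = requests.post(
    f"{BASE_URL}/transcript",
    headers=headers,
    json={
        "audio_url": "https://example.com/meeting.mp3",
        "summarization": True,
        "sentiment_analysis": True,
        "auto_chapters": True,
    },
).json()

# 2. Poll until the asynchronous job finishes, then read the JSON payload.
while True:
    result = requests.get(f"{BASE_URL}/transcript/{job['id']}", headers=headers).json()
    if result["status"] in ("completed", "error"):
        break
    time.sleep(5)

print(result.get("text"))      # the transcription
print(result.get("chapters"))  # auto-chapter segments, if enabled
```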

But always, we wanted to build a deep learning company, an AI company that would just have really strong deep learning competencies so that we could do more than just speech recognition. And so we can build out, like all the infrastructure and competencies that we needed to be able to do a lot more than just speech recognition and build this API

platform of a bunch of different AI models. You know, 1 is speech recognition, but others, you know, will be x y z that developers can come to easily access state of the art AI models and build products around them, and product teams can build new features around them at their companies. So that's really, like, what the idea was

and still is. And that's why today, we've expanded beyond just speech recognition where we're still focused on audio, but you can use our API to summarize audio files, to detect emotions, to detect speakers, to segment audio files into chapters. Like, you know, we'll say, like, hey, the people in your audio file were talking about x y z between x and y minutes. And they were talking about this between these minutes with our chapters

model. So we've expanded way beyond just speech recognition and are now starting to expand beyond just audio. So we have a lot coming out there, but that's really where it started. 1 of the interesting and challenging aspects of what you're doing is that you can't just throw together a quick, you know, UI or a landing page or, you know, wire together, you know, a sketchy web application just to prove out an idea and determine, is this the right direction?

You actually have to have something that works up to a certain level of capability before you can really start to put it in front of people. And I'm curious if you can draw some parallels between your experiences of going from idea to prototype and putting it in front of people and how that contrasts with some of your peers who are doing more of a pure software play to build and launch startups. Definitely. Like, we went through Y Combinator, which is only 3 months. And back then,

it took, like, 3 weeks to train, like, our speech recognition model. We got, like, 3 and a half attempts throughout the entire YC program. And so, you know, I would look at my peers in YC, and they would go out. They would get feedback. They would realize, like, their UI sucked or their product sucked, and they would just, you know, throw it out and restart and have this ability to iterate really quickly. And we didn't, because, you know, we didn't have that ability. Like, it's possible.

So this is where there's a lot of nuance. Like, it's possible to iterate quickly as a deep learning company, but you have to have the tooling and infrastructure built out to basically be able to, like, run a grid search of training runs and experiments to find a really accurate model. So, like, if you have the tooling and infrastructure to train a hundred speech recognition models in parallel

within, like, a week, I mean, you can get done significantly faster than a company that doesn't have that tooling and infrastructure set up and has to, like, sequentially train models on worse hardware. And so now that we've raised a bunch of money and we're further along, like, we have tooling and infrastructure to be able to fan out these experiments horizontally and move a lot faster. But back then, yeah, I mean, we had, what, you know, YC invested, like, I think it was a 120

k or something. But that's, like, gotta last you and your salary at the time. So you can't just throw all that at GPU credits. And also GPUs back then were not as good. And to your first question about tooling and, like, the ecosystem, the ecosystem was also not as good. So I think the best GPUs that you had available in the cloud to train on back when we started Assembly were the K80s, which are, like, terrible GPUs,

especially compared to today, you know, when you're training on, like, A100s and TPUs. So it was a lot harder to iterate quickly, and that was a struggle. And I think today, it actually probably would be less of an issue because now, like, if you were gonna start an ML company today, in theory, you could go and train some models on A100s in the cloud, so GPUs are just better.

There are better libraries for distributed GPU training. There's a lot more sample code out there now that you could leverage. And there's also a lot of foundation models that you could piggyback off of, or maybe you don't need to train your whole model from scratch. You can just fine tune a foundation model that's lying out there to get started and to maybe, like, find your first customer and prove value. But it is true that you have to get to

like, when you're building a machine learning company or when your product is a machine learning model, I guess, is a better way to put it. You have to get that model to, like, MVP status. And that's a really fuzzy line. Like, how good does it have to be? Is it 80% accuracy? Is it 82? Maybe it's 82.5. You kinda don't know. And

so I think things you can do to make it easier are, okay, well, let's narrow the domain. Like, let's only focus on this type of data so that we don't need to build a really generalizable robust model. We just will build a model that's good for this subset of data and make it specific there and just focus our go to market efforts on a specific vertical

or specific use case that has data that's very similar to the data we're training our models on or things like that. So there's things you can do, but even when you do that, it's difficult to know what that level of accuracy is before people start buying your model. And I think that it is a challenge

with machine learning startups. I think about it as more akin to, like, starting up a hardware company or a biotech company, where you're not just building, like, a crud app that you can iterate on super quickly, which I've done before and I've worked on before. So it's a lot harder in a lot of ways. I mean, we could probably talk about that specific question for an entire hour,

which I'm happy to. But it's a lot harder in a lot of ways, and I think there's shortcuts, there's things you can do to try to be faster, especially in the early days, like everything I talked about, but it doesn't change the reality that you are gonna iterate more slowly, and you're not able to move as fast, if your product is a machine learning model.
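One of the shortcuts mentioned above, fine-tuning an existing foundation model rather than training from scratch, can be sketched roughly as follows; the checkpoint, dataset, and hyperparameters are arbitrary stand-ins chosen only to show the shape of the workflow, not a recipe.

```python
# Rough sketch of the "fine-tune a foundation model instead of training from
# scratch" shortcut. Model, dataset, and hyperparameters are arbitrary examples.
# Requires: pip install transformers datasets torch
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

checkpoint = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# A tiny slice of a public sentiment dataset stands in for your first real data.
dataset = load_dataset("imdb", split="train[:2000]").train_test_split(test_size=0.1)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=256)

tokenized = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-model", num_train_epochs=1,
                           per_device_train_batch_size=16, evaluation_strategy="epoch"),
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
)
trainer.train()   # a single pass is often enough to tell whether the idea has legs
```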

To the point of accuracy and what that threshold needs to be for your end product to be something that is usable and desirable, an interesting element of that is the question of whether or not that level of accuracy is even attainable for a given problem domain. And I'm curious how you and other folks that you've seen working in this space have approached that question of, okay, I know that I need to be able to hit an accuracy threshold in the range of 80 to 85%

for problem domain x. Right. But before you actually go down the path of building it and reaching that level of accuracy, how do you actually understand whether that's something that's possible before you spend all that time and money and energy?

If there's a commercial product that exists that already has customers, even if it's, like, terrible, right, that is actually helpful in a lot of ways because that's a benchmark that you then just have to meet or exceed, and you know that you have something that's commercially viable. We'll use the speech recognition example.

Like, if you know that there's only a single speech recognition provider in the market, right, hypothetically speaking, and so there's 1 seller of a commercial speech recognition model, and they could be terrible, terrible. Like, a horrible company to work with. They're evil.

You know, they, like, are terrible. So you see an opportunity to build a better 1. But they do have a couple of customers. They're making a few million dollars a year. Well, if you can just go benchmark, alright, what level of accuracy do they have that makes their product commercially viable to some subset of customers,

that is helpful in a lot of ways. So I think, like, if you're starting a machine learning company today, it is helpful to look at if there are things that exist that are the same or similar, like, where are they at in terms of accuracy? And that's a really good guidepost because then you know kinda where you need to get to before you can have something that's commercially viable. It's a lot harder if it's brand new and you're the 1st commercial

entrant to the market. And in that case, yeah, it's like you have to go talk to customers and figure it out. And it's difficult to ask, like, what level of accuracy would you need to see, because oftentimes, I mean, how do they know? It's difficult for them to like, how would they even come up with, like, 85%? They have no idea. So I think it's more about framing, like, what are they trying to do?

It's like, okay, if you're trying to go to exit on a highway with a self driving car, or if you're trying to drive around, like, MIT's campus with, like, a self driving go kart, then the success criteria is, like, not crashing within that use case, I think, or not injuring a pedestrian or crashing a vehicle. Maybe those are the 2 success criteria. And so then you just have to work towards that.

Versus, like, if you approached MIT and said, hey, what level of accuracy would you need to buy my self driving go kart for your campus? They might say, like, 92.

Versus reframing it as like, hey, what would success look like if I sold you a self driving car? And then they might say, okay, yeah, doesn't injure a pedestrian. Maybe like, you know, once a year it injures a pedestrian and it's minor, and the vehicle doesn't crash into a signpost frequently. And then you just go work towards that. Or,

you know, there's other examples you could probably make. But then it's more fuzzy because there's not a specific metric that you're working towards. It's just, like, creating something that is good enough to make the customer that you're selling it to successful with whatever they're trying to use your model for. So it gets really fuzzy. And

like you said, it's also dependent on the problem domain. So, like, it also depends on the customer. There might be a college campus that's really innovative, and they're like, yeah, we don't care if we injure pedestrians.

We just want self driving go karts. So give us what you got as long as it works. Whereas there's others that are maybe a lot more risk averse, so it's gotta be flawless. So that's also where it comes down to really digging into who you're selling to and the market you're selling to and the problem domain that you're selling to, which is not unique to a machine learning company. It's unique to all startups.

But those are the types of things that you can do to try to help. And we still have to do that. So, like, court reporting. There's human transcriptionists that still type up, like, legal depositions and court transcripts. And those require, like, 100% accuracy. Like, you have to have a human do those because you can't afford mistakes with, like, a legal deposition.

And so even if you get automatic speech recognition accuracy to something incredible, which is, like, you know, 95%, you're still gonna need a human to do a legal deposition, probably for another, like, 2 years until the AI can do it just as well. And that's why we still don't, like, focus on that market, for example. So that's always something that, yeah, is gonna have to be considered. It goes back to the startup lens. It's, I guess, really a question around, like, product market

fit. So, like, what market, use case, and customer does your model have product market fit with? And that changes as it gets more accurate, which is interesting. Like, for some startups, as you add more features, your product market fit gets better. But if your product is a machine learning model, it's like as it gets more accurate, your product market fit improves, and your addressable market expands, which I think is really interesting. Because you can know early on like self driving cars,

there is product market fit for self driving cars. Like, I would buy a self driving car. You would buy a self driving car. A lot of people would buy self driving cars. So you know there's product market fit even though today I couldn't sell a self driving car because they're not good enough. So it's just about getting it more accurate until you all of a sudden hit that threshold where,

boom, you know, people will buy it. So, yeah, again, this is another 1 where we could probably talk, like, for an entire hour around, but, hopefully, that was coherent. Yeah. Another interesting element of machine learning as a core component of a product is the question of being able to acquire enough

data to be able to actually build the model in the first place to determine whether or not it's feasible from a product perspective and Yeah. Is this data that I am actually allowed to use for this end use case and, you know, how do I think about acquiring new data as I build and iterate on this model and start to onboard customers?

Like, what are the dividing lines between the data that this customer is generating in the process of interacting with my model is acceptable to feedback into the training cycle to improve the capabilities? And when is it something where I'd have to throw away that data as soon as it is input because of regulatory or compliance or ethical purposes? It's very fuzzy. It's very fuzzy, and it makes it really difficult to get started. So moving more now into

your specific company and the business that you're building, you spoke a little bit about how you're currently at about 65 people, about half of that is machine learning researchers. I'm curious if you can just talk a bit more to how you are structuring your team to be able to

build and maintain and deploy these models, but also still be able to continue to push the boundaries on what the models are capable of and what types of models you're able to factor into the overall product suite that you're building out? We're actively working through a lot of this, like, team structure and setting all this up now that we're growing. I mean, we were, like, 20 people 6 months ago. We've grown pretty quickly, and it's

caused some growing pains for sure. But the way that we've structured it recently is we have our research teams. And so those research teams are focused just on researching neural network architectures and training models. And they're just focused on research and training models. And then we have a deep learning

platform team. And that deep learning platform team, they're focused on taking the models from the researchers and packaging them up into basically, like, pip installable packages that have automated test suites attached to them for accuracy benchmarks, for profiling the compute and memory footprint of the model, optimizing the model to be more performant. So they focus on basically the packaging. And then we have our traditional

engineering team. That deep learning platform team also, like, as part of the packaging, focuses on exposing interfaces that make those models scalable in production. So, like, basically, to be able to mini batch things at inference time, more or less. And then we have our traditional engineering team that focuses on our, like, REST API as well as building the microservices around these packaged neural networks and working with our traditional,

like, platform team to get those microservices into production and scaling correctly. And, you know, they're, like, on call if those go down. That's the process that we've set up. Now the deep learning platform team, they also are working on building out the tooling and, like, yeah, internal tooling around enabling our researchers to scale their experiments, like, with a single command line, be able to run a bunch of training jobs on a crazy amount of compute

so that, like, in a week, they get back a grid of results for different training runs that were run on, like, large GPU clusters. It kind of abstracts all of that away from the researchers so they can just focus on the research. So that's how we've been setting it up. Like, when we first started, you know, we were a small team, we did everything. Like, 1 person did everything: researched the model, built the microservice, put it into production, like, with an engineer.
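The internal tooling described here isn't public, so purely as an illustration of the single-command, fan-out-a-grid-of-training-runs idea, a toy launcher might look like the sketch below; a real setup would hand each configuration to a cluster scheduler rather than local subprocesses, and train.py is a hypothetical training script.

```python
# Toy illustration of the "one command launches a grid of experiments" idea.
# Real tooling would hand each config to a cluster scheduler (Slurm, Kubernetes,
# a cloud batch service); here we fan out local subprocesses so the control
# flow is visible. train.py is a hypothetical script accepting these flags.
import itertools
import subprocess
from concurrent.futures import ThreadPoolExecutor

GRID = {
    "learning_rate": [1e-4, 3e-4, 1e-3],
    "batch_size": [16, 32],
    "num_layers": [12, 24],
}

def expand(grid):
    """Yield every combination in the grid as a dict of hyperparameters."""
    keys = list(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))

def launch(config):
    """Run one training job as its own process and report its exit code."""
    cmd = ["python", "train.py"] + [f"--{k}={v}" for k, v in config.items()]
    print("launching:", " ".join(cmd))
    return subprocess.run(cmd, capture_output=True).returncode

if __name__ == "__main__":
    configs = list(expand(GRID))
    # In practice each config maps to its own GPUs; a thread pool stands in here.
    with ThreadPoolExecutor(max_workers=4) as pool:
        exit_codes = list(pool.map(launch, configs))
    print(f"{exit_codes.count(0)}/{len(configs)} runs finished cleanly")
```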

But now that we've gotten bigger, we've had to, like, kinda delineate the responsibilities so that we can try to be more productive and get more done. Because the big thing for us is being able to, like, rapidly research and then also push stuff into production. Because we want to be able to put state of the art models into production for developers to use

really quickly. That's really our value proposition. It's like, hey. The way I think about it is, like, we're basically abstracting away all the pain of researching, training, and deploying state of the art AI models. Like, we're abstracting all that away behind just like a simple API call or SDK and making that available to, you know, any developer at a hackathon or at a company. And you only have to pay for it if you're actually using it, you know, for, like, a legit use case

at a real company. So that's why for us, this is a major competency that a company has to have. And we invest a ton in it from a hiring perspective and also just like an OpEx perspective too. Like, we spend a lot of money on GPUs, and probably will always do that.

In terms of the actual models that you're building, I'm curious how you think about the feature scope of what the model is capable of and how you decide sort of what capabilities you wanna bake into it versus when you want to split it apart into different models that cooperate for different aspects of a particular task and sort of the composability of the models and being able to balance the

accuracy and feature set against the ability to train and deploy them, iterate on them quickly, and just sort of the kind of sizing and composability of models as you expand the capabilities of your platform? Yeah. I mean, so we try to create models for specific tasks. So, like, we have a summarization model, entity detection model, like, our content moderation model, so that we aren't bundling things together, which gives us the ability to, like, iterate on

specific models at a time. But they all share underlying research components. You know? So, like, 1 team might come up with an optimization for, like, a certain transformer layer or something, and other teams might be able to benefit from that to also improve their models. But we try to really focus on being able to iterate really quickly, and then it goes back to that earlier question you asked around it's like build models that are gonna have product market fit.

So build models that there's a clear use case for and that are not just interesting, but people could actually build products or features around. So we spend a lot of time there. Like, how are developers or customers gonna use this model? Because a big part too is, like, once you have the model, how do you expose it to folks? Like, do you give them this, like, big messy JSON payload that's hard to understand

or even, like, figure out what's going on? Or, you know, do you try to simplify it and maybe hide certain parts of what the model's outputting and, yeah, abstract that away or simplify that into something that's, like, more easy to consume. So we think about that as well when we're building these models. Like, that's also a pretty important component we found.

And then as to the kind of operational characteristics and MLOps systems that you've built out to be able to run this business and be able to build and maintain velocity among the data science and ML engineering teams? What are some of the functionalities that you've had to build in house, and how much have you been able to take off the shelf, whether that's open source or commercial products?

I mean, we've surprisingly had to build a lot in house because I think there's not many companies that are, like, doing large scale training jobs, for example. It's still kind of limited to, like, big tech companies. So, like, we've hired a lot of people from big tech companies. So, you know, I worked at a big tech company. So, you know, we know this firsthand.

But big tech companies have a lot of internal tooling for their research teams to be able to, like, scale out training experiments and, like, run a grid search over hyperparameters and, like, easily access large datasets to train models on. So they have a lot of tooling to enable their researchers to be super productive. And there isn't a ton of that that's open source because that is, like, kind of a competitive advantage.

You know? So, like, we're having to build a lot of that now. And, like, the tooling we have now is so much better than what we had even 2 years ago because now we have teams that can just focus on building the tooling. You know? So now we can like, a researcher can, you know, run a command on the command line to launch an experiment across, like, you know, dozens of GPUs, like, in various clouds, some on our own compute that we've purchased,

compute clusters that we've purchased. And all that's kinda managed. And there's, like, scheduling systems that, like, keep our compute constantly running jobs and so we get a lot of efficiency out of them. So there's a lot there that I haven't really seen firsthand any good libraries for. And the way we think about it is, like, that's actually a big competency that we have to build, and it's a competitive advantage that we'll have because we'll be able

to basically accelerate our research velocity, which is just about, like, it's trial and error. It's like trying stuff out, seeing what works. And then on the deployment side, it's similar. I mean, there are some libraries for, like, deploying machine learning models, but it's still pretty vanilla. And yeah. Like, when you think about alright. Like, Hugging Face. Right? Did you see, like, the BLOOM language model that was open sourced? Have you seen that?

I haven't dug into that myself, but I'll have to take a look after the show. So it's this, like, 170 billion parameter model. That's, like, open source. Right? So you can download it. But, I mean, I don't think you could do inference I haven't looked into it, so I could be wrong. But, I mean, you might struggle to run inference on a single GPU with a model that size. And so when you think about deploying large models

where most of the accuracy is coming from, regardless of what the task is, it's, like, usually a large neural network behind it. From the deployment perspective, there's also not a lot of great stuff out there. So we've had to build a lot of that in house. And we hope to, at some point, open source pieces of this or maybe be able to turn some of the tooling that we're building into products. But that's where the ecosystem is still in its infancy, I would say.

I think there's a lot of companies trying to solve this, but I feel like it's moving too quickly to be able to solve. It's 1 thing, like, with AWS, you know, you can get load balancers and target groups and web application firewalls. And there's these, like, mature concepts

in, like, standing up web applications that, like, exist. You know? But I feel like that's not really the case yet with machine learning, and it's also evolving so quickly that, like, you could come up with something that is amazing and super efficient for deploying, like, you know, a convolutional neural network. But, you know, now

models are so much bigger and they leverage different types of neural network layers and it's changing so rapidly. So the ecosystem is pretty underdeveloped there still. And we're having to build a lot of that out in house. And that's actually like a huge part of our hiring right now is to hire folks that will help continue to build out those internal platforms and, like, tools, basically, that will help us to accelerate our research velocity and also

make it easier to deploy large models into production and scale them. In your case, you are primarily exposing these models through an API interface, which is another aspect of the engineering of designing the APIs, figuring out how this maps to the different capabilities of that model and the scoping there. And I'm just wondering if you can talk to the ways that you think about the deployment and scalability of the

models and how that maps to the scalability of the applications that are kind of housing those models and just the operational efficiencies that are necessary to be able to operate at the kind of scale that you have built up to and how you're thinking about what are the breaking points where we need to rethink how we manage deployment and monitoring and scaling? Yeah. There's a ton to unpack there. If you're gonna ship a model that doesn't have a ton of usage, you don't have to be,

like, crazy efficient about how you deploy it. But as soon as you get a lot of usage on it, then you need to be making sure that if your model has to be deployed on a GPU for inference, for example, that, like, you're tracking your GPU utilization and it's not, like, 20% consistently because you're underutilizing the GPU card that you're deploying on. And why are you underutilizing

that? Well, maybe the way you've written your application, it's not, you know, batching requests together and mini batching requests and trying to keep your compute footprint as efficient as possible. For us, at least, those problems have become more dramatic as models that we've deployed have gotten larger adoption. So, like, when I think about our speech recognition model, you know, that thing sees, like, millions of audio files every single day.
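The mini-batching point is worth making concrete. One common pattern, sketched below in simplified form and not meant as Assembly's implementation, is to buffer incoming requests for a few milliseconds and run them through the model as a single batch, which keeps GPU utilization up compared to one-request-at-a-time inference.

```python
# Simplified sketch of server-side dynamic batching: requests queue up briefly
# and are run through the model together. `model` is a stand-in for any callable
# that accepts a list of inputs; a real service would call GPU inference here.
import queue
import threading
import time

MAX_BATCH = 16          # largest batch we will send to the model at once
MAX_WAIT_S = 0.02       # how long to wait for more requests before flushing

request_queue = queue.Queue()   # items are (input, response_queue) pairs

def model(batch):
    # placeholder for real GPU inference over a list of inputs
    return [f"transcript for {item}" for item in batch]

def batching_worker():
    while True:
        first = request_queue.get()               # block until a request arrives
        batch = [first]
        deadline = time.monotonic() + MAX_WAIT_S
        while len(batch) < MAX_BATCH:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                batch.append(request_queue.get(timeout=remaining))
            except queue.Empty:
                break
        outputs = model([item for item, _ in batch])
        for (_, reply), out in zip(batch, outputs):
            reply.put(out)                        # hand each caller its own result

threading.Thread(target=batching_worker, daemon=True).start()

def handle_request(audio_chunk):
    """What each API request handler would call."""
    reply = queue.Queue(maxsize=1)
    request_queue.put((audio_chunk, reply))
    return reply.get()                            # waits for the batched result

print(handle_request("chunk-001.wav"))
```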

And so, you know, we were at a point where that was kind of inefficient. And at that scale, especially, you know, we've been growing, like yeah. Volume to our API has tripled over the last 6 months. Things can get out of hand pretty quick. Then you have to really dig into, like, how are you breaking this model down and all the different parts of it into its component parts and thinking about deploying those as separate microservices

to make things scale. And how are you feeding data into the model? Are you measuring things like GPU utilization and making sure that those are really performing? Like, what metrics are you tracking? So those have also been problems that we've had to work through as we've been seeing larger and larger scales. But I would say it goes back to, like, yeah, there are some models that we build that are experimental.

Like, we have a couple that we're working on right now, and we test them out with a couple of customers or developers at small scales. So we just kinda push them into production basically, we finish training them and then we just kinda, like, throw them into production. They're inefficient. And even from a neural network perspective, the neural networks haven't been optimized for inference. So we just put them into production

so that we can see, like, hey, is this thing worth spending more time on? Because you can spend, you know, a lot of time on the engineering side building good microservices that are scalable and architecting things the right way. But there's also work on the research side, you know, to basically, like, compress the models down into smaller sizes.

There's a lot of work there basically to take your model once it's done training and get it into a more performant package for inference. So there's work on both sides. And as a start up, yeah, it's like, alright. Where are we gonna spend our time? There's always something that we have to think about because we never have enough

time to do everything that we wanna do or enough resources to do everything that we wanna do. So we tend to focus most on like, we know which models cost us the most, like, basically, on our AWS bill. And we focus most of our time on the ones that are most costly. And the ones that are not that costly, even if they're super inefficient, we kinda just leave them because it's you know, we could optimize the crap out of them, and it's only contributing, like, 5% to our overall costs. So

that's how we think about it. But again, there's a ton there to unpack. Yeah. Absolutely.

And in terms of your experience of building and deploying and scaling and tuning these ML models and building a business, that that is the core competency and the competitive differentiator that you're offering, what are some of the most interesting or innovative or unexpected aspects of ML development and maintenance and managing the business aspects around that that you have encountered in the process?

Yeah. I mean, I would just say, like, the whole industry is just evolving so quickly. You know? So, like, 3 years ago, you could have a state of the art model that is completely different than what state of the art looks like today. And so, like, basically, all of your infrastructure, like a lot of it and your systems design, you know, has to be, like, thrown out. And that still happens. I mean, we have

versions of our speech recognition. Like, there are speech recognition models that our research team is working on that are completely new neural network architectures. And, you know, a lot of the system design and architecture we've set up for our current speech recognition models is basically gonna have to be thrown out if we switch to this new neural network architecture in production. And so I think, you know, it's a little bit different

from traditional software development. I mean, I get that in, like, traditional software development, especially as a start up, as you reach new levels of scale, you always have to re architect your systems and design them differently. But to your point about, like, ecosystem, there's better tools at your disposal. Like, you know, okay, if you need,

you know, like, if you have a ton of transactions to a database, here are your options. It's not like, if you have high load to your database, you're, like, 1 of the only companies that has that and you have to, like, invent a new database or some transaction handling library to handle that better. Like, there's tons of options and there's tons of companies that have already solved this and there's, you know, like, lots of resources and, yeah, at your disposal.

But with a machine learning company, because the product is the model and the research is developing at such a rapid pace. And if you wanna stay ahead, you constantly have to be trying to, like, disrupt your current model, which can be like a really fast cycle. It can be like every 6 months, you know, because you can have a good neural network architecture, and then you can tweak it, make small tweaks, and eke out better performance.

But then eventually, you kinda hit a ceiling and you have to go to the next like, find the next neural network architecture that's gonna give you a boost, like a bigger boost, and then you mine that further. So I would say the thing that is challenging is that component to it where if we didn't care so much about having the best product in the market, we would just be like, yeah, our models are good and, you know, we're just gonna focus on optimizing them and,

you know, maybe like in a year, we'll look at upgrading them to a better model architecture. But because we care a lot about always running the latest and greatest and best research and production, you know, we're constantly trying to disrupt the current models that we have in production and replace them with better ones, like, very quickly because that's what we promise our customers we're always gonna do. And that exposes a lot of, like, tricky problems.

So that's been, like, a unique challenge, I think, as an ML company that we've faced and that all ML companies that are really trying to stay on the state of the art will face. An example I think of is, like, if you use the Zillow app, there's, like, the Zillow estimate. There's probably some model that comes up with that Zillow estimate.

That thing probably works pretty well, like and that model probably isn't touched that often. So it's just about, like, optimizing the deployment for that as the scale of requests to that model gets larger. But the fields that we work in, they're rapidly evolving, and we always kind of have to be focused on unlocking the next accuracy gain to stay ahead.

In your own experience of building and growing the Assembly business and investing further in machine learning and building out your own internal platforms and tooling, what are some of the most interesting or challenging lessons that you've learned in the process? I mean, I'm learning so much every day. I think, like, you know, 1 thing that's really interesting right now is we're going from, like, a company where you know everyone and you do interact with everyone to a company where that's not the case.

Like, I think once we pass 50 people and we're on pace to be, like, maybe 80 people by the end of this year, the challenges that you face as a company are, like, totally different. Then it's about, like, how do you get these teams working well together, and there's, like, a lot more, like, cross team collaboration that has to happen. So those are interesting things that are happening now that we're working on. So that's 1 of the more recent things. But, man, I mean, it's constant.

And it's fun, but it's just constant. And so as somebody who has been building a business around ML as the core capability, what are the cases where ML is the wrong choice for building a business and you're better off just using traditional software approaches or just putting an API in front of Mechanical Turk? Yeah. I mean, AI models have gotten a lot better over the last couple of years in particular.

So it makes it a little bit harder to answer that question. I mean, 1 lens that you could answer that question through is, like, applications and use cases where you still need, like, human level accuracy. You know? There are some domains where progress with AI models has gotten so much better, but, like, you still need a human in the review process in some domains. Like, I think of the legal deposition use case, or even self driving cars, like, they're so much better now

today. I mean, just objectively speaking, like, it's incredible what, like, Tesla's autopilot can do. But still, it's not good enough to, you know, just, like, go to sleep in your back seat and tell the Tesla where to take you in your city. So I think that's 1 thing where people that aren't familiar with AI have to just keep in mind that, like, the progress has been incredible, but make sure you're really thinking about, like, yeah, what success

needs to look like, you know, if you're gonna use an AI model, and whether that's feasible with the state of the art today. And if not, then maybe you need to go to Mechanical Turk. I guess another framing of it is, what are the kind of capabilities or background that are necessary to be successful building an ML focused business? I think, like, a deep competency in ML and a passion for it. I mean, it sounds so corny, but

it's just 1 of those fields that's evolving so quickly. Like, if you're gonna start an Internet business, you don't need to be super passionate about programming. But you need to know how to program so that you can build whatever, like, crud app or SaaS product you wanna build. But, like, you don't necessarily need to be, like, passionate about, like, dev tools and, like, you know, attending conferences all the time about programming to start an Internet business, like, period.

But with an ML company, I think it's different because, you know, if you use your point in time skills to build an ML company today, in a year, what you've built is gonna be, like, very outdated because the research in all ML fields is just rapidly evolving. So you have to really be interested in the space to, I think, start an ML company, in my opinion. Because it's different than just starting like any old Internet business.

And as you continue to build and scale Assembly, what are some of the things you have planned for the near to medium term or any particular projects or products you're excited to dig into? Yeah. It's a great question. So right now, like, our API can handle audio. So you send in an audio file and you say, hey. I want, you know, speech recognition and all these other models applied to this audio file. And then give me, like, the JSON output.

Pretty soon, we're gonna expose APIs that can ingest text documents, so, like, news articles or tweets or chat messages. And you can run those text documents through a similar pipeline. So you can say, hey, run this text document through, like, you know, these 4 models, content moderation, summarization, entity detection, remove sensitive personal information from it, and then give me a JSON output on the other end.

That's a really exciting thing that's coming soon because our whole goal and vision as a company is to build out this API platform for state of the art AI models that can work on many different types of data. And so we've been really focused on audio, and always will be, but I'm excited that we're actually starting to expand beyond just that. I know that's coming pretty soon.

Are there any other aspects of your work at Assembly or the overall experience of building a business that has ML as the core product that we didn't discuss yet that you'd like to cover before we close out the show? The biggest lesson I think that I've learned though is just you have to have a compass you're working towards with the models that you're building. So, like, you have to have a very good understanding of, like, how accurate your models are on real world data.

Get a ton of real world benchmark data, make sure it's labeled super well, super diverse, segment it into different buckets depending on the data domain. If there's anything else out there on the market, go see where those options stand against that data and then, you know, see where your models are at. But you have to have a really good compass. And I would say do that before even starting to train a model or looking at neural network options, because that can serve as your compass. Because otherwise, you're flying blind. Anyway, that's the biggest thing that I think I always try to reiterate to our teams and that I think folks building ML companies should keep in mind. But otherwise, it's been great to chat with you, and thanks for having me on. Absolutely.
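As a postscript on that compass advice, one way to operationalize it for speech models is to track word error rate per domain bucket on a fixed, well-labeled benchmark set; the sketch below uses invented transcripts purely to show the mechanics.

```python
# Small sketch of the "compass" idea: a fixed benchmark split into domain
# buckets, with word error rate (WER) reported per bucket. The reference and
# hypothesis strings here are invented placeholders.

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Standard dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# Benchmark buckets by data domain (placeholder examples).
benchmark = {
    "phone_calls": [("thanks for calling how can i help", "thanks for calling how can help")],
    "podcasts":    [("welcome back to the show", "welcome back to the show")],
}

for bucket, pairs in benchmark.items():
    scores = [wer(ref, hyp) for ref, hyp in pairs]
    print(f"{bucket}: mean WER {sum(scores) / len(scores):.2%}")
```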

Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And as the final question, I'd like to get your perspective on what you see as being the biggest barrier to adoption for machine learning today. So I have a non answer for you on that, which is that I actually think the barrier has never been lower because I think there's so much content

out there now. Machine learning's kinda gone mainstream. Like, The Economist did a special edition on it recently. There's a lot of examples of machine learning progress, like DALL-E 2 and GPT-3, Google's LaMDA model that someone thought was sentient, going into the mainstream. And there's so much content on YouTube now, Twitter,

that makes it easy, and a lot more libraries for building ML models. I know we talked about the ecosystem being immature for large scale training and large scale deployments. But if you're just trying to get into machine learning and, like, train a model, it's never been easier than it is today. So I would actually say, like, the barrier has gone lower, and I think that's a good thing.

So it's somewhat of a non answer for you. I hope that's okay. No. It's totally fine. I appreciate that perspective. So thank you again for taking the time today to join me and share the work you've been doing at Assembly and your experiences of building a business with ML as the core product. It's definitely a very interesting problem, and it's great to be able to speak to you and learn from your experience. So I appreciate all the time and energy that you and your team are putting into the work that you're doing, and I hope you enjoy the rest of your day. Yeah. Thank you so much for having me on. It was great to talk to you.

Thank you for listening. And don't forget to check out our other shows, the Data Engineering Podcast, which covers the latest in modern data management, and Podcast.__init__, which covers the Python language, its community, and the innovative ways it is being used. You can visit the site at themachinelearningpodcast.com

to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@themachinelearningpodcast.com with your story. To help other people find the show, please leave a review on Apple Podcasts and tell your friends and coworkers.
