¶ Intro / Opening
Ejaaz: What if I told you there was a single website you could go to where you can Ejaaz: chat to any major AI model from one single interface?
¶ Introduction to OpenRouter AI
Ejaaz: It's kind of like chat GPT, but instead every prompt gets routed to the exact Ejaaz: AI model that will do the best job for whatever your prompt might be. Ejaaz: Well, on today's episode, we're joined by Alex Atala, the founder and CEO of Open Router AI. Ejaaz: It's the fastest growing AI model marketplace with access to over 400 LLMs, Ejaaz: making it the only place that really knows how people use AI models, Ejaaz: and more importantly, how they might use them in the future.
Ejaaz: It's at the intersection of every single prompt that anyone writes and every Ejaaz: model that they might ever be. Ejaaz: Alex Atala, welcome to the show. How are you, man? Alex: Thanks, guys. Great. Thanks so much for having me on. Ejaaz: So it is a Monday. How does the founder of OpenRouter spend his weekend?
¶ Founder's Weekend Routine
Ejaaz: Presumably you know out and about chilling relaxing not at all focused on the company oh Alex: I usually i love weekends with no Alex: meetings planned and i just go to a coffee shop and just have tons of hours Alex: stacked in a row uh to do things that require a lot of momentum build up so Alex: i did that at coffee shops on saturday and sunday and then i watched blade runner again.
Ejaaz: Again okay um well so Ejaaz: when we were preparing for this episode alex Ejaaz: um i couldn't help but think that you've had a pretty insane decade of startup Ejaaz: foundership right um so open router is kind of like your second major thing Ejaaz: that you've done but prior to doing that you were the founder and cto of OpenSea, Ejaaz: the biggest NFT marketplace out there. Ejaaz: And now you're focused on one of the biggest AI companies out there.
Ejaaz: So it sounds like you're at kind of like the pivot point of two of the most Ejaaz: important technology sectors over the last decade. Ejaaz: Can you just give us a bit of background as to how you ended up here? Ejaaz: And more importantly, where you started.
¶ Journey from OpenSea to OpenRouter
Ejaaz: Walk us through the journey of OpenSea and how you ended up at OpenRouter AI. Alex: Yeah, so I co-founded OpenSea with Devin Finzer the very beginning of 2018, very end of 2017. Alex: It was the first NFT marketplace. And... Alex: It was not dissimilar to OpenRouter in that there was a really fragmented ecosystem Alex: of NFT metadata and media that gets attached to these tokens.
Alex: And it was the first example of something in crypto that could be non-fungible, Alex: meaning it's a single thing that can be traded from person to person.
Alex: Most things in the world are non-fungible. A chair is non-fungible. a Alex: currency is fungible so it was Alex: back back in 2018 no Alex: one was really thinking about crypto in terms of non-fungible goods Alex: and uh and the problem with the non with Alex: non-fungible goods is that there weren't any real standards set up Alex: um there was a lot of heterogeneous like Alex: implementations for how to get uh like
Alex: a non-fungible item represented and tradable in a decentralized way So OpenSea Alex: organized this like very heterogeneous inventory and put it together in one Alex: place. We came up with like a metadata standard. Alex: We did a lot of like a lot of work to really make the experience super good for each collection. Alex: And you see a lot of those a lot of similarities with how AI works today, Alex: too, where there's also just a very heterogeneous ecosystem.
Alex: On a lot of different APIs and different features supported by language model providers. Alex: And Open Router similarly does a lot of work to organize it all. Alex: Um i was at open sea uh until 2022 um when i was kind of feeling the itch to do something new. Alex: And um i'm at the very end of i left in august and then chat gpt came out a few months later.
Alex: And uh and my biggest question around that Alex: time was whether it was going to be a winner take all market Alex: because opening i was very far ahead of Alex: everybody else and um you know Alex: we had cohere command we had a couple open source models um Alex: but opening i was the only really usable one i Alex: was doing little projects to experiment Alex: with the gpt3 api and uh Alex: and then llama came out in january um really Alex: exciting about a tenth the size one on a
Alex: couple benchmarks but it wasn't really chattable yet and Alex: uh and it wasn't until uh a few Alex: months later that somebody a team at Alex: stanford distilled it into a new Alex: model called alpaca um distillation means Alex: you you take the model and you customize it or fine tune it Alex: on a set of synthetic data that they Alex: made using chat gpt as a research project Alex: and uh and that was it was Alex: the first successful major distillation that i'm aware
Alex: of um and it was an actually usable model i Alex: was like on the airplane talking to him i was like wow this Alex: is if it only took six hundred dollars to make something like this then you Alex: don't need ten million dollars to make a model there might be like tens of thousands Alex: hundreds of thousands of models in the future and suddenly this started to look Alex: like a new like economic primitive a new building block that people that kind
Alex: of deserve their own place on the internet.
¶ Exploring Frontiers of Technology
Alex: And there wasn't one. There wasn't a place where you could discover new language Alex: models and see who uses them and why. Alex: And that's how OpenRouter got started. Josh: That's amazing. So one of the things that we're obsessed with on this channel Josh: in particular is exploring frontiers and how to properly see these frontiers Josh: and analyze them and understand when they're going to happen.
Josh: And when I was going through your history, you have this talent consistently over time. Josh: And even as far back as early on, I read you were hacking Wi-Fi routers in a hackathon. Josh: You're very early to that. You were early to the NFTs. You were early to understanding
Josh: AI and the impact that it would have. And what I'd love for you to explain is Josh: the thought process and the indicators you look for when exploring these new Josh: frontiers, because clearly there's some sort of pattern matching going on. Josh: Clearly you have some sort of awareness of what will be important and why it Josh: will be important, and then inserting yourself into that narrative.
Josh: So are there patterns? Are there certain things that you look for when searching Josh: for these new opportunities and that led you to make these decisions that you have? Alex: I think there's there's a lot to be said for finding enthusiast communities Alex: and and seeing if you're going to join it. Alex: Like, can you be an enthusiast with them?
Alex: Like whenever something new comes out that has like some kind of ecosystem potential, Alex: there's there are going to be enthusiast communities that pop up. Alex: And the Internet has made it self-certain. You could just join the communities.
¶ Patterns in New Opportunities
Alex: Um discord i think is a incredible Alex: and super underrated platform because Alex: the communities feel kind of private you're Alex: like getting you don't feel like you're you know Alex: seeing somebody trying to get s you Alex: know like advertise something for seo juice there's Alex: no seo juice in discord um it's it's Alex: just people talking about what they're passionate about and and it Alex: goes it gets really niche um and when
Alex: you find a like an interest group in discord that Alex: like has to do with some some new Alex: piece of technology that's just being developed right now and doesn't really Alex: work very well at all um you get people who are just trying to figure out what Alex: to do with it and how to make it better and i think that's like that's the first Alex: core piece of magic that jumps to mind, Alex: there's got to be like a willingness to be weird because if you jump into any
Alex: of these communities at face value it's stupid. Alex: Like oh this is like just a game or it's like a really weird game I mean I'm Alex: not really a collectible game so I'm going to leave right now and yeah. Alex: Not only do you have to be aware, but you have to be creative. Alex: Like, okay, these are just cats on the blockchain, and people are just trading cats back and forth. Alex: You can't look at the community as simply that. Alex: Think about what you could do with it.
Alex: Like, what is this unlock that wasn't achievable before?
Alex: Um and uh Alex: and and i think there are there are people who Alex: just are good who will do this and they'll join the communities Alex: and and brainstorm live and you can see everybody Alex: brainstorming uh in real time but like Alex: another incredible example of this was the mid-journey discord Alex: you know it became the Alex: biggest biggest server in discord by Alex: far uh and you know Alex: why did that happened well you could it started with Alex: something weird silly maybe
Alex: not super useful but you could see all the Alex: enthusiasts like remixing and Alex: brainstorming live how to turn it into something beautiful Alex: and how to how to make it useful and um Alex: and then you know just explode it like i it's the most it's the it's the most Alex: incredible like niche community uh i think that discord has ever seen because Alex: of like how useless it started and how insanely exciting it became.
¶ The Role of Enthusiast Communities
Alex: So um like i mean i i Alex: think i saw big sleep i was like playing around with this model Alex: called big sleep in 2021 that uh Alex: let you generate images that Alex: look kind of like deviant art okay and Alex: uh you could see you could like they're all Alex: animated images and they none of them really made sense but you could get some Alex: really cool stuff not like potentially something you'd want to make your desktop
Alex: wallpaper and if you're really like deep in some deviant art communities you Alex: know you kind appreciate it and so and that that that was like oh there's like Alex: a kernel of something here, Alex: and uh it took like a like another year or two before mid-journey started to Alex: like pick up but that was like.
Ejaaz: Where were you seeing all of this alex like where were you scouring just random Ejaaz: forums or just wherever your nose told you to go Alex: But basically there's this twitter account I'm trying to remember what it's Alex: called that posts AI research papers and and like kind of tries to show what you can do with them. Alex: And I discovered this Twitter account in like 2021.
Alex: And I. Alex: I think it was not it was it wasn't at all like related to crypto but it was Alex: a way you know big sleep was like the first thing i saw that used ai to generate Alex: things that could potentially be nfts, Alex: so i started experimenting around like how how much you could direct it to make Alex: an nft collection that would make any sense it was very very difficult um but Alex: that was how uh that was like the first generative and.
Ejaaz: This was before you were even thinking about starting open router right Alex: Um yeah yeah this was back this was when i was Alex: full-time at openc um oh is Alex: yeah i got the it's a colic Alex: this twitter account all right Alex: i really recommend it they basically post papers and like explainate and explore Alex: how this paper gets useful um they post animations uh like they make they make
Alex: ai research like kind of fun to engage with and that was that was my first experience. Ejaaz: Okay, so I mean that's a massive win for X or formerly as it was known back Ejaaz: then, Twitter as a platform, right? Ejaaz: It gave birth to kind of like two of the biggest technologies crypto, Ejaaz: also known as crypto Twitter, and now apparently all the AI research stuff which Ejaaz: kind of put you on to the path that led you to OpenRatter.
Ejaaz: So if I've got this right, Alex, you were full-time at OpenSea with a Ejaaz: multi-billion dollar company loads of important stuff to do there, Ejaaz: but you still found the time to kind of scour this fringe technology because Ejaaz: that's what AI was at the time. Ejaaz: Prior to kind of GPT-2 or GPT-3, no one really knew about this.
¶ Early Innovations in AI
Ejaaz: And you were playing around with these gen AI models, these generative AI models Ejaaz: that would create this magical little substance and maybe it came in the form Ejaaz: of a pitcher or a weird little cat. Ejaaz: And you kind of like jumped into these niche forums of enthusiasts, Ejaaz: as you say, and kind of explored that further. Ejaaz: And it sounds like you kind of like honed that even beyond your journey from OpenSea when you left.
Ejaaz: I remember actually meeting you in this kind of like this abbess between you Ejaaz: leaving OpenSea and starting OpenRouter where you were kind of brainstorming Ejaaz: a bunch of these ideas. And I remember a snippet from our conversation Ejaaz: In like one of the WeWorks here, where you just kind of like had whiteboarded a bunch of AI stuff. Ejaaz: And one of those things was kind of like the whole topic of inference.
Ejaaz: And if I'm being honest with you, I had no idea what that word even meant back then. Ejaaz: I was extremely focused on all the NFT stuff and all the crypto stuff, Ejaaz: my background's in all of that. Ejaaz: But I just found that fascinating that you always had your nose in some of the Ejaaz: early communities. And I think that's a really important lesson there.
Ejaaz: I want to pick up on something that you actually brought up when you said you Ejaaz: discovered kind of like your path to open router, Alex. Ejaaz: And that is, you said you were playing around with these early AI models. Ejaaz: So not the GPTs before Claude was even created. Ejaaz: You're playing around with these random models that you would find either on Ejaaz: forums, on Twitter, or on Reddit, right? and you would experiment with them.
Ejaaz: And I find it fascinating that back then, even when GPT became a thing, Ejaaz: you were convinced that there would be hundreds of thousands, Ejaaz: or did you say hundreds of thousands of AI models? Ejaaz: Back then, that wasn't a normal view. Ejaaz: Back then, everyone was like, you need hundreds of millions of dollars. Ejaaz: Maybe it was tens of millions of dollars back then. And it was going to be a rich man's game.
Alex: Yeah, it was basically the Alpaca Project that kind of put me over the sack.
¶ Insights on Model Development
Alex: On there being many, many, many models instead of just a very small number. Ejaaz: And can you explain what the Alpaca project is for the audience? Yeah. Alex: So the Alpaca project, after Lama came out, you really could not chat with it Alex: very well. It was a text completion model. Alex: There were a couple benchmarks where it beat GPT-3. Alex: And... It was about a tenth the size of what most people thought GPT-3 was sized at. Alex: So it was a pretty incredible achievement.
Alex: But it wasn't really like, the user experience wasn't there. Alex: And the Alpaca project took ChatGPT and generated a bunch of synthetic outputs. Alex: And then they fine-tuned Llama on those synthetic outputs. Alex: And this did two things to Llama. It taught it style, and it taught it knowledge. Alex: It taught it, like, the style is like how to chat, which was the big user experience gap. Alex: And it made it smarter. Alex: Like, you can, fine-tuning transfers both style and knowledge.
Alex: And the model would, like, respond to things that it had, you know, Alex: like, the content of the synthetic data, like, was reflected in the model's Alex: performance on benchmarks after that point.
Alex: So um so if you can do Alex: that without revealing all Alex: the data that goes in um now now Alex: there's like a way you could sell data via api without Alex: like like just dumping all the data out to the world and then never being able Alex: to to like monetize it again so there's like a brand new business model around Alex: data that emerges um yet like the ability to create just like work towards open intelligence, Alex: and uh and build like new
Alex: architectures test them more quickly and and and Alex: uh uh fine-tune them quickly basically you Alex: can build on top of the work of giants i mean Alex: you don't have to start from zero every time a lot Alex: of like the biggest developer experience innovations just involve like giving Alex: developers a higher stair to start walking up so they don't have to start at Alex: the bottom of the staircase every single time um and you know that was like the the the big.
Alex: Like generous give that llama had for the community um and it wasn't you know Alex: that wasn't the only company doing open source models, Mastral, Alex: came out with 7B Instruct a few months later. It was an incredible model. Alex: Then they came out with the first open-weight mixture of experts a few months later. Alex: It felt like actual intelligence, but completely open.
Alex: And all of these provide higher and higher stairs for other developers to kind Alex: of like, basically to crowdsource new ideas from the whole planet.
Alex: Uh and and let these new ideas build on Alex: top of really good foundations so and Alex: you know when that when that like whole picture started Alex: to form into place um it felt like okay this is going to be like a huge inventory Alex: situation you kind of like nft collections were a huge inventory situation obviously Alex: completely different really different market dynamics really different type Alex: of of goal that buyers have.
Alex: And so a lot of like my early experimentation, like I made like a Chrome extension called Window AI. Alex: I did like a few other things were just about learning how the ecosystem works Alex: and like what makes it different and how the like, like what people really want, Alex: what developers really want.
¶ Understanding OpenRouter’s Functionality
Josh: So that leads us to OpenRouter itself, right? So I kind of want you to help Josh: explain to the listeners who aren't familiar with OpenRouter what it does. Josh: Because I think a lot of people, the way they interact with an AI is they send Josh: a prompt to their model of choice. Josh: They use ChatGPT or they use the Grok app or they're on Gemini and they kind Josh: of live in these siloed worlds.
Josh: And then the next step up from the people are those kind of who use it professionally, Josh: who are developers. They're interacting with APIs. Josh: Maybe they're not interfacing with the actual UI, but they're calling a single model. Josh: And OpenRouter kind of exists on top of this, right? Can you walk us through Josh: how it works and why so many people love using OpenRouter? Alex: Open Router is an aggregator and marketplace for large language models.
Alex: You can kind of think of it as like a Stripe meets Cloudflare for both of them. Alex: It's like a single pane of glass. You can orchestrate, discover, Alex: and optimize all of your intelligence needs in one place. Alex: One billing provider gets you all the models.
Alex: Uh there's like 470 plus now uh Alex: like all the models like they sort of implement features Alex: but they do it differently and they also there's Alex: a lot of like intelligence brownouts as andre carpoffi calls them yeah where Alex: models just go down all the time even the you know even the top models like Alex: anthropic and gemini and and open Alex: ai um so what we do is you know we like developers need a lot of choice. Alex: CTOs need a lot of reliability.
Alex: CFOs need predictable costs. CISOs need complex policy controls. Alex: All of these are inputs to what we do, which is build a single pane of glass Alex: that makes models more reliable, lower costs, gives you more choice, and, Alex: and then and helps you choose between all the options for where to source your intelligence.
Josh: How does it work uh because i would imagine like what Josh: each as and i on the show we frequently talk about benchmarks right where Josh: a certain model is the best at coding and that infers that maybe you should Josh: go to that model to do all of your coding needs because it's the best at it Josh: but it would appear as if it's not true if you're routing through a lot of different Josh: providers so how do you consider which provider gets routed to when and how
Josh: to get the best result for what you're asking Alex: So we've taken a different approach so Alex: far which is instead of like focusing on Alex: a production router that picks Alex: the model for you um we try Alex: to help you choose the model so we Alex: we build lots we create lots of analytics both on Alex: your account and uh and on our Alex: rankings page to help you browse and discover the models that Alex: like the power users are really using successfully on
Alex: a certain type of workload um because we Alex: think like developers today primarily want to Alex: choose the model themselves um switching between all Alex: families can result in like a lot like very Alex: unpredictable behavior but once you've Alex: chosen your model um we try to Alex: help developers not need to think about the provider there are Alex: like sometimes dozens of Alex: providers for a given model uh all kinds
Alex: of companies including the hyperscalers like aws google vertex and azure um Alex: and uh like scaling startups like together fireworks deep infra um and a long Alex: tail of providers that provide, Alex: like very unique features, Alex: very like exceptional performance. Alex: There's all kinds of differentiators for them. Alex: So what we do is we collect them all in one place. And if you want a feature, Alex: you just get the providers that support it.
Alex: If you want performance, you get prioritized to the providers that have high performance. Alex: If you really are cost sensitive, you get prioritized to the providers that Alex: are really low cost today. and we basically create all these lanes. There's.
Alex: Innumerable ways you could get routed but Alex: you're in full control of the of the overall user Alex: experience that you're aiming for and that's Alex: what that's what we found that was missing from the Alex: whole ecosystem was just a way of doing that and uh Alex: and you know we get like between on average five to ten percent uptime boosts Alex: over going to um providers directly just by load balancing and sending you to
Alex: the top provider that's up and able to handle your request. Alex: We really focus hard on efficiency and performance.
¶ Choosing the Right Model
Alex: We only add about 20 to 25 milliseconds of latency on top of your request. Alex: It all gets deployed very close to your servers up the edge. Alex: We overall get just We stack providers. Alex: We figure out what you can benefit from that everybody else is doing and just Alex: give you the power of big data as a developer just accessing your model choice. Josh: So it kind of allows you to harness the collective knowledge of everybody, right?
Josh: You get all of the data, you have all of the queries, you know which yields Josh: the best result, and you're able to deliver the best product for them. Josh: Now, in terms of actual LLMs, EJ has actually pulled this up just before, which is a leaderboard. Josh: And I'm interested in how you guys think about LLMs, which are the best, Josh: how to benchmark them, and how you route people through them.
Josh: Is there a specific... Do you believe that benchmarks are accurate, Josh: and do you reflect those in the way that you route traffic through these models? Alex: In general, we have taken the stance that we want to be the capitalist benchmark for models. Alex: What is actually happening? Alex: And part of this is that I really think both the law of large numbers and the Alex: enthusiasm of power users are really, really valuable for everybody else.
Alex: Like when you're routing to Alex: um like clod in Alex: let's say you're routing to clod 4 and you're Alex: based in europe um there you Alex: know all of a sudden there might be like a huge variance in in throughput from Alex: one of the providers and you're only able to detect that if like some other Alex: users have discovered it before you and so we route around the provider that's Alex: like running kind of slow in Europe and send you, Alex: if your data policies allow it,
Alex: to a much faster provider somewhere else. Alex: And that allows you to get faster performance. So, like, um... Alex: That's, like, on the provider level, how, like, numbers help. Alex: On the, like, model selection level, like, what you see on this rankings page Alex: here, power users will, like, when we put up a model, like, we put up a new Alex: model today from a new model lab called ZAI, Alex: like, the power users instantly discover it.
Alex: We have this LLM enthusiast community that dives in and really figures out what Alex: a model is good for along a bunch of core use cases. Alex: The power users figure out which workloads are interesting, and then you can Alex: just see in the data what they're doing. And everybody can benefit from it.
¶ Benchmarking and Performance Metrics
Alex: That's why we open up our data and share it for free on the rankings page here. Ejaaz: I'm seeing this one consistent unit across all these rankings, Ejaaz: Alex, which is tokens, right? Ejaaz: And Josh and I have spoken about this on the show before, but I'm wondering Ejaaz: how, like you've chosen this specific unit to measure how good or effective Ejaaz: these models are or how consumed or used they are.
Ejaaz: Can you tell us a bit more as to why you picked this particular unit and what Ejaaz: that tells you as like the open router platform as to how a user is using a particular model? Alex: Yeah, I think dollars is a good metric too. Alex: The reason we chose tokens is primarily because we were seeing prices come down really quickly. Alex: Open Router has been around since the beginning of 2023.
Alex: And I didn't want a model to be penalized in the rankings just because the prices Alex: are going down really dramatically now like there's a, Alex: There's a paradox called Jevons paradox, which is that when prices decrease like 10x, Alex: users' use of some component of infrastructure increases by more than 10x. Alex: And so maybe they didn't get 10x at all. Alex: But I thought there were some other advantages to using tokens,
Alex: too. Tokens don't have this penalty and don't rely on Jevon's Paradox, Alex: which can have a lot of lag. Alex: They also are a little bit of a proxy for time. Alex: A model that is generating a lot of tokens and doing so for a while across a lot of users. Alex: It means that a lot of people are reading those tokens and actually doing something with them.
Alex: And same goes for input. But if I really want to send an enormous number of Alex: documents and the model has a really, really, really tiny prompt pricing, Alex: I think that's still valuable and something that we want to see. Alex: We want to see that this model is processing an enormous number of documents. Alex: That's a use case that should show up in the rankings.
¶ The Importance of Token Metrics
Alex: And so we decided to go with tokens. We might like add dollars in the future, Alex: but I think tokens are, you know, they don't have this like Jevons Paradox lag. Alex: And there wasn't anything else. Like nobody was doing any kind of like overall analytics. Alex: We didn't see any other company even do it until Google did a few months ago Alex: where they started publishing the total amount of tokens processed by Gemini. Alex: So we'll see which use cases really need dollars.
Alex: But tokens have been holding up pretty well. Ejaaz: Yeah, I mean, this dashboard is awesome. And I recommend anyone that's listening Ejaaz: to this that can't see our screen to get on OpenRouter's website and check it out. Ejaaz: I've been following it for the last two weeks kind of pretty rigorously, Alex. Ejaaz: And what I love is you can literally see...
Ejaaz: So two weeks ago Grok 4 got released right Ejaaz: and Josh and I were making a ton of videos on this we were Ejaaz: using it with pretty much everything that we could do and Ejaaz: then this other model came out of China pretty much a few days after called Ejaaz: Kimi K2 and I was like oh yeah whatever this is just some random Chinese model Ejaaz: I'm not going to focus on it and then I kept seeing it in my feed and I thought
Ejaaz: okay maybe I'll give this a go and I kind of like went straight to open rather than just Ejaaz: almost gauge the interest from a wider set of AI users. And I saw that it was skyrocketing, right? Ejaaz: And then I saw that Quen dropped their models last week. Ejaaz: And again, I came to Open Router and it preceded the trend, right?
Ejaaz: People had already started using it. So I love how you describe Open Router Ejaaz: as this kind of like prophetic orb, Ejaaz: basically, where the enthusiasts and the community itself can kind of like front Ejaaz: run very popular trends. And I think that's a very powerful moat. Ejaaz: And kind of on this path, Alex, I noticed that a lot of these major model providers Ejaaz: see the value in this, right?
Ejaaz: So if I'm not mistaken, OpenAI kind of like used your platform to kind of secretly Ejaaz: launch their Frontier model before they officially launched it, right? Ejaaz: Can you walk us through, you know, how that comes about and more importantly, Ejaaz: why they want to do that and why they chose OpenRoddy to do that?
Alex: Uh open ai will sometimes Alex: give uh early access Alex: to their to models to some of their customers for Alex: testing and we asked them if they Alex: wanted to try a stealth model with us which we had never done before um it involved Alex: like launching it as under another name and seeing how users respond to it without Alex: having any bias or sort of inclination for against the model at the onset. Alex: And it would be like a new way of testing it and a new way of...
Alex: It was like an experiment for both us and them. Alex: And they generously decided to take the leap of faith and try it. And we...
Alex: Launched gpt 4.1 with Alex: them at and we called it quasar alpha and Alex: it was a million uh Alex: token context length model opening us first very Alex: very long context model and it was also optimized Alex: for coding and the incredible Alex: there were a couple incredible things that happened first Alex: we have this community uh of benchmarkers Alex: that run open source benchmarks and we give Alex: a lot of them grants to help fund the benchmarks
Alex: grants of open router tokens they'll just run the Alex: suite of tests against all the models and some of them are very creative like Alex: there's one that tests uh like the ability to generate fiction there's one that Alex: tests um like how like whether it can make a 3d object project in Minecraft called MCBench. Alex: There are a few that test different types of coding proficiency.
Alex: There's one that just focuses on how good it is at Ruby, because Ruby is, Alex: turns out a lot of the models are not great at Ruby. Alex: There are a lot of like languages that all the models are pretty bad at. Alex: And so we have this like long tail of very niche benchmarks, Alex: And all the benchmarkers ran, you know, for free their benchmarks on Quasar Alex: Alpha and found pretty incredible results for most of them.
Alex: And so the model got like, you know, OpenAI got this feedback in real time. Alex: We kind of like helped them find it.
¶ Collaborations with Major AI Players
Alex: And they made another snapshot, which we launched as Optimus Alpha. Alex: And they could compare the feedback that they got from the two snapshots. Alex: Um, and, and then they, and then like two weeks later, they launched GPT 4.1 live for everybody. Alex: So it was like, uh, uh, was it an experiment for us? Alex: And, and we've done it, um, again since, uh, with, uh, another model provider Alex: that, uh, that's still working on it.
Alex: Um, and it, and it's kind of like a cool way of learning of like crowdsourcing, Alex: uh, benchmarks that you wouldn't have expected. and also getting unbiased community sentiment. Josh: That's great. So now when we see a new model pop up and we want to test GPT-5, Josh: we know where to come to to try it early. Josh: We'll see because rumor is it's coming soon. So we'll be, we're on your watch list.
¶ Open Source vs. Closed Source Models
Josh: But having, I do want to ask you about open source versus closed source because Josh: this has been an important thing for us. We talk about this a lot. Josh: You have a ton of data on this. Josh: I'm looking at the leaderboards there. There are open source models that are Josh: doing very well, closed source. Josh: What are your takes in general? How do you feel about open source versus closed Josh: source models, particularly around how you serve them to users?
Alex: Both models, both types of models have supply problems, but the supply problems are very different. Alex: Typically, what we see with closed source models is that there's there's very Alex: few suppliers, usually just one or two. Alex: Like with Grok, for example, there's Grok Direct and there's Azure.
Alex: Um with anthropic there's anthropic direct there's google vertex there's aws Alex: bedrock um and then we also like deploy it in different regions like we have Alex: an eu deployment um for customers who'd like only want their data like to stay in the eu, Alex: and uh and we do custom deployments for Alex: the for the closed source models too to just kind of guarantee good Alex: throughput high and high rate limits for people um Alex: but uh the
Alex: like a tricky part is that like the the demand usually the like the closed source Alex: malls are doing most of the tokens on open router um it's it's dominant you Alex: know it's probably 80-ish 70 to 80 percent closed source tokens today. Alex: But the open source models have a much more fragmented supply, like cell supply.
Alex: Side order book um and and like Alex: the rate limits for each provider is Alex: like a like less stable on average um it Alex: usually takes a while for the hyperscalers to serve a Alex: new closed source a new open source model um so we so the load balancing work Alex: that we do on um open source models tends to be a lot more valuable the load Alex: balancing work that we do for closed source models tends to be very focused Alex: on caching and feature awareness,
Alex: making sure you're getting clean cache hits and only transitioning over to new Alex: providers when your cache is expired. Alex: For open source models, there's way less caching. Very, very few open source Alex: models implement caching.
Alex: And so switching between providers becomes more common. and Alex: uh like we we also track a Alex: lot of quality differences between the the open Alex: source providers some of them will deploy at lower Alex: quantization levels which means like it's kind of like a way of compressing Alex: the model um generally doesn't have an impact on the quality of the output uh Alex: but and yet we still see some odd things from some of the open source providers.
Alex: And so we run tests internally to detect those outputs. And we're building up Alex: a lot more muscle here soon. Alex: So that like, they get pulled out of the routing lane and don't affect anyone. Josh: So closed source accounts for 80% or something like that, a very large amount. Josh: Do you see that changing? Josh: Because that post we just had, it's at nine out of the 10 fastest growing LLMs Josh: last week, they were open source.
Josh: And every time it seems like China comes out with another model, Josh: it was Kimmy K2 a week or two ago, it kind of really pushes the frontier of open source forward. Josh: And the rate of acceleration of open source seems to be as fast, Josh: if not faster than closed source, where it's just, it's making these improvements very quickly. Josh: It has the benefit of being able to compound in speed because it's open source Josh: and everyone can contribute.
Josh: Do you think that starts to change where the percentage of tokens you're issuing Josh: are from open source models versus closed source? Josh: Or do you continue to see a trend where it's going to be Google, Josh: it's going to be OpenAI that are serving a majority of these tokens to users?
¶ Future Trends in Model Adoption
Alex: In the short term, we're likely to see open source models continue to dominate Alex: the fastest growing model category on OpenRouter.
Alex: And the reason for that is that a lot of users who come for a closed source Alex: model, but then decide they want to optimize later, Alex: either they want to save on costs or try out a new model that's supposed to Alex: be a little bit better in some direction that their app cares about or their use case cares about, Alex: then they leave the closed source model and go to an open source model.
Alex: So open source tends to be like a last mile optimization thing, Alex: making a big generalization because the reverse can happen too. Alex: And so because it's a last mile optimization thing, Alex: the jump from this model is not being used at all to this model is really being Alex: used by a couple of people who have Alex: left Claude 4 and want to try some new coding use case will be bigger.
Alex: Than the closed-source models, which start at a really high base and don't have Alex: growth quite as dramatic. Alex: So the other part of your question, though, was whether there's going to be like a flippening of. Josh: Close or some sort of like chipping it away at that monopoly of close source tokens.
Alex: It's hard to predict these things because, you know, Alex: I think like the the biggest problem today with open source models is that the Alex: incentives are not as strong like the model lab and the model provider.
Alex: Um they've you know they're they're Alex: sort of established incentives for how to Alex: grow as a company and attract good high quality um ai talent and um and giving Alex: the model weights away impairs those incentives now like we might see yeah this Alex: is where we might see like decentralized providers, Alex: helping in the future.
Alex: A way for like, Alex: uh you know like a really good incentive scheme that Alex: like allows high quality talent Alex: to work on an open source model um Alex: that remains open weights at least uh like could fix this i like i you know Alex: i stay pretty i try to stay close to the decentralized providers um and like Alex: learn a lot from them there's some like cool on the provider side on like on
Alex: running inference i I think there's some really cool incentive schemes being worked on. Alex: But on actually developing the models themselves, I haven't seen too much, unfortunately. Alex: So I think if we see one, flipping in the radar. And until we do, I personally doubt it. Josh: TBD, do you have personal takes on how you feel about open source versus closed source?
Josh: Because this has been a huge topic we've been debating too. It's just the ethical Josh: concerns around alignment and closed source models versus open source. Josh: When you look at the competitors, China, generally speaking, Josh: is associated with open source, whereas the United States is generally associated with closed source.
Josh: And we saw Llama and Meta release the open source models, but now they're raising Josh: a ton of money to pay a lot of employees a lot of money to probably develop a closed source model. Josh: So it seems like the trends are kind of split between US and China.
Josh: And I'm curious if you have any personal takes, even outside of OpenRouter, Josh: of which you think serves better for the long term outlook on, Josh: I mean, the position of the United States or just the general safety and alignment Josh: conversation around AI?
¶ The Role of Innovation in AI
Alex: I mean, like a very simple fundamental difference between the two is that an Alex: innovation in open source models can be copied more quickly than an innovation Alex: in closed source models. Alex: So in terms of velocity and like how far ahead one is over the other, Alex: that is like a massive structural difference. Alex: That means that closed source models should be theoretically always ahead until Alex: a really interesting incentive scheme develops, like I mentioned before.
Alex: Uh, I think, and I think that's, you know, I don't see like evidence that that's Alex: going to change in terms of China versus the U S. Alex: Um, it's, I think it's very interesting that China has not had like a major closed source model.
Alex: Um and i don't really Alex: see a great reason why i'm Alex: not aware of any reasons that's not that's not going Alex: to be going to be the case in the future um my prediction Alex: is that there's going to be a closed source model from china um Alex: and uh you know if uh uh you know if like it's possible that DeepSeas and Moonshot Alex: and Gwen have built up really sticky talent pools.
Alex: But generally with talent pools, after enough years have passed, Alex: people quit and go and create new companies and build new talent pools. Alex: And so we should see some of that. It's not the case that the AI space has NDAs Alex: or non-competes that the hedge fund space has. Alex: That might happen in the future too. But assuming that the current non-compete Alex: culture continues, there should be more companies that pop up in China over time.
Alex: And I'm betting that some of them will be closed source. Alex: And my guess is that the two nations will start to look more similar. Ejaaz: Yeah, I guess that's why you have Zuck dishing out 300 mil to a billion dollar Ejaaz: salary offers to a bunch of these guys, right? Ejaaz: One more question on China versus the US. I kind of agree with you. Ejaaz: I didn't really expect China to be the one to lead open source anything, Ejaaz: let alone the most important technology of our time.
Ejaaz: Do you think is their secret source to building these models, Alex?
Ejaaz: And I know this might be out of the forte of Ejaaz: open router specifically but as someone who has studied this technology for Ejaaz: a while now i'm struggling to figure out you know what advantage they had you Ejaaz: know they're discovering all these new techniques and maybe the simple answer Ejaaz: is like constraints right they don't have access to all of Ejaaz: nvidia's chips they don't have access to infinite compute so then maybe they're
Ejaaz: forced to kind of like figure out other ways around the same kinds of problems Ejaaz: that western companies are focused on But it's pretty clear that America, with all its funding, Ejaaz: hasn't been able to make these frontier breakthroughs.
¶ Comparing Global AI Talent
Ejaaz: So I'm curious whether you are aware of or know some kind of technical moat Ejaaz: that Chinese AI researchers or these AI teams that are featuring on Open Rata Ejaaz: day in and day out have over the U.S.? Alex: Well, I don't know. Alex: There are certainly some that they've come up with that like DeepSeek had a Alex: lot of very cool inference innovations that they published in their paper.
Alex: But a lot of what they published in the original R1 paper were things that OpenAI Alex: had done independently themselves many months before.
Alex: So uh i like Alex: on the inference side and on Alex: uh some of the model side i think like deep seek we we Alex: had talked to their team for years before r1 came Alex: out they had many models before that and Alex: they were always like a pretty sharp optimum like Alex: team for doing inference um like they Alex: came up with like the best user experience for caching prompts Alex: long before deep cpr1 came out and they had very good pricing um they uh they
Alex: were just they were like you know by far the the strongest chinese team um that Alex: we were aware of uh well before that happened and so i'm guessing there was like some talent.
Alex: Uh accumulation that they were working on in china Alex: for people who wanted to stay in china and yeah that's Alex: that's a huge advantage like american companies are obviously not Alex: doing that there's a duck is very on Alex: point that a lot of this is just based on talent Alex: um there are a lot of Alex: ai is open and out there and just like and Alex: very composable like a big tree of knowledge Alex: there's a paper that comes out and it cites like
Alex: 20 other papers and you can go and read all Alex: of the cited papers and then you like have kind of Alex: a basis for understanding the paper but you really have to Alex: go one level deeper and read all the cited papers two levels Alex: down to really understand what's going on and it's.
Alex: Just that no very few people can do that um and Alex: it takes like a lot of years of experience to like actually Alex: apply that knowledge and learn all these Alex: things that have not been written in any paper at all and uh Alex: and there's just there's just such such it Alex: like a small number of people um who can Alex: really lead research on all the different dimensions that Alex: go on to making a model and uh um and Alex: and like the the border between china and the u.s is
Alex: is pretty defined you have to leave china move to the u.s Alex: and really establish yourself here um so Alex: i do think there's like country arbitrage there's like Alex: there's you know the head the hedge fund background arbitrage there's uh there's Alex: there's hardware arbitrage like there's like a ton of hardware that's only available Alex: in china but not here vice versa that creates an opportunity um and this this Alex: will just continue to happen.
Ejaaz: Yeah, I think this arbitrage is fascinating. Ejaaz: I read somewhere that there's probably less than 200 or 250 researchers in the Ejaaz: world that are worthy of working at some of these frontier AI model labs. Ejaaz: And I looked into some of the backgrounds of the team behind Kimi K2, Ejaaz: which is this recent open source model out of China, which broke all these crazy rankings. Ejaaz: I think it was like a trillion parameter model or something crazy like that.
Ejaaz: And a lot of them worked at some of the top American tech companies. Ejaaz: And they all graduated from this one university in China. Ejaaz: I think it's Tsinghua, which apparently is like, you know, the Harvard of AI Ejaaz: in China, right? So pretty crazy. Ejaaz: But Alex, I wanted to shift the focus of the conversation to a point that you Ejaaz: brought up earlier in this episode, which is around data.
¶ Data Utilization Strategies
Ejaaz: Okay, so here's the context that like Josh and I have spoken about this at length, right? Ejaaz: We are obsessed with this feature on OpenAI, which is memory, right? Ejaaz: And I know a lot of the other memory, sorry, a lot of the other AI models have memory as well. Ejaaz: But the reason why we love it so much is I feel like the model knows me, Alex. Ejaaz: I feel like it knows everything about me. It can personally curate any of my prompt.
Ejaaz: It just gets me. It knows what I want and it just serves up to me in a platter Ejaaz: and off I go, you know, doing my thing. Ejaaz: Now, Open Router sits on top of like kind of like the query layer, right? Ejaaz: So you have all these people writing all these weird and wonderful prompts and Ejaaz: kind of routing it through on towards like different AI models. Ejaaz: You hold all of that data or maybe you have access to all of that data.
Ejaaz: And I know you have something called private chat as well, where you don't have access to it. Ejaaz: Talk to me about like what OpenRouter and what you guys are thinking about doing Ejaaz: with this data, because presumably, Ejaaz: or in my opinion, you guys have actually the best mode, arguably better than Ejaaz: ChatGPT, because you have all these different types of prompts coming from all Ejaaz: these different types of users for all these different types of models.
Ejaaz: So theoretically, you could spin up some of the most personal AI models for Ejaaz: each individual user if you wanted to. Ejaaz: Do I have that correct? Or am I, you know, speaking crazy? Alex: No, that's true. No, it's something we're thinking about. Alex: By default, your prompts are not logged at all. Alex: We don't have prompts or completions for new users by default. Alex: You have to toggle it on in settings.
Alex: But the result, a lot of people do toggle it on. And as a result, Alex: I think we have by far the largest multi-model prompt data set. Alex: Uh, but what we've done today, we've barely done anything with it. Alex: We classify a tiny, tiny, tiny subset of it. And that's what you see in the rankings page. Alex: Um, but, uh, what it could be done on like a per account level is really, Alex: um, like three main things.
Alex: One memory right out of the box. You can, you can get this today by like combining Alex: open router with like a memory as a service. We've got a couple of companies Alex: that do this, like Memzero and SuperMemory. Alex: And we can partner with one of those companies or do something similar and just Alex: provide a lot of distribution.
Alex: And that basically gets you a chat GPT as a service where it feels like the Alex: model really knows you and the right context gets added to your prompt. Alex: The other things that we can do are help you select the right model more intelligently. Alex: There's a lot of models where there's like a super clear, like migration decision that needs to be made. Alex: And, and we can just see this very clearly in the data.
Alex: But we right now we just like, you know, we have like a channel or like some Alex: kind of communication channel open with the customer, we can just tell them Alex: like, hey, and we know you're using this model a ton.
Alex: It's been deprecated. This model is significantly better. you Alex: should move this kind of workload over to it or like Alex: this workload you'll get way better pricing if you do this um Alex: and and that's basically like that's the Alex: only sort of guidance and kind of like Alex: opinionated routing we've done so far and it could Alex: be a lot more intelligent a lot more out of the box a lot more Alex: built into the product um and then Alex: the the last thing
Alex: we can do i mean there's there's probably tons of Alex: things we're not even thinking about um but Alex: like getting really Alex: really smart about how Alex: models and providers are responding to prompts and Alex: uh showing you just the really coolest Alex: data just like telling you Alex: what kinds of of prompts um are Alex: going to which models and how those models are replying and Alex: just like characterizing the reply in all kinds of interesting ways
Alex: like did the model refuse to answer what's the refusal rate Alex: did the model um did the.
Alex: Model like successfully make a tool call or did it decide to Alex: ignore all the tools that you passed in that's a huge one Alex: um did the model like pay Alex: attention to its context did uh you know did what did did some kind of truncation Alex: happening happen before you sent it to the model So there's all kinds of like Alex: edge cases that cause developers apps to just get dumber and they're all detectable.
Ejaaz: I'm so happy you said that because I have this kind of like hot take, Ejaaz: but maybe not so hot take, which is I actually think all the Frontier models Ejaaz: right now are good enough to do the craziest stuff ever for each user. Ejaaz: But we just haven't been able to unlock it because it just doesn't have the context.
Ejaaz: Sure, you can attach it to a bunch of different tools and stuff, Ejaaz: but if it doesn't know when to use the tool or how to process a certain prompt Ejaaz: or if the users themselves don't know how to read Ejaaz: the output of the AI model themselves, like you just said, we need some kind Ejaaz: of analytics into all of this, Ejaaz: then we're just kind of walking around like headless chickens almost.
Ejaaz: So I'm really happy that you said that. One other thing that I wanted to get Ejaaz: your take on on the data side of things is, I just think this whole concept Ejaaz: or notion of AI agents is becoming such a big trend, Alex. Ejaaz: And I noticed a lot of Frontier Model Labs release new models that kind of spin Ejaaz: up several instances of their AI model. Ejaaz: And they're tasked with a specific role, right?
Ejaaz: Okay, you're going to do the research. You're going to do the orchestrating. Ejaaz: You're going to look online via a browser, blah, blah, blah, Ejaaz: blah, blah. And then they coalesce together at the end of that little search Ejaaz: and refine their answer and then present it to someone, right? Ejaaz: You know, Grok4 does this, Claude does this, and a few other models.
Ejaaz: I feel like with this data that you're describing, OpenRouter could be or could Ejaaz: offer that as a feature, right? Ejaaz: Which is essentially, you can now have super intuitive, context-rich agents Ejaaz: that can do a lot more than just talk to you or answer your prompts. Ejaaz: But they could probably do a bunch of other actions for you. Ejaaz: Is that a fair take, or is that something that maybe might be out of the realm of open router?
¶ Future of AI Agents
Alex: Our strategy is to be the best inference layer for agents. Alex: And what I think developers want... Alex: Is control over how their agents work. Alex: And our developers at least want to use us as a single pane of glass for doing Alex: inference, but they want to see and control the way an agent looks. Alex: An agent is basically just something Alex: that is doing inference in a loop and controlling the direction it goes.
Alex: So um what what Alex: we want to do is just like build incredible docs Alex: really good primitives that make that easy Alex: to do so that you know like i think like Alex: a lot of our developers are just people building agents and so Alex: what they want is they want the primitives to Alex: be solved so that they can just keep creating new Alex: versions and new ideas um without worrying Alex: about like you know re-implementing tool calling over Alex: and over again and um and and
Alex: and so like at least for this is like a it's it's Alex: a tough problem given how many models there's like a new model or provider every Alex: day and uh and people actually want them and use them so uh to standardize this Alex: like make make these tools like really dependable um that's kind of like where Alex: we want to focus and uh so that like agent developers don't have to worry about it.
Josh: As we level up towards closer and closer to getting to AGI beyond, Josh: I'm curious what Open Router's kind of endgame is. Josh: If you have one, what is the master plan where you hope to end up? Josh: Because the assumption is as these systems get more intelligent, Josh: as they're able to kind of make their own decisions and choose their own tool Josh: sets, what role does Open Router play in continuing to route that data through?
Josh: Do you have a kind of master plan, a grand vision of where you see this all heading to? Alex: You're saying like as agents get better at choosing the tools that they use Alex: what what becomes our role when like the agents are really good at that yes.
Josh: Yes and like where do you see open router fitting into the picture and what Josh: would be the best case scenario for this this future of open router Alex: Right now open routers bring your own tool, Alex: platform um we don't have like a Alex: marketplace of mcps yet uh and Alex: and i i do think like a lot of the i think most of the most used tools will Alex: be ones that developers configure themselves agents just work like they're given
Alex: access to it like i think like a holy grail for for open router is that.
Alex: The the ecosystem is going to like basically my Alex: prediction for how the ecosystem is going to evolve is that um Alex: all the models are going to be adding state and Alex: other kinds of stickiness that just make you want to stick Alex: with them so they're going to add server-side tool calls Alex: they're going to add like um you know web search that that is stateful they're Alex: going to add memory They're going to add all kinds of things that try to prevent
Alex: developers from leaving and increase lock-in. Alex: And OpenRouter is doing the opposite. Alex: We want developers to not feel vendor lock-in. Alex: We want them to feel like they have choice and they can use the best intelligence, Alex: even if they didn't before. Alex: It's never too late to switch to a more intelligent model. That would be like, Alex: you know, a good always on outcome for us.
Alex: And so what I think we'll end up doing is, is like partnering with other companies Alex: or building the tools ourselves if we have to, so that developers don't feel stuck.
¶ OpenRouter's Vision for the Future
Alex: That's how I, you know, there's a lot of ways the ecosystem could evolve, Alex: but that's how I would put it in a nutshell. Josh: Okay, now there's another personal question that I was really curious about, Josh: because I was also right there with you in the crypto cycle when NFTs got absolutely Josh: huge, was a big user of OpenSea. Josh: And it was kind of this trend that went up and then went down.
Josh: And NFTs kind of fizzled out, it wasn't as hot anymore, and AI kind of took the wind from the sails. Josh: And it's a completely separate audience, but a similar thing where now it's Josh: the hottest thing in the world.
Josh: And i'm curious how you see the trend continuing is this a cyclical thing that Josh: has ups and downs or is this a one-way trajectory of more tokens every day more Josh: ai every day is do you see it being a cyclical thing or is this a a one-way Josh: trend towards up into the right nfts Alex: Kind of follow uh crypto in a, Alex: indirect way um when crypto Alex: has ups and downs nfts generally lag a bit Alex: but they they have similar ups and downs and um
Alex: and crypto is an extremely long-term play on like building a new financial system Alex: and there are so many reasons that it's not going to happen overnight um and And they're like, Alex: it's very, very entrenched reasons. Alex: Whereas AI, there are some overnight business transformations going on. Alex: And the reason AI, I think, moves a lot, one of the reasons that AI moves a Alex: lot faster is it's just about making computers behave more like humans.
Alex: So if a company already works with a bunch of humans, then there's, Alex: you know, there's some engineering that needs to be done.
Alex: There's some like thinking about how Alex: to like scale this but Alex: but in general i think that it's not like Alex: after seeing what can be possible um inference Alex: will be the fastest growing operating expense for all companies Alex: it'll it'll be like oh we can just hire Alex: high-performing employees at a click of a Alex: and they they work 24 7 they Alex: scale elastically it's like you know Alex: it it's not that hard it's not like huge mental
Alex: model shift it's just like a huge upgrade to the way companies work today um Alex: in most cases so it's just completely different from crypto there's there's Alex: like other than both being you know than nfts i mean other than both being new Alex: they're fundamentally very different changes.
¶ Trends in AI and NFTs
Ejaaz: You're probably one of very few people in the world right now that has crazy Ejaaz: insights to every single AI model. Ejaaz: Definitely more than the average user, right? Like I have like three or four Ejaaz: subscriptions right now and I think I'm a hotshot. Ejaaz: You get access to like 400 and what is it? 57 models right now on OpenRata.
Ejaaz: So an obvious question that I have for you is Ejaaz: I'm not going to say in the next couple of years, because everything moves way Ejaaz: too quickly in this sector. Ejaaz: But over the next six months, is there anything really obvious to you that should Ejaaz: be focused on within the AI sector? Ejaaz: Maybe it's like the way that certain models should be designed, Ejaaz: or perhaps it's at the application layer that no one's talking about right now.
Ejaaz: Because going on from our earlier part of the conversation, you just pick these Ejaaz: trends out really early. and I'm wondering if you see anything. Ejaaz: It doesn't have to be open-racket related. It could just be AI related. Alex: I've seen the models trending towards caring more about how resourceful they Alex: are than what knowledge they have in the bank.
Alex: Not all of, I feel like a lot of the applications, I think the model labs maybe, Alex: a lot of them, I don't know how many of them really deeply believe that, Alex: but a couple of them uh talk about it and i don't think it's really hit the Alex: application space yet um because people will will ask chat gpt things and if Alex: the knowledge is wrong they think the model is stupid, Alex: and that's just kind of a bad way of evaluating a model um
Alex: like whatever knowledge a person has whatever Alex: a person like where calls happen at a certain time like Alex: does not it's not a proxy for how smart they are um Alex: like the the intelligence and usefulness of a model Alex: is going to trend towards how good it is at using tools and Alex: uh and and how good it is at like paying Alex: attention to its context of a long long long long context and so it's like it's
Alex: it's total memory capacity and accuracy um so i think those two things need Alex: to be like emphasized more um the.
Alex: Like it might be that that models pull all Alex: of their knowledge from like online databases Alex: from like real-time uh scraped Alex: index indices of the web along with a Alex: ton of real-time updating data sources um and Alex: they're never they're always kind of like relying on some some sort of database Alex: for knowledge but relying on their reasoning process for for tool calling you Alex: know like we we put it We spend probably the plurality of our time every week
Alex: on tool calling and figuring out how to make it work really well. Alex: Humans, the big difference between us and animals is that we're tool users and tool builders. Alex: And that's where human acceleration and innovation has happened. Alex: So how do we get models creating tools and using tools very, Alex: very effectively? there's very little, Alex: There are very few benchmarks. There's very little priority. Alex: There's the Tau Bench for measuring how good a model is at tool calling.
Alex: But there's, and there's like maybe a few others. Alex: There's Swee Bench for measuring how good a model is at multi-turn programming tasks. Alex: It's very, very hard to run, though. It costs like, you know, Alex: for Sonnet, it could cost like $1,000 to run it. Alex: And it's like the user experience for kind of like evaluating the real intelligence Alex: of these models is not good.
Alex: And so like I love, as much as we don't have benchmarks listed on OpenRouter Alex: today, I love benchmarks. Alex: And I think like the app ecosystem and like developer ecosystem should spend Alex: a lot more time making very cool and interesting ones. Alex: Also, we will give credit grants for all the best ones. So I highly encourage it. Ejaaz: Well, Alex, thank you for your time today. I think we're coming up on a close Ejaaz: now. That was a fascinating conversation, man.
Ejaaz: And I think your entire journey from just non-AI stuff, so OpenSea all the way Ejaaz: to OpenRouter has just been a great indicator of where these technologies are Ejaaz: progressing and more importantly, where we're going to end up. Ejaaz: I'm incredibly excited to see where OpenRatter goes beyond just prompt routing. Ejaaz: I think some of the stuff you spoke about on the data side of things is going
Ejaaz: to be fascinating and arguably one of your bigger features. So I'm excited for future releases. Ejaaz: And as Josh said earlier, if GPT-5 is releasing through your platform first, Ejaaz: please give us some credits. We would love to use it. Ejaaz: But for the listeners of this show, as you know, we're trying to bring on the Ejaaz: most interesting people to chat about AI and Frontier Tech. We hope you enjoyed this episode.
Ejaaz: And as always, please like, subscribe, and share it with any of your friends Ejaaz: who would find this interesting. And we'll see you on the next one. Thanks, folks.
