Building 2 Iconic OSSs Back-to-Back | Maxime Beauchemin (Airflow, Preset)

May 21, 2024 · 59 min · Ep. 38

Episode description

If you’ve worked on data problems, you probably have heard of Airflow and Superset, two powerful tools that have cemented their place in the data ecosystem. Building successful open-source software is no easy feat, and even fewer engineers have done this back to back. In part 2 of the conversation, we talk about Max’s journey in open source.

Segments:

   (00:03:27) “Project-Community Fit” in Open Source
   (00:08:31) Fostering Relationships in Open Source
   (00:10:58) Dealing with Trolls
   (00:13:40) Attributes of Good Open Source Contributors
   (00:20:01) How to Get Started with Contributing
   (00:27:58) Origin Stories of Airflow and Superset
   (00:33:27) Biggest Surprise since Founding a VC-backed Company?
   (00:38:47) Picking What to Work On
   (00:41:46) Advice to Engineers for Building the Next Airflow/Superset?
   (00:42:35) The 2 New Open Source Projects that Max is Starting
   (00:52:10) Challenges of Being a Founder
   (00:57:38) Open Sourcing Ideas

Show Notes:

Part 1 of our conversation: https://softwaremisadventures.com/p/maxime-beauchemin-llm-ready
Max on LinkedIn: https://www.linkedin.com/in/maximebeauchemin/
SQL All Stars: https://github.com/preset-io/allstars
Governator: https://github.com/mistercrunch/governator

Stay in touch:

👋 Make Ronak’s day by leaving us a review and let us know who we should talk to next! [email protected]

Transcript

What do you think are the attributes that make a really good open source contributor? One thing that I think is extremely undervalued in software engineering in general, and in technical positions, is just code orientation: just being able to find your way in a large code base. Because you end up on these messy big repos that are layered with stuff, right? You join a company like Facebook, there's two big monorepos with a hundred different ways of doing things.

And thousands of microservices. Similarly, you want to contribute to Airflow, Superset, or really any open source project out there. Maybe you want to change the color of a button: where is that button? Or you want to add a field to a dashboard, a common data engineering problem that people are interested in. We get this data from HubSpot, and there's a dashboard, and then we've got to carry this new custom property all the way through the pipeline to the end.

What is that pipeline? Where is this dashboard pointing to? So I think code orientation, being able to find your way in a large repository and figure out what to do, and how to decipher everything that's been done before and why, is extremely key. For maybe someone starting, are there easier projects to start with? Maybe one that's not in Rust.

Yeah, well, so I think it's all relative to where you're starting from, but I think the purpose is what's most important. You pip install a library. It doesn't quite do what you need it to do, or you find a bug, or it'd be nice if this method existed. To go and start with just scratching your itch in someone else's repository is a really nice way to get involved, because it will force this little bit of exploration I was talking about, developing the code navigation skill.

Okay, I pip installed this library, I'm reading the documentation, the method I need is not there, the documentation is not clear on something. Let me get to the bottom of this. Then you have to get that repo and figure out your orientation in that repo, clone it, contribute something, and interact with someone. And I don't know, maybe eventually make a friend along the way. I'm sure there were a lot of trolls through Airflow and Preset. How did you go about handling problem children?

Welcome to the Software Misadventures podcast. We are your hosts, Ronak and Guang. As engineers, we are interested in not just the technologies, but the people and the stories behind them. So on this show, we try to scratch our own itch by sitting down with engineers, founders, and investors to chat about their path, lessons they've learned, and of course, the misadventures along the way.

Talking about open source projects, you've now successfully built a lot of open source projects. I mean, at least as far as Superset and Airflow are concerned, they've become more or less the de facto choices in their own domains. I know a lot of companies using Airflow for workflow management, for example. What used to be Azkaban back in, I think, 2011, '12 has been completely replaced. I was going to say Luigi, but... Are you dating yourself there? Yeah, I know.

So what I meant to say is, you started these open source projects, which have become super successful, and you've done it more than once. What are some of the common ingredients there? Yeah, I mean, it's a tough question. I think there's a lot of parallels to be drawn with, you know, what I call project-community fit, which is like product-market fit. And product-market fit is, you know,

multi-dimensional, a very complicated thing. There's timing, for one. Let's say I start by talking about product-market fit, and then we can translate some of these learnings to project-community fit, the fit of an open source project. In PMF, I think timing is a huge thing. What is the market? How ripe is it?

How ripe is it to be disrupted, in a way? What's the minimum viable product? And what's the TAM of that market? How much can you put towards it? So at first, you clearly need something that there's a need for, and you need to address an unaddressed need somewhere out there. For Airflow, it's probably that there was no really good data orchestrator with, you know, data pipelines as code.

Though arguably there were some before. I think Oozie was XML, you mentioned Luigi and Azkaban, and then there's a whole generation of GUI-driven tools like Informatica and DataStage and other things.

But there's also, you know, kind of the founder fit, where I had been in data engineering for probably 10, 15 years at the time, or in the ancestor of data engineering, and then at a place that was pushing a big change in that area.

So I think I was the right person at the right time with the right ideas. Those stars, you know, you can't really make them align. They just align, because I had been in it and kind of sitting through some of that.

So I think there's some of that that you can't recreate. But then, once you have the project started in some form of MVP with a little bit of traction, I think you really grow the project one interaction at a time with people in the community, one issue, one PR at a time.

And then, yeah, you've just got to iterate like crazy and put passion and effort into it and build. If you want stuff to snowball, you need to be welcoming, to a certain point. So it's a pretty subtle exercise, I guess, right? In some ways, if you're too flexible, that could be a risk. People so often talk about the BDFL, the Benevolent Dictator For Life, in open source projects.

You need to be hard sometimes. You look at Linus Torvalds and the Linux project. He's sometimes been really very hard on people. It's like, this is a bad idea. We shouldn't do this.

Yeah, so I think I've always done it super respectfully. I think that's super important, but you need clear direction and leadership to keep the fluff and the nonsense out of the project, or at least to keep the mission and vision for the project, and the scope of the project, somewhat clear.

I think I could have done that better in the past in some of these projects too, where they became so, so big. Airflow is kind of everything, right? It does so much. Now maybe it's not as good at certain things because it's a little bit less focused. But in a lot of ways, it's proven at this point that it's valuable, because there are more than 100,000 organizations using, say, Airflow. So I think adoption speaks volumes too.

And in some ways there's a new generation of data engineering tools that, I think, offer more guarantees, but they put more constraints in your way. It's a trade-off. With more constraints, we can provide more guarantees. So maybe there's more you need to do to respect the framework.

And if you do, you get more value out of it. Maybe Airflow had the right level of constraints and guarantees at the time, being less data-driven: give me your tasks and I'll run your tasks. It's not necessarily "tell me everything about your data, define your schemas up front, and if I don't know your lineage, I can't run this stuff."

So it's like, okay, give me some tasks, I'll run them. For a lot of people, at the time they adopted it, it worked. And that stuff is extremely sticky. If there's no meltdown of the planet, there are going to be people running Airflow in 100 years, probably, somewhere.

It's the same way we have mainframes today. So, next. I'm very curious, for Airflow in the early days, once you hit that inflection point where you, being the main creator, can no longer oversee all the different aspects of it, where you do need to rely on community members, how do you go about picking the right people? And how do you foster that relationship?

Yeah, so I think fostering is just going one interaction at a time, right? So being welcoming, and I'd say spending more cycles with people in proportion to the impact they've had and their commitment. I've talked to so many people at so many companies.

And sometimes entire companies are like, "We want to put 12 people on this project, you need to work with us to enable us." And in the end, they never really get started. So basically I'm saying that the best way to decide where you're going to put energy is how much return you've gotten from putting energy into the individuals, organizations, and people you interact with.

So say you're a manager and someone on your team is super promising. Maybe you have multiple interns, and with one of them, every time you help them a little bit, it multiplies their impact and capacity. Spend more cycles with the people that make that support count. Yeah, that makes sense.

Though sometimes you have the opposite, where it's really natural to go and help the people struggling more, or to spend more time with the people that make the most noise, for instance, as opposed to the people that have the best track record.

So that's a thing there. But the fundamental reason why open source works so well is that it's a meritocracy, right? And you've really got to embrace that. So you define empowerment, and maybe how much you help and support people, based on their accumulated merit.

And of course you need to provide a way for people to go from zero to some amount of merit, right? But from that point, I think it's a good governance model for a lot of things, and definitely for software. Imagine meritocracy for some other issues, like a US government that wasn't working, where instead it's "we're going to try meritocracy from here." So we were chatting with Mitchell Hashimoto, of HashiCorp.

Exactly. I thought he had this really cool framework for dealing with trolls that he learned from working at the Apple Store, you know, this whole acronym of AFP. I thought it was quite cool. I'm sure there were a lot of trolls through Airflow and Preset. How did you go about handling the problem children?

I'm curious now to go back and listen to that episode and hear the framework. But I've seen a lot less negativity and deception in and around the communities I've been part of, and around software and GitHub,

than I ever thought I would encounter. If you had asked me, "Hey, you're going to interact with thousands of people, what percentage do you think are going to be a bad interaction?", I would have said, oh, there's so much stupidity in the world. You go out there in the world and you see so much friction and trolling and just negative emotion everywhere.

But in my professional life and my open source life, I've seen very few people that are very deceptive or trolling or just acting like a douche, right? We don't see a whole lot of that on repositories. There are definitely some instances of it, but they're few and far between. I think what you're saying, Max, is that data engineers are very nice people.

There is. That was very subtle, but nice. But even in Silicon Valley, if you work in Silicon Valley, at the companies I worked at it's always been educated, nice people. I think there's a handful of people where I could look back and say I've had numerous bad interactions with this individual. I think good organizations... I've been lucky to be at companies with really good culture,

with a good immune system around just not letting an asshole stick around for very long. What about, say, people with good intentions but poor execution?

Well, so that's more the danger, I would say. That's where I would say cut your losses. If someone opens a very large and ambitious PR and a little bit of guidance does not course correct them, then you should probably spend more time and attention on the PRs that are promising and the people that are learning faster.

So it goes with what I was saying before: spend your time where you can be helpful and where there is a good outcome without too much of your time, coaching, and support. I would say focus on the committers and contributors that have demonstrated so far that they're doing well.

Sometimes you might see from the first PR, the first draft, and you're like, oh, this is someone that knows what they're doing. Or from the first review, you can get a pretty good sense for it. And what do you think are the attributes that make a really good open source contributor?

Yeah, I think it's a lot of things. In some ways it's not that different from a good software engineer, right? One thing that might be an interesting thought for people, outside of everything that's already been said about the skills that make a good software engineer or data engineer: one thing that I think is extremely undervalued

in software engineering in general, and in technical positions, is just code orientation, just being able to find your way in a large code base. Because you end up on these messy big repos that are layered with stuff, right? You join a company like Facebook, there's two big monorepos with a hundred different ways of doing things and, you know, thousands of microservices.

Similarly, you want to contribute to Airflow, Superset, or really any open source project out there. Where do you start? Maybe you want to change the color of a button: where is that button, where do you find it? Or you want to add a field to a dashboard. That's a common data engineering problem that people are interested in. It's like, oh, we get this data from HubSpot, and there's a dashboard, and then we've got to carry this new custom property all the way through the pipeline

to the end. What is that pipeline? Where is this dashboard pointing to? So I think code orientation, being able to find your way in a large repository and figure out what to do, and how to decipher everything that's been done before and why, is extremely key. And that's stuff you only get through building context and spending time and cycles doing this in one repo. And a lot of this is transposable, right? The patterns you learn on one project or repository carry over when you get to the next one.

So it's like the navigation skills you got in some village or in some country: they do transpose when you get to a new village or a new country, in some capacity.

That's a nice analogy. For me, I was really glad that I left my first company because of exactly that. The first company I was at was super greenfield. I built a lot of the stuff that my team was working on, so I kind of knew where all the things were and how things were set up. And then when I left and went to the second company and saw the code, I was like, oh my gosh, why is there shit everywhere? I mean, mine was also pretty shit, but you know, it's very different. I don't like this.

So other than going to a new company, are there any recommendations, I guess, for engineers wanting to get better at this code orientation? Yeah, I mean, I think open source is always an outlet, right? Because there's all this code out there that you can go and get lost in, or try to find your way in. Then you have to find something worthy to work on, but that's usually pretty easy.

So, to some extent, sure, maybe venturing off is a good thing, and one way to venture off is to transplant yourself. There's no better way than moving to a new country to force yourself to learn a new culture, right? And to develop a lot of these skills that are really important in terms of orientation, to figure out how to make things work in a different context. But I would say, yeah, I think generally forcing yourself out of your comfort zone

is a good thing. You know, something that's interesting related to that: at Facebook I learned so much. Facebook for me was the big kickoff of my career. That's where I changed from old school to new school in some ways, and I felt like that's where I first got really empowered. But they were very limited to their own stack in a lot of ways. They did things their own way. And it's interesting, I say "they" now. I said "we" for so long.

But yeah, so I went to Airbnb, and that forced us into the open source ecosystem. At Facebook they have their own orchestration, their own build system, and there's a solution for everything. And all these names that you learn in that context, how the microservices work together, all of that is not that useful outside of Facebook.

Outside of Facebook you need to do that translation of, oh, Kubernetes is the clone of Tupperware, right? But it's so much more useful to know Kubernetes than Tupperware, because Tupperware has no use outside of Facebook.

So I think going to a more open place is a great thing. I mean, it's a great thing for everyone's career to use more open source, because everything you learn there is transposable to the rest of the world. And any work that you do in the open is also recognized and visible to everyone forever, as opposed to when you leave a proprietary company and no one knows, there's no track record

that's visible to anyone. Beyond that, all I can do is interview you or read your resume to figure out your extended reputation. So working on open source is always a great thing, you know, getting involved in different things.

I assume working in open source, depending on the project, is like playing a game where you're picking your difficulty level, depending on the culture, or how much infrastructure is around that project, or what the people are like. For maybe someone starting, okay, I guess you can't pick Airflow and Preset, but outside of those two, are there easier projects to start with? Maybe one that's not in Rust, you know.

Yeah, well, so I think it's all relative to where you're starting from, but I think the purpose is what's most important. If you're like, "I'm going to go and try to contribute, I'm going to pick a project that's cool, I'm going to pick Kubernetes and try to go contribute something," I think that's still cool, and it's good to do it. It's very good to get exposure to these big projects that have hundreds of contributors, and effective contributors.

But I would say scratching your own itch is really the way. So you pip install a library, and it doesn't quite do what you need it to do, or you find a bug, or it'd be nice if this method existed. To go and start with just scratching your itch in someone else's repository is a really nice way to get involved, because it will force this little bit of exploration I was talking about, developing the code navigation skill. Like, okay, what is this? Okay, I pip installed this library.

I'm reading the documentation, the method I need is not there, the documentation for the object that I'm using is not clear on something. Let me get to the bottom of this. Then you have to get that repo and figure out your orientation in that repo, clone it, contribute something, and interact with someone. And I don't know, maybe eventually make a friend along the way. And then maybe it's, oh, I'm going to start using this for more things and advocate for it and contribute more to it. So I would say, find... like, don't try to find an art...

Yeah, it's like if in your day-to-day you use Airflow every day. I mean, I really don't use it. But I think it applies just as well, right? Whatever you use daily, you're like, oh, this is my toolkit. Yeah, I think it's really good to spend some time on it. If you use an axe, you should spend some time sharpening your axe, and sharpening your axe in software often might mean contributing to an open source project you use.

One thing I would just add is, in general, for engineers, especially early in their career: read open source code. Similar to what you said, if you're using a specific toolkit, just go read that code. Even if initially things just don't make sense, read it over and over again, and read more code than you end up writing. Because I think just navigating these large code bases helps one understand how to go about organizing their own code eventually.

And over time it makes it easier to find things. Like, oh, I need this thing, how does it do this thing? Oh, I know where to look, or at least I can navigate my way through it much faster. Oh yeah, is that how you guys started with Kubernetes? For me it was pretty much that. I've worked primarily on Kubernetes for the last four years.

Yes, and as with any project, there are many ways to do a certain thing, and it's like, well, I don't know what the right way to do this is. And in many cases: why does this thing operate this way? The documentation says X, is it really true? So I think developing that practice of always going to the code to see what it actually means and how it actually works, that's where you will know the guarantees, since documentation could be out of date.

And navigating the code also helps, at least for me, to understand a lot more about the system and some of the principles that the system has been built with, and I'm sure it's true for other open source projects too. It makes the leap of contributing feel a lot easier too. You've read it, you're in the code, you're like, oh, I know where that would be and how I would change it to do certain things. And it definitely makes the system less scary.

You're like, oh, this is such a giant system, how can I even go about understanding something? Versus, I have this one question, let me try to get an answer to that one question. And along the way you kind of learn that I don't need to look at the other 10,000 lines, I just care about these three, so it's a matter of finding these three.

The big thing is the imposter syndrome. A lot of people are like, oh, would they accept a PR from me? I've heard that so many times: "Oh wait, do they even accept PRs?" Or, "Is someone going to make fun of me?" Or I don't know what exactly.

I think people just feel like there's a line here, an imaginary line that they absolutely cannot cross. It might be a difficulty thing, it might be a self-value or self-confidence issue, or this or that, but that's something that we need to shoot down actively in open source. I don't know where... I mean, the right place to do this is a place like here, to say: anyone can. If you contribute a PR, people will be super happy to receive it.

If you open an issue or contribute a PR, even if it's misinformed, incomplete, a draft (there's a nice little draft button), people will be stoked to see a new face on the repo. And as I said, I've seen very, very, very few negative interactions anywhere. Most of the places I've seen it are with entitled users, right? People are like, "I can't believe you don't offer a way to do this." I'm like, just contribute it, seriously.

It's like, what relationship do we have? I don't know what company you work for. It's funny, it's interesting that you think we owe you that, you know. But I think most people have the bias of being overly cautious.

We need to fix that. Maybe if we fix that we're going to get more entitlement, which we're going to have to fight back on. But overall, the problem we have is that we need more empowerment, to be more welcoming, for people to just be like, yeah, I pip installed it, I can totally open a PR on it, for sure. I would just plug a tool here. Well, it's not a tool but a company.

So GitHub is amazing for hosting code, obviously, and the majority of open source repos live there, I would say. For code navigation, personally, I love Sourcegraph, from a code navigation and search perspective. It's at least my go-to tool, and I know a lot of my colleagues use it as well.

It's significantly better than what you get with GitHub and makes code navigation much easier. It's like you have an IDE in the browser when you're searching open source code, and I think GitHub could use some improvements there. Yeah, I mean, that sounds like a great thing. Knowing that code navigation is so important and so potentially challenging, having better tooling there really helps. For me,

I'm just at a place where I still use Vim. I don't use a lot of IDEs. I'm just old school, and I'm not saying that it's better. It's just bad habits. I've got all the muscle memory and I like to have my shell really close. Wait, wait, wait, Vim in the shell, or vi mode within VS Code? Just trying to gauge.

So definitely Vim in tmux, Vim in the shell. But it's not because I think it's better. Every year I'm like, I need to teach myself a proper IDE, and then I don't, because I just prefer my own method. I git grep a lot. The way I navigate code is kind of my own way of doing things, using the shell and Bash.

Just in general. But I think in the new world of software engineering, you can have these great graphs saying this method is part of this class, here's its inheritance scheme. It's all visual, you can click around. You can do that in Vim in some other ways. We all have different ways of doing the same things, and what's hard is if you have the muscle memory and then have to switch.

I mean, I can't do it, but I watch it on Twitch, right? Someone who's really, really good, because you just see it, without clicking anything, just with all the switches. Yeah, yeah. So talking about open source projects, you came to Airbnb from Facebook, and obviously you knew a lot of these tools existed, and there were some gaps, and you wanted to build something new.

For Airbnb, what was that pitch like? I mean, you are a data engineer on the team, and you're like, "Hey, let's build a new tool and open source it," and they say yes, sounds like a great idea? Well that was a little bit of an item. Well, I joined with that premise. I was very happy at Facebook, but I like the idea of moving often, just to force yourself to experience a new environment and stuff like that.

But that was the premise on which I decided to join: I'm going to get to work on this problem, and I'm likely to be able to open source the stuff if everything goes well. Actually, between the jobs I took a break of about two weeks and I started writing Airflow on vacation in Mexico, on the beach. I'm like, "class DAG", does it go capital D-A-G or D...

I remember those moments of saying, okay, well, what's the executor I should use? You know, a local one, a Celery one. What are we going to need? That was pre-Kubernetes. But yes, so then that landed on my own personal repo as I joined, and then I was like, okay, let's try to use this stuff here internally. But I think the play for them,

I mean for organizations sponsoring open source, is a bunch of things. One is attracting talent. I would not have gone without that guarantee. But then I think there's the huge... for a long time, and it matters more when the market gets more competitive, the aura of the engineering team has been really important for these engineering-driven organizations.

So for Airbnb to be like, you know, we do have these open source projects and you can see everything we do out in the open, and you might get to work on some of this stuff in the open if you join, that's kind of exciting for people. So it's more about talent acquisition and retention.

I think that's the real thing. Because the angle of "we get free contributions to projects we care about", I think you can play that card, but it's somewhat arguable. Generally, to me, it has not been a super net positive, though I don't know, or maybe that's early on in the projects, where it was most active. But the fact that, say, Airbnb has a huge amount of Airflow, and the fact

that Airflow is much greater than it would have been if it had been one person working on that problem in isolation, is a really positive thing. Now people can come in and out of Airbnb and know the orchestration. You know, so you have these two open source projects, and I think both came out when you were at Airbnb, Airflow and Superset. And you worked at Lyft after Airbnb, and then you started Preset. What prompted you to start a company around Superset?

Well, so there's a bunch of things there. With the move from Airbnb to Lyft, I was ready to, you know, I just get this feeling that I've got to keep moving. So after about three years at Airbnb I was like, okay, where am I going next? I also wanted to plant the seed of Superset, try it in a different context, plant that seed and create a team there around it. And I was just really excited to work on more geospatial, real-time stuff.

It just seemed really fun to work on. And then in terms of starting the company, the VCs started approaching me. It was in the fall, probably before then, but in 2018, and I think it was a result of things like HashiCorp being super successful, Confluent, Databricks, where people went, oh shit, commercial open source can be a really good business model in some cases.

So what are the open source projects out there that are getting traction, that are popular? A lot of people found me as this became a pattern. Martin Casado at a16z, I think that was part of his thesis: data, open source, the modern data stack is going to be open source. So they found me, like, why don't you start a company? I was like, I don't know, I'd love to just chase IPOs and go from tech startup to tech startup and work on open source.

That sounds a little stressful, you know. I'm not sure I really want to. I don't have the MBA-type skills and I'm not sure I want to acquire them. But then I realized too, I had just turned 40, and I realized it was just a really unique opportunity.

The VCs were like, we want to fund you very, very well, and you don't need to write a business plan and all this stuff. You're in a position that a lot of would-be founders wish they were in, which is you get a top VC, you get investment, you skip the seed round and go straight to Series A. So I was like, I'm going to regret it for my whole life if I don't take this opportunity at this time. Very unique opportunity.

And the rest is history. And it's been super. I talked a few times already about taking yourself out of your comfort zone to learn new things and maybe become a better professional, a better human as a result. That was the single most important thing that I've done, to just transplant myself to this different planet and be a founder.

And it's been super great. It's been a fun ride, but with intense ups and downs.

Anything that was the biggest surprise, I guess, compared to expectations?

I think one thing was, you kind of think the VCs are a little bit evil and that the oversight is going to be very intense. Basically, the moment I take this money, it's, you know, lots of millions of dollars.

The heat is on, right? People are going to be on my ass looking for results at all times, and the pressure, the tension, is going to come from the top down. And then what I realized is, not that there's no pressure, the stakes are extremely high, but it's mostly self-inflicted. It's mostly like, if I do well, I can do very, very well if the company does well. And then you make promises to every investor, but also every employee, every customer, every prospect.

So little by little the pressure goes up, but it's not necessarily inflicted by the investors or the organization. And it's surprising how much latitude you have as to how you run your business. No one is like, oh, you've got to do A or B. You build the business you want to build, which is great.

So you definitely don't come across as the MBA type, and right now, as the CEO of the company, you're deep in the trenches and actually writing code, which is super impressive to see. How do you go about thinking about the business plan, for example? This is not a skill set that many engineers typically have, and I would say engineers probably make the worst customers. So how do you put yourself in those shoes and say, okay, this is how I can build a business around this?

Yeah, I mean, I think you take the challenges of being a founder as a first-class problem. You go to first principles, and then you try to figure out how you should organize your time, where you need to seek advice from, and what's most important to work on today, this week, this month, this year. You find the right advisors and the right people to surround yourself with.

So there's nothing unusual in the answer here. On the coding, I think by the time we got to a certain scale, it just didn't make sense to do any of the coding that the company would depend on to succeed. Early on I was definitely very involved. When we were fewer than 10 people, I was still acting as a very active engineer and PM.

And then over time I think I was distancing myself from that, and more saying, okay, I could code because I need that as an outlet, or it's good for my mental health or something like that. That's how I've been realizing myself for the past 15 years, and to not have that as an outlet that I know I'm good at, you know, it's difficult.

So then I distanced myself from that, and there was a good long period where I didn't code at all. There was just too much stuff to manage. And then recently I decided to spend more time wearing the CTO hat, which includes being in the code base. It's a very positive thing to be around.

So yeah, it's been an interesting journey there. You learn things along the way, like maybe things that you really love that you didn't know you might love, and then some things where you're like, okay, I know I need to find someone to do that for me, I'm not interested in that.

Was there something that you found that you didn't know that you'd love?

Yeah, I think, and it goes with the idea that what you're good at is generally what you love to work on, in some fashion. But I love product marketing in general now. So messaging, positioning, pricing, packaging, some of the strategy, right? How do we think about and expose the components of the product to the market?

And that shapes, not just can shape, the roadmap and the product direction too, right? So maybe it's the layer of mission, vision, scope for the product: product marketing can apply that stuff, but it can also shape the direction of it.

I think I was doing it naturally in open source in some ways, right? Airflow had a logo and a one-liner and some sense of what it does and what it doesn't do. If you think of the README of an open source project, it's effectively product marketing. It's how you present your project to the potential community, right? So that's a thing. I think I generally hate operations. It's a must-do, but it's just things that are more repetitive.

Financial planning is kind of interesting, like modeling stuff in Excel, I don't know. But the diversity of these things is what's interesting. And then management has never been my thing. I like spending time with people, but I prefer coaching and leadership to management.

You said it's sort of free therapy. So when you're viewing coding as balance, how do you go about picking what to work on?

Yeah, I don't know. It's a mix. If you dogfood and you're a user of the product, you can fix the little things that annoy you in the product. I don't really like CSS, but I hate, you know, crooked pictures on the wall. My OCD triggers. So some of that is really easy and usually doesn't get in the way.

It's non-critical work. And then, going more meta, recently I got closer to the repo and got really into developer experience: Docker, Docker Compose, the Helm chart, all the CI/CD stuff, Docker builds, getting that stuff to actually run. It's more of getting more meta on the problem as you get more senior. I don't know what it is. I'm not that passionate about that. I'm like, GitHub Actions, it's really hard to work with. But maybe it's that the repo really needed it too.

I think that's a cool pattern you're describing. I've noticed that pattern, obviously in a different context and at a different scale, but translating that to a tech lead instead of a founder, for example: a tech lead being in the trenches initially, designing exactly what the system should and should not do, and then slowly they go out and kind of spread the word for it within the company, or outside the company if it's open source. And then eventually they start looking at it from a user's standpoint and start fixing these things that you mentioned, like, well, someone should be able to git clone and get something built right away, for example, or improving the CI/CD pipeline. It makes sense to put on that user hat and see how the product can be improved, not just the engineering aspect of it.

Yeah, the user experience and the developer experience, they're both closer to the human, which is kind of interesting. If you think of the development pipeline, or what's, you know, middleware, CSS is probably the most veneer layer on the application, and then CI/CD is deep in the back. So they're the two extremes, but in some ways it comes full circle, because one is about user experience for a developer and the other is user experience for a user of the product.

And in the middle, maybe it's because the middle gets so tangled up and complicated. You're like, if I touch anything, you know, it's a bag of knots. You start touching and then you've got to get deeper. So maybe that's why I'm staying away from the guts.

And you don't want to be on the blocking side, I think, for the free therapy.

That's it. That's not that. Yeah, then you need some extra real therapy.

So for a listener that's like, yo, this Max guy is pretty cool. I want to do what he does. I want to make the next Airflow for LLMs. I don't think it'd be fair to ask you what that would mean in terms of what specific project, but what does that journey look like? Because for you, you joined big companies, you saw how things work at scale. What would be your advice to this engineer?

Well, the first thing I would say is don't be a founder. It's hard. It doesn't work.

I think you need to tell people to not be a founder, because the skill that you need to succeed is this delusional "I don't care what you think, I'm going to do it regardless."

So we're kind of testing that.

Yeah, because you need some of that. So people need to break through, like, "this guy Max sat down on this podcast, and like other people he told me not to be a founder, and I'm still going to be a founder." So I think it's good overall to put that message out there, unless you like, you know, chewing glass and swimming in sub-zero water, this stuff that's kind of more brutal.

But then on the question of how to start a project, I think it's to be in tune with your environment and its needs, and the gaps, the holes in the market offering. For open source specifically, you know, wouldn't it be great if there was a tool that allowed me to do test-driven-development-type stuff with prompt engineering? It doesn't exist? Well, maybe I can create a thing there. I think it's better when it comes from within, from scratching your own itch, your use case, and the hole that you observe from a place that you're very, very familiar with. So that's key for me.

There are two projects I'm looking at that are mostly in the early, kind of ideation phase, in case people wanted to get involved and run with the thing, or with the idea, or with some of the assets and the thinking that I put together. I can kind of pitch these projects, maybe also as an example of the kind of projects that could be interesting to someone, and not necessarily so that you take it and run, or come and collaborate with us and get it started.

The first one is around the semantic layer. I know dbt is coming up with a good metrics layer, semantic layer. That's super interesting, and we're integrating with it at Preset. But then we've had a hackathon and some set of new ideas that extend upon this idea. The world really needs a universal open source semantic layer that works well and is simple. Maybe we'll put the link in the show notes, but I think it's at preset-io/allstars. SQL All Stars is a semantic layer that works as a virtual database. So you put in your semantics of what your metrics and dimensions are, and you map your schema as in all the semantic layers: you say this table is going to be joined, this column is a metric, this is a dimension. And you organize all your stuff as code, as it should be.

So it's similar to LookML from that perspective, but then it's exposed as a virtual database. You have a table called star, so you can say select stuff from star, and it's exposed as one large flat table, but behind the scenes we would transpile your SQL to do the underlying joins that need to be done. So it's a cool idea. And there are some other ideas around that, around progressive adoptability and having the semantic layer help you by guessing the semantics of your schema. Your schema already has information. You should be able to figure out which tables you can join and how, and what looks like a metric and a dimension. So the semantics can be mostly inferred. It's progressively adoptable: you can enrich it over time and still get value from day zero. And it's exposed as a virtual database so that every BI tool out there is already compatible with it, because it's a SQL interface. That was one idea.

And I guess, because everything's more codified, it then helps with the LLM sort of improvements and stuff like that.

The LLM angle on this semantic layer, there are two angles I can think of now. One is, well, the LLM can help you define your semantic layer, set up your semantics. But then if you do have your semantics set up, it becomes a map for the LLM to better understand your schema. And maybe the LLM might do better with the abstraction layer than it does without it.

Right, because the semantic layer is there to help business users self-serve in BI tools. So that means that if it helps business users self-serve, it should help an LLM, or any form of intelligence, self-serve. So that's one project. I don't know if we want to get deeper into this one before... I have a question.

Oh yeah, so the semantic layer, maybe I'll start with the purpose. The purpose of a semantic layer is to help more people, namely business users, self-serve with their data without necessarily knowing SQL or understanding as much of the underlying data model.

So it bridges the gap. The semantic layer is a bit of a map of your database, and it maps the metrics and dimensions to more business terms, right? So instead of having cryptic table names and column names that you don't necessarily know how to join, and that you need to write SQL to make sense of, we expose a layer on top of that which has the map of the physical layer plus metrics and dimensions, pretty labels, pretty descriptions.

So that's the general idea. A lot of these things historically have been part of the BI tools. They're proprietary by nature and they're not shareable across tools. So that's a bit of a problem if you use multiple BI tools, which most companies do. Whatever you do in Looker, you can't really take with you to Tableau or Superset or any other tool.
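As a rough illustration of the "virtual table called star" idea described above, here is a minimal Python sketch. It is not the actual All Stars implementation; the model format, the table and column names, and the `transpile` function are all assumptions made up for this example, and it only handles queries of the form `SELECT <fields> FROM star`.

```python
import re

# Hypothetical semantic model: every table, join, and field name is invented.
SEMANTIC_MODEL = {
    "base_table": "orders",
    # table -> join condition back to the base table
    "joins": {"customers": "orders.customer_id = customers.id"},
    # field name -> (owning table, SQL expression)
    "dimensions": {
        "order_date": ("orders", "orders.created_at"),
        "country": ("customers", "customers.country"),
    },
    "metrics": {"revenue": ("orders", "SUM(orders.amount)")},
}


def transpile(virtual_sql, model=SEMANTIC_MODEL):
    """Rewrite 'SELECT <fields> FROM star' into SQL over the physical tables."""
    match = re.fullmatch(
        r"\s*SELECT\s+(.+?)\s+FROM\s+star\s*;?\s*",
        virtual_sql,
        flags=re.IGNORECASE | re.DOTALL,
    )
    if match is None:
        raise ValueError("this sketch only supports 'SELECT <fields> FROM star'")
    fields = [field.strip() for field in match.group(1).split(",")]

    select_parts, group_by, joined_tables = [], [], []
    for field in fields:
        if field in model["dimensions"]:
            table, expr = model["dimensions"][field]
            group_by.append(expr)
        elif field in model["metrics"]:
            table, expr = model["metrics"][field]
        else:
            raise KeyError(f"unknown field: {field}")
        select_parts.append(f"{expr} AS {field}")
        # Only join the tables the requested fields actually need.
        if table != model["base_table"] and table not in joined_tables:
            joined_tables.append(table)

    sql = f"SELECT {', '.join(select_parts)} FROM {model['base_table']}"
    for table in joined_tables:
        sql += f" JOIN {table} ON {model['joins'][table]}"
    # Aggregated metrics need the selected dimensions as grouping keys.
    if group_by and any(field in model["metrics"] for field in fields):
        sql += f" GROUP BY {', '.join(group_by)}"
    return sql


print(transpile("SELECT country, revenue FROM star"))
```

The point of the design is that the caller never writes a join: the model knows which table owns each field, so any tool that can issue SQL against `star` gets the joins for free. A real implementation would need a proper SQL parser rather than a regular expression.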

So the semantic layer defined as code, exposed as a virtual database, those are some cool ideas. There's a repo out there that's mostly just early scaffolding of what the project might look like. And the other need that we identified is something around data access policy as code. Every company, and the bigger the company, I think, the more of a reality that is, but every company needs to define groups of users and what data access they should have, mostly for analytics purposes, right? Of course every company needs to define rules and access of every user to different systems, but this would be more targeted towards the data access policy. So you probably have some Snowflakes and BigQuerys and databases left and right, different BI tools, and you can say certain people have access to certain tables or columns or schemas.

And maybe this category of user has access to the schema but not the PII that is in here or there. So the other project, it would be called Governator, with the aesthetic of Arnold Schwarzenegger, for sure, somewhere there in the logo. I think we have a logo that I got Midjourney to produce. So in Governator you would define your data access policy as code, or as YAML, right in the repo, and it can push and pull. It's in a file system and a repo, so you know exactly who has access to what and who gave access to what. When you change the access, you do it in the repo, so you know who gave access to what, and when.

And then there's a CLI or CI/CD-type tool where you can push and pull to different sources and destinations. So you could pull whatever you have in Snowflake: if Snowflake has data access policy rules, dump them as code, change them, push them back, or push them to other systems. And that's also a problem that we have, which is that sometimes the BI tool needs to know about your access rules so we know which charts and dashboards to show you.

But you don't want to have a service account that has access to everything and have the BI tool enforce the data access policy. You want to enforce it at both layers, because the user might want to go straight to the Snowflake UI or the BigQuery console, and then it should be consistent, you know, what they see, what they have access to. So this tool would allow people to manage data access policy centrally and synchronize it across BI tools, databases, and things like that.
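To make the "policy as code, push and pull" idea a bit more concrete, here is a small Python sketch. It is not taken from the Governator repo; the policy layout, the group and schema names, and the `plan_sync` function are assumptions invented for illustration, and the generated statements are generic SQL whose exact syntax varies by database.

```python
# Hypothetical desired policy, as it might be stored in a repo.
# Layout: group -> schema -> set of privileges. All names are invented.
DESIRED_POLICY = {
    "data_scientists": {"analytics": {"SELECT"}},
    "finance_team": {"analytics": {"SELECT"}, "finance": {"SELECT"}},
}


def flatten(policy):
    """Turn the nested policy into a set of (group, schema, privilege) tuples."""
    return {
        (group, schema, privilege)
        for group, schemas in policy.items()
        for schema, privileges in schemas.items()
        for privilege in privileges
    }


def plan_sync(desired, actual):
    """Return the statements that would move `actual` to match `desired`.

    `actual` stands in for whatever was pulled from a live system.
    """
    want, have = flatten(desired), flatten(actual)
    statements = []
    for group, schema, privilege in sorted(want - have):
        statements.append(
            f"GRANT {privilege} ON ALL TABLES IN SCHEMA {schema} TO ROLE {group}"
        )
    for group, schema, privilege in sorted(have - want):
        statements.append(
            f"REVOKE {privilege} ON ALL TABLES IN SCHEMA {schema} FROM ROLE {group}"
        )
    return statements


# Pretend this was pulled from a warehouse: data scientists can read finance.
pulled = {"data_scientists": {"analytics": {"SELECT"}, "finance": {"SELECT"}}}
for statement in plan_sync(DESIRED_POLICY, pulled):
    print(statement)
```

Because the plan is computed as a diff between the file in the repo and what a system currently has, the same desired policy could in principle be compared against several systems, which is the centralize-and-synchronize idea described in the conversation.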

I imagine the access simulation will be pretty interesting to build out, and pretty core, right? Because it's kind of like IAM on AWS. It was such a pain to work with before they actually made it more user-friendly to test out different policies and how that impacts things, instead of just...

Yeah, when it's managed as code. You know, that's a theme in my career. For Promptimize, for Airflow, it's, you know, data pipelines as code. When things are managed as code, you can test them, you can version them, you can review them, you can know who changed what, when.
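One way to picture "you can test them" for an access policy is a check that runs before the policy is deployed. The Python sketch below is purely illustrative: the policy structure, the simulated user names, and the `check_policy` function are assumptions, not part of any existing tool.

```python
# Hypothetical policy kept in a repo; all group, user, and schema names are invented.
POLICY = {
    "data_scientists": {
        "members": ["simulated_data_scientist"],
        "schemas": ["analytics", "experiments"],
    },
    "finance_team": {
        "members": ["simulated_accountant"],
        "schemas": ["analytics", "finance"],
    },
}


def can_access(policy, user, schema):
    """True if any group that contains `user` is granted `schema`."""
    return any(
        user in group["members"] and schema in group["schemas"]
        for group in policy.values()
    )


def check_policy(policy):
    """Assertions a CI job could run before the policy is pushed anywhere."""
    assert can_access(policy, "simulated_data_scientist", "analytics")
    assert not can_access(policy, "simulated_data_scientist", "finance")
    assert can_access(policy, "simulated_accountant", "finance")


check_policy(POLICY)
print("policy checks passed")
```

If someone edits the file so that data scientists gain the finance schema, the second assertion fails and the change can be caught in review rather than in production.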

And then you can CI/CD a bunch of things. Before you deploy your data access policy, you can run a bunch of tests to, say, make sure data scientists, you know, simulated data scientists, don't have access to financial data, for instance.

Or make sure they cannot git push on this repo, because they're going to break everything. So in the advice, you mentioned that people shouldn't be a founder unless they want to chew glass or something. We asked the question about surprises, but your advice makes me ask: what have been the challenges of being a founder in this case, especially when you're building a company around an open source project?

I think it's a lot of things. I mean, the first thing is, you know, taking yourself out of your comfort zone and learning a bunch of things that you may or may not be passionate about. But it doesn't matter, because it's what it takes to succeed, right? So then you're like, okay, now I'm putting myself in this situation where I have to learn new things that are really important, really core to this business succeeding, and I may or may not like that. So unless you love these things, or you love to be outside your comfort zone, that's a bit of an exercise. You don't know how it's going to go, and the stakes are very high. So sometimes, I don't know, these things go together: reward and recognition are really important for everyone. Everyone wants to be recognized and rewarded.

But it usually comes with the hardship or difficulty level of what you take on, right? So if you want to be a volunteer for an organization, work part time, or be an advisor, it can be rewarding, but it's not going to be as rewarding as doing the real thing and, you know, busting your ass for years. But there's a danger in putting yourself in a situation where the stakes are so high. The potential for reward and recognition is very high, but the risk of coming up short on things is very high too.

So unless you really want to be hardcore, a more progressive approach to things in life is probably better, like climbing the ranks at some companies and making sure along the way that it's like, I'm still comfortable managing, I'm getting a little bit closer to being an executive at a company, I still like this progression, I'm going to take another step in that direction. Whereas as a founder you're like, okay, let's just go and throw myself into a completely different area, and it should be fun, right? Or I'm going to get rich, I don't know what you really think. But I would say a more progressive approach to the things you do in life is probably better.

I mean, there are a lot of reasons why you might want to do it, and people do do it, but think about it, of course, before doing this big jump.

Good advice, good advice. By the way, I'm doing a hard pivot here. You mentioned you generated the Governator logo using Midjourney. We spent a lot of time trying to generate a logo for the podcast using a bunch of these AI tools.

I spent significant evenings just generating logos on DALL-E, on ChatGPT. One thing which I found was, you give them a prompt and they come up with a logo which is kind of okay, but if you ask them to change something or put in text, they are terrible. If I say spell "software" and have it in the logo, they never do that. I never got the system to do it.

They're learning how to write. So yes, you have to ask, like, do not write anything, and then you add it in Photoshop. You use it for what it's good at and then you add the layer of what you're good at. But yeah, it's better to just get better at working with these tools instead of just... yeah, sorry, let me share.

I was going to show the logo that I made. I think it was pretty good.

You got Midjourney to do that for you?

I think it might have been ChatGPT.

Oh yeah, wow.

But it'd definitely be interesting to pull this session, because I did fight with it quite a bit. For the Governator, you know, I wanted both Arnie and the database logos. That was pretty tricky, but I got it to do this. And I wanted the red eye, because at some point it comes up with a red eye and I love the red eye, I wanted it. Then it stops doing it, so you have to ask for it, and then it doesn't want to do it.

But there was definitely a bunch of prompts here for me to get to this. So that's if you want to check it out. That's on my personal GitHub. The project is kind of early. It has mostly just information architecture and some of the ideas behind it. And the other one is called All Stars, that's on the Preset one. It's a bit further along. It uses, I don't know if it's legal, but that's the Mario star here.

But that's pretty fun. If you're interested in semantic layers or the future of them, that's a fun repo to read.

Oh, so budu budu. Well, this has been an awesome conversation, Max. Thank you so much for taking the time. This was super amazing for us.

This was super fun. Yeah, really interesting topics overall, so glad I showed up on the show.

We'll have links to all the things we talked about: Preset, Superset, Airflow, the new repos that you mentioned, Governator and All Stars, so that people can go find these projects and hopefully contribute to them. And we'll also link to your profile, of course, so that people can find you as well.

That'd be great, if people wanted to run with these projects, you know.

Right now I don't have the bandwidth. I want to work on these things, but we're pretty busy at Preset working on a bunch of other things too. But it's likely I'll work on some of these things, so if people wanted to get a little bit closer and help lead on these projects, it could be fun.

Yeah, if either of these projects gets a contribution from this podcast, I'll count that as a win.

Or even, I think a lot of what's great about some of these things, and my intention in putting the README out there, is like, oh, we might build this stuff, but I think it's also just the ideas, you know, and the empowerment. You just put the idea out there and someone might run with it. They might be like, you know what, I have a different twist on this too. So I don't just think code should be open and free. It's ideas too.

I'm very, I'm just against intellectual property. I just think these two words don't fit well together, "intellectual property." No, the ideas want to be free. You know, code wants to be free too. So it's like that.

For sure. Thank you so much.

Cool. Thank you so much.

Thank you so much for listening to the show. You can subscribe wherever you get your podcasts and learn more about us at softwaremisadventures.com.

You can also write to us at hello@softwaremisadventures.com. We would love to hear from you. Until next time, take care.

This transcript was generated by Metacast using AI and may contain inaccuracies.