Hello, and welcome to the data engineering podcast, the show about modern data management. If you lead a data team, you know this pain. Every department needs dashboards, reports, custom views, and they all come to you. So you're either the bottleneck slowing everyone down, or you're spending all your time building one off tools instead of doing actual data work.
Retool gives you a way to break that cycle. Their platform lets people build custom apps on your company data while keeping it all secure. Type a prompt like build me a self-service reporting tool that lets teams query customer metrics from Databricks, and they get a production ready app with the permissions and governance built in. They can self serve, and you get your time back. It's data democratization without the chaos.
Check out Retool at dataengineeringpodcast.com slash Retool today, that's r e t o o l, and see how other data teams are scaling self-service. Because let's be honest, we all need to retool how we handle data requests.
Your host is Tobias Macey, and today I'm interviewing Hemant Goyal about how data platform investments support consumption-based business models. So, Hemant, can you start by introducing yourself?
Sure. Yeah. First of all, thank you for this opportunity, Tobias, and thank you for doing this podcast for the community and helping people learn from each other. So thank you very much for that. Currently, I'm a senior product manager at Salesforce, and my focus has been building data platforms
that turn raw usage signals into revenue machines and growth levers for B2B businesses. What that means, going further down technically, is that I sit at the intersection of data engineering, subscription renewals, customer success, and I would say all the business teams who use the consumption data to make decisions throughout the day.
So my job has been, you know, empowering the modern consumption data models where usage metering, contractual entitlements, data quality, and real-time visibility of the customer's usage shape the unit economics and product strategy. I also do a lot of volunteer activities, where the most interesting one is that I'm on the industry advisory board at San Jose State University in the data science department.
So that's the other volunteer activity that I do to help students compete in the current job market.
And do you remember how you first got started working in the data and data product space?
Sure. So honestly, I never really thought that I would be working in the data space, so it has been more of an organic evolution than a planned pivot. Okay. I started as a business analyst when I was in New York. I was working for Tata Consultancy Services, working for Citibank North America, and Citibank North America is one of the biggest businesses I have seen, with huge volumes of data. But, you know,
since I was working as a business analyst over there, I started realizing that I was drawn less to documenting the business requirements; I was more interested in why I was building what I was building. Those questions always bothered me, you know, so that was more of a turning point for me, understanding more of the history behind the data, okay, and
mostly I realized that we were talking less in the sprint meetings and the more important discussions were happening at the top, with the leadership. That's when I thought I should take up more of a product manager type of role where I can get a seat at the table, get directly involved with the leadership, and look at how the leadership looks at all the insights, all the products, how the product usage is shaping up, how the product dynamics work. All of that, you know,
was a motivating factor for me to get involved in, I would say, the data management space as a product manager. And if I look back, right, there's a common thread: whatever you do, data sits at the center of everything. That is the most critical decision that everybody makes today, okay, investing in data. And with Gen AI
and all the AI use cases, right, where everybody is trying to automate the repetitive tasks, data plays an even more important role now. A lot of people who were not making a big investment in data are now making investments in data. So I would say product management gave me leverage, you know, to apply my platform thinking to solving problems with the power of data.
And so digging now into this overall space of consumption based businesses and consumption based unit economics. I'm wondering if you can just start by giving a bit of a grounding as to the types of products and the types of businesses that are using that model of pricing and the goods to be sold?
Sure. So if we look at today's businesses or we look at today's economy, right, consumption businesses are everywhere now. Right? Like, I mean, even if I'm doing my personal projects, I'm buying some credits from Google Cloud. Right? So it doesn't limit me to anything. I can
make a data pipeline, I can set up a job, and all it is converting to in the back end is some credits that I'm using, right? So that is the unit economics of the world we are in today, right? So consumption businesses are everywhere, right? If you talk about cloud infrastructure, right, Google Cloud, AWS, and Microsoft Azure all have different ways to measure the customer's
consumption: their compute hours, their storage, the bandwidth that they're using, right? Everything transforms to some unit, right? And similarly, if you look at the data platform services that are there in the market, I mean, a lot of people use different ones, but I'm aware of Snowflake and Databricks. They all transform to something like how many rows customers are processing, what credits they have, how many queries they are firing, right, how many API calls,
or if we talk about all the workflow tools that are there, right, and all the data integration tools, API management tools, etc., right, they have their way of measuring the customers' consumption. So all of these are consumption-based models. Right? All the B2B businesses that I've seen are using consumption models one way or the other to serve their customers, transforming their customers' value into revenue.
And so as far as the operation of a business where that unit economics and the amount consumed is core to how the business actually generates revenue, what are some of the ways that that poses unique operational challenges that are different from businesses that are in the either physical goods or services industries?
So once consumption is a unit of cost or a unit of revenue, right, one very important thing is that your operations become a real-time optimization problem. Why do I say that? Because now you're using consumption to drive everything. So consumption is part of your operations every day. A real-time optimization problem means you are juggling all the metering data, all the cost attribution.
Right? Okay, the customer is using this, so what is the cost of serving this customer? You are always looking at the COGS part and the actual consumption part: am I generating any revenue when the customer is using my product, right? So all of these processes become so important as part of the business processes where consumption plays a major role.
Okay, as you said about operational challenges, consumption itself becomes a question of how efficiently you measure the consumption and how fast the consumption data is available, right? Are you dropping the data somewhere?
So with all of this, right, things are changing now, and one important thing is the billing close processes, right? Now they are also dependent on consumption. When you're generating invoices for the customer, you need to know the actual consumption, the real consumption that the customer has. So manual workflows that were tolerable at some point, when we had subscription invoicing,
are changing today. Now if you do not manage your workflows very well, you are creating a revenue leakage machine. Right? So that is one of the important factors everybody should consider as part of operations: you need your consumption data to be sitting at the forefront with the highest accuracy and available in real time or near real time. And one other important thing I would say is, you know,
the data is generated from so many places. Right? It's not just data flowing from one pipeline. Data is flowing from hundreds or thousands of pipelines. So you need strong governance. Okay? That is where I have seen less investment. Everybody wants to talk about it, but nobody wants to own it. So you also need strong governance. You need versioning around the metrics.
Okay? How you are measuring the consumption. Okay? Or if you are in this more complicated business of consumption-based pricing, then you need versioning of your rate cards.
How do you manage the different rate cards? Because COGS changes every day, the business changes every day, so the measurement unit also changes every day. So how are you keeping the versions of your rate card? Your customer might only be able to adopt the new rate card when the new subscription comes. Otherwise, you have to keep your commitment during the contract period, right? So all of these are important in consumption economics: how well you manage your data. So what I see is that data management plays a very important role here operationally,
you know, keeping you alive in this competitive market.
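As a rough illustration of the rate card versioning described above, here is a minimal Python sketch. The `RateCardVersion` class, the `RATE_CARD_HISTORY` list, the dates, and the prices are all hypothetical and are not taken from any system mentioned in the conversation. The sketch assumes one simple policy: a contract stays pinned to the rate card that was in effect when the contract started, and only picks up a newer version at renewal.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class RateCardVersion:
    version: int
    effective_from: date      # first day this version may be sold
    price_per_unit: Decimal   # price of one consumption unit


# Hypothetical rate card history (illustrative values only).
RATE_CARD_HISTORY = [
    RateCardVersion(1, date(2024, 1, 1), Decimal("0.10")),
    RateCardVersion(2, date(2025, 1, 1), Decimal("0.12")),
]


def rate_card_for_contract(contract_start: date) -> RateCardVersion:
    """Return the newest version already effective when the contract started.

    The contract keeps this version for the whole contract period, even if
    newer versions are published in the meantime.
    """
    eligible = [v for v in RATE_CARD_HISTORY if v.effective_from <= contract_start]
    if not eligible:
        raise ValueError("no rate card was effective on the contract start date")
    return max(eligible, key=lambda v: v.effective_from)


def charge(contract_start: date, units_consumed: int) -> Decimal:
    """Price usage with the rate card the contract is pinned to."""
    return rate_card_for_contract(contract_start).price_per_unit * units_consumed


# A contract signed in mid-2024 keeps version 1; a 2025 renewal gets version 2.
old_contract_charge = charge(date(2024, 6, 1), 1000)
renewed_contract_charge = charge(date(2025, 6, 1), 1000)
```

Keeping every version in the history, rather than overwriting the price, is what lets an invoice be recomputed later with the rate the customer actually committed to.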
And you mentioned the increase in complexity when you're dealing with multiple data sources versus a single choke point for being able to retrieve or consume information from. And I imagine that the complexity and variety of offerings compounds that where if I only have one thing that I'm selling and you're only able to use one resource, then there's a lot lower operational overhead and data complexity involved in managing that metering input
versus if I have 50 or 500 different services, like anybody who's ever tried to navigate the Amazon Web Services dashboard.
Exactly. Right? So yeah, that's the part I was talking about, right? As your business gets more complex, with more products that you offer to the customers, just take an example of data integration, right, or an application. There are 10 different ways to build data integration services today, so it's about how you meter that, right, and not everything has the same COGS around it. There could be different ways a customer consumes a product, and there is a different cost to every method.
So you need to understand the dynamics, you know, when you sell these products in consumption-based models, right? What is the unit economics? Okay, what revenue are you making out of every service that you offer?
Digging now into some of the complexities of how the need for accurate usage information factors into the ways that the business is able to think about their economics and their revenue streams.
Even taking the case of I only have one resource that's being consumed. So maybe we'll use a topical one such as LLM token usage where I only have one model. There's only one set of inputs and one set of outputs, So I'm able to measure that because it's a single choke point. But the actual execution of that is fairly complicated and requires a lot of investment and a lot of resources. And I'm wondering if you could just talk through some of the ways that businesses need to be thinking about using the cost of goods sold nomenclature,
the actual cost of execution versus the unit of consumption and how the operational and system complexity factors into the ways that you're thinking about your expenses being reflected in those unit economics?
So basically we are looking at what the nuances of the data are, right? Specifically in how you manage the pricing of a product. Is that how we are looking at this? Or help me understand this better.
Sure. So for the use case of LLM token usage,
the tokens aren't the thing that I as a business am actually having consumed, or even if we're using, for instance, an EC2 instance from Amazon Web Services, the raw RAM and compute aren't things that are being consumed from me. They're a resource that I have available, but they are finite. Or token usage: it's a resource that is finite because I can only have so many parallel interactions
with a set of models before I run out of hardware capacity. And I'm just wondering if you can talk to some of the ways that those inherent constraints of the overall system need to be factored into the ways that the business thinks about the unit economics and the pricing models for the resource that's actually being metered.
Got it. Got it. Okay. Sure. So see, there are different ways. Right? Especially with LLMs, right, where today we are measuring in terms of tokens. Right? So the customer buys tokens. Right? But how many tokens are you hitting? Right? One query doesn't mean one token. One query could be, you know, 100,000 tokens also.
So it's all about the richness of your metering data, how you're metering every customer's action. The richness of your metering data sets the ceiling on how sophisticated your pricing can be. Okay? So say you are only capturing the coarse aggregates, like, you know, the customer had this many tokens. Okay? The customer purchased 100,000 tokens from us, and he consumed 70,000 already. Right? Something like that, right? So this is like a top-level
unit that you are capturing, okay? When I say richness of the metering data, right, I mean how you are capturing every customer action. He utilized 5,000 tokens for what purpose? What was he trying to do? Was he blocked somewhere? You need to capture every action behind that usage. So fine-grained, timely events are what you need for your hybrid models to work efficiently. Okay? Because how you attribute the cost to the customer action
is what you need to know now. Okay? So fine-grained customer usage means you capture every API call, every token that is being used for any purpose, whether the customer is, like, invoking an agent or making a call to an external tool or anything. Every action needs to be captured, you know, because now we are in this world of, you know, context-aware incentives,
like, you know, when we offer pricing, right, we try to offer more discounts to the customer as their usage grows. Right? You want to give that flexibility to the customer. Okay? As your tier grows, as your usage grows, you are eligible to get more discounts with other products. So you need to understand the usage at that fine grain. Okay? And one other thing is, you know, if you capture the fine-grained data, that enables
finance and product teams to simulate more margin scenarios before the launch, instead of discovering surprises during their billing cycles. Right? I mean, when someone is offering a customer a deal or they're working on a contract, if you have fine-grained data at the time of renewal, you can work with your customers in a better way. Hey, this has been your usage.
Based on your current run rate, we predict that you should go for a higher package, a better tier. We can offer you a better discount. So all of these discussions get more efficient and more productive if you capture the data at the finest grain.
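To make the contrast between coarse aggregates and fine-grained events concrete, the following is a small Python sketch under stated assumptions. The `TokenEvent` shape and the action labels (`agent_invocation`, `external_tool_call`) are hypothetical; the 100,000 purchased and 70,000 consumed tokens mirror the figures used in the answer above. The same event list answers both the coarse question (how much is left) and the fine-grained one (what the tokens were spent on).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenEvent:
    customer_id: str
    action: str   # what the customer was doing (illustrative labels)
    tokens: int


# Hypothetical fine-grained events for one customer.
events = [
    TokenEvent("cust-1", "agent_invocation", 40_000),
    TokenEvent("cust-1", "external_tool_call", 25_000),
    TokenEvent("cust-1", "agent_invocation", 5_000),
]

PURCHASED_TOKENS = 100_000


def coarse_remaining(events, purchased):
    """Coarse aggregate: only answers how much of the purchase is left."""
    return purchased - sum(e.tokens for e in events)


def usage_by_action(events):
    """Fine-grained view: attributes consumption to the action behind it."""
    totals = defaultdict(int)
    for e in events:
        totals[e.action] += e.tokens
    return dict(totals)


remaining = coarse_remaining(events, PURCHASED_TOKENS)
breakdown = usage_by_action(events)
```

If only the running total were stored, `usage_by_action` could not be computed after the fact, which is the sense in which the metering grain limits what pricing and renewal conversations are possible.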
So digging now into that aspect of the capture, processing, management, and exploration, there's a lot of complexity in terms of actually making sure that you have appropriate reliability,
uptime, granularity of the data that's being collected, and how that factors into the actual execution of the business. And I'm wondering if you can talk to some of the infrastructure and technology choices that need to be made early on in the development of one of these consumption based businesses and how the data platform investments need to be factored into the actual go to market and application design to ensure that you have that level of detail
available to be able to maintain the business as it goes?
Sure. So if we talk in more depth about how the architecture should be, right, because we are understanding more about this, the first thing is, you know, we need to have this agreement, or I would call it a policy. Okay? Don't treat metering like some kind of data platform; treat metering like a financial system. When I say financial system, your unit is your currency,
that's why your metering platform should be your financial system. Just like you have the whole finance department set up, you should also give the same respect to all the metering data, because now it's your bread and butter, okay? So it's not something like, you know, I pull the data, I just dump the data into the database, I append the data, and now it's available for usage. That thinking has to change. Now we have to develop our thinking into platform thinking, because times are changing now.
Okay? When I talk about architectural components, right, at a minimum you need a well-defined event schema: how you're going to capture the different types of, I would say, customer usage events. The customer could be processing some file, the customer could be making API calls,
so there are different events that can happen. The customer could be just running a scanner execution for some purpose, right? Because all of this is in the data engineering world, right? You are scanning all the events, you are making API calls to external systems, and with LLMs it's getting more complex because now we are building agents. So there are a lot of external tools that you are calling. Right?
So at a minimum, you need a well-defined event schema. You need a durable ingestion pipeline, okay, which can source the data from different sources. And the more you can optimize, the more windows you can define. You have a business in different geographies, right, and different geographies means different time zones, right? So you need to factor all of that in when you define these
data pipelines, okay? You need a normalization layer, you need a validation layer, okay? And just like in the finance system you have a general ledger, you need a usage ledger. Okay, how is my usage growing? You need to understand that, so you need to have a usage ledger. Okay, and then
how you are going to serve your data to the downstream systems. Okay? There could be different use cases, because now that you're in this consumption-based business, customers want to track on a daily basis how much they are consuming. Did I introduce something yesterday? Did I implement something yesterday that has increased my run rate by 5% or 10%?
I need to know why. Customers are always interested to know: if I deploy something, am I generating enough revenue for my company to justify the cost that I'm consuming? Okay? So depending on your users, let's say your customers, or all the business insights that you need internally as part of your consumption model, you need to understand your customers and you need to define the layers. Okay? Also keeping in mind all the regulations that are in play around SaaS products.
So you need to understand those dynamics also: how are you going to define that boundary of, okay, I'm serving my customer from space A, I'm serving my internal analytics from space B.
So there are a lot of compliance things that I deal with all day; that's what I'm talking about. So you need to have a structure where you can bring all the data into a common layer, you define the normalization layer, and then you define the serving layer, and never forget about data governance.
Okay, when I say data governance, it's a very general term, but the thing is, it's not just about bringing the data in and exposing the data to the users. It's about how you manage the data, how you define the data reliability, how you define the SLAs for the data. Okay, if the system goes down, how do you want to handle it, how do you want to handle the customer, or how do you want to handle your internal users? You need to define all of that as part of your architectural discussions.
So data governance is not just someone sitting on the side and being the police, okay, are you doing your job? No, no, no. Data governance is a daily practice now in everything that you do with the data.
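The minimum components named in this answer (a well-defined event schema, a validation layer, and a usage ledger treated like a financial ledger) could be sketched in Python roughly as follows. Everything here is an assumption made for illustration: the `UsageEvent` fields, the allowed event types, and the choice to make the ledger append-only in memory. A real system would sit on durable storage behind an ingestion pipeline.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class UsageEvent:
    """Hypothetical event schema for one metered customer action."""
    event_id: str
    customer_id: str
    event_type: str        # e.g. "api_call", "file_processed"
    quantity: int
    occurred_at: datetime  # expected to be timezone-aware


# Illustrative set of event types the schema accepts.
ALLOWED_EVENT_TYPES = {"api_call", "file_processed", "scan_execution"}


def validate(event: UsageEvent) -> list[str]:
    """Validation layer: return a list of problems (empty means valid)."""
    problems = []
    if event.event_type not in ALLOWED_EVENT_TYPES:
        problems.append(f"unknown event type: {event.event_type}")
    if event.quantity <= 0:
        problems.append("quantity must be positive")
    if event.occurred_at.tzinfo is None:
        problems.append("timestamp must carry a time zone")
    return problems


class UsageLedger:
    """Append-only ledger: entries are added, never edited in place."""

    def __init__(self):
        self._entries = []

    def append(self, event: UsageEvent) -> None:
        problems = validate(event)
        if problems:
            raise ValueError("; ".join(problems))
        self._entries.append(event)

    def total(self, customer_id: str, event_type: str) -> int:
        return sum(
            e.quantity
            for e in self._entries
            if e.customer_id == customer_id and e.event_type == event_type
        )


ledger = UsageLedger()
now = datetime(2025, 1, 15, 12, 0, tzinfo=timezone.utc)
ledger.append(UsageEvent("e1", "cust-1", "api_call", 300, now))
ledger.append(UsageEvent("e2", "cust-1", "api_call", 700, now))
```

Rejecting invalid events at the ledger boundary, instead of appending and cleaning up later, is one way to apply the "treat metering like a financial system" policy.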
And as far as the data platform architectural considerations, to your point of the level of detail and accuracy and responsiveness, it sounds like you're probably not going to opt for a batch oriented
system where it's okay if I wait twenty four hours till I get updates, you're more likely to need some continuous ingest. And I'm curious how that also changes some of the cost of operating and some of the technology choices that need to be made early on and how that factors into the way that you present the features of your business in terms of your ability to be able to react to and report on those metering details?
Yep, sure. So we need to understand, right, that a lot of data is generated when customers are using your product. So I'm sure it's not possible to collect all the data in real time. Right? But you need to understand what the most important piece in your business is. I know in my
work what the most important information is that needs to be captured in real time, when the customer uses it. Okay. We need to show it. But the drill-down part, that's where, say, the customer consumed one credit, but that one credit could be leading to a thousand events in the back end. Okay? Collecting a thousand events might take, let's say, some time to process through the pipeline, but reporting that one credit is consumed might be something that you want to deliver immediately. So you need to understand what is most important to your business in, I would say, the unit economics.
Different organizations have different currencies, like Google has Google credits. I only know about that from my personal use; I haven't used Google products directly for professional use. But you need to understand what the most important unit is that needs to be delivered immediately. Not all the data is required at real-time speed. That's what I have learned. But as long as you define that understanding, as long as you define those policies with your businesses,
that's the way to go. Okay? Yes. Businesses do understand not everything can be available in real time, but they do understand what the most important thing is for them to see in real time. That can be any kind of trigger: say, if the product is not adopted at the right run rate, or if the product is overused, or if there is a possibility that the customer might go into overage in the next two months, they want to know now. So you need to understand that part of your business: what is the most important unit that you need to deliver to the business immediately.
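One way to picture the "might go into overage in the next two months" signal is a simple run-rate projection. This Python sketch is only an assumption about how such a check could work: it uses a straight-line projection of average daily usage, and the function names and the example numbers are hypothetical.

```python
def projected_usage(consumed: float, days_elapsed: int, days_ahead: int) -> float:
    """Project total usage by extending the average daily run rate.

    Assumes usage continues linearly, which is a simplification.
    """
    if days_elapsed <= 0:
        raise ValueError("days_elapsed must be positive")
    daily_run_rate = consumed / days_elapsed
    return consumed + daily_run_rate * days_ahead


def will_exceed_entitlement(
    entitlement: float, consumed: float, days_elapsed: int, days_ahead: int = 60
) -> bool:
    """Flag a customer whose projected usage passes the purchased entitlement."""
    return projected_usage(consumed, days_elapsed, days_ahead) > entitlement


# 70,000 of 100,000 credits used in 90 days: on track to exceed within 60 days.
heavy_user_at_risk = will_exceed_entitlement(100_000, 70_000, 90)
# 30,000 of 100,000 credits used in 90 days: not at risk in the same window.
light_user_at_risk = will_exceed_entitlement(100_000, 30_000, 90)
```

A check like this only needs the running credit total, which fits the point above that the headline unit should be delivered quickly even when the detailed events arrive later.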
And are there any anti-patterns that teams need to be thinking about and guarding against as they actually work towards implementing this data ingestion and metering system and how the data that you're pulling in for those operational use cases also needs to be thought about in terms of more analytical processes as well?
Anti-patterns? Help me understand a little bit more about this question.
So I'm just wondering, as people are designing the data platform for being able to manage this metering data, what are some of the mistakes or misconceptions that teams have to struggle through to make sure that they're building a system that's able to actually support the business needs and not just be an exercise in, hey, let's do something really fancy, but it's not actually doing what you want.
What I think is there are things that teams should be aware of.
There can be nuances where, you know, when we deal with the data, right, we always assume that the data will be available the way it is. Right? Take an example. Okay. I'm processing the data from the US region. Right? One thing that people should be aware of is that not all the data is generated in the US. Okay? Data is generated from different regions. Okay? So we need to be aware of the time zones, first of all. Okay? The data is coming from different time zones, so when you are processing the data, do you have a reference point for the time zone? Because
it's all about the time zones, or collecting the data with a standard time zone that has been agreed by policy. So that is one simple example that I'm taking, but I have seen that if there is a difference in the time zone that you are using, let's say your US team is collecting data with a Pacific timestamp and someone else is collecting, let's say, in
Eastern time, right, the time zone difference itself can cause either an overlap of the data or a drop of the data. It can happen. So your policy should be defined very well. Okay? How are you going to collect the data if you have this data generated throughout the day? Okay? That's one thing. The other thing you need to understand: what if the data doesn't come on time? You are expecting the data to come now, but the data is not available for the next four hours or twenty-four hours because the systems are down. What are your procedures? What are your policies in the organization to process that data? If your metering is not available,
what is your action plan? Do you want to continue serving your customer? So all of those anti-patterns need to be understood when you design these data pipelines. Okay? When you design these metering platforms, you need to understand those patterns. You need to have clearly defined policies, okay, for every scenario
that can cause this. Take an example of some system sending the data multiple times because the system failed, they recovered from a point four hours back, and they again sent the data which they had already sent. You need to have a process in place to identify
that and do a dedup so that you do not reprocess the data. If you reprocess the same data, you might be overbilling a customer or you might be overestimating the usage. So all of those factors need to be considered when you design a data pipeline.
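The two pitfalls described above, mixed time zones and replayed data, can be sketched in a few lines of Python. This is a minimal illustration under assumptions that are not from the conversation: timestamps arrive as ISO 8601 strings that carry their UTC offset, UTC is the agreed reference time zone, and each producer attaches a stable `event_id` that survives a replay.

```python
from datetime import datetime, timezone


def to_utc(timestamp_with_offset: str) -> datetime:
    """Normalize an ISO 8601 timestamp that carries an offset to UTC."""
    parsed = datetime.fromisoformat(timestamp_with_offset)
    if parsed.tzinfo is None:
        # Without an offset there is no reference point, so refuse to guess.
        raise ValueError("timestamp has no time zone offset")
    return parsed.astimezone(timezone.utc)


def deduplicate(events):
    """Keep the first occurrence of each event_id and drop replays."""
    seen = set()
    unique = []
    for event in events:
        if event["event_id"] in seen:
            continue
        seen.add(event["event_id"])
        unique.append(event)
    return unique


# The same instant reported with a Pacific (-08:00) and an Eastern (-05:00) offset.
pacific = to_utc("2025-01-15T23:30:00-08:00")
eastern = to_utc("2025-01-16T02:30:00-05:00")

# A producer recovers from a failure and resends an event it already sent.
replayed = [
    {"event_id": "e1", "units": 5},
    {"event_id": "e2", "units": 3},
    {"event_id": "e1", "units": 5},
]
billable_units = sum(e["units"] for e in deduplicate(replayed))
```

Grouping by the local calendar date would put the two timestamps on different days, while the UTC values are identical, which is how overlaps and drops creep in at day boundaries. Without the dedup step, the replayed batch would bill 13 units instead of 8.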
And are there any particular technologies or frameworks that you have seen be most effective and useful and easy to work with for teams who are building these metering input systems to be able to feed into operational use cases?
So, generally, it's a decades-old concept, I would say, where you bring the data to a data lake, right, in raw form, and then you transform it, apply the business rules, and then you serve it as a semantic layer; that's called a serving layer. But then Databricks did a very good job, I think they published a paper or something, and they called it the medallion architecture. I like their terms: they call them the bronze, silver, and gold layers.
So the bronze layer is where you bring the data as is, like a data lake; the data is present in raw form. The silver layer is where you apply the business rules, transform the data, and bring it to, you know, a near-semantic layer, okay.
But then you have a gold layer where you bring the data into the form in which you want to serve it to the users. So treat the gold layer like a semantic layer where your business users can come and self-serve if they want to generate the data. So it's a more organized
form of the data in the gold layer. Let's say Hemant is using product one, product two, product three, product four, product five: a simple view of what products Hemant is using. So that is the gold layer concept. So I like these terms, bronze, silver, and gold, that were coined by Databricks back in 2020. That's one of the architectures
that I've learned is working best today, and it's not a new concept. They coined the name, but the concept itself is very old, I think decades old. Ever since I've been in the data industry, I've heard about this.
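As a toy illustration of the bronze, silver, and gold layering described above, here is a plain Python sketch that uses in-memory lists instead of a lakehouse. The record fields and the cleaning rules are hypothetical; only the three-layer idea and the "which products is Hemant using" gold view come from the conversation.

```python
import json

# Bronze: raw records stored exactly as they arrived (here, JSON strings).
bronze = [
    '{"customer": " Hemant ", "product": "product-1", "rows": "10000"}',
    '{"customer": "hemant", "product": "product-2", "rows": "2500"}',
    '{"customer": "hemant", "product": "product-1", "rows": "500"}',
]


def to_silver(raw_records):
    """Silver: parse, clean, and type the raw data (illustrative rules)."""
    silver = []
    for line in raw_records:
        record = json.loads(line)
        silver.append(
            {
                "customer": record["customer"].strip().lower(),
                "product": record["product"],
                "rows": int(record["rows"]),
            }
        )
    return silver


def to_gold(silver_records):
    """Gold: a serving view of which products each customer is using."""
    products = {}
    for record in silver_records:
        products.setdefault(record["customer"], set()).add(record["product"])
    return {customer: sorted(used) for customer, used in products.items()}


silver = to_silver(bronze)
gold = to_gold(silver)
```

Because bronze is kept untouched, the silver and gold layers can be rebuilt from it whenever the business rules change.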
The other piece of using this metering input as the unit of cost for customers is that it has to be highly reliable. And so I'm wondering how you've seen teams deal with outages.
What impact does that have in terms of your ability to accurately bill the customer, and what are some of the ways that things like out of order or late arriving data need to be factored into the downstream processing to ensure that you're accurately calculating how much actual usage is happening versus just fuzzy answers?
See, late-arriving data is inevitable. If you have worked as a data engineer for one year or even three months, you will know the first thing is that data never comes on time every day. If you have thousands of pipelines built, there will always be one pipeline which is lagging behind. So late data is inevitable,
especially with all the monthly billing or external implementation partners involved. You are working with so many different partners to implement your products for your customers. So data can come late, okay, especially if you are operating in different geographies, as I talked about, right, different geographies, different time zones. So there is a possibility that your same customer has users in the US and users in India, and they are generating data at their own speed in their different time zones, so there can be SLAs defined for data collection in each region, which can impact your usage reporting. You need to factor in all of that. The best way to deal with this is to define the data reliability
and data timeliness requirements. Okay? You define your data reliability procedures.
Okay? Your data is arriving late. Okay? Do you want to wait or do you want to still go ahead? It's a decision you have to make. And that decision cannot be made manually. You need to have systems make those decisions automatically based on the policy that you have defined. Because if one pipeline has stopped, you cannot stop the whole process. You need to understand what the impact of this pipeline is on the overall process. But there are some critical pipelines. There could be a pipeline which is bringing the customer data in. If it's customer data, you cannot bring the metering data
before the customer's data, right? So some metering data can wait, but the customer record cannot wait. So you need to understand the criticality of each pipeline and define procedures for what can wait and what cannot wait, because late data will always be there.
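The idea that the wait-or-proceed decision should be made by the system from a predefined policy, based on each pipeline's criticality, might look something like this Python sketch. The `Decision` values, the pipeline names, and the specific rule (hold when a critical input is late, otherwise run and backfill) are assumptions chosen for illustration.

```python
from enum import Enum


class Decision(Enum):
    RUN = "run"                            # every input has arrived
    RUN_AND_BACKFILL = "run_and_backfill"  # only non-critical inputs are late
    HOLD = "hold"                          # a critical input is late


# Hypothetical policy: pipeline name -> whether downstream processing
# is allowed to start without it.
PIPELINE_IS_CRITICAL = {
    "customer_records": True,   # metering cannot load before customer data
    "metering_us": False,       # can arrive late and be backfilled
    "metering_india": False,
}


def decide(policy: dict[str, bool], arrived: set[str]) -> Decision:
    """Apply the policy automatically instead of deciding by hand."""
    late = [name for name in policy if name not in arrived]
    if any(policy[name] for name in late):
        return Decision.HOLD
    if late:
        return Decision.RUN_AND_BACKFILL
    return Decision.RUN


all_arrived = decide(PIPELINE_IS_CRITICAL, {"customer_records", "metering_us", "metering_india"})
metering_late = decide(PIPELINE_IS_CRITICAL, {"customer_records", "metering_us"})
customers_late = decide(PIPELINE_IS_CRITICAL, {"metering_us", "metering_india"})
```

Encoding criticality as data means the policy can be reviewed and changed without touching the decision logic.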
Because of the fact that there is potentially complex processing happening on these data assets as they're being ingested and enriched, what does the process look like for somebody to onboard a new product or a new price category for being able to ensure that that is accurately represented without necessarily requiring days or weeks worth of engineering effort.
Understood. So let's take an example: I'm introducing a new product or a new feature. How are you going to manage or define your systems on demand? Generally, the way I have seen it is, you know, if we use the analogy of how we use electricity at home, right, in California we have PG&E. The way they measure our consumption is: is Hemant charging his Tesla during these hours, or is Hemant just using a fan at home? All of this is metered differently, okay? There are different units consumed. The concept is not very different in the B2B model. If you are using different products, each product has a different meter ID. That's my way of seeing things. Each product, each service, has its own meter ID. Now you have to define your systems accordingly on your metering platform, or however you are capturing the usage data. Usage data is captured as events, right, whether that's how many rows were processed or how many API calls the customer made. You need to have those dynamics captured somewhere, but all of this has to convert to some unit economics.
And when you want to convert it to unit economics, that's where the metering comes in. Metering transforms it: okay, 1,000 API calls equal one unit; 10,000 rows equal one unit. Right? So whenever you introduce a new product, or a new meter ID comes into the picture, you need more configurable systems, somewhere you manage this mapping of raw consumption to the unit economics.
It should be a configurable structure. Someone can go and define the unit economics in a table or a file or whatever you want to call it. So it has to be more like a declarative or configurable interface, where you say: for this event type, this is the value metric, these are the dimensions. And the versions can keep changing. Earlier, you were saying 10,000 API calls equal one unit, but now you say, no, the COGS is very high, so 5,000 API calls equal one unit. COGS keeps changing, so you also need to capture the versioning of your metric. Maybe for current customers you want to keep version one, but for new customers you want to offer version two of your metric. All of this has to be configurable. That's why we say the metering and the rate card go hand in hand. The rate card is another aspect of the metering that you need to consider: how you are going to measure your customers' consumption. You don't want a hard-coded type of system where you say, apply this hard-coded formula. No. The numbers have to sit in a parameterized table that can be used by these systems or the data engineering pipelines.
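To make the idea of a versioned, parameterized rate card concrete, here is a minimal Python sketch. The conversion ratios come from the examples in this answer (10,000 and later 5,000 API calls per unit, 10,000 rows per unit); the meter IDs, customer names, field names, and the in-memory lists standing in for a table are all hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class MeterRate:
    meter_id: str       # one meter per product or service
    version: int        # rate card version for this meter
    event_type: str     # the raw event being counted
    raw_per_unit: int   # how many raw events equal one billable unit


# Hypothetical rate card; in practice this would be a parameterized table or file.
RATE_CARD = [
    MeterRate("api_gateway", 1, "api_call", 10_000),
    MeterRate("api_gateway", 2, "api_call", 5_000),   # repriced after COGS went up
    MeterRate("etl_engine", 1, "row_processed", 10_000),
]

# Hypothetical mapping of (customer, meter) to the version they were sold.
CUSTOMER_VERSIONS = {
    ("existing_customer", "api_gateway"): 1,
    ("new_customer", "api_gateway"): 2,
}


def lookup_rate(meter_id: str, version: int) -> MeterRate:
    """Find the rate card row for a meter and version."""
    for rate in RATE_CARD:
        if rate.meter_id == meter_id and rate.version == version:
            return rate
    raise KeyError(f"no rate for meter {meter_id!r} version {version}")


def to_units(customer: str, meter_id: str, raw_count: int) -> Decimal:
    """Convert a raw event count into billable units for this customer."""
    # Customers without an explicit entry fall back to version 1 (an assumption).
    version = CUSTOMER_VERSIONS.get((customer, meter_id), 1)
    rate = lookup_rate(meter_id, version)
    return Decimal(raw_count) / Decimal(rate.raw_per_unit)
```

With this shape, onboarding a new product means adding a row to the rate card rather than changing pipeline code: 20,000 API calls come out as 2 units for the existing customer on version one and 4 units for the new customer on version two.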
And then in terms of the actual collection of the events or units within the actual application or service delivery, what are some of the patterns that are necessary to ensure the reliability and accuracy of the capture, not even just the downstream processing? So for instance, are you relying on things like log records from a web proxy? Are you using things such as OpenTelemetry or Prometheus metrics to collect those details? Do you have to write logic into your application that emits a specific event or signal and either sends that via a REST request or logs that into a Kafka queue? Just what are some of those integration patterns at the application and service level that help feed into this metering capability?
I would say it's a combination of things, right? As I said, metering has different aspects. One is who you are serving: is it the customers who are buying your products, or is your customer your internal sales team, your customer success teams, or your product strategy teams? You have different customers, and you need to understand all of that to decide how you are going to capture the metering usage.
A product could be generating a lot of data throughout the day. It could be publishing events to a queue. So it all depends on different mechanisms. You can consume the data published by the products to understand what the customer is using. But then there are also a lot of logs generated by the product, and you can use those logs for your internal analytics purposes.
Because products might not be publishing everything they generate. That would be an overhead on the product. Out of a thousand events, a product might publish the 100 or 150 most critical ones, the ones most important for serving your immediate needs, but then there are logs capturing everything, every single action. If a customer navigated through five pages while using your product, it logs all five pages. So logs have much more detail, and you can define different processes, whether you want to use a push mechanism or a pull mechanism. When the product is pushing the data, you can just consume it to serve your immediate customers: one hour ago you were using five credits, now you're using six. But if you want deeper analytics into the customer's consumption pattern, you need all of those logs. You need to pull them in and build your data warehouse, as in the medallion architecture I talked about, and bring all of that into your gold layer to understand the customer's consumption in depth. A lot of the time, I've seen people use batch processing for the actual logs, because processing logs in real time can be really expensive, but it all depends on the use case. In my experience, 80 to 90% of the time the logs are processed in batch, which is the best way to manage the cost, but it varies from organization to organization and use case to use case.
And as you have been working in this space and working with businesses that are oriented around these usage-based economics, what are some of the most interesting or innovative or unexpected ways that you've seen them implement that metering, ingest, and reporting capability?
One thing is, you know, when we are in this usage-based economics, one important question is why you are doing it. You need to ask that basic question, and the answer is that you want to sell your product in a better way. That's the overall goal. That is where the sales teams come into the picture before the renewal, with what I call a milestone nudge.
How can you nudge? Okay: hey, this customer has been consuming our product at a higher run rate. Let's say a customer purchased 1,200 units of the product for twelve months. You would expect them to consume around 100 units per month. But let's say the customer used 80 units in the first month, because it takes some time to scale their architecture. In month two they consumed 100, in month three they consumed 110, and then in month four they suddenly consumed 140 units.
So that's where you think, okay, the customer's run rate is high. If they keep consuming at this run rate, instead of using up the product over twelve months, they might end up using it up in seven months. But you don't want to wait until month seven to let your business know that this customer will be in overage.
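One simple way to turn that run-rate reasoning into an automated check is sketched below in Python. It assumes the customer keeps consuming at the most recent month's rate, which is only one possible projection method; the function names are hypothetical, and a real system would likely use a trend or forecast model instead.

```python
from math import ceil
from typing import Optional, Sequence


def projected_exhaustion_month(
    purchased_units: float, monthly_usage: Sequence[float]
) -> Optional[int]:
    """Return the 1-based month in which the purchased units run out.

    Assumes future usage stays flat at the most recent month's rate.
    Returns None when there is no usage to project from.
    """
    consumed = 0.0
    for month, used in enumerate(monthly_usage, start=1):
        consumed += used
        if consumed >= purchased_units:
            return month  # already exhausted in an observed month
    if not monthly_usage or monthly_usage[-1] <= 0:
        return None
    run_rate = monthly_usage[-1]
    remaining = purchased_units - consumed
    return len(monthly_usage) + ceil(remaining / run_rate)


def needs_overage_nudge(
    purchased_units: float, monthly_usage: Sequence[float], term_months: int
) -> bool:
    """True if the units are projected to run out before the final month of the term."""
    month = projected_exhaustion_month(purchased_units, monthly_usage)
    return month is not None and month < term_months
```

With the figures from the example (80, 100, 110, and 140 units against 1,200 purchased), this flat projection lands in month ten. Reaching the seven-month mark mentioned in the conversation would require usage to keep accelerating, so the exact month depends on the projection method. Either way, the units run out well inside the twelve-month term, so the nudge fires in month four rather than when the overage actually happens.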
You want to tell them now. You want to predict. That's why I call it a milestone nudge. You want to let your business know that this customer is consuming at a higher run rate, and that there is an 80 to 90% probability they are going to hit overage in the seventh month of a twelve-month subscription package. That's one thing I have seen. Similarly, on the other side, if a customer purchased your product but is not using it enough, there is a possibility you might lose that customer when the renewal comes. It's a low adoption risk. So you also want to let your business know that this customer has been using the product less, so let's figure out why. These are the more innovative things I've seen happening in the consumption-based economy. Businesses know beforehand. They are not reactive; they are proactive in working with their customers.
And then the other thing is, you know, we have all heard about freemium models. A lot of products are available for free. If I design my website, the first seven days are free, and I can add my credit card or skip adding it. Businesses want customers to enjoy the product, and once customers get more awareness of it and start enjoying it, then you want to sell to them when they want the upper tiers or more features. So all of this innovation that we have seen in consumption-based pricing, how freemium models are evolving every day, with free limits on the product and then higher bands or more features, is coming into the picture.
One other thing I would bring in is from the COGS perspective. When you know your customers' usage across different geographies, you have a way to do capacity planning. If you see that, say, GenAI is demanding more resources, or customers are using the product more, you can plan your infrastructure accordingly by looking at the trends in the usage.
In your own experience of working in this space of operationalizing these usage based economics and services, what are some of the most interesting or unexpected or challenging lessons that you learned in the process?
See, one thing is, as I said, consumption is your currency in the B2B business. So one big lesson is the cultural shift. Why I say that is because consumption is now driving a lot of things. So your finance team, your product team, and your engineering team should all own the metering. It's not just an engineering function where someone says, bring me the metering data. No. Metering is everybody's business now. Your finance team, your product team, and your engineering team are equal partners in the metering. It has to be co-owned.
If one of them opts out, you get brittle pricing, you get duct-taped pipelines, and sometimes escalations. So I think there needs to be a cultural shift where finance, product, and engineering come together and own the metering. Technically, the hard part is not just building a data pipeline. It's keeping up with how the value metrics change every day and how hybrid models are coming into the unit economics.
So considering all of that, I think co-ownership of the metering is one big lesson that everybody should learn in consumption-based economics.
For people who are thinking about business models and the ways that they want to deliver their products, what are the cases where you would say that either a consumption based business model is overly complex or cases where the expense of building a reliable metering system outweighs the potential benefits of the service being provided?
I would say a usage-based model is not really applicable to everything. You need to understand your business economics. It's not a silver bullet; not everybody should have usage-based pricing. Sometimes the usage is hard to explain, or it's low-frequency usage, or the usage is only loosely correlated to the value that you are creating for your customer. In these cases, you don't need usage-based pricing, I would say, and it can go wrong if you do usage-based metering. And sometimes customers are very strict on budget. If your customer is very strict on budget, do you still want to sell a usage-based license to them? You have to think about whether you want to lose that customer. I have seen organizations that are fifty-fifty, half on usage and half on license. So it's more about how best to position your business, because sometimes even the big names have tight budget limitations. So it's always good to have both, a hybrid model, when it comes to pricing: selling a license versus selling usage.
I think a hybrid choice is the best way to go, but selling only usage-based pricing can go wrong in some instances.
And what are some of the predictions that you have as far as the future of either business types or service types that would be amenable to this consumption based economics or maybe some evolution in terms of the off the shelf capabilities of being able to just pick up a usage metering system and plug it into your application?
I think consumption systems, or consumption architectures, will be designed more like financial systems. The kind of governance we put into tracking every dollar or every cent, the same kind of focus will go to consumption architectures too. We'll have more governance and more spending around consumption architectures, rather than seeing them as just data pipelines or logging stacks. Organizations are realizing that tracking the customer's consumption is the key to the business. So we'll see more of a converged usage ledger type of pattern, a type of thinking that can serve billing, analytics, and AI use cases all together, with strong governance and programmability.
And one other thing is that more products and more businesses are going towards consumption-based pricing, and I've seen it changing every day. People are coming up with different concepts, like pay-as-you-go or tier-based pricing: if you think your usage is going to be 100 users, this is the price, but if you think you'll have fewer than 10 users, this is the price. So usage-based pricing is evolving, a lot of businesses are going towards it, and that's the prediction I'm seeing.
And are there any other aspects of your experience working in this space or just the considerations around how to build an effective business model around this consumption based pricing that we didn't discuss yet that you'd like to cover before we close out the show?
I would say one thing is, you know, if we go twenty years back, people used to follow a waterfall model. There was a project manager, there was a business analyst assigned, and then as the technology companies started taking over, back in the 2010s, we started seeing product thinking evolve, and people started adopting agile a lot. Now we are in this situation of platform thinking, where people think about delivering platforms that can solve future use cases with less effort. So we are moving towards this, but in terms of all the data that we are collecting in usage economics, what I still say is that we need the thinking of building a truly integrated control plane for the data, where all the contracts, semantics, quality, lineage, cost, and policies come together in a coherent manner to answer the questions. I have seen that even the best organizations have these gaps, where the policies are sitting somewhere else and are not 100% integrated with the data ecosystem. So there are still gaps, and the primary reason, as I always say, is data governance. You need strong data governance around your data. As long as you can enforce that, you can build a more efficient and, I would say, more powerful data ecosystem. So I think that control plane, bringing everything together in one single view, is something that can get better. That's the gap I have noticed.
Alright, well, thank you very much for taking the time today to join me. For anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. I appreciate you sharing your thoughts on these consumption-based services and all of the technical complexity that comes along with them. So thank you for that, and I hope you enjoy the rest of your day. Thank you, Tobias. Thank you for having me.
Thank you for listening, and don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. And the AI Engineering Podcast is your guide to the fast moving world of building AI systems. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story. And to help other people find the show, please leave a review on Apple Podcasts and tell your friends and coworkers.
