Inna Tokarev Sela on Approaching Data Challenges with Generative AI

⁠¶ Ina Tokarav Sala: CEO of Illumix, AI readiness pioneer.

00:00

Welcome back to Data Driven, the podcast that peeks into the rapidly evolving worlds of data science, artificial intelligence, and the underlying magic of data engineering. Today's guest is someone who's redefining the rules of the game in AI and data, Ina Tokarev Saale. She's the CEO and founder of Illumix, a company pioneering the use of generative semantic

00:22

fabric to make organizations AI ready. We'll dig into how Ina's background as a frustrated data user sparked her innovative journey, why 80% of enterprise decisions still aren't data driven, and her bold vision for a future with app free workspaces where AI copilots handle the heavy lifting. Oh, and we're tackling the ultimate question. If the future is already here,

00:44

why does it still feel so delightfully chaotic? Sit back, grab your favorite coffee mug, or a Maryland state flag one if you're feeling fancy, and let's dive in. Alright. Hello, and welcome back to Data Driven, the podcast where we explore the emergent fields of data science, artificial intelligence, and, of course, it's all made possible by data engineering. And with me today is my most favoritest data engineer in the world, Andy Leonard. How's it going, Andy? It's going well,

01:12

Frank. It always warms my heart when you introduce me like that. Well, you are my most favorite data engineer. Well, that's cool. You're well, you're my most favoritest. I like, there's so many things. Right? Data scientist, developer, evangelist. I mean, there's all sorts of cool things that you do. Super, certified person. What are you up to in certifies in certification? 12. Wow. Yeah. I'm in I'm in the New York City area code now. So that's good. Next

01:42

up, the Bronx area code 718. So Wow. That's a big jump. Yeah. Yeah. We're we're working on we're working on it, and I'm at 760 some odd consecutive days. I'm at the point now where when I post anything on about Pluralsight or, my number, the search or the number of days, Pluralsight always sends me a congratulations, Frank. Keep going. So, like, I'm on their radar now. So which is really nice. I don't know. It's super cool. Yeah. It is super cool, which reminds me

02:10

I still have to do 2 days. But in the virtual green room, we were talking about coffee mugs. We were. And, we're we're I don't have a coffee mug with me today, but, there's an interesting anecdote from a previous show, which I think the show is live now, about the Maryland state flag coffee mug, which is, pretty funny. So today we have with us a very special guest,

02:35

Ina Tokarav Sala. She's the CEO and founder of Illumix, and a pioneer of generative semantic fabric, which I wanna know more about that, but it empowers organizations with AI readiness throughout her career leading data products, monetization, and as a data stakeholder. Ina recognized the oxymoron of our domain. Despite huge investments in data and analytics, most business decisions are still not based on these data or insights. And when I read that, I felt that one.

03:11

So she, she works she founded this company, Lumix, which is, the the byline says, get your organization data generative AI ready. So So welcome to the show, Ina. And, tell us about this. Like, because I think this is a big problem with generative AI. Well, first off, let's tackle the big one, which is the idea that despite all this money that's been thrown at data and analytics for at least 2 decades, probably longer, a lot of decisions are not data driven. Yeah. Fine. Can you hear me? Because

03:48

I see a little bit Yeah. We can hear you. Okay. So yeah. Thank you. You're totally right. The the benchmark says only 20% of decision making in enterprise is based on data. And to me, I I have been around for a while. So 25 years in data analytics, and it was always about cloud, big data. But what it actually boils down to? Are you able to pull out whatever analysis of data you need when you have, like, question on hand? Not really. And this is a situation in majority

04:22

of enterprises, right? Even if those huge data teams and huge investments in infrastructure and all of that. And to me, the biggest promise of of LLMs in enterprise setting is to to bring the contextual and relevant data to the stakeholders in need. Right? In this experience which is impromptu which means it's improvised, it's governed and hallucination free, it's

04:52

transparent. So I I would totally love have to have this experience where I'm in my Slack or Teams, right, and I've been able to to chat with my data copilot and ask a question and get the answer I can base decision happen. Right? Not just an answer. I should be reverse engineering with, you know, bunch of people. Interesting. Interesting. But I don't think that I think that the companies, they they they they throw a lot of data. They store a lot of data. They

05:26

analyze a lot of data. But a lot of at the end of the day, not all decisions, but a lot of decisions are not based on just the direct decision of the data. They're based on quite frankly a lot of it's particularly the higher the, higher the level. Sometimes it's based on what's good for the person, not necessarily the organization or the business, let alone the customer. Do you think what are your thoughts on that? I'm familiar with the saying,

05:51

if you touch your data long enough, it will confess. That's right. It goes exactly to the domain.

⁠¶ ROI and data are crucial for decisions.

05:59

So I guess you can you can massage the results

06:03

right? But, secondhandly, when an employee comes to me with suggestion with a business plan with, you know some project I always ask like what's the ROI like what's it going to be to spend and what's the impact on on you know other activities and and what it's going to be on expense of so having numbers having data to you know to the basic decision or to bring to your boss is always has been a struggle and it's still struggle today so I think it overweights maybe some you know,

06:36

reluctance to have open data for all just for the sake of of being able to to have specific context on it. Interesting. That that is very interesting. And, you know, that I think that's been the the purpose of a lot of data driven activities in in corporations globally is, you know, and for a very long time is how do you convert data in its raw natural form into

07:05

information? Mhmm. And, you know, and and defining information as, something I can glance at and know, you know, almost instantly how my enterprise is performing. And that was kind of my opening line 20 years ago when I started in data warehousing is to go talk to a decision maker, CIO, CEO, and, you know, try and do a very small, project, a phase 0. And just ask them that, how do you know?

07:39

And the surprising answer, yeah, even then it was surprising, was, you know, something along the lines of, well, people email, information to a lady out front or a secretary assistant guy out front, and he or she compiles it and puts it into this summary, and then they tell me. And so, you know, 1 PM every day or, you know, Monday on 1 PM. I know how we did last week. Something like that. It's very manual processes. So does does Illumix, address that? The

08:18

manual part? Yeah. Yeah. Totally. So I don't think reports will go anywhere, but I think we'll have, you know, at least 3 types of experience with data. So I do I do believe in application free future where you have a question or a task and then you have a launcher and you just, you know, articulate whatever request you have. And in the background whatever applications, workloads, and data have been engaged with each other to to basically come up with the

08:51

results. Right? So I do believe in this future. Right? So this is the ultimate. Right? But I think we will have this intermediate

⁠¶ Intermediate stage: copilots, insights, static dashboards persist.

08:59

stage where we'll have a lot of copilots or assisted insights in, in the context of applications you're already using. So using your CRM systems, you will have all kind of insights, suggestions, you know, data driven, actions which which might come up with the system in your

09:19

workflow inside your context. Right? And you might have to have this pure experience when you do go to analytic systems like BI or something else where you do have your static dashboards, day after day, same way that I go to, you know, to to my CRM dashboards and see how pipeline is going and all of that. So I do not them need to them to change. Right? I don't want to go to some chatbot and and ask again and again the same question, like, what's the pipeline

09:45

conversion today? Right? I do want to have those static dashboards where I just, you know, sneak peek and see if everything in line and we we in the benchmark. So those three types of experiences, I do not think they're going to to evaporate in the future. Right now, we are mostly bound to the last type of experience of being in the closed garden of our BI tools, like this 3 modeled analytic experience and then we'll have this

10:12

phase where we do have embedded experience. Majority of the companies are already suggesting some kind of improvements in the space, some better, some halfway, let's say. And and the ultimate goal is to to have this launcher when for for majority of ad hoc task of questions, you will have this improvised experience. So a follow-up on that. You mentioned Copilot, and, Microsoft has been the company that I've heard using that term most

10:41

often for some sort of digital assistance. It to me, outsider looking in, although I I use the tools, it it seems to have been a quantum leap, this year in that technology. It just seems like last year, they were talking about things that it might help with, and I've seen all sorts of examples of this. But have you seen that? Has that been your experience that in the last 12 months, these type of assistants have just, you know, taken a giant step forward?

11:11

Mhmm. I will address this question together with the previous one, like, how

⁠¶ Illumax targets structured data market, unlike others.

11:15

Illumax is is positioned in in this context. So I do see many projects in the companies which, and mainly, they're providing copilots, for call centers or support centers and mainly based on document summarization. Right? So document summary is more, lightweight and and risk averse use of LLM technology where I can actually go and check the document itself based on the resource. Right? So it's kind of and documents are already articulated with lots of context in

11:54

business language. So it's kind of low hanging fruit and majority of the companies go to the direction including, Microsoft. Where Elamax goes Elamax actually, tackles the market which is less, less digested, the market of structured data. So you mentioned you started your career in warehouse and, so warehouses, databases, data lakes, business applications such as supply chain, ARP, CRM, and all of that. All of that

12:24

con defined as structured data space. And despite the name, it couldn't be less structured than it is at the moment. Right? So you have If it is structured, it's not structured the way you need it. Yeah. Exactly. So the nay namings are not meaningful, like abbreviations, frank table, or for like abbreviations, the, frank table or and this transformation or alias. Right? So all those weird names especially under SAP systems. I love that and and no

12:52

single source of truth. Right? In documents, you might have versions, but you do still have some alignment to single source of truth. In data, you can have many definitions even in the same data source. And the thing is, if you put semantic models like semantic search on top of them and it works by proximity, you might have hallucinations and random answers every time you engage

13:15

with the tool. So this this is where we chose with Illumix to to tackle the problem as, basically, defining as a 3 step approach. Right? The first step is getting data AI ready. So there is no yeah. There is no way of using generative I or AI analytics in general if you do not have other data. But for analytics, which is served to you as BI dashboard, it's actually feasible to do manual data massaging. Right? Well, fun. Yeah. Yeah. That's fun. That's near and dear to my heart as a as a data

13:51

engineer, data quality. Because you can have the, you know, the fastest, best presentation, the slickest graphics, and it could be totally lying to you. And back, you know, even from the days of of data warehousing all the way through today's semantic models and dashboards, it's a the the quality of the data store you're reporting against, That that data quality, if you were to measure it, you know, there's a number of ways to do it. But it's well north of

14:25

99% of that. And people see that, and they go, wow.

⁠¶ Bad data skews predictive analytics, causing errors.

14:29

That that's super good. And it's like, no. No. It didn't. You can't do predictive analytics off of something that's 99% because that that 1% of bad data or incorrect data or duplicate data will skew your results. And what often, you know, the the layperson doesn't understand is that if it lies to you and tells you you're gonna make a $1,000,000,000, that's just as bad as it telling you you're only gonna make a $1,000,000 if the if the truth is you're gonna you're at about 25,000,000.

15:01

That's your real projection if you were to follow that line out and do the extrapolation, you know, properly. And you can make bad decisions with an overestimation just as easily, maybe more so than if it's an underestimation. Yeah. Exactly. So this goes to to, to the ground truth of your results as good as your data is. And you cannot trust, simple semantic search

15:27

to solve all these problems for you. And so for us, the baseline, the first use case is to get data AI ready or generative AI ready And we do use generative AI for that from day 1. We actually generated company from 2021. Yeah. It's funny to say now. It it was very hard to explain to our investors back then what it actually means. Yeah. You know, I I get it. I mean, if you build on a crooked foundation, you you can't get anything straight, you know,

15:57

out of that. So that makes perfect sense to me. And it and, please correct me if I'm mischaracterizing, the work that Illumix does. But is it automated, AI automated, data quality? Is that really what you're after? So, basically, we automated full stack of LLM deployment for structured data, and it takes the AI readiness part. AI readiness, which means we have automated reconciliation, labeling, sensitivity tagging Okay. Like lots of lots of data preparation which is automated.

16:32

Gartner actually named us as a call vendor for that lately. We have this layer of a context automation. Right? So so any LLM, any semantic model needs context and this context and reasoning usually rebuild by data scientists. To me, it's controversial because, you know we had data modelers which didn't understand business logic and now we have data scientists who do not necessarily fully understand business logic and the model into black box experience of context. Right? So ElamX

17:01

reverses process. We actually automate context and we wrap it up in augmented governance workflow so business people or governance folks can actually certify it. So it's auto generated context for LLMs but certifiable by humans. We do believe that we need to bring human to the loop, right, to to certify it. Yeah. And the last I love I'm sorry. I have interrupted you, like, 3 times now, and I apologize. I haven't met 2. I thought you paused. So finish please finish your thought.

17:31

No. No. I'm saying, like, 3 parts. So you already did data governance and the actual alarm deployment because you need to interact with the whole thing, and the interaction to have to has to be explainable and transparent. You need to understand how, especially on structured data, you need to understand how the question was calculated based, sorry, how answer was calculated based on questions and how, data was actually sourced, what's the lineage, what is the governance and access

17:57

control through search your clients. So all of that should be on the interaction layer. So AI readiness, governance, and the interaction layer explainability to the end user. Absolutely. Okay. Thanks. And I do apologize again for the interruption. So my my characterization of it as something that's just data quality is is way low. There's a little bit of overlap between

18:20

data quality and what you're describing. You're talking about taking this into that next level that is specific to, generative AI and perhaps other, you know, AI related, AI adjacent technologies, machine learning leaps to mind and stuff like that. But your the tagging, the categorizing, and all of the things you're describing there, that is next level. And it's very interesting to me that you're using AI to get data ready for AI.

18:51

That's an interesting combination. Mhmm. It makes sense, though. Right? You can kinda scale out human capability with AI. I think that's you you kind of alluded that with Newman in the loop. Right? Like, I think I think where you were kinda going with that, again, don't wanna speak for you, but it's like the idea that AI isn't gonna replace humans. It's just gonna make humans more productive. Yeah. For sure. Augment us because frankly speaking, no one wants to to model data, you know, as their

19:20

career. We want to solve problems. Right? And to solve problems, we we have to to understand what the problems are And letting AI to surface the problems as alerts and for us to to resolve them as conflicts takes, you know, 1% to 10% of the time that it should take, where we are busy, you know, wrangling data still. And, you know, it's sad to some extent because data is growing and we cannot keep up.

⁠¶ Data modeling efficiency increases with virtual assistants.

19:48

No. That's a good point. Even if even if there are people out there and some of our listeners may really do like modeling data. Right? But, you know, Dow, they can model about 10 times the amount of data or maybe a 100 times more. Right? And then ultimately, the expectation of what a, you know, what a person can do in a set period of time is gonna go up just

20:09

because I I I think I think you're on to something there. Plus, I also I would also, like, double click on the idea that you said earlier, which I think was very intriguing, was this notion of a lot of the apps that you use would kind of fade away. You just have this virtual assistant. You know, I I think back to there's a number of scenes in, you know, Star Trek The Next Generation where they have a conversation with the computer. Right? Mhmm. You know, you they

20:33

don't they use the computer. They get stuff done. There's no Microsoft Word. There's no PowerPoint. Right? Like, there's no, like, it's just the the there is no application. The application is kind of invisible. It becomes the computer. And I think that's a very intriguing kind of way. And if you had told me that a year ago, I would have been very skeptical. Now I look at it, I'm like, I mean, it's it's it's almost inevitable. Yeah. Yeah. I agree with you. Futures here,

21:02

it's not evenly distributed as people say. So I guess, you know, when you're attending conferences in Bay Area, it's already it's already here. It happens. Right and when you go to let's say Europe we even just say you know just say a EU act in Europe is is ramping up so it's all about controls and and this is great So I do not think that regulation and innovation, actually, jeopardize each other. I think they should go hand by hand and, that's where I see

21:36

industry is going. So so East Coast approach, majority of our customers are coming from East Coast US, Pharma, financial services, insurance, highly regulated data intensive companies. They have now, sometimes even inventing standards for generative AI implementations because everything is so new but companies want to go fast. Right? So no one wants to to downplay risks on one hand. On the other hand, everyone want to, you know, to implement generative AI

22:10

and see the productivity cuts. It's, you know, it's evident productivity cuts are already here with all those co pilots summarization, what have you and this is where we are today. So I think like again Bay Area running fast and east is coming up with regulation. We will meet somewhere in between. I believe in both. Well, if you kind of,

⁠¶ E-commerce evolution: convenient online shopping preferred.

22:33

like, look at, like, historically, you know, when .coms first started, right, there were a number of, hey. Look. You know, we're gonna sell pet food online. Right? Like, and then it was like, back in the dial up days, it didn't really make a lot of sense. So it would just be easier for me to go to the store. Whereas now, I mean, if you think about ecommerce, obviously, Amazon is the £2,000,000,000 gorilla in the

22:59

room. I like, do I really wanna think about, you know, dealing particularly as we get into the holiday season, do I really wanna deal with the traffic at the mall or the store when I can just click on something, either have, you know, groceries delivered or, you know, I'm I'm okay waiting 2 days for something to come up if I don't have to deal with them all. Yeah. Totally. What's what's the difference between Black Friday and Cyber Monday? No. It's not. Right? Like not really. Yeah.

23:28

Yeah. So it's like Not anymore. I remember Yeah. You know? So we're recording this just before Black Friday. And, you know, this whole idea of, you know, going to the store, get the best deals, it's like, do I really wanna deal with the crowd? No. Yeah. Although ironically, the name for the podcast came on a Black Friday, while I was at a Dunkin' Donuts, drinking coffee, waiting waiting in line actually to get so there's a I'm a Krispy Kreme person. So I'm Ah, okay. Yeah. So With you and

24:03

I, right, definitely. Right here. This is before we had a Krispy Kreme near us. So it's I I have split sides, but yeah. Yeah. Jeff's JT. He's a mess. From up north. So they are they're Dunkin' Donuts. I've noticed this. They're Dunkin' Donuts, like, north of Virginia. And he's in Maryland. I'm in Virginia. Then down south, you rarely see a Dunkin' Donuts. I see more Dunkin' Donuts down south than Krispy Kreme's up north, though, for sure. Yeah. But

24:28

I They're they're from Boston. That's why. Yeah. Oh, that's why. And then So at Krispy Kreme's from Atlanta. And plus, it's funny. Right? Like, so I live in Maryland Mhmm. Which depending on who whom you ask is either north or south. So that's right. That's true. Interesting. Interesting. We're a quarter state for sure. Yeah. That that's that goes safe for Virginia. But I wanted to follow-up on, you know, you've been we've been talking about all the cool stuff. I'm

24:55

gonna try and say this correctly. Illumix. Is that correct? Am I getting it right? So Illumix name from Illuminating the Dark Side of Organizational Data. Illuminate like illuminate. Illuminate. I like that. And x x for the x factor. Excellent. X for the x factor. Yeah. What? And I'm not asking you to I'll just ask a question. What are the risks in in what you're doing? And, you know, what are the risks you're aware of and how are you addressing those? Yeah.

⁠¶ 2025's biggest risk: High generative AI costs.

25:28

So I think the biggest risk of 2025 is going to be, a TCO, total cost of ownership. So already today, it's, it's very hard for organizations to to monitor where the generative AI tokens are spent. And the benchmark say that 80% of LLM tokens actually spend on customization of off the shelf models. And that's not a good news because which means ROI is is pretty low on on the actual production use of generative AI in in enterprise.

26:05

And I think it doesn't get any better because the customizations techniques which are used today gains a black box performed by super expensive data scientists and they're not very scalable for data that you don't want to, you know, to schmooze around. I think it's cost prohibitive actually to bring data to AI. You need to bring AI to data. So so putting data in some graph structures for graph, frog, and all of that, it's to me,

26:32

it's cost prohibitive. So this is why I think that, the Telumex position for 2025 is actually favorable because we bring this transparency. We do create this, a virtual, a semantic knowledge graph, which is transparent to certify, which is created for business people. Based on business logic. We do use extensively industry ontologies and so on so forth. And I think the the most interesting part about generative AI is we do not necessarily going to mimic processes that

27:02

the humans performed. Mhmm. We're going to invent those processes. Right? So new new processes and new workflows. So

⁠¶ Focus on domain knowledge and metadata utilization.

27:09

right now, a generative AI is deployed like like analytics is deployed, which means you you have to label your data, check the quality, usually manually, and then you have to to prepare the test set which is fed into customization of the model and then you actually provide the context to on every question. So this is very old fashioned or, you know, 40 years old machine learning technique to to actually train generative

27:39

vi. So this is why why I'm saying that, many companies are probably going to to mimic what Equinox does in the sense that you have to you have to be focused on domain specific knowledge, reason, ontologies, and knowledge graphs. You have to onboard your customers automatically via metadata because metadata has the factor all activities in organization documented for us. We're

28:04

just under utilizing them, right? And then you bring your business people, your domain experts, your governance teams to the loop because you can simply cannot bring this business acumen, to, you know, to data. You have to bring data to to those people. That's an interesting thing because I've seen the the particularly is this this this statistic around 80% of the tokens are being used to

28:27

manipulate the data. I have a microcosm example of that where I use AI to augment my blog post, my blog that I create, and I finally took a closer look at this because I was spending a lot more on the OpenAI API than I really wanted to. And I'm like, well, what exactly am I I'm using a product called Fabric. And I'm like, wait, what exactly is the source of this prompt? And I look at it, and I'm like, I can't. It's a lot. It's a long prompt. And

28:58

I'm like, I really don't need that. Right? So we are gonna do a deep dive in a show on Fabric at some point. Not not the Fabric Andy works with, but there's an open source thing called fabric. There's a I'm sure there are lawyers right now that are doing their holiday shopping based on how much money they're gonna make off of this dispute. But, the the short of it is, like, I realized, like, well, no wonder why I spent so much money. I'm sending all

29:22

of this in my prompt plus the content. So I actually in the verse before you joined in, Andy and I were talking, and I was like, I actually got a really good result based on a more optimized prompt. You know? And, you know, strictly speaking, it's not I I like your approach of bringing the AI to the data rather than bringing the data to the AI because that is expensive. You know, I I think that bringing the AI to the data will be less

29:47

expensive. How less, I think, remains to be seen. But I like that approach, right? Because that's typically what we've done, you know, and we've seen huge upsides to that, whether it's from Hadoop bringing the compute to the data rather than vice versa. I like that approach. And it's backed by historical precedent. Right? So it's not completely gonna be this crazy idea. It's just a very sensible idea. Yeah. Yeah. I believe the future was already

30:12

invented. Right? So it's just the inclination of technologies we already have. It's been healthy about it. So, we had machine learning practices which are very healthy like feature exploration, feature definitions and then we had neural net brute force and then majority of companies used combination of both, right, to to to be optimized. This is what I think what's happening with

30:34

generative AI. So this, you know, wild west of brute force or great spend is going to be replaced by methods which have, like, this automated context filtering or pre processing and then use like fraction of your budget to to actually run the query. Yeah. I remember hearing about a lot of this in the late nineties. And, I worked for a company who was a big SAP shop. I see you have a history with SAP. Yeah. And

31:01

this lady and and and so we were an we were the IT department. So we were in the basement, but the analytics team back then was in a closed in space inside the basement. So it was like even more like, you know, I was the web developer, so I didn't have a window, but I could see the window about 50 feet away. But, like, when you when when you went into this, like, you know, further enclosed space deeper into the the the the the depths of the IT department,

31:31

there was the database team. And and and and in the back of that area was the analytics group. And I remember this lady telling me that she was working with these things called OLAP cubes. Oh, wow.

⁠¶ Predicting patterns is profound, not crazy.

31:44

Yeah. And I was like, what is that? And then she went on this thing and, you know, I'm remembering a conversation, oh my god, almost 30 years ago. But I just remember walking away with, like, that sounds either crazy because she's talking about, like, you know, figuring out patterns. Right? So, you know, will rainfall patterns in Australia affect not just the agricultural side of the chemical business, but also the plastics purchasing versus rainfall in the Amazon versus this and all of

32:14

that? And I just remember walking away from that conversation as I as I as I as I leave the depths of the IT department back to my normal kinda, basement. Back to the regular basement from the sub basement. I remember thinking that is either the craziest thing I ever heard or the most profound thing I ever heard, which now with the, hindsight of time, it turns out it was the most profound thing. Yeah. You you can think about it as semantic layers of, you know, that era. Right?

32:44

Mhmm. Right. And I think You know go ahead. I'm sorry. Sorry. I think it's delayed between the between the connection. So I think around the same time I was doing my bachelor and my project was about multi dimensional theory. So multi dimensional geometry, of these neural nets. So basically, you model neural nets as multi dimensional graph and it does operational research calculations. So it's exactly the same. You you model your universe in a

33:15

graph. Back then it wasn't MATLAB. We didn't have any, you know, neural nets Right. Structures or graph structures and so you're modeling in MATLAB in this weird language, a graph which has a neural nets on there. And this is exactly like modeling all of cubes. Right? A multidimensional representation of your reality. Now, unfortunately, we have a new technologies which, which are semantic and context. Right? Large language models and graphs, which do the same thing but much

33:48

more efficiently. Yeah. So this is amazing. Like, I think it goes back to what you said. You know, The future's already here. It's just not widely distributed yet, which I think is a William Gibson quote, or is it a Esther Dyson quote? I forgot. But it's one of those 2 kinda luminaries. Yep. You you said what I was going to say, you know, and it was, you know, more of what off of what Frank said is it turns out that we're just doing more nodal analysis and vector

34:22

geometry as a result of that. That's it did all start with multidimensional and and grow from there. And that's where these algorithms, like nearest neighbor originated, was in that math. So Yeah. Yeah. Great minds. Exactly. Exactly. Alike. Exactly. Now you're complimenting me. Thank you. I I feel I feel better when smart people in the room agree with me. No. I'm on the right path. You know, I employ

34:56

millennials. So so having people with experience in multidimensional geometry and all of cubes, it's just a miracle to me to to start with. You know? People now like Python, neural nets, we do actually, the average age in in in Lumex is around 35, 37, something like that. So we do have like also pretty experienced folks, you know, but new talent, they, they they're not familiar with all all of that.

35:22

And I think it's actually a disadvantage because, when when you do know different patterns in architecture Yeah. You can model them with new technology. Right? Make them more efficient, but you already know what works and what doesn't, and it helps. That yeah. That's a great point. The old experience, you know, the experience that we have from doing this for decades is that we see the patterns that have

35:47

repeated over time, architectural patterns and design patterns. And, you know, and we know that they've I I love that how you said that. The, you know, the future's already been invented. We we realize that if we reapply some of these patterns, that there are use cases for them, not just now, but also in the future. So totally get you.

⁠¶ Industry trends are cyclical, like fashion trends.

36:09

Too, you know, like, you know, it it is painful to think that, you know, we've been in this industry for decades. Right? It is a little hurts a little bit. But, like, also, if you're listening to this, you've not been in the industry for decades, and you're thinking like, woah. You know, what are these what are these old geezers now? I would point out when I was a young kid in the industry and, you know, client server was like the new hotness. Right?

36:39

And, you know, the whole notion of going back to, you know, cloud and and and and and, you know, terminal and an old mainframe geezer basically said to me, like, this is just this industry has a cycles. Right? It's like the fashion industry. This goes in style. This goes out style. And it was like, I had that moment of, like, wait. I think he's on to something, but he's just an old geezer, so I won't listen. So, you know, so so if you are a young buck, like, or,

37:11

buck is a male deer, right? What would be a Yes. A doe. A young doe. So if you're a young buck or a young doe, I grew up in New York City. So all of this wildlife thing is brand new. I'm here for you. I'm here for you, Frank. So, you know, listen to, like, some of the things that these, you know, more experienced colleagues will say. Yeah. You know, if you don't believe it right away, just put it on the shelf in your mind because you're gonna need it later. It'll come up at some point.

37:40

And it's like, if you look at kind of, you know, everybody ran to the cloud. Right? And cloud is effectively like a mainframe effectively. Right? The same philosophy. Right? Centralized

⁠¶ Repatriating data due to AI cost efficiency.

37:51

computing somewhere else. Right? And then your browsers become the terminals, terminals with fancy graphics, but terminals nonetheless. Now I think you're gonna start seeing it kind of we're about due for a seismic shift backwards, right, as people kinda move repatriate data and things like that. Particularly, I think driven by AI because of the cost of some of this. You know, I had this debate,

38:14

you know, the other day. It was like, you know, if if one of these super clusters with, you know, a 100, 8 100, all of this, if it costs, say, $500,000, right, I could probably do the math, and that probably means about, you know, there's a certain break even point, and it's probably after about 7 or 8 fine tunings or full on trainings where it's just cheaper to have it. Just buy it. Yeah. Yeah. Yeah. Totally on that. And also, you

38:45

know, salary skills are the most expensive part. So you want to spend it on your business specific problems and not generic problems you can solve with software. Right? So it's always like that. Yeah. Yeah. So, I do think that, basically capacity to process data is is going to be a challenge. Right? And this is why we see that, that majority of, of I would even say countries not only specific enterprises, kind of gear up with, with GPUs, FPGAs,

39:21

whatever hardware you have. Right? So do you see it in middle east, in emirates? They they have national generative vi grid and they're building it for, you know, not only government companies but also private companies. We see the same in Europe and I would assume, you know, US based telcos are going to to provide those data centers with GPU soon enough, right, for, you know, for everyone to purchase as an alternative to the public cloud. Yes. And we'll

39:50

see it. So this is for starters. And second one, the second part where you don't need, this, you know, heavy machinery, you might just have your variables processing parts of whatever generated AI on your end before sending to the cloud because you do not necessarily need to to process everything in a central manner. We basically have pretty powerful machines on our hands or in our hand, you know, as glasses as well. We can see that, and it's

40:21

going to be part of the processing. So the processing is going to be distributed. You bring AI to your data, where your data is. You do not shift your data all the time. It's not, it's not cheap anymore. And we'll have this, as you mentioned, those central repositories of mass processing and those distributed powerhouses which are small enough to to process data on on edge.

⁠¶ Data processing everywhere raises security concerns.

40:47

I think you're right. I think you're gonna see a set of data being processed in one place. I think it's gonna be everywhere. There's gonna be some and and I think that that introduces some interesting, consequences. Right? So my wife works in IT security, and I can immediately hear her voice in the back of my head. Contrary to what you think, ladies, we do

41:05

listen. We just don't always pay attention. But I can hear her like, well, if compute's happening everywhere, gee, couldn't like that be poisoned anywhere. Right? I think I think that's going to be the next kind of thing. Right? It's and it's again, it's a pattern. Right? Advancement. Bad actors take advantage for that. Problem happens. And then then that's the new thing. Right? So it's almost like you're you're building like a, like a like a like a layer cake. Right? Like, you know, the cake

41:33

goes down then the frosting. The cake is the innovation. The frosting is security, and then so on and so on. So Yeah. Yeah. Yeah. So it basically back to the semantics. What we started is semantic ontology as a baseline for generative AI. It has multiple benefits. Single source of truth, of course, has the benefits for accuracy. But also, if you're passing every question to this semantic ontology context, it's almost impossible to poison it because we're going to either

42:03

match to part of your logic or Right. Right. We're going to miss. So it's it's another layer of security if you think about it. So, so yeah. That's an interesting point. All new. Yeah. All new ontology, all new semantics have governance meaning, it has accuracy meaning, it has also security meaning. And also if you want to have single source of truth you have to to have means to distribute it to those edge devices or to to bring it back to central location and without ontologies, without

42:37

semantic layers, simply it's impossible to do that. I was gonna say, like, the the the infrastructure, not just the computer infrastructure, but the logical infrastructure to distribute this stuff, it's probably not a trivial problem. That's the first thing that popped in my mind. I was like, you know, like, oh, yeah. You're right about the distributed activity on this data, but, wow, what does that

42:59

look like? What do updates look like? Like, the whole like, it's a it sounds like a growth industry to me. Definitely. Yeah. Yeah. I don't it's, it's what we call, engineering problem. Right? So creating ontology is data science or generative AI problem, but distributing it, maintaining it, thinking it's its engineering problem. Engineering problems tend to to have engineering solutions. Oh, Oh, that's a good point. That's a good way to look at it. I like that.

43:27

I like that. So did you wanna do the, premade questions? Because we haven't we've gone a few shows without them. If you're okay with those, Ina, we can we can ask them. If not, that's fine too. Of course. Yeah. Sure. Mhmm. So they're not they're not complicated. They're more kinda just general questions. I pasted them in the chat. But the first question and and you've had a a pretty significant career with SAP and and before that. How'd you find your way into this space? Did you find data or did

43:57

data find you? I found my way to data by being frustrated user. Right? So I started in engineering and it was evident to me that using data as engineer is not enough. You have to go to data management. You have to fix those things because otherwise I will I will going to be frustrated for the end of my life. Right? So I went to data management analytics to to solve the problem and I discovered that, as you mentioned, every experience

44:30

has a footprint. So my experience with graphs and with operational research and multidimensional geometry and all of that is so useful for data management. And it was actually exhilarating. That's true. Like and I like that because, like, every experience does leave a footprint. Like, you know, that that's cool. I'm gonna I'm gonna pull that out as a special quote for the episode. That's a great quote. Yeah. So our next question why we do these? Yeah. Is what's your favorite part of your

44:58

current gig? My favorite part of being a

⁠¶ Founder freedom: Experimentation unlike SAP's structure.

45:01

founder is is unlimited ability of experimentation, right? So majority of my day actually say no to things, not to experiment, which is which is hard, which is not fun part, right? But, still, we can make decisions and we can do new stuff every day. So as a founder, it's been very, very different than enterprise setting. And don't don't take

45:32

me wrong. Like, SAP is a huge place of growth and had very, fulfilling career at SAP, you know, building stuff, founding p and l's, running big organizations, but but been able to to actually, you know, start anything new. And, like, right now, we have this customer and they want to to try Illumax on in parallel on the newest, you know, newest BI tool with semantic layer or and on the oldest warehouse on premise at once. I'm like, okay. Challenge accepted.

46:05

Yeah. And next Wow. Yeah. And next day, you know, engineer comes with we have this academic data set and they have these benchmarks. Let's beat them. I'm like, yeah, let's do it. It could be cool stuff. Right? Lovely. So, you know, you know, it's to some extent, so we don't need to justify it, you know, business wise and but but in majority of cases, we can. Cool. We have a couple of complete the sentences. When I'm not working, I

46:31

enjoy blank. I used to enjoy doing jogging and yoga when I'm not working. Right? So right now when I'm not working which means when I'm not traveling I just spend time with my family. Whatever is the plan for the weekend if it's just you know Netflixing, or cooking or hiking whatever is the plan I just join So sometimes just, you know, plan it. But spending time with my family has become, indulgence and I'm

47:02

very focused on that. Cool. Very cool. Our next is I think the coolest thing in technology today is blank. I think the coolest tech is thing right now is not in tech. It's actually the pull from CEOs of companies for technology. This is something which didn't experience for decades. So we were pushing cloud and big data and machine learning and deep learning. We were explaining to business stakeholders why do they need that. Mhmm.

47:31

And now, so you're all coming and saying, okay, I want to have chatbot experience for x y that, so just build it. This is actually I think this is the coolest part because it's kind of a removes majority of the friction that we had to to deploy technology in the past. Interesting. On our 3rd and final complete the sentence, I look forward to the day when I can use technology to blank.

48:00

So many things. You know, travel has been so frustrating lately, and, I don't think what happened because it's like kind of technology goes forward but airline, you know, travel technology, hospitality technology in general, I don't feel it bridges a gap. So I really look forward to the future where I can just have this comment, this prompt of plan, this conference in Dallas on x and the system already knows all by preferences and just done. Oh, boy. It would be it would be fantastic.

48:38

Yeah. That that the travel experience as I I've had to travel quite a bit, like, for the past, like, couple months, and it's just like, oh my god. Like, it never was great, but awful is not a word I remember. But it's post pandemic, I think it's gotten way worse. It's like there's just so many small things that you could be done a lot better. I'm I'm a 100% with you on that one. So true. So our our next question is to, ask you to share something different about yourself.

⁠¶ I'm considered controversial for being very visionary.

49:11

Sharing something different about myself. I think I'm a controversial person in general. So, so some people, so some people agree with, you know, with the degree of, of living in the future. So I, I, you know take myself as person who is very much in the future so all this seed happening and I might be a little bit you know ahead because I see the technology being developed in my mind is already there, it's already

49:38

used right? So and so where this is where I see myself controversial because you know in majority of the cases, then you sit over family dinner and say, you know, we're still paying our bills online when we have this notification. Right? So everyday technology has developed a lot. And when I'm speaking about this application free future and, you know,

50:08

automated, x y zed. Sometimes or many oftentimes on everyday level, we are still not there and this is where people think that I'm too visionary or too too dreamer on that. Interesting. No. I'm with you on that one. Growing up, I was the technical person in the family. So Yeah. They don't they don't know what you're talking about. Right? I I I love how the, you know, or, you know, they all they all get confused until the printer breaks and then suddenly

50:43

But you're the smartest people in the room. That's why you're the smartest person in the world. Alright. So where can people find out more about you and Illumix? I love socializing on LinkedIn. I don't know that many people think LinkedIn became a marketing tool. I still see tons of valuable discussions and I just absolutely love keeping in touch on LinkedIn and and see the latest and greatest and I also share quite a

51:08

bit. So LinkedIn would be the the most straightforward way in Atokaropsala on LinkedIn. We do have blogs and I actually write many of them. So if you go to illumeg.ai/blocks, you will see lots of materials written on semantics, on ontologies, on generative AI governance. So those topics which are close to my heart, and we communicate quite frequently on that. Very cool. Very cool. Very cool. So

51:39

so Audible is a sponsor. And if you would, like to take advantage of a free month of Audible on us, you can go to the datadrivenbook.com. I just tested the link. That's why I was looking over here for anyone watching the video. And it works. Sometimes it doesn't. And we ask, our guests, do you have, do first, do you listen to audio books? And if so, can you recommend 1? If you don't listen to audio books, just a a good book. I do listen to audiobooks. I also podcast, more

52:14

frequently recently. I I'm not sure this book is already on Audible, but, if not, it's going to be in Audible soon enough. So it's Nexus by Yuval Noah Harari. It is audible. I have it in the library already. Yeah. Amazing. So it speaks about the truth

⁠¶ Truth's evolution parallels past technological shifts.

52:32

in the age of generative AI. Right? Interesting. What's the truth? What's the ground truth? And I was actually in the lunch party in SoHo, New York, you know when Yuval was speaking about you know how how technology and what we see right now is not very different from what we experience in you know middle age like when when Gothenburg and printing was was a new thing and like what was printed actually was you know rumors and juicy stuff rather than scientific books and this

53:06

is where what we see right now in, you know, in chatbots and internet, on social overall. So it's it's interesting parallels that he's taking about what's what truth is in generative AI age where what truth were was, like, 20 years ago or even, like, 500 years ago. Yeah. We're the we're the same species with the same problems and the same drama and the same drivers. Like, it's just our tools have changed, whether it's a printing press or, you know, celebrity gossip or whatever or fake news

53:39

or anything like that. Plus, I also think the, you know, there's an old phrase like who watches the watchers. Right? Like Mhmm. Who decides what's misinformation and who decides what's true? I think. I think because misinformation could be, you know, there there's a image of me robbing a bank. Right? Like, you know? Mhmm. Mhmm. I thought, Frank, I thought when the US Marshals put you into the witness protection program, they said we couldn't bring up you robbing a bank any any longer.

54:09

Misinformation. You gotta be careful because, like, one of the things I I wanted the flow was so good. I didn't wanna interrupt it. But, like, one of the things was I was experimenting with fine tuning an LLM locally. Mhmm. And I'm basically trained it on information about my blog. My blog's been around since 1995. Right? Or my site has been around since 1995. One of them hallucinated this really great origin story for my website. It was awesome. It was awesome. I'm like, I like that

54:35

better. So basically, it said that Always. Always.

⁠¶ Frank's World: Kids show on recycling, BBC.

54:39

It was really good. It was basically that Frank's World started as a show, a kids TV show in the nineties on the BBC or channel 4. I forget. Like one of the big British channels. And it was about a talking trash can named Frank that would teach kids about the importance of, recycling. That's my favorite part. And it was and it was the best part was that it was it was the first professional project of the guys who did Sean the sheep and Wallace and Gromit. Yeah. And I'm like so I

55:12

I I pinged the guy I worked with. Has this ever been a show? Because no. Not that I ever heard of. And I looked over it. I couldn't find it. But and then what I did was as an experiment, I fed that that whole paragraph that it came up with into notebook l m. Mhmm. Notebook l m took that and ran with it. There's, like, a 20 minute audio, and it is the funniest thing because it basically talks about the early environmental movement. They said it was the Britain's

55:41

answer to, Captain Planet. Like, they made up all the stuff. And now it's documented. So now someone is going to pulling to pull some information. And if you have Right now it's out there. Right. And I guess to your point earlier about Lumix, like, if you start building a crooked foundation, right, like, that eventually as it moves on, it's gonna so, I mean, who knows, like, couple of years from now, like, Wikipedia may say, like, there might be a

56:08

Wikipedia article about this TV show didn't exist. We're talking about it. We're feeding the machine. That's fascinating. Yeah. And it was a so a little bit on the books. I have to mention it, like, in a couple of sentences. So, in US a legal entity actually is a citizen. It has social number. Right. So, technically machines can create legal entities. They can vote, they can, you know, they can create information and this information is,

56:37

you know, created with social number, with identifiers. So it's actually real information. It's not fake news. It's created by social number. And so this is how you create, like, this new truth. Right? And, and how do you control that? So it's an interesting aspect of what's, what even is defined as ground truth. That's true. Everybody needs to define it. I think that's gonna be the question of the 20 That's a big deal. Mhmm. Yeah. Well,

57:03

awesome. It's been great. We wanna be respectful of your time. This has been an awesome show. Yeah. We'll let Bailey finish the show. And

⁠¶ Thank you, Ina Tokarev Saleh, for insights.

57:10

that's a wrap for today's episode of data driven. A massive thank you to Ina Tokarev Saleh for joining us and sharing her fascinating insights into the world of generative AI, semantic fabrics, and the ever evolving relationship between humans, data, and decision making. If you're as inspired as we are, be sure to check out IllumiX and follow INA on LinkedIn for more thought leadership in the AI space. As always, thank

57:35

you, our brilliant listeners, for tuning in. Don't forget to subscribe, leave a review, and share this episode with your data loving friends or that one colleague who insists they don't trust AI. We'll convert them eventually. Until next time, stay curious, stay caffeinated, and remember, in a world driven by data there's no such thing as a trivial question, just fascinating answers waiting to be found. Catch you next time on Data Driven.

Transcript source: Provided by creator in RSS feed: download file

Episode description

Transcript