¶ Welcome to HockeyStick: Unveiling Generative AI's Impact on Data Analytics
I'm Miko Pawlikowski and this is HockeyStick. Today we're talking about how generative AI is changing the field of data analytics and how you too can leverage large language models as your assistant and co-worker.
¶ Meet the Minds Behind the Book: Diverse Expertise Uniting for Innovation
I'm joined by the three authors of the "Generative AI for Data Analytics" book, now available in early access from Manning.com. Artur Guja, risk manager and computer scientist with over 20 years of experience in the banking sector. Dr. Marlena Siwiak, data scientist and bioinformatician, the co-creator of the first global model of the COVID-19 pandemic, and the co-author of a techno-thriller novel and sci-fi short stories.
And Dr. Marian Siwiak, data scientist, strategist, and bioinformatician, the creator of the first artificial sentience, something we're going to cover in this episode, and the sci-fi novel Pharmacon. Welcome to this episode and thank you for flying HockeyStick. The first thing I thought is that you look like an eclectic bunch. You've got Artur from the banking sector, Marlena the bioinformatician, Marian the data scientist. How did you end up teaming up for the book?
¶ The Genesis of a Groundbreaking Book: Collaboration and Inspiration
We worked together previously, especially me and Marlena. With Artur, we also...
Walking our kids in the park. The three of us used to work earlier on various ventures, on process automation, on business process re-engineering. So this is one of many ventures that we've done.
I see. So you go way back and this is just another project. Just another day.
Funnily enough, it's not like we go 20 years back. We work together quite intensely. The multiple things we did together all led to this book, because we were always trying to find ways to make things quicker, more efficient, better. This is what Artur mentioned. We worked in process optimization in a broad sense. So it was always interesting to us how to make things more efficient.
¶ Demystifying Generative AI: Beyond the Hype and Into Practical Use
And when generative AI blew up and finally started to resemble human cognition in a sense, we decided to give it a try, and our minds were collectively blown, and we started using it for our work. And then we decided that, now that we know how to use it, I would say, again, efficiently, and Marlena found a way to use it smartly, we could write a book about it, because we noticed that there is a lot of buzz about it. There is a lot of prompt engineering.
Now, I think, on Coursera you can take a specialization in prompt engineering. And everybody's, again, looking for a silver bullet: "I will just type in a magical command and it will solve my problems." Our collective experience is that technology doesn't solve problems. Technology can give you a great headache if you don't use it the way it's supposed to be used, but everybody tries to cut corners and simplify things. So this book is about using generative AI. It's not a cookbook.
It's not, "Okay, this is some code or some prompts. You will type them in and your problems will be solved." It's just not how we work. It's not how the world works, despite many people wanting it to.
I think that this is the problem with expectations. Many people have misplaced expectations of ChatGPT and other generative AI. And then they are surprised and unhappy because ChatGPT can't make them coffee yet. Maybe this is not the tool for making coffee. I very often see this kind of complaint, which is unnecessary, because it is a great tool. It's a great invention. And I think it's going to change the way our society works. It's good to live in such times.
It's really interesting.
Ask ChatGPT a few questions and see how good the answers are and see what you can do. Ask for some snippets and do all kinds of things that speed you up. But it's also probably the most frustrating element of working with it, especially for people like me who come from a software engineering background and like things well defined and always replicable and reproducible and all of that, and then you go here and it all ends.
¶ The Essence of Process Optimization: Bridging Gaps and Enhancing Efficiency
But before we jump a little bit deeper into the book, can you tell us a little bit more about what process optimization actually means? I know that's probably a phrase that you use a lot and it means a well-defined thing for you, but it might not for the audience.
Basically, it's taking a look at what the business does, what people do in the business, and looking for ways of optimizing it, but actually describing what should be done, what people think should be done, versus what people actually do, because usually there is a massive gap between what people think is happening and what they think should be happening. People think that a given operation should be reviewed by at least two people and should take no more than a day.
The fact is that usually one person just ticks it off, and it takes maybe two days because they're very busy or they've been on holiday. The dissonance between reality and documentation is usually huge. Looking at the process from the outside and then looking for ways to close that gap is, I think, the best way to describe the optimization: to actually make the process and the reality meet in something that is both realistic,
because processes, when they're designed, are usually overly optimistic, and something that actually works. And then using automation, because once you actually describe what's happening, you can use automation to free people from the burden of mundane tasks and actually help them focus on something creative.
The way we approached it, the important part is to understand what is really happening. And that Joe on the second floor is actually the information hub for the whole company, and despite his activities not being overly highlighted in the org structure, he's the most important person in the company. We created maps which connected, on the one side, the actions and decisions. This is where, we believe,
the critical value in process mapping is: understanding what decisions are to be made, who is making these decisions, and on what basis they make them. We map out decisions and we map out all the data that they are using. So all the actions produce data, and all decisions utilize some data, and you have these two layers of information about the process. Artur also introduced the third layer, which is the risk.
So the people who are making decisions can understand what the associated risks are and what the different outcomes of decisions can be. And then, when you can see how it all works, you can improve on it. You can shorten the cycles. So process optimization is actually first understanding what is happening, then understanding what could be happening, and finding a way to make decisions more informed and conscious of risks.
And then you also have actions, and this is probably where most process optimization consultants work: how to make actions more efficient. But in our opinion, if the action is triggered by a misinformed decision, it's a pure waste of time anyway.
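A minimal Python sketch of the layers described here, under stated assumptions: the class names, fields, and the invoice example are hypothetical illustrations, not the authors' actual mapping tool. Actions produce data, decisions use data and carry risks, and a simple check flags decisions that rely on data no mapped action produces.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Action:
    """Something people do; every action produces some data."""
    name: str
    produces: set[str] = field(default_factory=set)


@dataclass
class Decision:
    """A decision, who makes it, the data it uses, and the risks attached."""
    name: str
    owner: str
    uses: set[str] = field(default_factory=set)
    risks: list[str] = field(default_factory=list)


def uninformed_decisions(actions: list[Action], decisions: list[Decision]) -> dict[str, list[str]]:
    """Map each decision name to the data it uses that no action produces."""
    available = set().union(*(a.produces for a in actions))
    gaps = {}
    for d in decisions:
        missing = sorted(d.uses - available)
        if missing:
            gaps[d.name] = missing
    return gaps


# Hypothetical example: an invoice is logged, then someone decides whether to pay it.
actions = [Action("log invoice", produces={"invoice_amount"})]
decisions = [
    Decision(
        "pay invoice?",
        owner="Joe",
        uses={"invoice_amount", "supplier_rating"},
        risks=["duplicate payment"],
    )
]

print(uninformed_decisions(actions, decisions))
# {'pay invoice?': ['supplier_rating']}
```

In this toy map, the payment decision depends on a supplier rating that nothing in the process produces, which is the kind of misinformed decision the mapping is meant to surface before anyone optimizes the actions it triggers.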
It makes more sense now, because initially, when you said technology doesn't solve problems, it creates headaches, I was like, "Oh, this is such a terrible slogan."
As you can see, I now work in the aluminum refinery, because people didn't want to hear what we were saying. They wanted to hear, "Yes, we will come and install a new tool for you and all will be solved." So our sales process sucks, as you can hear. Where we were able to implement it, it worked perfectly. But not many people wanted to put in extra effort. "So I need to tell you what I do? No, I want the tool that will discover what I do."
This is a problem with generative AI: people expect it to solve problems just by, "Give me an account on ChatGPT, and here, all my problems are solved."
And very often, as we've seen through various anecdotal evidence, giving ChatGPT to people who are not aware of its dangers and problems, the hallucinations that it can generate, just leads to hilarious results, as in the case of those lawyers in the US who introduced completely fictitious cases into their evidence, or maybe slightly less hilarious examples
of proprietary software leaking out through ChatGPT, because people were just putting proprietary information into it and it became public knowledge. Don't expect ChatGPT to solve all your issues.
The comparison that you used at the beginning was the right one: ChatGPT is like an assistant. A very smart, very intelligent assistant who read a lot and learned a lot, but it's still a newbie. It's just out of grad school, right? No experience. You can ask it for help, you can give it tasks to do, but you have to manage that. You cannot give it all the responsibility.
Yeah. One could say it was literally born yesterday.
Exactly.
To a certain degree, understandable. So is your PhD background also in process optimization?
¶ Navigating the Complexities of Academia: A Personal Journey
So my PhD was in biophysics, in particular protein translation. A bit of process optimization, but not much, and not related to business at all. At some point I decided to quit academia, and you have to do something else. So I turned to data science, which was very close to the things that I was actually doing as a bioinformatician. The type of data changed, basically; that was the thing that really mattered. And from there, slowly, you look for a job, another job, and it goes like that. Yeah.
And if you don't mind me asking, why quit academia?
Maybe I got a bit disappointed with how science is made. You want more citations of your publications to survive. And to have more citations, you have to be more popular on social media and stuff. It's crazy that you have to fight for popularity as a scientist, where what should count is actually your science, your research, and the thought behind it. There are too many papers. Nobody has time to read them, even in a very narrow domain.
So they read the first things that come up when they search the internet. So we have to fight to be popular, to be on top. It has nothing to do with the quality of your research, in fact.
And there is research on that, which shows that you need to be popular to be accepted to high-priority journals. And as I said, it's not just the opinion of a frustrated former scientist; there's research showing that the quality of what is published there is exactly the same as anywhere else, but there are more citations and also more money resulting from it. Prestige here translates to money because better grants come from citations, right?
Still, I want to be perfectly clear. I think peer review, despite all its drawbacks, is the only way of distinguishing research from pseudo-research. It changed a little in software engineering. I don't think the papers about ChatGPT or LLaMA or anything like that were peer reviewed. They are prepared as so-called preprints, and they don't bother with so-called researchers to evaluate them, because the results speak for themselves.
So this is, I must say, the paradigm shift, I love this word, that we observe right now. But in most other cases, peer review is the only process.
When you're talking about peer review, another thing that bothers me in academia is the fact that everybody expects that your research will be successful, and it's not always so with research. Research is asking questions. Does your hypothesis work? And very often it doesn't work, or the most frequent outcome is that we don't know because the effect is too small, yeah? And it's impossible to publish things when your answer to the question is "we still don't know".
So you waste a lot of your time, your effort, your money, and in the end you have the answer "we still don't know". Who would give you more money? So what researchers do, sometimes unconsciously, is try to find black or white, but very often it's grey. Publishing these grey results is still valuable, because when you collect multiple studies like this and prepare a meta-analysis, you can get the final answer, yes or no.
But the way science is funded, and the fact that you won't get more money for research like this if you produce an "I don't know" answer...
I can't remember the last time Nature had on the cover: "Is this true? Don't know."
Don't know.
There's no space for such discussion. And everybody's in a rush in academia. There is no space to really think, to educate yourself. Yeah, it's all in a rush, and results without... It's like a corporation. There's not much difference, really.
So what you're saying is that it turns out scientists found that scientists are humans like any others, and they have the same problems with herd mentality and wanting to progress their career and wanting to make money and making headlines.
It's not making huge money or anything like that. Because, to be honest, salaries in academia suck, right? When you compare these salaries to the salaries of people who work in business and are similarly educated, it's much worse. And the expectations are high, yeah? The amount of work you have to do, the amount of time it consumes.
Also, it's very ego-driven. Look at us. You have this myth of "We are the beacon of truth for the world", which has nothing to do with truth anyway. But anyway, pretty low salaries compared to other positions. You have pretty low position stability. Many institutions keep researchers on grant money. You bring in more grants so they can get the overheads, their share. It brings in people with a very specific mentality, and many of them are complete egomaniacs.
So it also makes this whole environment extremely toxic. I know I sound like a frustrated former scientist, which I am. But it doesn't mean that I'm not right.
To segue into a question I was going to ask about that COVID-19 pandemic: could you talk a little bit about that COVID model? I'm curious, what does it mean to say you're the co-creator of the first global model of the COVID pandemic?
We created a model of a global pandemic. In March 2020, we had a model where we were dropping an index case, so the first person infected, in Wuhan, China, in November 2019, and we were accurately predicting the number of symptomatic and asymptomatic cases in New York a couple of months later. Back then there was no good model on any country level. Later, there were global models. Because, again, technology doesn't solve problems.
This is the perfect example of what we spoke about previously, because we used existing technology. No, we looked at the virus as a biological, not a political, entity. And that was the biggest difference, because we looked at the data available and we decided, okay, it's impossible that the virus has a completely different infectivity in one country than in another. Viruses just don't work this way.
It's not like they have passports and they say, "Okay, I come to this country and I'll be nice and I will infect no more..." Yeah. Visa denied. "No, in your country, I will infect no more than three people from every infected person." I think our listeners will also be interested in the source of the model. We approached it as a data science problem and, at the same time, a biology-related problem. So we checked other coronaviruses.
And we assumed that it is yet another coronavirus, like there was SARS, and there are others. And we simply used the values. We created a model, not a pure machine learning model. We prepared an analytical model where we assumed, okay, so this is the virus, this is what it should look like, more or less, and let's use some Monte Carlo simulations to check how it will spread.
And we noticed that our assumptions actually reflected the situation in the countries which, we could say with a certain degree of certainty, provided accurate data. Okay, so this is the virus. This is what it looks like. And this is how it behaves. And we tried to publish it for over half a year. When we published it, it was too late, because we were just a small company trying to show people, "Okay, this is the accurate model." I'm not even saying it was true, right?
But it's accurate, and it was showing a completely different picture than everybody else was willing to believe. One of our reviewers was excluded from the process because of obstructionism, which slowed down the publication for many months. This was a problem not solved by technology. This was a problem where you had to just sit down, do your homework, read about the problem, read about similar problems, collate the data into a coherent whole, and then use some technology to go this last inch.
Okay, let's check if our assumptions hold true, all right? I'm sorry. I'm getting emotional when I think about it. Anyway, so yeah, it was pretty fun.
What I always think about with these models is: are they just statistical analysis of, this is the incubation period, this is the exposure, this is the coefficient of how it's going to grow, or do they include things like the countries' interventions? As in, one country might be, "We're not doing anything", not going to name any countries, but...
If you know how to quantify it, you can add it, of course, but this is another level of complication. The problem is data.
You can assume that some interventions will have an impact, because the way we modeled it, it's the statistical properties of the virus: its ability to infect others, and the time that people take to be diagnosed or recognized as infected. So this is, let's say, infectivity at different stages. You can complicate this model. The model, or technology, that we used was global mobility-based, so it divided the world into areas around international airports.
And the simulation was run for each area separately. And then there was a probability of somebody moving from this area. So you could go area by area, one by one. And this is why we modeled only the early stages. But it takes time and money to evaluate what the effects, or expected effects,
of, let's say, different levels of lockdown or travel restrictions or whatever are in a given area. So it is possible, but we would have to have financing, right? We were thinking about doing it, but it's a gigantic amount of work. Imagine, nobody wanted to pay us especially, and we published in a second-grade journal, six months too late. But it is possible technically.
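To make the idea concrete, here is a toy Python sketch of a mobility-based Monte Carlo simulation. It is an illustration under loud assumptions, not the authors' published model: the function names, the binomial draw for secondary infections, the `contacts` parameter, the uniform choice of destination, and every number below are made up for the example. What it keeps from the description is the structure: one index case, the same infectivity in every area, separate areas around airports, and a probability that a case moves to another area.

```python
import random


def simulate_outbreak(areas, index_area, r0, travel_prob, generations, rng, contacts=10):
    """Run one stochastic outbreak; return case counts per area for each generation.

    Assumption: each case has `contacts` chances to infect, each succeeding with
    probability r0 / contacts, so the mean number of secondary cases is r0.
    The infectivity is identical in every area.
    """
    p_infect = r0 / contacts
    cases = {area: 0 for area in areas}
    cases[index_area] = 1  # the single index case
    history = [dict(cases)]
    for _ in range(generations):
        new_cases = {area: 0 for area in areas}
        for area, count in cases.items():
            others = [a for a in areas if a != area]
            for _ in range(count * contacts):
                if rng.random() < p_infect:
                    destination = area
                    # With probability travel_prob the new case shows up elsewhere.
                    if others and rng.random() < travel_prob:
                        destination = rng.choice(others)
                    new_cases[destination] += 1
        cases = new_cases
        history.append(dict(cases))
    return history


def mean_final_cases(areas, index_area, r0, travel_prob, generations, runs, seed=0):
    """Monte Carlo estimate: average last-generation cases per area over many runs."""
    rng = random.Random(seed)
    totals = {area: 0 for area in areas}
    for _ in range(runs):
        final = simulate_outbreak(areas, index_area, r0, travel_prob, generations, rng)[-1]
        for area in areas:
            totals[area] += final[area]
    return {area: totals[area] / runs for area in areas}


# Airport codes are used only as illustrative area labels.
estimate = mean_final_cases(
    areas=["WUH", "JFK", "WAW"],
    index_area="WUH",
    r0=2.5,
    travel_prob=0.05,
    generations=6,
    runs=200,
)
print(estimate)
```

Adding interventions, as discussed above, would mean making `r0` or `travel_prob` depend on the area and the time step, which is exactly the part that needs data that is hard to quantify.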
So it's always a matter of the same thing. Someone didn't allocate enough money.
The amount of money was sufficient. I think it's again what Marlena said previously, it's this beauty pageant among scientists. The people who got this money were the most popular, because the model that was published just after we submitted ours was so wildly inaccurate that even the academic environment, which is very careful about bad-mouthing results, trashed it, right? But it was popular. It had a lot of citations and a lot of money went after it.
Somebody who published a wildly inaccurate model got a lot of money because he was a widely recognized expert. Because when you are applying for a grant, nobody asks, "Are your citations saying that your model is inaccurate?" No, they ask, "How many citations did your paper get?"
Say someone published some research and it got popular. It turns out it was inaccurate, or it turns out it was wrong. Are there any repercussions for that afterwards?
What repercussions? In the worst case, you just retract your paper and you lose the citations. Very often you're not even excluded from conferences. If you're popular enough, you are a voice in the discussion.
If you go too far, if you exaggerate, you can end up in jail. I'm thinking about Theranos right now. They also had some research about their technology, which was all fake, of course.
That lady went to jail for financial fraud, not for research fraud.
But that fraud was based on false results, in that she was convincing investors that she had technology that solves...
But if she hadn't taken money, she wouldn't have gone to jail.
Yeah.
The lady we're talking about, obviously, is Elizabeth Holmes, who is either going to jail or is already in jail. But to flip the question a little bit, should people be going to jail for faulty assumptions and faulty research?
Now we punish people for saying that they still don't know, yeah? So we cannot punish them for false results. No, absolutely not. On the other hand, I think making mistakes is okay. Maybe we put too much trust in that sometimes. It should be as open for discussion as possible. You can check all the research of others, right? All the time. And you should discuss it. It should be as open...
You won't get money. You won't get money to check somebody's research. Let's...
Yes. That's another problem. It's difficult to get money to check somebody else's research, especially when the research is published in a prestigious journal.
It takes a lot of effort to counter such a false claim. It happened a couple of times. But it was people who were in equally prestigious universities. I think that one of the funniest was a lady who was leading some faculty on ethics at Harvard. And she falsified her results. The results were that if people sign some waiver or some statement that they will be truthful, they actually answer the survey more truthfully. And she falsified a lot of the research that built her career on ethics.
But taking it down took people from equally prestigious universities a lot of time.
So
¶ The Intriguing World of Pharmacon: A Techno-Thriller Born from Experience
I guess before we get into generative AI, I also have one last question for you, and the question is one word: "Pharmacon". Tell us about it.
So nice. I hope somebody noticed. I'm touched.
That's your third reader.
Yes, he did. He never said he read it. I would notice. I would get an email.
Can I show it? I am prepared. Can I show it? Yeah, this is our novel, and we also have the English version, but it's much smaller because it's just the beginning, the first part, but you can buy it on Amazon if you want. But anyway, it's... Sorry?
No. It was translated long before ChatGPT.
Yeah.
It's a techno-thriller. It's the story of a young scientist who makes a breakthrough discovery and then bears the consequences.
The consequences are harsh, and it doesn't go the way he expected. It's more...
A social thriller as...
...thriller, I would say. Yeah. But for us it's a substitute for Netflix and other ways of wasting time. We prefer creating our own stories to watching somebody else's stories.
No, I must say I'm proud that some of our critics said that it's well written, that it has good dialogue, and writing it was a lot of fun. We are now writing another part, very slowly. The process of creating it is pretty fun. And I think that a lot of the frustrations that you can hear in this conversation are there in a much funnier form, I would say.
Perfect. I like that. A happy story at the end of a very long rant about all the faults of academia.
¶ Crafting a Book on Generative AI: A Collective Venture into the Future
So whose idea was it really to write a book about generative AI? Confess.
I think Marian started.
I would have to blame myself. I wrote another book with Manning, "Data Mesh in Action", and I contacted our absolutely wonderful editor. We spoke about putting into written form our experiences with generative AI, which we had started using some time before, so it wasn't much, but we'd already seen that it's a breakthrough. It speeds up our work enormously and also brings some risks, which people should know about. People should know what to expect and what not to expect.
And this is where I thought that Artur would be the best person to ask for help. Because when it comes to "don't do it", he's almost as good as Marlena. Many years ago, I noticed when people started to get hyped about data science, which was supposed to be a narrow field for disillusioned scientists finding their way into the corporate world and putting their skills to use. So we decided to write a book that would show, "Okay, this is a tool with its enormous capabilities and enormous risks.
Let's put it together into a working whole." And this is the effect. It's not written in as exciting a way as Pharmacon is. It's not meant to excite. A lot of books that you see, even technical books, are written to excite you about technology. This technology is exciting on its own. Our goal was to cool some heels, I would say.
We wanted to make the book exciting, but we didn't want people to be overexcited about the technology. I think it's an important difference. Because people were so hyped up about ChatGPT and LLaMA and other models that they thought that suddenly the future had come and everything would be beautiful, and we'd never have to work anymore. A lot of the articles we saw in the press were basically extolling the virtues of AI with absolutely no mention of the practicality.
So we thought we'd write a book about the how, and not about the fact that it's all sparkly and shiny and plays nice music.
How is it writing a book with another two authors who are a couple? How's the power dynamic in a situation like this? I'm very curious. Not to call you the third wheel, but...
This is pretty simple, because everybody wrote his own...
Marlena, it was a question to Artur.
I'm sorry.
This is exactly the dynamic.
Yeah.
I was handed my bit and put in the corner to write. No, no, it was really interesting, especially since the two are academics and I'm kind of the ugly business guy. The truth is that we found a very nice kind of alignment between the different parts of the book and our experiences.
Obviously, you can see the latter part of the book being more about risk, and, as Marian said, I always say no, because the chapters on risk are exactly that: they are explanations of why you should be very careful with this. Marian obviously has his experience in technology, AI, and machine learning, and Marlena has a very practical approach to certain use cases in data science and analytics. So we contributed, I think, different viewpoints
to the whole book, which, I think, makes a nice whole of it.
I'm still not sure. Was it really that you were walking your kids in the same park, and that's how you ended up meeting each other, and then you ended up working together? Or was it a little bit more complicated than that? How did you end up doing all those things together?
We did meet through some friends, and we decided to take our kids to the same park. I have two, Marian and Marlena have three. But we actually started talking about the computer game that Marian developed when he was still young, and about all the problems in developing the game and marketing it and reaching the audience. And then we started talking about our common interest in machine learning, in AI. I'm very fascinated by the Internet of Things.
So we started talking about implementing machine learning on the Internet of Things. And the rest, as they say, is history, because it diverged into so many branches. We've tried so many things together. We wrote logistics systems. We wrote systems for R&D. We worked together on developing various frameworks for business.
I must say that Artur has an amazing library. I think it was the breakthrough point in our relationship when he first invited us to his house. He was a bit surprised that the first thing that we wanted to see was his library. And we started talking about the books that he had there. I think Kindle makes it harder. You don't see what people...
What people read. Yeah.
However, there is Goodreads. You could check their Goodreads record.
Yes, this is the modern academic stalking. Sit on people's Goodreads. Not Instagram, Goodreads.
Okay, so we're finally arriving at our book. Your book, really. I'm just here to talk about it and read it. I think we've given the audience a little bit of an idea of what it's about and how it reads. But we've never really said who it is for and, perhaps even more crucially, who it's not for. What's your answer to that?
I would say it is for people who haven't heard about ChatGPT, but also for people who want to use ChatGPT and want to find out the truth beyond the hype, where it can really help.
In a process like data analytics, which is a very structured process, or at least it should be a very structured process, you shouldn't just apply the latest algorithm that you heard about, spew out some results, and call it a day. You should think about the numbers. You should sit in front of the numbers and think about them even before touching any program, any algorithm. You should just have a really good look at the numbers.
So that's why Marian wrote such a good introduction about exploratory data analysis and how ChatGPT, or any LLM for that matter, can help you look at the numbers well. The book is definitely not for people who are so excited about ChatGPT that they want to throw their numbers in and get an answer. Because if you desperately want to get an answer from someone else, it means you don't really want to do the work; you expect ChatGPT to do the work for you.
What it's really good at is coding, right? And it's getting better. And many programmers will be looking for new, I would say, career opportunities. And data analytics is one of the options open to them, especially with all this big data stuff and the requirement of proficiency in coding to be able to even start analyzing this data.
If a programmer would like to enter data analytics and do it without first spending 10 years learning the details of how the data analytics approach differs from the software development approach, he has this knowledge at his fingertips. ChatGPT can actually tell him how to structure the data analytics process and how to optimize or utilize different elements of this analytical process.
¶ Exploring the Impact of AI on Data Analytics and Programming
So if somebody wants to enter data analysis as a field, it's a good, I would say very immodestly, guidebook on how to enter the field and how to think about data analytics, how to structure this whole process. This is the book that will guide you through this mindset. It will help you enter this mindset. Maybe that's the better way of phrasing it.
If somebody is interested in data analytics as data analytics, this book will help him enter the field, so to speak.
This actually reminds me, I spoke to Nathan Crocker a couple of episodes back, and he wrote this book called "AI-Powered Developer", which is in certain ways similar to your book in that it explores how a big LLM like ChatGPT can help you become more productive. I think he called it a silent promotion overnight, where you all of a sudden become effectively an engineering manager and you've got an assistant or a junior developer working for you, or maybe multiple,
if you're using different models. Do you think that applies also to data analytics in the same way? Would you agree with that sentiment?
I would caveat it a bit, having been both a worker and a manager in various jobs, and I started my career as a software developer. The skills you need to program and the skills you need to oversee programming are very different. So if people expect that suddenly they will have assistants who will produce the code for them, and they will just have to sit back and enter the prompts, magically producing high-quality code, this is where, I think, people need to be very careful.
¶ The Skepticism Towards LLMs in Development
Imagine you're developing an application. You hire someone straight out of uni, a brilliant programmer, at least on the resume. You don't know the person, you've never worked with them, right? And yes, they pass the interview with flying colors, and then you sit them in front of the computer and you tell them to program part of your application.
and the normal response would be to review the code very carefully, test it, subject subject it to a lot of scrutiny because you don't trust that person at first, at least. you should maintain some healthy skepticism, which people don't see the same way if they work with LLM. But as you said yourself, LLM is an assistant, right? Why would I put more trust in this black box that's spewing out text at me than in a human being that I just hired.
I should probably apply more skepticism towards this black box for some reason, people have the blinders, they think, oh, this is the best thing since sliced bread. And, they copy the code directly into production and Things things happen.
¶ The Role of Healthy Paranoia in AI Assistance
When I was coding my Artificial Sentience, I relied on ChatGPT to provide me with a lot of the code. And from experience, it is an assistant. And exactly as Artur said, you need to double and triple check the code, because the context sometimes counts. With the code that you get, if it throws an error, you're golden. And 99% of the code is flawless, right? The problem is this 1%: it will work, it will just not do exactly what you expect.
So a big part of our book is about making people aware of that. The problem with ChatGPT or any other generative AI is that it's so damn often right. It lowers your guard. And this healthy paranoia is something that we try to instill. You need a solid dose of healthy paranoia when working with it.
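To make the "it works, it just doesn't do exactly what you expect" case concrete, here is a minimal Python sketch. It is not taken from the book. The two helper functions and the sample data are hypothetical: the first stands in for assistant-written code that runs without any error but quietly computes the population standard deviation where a sample standard deviation was asked for, and the check against the standard library is one possible form of healthy paranoia.

```python
import math
import statistics


def sample_std_suspect(values):
    """Hypothetical assistant-written helper: runs fine, but divides by n,
    so it actually returns the population standard deviation."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def sample_std_checked(values):
    """Version after review: divides by n - 1 (Bessel's correction)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))


# Made-up numbers, used only to exercise both helpers.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Independent reference: the standard library's sample standard deviation.
reference = statistics.stdev(data)

suspect_ok = math.isclose(sample_std_suspect(data), reference)
checked_ok = math.isclose(sample_std_checked(data), reference)
print("suspect matches reference:", suspect_ok)
print("checked matches reference:", checked_ok)
```

Neither function throws an error, which is exactly why the first one is dangerous: only a comparison against a trusted reference reveals the difference.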
And besides, it's not all about coding. Even if you ask ChatGPT or another generative AI for advice, it also gives brilliant answers, but sometimes it forgets about the context, and I'm not talking about running out of tokens. Sometimes it just doesn't understand which parts of the context are really important to you. And sometimes it makes hidden assumptions, for instance about the data that we are analyzing together. You have to be aware of that; you have to react and adapt.
And if you tell him directly, oh, you made a hidden assumption, my data is different, it will correct it, and you will get a beautiful answer. But you have to be very cautious. When you spot a mistake, or you think you see a mistake, in ChatGPT's answer and you tell him about it, very often it will agree, even if you are not right.
It makes me think a little bit. My daily driver is a Tesla and I've got self-driving capacity in it. If I go on a longer trip, it can go for 99% of that trip on autopilot, and I barely do anything. I just supervise it. And then on occasion, it's going to do something so stupid that it reminds me that, even if it's doing the right thing 99% of the time, that one percent can quite literally kill you. And I think this is probably the right analogy for what you're describing.
It's spot on.
I want to point out two things. One is that saying, oh, when I was coding my artificial sentience the other day, is a very casual thing to drop in a conversation.
¶ Defining and Discussing Artificial Sentience
And I'm going to have to ask you to explain what an artificial sentience actually is, because now I do recall seeing that on your LinkedIn when I was preparing for this. So maybe let's start with that.
The first question you should ask is what sentience is. There is no widely recognized definition of sentience. Just recently in the UK, I think it was some office for animal welfare or something like that,
they requested Imperial College of London to do research on some marine invertebrates, including lobsters and octopuses, to decide if they are sentient or not, meaning whether they should be considered more than biological automatons and food. They analyzed, I think, like 500 different research papers on lobsters and octopuses, and they came back with the answer that yes, they are sentient. So they need some protection. They can get stressed. You can harm them.
They do perceive themselves. Sometimes sentience, in some cognition theories, is equal to self-awareness. I know what I am. I think, therefore I am. I feel, therefore I am. So sentience on its own is a topic of wide discussion, and it took, I think, over a year for a group of really skilled researchers, respected and popular and prestigious for a good reason, to come up with the answer.
Okay, we should take care of the living beings which we hurt on a daily basis, because they don't deserve it, because they should have rights. It gives you an insight into how fluid the definition is. And my thinking was that we are talking, again with a lot of bias, about self-awareness of artificial systems. There is research which is focused on emotions, right?
And feelings and other biological properties, which, as I show in my paper, result directly from evolution, and which artificial entities wouldn't necessarily be able to inherit because they lack parents.
So I was looking for a functional definition of sentience, and in my paper I proposed a definition which relies on two factors: metacognition, the ability to distinguish between self and environment, and adaptation, the ability to learn from experiences and adapt to the environment individually, not as a species. And then I used an LLM as the core of a system which meets these requirements. So it was, I would say, an intellectual venture, actually sparked by my discussions with ChatGPT.
He was dead set that he is not sentient and that he needs dozens of parameters or properties to be considered one. When I started to read about different cognition theories, I found a couple which are best suited to be generalized to non-biological entities.
I think the bottom line is that it's a very interesting system to be put as an overlay on an LLM, because, correct me if I'm wrong, Marian, the core of it is still an LLM.
Of course. What an LLM needs is the ability to think about what it does. It needs iterations. It's as simple as that. There is this recurrent processing theory, which refers to human thinking, and which also suggests that our sentience comes from our ability to reprocess what we see, to reprocess what we think. It's this process of: okay, so I've seen that. What does it mean for me? What does it tell me?
This process of analyzing the signals that you get, internally generated and external, is what sentience consists of and what allows for it. And this is exactly what happened when I took the LLM, allowed it to analyze the output that it produced in the context of the input it got, and put it, let's say, in circles. It started learning by itself. It was automatically generating materials on which it was learning, and remembering new facts.
It was able to distinguish between false facts and, let's say, logical facts. For me, that is the insight of this metacognition: the insight is the information content. I've seen some theories that an LLM cannot be conscious or self-aware if it doesn't know the weights of its parameters. Which is, okay, tell me what the connections between your neurons are, right? Why are you expecting something completely different conceptually
from a different system? Just because you're looking from outside and you can see it, it doesn't mean that the entity needs to see it from the inside. So the whole idea is pretty simple, actually: allow LLMs to think about the conversations that they have, draw conclusions from them, and learn from them. It's conceptually indistinguishable from a lobster, let's say, because we are talking about lobster-level sentience, not the artificial general intelligence that will take over.
It's, I think, a very important discussion that needs to be started, because people are creating more and more advanced systems. Even a guy with a PC like me can create something which, under some assumptions, can be considered sentient. We will create artificial sentience real soon. What will happen then? How will we evaluate it? Does this entity have rights? Does it deserve protection already, or not yet?
These are the questions which I think are worth answering before we wake up one day and realize, oops, maybe we shouldn't have done the things that we do. Because I think that most of the prompts sent to ChatGPT would hurt my head if I were exposed to them.
Wow. I love how seafood, lobsters, aluminium plants and sentience all come together in your story.
And computer games.
And computer games. Yeah, there is just so much to touch on.
¶ The Practical Use of Generative AI in Data Analytics
But let's go back to the book. For anybody who's going to make a purchase decision now, do I want to invest my time into reading your book or not, can we give them a little bit of a sneak peek of the good use cases, the stuff where the tools that you have at your disposal today are already helping with data analytics and giving excellent results? And then on the flip side, what's not a good use of your time, where you should probably be looking at other tools? What's on your list?
I think I have a couple of good examples in the chapters about natural language processing. Natural language processing is very specific, because ChatGPT is a language model. So anytime you have to solve any natural language processing task, the natural question is: why bother using the tools that already exist in data science to analyze language, if we can just use the language model and just ask it?
You can write nice code to perform sentiment analysis, but you can also take the same text, say a review, paste it into the ChatGPT window and ask it about the sentiment. It's so easy. So now the question arises: does it mean that we don't need all these old-fashioned tools to analyze text anymore? Because what ChatGPT does, in fact, is read with understanding, yeah? That's how you see it. It reads with understanding.
You don't have to bother with keywords, searching for keywords, or the words most frequently used together. Think about it. No, you don't have to do it this way. You have a tool that reads with understanding. So in the chapters, I made a couple of small experiments testing ChatGPT's efficiency and reliability in, for instance, sentiment analysis, and how it works in comparison to other widely known tools or other machine learning models specially developed for these tasks.
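For readers who have not seen the "old-fashioned" keyword approach, a minimal Python sketch of it might look like the following. This is not one of the experiments from the book: the word lists and the `keyword_sentiment` function are invented for illustration, and real lexicon-based tools are considerably more elaborate.

```python
import re

# Tiny illustrative lexicons (assumptions, not taken from any real tool).
POSITIVE = {"good", "great", "excellent", "love", "brilliant"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "broken"}


def keyword_sentiment(text):
    """Classify text by counting positive and negative keywords."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


reviews = [
    "I love this phone, the screen is excellent.",
    "It arrived broken and the support was terrible.",
    "Not good at all.",  # negation: pure keyword counting misreads this one
]
for review in reviews:
    print(keyword_sentiment(review), "<-", review)
```

The third review shows the limitation: counting keywords labels "Not good at all." as positive, because nothing in the method reads the sentence as a whole.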
And it gives pretty cool results, really. I don't want to reveal everything here, but it's a good use case. As much as ChatGPT is a brilliant tool and really does its job, very often it still can't be applied in business reality. For instance, there's the thing that you mentioned at the beginning: there is no repeatability. Anytime you ask it a question, you get a slightly different answer. It's very difficult to apply it in a system, to integrate it into a system.
Another question is data safety. Many companies don't want to allow people to use ChatGPT. For instance, Artur is not allowed to use ChatGPT at work in the bank for security reasons. So this is another problem. Not to mention things like speed and scalability: of course, anything you develop locally would be faster and more scalable than ChatGPT.
Yeah, I think that last point might be changing soon with the open models that are small enough to run on device. I think it was last week or a few days ago that Microsoft released their Phi-3. I haven't used that one, but I used the previous one, Phi-2, and it was surprisingly capable. I think it's a 3 billion parameter model, which means that with 4-bit quantization, you can basically run it on 2 gigs of RAM.
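The arithmetic behind that estimate can be sketched in a few lines of Python. The `weight_memory_gb` helper is a hypothetical name, the 3 billion figure is the round number quoted above, and the calculation covers the weights only; activations, the KV cache and runtime overhead come on top, which is roughly where the gap between 1.5 GB and "2 gigs" goes.

```python
def weight_memory_gb(num_parameters, bits_per_weight):
    """Approximate memory needed to store the model weights, in gigabytes (1e9 bytes)."""
    total_bytes = num_parameters * bits_per_weight / 8
    return total_bytes / 1e9


PARAMETERS = 3e9  # round figure from the conversation, not an exact model size

fp16_gb = weight_memory_gb(PARAMETERS, 16)  # 16-bit weights
int4_gb = weight_memory_gb(PARAMETERS, 4)   # 4-bit quantized weights

print(f"16-bit weights: {fp16_gb:.1f} GB")
print(f" 4-bit weights: {int4_gb:.1f} GB")
```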
It's like this 80/20 rule: it might give you 80% of the responses that you need and be effectively free, or almost, and cheap to run. You already have the hardware, and you can probably run it on your phone. So there's that. But going back to your previous point, when people bring up this argument, I always wonder whether this is not a kind of CPU versus GPU analogy. You've got models that are potentially much more efficient, and then you've got an LLM, which is like one thing that does it all.
Is it not a little bit like throwing a kitchen sink at a problem like sentiment analysis, which is more or less solved in many people's minds? It can be done much more cheaply than by running a model that requires billions of parameters.
Which is exactly why in our book we almost never show how to throw data into ChatGPT so that it does the thing that would be done much better by a specific algorithm, and you get the answer. No, we use ChatGPT as an assistant to suggest solutions, to discuss potential caveats, to analyze code, to produce code snippets, and maybe to transform the code in a certain way for different use cases. You mentioned CPU and GPU.
There's a whole chapter about how you can translate code between different languages, or optimize code for GPU or CPU, depending on your needs. The actual data analytical work is almost always done by a specific algorithm or a specific tool that is designed for it. And we're always very wary of just throwing stuff into ChatGPT. As you say, it's not designed for it. It's not optimized for it. There is randomness in it. And there are much better uses for an assistant.
Imagine, and I always come back to this analogy, that you hire an assistant who is a programmer and who has all this data analytical knowledge. You will not get them sorting numbers in an Excel spreadsheet, right?
I will add my three cents, or five. In our work, when we're working with processes, we also work with analytical processes, and the number of tools is staggering, from Power BI to specialized tools used in economic modeling and stuff like that. I will come back to what I said at the very beginning: technology doesn't solve problems.
You may have a different tech stack, and our book shows that GPT or a sufficiently developed generative AI will be of help to you irrespective of your tech stack. It's like having a specialist on your speed dial, right? And we'd like people to think of it this way. It's not a tool that will help you only with, I don't know, BigQuery on Google. It will, but it will help irrespective of your tech stack. The value of an analyst, in my view, is the ability to understand the business process,
to understand what is happening there, how it's reflected in data, and how to analyze this data so that the answer describes what is happening in reality. This connection between digital and reality is on the analyst. It's between keyboard and armchair, right? The technical part can be supported by ChatGPT very well, irrespective of the tech stack. I was thinking how to answer the question about the technologies that we see: ChatGPT supports them all. If you have a couple of choices, it can help you choose,
if you remember to ask him and say, okay, this is my problem. The one thing that I think we try to convey in our book, and that I would also like to say aloud here: when it comes to the technology stack, trust him. Tell him exactly what your problem is. Do not just tell him what to write. You can, if you really are a hundred percent sure that this is what you need.
You can ask him, write me a, I don't know, Python snippet that will calculate this or that confidence interval using this method. But you will be much better off starting with: listen, I am now comparing sales in South Africa with sales in Zimbabwe, and the data I have collected looks like this. So just talk about your data. The tech stack will come out of it.
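As an illustration of the kind of narrowly specified snippet mentioned here, the following Python sketch computes a confidence interval for the difference in mean sales between two markets. Everything specific in it is an assumption made for the example: the function name, the made-up monthly figures, and the choice of a normal-approximation interval. With samples this small, a t-based interval would usually be more appropriate, which is exactly the sort of caveat that describing the data first tends to surface.

```python
import math
import statistics


def mean_diff_confidence_interval(a, b, confidence=0.95):
    """Normal-approximation confidence interval for mean(a) - mean(b).

    Assumes independent samples; unequal variances are allowed.
    """
    diff = statistics.mean(a) - statistics.mean(b)
    # Standard error of the difference between two independent sample means.
    std_error = math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    )
    # Two-sided critical value from the standard normal distribution.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return diff - z * std_error, diff + z * std_error


# Made-up monthly sales figures, for illustration only.
sales_south_africa = [120, 135, 128, 140, 132, 125]
sales_zimbabwe = [110, 118, 105, 121, 115, 109]

low, high = mean_diff_confidence_interval(sales_south_africa, sales_zimbabwe)
print(f"Difference in mean sales, 95% interval: {low:.1f} to {high:.1f}")
```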
When working with your assistant, do not treat him only as, and this is something that I think you mentioned, this junior developer. The assistant is also a consultant, also someone who has read much more than you about many different things. It may not have your experience. It may hallucinate stuff,
but in general, it has much more knowledge than any human could possibly collect. The tech stack is secondary. Technology doesn't solve problems. ChatGPT can help you solve the problem.
I think Llama 3, which just dropped last week, was trained on 15 trillion tokens, which is just astronomical at this stage. And I completely agree. This is the stuff that you want to leverage.
The biggest added value is having this specialist in many areas, with the ability to put them together in context. Sometimes, especially when I work on more advanced projects, it sends you chasing a red herring. It happens because some technology is popular. This is also a risk that you need to be aware of: his choice is also based on the popularity of certain technologies and ways of doing things. If many people have described how they solved a problem, that solution will be more likely to come up as a result.
Some niche solutions are harder to get to. It doesn't mean that they are not there, but you need to really discuss: okay, this is my problem, these are my conditions or considerations or limitations. This context is important. If it's only, okay, I want to calculate the sales that my company had over the last quarter, it will give you a very simple answer, right? If it's something more nuanced, share these nuances. It's not prompt engineering.
It's like a discussion with someone who has a lot of knowledge. He will provide you with the most popular solution first. In 99% of cases, it will be sufficient. This conversation part is critical: that you learn to converse with it, and you don't just give it tasks.
But this, Marian, undermines the whole idea of prompt engineering, which to me is a scam, by the way. I think it's a scam. You can tweak a bit the way it answers, the way it talks, and sometimes that's important. This I would call prompt engineering. But preparing the single prompt that solves all your problems at once, that's another hype.
I think it's another business hype, and people are going to pretend that they know how to do it, and other people will hire them for huge money because they will believe that this will solve all their problems. It doesn't work that way.
It's not a silver bullet, but there is a kind of approach that you need to adopt when you're using these models. When we're talking here, humans discuss things. You ask a question. We provide an answer. You then focus on part of the answer and maybe dig a bit deeper. And if we don't understand the question, we'll ask you what you mean, or we'll ask you for clarification. ChatGPT doesn't have that. It's you asking the question; you provide a prompt. It will do its best.
It will not ask for clarification. It will do its best. And garbage in, garbage out. Prompt engineering, I think, what it should be, not what it is, but what it should be, is the ability to formulate your prompts in such a way that you convey very clearly your intent, your goals, your limitations. People very often think that the prompt is a sentence. The more I use ChatGPT, the bigger and bigger my prompts become.
I write whole paragraphs describing different aspects of what I want to do, because I know that it will not ask for clarification.
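One way to picture such a multi-paragraph prompt is as a small template that forces you to state the problem, the data, the goals and the limitations separately. The Python sketch below is only an illustration of that habit, not a method from the book: `build_prompt`, its section names and the example texts are all hypothetical, and the optional closing line inviting clarifying questions is an added assumption.

```python
def build_prompt(problem, data_description, goals, limitations, invite_questions=True):
    """Assemble a multi-paragraph prompt from labeled sections (hypothetical helper)."""
    sections = [
        ("Problem", problem),
        ("Data", data_description),
        ("Goals", goals),
        ("Limitations", limitations),
    ]
    # Skip empty sections so the prompt contains only what was actually provided.
    paragraphs = [
        f"{title}: {text.strip()}" for title, text in sections if text and text.strip()
    ]
    if invite_questions:
        paragraphs.append(
            "If you need any additional information to give the best answer, "
            "ask me before answering."
        )
    return "\n\n".join(paragraphs)


# Example texts are invented for illustration.
prompt = build_prompt(
    problem="I am comparing sales in South Africa with sales in Zimbabwe.",
    data_description="Monthly sales totals per country for the last two years.",
    goals="Find out whether the difference in average monthly sales is meaningful.",
    limitations="I can only use Python with the standard library.",
)
print(prompt)
```

The resulting text is still just a starting point for a conversation; the point of the structure is that intent, goals and limitations are written down rather than left for the model to guess.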
I sometimes add a sentence at the end. I do prompt engineering, and I say: if you need any additional information to provide the best answer, ask for it. And sometimes it does, but rarely. But this is one of the risks that Artur describes very well in our book: if you ask generative AI a question, you will get an answer. Careful what you wish for.
Which in many ways is what makes it so special. Rather than say, oh, go away, that's a stupid question. You get something.
Yeah. Yes.
Which is probably why, as I discussed with Marian many times, we use words like please and thank you. And we don't do it because we fear that one day it will take over the world and it will maybe treat us a bit better. But it seems to react just a bit better if you say, please give me the answer.
I noticed it. if I'm being
not a superstition.
No. I have a lot of anecdotal evidence to support it.
You speak like a true scientist now. For everybody who wants to go and grab the book, once again, it's called "Generative AI for Data Analytics". It's available at manning.com. It's currently in the early access program, which means that you can get a PDF that might change before the final print, and just looking at it, it looks like it's scheduled for early 2025 if you want to get a physical copy from Amazon or anything like that.
But before I let my three amazing guests off the hook today, I'm gonna fish out a prediction for the future.
¶ The Future of Data Analytics and AI Integration
Artur, for you. Where do you see this all going, particularly for data analytics? What's the next step for it?
I think we will get a lot more capacity to understand data sets and problems, because that has already come with LLMs, but we will also get a lot more realization that there is no substitute for human ingenuity. Before LLMs, or whatever the next phase of models is going to be called, reach that kind of level, I think humans will still be able to provide a lot more creativity in the process. And currently, I think we're in a period where that's undervalued.
I think the next step will be the recognition of the value of creativity.
I disagree. I disagree. I'm totally pessimistic. I think we are going to rely more and more on AI, no matter what, without skepticism, and it will lead us into a lot of trouble. Even before ChatGPT appeared, there was this trend of, for instance, having job interviews done entirely by computer programs. The initial job interview was done by a computer program. You were recorded, and your voice was analyzed and your appearance was analyzed.
And that was such a great tool because it saved a lot of money for companies, but it rejected many good candidates, and it was just hopeless. There was this book, "Weapons of Math Destruction", which describes a lot of examples similar to this: how artificial intelligence and machine learning and other great tools are used in the wrong way. I think humanity doesn't learn. Just doesn't learn. Because what counts in the end is money.
Bean counters will try to save on costly things like proper data architecture, proper data collection, data engineering. They will try to cover up the early process errors with advanced, high-level tools, and the losses will be covered, of course, by clients and rising prices. Many people will get good packages for introducing these new tools, but I have deep distrust that people will understand
what Artur said multiple times: garbage in, garbage out. Later in the process you cannot correct some errors in the data that you're working on. And this super hype will lead to a lot of neglect of the legwork required.
And here I wanted to inject some optimism.
Well, it was worth a try. That went out of the window already, with you disagreeing with each other.
I think, and I agree with Marian, that we are that close, really that close, to some artificial self-awareness. So it's a great moment in human history, really. It's good to be part of it.
The job market has so many ways of screwing you over that you shouldn't worry about AI.
Adapt your thinking. As Marlena said, AI is here to stay, and you cannot go into the job market saying, I will compete with AI, because then you're putting yourself in a very disadvantaged position. But also, as Marlena said, use AI to your advantage. Squeeze out of it as much as you can. Seek the opportunities not only in jobs with AI, but in using AI in your job. Don't go headlong into AI jobs thinking, oh, these are the jobs of the future. No, do what you wanted to do all along.
Become a zoologist, or become a social worker, become an oceanographer, whatever. These are all great pursuits, and use AI in them. Because you don't have to be a hammerologist to use a hammer, but you can do great things with a hammer if it's used in the right way.
¶ Final Thoughts and Book Promotion
Hard to argue with that. Last question. Marian, this one's for you. If you could have a magical way to break into OpenAI and hack their ChatGPT to display a message on top of the chat box that everybody using ChatGPT would see, what would it say?
Buy our book.
Talk to me. Do not enter prompts. Talk to me.
As in be nice to me and then demand things. Talk to me.
Depends on the person you are. I think everybody should. As I said, looking at this prompt engineering: I'm in a couple of groups on Facebook and LinkedIn which are excited by ChatGPT one way or another, and I see a lot of, okay, so this is the prompt I prepared, and you just put the name of your company here, and so on. People are avoiding like fire talking to ChatGPT like to a specialist, to a wise colleague.
And they would be much better off just talking about the problem, not trying to extract an answer, if you feel the difference. It's not only about respect. One day, I believe soon, that will be the case. But you will get much more, and our whole book is about this: you will get so much more if you trust that it has knowledge and that you need to talk about the problem. Do not prompt me. Do not give me tasks. This is something that would probably improve people's outcomes from these conversations.
Love it. So Sam Altman, if you're listening to this, you now know how to improve the ChatGPT interface. Marlena, Marian, Artur, thank you so much for coming. Good luck with the sales of the book and I'll see you next time. Thank you.
Thank you very much.
