¶ Intro / Opening
I feel, in that sense, a PhD is actually leading you to become a data scientist, because that's what we do day in and day out. One of the challenges, I feel, is that in the industry the time frame in which you are expected to reach a solution is very small. Hello and welcome to Data Chatter, the podcast on all things data. This podcast is a series of conversations with experts and industry leaders in data.
And each week, we aim to unpack a different compartment of the data suitcase. I am your host, Karthik Shashidhar. I'm a blogger, newspaper columnist, book author and a former data and strategy consultant, currently head of analytics and business intelligence for Delhivery, one of India's largest logistics companies. You can follow me on Twitter at @karthiks, that is K-A-R-T-H-I-K-S, and read my blog at noenthuda.com, that is N-O-E-N-T-H-U-D-A dot com.
All views expressed in this podcast belong to me and my podcast guests, and do not reflect the views of any organizations we might be associated with. Nothing discussed in this podcast should be taken as financial or legal advice. When I was graduating college in the mid-2000s, the word in job descriptions that most commonly appeared alongside data was analytics. However, in 2008, the phrase data science got coined and took over the world in the next five years.
Nowadays, it seems, everyone wants to be a data scientist. Where is the science in data science? And why are so many people with PhDs in pure sciences moving to data science? To understand this better, I bring back one of my old guests on Data Chatter. Dhanya Parameswaran is an aerospace engineer turned neuroscientist turned data scientist. She's co-founder of Messy Fractals and Kabaddi Adda, and a
researcher at Sapien Labs. Dhanya talks about her journey from neuroscience to data science, why a PhD is good training for
¶ Dhanya's journey from Aerospace Engineering to Neuroscience to Data Science
data science, and what the science in data science is all about. So, to get us started, can you just take me through your career evolution? I mean, pretty much the last I know is that you were doing a BTech in aerospace engineering. So what happened after? Yeah, like you mentioned, I did my BTech in aerospace engineering at IIT Madras, and that's where the PhD bug bit me. During my third year at IIT itself,
I was introduced to a psychiatrist at VHS Hospital, which is in Taramani. His name is Dr. E. S. Krishnamoorthy, and he was doing some fMRI work in neuroscience. One of my seniors, Sridharan Devarajan, who is in fact a professor at IISc today, introduced me to Dr. Krishnamoorthy, and that was my first stint with neuroscience.
That's where I started looking at neuroscience data, getting some fMRI recordings in that sense, right? And post this, I continued with my PhD immediately after my BTech at NCBS in Bangalore. NCBS stands for National Centre for Biological Sciences. And here I did my PhD with Dr. Upinder Singh Bhalla. In fact, he himself is a physicist who moved into neuroscience, and those were the transitions which were happening, I guess, a decade or two earlier. Sure.
And during my PhD, I worked with another collaborator whose name is Tara Thiagarajan. She was a neuroscientist from Stanford, but due to some personal situation she had to come back to India. Her father had, in fact, passed away, and she had to take over her father's microfinance company. So she became a neuroscientist
and an entrepreneur at that time. She was working with us at NCBS, and I worked very closely with her for three years, the last three years of my PhD. And once I finished my PhD, she said, why don't you help me solve some questions at Madura Microfinance also? And the questions are simple questions, right? Like, whom do I best lend to? How do I cluster villages where I give loans? They were not neuroscience questions, but they were very interesting questions,
nevertheless. And I guess that's where I would have first taken up what we call data science today, right, trying to address these problems. In fact, the very first question I addressed was this one: whom do we lend to? At that time we built a simple logistic regression model, and the results were very interesting, because it was the mother's education which finally decided whether the person would repay
the loan or not, right? And so that's how I got into data science. In my particular case, I won't say it's a 100% shift from neuroscience to data science, because today I continue working with Tara as part of a company called Sapien Labs. Sapien Labs is essentially a not-for-profit neuroscience company where we are trying to collect data, EEG data, from thousands of people across the world, and we're trying to make sense out of that data.
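The loan-repayment model mentioned above can be sketched in a few lines. This is only a minimal illustration, not the model that was actually built: it uses a single feature (years of the mother's education), a hand-written gradient descent instead of a library, and entirely made-up data. The names `fit_logistic`, `education_years` and `repaid` are hypothetical.

```python
import math

def sigmoid(z):
    """Map a real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.05, epochs=2000):
    """Fit p(repay) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the average log-loss with respect to w and b.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Made-up illustrative data (an assumption, not from the conversation):
# years of mother's education, and whether the loan was repaid (1) or not (0).
education_years = [0, 2, 3, 5, 6, 8, 10, 12]
repaid = [0, 0, 1, 0, 1, 1, 1, 1]

w, b = fit_logistic(education_years, repaid)
p_low = sigmoid(w * 1 + b)    # predicted repayment probability, 1 year
p_high = sigmoid(w * 11 + b)  # predicted repayment probability, 11 years
print(f"weight on education: {w:.3f}, p(1 year)={p_low:.2f}, p(11 years)={p_high:.2f}")
```

A positive fitted weight is what "the mother's education decides repayment" would look like in such a model; with real data one would use many more features and check the result on held-out borrowers.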
So a part of my life is still stuck to academia in that sense, and another part of my life is where I do more of my data science related things today. Interesting. So, okay, but that neuroscience academia part is not, strictly speaking, like working for a university or something like that. It's more like neuroscience research as such. Yeah, exactly. Okay, so it was a sort of soft move in some sense for you, right?
It was like you moved from neuroscience to something adjacent. So, if I may ask, why is it that you stuck on in industry and business for the large part, rather than pursuing a career within core academia and so on? Because, correct me if I'm wrong, but my understanding is
¶ Why data science and not academia after PhD
that if somebody does a PhD, it is with a strong intent to become an academic. So what were the particular reasons why you stuck to business? No, I'm sure, even when I started off my PhD, I guess the aim was probably to become an academician, become a professor at some university and follow it up from there. But as you do your PhD, I think the number one trait anyone needs to complete their PhD is perseverance, because it is a grueling sort of affair,
right? And today, things have changed a bit when it comes to getting an academic job, because most of us need to follow up our PhDs with a postdoc, and a postdoc is not a degree as such, but more of the same stuff that you do during a PhD. And essentially, when you finally get your job, you'll easily be around 32 years old, I would say, right? So it's a long wait before you actually get your tenure-track position, because that would take another
three to four years in a university. I think that was one of the deterrents. And the second thing, I would say, is that monetarily also, after a PhD, I don't know if it really makes that much sense, right? Because if you look at your pay scales and your growth in terms of career opportunities, you grow as you do your degrees, right? If I do my BTech, I get something. If I do my MTech, I get a little more. If I do my PhD, it
comes down. Yes, yes. And I'm a bit old for many of the jobs also. So those would be some of the reasons, I would say. But in my case, it was also serendipitous. My husband, Arvind, was running a startup called OfficeYes
at that time, which is an e-commerce startup, and there was a lot of operational optimization which needed to be done. He used to obviously reach out to me, and we used to work quite a bit together at that point to solve supply chain problems, operations problems for this particular startup. And that's when both of us thought about it: why don't we work together? Because you bring business value,
I can solve the problems, and that seemed to make a lot of sense. That's when we created Messy Fractals in 2015, three years after I finished my PhD. And the goal of Messy Fractals was centered around data science. Okay, and then from there you guys decided to focus a bit more on the sports bit, I guess, which is where you got into Kabaddi Adda and things like that, is that right? Exactly, exactly. Initially, we started working with startups.
We wanted to solve data science problems for startups, and sports was a hobby. We started working with Saina Nehwal and Prakash Padukone at that time as a hobby, I would say, because it was just an interesting problem to solve. But it eventually kept getting bigger and bigger, and snowballed into more sports analytics and less of the other startup stuff. Coming back to your PhD days and so on, what about your peers during your PhD program?
Have many of them also moved into something similar to data science, or have a lot of them stayed within academia? If I have to think correctly, there were around ten of us who were doing PhDs at roughly the same time. And I would say three to four of us have moved out of academia itself. The rest of them are still pursuing it, hoping to get into academia. Some of them are still in their postdoc.
But I believe, if they get into a postdoc, at least they're quite clear that they want to end up in academia. Yeah. To me personally, that's a very strong sign, because I think the easiest, best exit out of academia into industry is the moment you get your PhD. Once you get into a postdoc, you signal to the market, among other things, that you want to become an academic rather than be in the industry, I
guess. That's exactly how it is. Yeah. Okay, so now coming back: in your opinion, what's the definition of data science? I ask a lot of my guests this because this is a
¶ Defining data science, and how she approaches a problem
very, very highly abused term, right? Lots of different people use it to mean different things. So what, in your opinion, is data science? Okay. So my opinions, I guess, will be biased by my experience of doing my PhD, right? That's how I address all the problems which I face. When I face a data science problem, I try to design a good question. It's not easy, often, to design
the right question. And once I design the right question, I try to hypothesize on what the possible solutions could be. And most importantly, I feel, I try to understand the context within which this question or problem sits, right? For instance, in badminton, the context would be how players learn the sport of badminton, and what the different dimensions are when it comes to badminton itself, right?
For instance, height is one dimension which I never thought of until I spoke to people who play badminton: the height at which they get the shuttle. So creating context around the question, I think, is one of the most important things, and once we do this, we can design solutions. And that's where I think the abused term of data science comes in today, right? Where I design solutions and write code to address those solutions.
So what you're saying, if I may paraphrase, and correct me if I'm wrong, is that the bigger challenge is in figuring out the right question to ask, rather than just solving the given problem, I'm guessing. Yeah, exactly. So it's designing the question and understanding the context.
Yes. Because, for example, recently I put up this LinkedIn post where I said that anybody can write the three lines of scikit-learn code to fit any machine learning model. What's critical is to figure out what the context is and structure the problem correctly. So the real skill is in structuring the problem rather than solving it, because solving it, to some extent, is sort of democratized nowadays, I guess. Correct. Exactly. Solving
it is more or less democratized with the kind of code and the packages that you get, right? It's pretty easy to write it finally, if you've got the structure correct and well thought through, and I guess that's what good data science will be. And that is precisely the reason, I feel, a lot of PhDs end up doing data science, because that is a skill we are trained in during our PhDs, whether we like it or not,
right? There is obviously a specialized question we go in trying to address. In my case, I was trying to address one such question, but what I ended up doing in the first two years of my PhD was designing a question which had not been solved before, and understanding the context in which that question sits by reading a lot of research papers
and seeing if something which I think about has already been done or not. Then the next two years were about collecting data, designing experiments to collect data, so I could collect as much data as I wanted, or as much data as I thought made sense to solve this problem. And the final step is where you write that piece of code which solves it, build a model which addresses that particular problem that you're solving.
So I feel, in that sense, a PhD is actually leading you to become a data scientist, because that's what we do day in and day out. Yeah. So are there certain kinds of PhDs where this is more prominent? Because what I see, maybe not so much in India, but at least abroad, is that you have a lot of data scientists who come from, in my limited data gathering, physics or biology
¶ How a PhD prepares you for a career in data science
backgrounds rather than other sorts of academic backgrounds. So are there certain kinds of PhDs that are more suited towards data science? Or rather, where if you want to get into the industry, you get into data science instead of getting into something else and so on? I believe, right, the way people think about research, almost any PhD goes through the similar process which I just told you, which is: hypothesize,
collect data, solve the problem. In fact, experiments as a way of solving problems crept into the way science is done, I think, only after Newton's time, right? So that's quite the way it has happened. So I believe that a PhD in any field will be equally good at solving data science problems. The only thing is the fear of coding, which is stopping many people from taking that up as an industry alternative. But otherwise, I don't think it
should make a difference. In fact, today what I see is a reversal of the process in some sense. That is, people who are data scientists, right at the end of their BTech, are interested in machine learning and AI. They've done a bunch of courses, and the researchers are inviting them, because all research papers go through this: experiment, data,
presentation of data. And now there's a new section which is added in all research papers, which is a model to address that data, and to build this model they're looking out for data scientists to come in. So they're inviting data scientists to get into the research lab and solve problems which may be of greater interest in that sense. So this could be research in pretty much any subject, not necessarily research in machine learning or anything like that.
Not necessarily. For instance, I will give you one example. I'm working on a project where we are looking at the cognitive development of three-year-old children. There is a scale, in the sense that there's a questionnaire given to three-year-old children with 80 to 85 questions, and they have to say which shape is bigger, which shape is smaller, does this fit there, those sorts of things, and you get a cognitive score of where they are. It's called the Bayley test.
Okay. So it's a very popular test. And now we are doing EEGs to collect data from the brain. So now the obvious thing is to ask: what are the EEG signatures which predict the Bayley score? So I want a data scientist, or a person who is good at machine learning, to come and build this model for me. And so we invited one person from Harvard, an undergrad, and he's the one who actually built out the model. It took him three to four weeks to build out this model,
with some iterations and all that, but he was invited to build the model. So it is a reversal of the process in some sense, where I'm calling in data scientists to quickly solve this for me, because I have got all the data, I've got all the context, I've got everything sorted for you. Now just build the model and give it to me, right?
Which I think brings me to the classic definition of a data scientist as somebody who is given a well-defined problem statement and then writes the model for it. Exactly, exactly. Okay. So, changing track a little bit: we were talking about how a PhD helps you in your data science career because of your process of hypothesize, collect data, then build models and so on, which is pretty much what happens in everything that
we do at work, right? But because you come from an academic background where, let's say, your output is measured in terms of research papers and things like that, are there any challenges that
¶ Challenges in industry due to academic background
you faced in the industry because of your academic background? Yeah, there were definitely a few challenges I faced initially, right? One of the challenges, I feel, is that in the industry the time frame in which you are expected to reach a solution is very small compared to that in research. In research, you practically unlearn that skill; speed is a skill which you unlearn during your research, right? But the minute we came to the
industry, for instance, we were working with a telecom product during the early days of Messy Fractals, and this guy had collected a lot of data about his users. He wanted us to cluster the users into groups, and he said, you have one month, and you have to give me a solution in one month on how to cluster the data. And that's practically the time it took me to understand the context of the problem: how people interact with prepaid vouchers and things like that, right?
So speed is one thing which the industry demands, but coming from an academic background, we take it a bit too lightly. Another thing which I have faced often is that the data is already ready when you come into the industry, right? Here is the data, now give me the solution. That's how it is. Whereas in academia: this is a problem I want to solve, so I need to collect this data, so I'll design an experiment to
get this data. So the way that works is a little different, and often data is the limiting step. I mean, there are certain fields which are never available, certain data points which won't be available, but you still need to go ahead and take a decision. So those are two places where I feel there is a huge difference, and PhDs struggle a bit, at least in the initial years in
the industry. And there's another challenge, which is that when we do a PhD, or when we are in academia, we're not bothered about building a product out of it. We're often interested in solving a problem, coming up with a solution, understanding the key parameters which drive the solution, those sorts of things. Whereas in the industry, and that's one of the key things I faced in the early days, when we write a piece of code which works, that is never
enough. It has to connect with the database, it has to sync with the product, it has to keep getting updated, and it has to just work well. That is something I think I struggled with in the early days of my stint in the industry, but I figured that out over time. You learned that. So you mentioned that PhDs are also generally good at coding, right? Because 10 or 12 years back,
¶ Learning to code
I used to work for Goldman, and they used to hire, I think, physics PhDs because they could write C++ code. That's what somebody told me, which sounds sort of counterintuitive, because you would think that if you want somebody to write C++ code, you'd hire a software engineer. But in some sense, I think these are two different aspects of coding. I guess in a PhD you become
good at coding a model. But in the industry, I think you have a lot of, for lack of a better word, pipeline coding and stuff that you need to do to connect to the database, connect to the product on the other side and things like that. Am I correct? That's exactly what I was also thinking. Yeah, the first time I learnt coding was because I had a need for it. Even at IIT, we were trying to solve
some problems for my thesis. It was in combustion at that time, right, of jet engines, and I had to learn Python to write that piece of code. And that's how we learn when we get into research: we learn coding because we have to learn coding. I'm really envious of you that at IIT Madras you learnt Python, because we in the computer science department were asked to do everything in Java. And by the time I graduated, I hated everything about computer science.
It took another six years for me to start coding again. So in that sense, I'm envious of you for learning Python. But yeah, sorry, you were telling me about learning coding because it was required when you were doing research. Yes. There is a need to learn coding when you're doing research; you don't have a choice, because most of us in research, especially during a PhD, own one project ourselves.
So it's my responsibility to take it from data collection, to model, to writing the paper, to publishing it. So you end up learning to code. You don't have a choice, really. And that is a different way of learning to code, rather than: I want to become a data scientist, so let me learn how to code in Python. It's just a different motivation to learn coding. I think that's the major difference.
Yeah. I mean, I don't have a PhD, but I personally have a bias for learning to code because I need to code for something, rather than learning to code because I want to become good at something, and so on. I somehow have a bias towards the first one. But coming back to another thing you were saying, you were talking about how the nature of data is different between industry and academia, right?
Because in academia, I guess you get to design your own experiments, which means you have control over the data that you collect. I guess the downside of that is that sometimes the amount of data you can get is a little limited, because you have to run the experiments and so on. But the good thing is that you get all the fields that you want, in the way that you want, and so on. But in the industry, obviously, I pretty
much almost never... I do run some experiments from time to time, but I almost never get to collect my own data. It's some data collected for some other purpose that has to be used for this particular problem and so on. So how do you deal with this? Because it's obviously an uncomfortable, unfamiliar situation for you.
So how do you deal with situations where you're faced with data that's been collected for another purpose? What are the challenges there, and how do you solve them? See, it's a very difficult problem, I would say, because in most cases the way we have solved it is that, in addition to whatever data we have, we do another exercise of collecting the data
¶ The challenges of working with someone else's data, and proxies
we want. That is how we have solved it so far. Let me give you an example. For instance, when I was working with the microfinance company, right, Madura Microfinance,
we wanted to identify how far the villages are from a national highway or a state highway, because we had a hunch, or a hypothesis, that distance to the highway was a key parameter in defining the success of giving loans in that particular village. And so we had to painstakingly collect that data from GPS, satellite maps and all that, and there was no alternative to it at all. But there are also times when you can create proxy or derived variables, right, which
give you an approximation of the missing data. And from there you can probably derive things which cannot fully give you the information, but at least partially paint a picture for you. So, for instance, one metric we created was the ratio of industrial workers, and that gave us a proxy for how much migration has happened to a particular village, or how much migration has happened from a particular village.
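A derived variable like the industrial worker ratio described above could be computed as follows. This is a minimal sketch under stated assumptions: the conversation does not say exactly how the ratio was defined, so here it is taken to be industrial workers divided by all workers in a village, and the village names and counts are made up for illustration.

```python
def industrial_worker_ratio(industrial_workers, total_workers):
    """Share of a village's workers employed in industry.

    Returns None when there are no workers recorded, so that a missing
    value is not silently treated as a ratio of zero.
    """
    if total_workers <= 0:
        return None
    return industrial_workers / total_workers

# Hypothetical census-style counts: (industrial workers, total workers).
villages = {
    "village_a": (120, 400),
    "village_b": (30, 600),
    "village_c": (0, 0),  # no data recorded
}

ratios = {
    name: industrial_worker_ratio(industrial, total)
    for name, (industrial, total) in villages.items()
}
print(ratios)
```

Keeping missing villages as `None` rather than zero matters for a proxy: a zero would read as "no industrial workers, so no migration", which is a claim the data does not support.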
So proxies for some variables we can often use, but when nothing helps, I think you'll have to go back and collect data. And this might be a bias from my prior background in academia itself, right? I think, just because I've never been an academic, I end up using proxies all the time. So can you give some more examples of interesting proxies? I mean, this industrial workers one was an interesting one. It could be from any of your work.
It's always fun to look at what kind of proxies people use for what. In cricket, right, for instance: thinking about T20 matches, bowlers often get most of their wickets in the death overs, and this could be because of mistakes made by the batsman or because of the bowler's brilliance itself. So we created a metric, which is basically how many wickets were caught in the field, as a proxy for how good the bowler is, right?
So if more catches are taken in the outfield, we attribute that wicket as a wicket which just happened because the batsman was taking a risk, whereas if it is an lbw or a clean bowled, the attribution goes to the bowler. So there are fielder-dependent wickets and fielder-independent wickets, and that was kind of a proxy for how good a bowler is. For instance, a bowler like Shardul Thakur gets a lot of wickets in the death overs, right, and probably he is equivalent to
Bumrah in that sense, right? But most of Bumrah's wickets are fielder-independent wickets, whereas Shardul's wickets are fielder-dependent wickets. So you get a sense, and that's a proxy for how good a bowler is in that sense, right? Okay. I'm glad I asked you this question, because proxies are always fun in this sense.
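The fielder-dependent versus fielder-independent split described above can be sketched as a simple share. This is an illustrative sketch, not the metric as actually implemented: the dismissal labels, the function name `fielder_independent_share` and the two sample bowlers are assumptions, and the dismissal lists are made up rather than real match data.

```python
# Dismissals credited to the bowler alone, per the description above.
FIELDER_INDEPENDENT = {"bowled", "lbw"}

def fielder_independent_share(dismissals):
    """Fraction of a bowler's wickets that did not need a fielder."""
    if not dismissals:
        return None
    independent = sum(1 for d in dismissals if d in FIELDER_INDEPENDENT)
    return independent / len(dismissals)

# Made-up death-over dismissals for two hypothetical bowlers.
bowler_x = ["bowled", "lbw", "caught", "bowled"]
bowler_y = ["caught", "caught", "caught", "lbw"]

share_x = fielder_independent_share(bowler_x)
share_y = fielder_independent_share(bowler_y)
print(share_x, share_y)
```

Two bowlers with the same wicket count can then be told apart: a higher share suggests wickets earned by the bowler, a lower one suggests wickets that came from batsmen taking risks.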
I mean, it always makes you think about aspects that you would not normally think of. If you had the freedom to collect the data, you would have probably... I mean, in some cases you can't really run an experiment for this. Obviously, in a live cricket match, you can't run an experiment. But yeah, it allows you to think in
this manner and so on. So, coming back: I think you spoke about how in data science in industry you end up having to build a product and things like that. Another aspect, I guess, is in terms of communication, right? Because not every data science problem leads to a product decision. In some sense, it could be a business decision. For example, in your microfinance company,
I'm guessing one of the problems they might face is: where do we have our offices, or branches, and things like that, which is a business decision, because you can't give them a GPS coordinate to say that you need to put your office exactly here and so on.
So in terms of communicating the results of the model that you have built, what are the challenges? How is communication of the results of an experiment, or the results of a problem, different in academia compared to the data science industry? In fact, I'm glad you brought this up, right? This particular question, which you gave as an example, which is
how do you create more branches, was one of the questions we addressed where communication became an issue also, right? So what happened was Madura had a lot of branches in Tamil Nadu. And so we looked at the data of Tamil Nadu and identified things like the importance of this industrial worker ratio: in villages where the industrial worker ratio is high, the repayment is better. Or, distance from the national highway
is an important metric. So we took all these metrics, and they wanted to open branches in Karnataka and Maharashtra looking at this particular data. So I told them, okay, give me the data for villages from Maharashtra and Karnataka. Let me run the model built on Tamil Nadu and give you villages where you can open branches in those states. And what happened was we were not getting any villages which had scores that fit what we expected.
And we could not understand why that was happening. Then we had to talk to the field team, and they also, at that particular point, didn't have any answers. So we went back to the data, and what we realized was that Tamil Nadu is a heavily industrialized state, whereas Karnataka and Maharashtra were highly agrarian states. So the solutions which the model threw up from Tamil Nadu just don't make sense in Karnataka and Maharashtra.
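The mismatch described above, a model built on one state being applied to states where a key feature looks very different, can be caught with a simple check before scoring. This is a minimal sketch of one possible check, not what was done in the project: it only asks what fraction of the new region's values fall outside the range seen in training. The function name and all numbers are made up for illustration.

```python
def share_outside_training_range(train_values, target_values):
    """Fraction of target values outside the [min, max] seen in training."""
    low, high = min(train_values), max(train_values)
    outside = [v for v in target_values if v < low or v > high]
    return len(outside) / len(target_values)

# Made-up industrial worker ratios for villages in the training state
# and in a new, more agrarian state.
train_ratios = [0.25, 0.30, 0.40, 0.35]
target_ratios = [0.05, 0.08, 0.28, 0.03]

shift = share_outside_training_range(train_ratios, target_ratios)
print(f"{shift:.0%} of target villages fall outside the training range")
```

A high share is a signal that the model is being asked to extrapolate on that feature, which is when dropping the parameter and rebuilding, or talking to the field team, becomes necessary.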
And so we had to throw out that parameter and rebuild the model to make sense of things over there. And there's another thing which we faced, which comes to the communication point, right? We gave a list of 100 branches which could be opened in Maharashtra and Karnataka, and the field team often came back saying things like: but the competition there is high, or people are not accepting us when we go there,
and things like that, right? And so we needed to work through it in a sort of painstaking manner: put filters, remove things which don't fit, so that the field team is equally happy deploying the solution which we created on our computers in our offices, right? And that's where, I guess, communication comes in: communication and context, right? Sometimes, while solving data
science problems, we just don't understand the challenges faced on the field by the team which is actually executing the solution, and they don't understand why we are saying these are good answers. A lot of back-and-forth communication probably helps a bit, but what helps most is travelling over there with them, spending a few days and understanding the context yourself. Oh yeah, I think that's
super important. I mean, in my work, I work for Delhivery, right? I had built a model for something a while ago, I forget what it was, and I discussed it with my team and we were fairly happy with the model. And then one day we decided, let's go look at the operations at our hub in Bangalore. So three of us went there. We were looking at operations, and then one of my teammates said: just look at how they're
collecting this data. Do you take that into account while building your model? And I was like, no. And that's when I realized my model was completely off the mark. So it's events like this which make me remember that I'm not solving a math problem, I'm solving a real-life industry problem, right? I guess it's exactly the same for you, going to your microfinance locations and spending time with the team.
Because that's pretty much the only time where, it's almost like, your entire model is sort of getting calibrated to the market in some sense, right, when you do that. So the question again, which I think we discussed briefly a while back, is: how does it tie in with speed? Because in the industry it necessarily needs some back and forth, right? You give a model, they will tell you these are the issues, so you rework it, and so on. So I guess in that sense you have to, at some level, prioritize speed over accuracy, or speed over correctness of the solution, and so on. So how did you sort of handle that? Which I guess is very different from academia.
Oh, it's very different from academia. In academia you learn one thing: that you need to be patient, and you need to iterate hundreds of times before you actually get it right.
So in that particular sense, it's okay, in the sense where you quickly give one solution. You have sixty, seventy percent accuracy, you work on it, and you edit the solution. I guess this iterative model works in probably any field, and it is something which should work well in data science also. And like you said, once we get to the field, once we understand the context better, and that takes its own time, we can iterate and improve the model well. And I think that iteration is the only solution forward here in that sense.
How is the communication different? Because I think, again, in academia you're used to communicating in papers, I guess. So what is the transition there? Because papers, I'm guessing, don't work once you're in a job.
¶ Communicating results
Yeah, and I'm glad you brought this up, because I continue to publish some papers, and they are research papers. They are tough to read, and often I do not know how many people read them. But initially, when we started off Messy Fractals, I still had this enthusiasm for publishing research-driven papers. So we used to put out a lot of white papers about things which we had created, which we had done, which we had hypothesized.
But over time, I think I have moved more to communicating through blogs and articles, simplifying things and giving one message at a time. In a research paper you don't try to do that. You try to bottle up a lot of things in a very compact way, within the given word limit, with a few figures and lots of supplementary figures, and you put it out. Whereas now I have just changed the way I'm communicating. And I'm realizing this only because you brought this question up, because it's been a few years since I even wrote white papers. Initially we were quite interested in writing white papers and, you know, publishing a detailed account of what we were doing. Now we do the same thing, except we put it out in blogs, and we try to give one or two simple messages at a time. There are benefits to both, because now I'm optimizing for more people reading what I've put out. Earlier, I was optimizing for putting out my set of results and, you know, documenting it in a perfect way; that's what I did during research. Now I am more about, okay, let more people read it. I think it will be more valuable that way. There are benefits to both.
The other thing is communication of results within the company and so on.
Like, for example, I build a model and I'm like, okay, here's a model to tell you where to place your next branch and stuff. Now they'll ask you for an explanation of the model, and that explanation, I guess, will be very different from what you would have written if you had written a blog about it.
I think I have learned to communicate better over time, where I try to speak as much data science in English as possible. So for instance, in the microfinance example itself, I tell people, "If you live closer to a highway, then that village is much better," right? And then they will give us anecdotal information. It always works better if the answer also comes from the field team, right? Because they have a sense of this solution in their own way, and if it comes from them, obviously the understanding is a lot greater than me saying, "I built these eight models, and the models have thrown up these 16 solutions, and this solution is the most optimal in your case." I think that probably doesn't work.
So I guess your journey has been such that you've figured out these things. You figured out how to communicate, figured out how to use the data that's available. One other question for you: can we generalize a little bit? Unfortunately, I don't know too many other people with a PhD working in data science and so on.
So I don't know about your network, but if you know more such people, or if you have worked with more such people, is this how everybody approaches things? Or are there challenges in terms of how you approach the problem, or whether you can make this transition to this kind of communication?
I guess everyone's situation will be slightly different depending on the context they're working in, right? A friend of mine was doing behavioral science, neuroscience, back then at the lab, and now she's doing NLP with a company, trying to optimize the language in which things are written and to get information from things which are written. And when it comes to communication,
I don't know how she has handled the situation. It depends on the particular context. In my particular case, since we were running the startup and there were six or seven data scientists working with me, it became mandatory for me to improve my communication with the data science team and with the field team. That was a necessity which crept in, and because of it I am the way I am today, right? So I guess communication is going to be a challenge. It's often a challenge even in my case. But communicating in simple language is one of the key skills I believe a data scientist should pick up. Otherwise the value of a data science team will be limited, because you're only as good as what the field team can execute, right? Otherwise your model is not good enough. So that is a mandatory skill,
¶ Are ex-academics better at certain kind of Data Science roles?
I would think.
Now, within data science, right, do you think there are particular kinds of roles which would suit academics better than other kinds of roles? Like, for example, you have some people who work more on the product side. I work more on the side where my input goes into making business decisions rather than into the products. Some people are closer to the tech, so they code and build deeper models.
Some people work more at the level of what you mentioned, which is taking a business problem and formulating it as a math problem, figuring out the context, and things like that. So do you think, I mean, I might be generalizing a bit, but I think it's okay, and I hope you are okay with it too: are there certain kinds of roles where you think academics might do better than in other kinds of roles?
Academicians by design are trained to solve problems that have not been solved before, right? So when there are tough problems, it's better to get an academician on it, because they are often not going to get overwhelmed. When it comes to problems like this, where we need to, you know, build a hypothesis, take guesswork forward, and then design a solution, I think academicians will be better acclimatized. But converting a solution into a product, I would say, is not necessarily a skill which academicians would be good at, because there are very few labs, at least in my field, which are designed to do that. In fact, at Sapien Labs, where I work with Tara, that is one of the things we are trying to achieve: take insights from research and try to build a product out of them.
And I'm seeing challenges, because it's taking a long time to actually make that transition from solution to product design. So probably academicians are not highly optimized for that particular thing, but they are better at solution design: trying out a lot of different solutions, iterating many times, those sort of things.
In the course of this conversation so far, you've talked about one or two problems that you faced as part of your work with Madura Microfinance. Can you give me more examples of industry kind of data science work that you have done, and some inputs on how your background in academia or neuroscience has actually helped in terms of how you went about the problem and so on?
Yeah, so microfinance was one thing I did. But when it comes to Sapien Labs, we created a lot of metrics. Basically, we looked at a lot of EEG data and converted it into metrics such as complexity. Complexity can be thought of as a proxy for entropy in the brain.
¶ "Entropy" in the brain
So, how different spatial parts of your brain are communicating different signals. If all parts of your brain are saying different things, your complexity score is higher, or the entropy is higher. We designed these sorts of metrics, and we tried to relate them to more real-world things, right? In this case it was travel, mobile usage, and things like that. Designing these metrics involved a little bit of processing, a little bit of data, a little bit of neuroscience. So I guess that's where my role came in. That is one of the things which I did. And when it comes to my experience in neuroscience itself, I believe I bring two dimensions when I look at any problem, right? One, I'm always looking at it in a data or a mathematical, scientific way. Second, it also gives me a behavioral science view of things, right? Like, how can I nudge the user to do these things when it comes to a data product? These are the things which I bring into play when it comes to Kabaddi Adda, right? It can come down to website design: we look at a lot of data on users who come to our website and try to design the website based on their preferences and things like that.
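For readers who want something concrete, the idea mentioned above, that complexity goes up when different parts of the brain are "saying different things", can be illustrated with a toy calculation. This is only a sketch under stated assumptions and is not the Sapien Labs complexity metric, whose definition is not given in this conversation. It assumes each EEG channel is reduced to a 0/1 symbol by comparing every sample with that channel's median, and it averages the Shannon entropy of those symbols across channels at each time step. The function names and the sample signals are hypothetical.

```python
import math
from collections import Counter
from statistics import median


def shannon_entropy(symbols):
    """Shannon entropy (in bits) of a list of discrete symbols."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def binarize(signal):
    """Assumed discretization: 1 if a sample is above the channel median, else 0."""
    m = median(signal)
    return [1 if x > m else 0 for x in signal]


def spatial_entropy(channels):
    """Average, over time, of the entropy of symbols across channels.

    `channels` is a list of equal-length signals, one per (hypothetical)
    electrode. If every channel shows the same symbol at a time step the
    entropy at that step is 0; the more the channels disagree, the higher it is.
    """
    binary = [binarize(ch) for ch in channels]
    n_samples = len(binary[0])
    per_step = [
        shannon_entropy([ch[t] for ch in binary]) for t in range(n_samples)
    ]
    return sum(per_step) / n_samples


if __name__ == "__main__":
    # Made-up signals, not real EEG data.
    rising = [1.0, 2.0, 3.0, 4.0]
    falling = [4.0, 3.0, 2.0, 1.0]
    print(spatial_entropy([rising, rising, rising, rising]))  # channels agree
    print(spatial_entropy([rising, falling]))  # channels disagree
```

With only two symbols per channel the score is capped at one bit; a finer discretization would allow higher values, but that is a design choice of this sketch, not something taken from the study.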
And here I try to bring in these two dimensions whenever I look at a problem, and these are things I think I picked up during the course of my PhD. We used to not only look at data; we used to talk about a lot of other things which make the brain a unique sort of organ in your body, right?
So yeah, actually, I want to dig deeper into something. I think a couple of minutes back you mentioned the EEG and entropy in the brain.
It's something I have been sort of especially interested in, because I keep talking about having a traffic jam in my head sometimes, when there are too many thoughts and they're all clashing, and I just lose track of what's happening. So, I mean, I know it's possibly very tangential, but can you talk about this entropy? I'm also very interested in it, not because of thermodynamics but because of information theory. So can you just talk about the entropy in the brain and so on?
I'll talk about one of the results we got which is related to this entropy measure, right? We call this entropy measure complexity. We did a study of around 400 people across different socioeconomic strata in Tamil Nadu, and we gave all of them a test similar to an IQ test, right? It's a pattern-completion sort of test. And we realized this complexity metric was very highly correlated with how much they scored in those tests. Then we did another study and looked at how much these same people traveled, and we realized this complexity is equally related to the amount a person travels. So if a person is able to get out of their comfort zone and travel to many places, complexity is often higher. And higher complexity is also correlated with a higher score in the pattern test.
Right? And so I'm not necessarily going to say that higher entropy in the brain, like a traffic jam, is correlated with high IQ. I mean, it would not be fair for me to say that.
I also want to ask you about the negatives of having a traffic jam in the head. Have you noticed anywhere that this high entropy or complexity is correlated with negative performance on whatever metric? I don't know if you have.
No, I've not seen any such thing. The thing with science is that often it's a negative result, right?
A negative correlation is not a negative result, right? Zero correlation is a negative result.
Yeah, zero correlation is a negative result. But what I'm saying is I don't think I have seen any such thing, because you go in with a hypothesis, right, and you try to prove it.
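The distinction drawn above, between a negative correlation and a zero correlation (the "negative result"), can be made concrete with a small sketch. It computes the Pearson correlation coefficient by hand for three invented data sets; the numbers are made up for illustration and are not from the Tamil Nadu study, and the conversation does not say which correlation measure the study used.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Hypothetical complexity scores for five people (invented numbers).
complexity = [1, 2, 3, 4, 5]

# Three invented outcomes to contrast the cases discussed above.
rises_with_complexity = [2, 4, 6, 8, 10]   # positive correlation
falls_with_complexity = [10, 8, 6, 4, 2]   # negative correlation: still a finding
unrelated_to_complexity = [2, 1, 3, 1, 2]  # about zero: the "negative result"

if __name__ == "__main__":
    for label, outcome in [
        ("positive", rises_with_complexity),
        ("negative", falls_with_complexity),
        ("zero", unrelated_to_complexity),
    ]:
        print(label, round(pearson(complexity, outcome), 3))
```

A coefficient near -1 says the two quantities move in opposite directions, which is itself a relationship; only a coefficient near 0 says that no linear relationship was found.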
So in our previous recording, I think you were talking about one particular project on a particular kabaddi player and how he balances, and I think you were talking about his center of gravity, etcetera. I think this is a good way to end today; maybe that could be something you could talk to us about, with the context of your research background, on how you went about solving the problem, right? I mean, we have theoretically spoken right now in terms of framing the problem, getting the context, then building the model, that kind of thing. I hope I'm not catching you off guard, but if you could take us through that.
Yeah, I can
¶ Revisiting the biomechanics of Kabaddi players, and communicating data to sportspersons
definitely take you through that. So there is this kabaddi player called Pardeep Narwal, and he was creating havoc for all the teams that we were working with at that particular point in time. He's a raider, and he used to score points at will, win matches at will, and basically end up winning tournaments at will, right? With Pardeep Narwal in the team, that particular team, Patna Pirates, won the tournament three years in a row. So everything ended up in this one question: how do you stop Pardeep Narwal? That's what the data question kind of summarized to, right? We looked at a lot of data. There was no systematic insight when we just looked at which defender was getting him out or what technique gets him out. There was no light at the end of that tunnel, right? So we had to take it another step deeper. We went into the biomechanics. At that time I also had an intern working with me, and she was a physiotherapist, so she was interested in the biomechanics of the body, right? It was a good opportunity for us to address that question at that particular point. And obviously, like with most things in kabaddi, there was no data available to solve this problem, right?
And so we took a few videos of Pardeep Narwal's raids, and we marked out different points on his body, including his knee and the angle at which he's bent, because the skill he's famous for is called the dubki. The dubki is essentially Pardeep Narwal going parallel to the ground; he'll be just, say, 10 to 15 centimeters off the ground. He can bend that low, and he goes between two defenders and escapes. And people think he's going to get out.
So they pile on top of him, and he ends up getting six to eight points there. He has got a record-breaking eight points in a single raid because of this particular skill, right? So the idea was: what is it that he's bringing to the table, and when is it that he gets out? So we separated all the raids into two kinds: successful raids by Pardeep, where the defender dashes at him or holds him or whatever he does, and unsuccessful raids by Pardeep. We looked at these two data sets, and we marked out different parts of his body, where his center of gravity was, you know, what he was doing during the raid across time, right? And that's when this result kind of cropped up: every time Pardeep's balance is kind of off, that is, his center of gravity is outside his body, and at that particular point the dash happens from the defender, he gets thrown off the court and the defender wins the point. Now, this was incredibly tough for us to communicate to the kabaddi players, the defenders of the team we were working with, right? So we tried to simplify it and told them that every time he's on one foot, that is when the dash has to land. But I don't know whether we were able to communicate it. Anyway, like I told you, we wrote a white paper about it and documented this result. And we simplified this insight for the team, saying: make sure that Pardeep is the third player out, so that he never comes back before you get the team out.
Okay, okay. About the communication: that would have been a real challenge, I'm guessing, because these are kabaddi players. I don't think everybody would have spoken English. You would have to kind of...
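The balance finding described above can be sketched in code. This is a simplified illustration, not the method from the white paper, whose details are not given in this conversation. It assumes a side-on video frame in which a few body points have been marked with 2D coordinates, assigns each point a hypothetical share of body mass, and treats the player as off balance when the horizontal position of the center of mass falls outside the span of the feet touching the ground. The marker names, coordinates, and mass fractions are all made up.

```python
from dataclasses import dataclass


@dataclass
class Marker:
    """A body point marked on one video frame (all values hypothetical)."""
    name: str
    x: float              # horizontal position in the frame, in metres
    y: float              # height above the mat, in metres
    mass_fraction: float  # assumed share of body mass near this point


def center_of_mass(markers):
    """Mass-weighted average position of the marked points."""
    total = sum(m.mass_fraction for m in markers)
    cx = sum(m.x * m.mass_fraction for m in markers) / total
    cy = sum(m.y * m.mass_fraction for m in markers) / total
    return cx, cy


def is_off_balance(markers, support_xs, margin=0.0):
    """True if the center of mass lies horizontally outside the feet on the ground.

    `support_xs` holds the x positions of the feet in contact with the mat;
    a player standing on one foot has a single entry.
    """
    cx, _ = center_of_mass(markers)
    return cx < min(support_xs) - margin or cx > max(support_xs) + margin


# Invented frames: an upright stance and a lunge on one foot.
upright = [
    Marker("trunk", 0.50, 1.0, 0.6),
    Marker("left_leg", 0.45, 0.4, 0.2),
    Marker("right_leg", 0.55, 0.4, 0.2),
]
lunging = [
    Marker("trunk", 0.80, 0.6, 0.6),
    Marker("left_leg", 0.45, 0.4, 0.2),
    Marker("right_leg", 0.55, 0.5, 0.2),
]

if __name__ == "__main__":
    print(is_off_balance(upright, support_xs=[0.45, 0.55]))
    print(is_off_balance(lunging, support_xs=[0.45]))
```

Running a check like this on every frame of the successful and unsuccessful raids would give a per-frame flag that can be compared with the moment the defender's dash lands, which is the comparison the simplified advice ("when he is on one foot") stands in for.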
Even in sport itself, that is the case, right, Karthik? In sport itself, I felt communication of data concepts was incredibly tough. So we tried to break it down, saying that sportspeople understand videos better. We tried to play a lot of videos, and they have to gather the insight themselves. So we tried to show a systematic set of videos from which they could pick up that sort of insight, and that worked in the case of badminton as well as kabaddi. In badminton there were some other things, right? The coach really helped us. We wanted a player like Saina Nehwal to wait before she serves, right? So the coach just wrote on a piece of paper, "Tie your shoelace every five points." That's the way he communicated that message to her.
Interesting, interesting.
Yeah, right. And so, yeah, communication is tough. We've created tools where the players can, you know, see one set of videos, and we can bias them towards some particular kinds of videos so that they can ingrain that insight.
That's very interesting, because, I mean, I'm sort of generally big on data visualization. So I keep talking about how you need to make the visualization such that you control the narrative for the guy who sees it. So in some sense,
I mean, in sport, let's say, you can't show a bar graph; the guy will just tear up the piece of paper. Instead, you just have to show a bunch of videos. So in that sense, I think your skill is in choosing the right set of videos, such that he can learn for himself from those videos what your data showed you.
Exactly, exactly. You hit the nail on the head. But it is not always easy; it will take time. Even in kabaddi, there were players who actually started following the videos and tried to improve themselves. But it has to come from within in the end, right?
Finally, then, just to close the conversation: if I were doing a PhD now, and I've figured out that my academic career is over, I don't want to do a postdoc, I don't want to play the academic game and so on, and I want to get into the industry and get into data science, what would your advice be to me?
How would you advise me to go about my career? What kind of jobs should I look out for, what should I look for in terms of work, and so on?
So I would say one of the things which is essential if you need to come into a data science role is to understand how products are built. If a little more of that can go into research while you're doing your research, I feel even your research will become more compact and more usable down the line, and you are going to learn a skill which is going to keep you afloat, right? You're going to be able to build out a product, and not just a small data solution, for the whole thing. That would be one skill I wish I had picked up when I was doing my research.
Thank you for listening to Data Chatter. If you liked this show, please leave a comment, share, and subscribe to the podcast. You can find this podcast on Apple Podcasts, Spotify, or wherever else you go to get your podcasts. Once again, this is Karthik Shashidhar. Thank you.
