What can we use data science for?

00:00

[Music]

00:03

Hello and welcome to Technically Speaking, where scientists and engineers come together to chat about a common interest, share knowledge and satisfy some curiosity. I'm Laura, and I'm joined by Emma and Antonia to talk about how we first encountered data science and what we use it for. So Emma, you're a physicist, so what's your interest in this, given your background? My interest in data science kind of stems from the fact that I need it to analyze things and analyze lab

00:26

results and data. But I started off using it when I was in college, or even high school physics, by doing some kind of basic stats in my maths. But I think I only kind of got the importance of it when I started to allow it to help me describe if data was important and if it was significant, and how that kind of comes into play with how valuable your results are. See, I did stats in school as well, and I never really thought about it as this fancy thing that people call data science at the

01:00

time, 'cause data science wasn't really a thing people talked about like 20, 30 years ago. But now it seems like it's a whole industry. You see stuff on the internet and they make it sound kind of mysterious, but it's just math, right? It's just applying it. Yeah. I guess we'll get into the different kinds of it in a minute, but Antonia, I guess your experience as an engineer and energy analyst is a little bit different to Emma's physics background. There are a few overlaps. We've

01:27

got a lot of data that you want to try and understand: is it a pattern or is it not a pattern? Because then you can kind of predict, or say that is standard and this is what we would expect, or there's something that needs addressing. So in energy you've got a lot of data now. Electricity is measured on a half-hourly basis, so in a day you've got 48 data points, and then over a year 365 × 48, and then you have to start thinking which is normal behavior and

02:11

which is abnormal behavior, and what is something that I can actually affect, and how much of it is just randomness. So yeah, a bit of statistics, and then it turns into data science. I'm not sure... like, we used to just call it statistics, right, is what you were kind of saying, Laura? Yeah, I guess so. Some of the stuff I did in A-levels was like, you know, finding standard deviations, and finding, if you've taken a sample of a population in a pond, say, what is the actual population of that pond, which

02:44

seemed quite simple. It wasn't dealing with huge data sets, which is what I ended up doing in my master's in process analytics, where you'd have like dozens of variables from a processing plant that span months or years of operation, and you had to figure out what those patterns were, as you said. So I guess it's just how much data are you dealing with, and how much more complicated does that make it. So I guess we've kind of described what data science is to an extent, as far as we

03:14

understand it, given none of us is a data scientist. And you were talking about using it in school for stats. I mentioned finding the mean in data and the standard deviation. I guess that's the simplest way that we'd probably use it, and you probably use it quite a lot, 'cause I used to use that in pretty much any type of data collection that I did. You know, you'd always get at least three data points and find the average and then calculate your error. Yeah, I mean, even simpler than that, you can go

03:41

back to median and mode as well, which I think, you know, gradually you don't use because they're not maybe the most useful. But yeah, no, I definitely use standard deviation. I use it all the time; that is my error on measurements. And when you go into different distributions, and not to get too complicated, but, you know, when things lie outside one standard deviation, I think the percentage is you're like 65% confident that your conclusion

04:15

can be valid or something, but if it's outside three then it should be really unlikely that you should have a deviation of that range, depending on the distribution. So you can actually use it to get confidence limits, and really, when you analyze your data it starts to become less binary and more like, how confident am I that this is good? And then if it's not good, or you're not that confident, then you can start to take more repeats and rethink your experiment as well.
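
A minimal Python sketch of the repeat-and-error routine described above, assuming the readings are roughly normally distributed. The three readings are made up for illustration, and the function names are hypothetical. For a normal distribution the standard figures are about 68% of values within one standard deviation of the mean and about 99.7% within three, which is the sense in which a point beyond three standard deviations is "really unlikely".

```python
import math
import statistics

def summarize_repeats(readings):
    """Mean, sample standard deviation and standard error of repeated readings."""
    mean = statistics.mean(readings)
    spread = statistics.stdev(readings)            # sample standard deviation (n - 1)
    std_error = spread / math.sqrt(len(readings))  # uncertainty on the mean itself
    return mean, spread, std_error

def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

readings = [9.8, 10.1, 10.4]  # three made-up repeat measurements
mean, spread, std_error = summarize_repeats(readings)
print(f"mean = {mean:.2f}, std dev = {spread:.2f}, std error = {std_error:.2f}")
for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {fraction_within(k):.1%}")
```

Taking more repeats shrinks the standard error (it falls with the square root of the number of readings), which is one way to read the point about taking more repeats when you are not confident.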

04:41

I think when you get more complex it allows you to actually design experiments better and analyze them better. I don't know if that was answering in the right realm, but this is where my brain went to. Fair enough, it does make sense. So Emma, I think standard deviation and number of standard deviations is quite a good measure, but sometimes, for say population statistics or the gender pay gap, those outliers, if you used a mean, would be way too skewed, so instead median is a

05:17

better measure, because you've ordered all of the pay that people get in terms of size and then you find what's the most middling one. I wanted to say the most, like, medium one, and I was like, no, that's not going to help. I guess that gives you an idea of, like... 'cause if you wanted to use the median and the standard deviation you assume it's got like a normal distribution, so it's quite symmetric. Like, people getting paid an awful lot at one end of the scale, people not getting paid a lot, to

05:47

create that non-bell-shaped curve. Yeah, yeah, yeah. I guess I've just worked in Gaussians a lot. You just always approximate everything to be a Gaussian, so it's my default distribution. Yeah, I guess when it comes to physical sciences you just kind of assume that there isn't anything sort of disrupting your data set in that way; there's just a little bit of random noise in there and that's it. I mean, we're talking about relatively straightforward maths, and I

06:16

remember in my master's doing different types of regression analysis. Like I said, I was looking at loads of different variables at one point, and you were trying to figure out which variable is most important for the thing that you're trying to model. But I don't really remember any of the regression statistics all that well. I don't suppose either of you two have encountered them and can explain it to me again? I wouldn't say it was particularly complicated, the one that I

06:42

would use for energy. But say, you know, in winter it's colder, so people would have more heating, but how do you know if someone's overheating their house, and say, I don't know, something's broken with their heating system because it's not reading off the thermostat, which should tell it when to stop heating up the house, and you get, you know, 30° in winter? So we would use something called regression analysis, which, yeah, was basically like those correlation graphs

07:16

you do, X against Y, and then see the best-fit line and how much of it was off the best-fit line, and then you've got how strong a correlation is, potentially cause and effect, but not necessarily. But then you could do that for multiple variables and you get a multi-regression, I don't know what the word for this model is, multi-model. And then you could recreate it to say, for a different set of values in those variables, you could then hopefully predict what that system would do if it

08:00

was acting the same way. OK, so I guess you'd say, if it was colder next year outside, would someone's heating system act differently, and can you predict how that would act, taking into account that they might not manage it efficiently? Yeah, so if suddenly the heating system became less efficient, say they had a leak, so some of the hot water wasn't actually going through the radiator, it was actually just leaking outside the house, so then inside

08:32

they weren't feeling that warmth, then yeah, you could sort of say, ah yes, we were expecting you to use this much but you used more, so something is a little weird about that. Oh, it's interesting. Yeah, it sounds quite powerful, like, real world; you can see how this would be useful immediately. You can help, like, finding or seeing how behaviors changed. Yeah, only if you can accurately model that system, because if there's some variable that you aren't able to track, say

09:03

someone is constantly changing the set temperature, because you kind of assume that someone always wants the house to be 21, but someone else could be occasionally going, you know what, I want it to be 25 degrees, and so the system that you've compared it against is no longer the same system. Oh wow, yeah. So you need to add in more variables for different people, so you've got like, John likes it really cold and Kate likes it really hot: variable John and variable Kate, maybe. Or you just kind of say, sorry

09:33

John or Kate, we want it to be this way and that's it, so that we can have a nice predictable system and we know how much energy you should be using. The constant struggle in the office. Yeah, whereas physicists just say, we're just going to ignore that thing because it makes like 2% of an effect and we don't really care. It's accurate enough, you know: everything's in a vacuum, everything's a spring, everything's a Gaussian, everything's spherical. Yeah, everything's perfectly

10:06

spherical. Yeah, yeah. I mean, I was going to say, when you mentioned regression as well, it kind of took me back to, I guess, A-level maths but also A-level biology, and the different population statistics and Student's t-test. I'm throwing statistical tests out there now without fully remembering the entire context, but I remember using them when we were doing quadrat sampling and then extrapolating to whole areas and stuff. You do

10:36

different statistical tests to try and get a good estimate for, you know, how many species are in this area. So I guess it's kind of like applying... I mean, it's still a very sciencey application, not very accurate in style, but it's really interesting to see the actual real-life applications of that, because I've only ever done it in kind of a theoretical context, which is interesting. Well, when I first started as an energy analyst, that

11:08

was something that my team was trying to do, was say, could we model, if we had all this data, to then say, right, this is how much we think this site should consume. But there were too many variables. We tried those tests, K values, gosh, what else was in there? But yeah, we tried that and we thought, I don't think our data is enough to actually use this. We didn't have confidence in it because there was just too much randomness. Mm-hm. I guess you need

11:45

more complicated models, to sort of add in that randomness. I guess you don't need to know the source of it, but you need some way of understanding it mathematically. I feel like we're going back to chaos theory again; it's probably not what we want. Unfortunately it would be something as simple as that we just didn't have the data. So sometimes someone's gas pipe would burst and then the data would just be wild, or we'd have no data for a bit, and so we're just like, well, there goes three

12:14

months of measurements there, so we can't use that. And then there was COVID, and so the lockdowns really affected data. Like, well, 2022, 2021, scrap that, can't use that data because they're not operating as standard. But then you start arguing, should we have? Because that could have still been an indicator of how they operate under weird circumstances. But how repeatable are those weird circumstances? I guess if you've got these large data sets you need some way

12:51

of managing it. There's this term I've heard, big data, and I don't know if there's a very good definition of what big data are. Given you can get terabyte hard drives quite cheaply these days, is that considered big or not? I used to collect an awful lot of data when I was in radiation science. You had not a whole lot of time on a particle accelerator, so you had some beam time and you collected all the data that you possibly could and then sifted

13:16

through it later on, and you would easily generate terabytes of data, you know. Wow, yeah, that is crazy. How many data points would that be, though? 'Cause if it was like one massive 3D model then you kind of have one thing to look at, but if it was lots and lots of little bits then it'd be quite unmanageable. Yeah, lots and lots of little bits really. There wasn't one 3D image that you were looking at. It was pretty much, the detector was on and you were

13:46

collecting an intensity at each pixel, so you had this sort of, I guess, a grid pretty much of what was happening at that pixel at that instant in time. And I remember when I was coding you would use something like what I thought of as tables. You put your data in a matrix of a certain size, so it would be however many columns wide you wanted it and however many rows big, and you could hold different variables within that, but it was still almost sort of as one

14:15

variable. I don't remember it well enough to talk about it in a way that isn't confusing, but I feel like that's the start of the data analysis, right? Managing the data and then figuring out what to do with it next. Yeah, yeah. So I suppose it was kind of, say you ran your experiment and you'd have a time axis, and then you'd have temperature, and then you'd have radiation in becquerels, or I don't know, that's becquerels, activity, it's what's coming off a radioactive source.
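
To put a rough scale on "one intensity per pixel per instant in time", here is a small Python sketch. The detector size, bit depth, frame rate and run length are all hypothetical numbers chosen only for illustration, not figures from the experiment described; the function name is made up as well.

```python
def detector_data_bytes(width, height, bytes_per_pixel, frames_per_second, seconds):
    """Raw size of a run that stores one intensity value per pixel per frame."""
    frames = frames_per_second * seconds
    return width * height * bytes_per_pixel * frames

# Hypothetical detector settings, chosen only to illustrate the scale.
size = detector_data_bytes(width=2048, height=2048, bytes_per_pixel=2,
                           frames_per_second=100, seconds=3600)
print(f"{size / 1e12:.2f} TB for one hour of beam time")

# The same data can be pictured as a grid indexed [frame][row][column].
frames = [[[0 for _ in range(4)] for _ in range(3)] for _ in range(2)]
frames[1][2][3] = 57  # intensity at frame 1, row 2, column 3
```

With these assumed settings a single hour of recording already lands in the terabyte range, which fits the description of sifting through the data afterwards rather than during beam time.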

14:48

Yeah, like gray, something like that, how much energy you've deposited. Yeah. And so you'd kind of be creating a cube of data: at this time point there was this temperature, and then there was this other activity level, and then there was some other variable that you care about that I don't know because I'm not a radiation... Yeah, but then you could say a similar thing about, you know, a chemical plant, that I guess you

15:18

may be more familiar with. You'd have temperatures, flow rates, concentrations, pressures. Yes, I can understand this. Yeah, or your different vessels in the system. And this is kind of what I remember from my master's: if you try to put that in a graph of time and all these different variables, you just have something that looked incredibly messy. Yeah, yes, I've tried that function in [Laughter] Excel. But there were ways of sort of collecting the variables together so you

15:49

had like one variable that represented three, and I honestly don't understand the maths behind this, because it was more sort of engineering level in my master's, which meant it was about the application and not about the pure maths behind it. I think principal component analysis is what we were doing, and I remember being really good at applying it, but thinking back I have no clue how it actually worked. So what were you using it for? Pretty much

16:15

what I said: you had a bunch of variables, like 20 variables from a processing plant taken over time, and you would take some of the data that would be your training data and you'd generate a mathematical model that explained how they interacted, and then you'd apply that to the rest of the data to see how well it fit, to see if it could explain sort of the future, essentially. OK. It kind of said that this principal component explains like 30% of the variance in

16:45

your data, say, and it's made up of these other variables, like I said, the flow, the temperature, whatever else. And then you'd have another principal component that was made up of other variables that would explain a bit more variance in the data, and basically you included enough principal components to explain like 80% of the variance, and that was sufficient. So I guess I understand the steps to applying it but I don't understand the maths behind it. I don't

17:11

know if it's something either of you have encountered, terms like principal components or eigenvalues and eigenvectors, that you could explain in a way that relates to what I've just said. I haven't touched it before, but I want to try it now. Principal component, as a phrase, I don't think I've used or heard of, but the concept of kind of grouping together some variables that can influence data sets, and kind of finding a way to mathematically represent them in a

17:46

simpler way, so computers can understand it better and work with it better, but that still fully represents the system, to me just sounds like a matrix, and how you store that information in a way where you can get a system of equations or problems. Imagine simultaneous equations, and then put those different elements into a matrix problem. So that's where these fancy terms like eigenvalues and eigenvectors come into the matrix equation.
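
A minimal Python sketch of where the eigenvalues and eigenvectors come in, restricted to two variables so it can be written with the standard library alone. This is an illustration under stated assumptions, not the method used in the process-analytics work described: the function name `pca_2d` and the flow and temperature readings are made up, and a real analysis of 20 variables would normally use a numerical library and often standardize the variables first. The eigenvectors of the covariance matrix are the principal component directions, and each eigenvalue divided by the total is the share of variance that component explains.

```python
import math

def pca_2d(xs, ys):
    """Principal components of two variables via the 2x2 covariance matrix."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Sample variances and covariance (n - 1 in the denominator).
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    # Eigenvalues of [[sxx, sxy], [sxy, syy]] from its characteristic equation.
    trace, det = sxx + syy, sxx * syy - sxy ** 2
    root = math.sqrt(max(trace ** 2 / 4 - det, 0.0))
    lam1, lam2 = trace / 2 + root, trace / 2 - root
    # Eigenvector for the larger eigenvalue: the direction of greatest variance.
    if sxy != 0:
        vx, vy = sxy, lam1 - sxx
    else:
        vx, vy = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    direction = (vx / norm, vy / norm)
    # Share of the total variance explained by each component
    # (assumes the data are not all identical, so trace > 0).
    explained = (lam1 / trace, lam2 / trace)
    return direction, explained

flow = [1.0, 2.0, 3.0, 4.0, 5.0]          # made-up flow readings
temperature = [2.1, 3.9, 6.2, 7.8, 10.1]  # made-up temperature readings
direction, explained = pca_2d(flow, temperature)
print("first principal direction:", direction)
print("explained variance shares:", explained)
```

Keeping components until the explained shares add up to some threshold, such as the 80% mentioned earlier, is the step that turns many variables into a few.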

18:20

It's the same system that you've always had; it's kind of just making it look a bit nicer. And then when you want to do computational calculations, computers just love working with matrices, and then you just have to take that output and use it to find the coefficients that you needed, you know, in the context of simultaneous equations. But the idea is you can use matrices for everything. They're just so good, but they are confusing, especially when you

18:49

hear about them in different contexts as well, because that's a very pure maths context, but I feel like I've heard matrices being introduced and used in different ways, you know, maybe with each module I did at university. And so they're very applicable to many things, but the connection between each one is always a bit shaky. Yeah, and I guess this is the problem I have sometimes, because I tend to span multiple different disciplines: everyone uses slightly different

19:17

terminology to refer to what is essentially the same thing, and I tend to get just a little bit confused, and every time I speak to someone I feel like I get slightly closer to figuring out what this thing actually is. I guess it's finding the person that can explain it to you in a way that makes sense to you; that's always the challenge. Yeah, because it also depends on your background as a listener, what method of explanation is going to connect with you

19:41

the most. I mean, that's just learning as well. Sometimes you connect with a textbook and you're like, I'm going to read every textbook by this author because they just seem to understand things. But, I forgot to say, one of the main benefits of a matrix is if you have loads and loads of data points and loads of different variables, you can kind of put it in, because it's quite hard to deal with. I mean, in

20:08

the context of big data it's quite hard to solve those systems of equations quickly and efficiently. By hand it is definitely impossible. You know, you were solving simultaneous equations in GCSEs and then A-levels, but if you got a system of, you know, six simultaneous equations you'd be there the whole time, and still probably wouldn't get there. And so I think in the context of big data it's a way to kind of solve it and put it into kind of an

20:38

understandable form for you or the computer. I guess you'd be using just an algorithm. So say when I was doing the principal component analysis, I said I didn't quite understand the maths that was being used by the computer, but I understood the output, and I could look at the output and say, does that make sense with what I've seen? And if it doesn't, then I'm probably applying the wrong algorithm, or I don't understand it well enough. So I guess there are two

21:03

approaches you can take: understand the maths really well and know that the maths is doing the thing that it's meant to be doing, or you can understand the system that you're modeling and know that you're modeling it well enough. I guess they're two different approaches you can take, as the pure mathematician and then the, yeah, scientist, engineer. Yeah, using maths as a tool. About that whole difficulty in learning: I do find sometimes with maths especially, where

21:30

someone can use different letters to represent the same variables, and so then you try to look up an equation, yeah, and then you think, I don't recognize this, but I'm sure the same ideas are behind it. You just kind of got used to always using Y for the y-axis and X for the x-axis, and then they start using something else like f(x). Everyone gets really confused when they first introduce functions to us. Oh gosh. It's also about some of the terminology as well, like PCR. Everyone

22:07

knows what a PCR test is now, right, because of COVID. Yeah. Oh, I was like, wait, what exactly is PCR? I've forgotten. So PCR, to me, that means principal component regression, because that's where I first encountered it. Oh. But then my master's project was looking at gene expression data and applying these different mathematical techniques to figure out what the patterns were in the gene expression data, so suddenly PCR meant two different things to me at the same time. So it also

22:36

meant polymerase chain reaction. I was very confused for quite a while. I have a little game I play with my friends where whenever we see a rho we are like, what do we see? 'Cause for me I always see density, but there's a million different things for rho, and it's so funny when you see the Greek alphabet. I don't know, this is just going to be relatable to all the physicists, but there's so many letters that I haven't seen before, and it just makes you think, why always

23:05

go back to rho, to lambda, to all of the classics, when there's a million different Greek letters that I've never seen before? But people love to reuse rho as well; it's kind of the favorite. But rho is always density to me, no matter what happens. Cool, I can agree with that one. I wonder, you could just start making up your own symbols and call them some noise and then just start implementing them, and that could be your claim to fame eventually, like, you know, some people find new

23:36

species. Yeah, you just created, not created your own variable, but created your own naming system. Yeah, yeah. That's how it used to be, right? Like, there were curies and loads of other different units of measure for certain things, so why not? I wonder if I will ever contribute so much to a field that I could have a Cy, because Marie Curie did a lot for radiation science. True, yeah. There you go, it's something to aim for. You also have Curie temperature and stuff as well, which is something I

24:11

can't remember, but it's something to do with, like, oh, I think it's semiconductors, maybe. I'm throwing something out there, but yeah, it may be wrong, so fact check this please. Yeah, that probably will be wrong. I've never heard of a Curie temperature. Oh, it's a magnetic change. Yeah, yeah, OK. Yeah, it's a change in the magnetic properties of a system. OK, close enough. Nothing to do with ionizing radiation though, not strictly anyway, I suppose. I guess this is a warning for

24:39

anyone wanting to get into data science and apply these principles to lots of different fields: know what people are talking about when they start throwing terms out first. Don't be afraid to ask, what is that? Yeah, or ask to define every term in every equation that gets brought up ever, just so you're sure, and also, importantly, the units that they're in. Yes, oh my God, unit conversions. There's so much astro stuff where the units are in megaparsecs per kilometer per

25:07

second, and it doesn't make any sense, but that's just the way it is. So always check your units in equations as well as what the symbols mean. Oh yeah, very important, that is. Yeah. I had read that some of the maths that we've been talking about comes up in machine learning or artificial intelligence. It's essentially some of the maths behind how you train the AI to do a particular thing. The example I'd seen was using it to identify a tiger. Like, you look at a tiger, a

25:38

picture of a tiger, and you think that's obviously a tiger, but what are the things that you're looking at that tell your brain that is a tiger? So you're obviously looking at the stripes, the shape of its face, the general size of the different features, and these are all different variables that go into the algorithm that you use to train the AI: so this is a tiger, this is not a tiger. Yeah, kind of like how the CAPTCHAs started, you know, when you're entering a website and it's saying, are you a robot,

26:05

then it says click on the bikes. That is like a training data set for the AI to figure out, ah right, they're always bikes, and then seeing what the common pixels are, I guess, or the common pixel relationship from one to another. Oh wow. Yeah. So are you saying that every time we do a CAPTCHA we're helping train some AI? Yeah. Yeah, I didn't know that. I'm not too sure how I feel about that. I feel like it should be disclaimed somewhere. Yeah. Or the ones where it's got squiggly letters. Oh yeah, yeah,

26:39

yeah, and numbers. Yeah, well, that's what I was told, that that can be training data. It makes sense, I guess. And it's very useful for the postal service, because everyone's handwriting looks slightly different, and now they don't have to have someone individually read addresses. They have OCR, optical recognition, whatever the C stands for, but you know, when you're reading written stuff you can recognize it as text. Ah. So I'm always quite proud of my skill of

27:12

identifying weird handwriting, because my own handwriting is really weird. I've worked with a lot of people where you sit there for quite a while trying to figure out what a thing is, and then when you get it, it's like, oh, that's it, I can understand anything you write now. One of my lab partners at uni, I think it was a four or a nine that they wrote really weirdly, like they didn't write it in the same order I would, so I looked at it and got the number wrong, which I guess is another

27:38

argument for just letting the machines do it all, so you don't have to write anything down. It just sends the data to the thing that you need it to do, and then it just does the analysis for you straight away, and, as I say, you sort of check to make sure that analysis makes sense. There you go. And check the sensor is working. Yes, I feel like you would sort of see that as you were picking up the data. I can't think of a reason why you wouldn't check that just inherently, but maybe

28:05

working on rigs for too long, but like, one thermocouple is always showing minus 20 and the others are showing 200 °C, and clearly something's wrong. In my line of work we have I don't know how many thousands of sensors to collect energy data across hundreds of places, so being able to check those sensors are always working is a challenge, and it kind of relies on knowing that that is a measurement that should have been taken or not, because sometimes you think, ah, is it zero because they've

28:36

turned off the machine, which is no energy use, or is there something wrong with the sensor? We won't know unless we have that human data to tell us. So I guess, no matter how good your big data, machine learning, data science stuff is, there always needs to be a human element of just understanding what's going on. Bad data in, bad results out. Yeah. So I guess if we sort of go back to what I was saying before about this principal component analysis thing, I feel like

29:06

this is probably something that works a lot better if you have some graphics to look at as well, because I think my explanation of explaining this amount of variance is probably a little bit difficult to take in via audio. But we did come across, when we were planning for this episode, a website called builtin.com that seemed to have some good examples of how it works, and there was this graph of a thing that was rotating that sort of helped explain what it meant by understanding the variance. A

29:37

scatter plot? Yes, it was a scatter plot, and you want to sort of find the least distance from what you think is the median relationship between them all, and if you know which one has the least variance from it, or distance, you've found the thing that is most related, I suppose. And if you change the relationship you can kind of see which ones are further away and which ones are less affected or more affected. Is that your understanding of it? That's what it looked

30:13

like to me. Would you agree with that, Emma, given your different background? I feel you're not as close to this, so if you can understand it then maybe other people can understand our strange descriptions. Yeah, no, I think, 'cause it's also like, Antonia is essentially describing how you calculate the standard deviation, by just the difference between, I mean, there are other factors, but the difference between the mean and every data point, and then it's searching to minimize that,

30:42

because then you can be more confident that the data does lie in that range. And so, yeah, it makes sense. It's searching for the kind of regression line that best fits the data in the scatter plot, right? Or have I totally just taken that the wrong way? I don't know. How does this differ then from a standard deviation, I guess? Or is this taking it a step further and looking at a whole host of variables? Yeah, I think this is more

31:15

variables, I think. So the graph's looking at just two variables, X and Y on this graph, right? And there's the line that's sort of rotating around until it gets to the point at which all the data points lie closest to that line. It's essentially a line of best fit, right? Yeah. Then it does something to explain the covariance, because there are two variables, so "co", which I don't quite understand, but I think it's the gradient of the line that's important, and it gives you something called an

31:45

eigenvector, and that eigenvector is useful somehow. See, I said I didn't quite understand the maths, so I can understand some words. I think here the eigenvector and the eigenvalue are representing the different solutions that you get, and so to the one line of best fit you'll have an eigenvector and an eigenvalue that is, you know, for that solution, so it's kind of searching through them all until you find it, is what I would take from it. But also I think the covariance comes

32:14

from if these data points have errors on them, because you have that a lot of the time. You know, if you have an error in your X and your Y, you get a data point and you get a little cross around it with the, you know, where that data point could be, the whiskers, or what else do they call them? Error bars. Error bars, yeah, thank you. I know what you mean by whiskers though, because you do get box-and-whisker plots. Yeah. And so when you search for a

32:39

line of best fit, sometimes you can weight it according to how big those errors are, because if you have a data point with a really large error you want that to be less considered in your results, because it's more likely... if it's not officially an outlier, because there is an official definition of an outlier, which is great because you can just get rid of the data points which are terrible 'cause they're outliers, it makes sense, but if they're not, then

33:02

you still have to include them, and then they'll have a large error, so you can try and do your analysis based on those error bars, and that's where covariance comes in, because it's kind of describing how the errors are influencing your line of best fit. But there are no error bars in this plot, but I imagine they're there. You know what, that sounds very similar, but obviously for a different application, which is called multi-criteria decision analysis, where instead of

33:32

having error bars you're saying how much preference do I have for this variable so you use it in um sustainability you want to optimize for multiple variables so like environmental impact social impact economic impact and then you could rank how important those things are and then that helps you get towards the one answer so instead of having to consider like are wind turbines better for society or not because they could affect you know migratory birds then you can then optimize

34:09

where is the best placement for the wind turbines in terms of like cost um wind output to then generate electricity effect on environmental impact effect on social um impacts like flickering to people um sorry just I'm saying flickering to people like this is a thing but I'm guessing if they were casting a shadow over like someone's garden say and it's a frequency that might generate I guess an epileptic seizure or something I suppose is where you go ultimately or

34:41

just be really annoying or just youche yeah it could just be not pleasant to have and people don't want that so then you could then say all right this is the thing that we need to optimize for or the thing that we care the most about and want to prioritize addressing and then rather than having loads of solutions you could start to narrow down solutions it seems very similar I would have never gone from Emma's example of like sort of confidence in data sets which I

35:10

understood from physical science to where Antonia went with figuring out where wind turbines go but I do see the link so I guess you're just you're using the same maths pretty much just for different applications for similar principles I don't think it's the exact same math maybe the multi-criteria decision analysis can be a lot more simple but it's like ranking your priorities first and then weighting adding the weights to the score that they get because when Emma started talking about
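Editor's note: the "rank your priorities, then add the weights to the score" step can be sketched as a weighted sum, which is one simple form of multi-criteria decision analysis and not necessarily the exact method used in practice. Everything below is hypothetical: the site names, the criteria scores (scaled so that 1.0 is best and 0.0 is worst) and the weights are invented for illustration.

```python
def rank_options(scores, weights):
    """Rank options by weighted average score, best first.

    Hypothetical helper: scores maps option -> {criterion: value in 0..1},
    weights maps criterion -> importance (any positive numbers).
    """
    total_weight = sum(weights.values())
    totals = {
        name: sum(weights[c] * value for c, value in criteria.items()) / total_weight
        for name, criteria in scores.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Invented priorities: wind output matters most, social impact least.
weights = {"cost": 3, "wind_output": 4, "environment": 2, "social": 1}

# Invented candidate sites for a wind turbine, scored per criterion.
sites = {
    "site_a": {"cost": 0.8, "wind_output": 0.5, "environment": 0.9, "social": 0.7},
    "site_b": {"cost": 0.4, "wind_output": 0.9, "environment": 0.6, "social": 0.5},
    "site_c": {"cost": 0.6, "wind_output": 0.7, "environment": 0.3, "social": 0.9},
}

for name, score in rank_options(sites, weights):
    print(name, round(score, 3))
```

Changing the weights changes the ranking, which is the point made in the conversation: the priorities chosen up front narrow many possible solutions down to a few.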

35:39

so having your error bars on your data I don't remember any of the data that I was working with explaining what the error was on that measurement I guess because there was a lot of data anyway right and the errors presumably would be quite small in comparison to what you're measuring because these are all sensors that people check as you say Antonia and they've set them up to be a certain way so you don't necessarily need the error on that measurement as well it's

36:05

just sort of an inherent part of the system it's not what you're interested in I guess for engineering applications but for physics it's not a result without an error bar is what I was always taught yeah yeah yeah yeah that's right that is right so I guess that's quite a good place to leave it thinking about this sort of interaction between how physicists apply data science and how engineers might apply it in a different way but to solve slightly more I don't know how what's the best

36:32

way of describing it Antonia I want to say it's more complicated or more real world application as if physicists just aren't doing as good a job it's a bit more theoretical it's definitely more like applied to a situation in action I don't know more practical yeah something that's dynamic and has external variables versus something that you would normally control a bit more because that's what physicists like yeah I guess I guess that is it it's in the uncontrolled world or you're like trying

37:00

to observe something where there are lots of variables and you're trying to narrow down which ones matter but I suppose you could say the same for physics when I was taught physics at undergrad it was very much here is an algorithm that you apply to a defined problem and it was all very like textbook stuff and I couldn't work out why you had to do it that way and why you couldn't do it some different way there was no scope for putting in that random variable I don't

37:26

know if that's what you'd say of it Emma because I mean it's been 20 years since I studied physics it might have changed yeah I'd say I don't know I feel like there was a lot of the time where I've done things with things we haven't seen before or you know had random variables kind of like integrated into the issue and then you kind of have to deal with them but then like when you have like you know randomness in it it always turned into a large data set

37:50

problem so all the things you know you could really do like some not easy stats on it but it was kind of like I don't know like it was all very far-fetched like I don't know I did big data problems on like particle physics and stuff that you know I'm never going to personally do and like isn't impacting much because of all the approximations that are a part of it at the level that I did it anyway I imagine people are doing really impressive things in their actual you know jobs but um as a little

38:18

undergraduate project not really doing much um so I feel like in terms of like impact on lives for the stuff I was doing it was nothing yeah I wouldn't say the maths that I use is more complicated it's almost like I'm using the same logic to apply to different places and I still get like different results and I'm trying to figure out why is it something that is happening for real or is it something that we actually need to uh mitigate or

38:48

maybe we can follow on some of these conversations in another episode and kind of figure out a bit more where the similarities and differences lie yeah data science is a big topic and there's this great blog Towards Data Science to which people contribute how they've applied it and found interesting uh results from it as well yes there are loads of other resources out there from actual data scientists as well as people that use bits of data science in their careers and in their

39:17

studies that's well worth checking out I think ultimately what we're saying is there are different levels of applying the maths to get what you need and as long as you understand how you're applying it and you understand what you're applying it to that's the important thing so I suppose we'll leave it there thanks for listening and we will look forward to a future episode the views expressed in this podcast belong entirely to the person that said them they do not represent any

39:42

industry or organization if you enjoyed listening to these views it would really help us out if you could rate us leave a review and tell a friend this podcast was sponsored by no one but if you're interested in funding us to continue to have frank discussions about science and engineering please get in touch [Music]

Transcript source: Provided by creator in RSS feed