¶ Host's Apologies and New Season
Okay, I've got three apologies to make today. The first one goes to you, the listener. I know we've gotten a little irregular. My life's been a bit of a whirlwind, but in the best possible ways. We are getting back on track soon. My second apology goes out to our guest today, Aaron. In a parallel universe, Data Skeptic has already switched over to a brand new season.
The topic there is all about being a student right now in the era of AI, transitioning into the workforce, and what sorts of jobs are out there. If you think that's an interesting idea and you think I'm making a mistake, you can write to kyle@dataskeptic.com and let me know.
We might come back around to it. But I've got a whole different plan for this universe. There's a lot more to say about recommender systems, and I don't feel like I've heard from every possible voice just yet, so we've got a few really good ones in the chamber.
And my third apology goes out to the band Run and Punch. It's their theme song you hear as the intro and outro for the Recommender Systems season. I think I told them I was always going to mention them in the credits, and then we didn't do credits. So a big shout-out to one of Chicago's best ska bands, Run and Punch. I'm actually going to see them next month at Reggie's in Chicago, but you can check out Run and Punch on Spotify. Our theme song is called We've Never Met, their newest hit single.
But first, as a bit of an interlude, here's an episode from the season that almost was: a conversation with Aaron Payne about the MBA program he's in and his transition into an analytics role at Chick-fil-A.
¶ Aaron Payne's Analytics Journey
Hello, my name is Aaron Payne. I'm an MBA student at Georgia Tech studying business analytics. What brought you to business analytics? I actually just started a new position at Chick-fil-A as a senior insights analyst, so I'll serve there as an internal consultant for analytics in the supply chain area. But I got my career started in technical tax transformation at Ernst & Young. That's where I got into scripting in Python and doing data visualization.
From that point I knew I wanted to continue in analytics, but also get back to Atlanta, where I'm from. So I joined a hotel asset management company called Atrium Hospitality, where I did general analytics cross-functionally for operations, commercial, HR, and finance: ad hoc analytics as well as data visualization and some machine learning, trying to stay ahead of that AI curve for sure.
Have you had any learnings from taking business analytics and applying it in such a variety of domains? Yeah. I think that's especially true going to school at Georgia Tech, with all the different analytics classes they offer. I've taken people analytics and general data science using machine learning, as well as operational and descriptive analytics.
That marries very well with some of the other curriculum there, in corporate strategy for example: figuring out how data translates into insights. We have this at Chick-fil-A. You have data analytics, which is your operational analytics, or maybe you're doing a data science project.
But Chick-fil-A likes to call it insights: what do you do with those analytics? How do you take them from analytics to insights to an action plan? So instead of data science you have decision science, and instead of a data analyst or a business analyst, you have an insights analyst. That's how I like to think about it.
¶ Comfama Forecasting Project Unpacked
Can you share any details on some of the projects you worked on while getting your degree? Yeah, so Georgia Tech is very big on experiential learning. In addition to all the classroom learning, there's a big emphasis on taking that and doing real projects: partnering with real companies and using real data to shape how they operate and move strategically in their space.
One of the big projects I worked on, in one of the business analytics practicums I had the opportunity to take, was with Comfama. Comfama is a social services company in Colombia, and it has what's called an affiliated population. If you're employed in Colombia, you join a social services company, and they provide all sorts of social programs like education and after-school care.
An affiliated population is just the people who are signed up with them to receive those benefits. They needed help forecasting what their affiliated population would look like, especially given the economic volatility following the pandemic and other economic factors in Colombia. So they wanted a way to forecast their affiliated population more accurately.
So my teammates and I were tasked with partnering with their data scientists, going through their data, and creating an updated forecast for them using advanced statistical and machine learning methodologies. I definitely want to dive into methodologies, but at the start of it I'm reminded of something you said earlier: a big part of any work is the insights you get out of it. What is the forecast, and what decision does it inform for them?
At least in my mind, it's always fun to do analytics. Being an analytics nerd, you do the data science, have fun sticking your toe in, and work with a company as a team, tediously, to produce these outputs.
But it really puts in perspective how those forecasts actually help real people. There are some challenges in Colombia, especially in Antioquia, where Comfama is located: rapid urbanization, demographic shifts, rising costs,
and gentrification, all of which affect their ability to reliably forecast demand for these social services. So collaborating with them and supplying these updated forecasts really helps them with operational excellence, which was front of mind for us: how can we help Comfama operate more efficiently, which in turn directly affects the people they serve.
So especially when working with a company like Comfama that does social services, you think about all the work they do, all the planning that goes into it, and how
just one hitch or one miscalculation can result in real people not receiving benefits that are essential to them, especially in a place like Antioquia. So it helps to keep front of mind that the end user is not just the data science or analytics team we're consulting with,
but also how they then take that information, how it affects the way they run their business, and how they're able to provide for real people in need. It's really awesome to have that as a backdrop to all the analytics work we do at Georgia Tech.
¶ Real-World Data Cleaning Hurdles
There's this adage that gets thrown around a lot that says something like: data scientists spend eighty percent of their time cleaning the data. Your percentage may vary, but to what degree was that true in a project like this? And what were some of the steps to get from where you started to where you were going?
For the uninitiated, English is not the first language in Colombia, in case you did not know that. So one of the big things with the data handoff was that it was not in English. Thankfully, we had someone on our team who spoke Spanish, but it was hard for a lot of us to translate. That's not traditional data cleaning, but
it's the kind of unexpected thing you run into on a real-life project with a real company: the data was in a completely different language. For a lot of the fields, we didn't even know what they were until we translated them, and thankfully we were able to do that. Beyond that, it came down to descriptive statistics:
looking at descriptive statistics for the different variables we were getting, determining whether there were outliers, and understanding whether those outliers were legitimate. Some of this, it being Colombia, comes down to technological infrastructure that isn't always great.
Some of these numbers were typed in manually, and some of them were inaccurate. There were also data gaps, especially when we were looking for exogenous variables, which I'm sure we'll get into as part of the methodology for our project.
But even with the sensitive information Comfama gave us, we made sure we were combing over it and doing our due diligence: creating these descriptive statistics and understanding the seasonality effects, the trend effects, and the COVID effects, all of which
need to be accurate for us to produce an accurate forecast. Like they say, garbage in, garbage out. We want to make sure the data we're getting is not only robust enough to create good predictions, but also accurate and representative of
reality. So it was a task, but we had a great team around us, and the data scientists at Comfama were able to quickly resolve a lot of the internal errors and questions we had. We were also able to do some data interpolation, which ultimately gave us a pretty accurate forecast that we're excited about.
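The cleaning steps Aaron describes (descriptive statistics, checking whether outliers are legitimate, then interpolating gaps) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the episode does not say which outlier rule or interpolation method the team used, so the 1.5 × IQR fence and straight-line interpolation below are stand-ins, and the helper names and toy series are hypothetical.

```python
from statistics import quantiles


def flag_iqr_outliers(values, k=1.5):
    """Return indices of values outside [Q1 - k*IQR, Q3 + k*IQR].

    Missing months (None) are skipped when computing the quartiles.
    """
    observed = [v for v in values if v is not None]
    q1, _, q3 = quantiles(observed, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v is not None and not low <= v <= high]


def interpolate_gaps(values):
    """Fill interior None gaps by straight-line interpolation between known points.

    Leading and trailing gaps stay None because there is nothing to anchor them.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled


# Toy monthly affiliate counts (hypothetical): one missing month, one likely typo.
monthly = [100, 102, None, 106, 108, 950, 112, 114, 116]
suspects = flag_iqr_outliers(monthly)  # indices to confirm with the data owner
# Suppose the data owner confirms the flagged value is a typo: blank it, then interpolate.
cleaned = interpolate_gaps([None if i in suspects else v for i, v in enumerate(monthly)])
```

The flag step only proposes candidates; as in the episode, whether a value is a real swing or an entry error is a question for the people who own the data.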
¶ Selecting the Best Forecast Model
Well, there's no shortage of techniques, tools, and methods for doing forecasting. Yeah. Everyone's got their own opinion about it. How did you and the team zero in on the best choice for your problem? Yeah, so this is one of the things that's always interesting about analytics projects: you have to think about your stakeholders.
But you also have to think about, like you said, the bunch of different methodologies out there. One of the things that was important for the data science team at Comfama was interpretability. They didn't want this to be some black box prediction model where they had no visibility into what's driving the predictions.
What we did, and this is a little more art than science, as one of the books I've read puts it, is trial and error: use multiple models and see which ones produce the best output. We also relied on Comfama as industry experts to guide us through their current state of work.
And, you know, as Isaac Newton said, we're standing on the shoulders of giants. We made sure we weren't just starting from scratch; we had a good base skeleton to go off of. So rather than ignoring what their analytics and data science teams had been working on, we took from that and added our own twist:
being robust in testing, but also being mindful of the end user, the interpretability these models needed, and the time constraint. This isn't our day job; we're doing this in the evenings as part of the MBA program. We really only get one semester to work on it, and having to put out something we're proud of in that limited lead-up time is very important to be mindful of when choosing a model.
¶ SARIMAX and Ensemble Modeling
And can you give us any hints as to the method you ended up picking and what worked best for your case? Yeah, so the data science team at Comfama originally started with an ARIMA model, which is an autoregressive integrated moving average model, a very well-known statistical forecasting technique
that many people who follow statistics are probably familiar with. We then decided on an ensemble model. There are great parts about all different types of models, and reasons why you'd want to use a variety of them.
We did end up using a form of ARIMA, SARIMAX, but then we sprinkled in a little data science magic with a machine learning model and created an ensemble: we created a formula using the RMSE to weight the forecast predictions, and ultimately we were able to greatly reduce the residuals of our forecast on our test set by doing so. So that was awesome. You mentioned you picked up the SARIMAX model.
If they began with ARIMA, which is a pretty good choice, as you mentioned, very popular and widely used in industry, but they want to take it a step further, what does your model choice add that ARIMA lacks? This is a great question. When we were doing the descriptive analytics, understanding the data and where the opportunities lay to improve the forecast,
seasonality and trend were among the main things we noticed, especially variability in twenty twenty, the COVID year, as well as in twenty twenty one, with the after-effects of COVID. ARIMA, like we talked about, is very standard, but SARIMAX is a seasonal autoregressive integrated moving average model with exogenous variables.
So in addition to the data we received from Comfama, we were able to use exogenous variables: economic indicators we pulled from Colombia's national statistics agency, DANE. It's very similar to the Bureau of Labor Statistics here; they publish economic indicators. What we actually decided to do was increase the level of detail.
DANE breaks its data down into departments, or what we might consider industries. As you might already suspect, these different industries probably behave very differently. Colombia being a developing country, there's a lot of construction, manual labor, and other blue-collar industries, and those are the main ones Comfama serves.
But they also have technology industries, academia, and other industries. So what we decided to do was use DANE's twelve-department framework to break out these economic indicators (employment rate; ISE, which is an economic activity indicator; GDP; and various others) as exogenous variables and layer them into the base ARIMA. To get a little more technical:
you have all these different combinations of ARIMA orders. There's lowercase p, lowercase d, and lowercase q, which are the non-seasonal autoregressive, differencing, and moving average terms. Then there are capital P, capital D, and capital Q, which, when you get into seasonal ARIMA or SARIMAX models, are the seasonal counterparts of the lowercase p, d, and q.
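For readers who want the notation behind the lowercase and capital orders, one common textbook way to write a SARIMAX model is as a regression on exogenous indicators with seasonal ARIMA errors. This is a standard formulation, not an equation given in the episode. Here $\mathbf{x}_t$ stands for whichever indicators are included (for example employment rate, ISE, and GDP by department), $B$ is the backshift operator ($B y_t = y_{t-1}$), $s$ is the season length ($s = 12$ for monthly data), and $\varepsilon_t$ is white noise.

```latex
\begin{aligned}
y_t &= \boldsymbol{\beta}^\top \mathbf{x}_t + u_t,\\
\phi_p(B)\,\Phi_P(B^s)\,(1-B)^d\,(1-B^s)^D\,u_t &= \theta_q(B)\,\Theta_Q(B^s)\,\varepsilon_t,\\
\phi_p(B) &= 1 - \phi_1 B - \cdots - \phi_p B^p, \qquad
\Phi_P(B^s) = 1 - \Phi_1 B^s - \cdots - \Phi_P B^{Ps},\\
\theta_q(B) &= 1 + \theta_1 B + \cdots + \theta_q B^q, \qquad
\Theta_Q(B^s) = 1 + \Theta_1 B^s + \cdots + \Theta_Q B^{Qs}.
\end{aligned}
```

Setting $P = D = Q = 0$ and dropping $\mathbf{x}_t$ recovers a plain ARIMA$(p, d, q)$, which is why SARIMAX can be read as the base ARIMA plus seasonal terms plus the exogenous indicators.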
What we did was a study in RStudio where we compared all the different combinations of seasonal differencing and effects
and found the ones with the lowest root mean squared error. We used those and then ensembled them with an XGBoost machine learning model. There's great visibility within the ARIMA models into what's driving the forecast versus seasonality and trend, and then you have some noise, a white noise effect, which we hope is small, and thankfully in our case it was.
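The order comparison Aaron describes (run in RStudio, per the episode) amounts to a grid search over $(p, d, q)(P, D, Q)_s$ scored by test-window RMSE. Below is a hedged Python sketch of that loop; the grid bounds, the helper names, and the toy scoring function are assumptions. In practice `score_fn` would fit a model on the training window and return its RMSE on the held-out months. In Python, statsmodels' `SARIMAX` class (with its `order` and `seasonal_order` arguments) is one common way to do that fitting, but the sketch leaves it to the caller.

```python
from itertools import product


def candidate_orders(max_p=2, max_d=1, max_q=2, max_P=1, max_D=1, max_Q=1, s=12):
    """Yield ((p, d, q), (P, D, Q, s)) pairs covering every combination in the grid."""
    for p, d, q, P, D, Q in product(
        range(max_p + 1), range(max_d + 1), range(max_q + 1),
        range(max_P + 1), range(max_D + 1), range(max_Q + 1),
    ):
        yield (p, d, q), (P, D, Q, s)


def best_order(score_fn, **grid):
    """Return (score, order, seasonal_order) with the lowest score.

    score_fn(order, seasonal_order) is supplied by the caller; it would typically
    fit on the training window and return RMSE on the held-out test window.
    Ties keep the first combination encountered.
    """
    best = None
    for order, seasonal_order in candidate_orders(**grid):
        score = score_fn(order, seasonal_order)
        if best is None or score < best[0]:
            best = (score, order, seasonal_order)
    return best


# Toy scorer (hypothetical) that just penalises model size, to show the mechanics.
def toy_score(order, seasonal_order):
    return sum(order) + sum(seasonal_order[:3])


print(best_order(toy_score))  # (0, (0, 0, 0), (0, 0, 0, 12))
```

Because the scorer is injected, the same loop works whatever fits the model, and switching the criterion from RMSE to something like AIC would only change what `score_fn` returns.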
Then we can also use data visualization to show the variable importance for the XGBoost model, so there's visibility on both sides. We also played around with some other models: exponential smoothing, obviously, and the Prophet package, which is a very popular forecasting tool as well.
But we got the best predictions with this approach. Working very closely with the industry experts at Comfama, we were able to choose those exogenous variables and run a multicollinearity test to make sure we didn't have overlapping variables in our model.
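The episode mentions a multicollinearity test without naming it. One common choice is the variance inflation factor (VIF): for each indicator it equals $1/(1 - R_j^2)$, where $R_j^2$ comes from regressing that indicator on the others, which is the same as the corresponding diagonal entry of the inverse correlation matrix. The standard-library sketch below assumes VIF was the test; the column names and values are made up. A frequently quoted rule of thumb treats VIF above roughly 5 or 10 as a sign of overlapping variables, though thresholds vary.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: some indicators are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """Variance inflation factor per column: diagonal of the inverse correlation matrix."""
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inverse = invert(corr)
    return {name: inverse[i][i] for i, name in enumerate(names)}


# Hypothetical indicators: correlation 0.8, so each VIF is 1 / (1 - 0.64), about 2.78.
indicators = {"employment_rate": [1, 2, 3, 4, 5], "ise": [2, 1, 4, 3, 5]}
print(vif(indicators))
```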
And ultimately, that's the methodology we landed on. We obviously didn't test everything under the sun given the time constraint, but we found it was a really good model: it gave us a prediction interval the Comfama team was very happy with, and it improved on the base ARIMA model.
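The RMSE-based weighting formula behind the SARIMAX plus XGBoost ensemble is not spelled out in the episode. A common version weights each model by the inverse of its test-window RMSE, normalised so the weights sum to one; the sketch below assumes that scheme, with hypothetical model names and toy numbers rather than Comfama data.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def inverse_rmse_weights(test_actual, test_preds, eps=1e-12):
    """Weight each model by 1/RMSE on the test window, normalised to sum to 1.

    eps guards against division by zero if a model fits the test window exactly.
    """
    inverse = {name: 1.0 / max(rmse(test_actual, preds), eps)
               for name, preds in test_preds.items()}
    total = sum(inverse.values())
    return {name: w / total for name, w in inverse.items()}


def blend(weights, future_preds):
    """Weighted average of each model's forecast, step by step over the horizon."""
    horizon = len(next(iter(future_preds.values())))
    return [sum(weights[name] * future_preds[name][t] for name in weights)
            for t in range(horizon)]


# Hypothetical held-out months and each model's predictions for them.
test_actual = [10, 12, 14]
test_preds = {"sarimax": [11, 13, 15], "xgboost": [12, 14, 16]}  # RMSE 1 and 2
weights = inverse_rmse_weights(test_actual, test_preds)  # about 2/3 and 1/3

future = {"sarimax": [20, 21], "xgboost": [23, 24]}
print(blend(weights, future))  # approximately [21.0, 22.0]
```

Inverse-RMSE weighting keeps the ensemble interpretable, which matters given the stakeholder requirement discussed earlier: each model's share of the final forecast is a single number that can be reported alongside its test error.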
¶ Data Science for Social Good
It seems like this is a really novel data-science-for-good kind of example. Are you aware of, or do you have any thoughts on, other things going on in a similar space? Are there any other community initiatives or projects you're excited about and want to call out?
I would be remiss if I didn't mention that, like I said, I just recently started at Chick-fil-A, and I've been going through their orientation process, Essentials, which has been really great. I didn't know this before coming, but one of Chick-fil-A's aspirations is to be, by twenty thirty, one of the most caring companies in the world.
It's a very big, lofty goal, but it's one they're working toward. You see it in how the data science, or decision science, and analytics teams make decisions in supply chain: How are we moving? How are we being environmentally conscious in how we operate and how we open delivery centers? The end goal in mind is ultimately the customers, and that emphasis on care shows up in the initiatives they're able to fund with their financial stewardship
and operational excellence. I think that's something a lot of companies frankly aren't concerned with, although companies have gotten more concerned with it over time. Those core values have been front of mind in everything that's been presented to me coming in. So I'd have to shout out Chick-fil-A for the way they're
moving in the business world with analytics: empowering it to make decisions that increase operational effectiveness and reduce cost, and rather than keeping those savings for themselves, using them to increase customer delight, which is really awesome.
¶ Balancing MBA and Career Growth
If I understood correctly, you were working while in school. Do you have any advice for people following a similar path? Yeah: do it before you have kids and get married. That's my big one. I'm still going through my MBA. I started in spring twenty twenty four and will finish in May twenty twenty six.
And it's a gauntlet. People go at their own pace. Some try to knock it out as soon as possible; some take it one day at a time. It really exposes you to the sacrifices necessary for career growth. I'm single and don't have any kids, but I have admiration for some of my cohort: they have families, they run businesses, they're married, and balancing all of that is a real struggle. So I think
the best plan is one you spend time making. Things often don't go to plan, but if you have a plan, you're already three steps ahead. We live in the information age, so I would encourage anyone interested in pursuing education, or anything in their spare time, to have a plan. Don't go into it blind. Take some time. There's such a wealth of information out there, and if you
focus on building a network and reaching out to people who have gone through it, you'd be surprised how many well-intentioned people have a wealth of information to give you.
Something I heard is that a smart man learns from his own mistakes and a wise man learns from others'. So leveraging the wealth we have as a society and seeking out that knowledge is very helpful when pursuing something like an MBA or a master's degree in the evenings, because you definitely have those days where you're questioning everything: you get home at
ten o'clock, you need to walk your dog, you haven't made dinner yet, and you've got to wake up early tomorrow. Having a plan makes it much easier, so I would definitely suggest that. Well, the presence of COVID in the time period you were looking at didn't make your life any easier, right? That's a weird, anomalous kind of event. Yeah. A really pessimistic person might argue it's a total anomaly and you've just got to throw out that data as an outlier.
Yeah, that's definitely an option: just remove the data and pretend it didn't happen. But ironically enough, because it's Colombia, when we talked to the data science team we learned there have actually been huge fluctuations very similar to COVID throughout history, especially when the government changed hands. Events like that can be very detrimental to the affiliated population. So
we didn't use the historical data points from things like a change in government, but we did create an indicator variable for COVID so they could use it in the future for an economically disruptive event, because unfortunately that's something Colombia is still impacted by on a regular basis. For a forecast in the US, you might want to remove the COVID year, but using the business context
we got from the Comfama team, it was actually, ironically, helpful to include something like COVID to capture the volatility of a big economic event like that.
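An indicator variable like the COVID one is just a 0/1 column aligned with the monthly series and passed in alongside the other exogenous variables. The sketch below assumes a window of March 2020 through December 2021; the episode only mentions twenty twenty and the twenty twenty one after-effects, so the exact months, like the helper names, are illustrative. Setting the same kind of column to 1 over a future stretch is one way to run a "what if another disruption hits" scenario.

```python
from datetime import date


def month_starts(start, end):
    """Yield the first day of each month from start's month to end's month inclusive."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield date(year, month, 1)
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)


def disruption_dummy(months, window_start, window_end):
    """Return a 0/1 column: 1.0 for months inside [window_start, window_end]."""
    return [1.0 if window_start <= m <= window_end else 0.0 for m in months]


# Assumed COVID window (illustrative): March 2020 through December 2021.
months = list(month_starts(date(2019, 1, 1), date(2022, 12, 1)))
covid = disruption_dummy(months, date(2020, 3, 1), date(2021, 12, 1))

# Scenario column for a hypothetical future shock: six months flagged in 2023.
future_months = list(month_starts(date(2023, 1, 1), date(2023, 12, 1)))
shock = disruption_dummy(future_months, date(2023, 4, 1), date(2023, 9, 1))
```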
¶ Future in Analytics and AI
What's next for you? As I mentioned, I just began a new role at Chick-fil-A, and I'm continuing to work toward my MBA at the Scheller College of Business at Georgia Tech. At Chick-fil-A I'm moving from general analytics into supply chain analytics, so I'm learning a new industry, new business rules,
and new context. There's a lot that goes into the chicken business, as I've learned in a very short amount of time, so I'm drinking out of a fire hose for the moment. Long term, my goals are to graduate from Scheller, continue working at Chick-fil-A, and grow vertically within the analytics and data science space,
moving from more traditional analytics into the machine learning and AI space, which is very exciting to me. I believe agentic AI is on the horizon, or even here now, and I'm really excited to find ways to implement it into our workflow. I feel that
both my day job and my learning at Scheller are preparing me for that in a monumental way. So that's what's next for me, short to long term. Ultimately, the role I'm after is something like business data science: bridging the gap between business decisions and more rigorous data science algorithms. So that's what I'm hoping for. Wish me luck.
Absolutely. I think we're at the foot of the mountain in terms of the opportunity there, so there's lots to be done in every industry over time. Aaron, is there anywhere listeners can follow you online? I don't have social media, but I do have LinkedIn, so feel free to send me a connection request: Aaron Payne. Searching with Georgia Tech Scheller College of Business should get you there. Sounds good. Well, thank you so much for taking the time to come on and share your work.
Thank you so much for having me, and for taking the time as well. I wish you great success on your podcast. I think you're at, what, five hundred and eighty-something episodes? So let's get another five hundred and eighty at least. Thank you very much, Kyle. Oh, my pleasure.
