Data Management for Business Analysts

00:00

The Better Business Analysis Institute presents the Better Business Analysis podcast. Hi everybody, it's Ben Walsh from the Better Business Analysis Institute, and today we are going to be diving deep into data, or data analysis. The area, capability or process we're talking about is data management. So if you're a BA who likes to think about things in terms of groups of processes, this is the whole area that covers data management.

00:41

Now, as we've already noted in our Generative AI podcast, and earlier when we talked about the different disciplines that have spun off from business analysis, data is really important. Data science, data analysis in the true sense, and anything to do with machine learning and big data are probably terms you have heard. And if you haven't heard those terms, you may have heard

01:10

of the tools that use that new kind of data architecture, ChatGPT for example, or actually anything that surfaces these days from the likes of Google, Microsoft or Amazon. They are all using these technologies. They are not new; they've been around for a while, but we're now at a point where even smaller businesses and government departments are moving towards them. Cloud is usually the word at the front.

01:43

It doesn't necessarily have to be cloud, but with cloud data architecture, the way we did things on premise, with database servers back in the day, is starting to change for all of us. So it's important for us as BAs to really understand what that looks like. We're going to take a holistic view and not go too deep. And I'm going to call out where I think the line is for BAs versus those in the

02:13

data-specialized roles, so you know how much information you need to know. If it sparks your interest to know more, then you can of course go deeper into the technology side and understand what each component does. So yes, this episode is about data management, so let's get started. Okay. I guess there are three ways in which we could look at this topic. We could look at it from the bottom up.

02:46

One way of doing that would be through the architecture or the data model. Or we could look at it in terms of how it is exposed to our users through our processes and functions, what we call capabilities. Or we could talk about it holistically from the top, from data governance down to the framework. Now, as BAs, it's always best to start from the top down.

03:19

Yes, going bottom up, playing chess and working out what's behind the curtain, is sometimes useful so we can test our assumptions. But at the end of the day, especially with this topic, if you haven't started from the top down, you won't get the results you need. This is very similar to projects.

03:37

When objectives or problems aren't defined, things become a mess, and projects become less and less likely to be delivered if you haven't got that up-front governance and framework in place before you start. That is true of BA work in terms of the BA plan, but it's also true of the data side.

04:03

So, without going into boring detail around meetings, data governance boards and all the rest of it, which is part of this, let's talk about the fundamental elements of data management. Data management does incorporate what I've just mentioned, the data governance layer, but let's not go into that box; it's not very, I don't know, interesting

04:29

to talk about. But you do need to have a governance structure and processes. Then, on the functional or operational side, there are two main parts of data management, and you could say they are defined in a data management framework. The two parts are the data

05:01

management lifecycle: the lifecycle that data projects go through, or that a piece of data goes through. They're one and the same. It's a structured way of defining the steps your data project should go through, be stage-gated and thought about, and the steps a piece of data goes through. We're going to refer to some really useful predefined data management lifecycle terminology and

05:39

journey steps from data.govt.nz, the New Zealand government's data experts, so to speak. It is one way of defining this journey, and you may have come across others; I think there are a lot of data bodies of knowledge that define this lifecycle. I'm going to refer to this one because I quite like it and it's relevant here in New Zealand. So if you don't have one, and especially if you live in New Zealand, I would suggest just

06:11

adopting this lifecycle. So that's a big one straight away, and I'll define the steps and what they mean in a minute. So we've got our lifecycle: these are the steps that data goes through, and that your projects involving data should go through. If you like, that's the process. Then, to support you doing that, and to support data management, there are capabilities.

06:39

There are different skills, human capabilities, but also technical capabilities that need to be in place for you to have true data management. Now, those capabilities would have been quite different 20 or 40 years ago. When we talk about these technical capabilities, these data management capabilities, we're really talking about the data architecture

07:10

that exists in 2023. So if you've been involved with databases, say as a DBA, this might be an evolution of what you already know, and some of these words will not be dissimilar to the standard pattern for managing data. But below the capabilities, when we actually look into the engine, at the technical architecture, those technologies have definitely changed.

07:41

So we've got our lifecycle and our data capabilities, and that's what we're going to be focusing on in this podcast. As I mentioned earlier, above all that sits data governance, the more management side of data in terms of committees and organization.

08:03

It's about making sure you've got that governance across the top. Then below all of this, below the lifecycle and the capabilities, you should have some fundamental data management, which is just best practice: things like data security classifications, very basic ways in which people manage data. They're not necessarily specific to the capabilities or the

08:29

steps you go through. They cut across all of those; they're just basic ways in which we deal with pieces of data. Okay. So you've got the foundational, if you like fundamental, capabilities down at the bottom, and data governance across the top. But what we're talking about is the middle, where the rubber hits the road: the lifecycle that data goes through, and that your projects should go through when they deal

08:54

with data, and the capabilities. Now, capabilities can be talked about from a BA perspective, and that's what we're going to touch on. Below those capabilities, you have the architecture. We use capabilities as a way to talk about both the human and the technology capabilities that are needed, and below that is the connection back to the architecture, back to the solution, the how. OK, cool, so let's get started on the data lifecycle, the

09:26

data management lifecycle. As I said, I'm going to refer to data.govt.nz for those who want to see what I'm talking about. I'll also put these in a blog post on the Better Business Analysis Institute website so you can reference it. This lifecycle is made up of seven steps, so not too many to understand. Somewhere between 5 and 8

09:58

steps is generally a good number, which is why I like this particular journey map. The first step is plan. This is less technical and more about the processes and resources you need to manage the data. It's about the project goals for this data and making them really clear; we can use something called a data management plan if we like.

10:31

It's where you work out how you're going to handle this data: what's the plan for this piece of data, or these pieces of data? We're not just talking about the name on a customer record; we might be talking about customer data as a whole, or about moving a whole server with lots of data sets to a new technology in the cloud. So how are we going to handle that? What is the plan? You should work with your project manager on that, and it should

10:56

be very clear. But it's specific to the data side of things; it might be how you're going to migrate data. What is the plan before you even start doing anything? It doesn't necessarily involve any fundamental data management capabilities at this stage. It could just be pen and paper, a Google Doc, a Word document or a PowerPoint presentation, but you're really planning out what you need to do.
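To make the plan concrete, here's a minimal sketch in Python of a data management plan captured as a structured record, so you can check that every lifecycle step has an agreed approach before work starts. The seven step names follow the lifecycle discussed in this episode; the `DataManagementPlan` class, its fields and the example entries are hypothetical, not an official data.govt.nz template.

```python
from dataclasses import dataclass, field

# The seven lifecycle steps discussed in this episode.
LIFECYCLE_STEPS = ["plan", "collect", "describe", "store", "analyze", "use", "save or destroy"]


@dataclass
class DataManagementPlan:
    """Hypothetical structure for a data management plan; not an official template."""
    goal: str
    datasets: list[str]
    # Maps each lifecycle step to a plain-language description of the approach.
    approach: dict[str, str] = field(default_factory=dict)

    def missing_steps(self) -> list[str]:
        """Return the lifecycle steps that have no agreed approach yet."""
        return [step for step in LIFECYCLE_STEPS if not self.approach.get(step, "").strip()]


plan = DataManagementPlan(
    goal="Understand which customers buy which T-shirts in which store",
    datasets=["customers", "stores", "inventory"],
    approach={
        "plan": "Agreed with the product owner and sponsor",
        "collect": "Export from the point-of-sale system; in-store survey form",
        "describe": "Data dictionary reviewed by the data team",
        "store": "Company database, access restricted to the analytics team",
    },
)

print(plan.missing_steps())  # ['analyze', 'use', 'save or destroy']
```

Written in stakeholders' language, the same record doubles as a checklist: any step returned by `missing_steps()` still needs a decision.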

11:20

So you would start by writing the plan and listing out what you're going to do. Now, it's really important that your stakeholders, your product owners and your sponsors understand the plan, so it should be written in their language. It's a bit like a spec, but just like the agile BA plan we talk about at the Better Business Analysis Institute, this is a plan for your data,

11:42

right? Cool. So you've got your plan, and now you're going to execute it. The second step after plan is collect. This is where data is gathered or generated: the information that's in scope of your plan, so you need to be able to collect it. Your brain will be triggered and you'll start to think about

12:07

how that's going to happen, but for now just identify the different data sets in your plan. We need to collect information. This could be information in other data stores within your organization. It could be accessing or consuming data from locations outside your organization. It could be multiple data sources. It could even be capturing new

12:33

data through, say, survey forms. It's all of that, and you just need to define your collection points and how you're going to collect this data. Literally, you need to collect first: if you don't have the data, you need to go and collect it. So it's about gathering. When we get down to the capabilities, there are technical terms for this, like ingesting the data.
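As a rough illustration of collecting from more than one source, the sketch below gathers customer records from a CSV export and from survey-form responses, and tags each record with where it came from. The source names, field names and sample data are all made up for the example.

```python
import csv
import io

# Hypothetical export from an internal system (in practice this might be a file or an API).
POS_EXPORT = """customer_id,name,email
C1,Aroha Smith,aroha@example.com
C2,Liam Jones,liam@example.com
"""

# Hypothetical responses captured through a survey form.
SURVEY_RESPONSES = [
    {"name": "Mere Brown", "email": "mere@example.com"},
]


def collect() -> list[dict]:
    """Gather records from each source and record which source they came from."""
    records = []
    for row in csv.DictReader(io.StringIO(POS_EXPORT)):
        records.append({**row, "source": "pos_export"})
    for response in SURVEY_RESPONSES:
        records.append({**response, "source": "survey_form"})
    return records


collected = collect()
print(len(collected))  # 3
print(sorted({r["source"] for r in collected}))  # ['pos_export', 'survey_form']
```

Keeping the `source` tag from the start makes the later "where did it come from?" question in the describe step easy to answer.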

12:59

How do you get that data into your particular data management platform? At this stage you're collecting it; where it goes is important, but we're not talking about the how right now. So we've got a plan, and we collect. Now, as the data comes in from multiple sources, we need to describe it. We need some agreement around the metadata standards in our organization, and we need to describe the pieces of data that we've brought in.

13:29

You may have come up with a data model in your plan. Say we're a store that sells T-shirts. We're going to collect information about our customers, about our store and about our inventory, so you could group that data into those three areas: inventory, store and

13:51

customer. You could have mapped out what data you need to collect (only collect data you're going to use) and defined that. So at this point you may have lots of data sources. You may have, I don't know, bought e-mail marketing lists with a whole lot of names of customers who might be

14:10

interested in your T-shirts. You may know what the inventory system fields are, and you may also know all the information you need to run the shop. So you've defined all that and written it down. It's important to go down to the data type level here, not the database level: just know that a name might be a string, a piece of text, for example, whereas the quantity of T-shirts in stock would be a number.

14:42

So it is worth describing the data at least at that level as a BA, and passing that on to those in the data team to go a bit further, or doing that yourself later if that happens to be your role. So yes, describe the pieces of data: what are they, where did they come from, and what are they going to be used for? So the input and the output, not necessarily the process.
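One lightweight way to describe data is a data dictionary: for each field, its type, where it comes from and what it's used for. The sketch below, with hypothetical field names, records that and checks incoming records against the declared types. One detail worth flagging: phone numbers are usually best described as text rather than numbers, because a numeric type drops leading zeros and the "+" of an international prefix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldDefinition:
    """One entry in a (hypothetical) data dictionary."""
    name: str
    data_type: type
    description: str
    source: str
    used_for: str


CUSTOMER_DICTIONARY = [
    FieldDefinition("name", str, "Customer's full name", "pos_export", "Personalised e-mails"),
    # Stored as text: a numeric type would drop a leading zero or a "+64" prefix.
    FieldDefinition("phone", str, "Contact phone number", "survey_form", "Order follow-up"),
    FieldDefinition("tshirts_bought", int, "Number of T-shirts bought", "pos_export", "Sales reporting"),
]


def type_problems(record: dict, dictionary: list[FieldDefinition]) -> list[str]:
    """List fields that are missing or whose value does not match the declared type."""
    problems = []
    for field_def in dictionary:
        value = record.get(field_def.name)
        if value is None:
            problems.append(f"{field_def.name}: missing")
        elif not isinstance(value, field_def.data_type):
            problems.append(f"{field_def.name}: expected {field_def.data_type.__name__}")
    return problems


print(type_problems({"name": "Aroha Smith", "phone": "021 555 0100", "tshirts_bought": 2}, CUSTOMER_DICTIONARY))  # []
print(type_problems({"name": "Liam Jones", "phone": 215550101}, CUSTOMER_DICTIONARY))
# ['phone: expected str', 'tshirts_bought: missing']
```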

15:12

So now that we've collected our data, or at least virtually collected it, and we've named it and appropriately defined the metadata describing it, we then need to store it. The data needs to be stored somewhere. Data is mostly digital these days, and if it isn't, we need to get it into digital form so we can store it in a digital repository. That's the highest-level term, a very BA term, for

15:41

what you might think of as a database, for example, or a data lake. There's a whole lot of ways you can store information these days, but at the highest level we're talking about a digital repository. And that of course needs to be secure and reusable. We need to be able to protect the data we're collecting, especially if it's customer data. But equally, if it's about our operations, our store and our inventory,

16:10

and we don't store it appropriately, say we save all of our information into a Google Sheet that has a public link and someone accesses it, that could have negative consequences for our reputation and for our business. So we really need to think about how we're going to store information, and that quickly follows collection.

16:37

You're collecting it, so you've obviously got to put it somewhere. You might say, well, Ben, why are we only thinking about this now? Well, you're not; you've done this in your plan. You've already got an idea of where you might store it, but right now it's the action of storing the data somewhere. OK. And that could be multiple places. We'll come back to that when we talk about technical capabilities.
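To ground the idea of a digital repository, here's a minimal sketch that stores collected customer records in a database table using Python's built-in `sqlite3` module. An in-memory database keeps the example self-contained; a real repository would be persistent, access-controlled and backed up, and the table layout here is a hypothetical example.

```python
import sqlite3

records = [
    ("C1", "Aroha Smith", "aroha@example.com", "pos_export"),
    ("C2", "Liam Jones", "liam@example.com", "pos_export"),
    ("C3", "Mere Brown", "mere@example.com", "survey_form"),
]

# ":memory:" creates a throwaway database; a real repository would use a persistent, secured store.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE customer (
           customer_id TEXT PRIMARY KEY,
           name        TEXT NOT NULL,
           email       TEXT,
           source      TEXT NOT NULL
       )"""
)
# Parameterised inserts keep the values separate from the SQL text.
conn.executemany("INSERT INTO customer VALUES (?, ?, ?, ?)", records)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
print(count)  # 3
```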

17:00

Once the data's stored, so you've collected it, described it and stored it somewhere, or in multiple places, you then analyze the data. Okay, so you explore it, you interpret it, you look at the data in different ways. It might be the number of T-shirt sales in your store, for example. It might be the number of customers who visited the store. It might be the number of transactions, or it might be your

17:27

stock levels. When you're analyzing, you're really looking to group the information into something useful; the goal is to interpret it into useful data sets, or data views if you like. So you might combine data from your customer repository, your inventory repository and your store repository to create one view where you can see all customers by store and by inventory item, for example, showing where they bought their stuff.
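Combining repositories into one view is essentially a join. The sketch below uses plain Python dictionaries with made-up sample data to build a "customers by store by product" view and count T-shirt sales per store; in practice this would more likely be a SQL query or a BI tool, and all names here are hypothetical.

```python
from collections import Counter

# Hypothetical reference data from three repositories.
customers = {"C1": "Aroha Smith", "C2": "Liam Jones"}
stores = {"S1": "Wellington", "S2": "Auckland"}
inventory = {"P1": "Black T-shirt", "P2": "White T-shirt"}

# Each sale links a customer, a store and an inventory item by their IDs.
sales = [
    {"customer_id": "C1", "store_id": "S1", "product_id": "P1"},
    {"customer_id": "C2", "store_id": "S2", "product_id": "P1"},
    {"customer_id": "C1", "store_id": "S1", "product_id": "P2"},
]

# Join the IDs to their descriptions to produce one readable view.
view = [
    {
        "customer": customers[s["customer_id"]],
        "store": stores[s["store_id"]],
        "product": inventory[s["product_id"]],
    }
    for s in sales
]

sales_per_store = Counter(row["store"] for row in view)
print(view[0])  # {'customer': 'Aroha Smith', 'store': 'Wellington', 'product': 'Black T-shirt'}
print(sales_per_store)  # Counter({'Wellington': 2, 'Auckland': 1})
```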

18:01

Once you've analyzed the data, you should have an idea of how you want to use it, and then you use it. That's the next step. So we've planned, we've collected, we've described, we've stored, we've analyzed, and now we're using it. The data is used for the purpose it was collected or generated for. You should have defined why you were collecting the information, and now you want to use it.

18:25

It might be a report. A general use case is that you're collecting information so you have better insights into your customers, so this could be a report or a business intelligence dashboard. A dashboard is a very good use of data, and the idea is that you can also reuse that data for other purposes. Now that you've collected and analyzed it, you might feed it into your e-mail marketing system.
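Using the data for another purpose often means reshaping it for the system that will consume it. As a hedged sketch, the function below turns analyzed customer data into a CSV audience list that an e-mail marketing tool could import, keeping only customers who have opted in. The `opted_in` flag and the column names are assumptions for illustration, not any particular tool's import format.

```python
import csv
import io

# Hypothetical output of the analysis step.
customer_summary = [
    {"email": "aroha@example.com", "name": "Aroha Smith", "favourite_product": "Black T-shirt", "opted_in": True},
    {"email": "liam@example.com", "name": "Liam Jones", "favourite_product": "Black T-shirt", "opted_in": False},
]


def marketing_audience_csv(rows: list[dict]) -> str:
    """Return a CSV of opted-in customers, in a shape a marketing tool could import."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["email", "name", "favourite_product"])
    writer.writeheader()
    for row in rows:
        # Only customers who agreed to marketing e-mails are included.
        if row["opted_in"]:
            writer.writerow({k: row[k] for k in ("email", "name", "favourite_product")})
    return buffer.getvalue()


print(marketing_audience_csv(customer_summary))
```

The same analyzed data could just as easily feed a dashboard; only the output shape changes.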

18:54

"Consumed" is another way of expressing use. It could be presenting the data in a report, but it could also be making it available to be consumed by other systems, like, as I said, your e-mail marketing tool. It could be very useful to Mailchimp, the Meta platform or whatever marketing tool you're using. And the final step is to save

19:22

or destroy the data. If you're going to keep that data for a long time, this is where records management policies come in. You really need to understand the archival and disposal rules, and how long you're allowed to keep that customer data. You need to make a decision here: either save it, which could mean archiving it because it might not need to be accessed all the time, or destroy it.
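Retention decisions can be written down as explicit rules. The sketch below assumes a hypothetical policy (keep customer data active for two years, archive it for five more, then destroy it) and decides what should happen to a record given the date it was last used. The periods are placeholders only; real ones come from your records management policy and any legal obligations.

```python
from datetime import date, timedelta

# Hypothetical retention periods; substitute the ones in your records management policy.
ACTIVE_PERIOD = timedelta(days=2 * 365)
ARCHIVE_PERIOD = timedelta(days=5 * 365)


def retention_action(last_used: date, today: date) -> str:
    """Return 'keep', 'archive' or 'destroy' for a record, based on how long ago it was used."""
    age = today - last_used
    if age <= ACTIVE_PERIOD:
        return "keep"
    if age <= ACTIVE_PERIOD + ARCHIVE_PERIOD:
        return "archive"
    return "destroy"


today = date(2023, 6, 1)
print(retention_action(date(2023, 1, 15), today))  # keep
print(retention_action(date(2019, 6, 1), today))   # archive
print(retention_action(date(2015, 6, 1), today))   # destroy
```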

19:47

You need to take action to safeguard that data for its long-term viability and availability. In practical terms, that means the data shouldn't just be saved on your desktop; it might be saved in a database. So we really need to think about how we save our information for the long term. Now, just to recap: when you did your plan, at the first planning stage, you should have written down these headings: collect, describe, store,

20:18

analyze, use, and save or destroy. And you've already mapped out how you're going to do those things, at least at a high level: the high-level business requirements, not necessarily the detail. As you go into each step, you may define the detailed requirements and the solution you're going to use. So that is the data management lifecycle as defined by data.govt.nz. It's pretty good. And the other useful side of this particular framework is that they also

20:53

talk about digital capabilities. Now, "capabilities" is a word that's used a lot. It can mean skills, and it can mean human capabilities or technical capabilities. data.govt.nz outlines the human capabilities: the processes or abilities that your team, or the people in your organization, will need to reach a high level of maturity. So if you

21:29

have that requirement, if your job is to audit where you're at, please go to that website. It's fantastic, and it aligns those capabilities with the data management lifecycle I've just talked about. So your job is nearly done. But because we live in an IT world and our job is to jump between the worlds of business and IT, we should really be talking about the technical capabilities we now require to support that lifecycle.

22:02

And like I said at the start of this podcast, these capabilities have evolved over time. To be honest, these highest-level capability names may well have existed 20 years ago, but the techniques, the ways we do things inside these boxes, have definitely changed. Inside these boxes there are sub-capabilities, if you like: functions or features that these various systems let us do, or do well. But at the highest level, we as BAs need to think about these things.

22:37

They're not complicated, and I'm going to read them out. These particular names are the ones Amazon uses in AWS to define these areas, but I think they're pretty common across whatever platform you're using, be that Google Cloud, Azure or even an on-premise setup. OK, so these are technical capabilities. Just to recap, they need to exist to support the data management lifecycle I just talked about. The first one is data sources.

23:11

These are things, nouns. They're not actions like "collect data" or "store data". This one is data sources. We need to have data sources, and that could be a database, like SQL Server, or other systems like CRM systems. These days it could be small devices that feed in data, it could be your contact center, it could be logs, it could be anything, right? It could be your social media marketing platform. But data sources need to exist.

23:40

You need to have some way of sourcing information, and you need to know what those sources are. Okay, so data sources are pretty easy. Then, to use that information, we need to get it into a platform or a series of components so it's useful for us, and we call that function ingestion. Okay, so we ingest data. You could say "capture" is the very high-level, holistic word, but it's not quite right.

24:16

In this case, the word ingestion is important because we're actually taking the information and putting it somewhere. It suggests you're taking a copy of that information, and there are other ways of accessing information.
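The difference between ingesting and accessing data in place can be shown in a few lines. In this sketch, with a made-up source table, the hypothetical `ingest` function takes a timestamped copy of the source records, while `read_in_place` always reads the live source, which is roughly the idea behind reporting directly from a source system or using data virtualization. Neither function is a real platform API.

```python
import copy
from datetime import datetime, timezone

# A stand-in for a source system's table (hypothetical data).
source_system = [{"product_id": "P1", "stock": 40}]


def ingest(source: list[dict]) -> dict:
    """Take an independent copy of the source data, stamped with when it was ingested."""
    return {
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        "records": copy.deepcopy(source),
    }


def read_in_place(source: list[dict]) -> list[dict]:
    """Read the live source without copying it (the virtualization-style approach)."""
    return source


snapshot = ingest(source_system)
source_system[0]["stock"] = 35  # The source system changes after ingestion.

print(snapshot["records"][0]["stock"])           # 40: the ingested copy is unchanged
print(read_in_place(source_system)[0]["stock"])  # 35: reading in place sees the change
```

The trade-off is that the copy can be tagged, processed and joined freely, but it is only as current as its last ingestion.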

24:35

You don't have to move it. You could report directly from a source system, or you could use what's called data fabric or data virtualization to pull the information when you need it without ingesting it. OK. But if it's really critical for us, and we want to tag it, use it effectively, process it and join it with other data, then it is best for us to ingest

25:02

that information. So we've got data sources, and we're going to ingest at least some of that information into our system. We then need to store it, so we've got storage. Generally there are three categories within storage. First, you save what you've ingested in its raw form, in whatever format you got it. Then we do a process of cleaning the information, and then we curate the data.

25:30

So storage involves moving information around and processing it, which means there might be several copies of that information in various states. So we need storage, and, as I just mentioned, we need processing. This is where data management has really advanced. With AI, machine learning and some of the tools we've got available, the way in which we

26:02

clean data, which is part of processing, and transform data has changed: the tools out there are really advanced. This is where the old ETL (extract, transform, load) processes have really been replaced by systems that can do that automatically. So if we take raw data in our storage, we may process it to clean and transform it, and then we've got a clean data set, regardless of what format the data came in.
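Here's a minimal sketch of the clean-and-transform step: raw sales records arrive from two hypothetical sources in different formats (different date formats, stray whitespace, prices as text) and are converted into one common shape. The field names, the assumed date formats and the `to_clean` helper are illustrative only; modern tools automate much of this, but the logic is the same.

```python
from datetime import datetime
from decimal import Decimal

# Raw zone: records kept exactly as they arrived from two hypothetical sources.
raw_zone = [
    {"source": "pos", "customer": "  Aroha Smith ", "date": "2023-05-02", "price": "29.90"},
    {"source": "web", "customer": "liam jones", "date": "02/05/2023", "price": "$25.00"},
]

# Each source uses its own date format (assumed: ISO for POS, day/month/year for web).
DATE_FORMATS = {"pos": "%Y-%m-%d", "web": "%d/%m/%Y"}


def to_clean(record: dict) -> dict:
    """Transform one raw record into the common data model."""
    return {
        "customer": " ".join(record["customer"].split()).title(),
        "date": datetime.strptime(record["date"], DATE_FORMATS[record["source"]]).date(),
        "price": Decimal(record["price"].lstrip("$")),
    }


# Clean zone: same data, one consistent format; the raw zone is left untouched.
clean_zone = [to_clean(r) for r in raw_zone]
print(clean_zone[1])
# {'customer': 'Liam Jones', 'date': datetime.date(2023, 5, 2), 'price': Decimal('25.00')}
```

Keeping the raw zone unchanged means the cleaning rules can be fixed and re-run without going back to the sources.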

26:31

We get it into a clean, useful shape, and then we go back and process it a bit more: aggregating the information and segmenting it. We might have multiple different copies of the same piece of data, so we can mash those up and enrich the data, and then we save that back as curated data. And again, this is almost a modern-day processing

27:01

technique. So think about all our different sources: say we had 20 different sources. We ingest those, they're in raw format, and we clean and transform them so they fit our data model. We then enrich and segment them, so now the data is more than just our data model; it's in a useful format. Then we can start to use that information, and we use it in a couple of ways.
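Continuing in the same spirit, curation might enrich cleaned records with reference data and then aggregate and segment them. In this sketch the sample sales, the store-to-region lookup, the spend threshold and the segment names are all invented for illustration.

```python
from collections import defaultdict
from decimal import Decimal

# Clean zone output (hypothetical), already in the common data model.
clean_sales = [
    {"customer": "Aroha Smith", "store": "S1", "price": Decimal("29.90")},
    {"customer": "Aroha Smith", "store": "S1", "price": Decimal("25.00")},
    {"customer": "Liam Jones", "store": "S2", "price": Decimal("25.00")},
]

# Reference data used to enrich each record.
STORE_REGION = {"S1": "Wellington", "S2": "Auckland"}
HIGH_VALUE_THRESHOLD = Decimal("50.00")  # Assumed cut-off for the "high value" segment.


def curate(sales: list[dict]) -> dict[str, dict]:
    """Enrich sales with region, aggregate spend per customer, and assign a segment."""
    curated: dict[str, dict] = defaultdict(lambda: {"total_spend": Decimal("0"), "regions": set()})
    for sale in sales:
        customer = curated[sale["customer"]]
        customer["total_spend"] += sale["price"]  # aggregation
        customer["regions"].add(STORE_REGION[sale["store"]])  # enrichment
    for customer in curated.values():
        # Segmentation based on the assumed threshold.
        customer["segment"] = "high value" if customer["total_spend"] >= HIGH_VALUE_THRESHOLD else "standard"
    return dict(curated)


curated_zone = curate(clean_sales)
print(curated_zone["Aroha Smith"]["segment"], curated_zone["Aroha Smith"]["total_spend"])  # high value 54.90
print(curated_zone["Liam Jones"]["segment"])  # standard
```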

27:27

One, we use analytics. Data analytics, which is where the term data analysis comes from, is when you start to use the information, through statistical data analysis, data science and dashboarding. The other way we could use it is to share it with others or collaborate with other systems. So if we were using this platform for just one part of our business, we could provide an API so it could be used by another part

27:55

of the business or by another tool. We could also collaborate with partners, stakeholders or agencies that want to use our information. So there are two consumption use cases: analytics, which presents the information, and collaboration, which shares the information so other people can consume it.
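To show what providing an API for collaboration can look like at its simplest, here's a sketch of a read-only JSON endpoint written as a WSGI application using only the standard library. The `/stock` path and the data are hypothetical; a real platform would typically put an API gateway, authentication and rate limiting in front of something like this.

```python
import json
from wsgiref.util import setup_testing_defaults

# Curated data another part of the business might consume (hypothetical).
STOCK_LEVELS = {"P1": 35, "P2": 12}


def app(environ, start_response):
    """Serve stock levels as JSON at /stock; return 404 for anything else."""
    if environ.get("PATH_INFO") == "/stock":
        body = json.dumps(STOCK_LEVELS).encode("utf-8")
        start_response("200 OK", [("Content-Type", "application/json")])
    else:
        body = b'{"error": "not found"}'
        start_response("404 Not Found", [("Content-Type", "application/json")])
    return [body]


def call(path: str) -> tuple[str, dict]:
    """Exercise the app without a network server, as a consumer would see it."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the other required WSGI keys
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], json.loads(body)


print(call("/stock"))  # ('200 OK', {'P1': 35, 'P2': 12})
```

To serve it over HTTP for local testing, `app` could be passed to `wsgiref.simple_server.make_server(host, port, app)`.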

28:15

Obviously, we also need to think about some more technical access and security capabilities, which roughly fall under technical data governance. We need a capability for access rights.
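Access rights and auditing can be sketched as a role-to-permission lookup that logs every request, allowed or not. The roles, dataset names and permissions below are hypothetical; in practice these controls would usually live in your platform's identity and access management rather than in application code.

```python
from datetime import datetime, timezone

# Hypothetical role-based permissions: role -> {dataset: allowed actions}.
PERMISSIONS = {
    "data_analyst": {"sales_curated": {"read"}},
    "data_engineer": {"sales_raw": {"read", "write"}, "sales_curated": {"read", "write"}},
}

audit_log: list[dict] = []


def check_access(user: str, role: str, dataset: str, action: str) -> bool:
    """Decide whether the role may perform the action, and record the decision."""
    allowed = action in PERMISSIONS.get(role, {}).get(dataset, set())
    audit_log.append({
        "at": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "dataset": dataset,
        "action": action,
        "allowed": allowed,
    })
    return allowed


print(check_access("ana", "data_analyst", "sales_curated", "read"))  # True
print(check_access("ana", "data_analyst", "sales_raw", "read"))      # False
print(len(audit_log))  # 2
```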

28:29

We also need controls, auditing and security. Then there's another really important bit called cataloging, where we have metadata schemas and potentially data crawlers that tag our data automatically, so it can be used more effectively and be largely self-managed within the system. So just to recap: we've got data sources, we ingest those, we've got storage for them, and we've got processing.

28:59

Then we consume it, through either analytics or collaboration. We have some kind of security, access and audit capability, which you could call technical data governance, and then we've got cataloging. And if you happen to be a social marketing company or an advertising company, you may also do what we term activating that data, using it specifically for advertising or real-time marketing, but most companies don't have that piece.

29:30

So those are the technical capabilities that need to exist.
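Cataloging is perhaps the least familiar capability in that list, so here's a toy version of what a crawler does: it inspects each dataset, records its columns and inferred types, and applies tags. Tagging a column as personal information because its name matches a keyword list is a deliberately simple heuristic assumed for this sketch; real catalog tools have their own, much richer, classification features.

```python
# Hypothetical datasets found by the crawler: dataset name -> sample rows.
DATASETS = {
    "customers": [{"customer_id": "C1", "name": "Aroha Smith", "email": "aroha@example.com"}],
    "stock": [{"product_id": "P1", "stock": 35}],
}

# Assumed keyword list for spotting personal information by column name.
PERSONAL_INFO_KEYWORDS = ("name", "email", "phone", "address")


def crawl(datasets: dict[str, list[dict]]) -> dict[str, dict]:
    """Build catalog entries: columns, inferred types and tags for each dataset."""
    catalog = {}
    for dataset, rows in datasets.items():
        sample = rows[0] if rows else {}
        # Infer each column's type from the first row only (a simplification).
        columns = {column: type(value).__name__ for column, value in sample.items()}
        tags = {
            "personal-info"
            for column in columns
            if any(keyword in column for keyword in PERSONAL_INFO_KEYWORDS)
        }
        catalog[dataset] = {"columns": columns, "tags": sorted(tags), "row_count": len(rows)}
    return catalog


catalog = crawl(DATASETS)
print(catalog["customers"]["tags"])  # ['personal-info']
print(catalog["stock"])  # {'columns': {'product_id': 'str', 'stock': 'int'}, 'tags': [], 'row_count': 1}
```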

29:34

And if you have those capabilities, and you've planned, and you have a way of saving or destroying your data, then the middle process steps of collecting, describing, storing, analyzing and using data will be fulfilled by the technical capabilities we just talked about: data sources, ingestion, storage, processing, analytics, collaboration, cataloging and some kind of

30:02

technical data governance covering security, audit and access. OK, so those are the two main parts. If you have all of those pieces, the process and the capabilities, then you have, well, at least a framework for modern data management. What then makes the difference is what technology is performing those capabilities.

30:32

I mentioned the likes of AWS before; there's also Google Cloud, Azure and many others, and they all have their own flavor of tools that carry out the capabilities we talked about. For example, AWS, for its customer data platform, can pull in many data sources and ingest them using services like Amazon Kinesis, Amazon AppFlow or Amazon API Gateway, and then stores the data in buckets, called S3 buckets, for

31:09

example. It then uses AWS Step Functions to process and orchestrate the information, along with AWS Lambda and AWS Glue for workflows, and then pushes that information out to analytics through Amazon Redshift. It even has its own reporting tool. For data collaboration it uses gateways again, and for cataloging it uses AWS Glue. So Amazon has subcomponents, technical pieces of the puzzle, or services, which allow you to

31:46

fulfill those capabilities. As BAs, we don't really care what those technical features or tools are. We just need to know that we have a way of accessing our sources, ingesting, storing, processing and consuming that information, and cataloging it. If we have those in place, we don't necessarily need to

32:16

understand these technologies. Most of the time there's a hybrid approach, where some of the information is still on premise and not in the cloud, or you're using AWS for part of this and then using Power BI, connected to it, for your reporting. So it doesn't necessarily mean you need to adopt one vendor for all these things, which might become expensive. However, all those capabilities need to exist. So what does that mean for you

32:47

as a BA? Now that you understand there needs to be a data management framework that includes the lifecycle and these capabilities, you can actually start to plan, put together a data plan, and write requirements that use

33:06

these features. For example, you might say: as a data analyst, I would like the New Zealand Census household income data to be ingested into the data warehouse. You can use these terms, and if you've grouped your requirements under these features and functional areas, the architect knows you've got a requirement for those components and can start thinking about the technology that might be

33:49

the best fit for that purpose. Of course, how many data sets you have and how complicated they are could determine what kind of tools are used in those areas. You could ingest data into Google Sheets or Excel, for example, and that might be OK as long as you have the appropriate data governance, security and access rights, which is why we don't really use Excel for that purpose. But it might be fine for us depending

34:17

on the use case we've got. So I hope that's given you some insight into how data management works and what you need to think about as a BA. I'm sure there are lots of other subtopics we could go into, but I hope you've enjoyed this podcast, and I'll see you next time.

Transcript source: Provided by creator in RSS feed.