GCERF reports percentage increases but fails to do the statistical analysis to support statistical significance: Ghana Results Paper Emergency Grant

00:00

Welcome to the Improving Development Evaluation Podcast. I'm your host, David Wand, and in this episode, we're going to revisit the Global Community Engagement and Resilience Fund. You can learn more about the Global Community Engagement and Resilience Fund, GCERF, at gcerf.org. You might be wondering why I am returning to this particular international development organization, because I featured

00:25

them on two previous episodes. The reason is I've discovered they've actually improved development evaluation in some ways, and in some ways they haven't, but it's a step in the right direction. And you may have figured out through my LinkedIn profile that I used to be at GCERF back in April of 2020. So that's more than six years ago. Back then, they were not in the mood to do what I've been trying to get organizations to do when it

00:58

comes to improving development evaluation. That is, putting on their website their performance measurement framework, or as the Americans call it, the activity monitoring, evaluation, and learning

01:10

plan, or as other people call them, results frameworks. These are documents that list the outcomes they expect to achieve, along with the outcome indicators they expect to use to measure, hopefully validly, the outcome statements that they're claiming they want to achieve, also the data that's associated with those outcome indicators, and finally the statistical analysis that shows that, if the outcome indicator is indeed a valid measure of the outcome and if, for example, the percentage on that outcome

01:43

indicator has indeed increased over time from baseline to midline, that increase in percent is actually statistically significant, if they're using samples, rather than due to chance. That's what we're trying to get: proper evaluation of the project, so that the results that GCERF keeps claiming on its website, and also on its podcast, actually have been achieved because of GCERF and not due to chance. So far, they have failed

02:18

to do that. So just in retrospect, you may recall on one episode I showed they received $5 million from the United States government. I received two letters, from the United States State Department and the United States Agency for International Development, saying: we have no activity MEL plan. We have no outcome indicators for that $5 million that has been spent. Bye-bye. So that's a problem. Number two, they received $1.1 million from the Canadian government. Again,

02:51

I asked for the results framework. It took me about a year. I finally got a letter, or what's more like a report, from GCERF. And in that report, there was absolutely no mention of outcome indicators showing any evidence that they had achieved any of the four outcomes that they keep claiming they're achieving on their website.

03:14

Those four outcomes are: one, increased social cohesion; two, increased ability to mobilize, organize, and represent their own interests; three, increased vocational skills, followed by an increase in livelihood, such as an increase in income; and four, increased confidence or increased critical thinking skills or increased life skills or increased self-worth or increased resilience, what they like to call sense of purpose. But finally, in this episode, we are going to celebrate the fact that

03:47

GCERF has moved on from their resistance of over six years ago. When I was with them, they

03:54

were refusing to even mention a results framework being on their website with indicators, and they were also dismissing trying to measure all of the social psychological constructs that they are claiming they're achieving, like increases in vocational skills, increases in social cohesion, attitude changes in the right direction, or perception changes. They were not interested at all. And one of the reasons they were not interested at all was because most of them there at GCERF

04:27

have the wrong degrees. They don't have an undergraduate degree in psychology, which is all you need to do these evaluations of these social psychology and clinical psychology constructs. All you need to do is complete that second-year undergraduate university course in inferential statistics, and you'd be off and running. But most of them, unfortunately, at GCERF have PhDs and master's degrees in international relations or international affairs.

04:58

And I've met students who have master's degrees in international affairs and relations, even undergraduate degrees in international relations or affairs. And they never have to take a course in inferential statistics. Why? Because they're often looking at macro level indicators like trade statistics or whatever. So there's a mismatch.

05:21

That is one of the many reasons why GCERF was resistant to looking at these basic skills that you need to properly evaluate the services that they're delivering, which all focus on changing individuals and groups of individuals in the four outcomes that they clearly state

05:43

on their website. But now they have finally released and published, on their website, some results and impact reports. So I'm going to look at one of them, from Ghana, and go through it and give you an idea of how far they've come. At least they've published it on the website, which is a pioneering effort compared to a lot of NGOs. And number two, they've actually published the data on their outcome indicators. So we're going to look at whether or not the outcome indicators

06:18

are indeed valid measures of the outcome. And number two, even if they are, with the data that they provide, have they gone to the proper level of

06:27

statistical analysis? That is, have they gone back to their notes from second-year university, where they took that required inferential statistics course, and figured out: this is what I should do to make the claim that the percentage increase that I'm claiming is statistically significant rather than due to chance? That is, understanding the basic second-year concept of sampling distributions. Then

06:58

we're in a good phase. So before I do that, if you are living in the global south or you, God forbid, have to go to the global south to, Allah forbid, monitor and evaluate an actual international development project in the field, you need a few things. So I'm not going to be talking about desiccated liver or granola. I'm going to talk about products that actually are related to going to the global south or living in the global south, where most of the time it's hot and humid, or

07:35

if it's not, it's raining. You get the idea. So sunglasses: check out AKILA, A-K-I-L-A dot L-A, where you can get sunglasses, including prescription lenses, all made from 50% recycled material. You get a 10% discount if you use my code, WANDCOOLT-SHIRTS, or if you use the actual link in the episode notes, which is mydeals.page/1HJL. You'll see it in the episode notes. And I get a 15% commission when you purchase

08:13

any product using that My Deals link. So you have your sunglasses to protect you from the sun in the global south, but you need the right clothes to be comfortable in that hot and humid climate. So just to remind you, AKILA.LA. That's the website. But better just to click on the My Deals link in the episode notes. The second product I've got is men's polos, tees, and button-ups at hypernaturalstyle.com. You get 20% off any product when you use my My Deals link or

08:53

the promo code WANDCOOLT-SHIRTS. And I get a 20% commission on the total purchase you make when you use that link. And we want to know, so what is so cool about Hypernatural products? They have a patent-pending Hypercool Jade technology that, when it's hot, lowers your body temperature by three to five degrees. They're also breathable thanks to Supima cotton, and they are sold in premium retailers like Nordstrom and were voted the best men's polo shirt in 2024 in Men's Journal,

09:30

Forbes and Esquire. And I've got a new product, which is, if you go to torrain.org, that's T-O-R-R-A-I-N dot org, they have backpacks, totes, wallets, and bags. But what's really cool about them is, again, they're developed for the global south. They're durable, lightweight, water-resistant, perfect for travel and outdoor use if you are in the global south, in the field, monitoring and evaluating a project. All Torrain products are crafted from upcycled feed and rice

10:09

bags sourced globally. Each bag's interior is lined with recycled plastic bottles, handcrafted by artisans supporting fair trade. And by purchasing a Torrain bag, customers contribute to global sustainability, which is what international development is all about. You get 15% off any product if you use my link that is in the episode notes. Again, mydeals.page/1HJL. I get a 15% commission on the total cart purchase. That's Torrain. Now let's look at this Ghana report that's called,

10:52

if you go to the gcerf.org website, Results Paper, Ghana Emergency Grant. Now, to be fair, if you go to that report, it'll say it's an abridged version of a larger report. Unfortunately, I couldn't find that larger report, so maybe some of the points that I'm going to raise right now are covered in that larger report. I do not know. I'm assuming they are not in there, because you would certainly think, for what they post on their website, if they knew their statistics they would indicate

11:25

what I'm going to describe in this abridged, shorter version of the report. But I'm so happy that at least they've moved from trying to hide their performance measurement framework to a point where they're putting parts of it in this report on their website. So there are four areas where there's a weakness in this Ghana report. First of all, with some of their outcome indicators, just like in other episodes with other NGOs that

11:54

I've talked about, there's a poor design, where the indicator really doesn't validly measure the expected outcome. Number two, they don't mention in this report even just the title of the psychometric tool that they're using to measure the outcome indicator. And I'm going to suggest some possible psychometric tools that they could use, reminding them that if they had an undergraduate degree in psychology with that second-year required course in inferential statistics, this would

12:27

not be an issue. And number three, there are some missing outcome indicators on skills training, but again, they might be in the larger report,

12:36

to be fair to them. Number four, and this is the new part, where I commend GCERF for at least putting the data on their website so we can at least have a discussion, is their failure to show that the percent increases that they've shown on certain indicators are actually, quote, statistically significant rather than due to chance. And that's a basic concept that you learn in second-year undergraduate social sciences or business, because you have to take that stats

13:12

course. So if you're majoring in psychology, sociology, political science, or economics, that's why you take that course, because later on in the literature, you're going to see p-value. And when I say p-value, I'm not talking about the quantity of urine in a bottle, right? So let's take a look. Poor design of outcome indicators. So one of the things they look at in their report is improved livelihood and employment. That's the expected outcome for women. They

13:50

got 200 women they list in the project. And they have the indicator for that as being the percent of women from host and refugee communities with, here's the problem, quote, sufficient productive assets with access to economic opportunities. That's a problem. They've got two variables in there: sufficient productive assets and access to economic opportunities. Anybody who knows how to properly design an indicator knows you don't throw two variables into one indicator.

14:23

That's the first problem. Second problem is, as they've already stated on their website, and I've quoted it earlier, they are responsible for developing a theory of change. In this report, they indicate clearly that they are going to be training women in tailoring, parboiling rice, and shea butter processing. Training them on that. That's the theory of change. If we train them in that, we've already figured out they're going to earn more income from those vocational

14:57

skills they've acquired. They've also figured out, hopefully by their theory of change, that there is adequate demand for tailoring services, parboiled rice, and shea butter. They've already figured that out. I lived in Ghana for three years. You have to be aware they're importing rice from Thailand and the United States. Maybe there isn't a market demand for their parboiled rice. Maybe there is because they're taking the imported rice, just boiling

15:29

it and then reselling it. That's fine. But they've already figured out from their theory of change, this is the training we're going to do and it's going to lead to an increase in vocational skill

15:39

followed by an increase in income. Done. The indicator is too vague and too complex. All they really need to do is look at the mean income of this group of 200 women when the project started versus later on in the project. They don't need this percent. They should actually be tracking the income and also, more importantly, making sure that the income they're tracking is from tailoring, parboiling rice and selling it, and selling shea

16:11

butter, right? So that's a poor indicator. They need to improve that because it's not a valid measure. The second thing related to that is they're missing, and again, it might be in the larger report, any outcome indicators showing an increase in vocational skill due to the training. You need to test these women after they get the training to see if they can actually tailor properly, to see if they actually know how to process shea butter properly,

16:41

right? There are no measures of that short-term expected outcome, and first they have to acquire the vocational skill. Pretty straightforward, pretty easy to show, especially when you have only 200 women that you're working with. So there's a failure there on the outcome indicator for skills training. For the youth, it's a little better, but for the women, it's missing. The third issue is a very simple one.

17:10

It's a basic one. And that is their failure to specify the psychometric tool that they're using to measure increased trust. For example, they have a perception change as an expected outcome for the youth. One of the indicators is percent of youth who trust people from other communities, ethnic backgrounds. So they don't indicate it. They may have a psychometric tool that they've designed, that the evaluation consultant was hired to design, or that they designed even before

17:44

the project started. All they have to do is indicate what that is. And what I've done, just for fun, is I used Copilot and found four, well, I'll talk about three, different standardized valid measures of generalized trust. The first one is the Generalized Trust Scale developed by Toshio Yamagishi, a highly reliable six-item psychometric questionnaire. It measures an individual's expectation of human benevolence and whether they view trusting others as a high risk. It is widely considered

18:18

the gold standard for cross-cultural, and that's important because GCERF is operating in different countries, cross-cultural psychological research. Another one, it's only one question, comes from the World Values Survey: the trust question. It's a single-item, forced-choice question used across global sociological studies. A third one is Rotter's Interpersonal Trust Scale, a classic comprehensive

18:43

25-item tool designed by Julian Rotter. It measures a person's generalized expectancy that the verbal or written promises of others can be relied upon.

18:54

Now, another option for them is to hire a psychologist to develop a proprietary, private, just-for-GCERF measurement tool on trust, because they may have a particular interest. And I had recommended, in a previous episode where I dealt with GCERF, that they could also, for example, if they want to measure levels of trust over time between madrasa school students, that is, religious school students, and non-religious public school students, where they've held a soccer match and they want to

19:29

see if, before the soccer match compared to during and after, the levels of social capital or trust between those two groups have increased, do it with the University of California, Irvine, UCINET software package. All they have to do is mention in their results framework, or in a report like this, what tool they actually used. So those are the issues there. The final one is failure to show that the percent increase is statistically significant based on sampling,

20:05

or that the entire population was measured instead. It's not clear if you look at the Ghana report. They do use the S word. They do mention on page six, and I quote, the percentage of youth rejecting violence rose from 69% to 74%, though these changes were not statistically significant. So that implies that, out of the 600 youth that they mention they trained on critical thinking and digital literacy, the percent went up from 69% to 74%. Okay, fair

20:51

enough. It's not statistically significant, but what would be ideal is that they actually showed the sample sizes from the 600. I mean, did they do it twice, two samples? And then you would understand they have to use a sampling distribution, one tail to the right, 95% confidence. The sampling distribution for the difference between two population proportions takes on a Z, normal distribution. If you take samples once at baseline and once at midline, this is something they should basically be doing

21:23

and using in all of their reports. If they're going to go to the point of using the words statistically significant, then they have to go back to their second-year undergraduate notes and say, ah, this is the test I use. And they can also do it for the mean income for the women at baseline and midline, right? There's a different test for that. You probably remember it. It takes on a t distribution, if I remember correctly. Yep, I'm just doing it from memory, but you can

21:53

look it up. One tail to the right. You have to figure out your degrees of freedom. So they get the idea. So that's what they need to do next. Maybe they can go back to the reports and make it clear that they are drawing samples. The other issue that is a little confusing from the report is they're hinting here that they are doing sample proportions. But if you look at the populations, they're not too large to begin with. It's 200 women that got the training on tailoring, parboiling

22:24

rice, shea butter processing. And it's 600 youth individuals that got the training. And it's not clear if they measured the entire populations or samples. I think it's samples. And if it is, they need to show the sample sizes and do the statistical analysis. One tail to the right, right? Get the critical value for t or z, depending on the variable you're looking at, either a mean

22:53

or a proportion. And that would go a long way to showing that GCERF is getting slowly, after six years, to the point where they're producing data to support the claims that they've been making since 2020 with no evidence. So now they're getting to the point where, with knowledge of sampling distributions, they can actually make claims that the percentage increase that they observed and measured was not due to chance and instead was, quote, statistically significant.
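
To make the two tests just described a little more concrete, here is a minimal Python sketch, not anything taken from the GCERF report. It assumes independent samples at baseline and midline, uses a pooled standard error for the proportions test, and uses Welch's version of the t test for mean income, which is one common choice among several. The function names, the sample sizes of 100 and 600, and the income figures are all hypothetical, since the report does not state sample sizes or raw income data.

```python
import math
from statistics import NormalDist, mean, variance


def two_proportion_z_test(p_base, n_base, p_mid, n_mid):
    """One-tailed (right) z test: is the midline proportion above baseline?"""
    # Pooled proportion under the null hypothesis of no change.
    pooled = (p_base * n_base + p_mid * n_mid) / (n_base + n_mid)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_base + 1 / n_mid))
    z = (p_mid - p_base) / se
    return z, 1 - NormalDist().cdf(z)


def t_upper_tail(t, df, steps=2000):
    """P(T > t) for Student's t, via Simpson's rule on the density over [0, |t|]."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    tail = 0.5 - total * h / 3
    return tail if t >= 0 else 1 - tail


def welch_t_test(baseline, midline):
    """One-tailed (right) Welch t test: is mean midline income above baseline?"""
    n1, n2 = len(baseline), len(midline)
    v1, v2 = variance(baseline) / n1, variance(midline) / n2
    t = (mean(midline) - mean(baseline)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df, t_upper_tail(t, df)


CRITICAL_Z = NormalDist().inv_cdf(0.95)  # about 1.645, one tail, 95% confidence

# The 69% -> 74% figures are from the report; the sample sizes are hypothetical.
for n in (100, 600):
    z, p = two_proportion_z_test(0.69, n, 0.74, n)
    print(f"n={n}: z={z:.2f}, p={p:.3f}, significant={z > CRITICAL_Z}")

# Hypothetical incomes for a handful of women at baseline and midline.
income_baseline = [120, 150, 130, 160, 140, 155]
income_midline = [150, 170, 165, 180, 160, 175]
t, df, p = welch_t_test(income_baseline, income_midline)
print(f"t={t:.2f}, df={df:.1f}, p={p:.3f}")
```

The point of running the same 69% and 74% through two different hypothetical sample sizes is that the conclusion can flip depending on how many youth were actually sampled, which is why the sample sizes need to be in the report.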

23:28

So that's good. And I thank them for moving in that direction. It's great to see. So I think what I'll do is I'll look at the other reports

23:37

and do episodes on this analysis, and also on their episodes. They've also started a podcast, and I made a comment to this effect on their first episode, where again they made claims that the percentage had increased, and it wasn't clear if it was an output or an outcome. Even if it's an output, where they're trying to reach more, to scale it up as they say, you need to show that it's statistically significant. But more importantly, I'm focused on the outcomes because that's what

24:10

they like to claim on their website. So thank you for listening. And I'll be back with another episode on GCERF, on another one of their reports. I'm particularly interested in their rehabilitation and reintegration, which they talk about on their first episode. And those are basic, again, psychological constructs that have been around for years, where there are all sorts of standardized measures: how do we define and measure reintegration? How

24:44

do we define and measure rehabilitation? And it'd be interesting to see if in those reports, they actually make an effort to do some statistical analysis rather than just saying the percent went up. Thank you for your time. Bye for now.

Transcript source: Provided by creator in RSS feed: download file

GCERF reports percentage increases but fails to do the statistical analysis to support statistical significance: Ghana Results Paper Emergency Grant
