
Quality In A.I. (with Adam Smith from Dragonfly)

Apr 02, 2020 · 31 min

Episode description

Join me to explore the capabilities, challenges, and international standards for building quality AI testing systems with Adam Smith, CTO of Dragonfly (AI testing solutions and consulting) and an expert in QA, testing, and AI.

Transcript

In the digital reality, evolution over revolution prevails. The QA approaches and techniques that worked yesterday will fail you tomorrow. So free your mind. The automation cyborg has been sent back in time. TED speaker Jonathon Wright's mission is to help you save the future from bad software. Hey, welcome to the show. Today I've got a very

special guest, Adam Smith. I've been working with Adam for probably the last five or six years. He's a global superstar when it comes to AI. He's on the ISO committee and also does some work with the European Commission. He's also the CTO of Dragonfly. So we're going to have to find out a little bit more about what Dragonfly is.

Hey Jonathan, thank you for having me on your show. Love the intro, by the way. My pleasure. It feels like we're going to need to be sent back in time to get rid of coronavirus at this rate.

So yeah, thanks for introducing me. I'm Chief Technology Officer of Dragonfly. So Dragonfly grew up as a software testing advisory and delivery company. And over the last four or five years, we've moved increasingly into artificial intelligence as well as people solutions, finding the best engineers to solve the problem.

So my background is quite varied, as you can imagine. And as you mentioned, I'm also involved in various international committees on the topic of AI, particularly with a focus on testing and quality. I also do a lot of work with the British Computer Society and their Special Interest Group in Software Testing.

And you were saying that with Dragonfly you're doing AI solutions. So you've got both sides of the view then: building the standards and the guidelines on ethics and how to successfully deploy and test an AI platform, as well as actually implementing one. So what kind of started that journey?

That's a great question. I guess we started developing a product that used machine learning in order to help us make the right decisions on projects from day to day. So who should fix this problem, which tests should be run next? And, being testing specialists, we got really interested in how to test this.

And when we started looking at this, I realized that this was actually one of the biggest problems in the artificial intelligence space: the probabilistic nature of the technology, making it difficult to prove whether results were correct or not. And this is a common theme if you talk to people who are integrating AI into systems, that the two biggest concerns are, one, talking to customers and getting them to make decisions about what they need, and two, validating the solution.

So that's how we ended up getting involved in it. Now we are very involved in it. Lots of committee meetings, lots of work on standardization, and lots of clients who have real questions, real problems around how to implement QA in an AI context. And I guess you just mentioned briefly there the kind of cognitive bias. I know I've been to see one of your speeches on that before.

Could you tell the audience a little bit about what you found in cognitive biases, some of the main themes? Sure, I guess cognitive biases are one type of bias. They're essentially biases that are present in humans, and they are performance shortcuts, essentially. Rather than gathering all the facts and making decisions based on all the information, shortcuts are taken for essentially performance reasons.

And when these cognitive biases start to impact system development, whether it's a case of someone making a decision that isn't relevant for all users of the system or for all records of the data we process, that bias can get manifested in that system forevermore. But that's compounded by issues relating to data and statistical bias, where for various reasons, which might be historical or might be related to cognitive biases, data sets that are used to train AI systems don't reflect reality.
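
As a concrete, minimal sketch of the kind of data and statistical bias check being described (my own illustration, assuming pandas; the records and the reference split are entirely made up):

```python
# Minimal sketch (hypothetical data): surfacing statistical bias in a training set
# before any model is trained, by comparing it against a reference population.
import pandas as pd

# Hypothetical training records for some decision-support model.
train = pd.DataFrame({
    "gender":   ["M"] * 700 + ["F"] * 300,
    "approved": [1] * 650 + [0] * 50 + [1] * 150 + [0] * 150,
})

# Assumed split for the population the system will actually serve.
reference = pd.Series({"M": 0.51, "F": 0.49})

observed = train["gender"].value_counts(normalize=True)
print("Skew of observed share vs reference:")
print((observed - reference).abs())

# Outcome rate per group: a large gap is a prompt to investigate, not proof of bias.
print(train.groupby("gender")["approved"].mean())
```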

And what that means is those systems then propagate these biases into future processing. And there's also another type of bias, a further form of cognitive bias, which is all about how people react to interacting with systems.

So a really good example of this is automation bias, where you've got a self-driving car and you may become overly complacent about the ability of the car to, say, avoid pedestrians. And then you may hit pedestrians. There have been a number of well-documented self-driving car accidents where this is exactly what's happened.

And in a more menial example, this is quite common in systems where things get pre-populated or made slightly easier for a human to do, and then they're simply approving a result. They very quickly become complacent. And from a testing point of view, this is really important, because you have to understand that there may be biases not only in the system design and in the data, but also in how it's actually used.

So it's a really interesting topic, with frankly hundreds and hundreds of different ways that bias can manifest in systems. Absolutely, and I noticed you posted recently about BA's autonomous wheelchairs at JFK. I thought it was quite fascinating in the sense that, you know, you're combining what is in essence Car2X, and people are working, you know,

on everything from autonomous vehicles to kind of smart buildings. So, you know, smart airports, probably looking at how they navigate around. I remember, you know, when I was in Silicon Valley, seeing the security guard robots, the little robots that move around there. There was this one incident, which you might have seen in the press, where some kids ran up to one.

And so it had a choice to go left or right, decided to go right, and then fell into a fountain and drowned, which is a sad story. But at the same time, you know, it's interesting, because we always look at the obvious ones, which is, you know, an autonomous car driving along: on the left-hand side there's someone on a motorbike not wearing a helmet, which is irresponsible.

On the right-hand side, there's a car with, you know, a baby-on-board sticker, an SUV, a high-end car, you know, which way does it go? Now obviously people always put those kinds of, you know, questions in, and we know, based on, you know, Tesla's Autopilot, it's not that advanced, you know, it's not taking that kind of information and doing anything with it, it makes a decision based on the numbers it's got.

So, you know, are you finding, when you're teaching your AI, that the large amount of data sets you need, with all those possibilities, is the biggest problem? It is, but it's also anticipating the risk, much like testing any kind of system, right? You're trying to establish all the different things that can go wrong and trying to come up with examples that prove it doesn't go wrong in each case.

With AI systems, particularly ones that actuate in the real world, there's a lot of thinking you have to do around all the things that could happen, whether it is complex ethical scenarios, like the one you're alluding to, the trolley problem, or whether it is more menial, like what will happen if these two things happen at once, and how to target samples; but all of these things need to be identified with a critical-thinking mindset before you can really set out a testing strategy.

And there are parts of this that are common with existing technology. You know, you can look at some kind of system that has lots of hardware interfaces and say, well, that's processing real-world data, it's just as complex. But once you combine the amount of data that's coming in with the speed and the volume of decision-making that can occur and the amount of feedback into the real world, it becomes a really complicated test environment planning problem, in a way.

That's amazing, because I know you've managed global automation projects for large investment banks with thousands of projects. And test environments are always one of those complex things, you know, data is always a difficult one. You know, we had Huw Price on the show quite recently and he was saying that, you know, even production only has a very small subset of all the possibilities and journeys through the systems.

So, you know, are we able to model out those ones without any cognitive bias and, you know, train your system? You know, I noticed you've done a lot of work with Rex Black on the new A4Q AI and Software Testing course. Any real tips, you know, for listeners, I mean, on how they could potentially get started and learn more about the importance of AI and testing?

Yeah, absolutely. I mean, a really good way, if you have kind of a technical lens on the world, is to do a $10 or €10 introduction-to-machine-learning style course, because those courses will inevitably cover some of the quality problems. They'll explain them in a different way than a quality specialist would explain them, but from a statistical point of view, they'll really help build that understanding.

And that's what QA specialists need. It's first of all an understanding of all the different risks that can occur at different stages of the life cycle. And second, it's a list of tools and techniques that they can use to mitigate or prove the absence of those particular quality problems.

Now, for the first one of those, as I say, you can do a bit of machine learning self-study, but you can also read quite a lot about this in the press, in terms of things like facial recognition accuracy or ethical issues. These issues are primarily fairly headline-grabbing things in the press, but you can quickly boil them down into more menial examples that you'd be likely to come across in an enterprise testing context.

The second piece of this is the tools and techniques available. There are some specific techniques out there that are designed for AI, so metamorphic testing springs to mind. There are also techniques that are borrowed, if you like, from the medical side of the technical world, such as expert panels: ways of assessing whether a system has given the correct output when it isn't clear what the correct output would actually be.
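
Metamorphic testing checks relations between outputs for related inputs instead of comparing a single output against a known-correct answer. A minimal sketch, with a deliberately trivial stand-in model (the model and inputs here are hypothetical, not anything from the episode):

```python
# Metamorphic relation: adding semantically irrelevant whitespace to the input text
# should not change the classifier's prediction. No oracle for the "correct" label is needed.

def add_irrelevant_whitespace(text: str) -> str:
    """Derive a follow-up input via a meaning-preserving transformation."""
    return "  " + text.replace(" ", "  ") + "  "

def test_whitespace_invariance(model, source_inputs):
    failures = [t for t in source_inputs
                if model.predict(t) != model.predict(add_irrelevant_whitespace(t))]
    assert not failures, f"Metamorphic relation violated for: {failures}"

class DummyModel:
    """Trivial stand-in so the test can be run without a real model."""
    def predict(self, text: str) -> str:
        return "positive" if "good" in text.lower() else "negative"

if __name__ == "__main__":
    test_whitespace_invariance(DummyModel(), ["good service", "terrible delay"])
    print("Metamorphic relation held on the sample inputs")
```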

Within the actual model training and testing process, there's a huge range of different statistical concepts that are useful to understand. Again, they do require a bit of technical understanding, but they are really, really important, because in a probabilistic system you cannot usually report a simple pass or fail on some of the core functionality.
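
A minimal sketch of what that reporting looks like in practice (my own illustration, assuming scikit-learn and made-up numbers), using the kinds of metrics mentioned next, false positives, false negatives and area under the curve, rather than a single verdict:

```python
# Reporting a probabilistic classifier as counts and degrees of confidence, not pass/fail.
from sklearn.metrics import confusion_matrix, roc_auc_score

# Hypothetical ground truth and model scores for ten records.
y_true  = [0, 0, 1, 1, 0, 1, 0, 1, 1, 0]
y_score = [0.1, 0.4, 0.8, 0.7, 0.3, 0.9, 0.6, 0.2, 0.75, 0.05]  # predicted probabilities
y_pred  = [1 if s >= 0.5 else 0 for s in y_score]                # thresholded decisions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"true neg={tn}  false pos={fp}  false neg={fn}  true pos={tp}")
print(f"ROC AUC = {roc_auc_score(y_true, y_score):.2f}")
# A QA report then states acceptable error trade-offs and confidence levels,
# rather than a simple "passed" or "failed".
```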

You have to report in terms of degrees of confidence, things like false positives, false negatives, the area under the curve: lots of different metrics that you wouldn't be familiar with as a testing specialist. It sounds like for those people who want to dip their toe in, there are some good recommendations there, and it's also worth having a look at the ISQI.org website as well, because I think that's a great course.

That's using Jupyter Notebook to do the AI stuff. Have you found that there are any other books or resources that maybe help people get to grips with Jupyter? Oh, I think Jupyter is pretty intuitive, usually. It misbehaves sometimes, but I find it really useful in a training context because you can step through line by line, run each line one at a time.

If you want to have a go at it, there's actually an online service, if you do look for Jupyter Notebooks, I think it might be called Saturn, that allows you to spin up Jupyter Notebooks without actually installing any software, which, if you want to try out a bit of Python with some machine learning tools, is the perfect platform to do it.

Yeah, absolutely. I would also recommend Kaggle as well. The amount that I learned from just doing the Titanic example was huge, because like you said, you're not going to get a direct answer out, you're going to build your confidence up with more and more information, the further you get.
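
For anyone wanting to try that, a minimal sketch of the Titanic exercise might look like this (my own illustration, assuming scikit-learn and the standard train.csv downloaded from the Kaggle Titanic competition):

```python
# Illustrative sketch of the Kaggle Titanic exercise, assuming train.csv is downloaded locally.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

df = pd.read_csv("train.csv")  # columns include Survived, Pclass, Sex, Age, Fare

# Minimal feature preparation: encode sex, fill missing values with the median.
X = pd.DataFrame({
    "Pclass":   df["Pclass"],
    "IsFemale": (df["Sex"] == "female").astype(int),
    "Age":      df["Age"].fillna(df["Age"].median()),
    "Fare":     df["Fare"].fillna(df["Fare"].median()),
})
y = df["Survived"]

model = LogisticRegression(max_iter=1000)
scores = cross_val_score(model, X, y, cv=5)
# The output is a confidence level, not a definitive answer for any one passenger.
print(f"cross-validated accuracy: {scores.mean():.2f} ± {scores.std():.2f}")
```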

I know some guys over there have got huge scores, where they're able to classify which people died and which people didn't die on the Titanic. Fair enough, that's probably not the most useful real-world example, but could you tell us some of the real-world examples that you're doing with your AI? It's Neuro, your AI, isn't it?

It is, yes, but first, actually, on the Titanic example, there's a great example of bias built into that, because if you took the survival data of the Titanic and tried to predict the survival rate on the Lusitania, which I think sank two or three years later, you would predict who survived very incorrectly.

And one of the reasons is people stopped, or people think they stopped, putting women and children first on the lifeboats, whereas in the Titanic example that was very much what they did. So there's a great example there of bias built into those two ships, which shows how, by taking data from one particular event or scenario and then trying to use it to predict a different event or scenario, you can end up with different results based on

immutable characteristics of people like their gender or their social class. But you mentioned Neuro there, so just to talk more about Neuro: Neuro really is about saving time for middle managers on large IT projects. It's really about taking away the onerous information gathering from multiple systems, the processing of that data, identifying the outliers and identifying the so-what, and trying to automate common tasks like deciding what to fix next,

choosing which tests to run and things like that. And as well as having taught us a lot about AI and allowed us to really become the UK leader in terms of AI and software quality, it's a fantastic tool that we're enjoying rolling out to our clients at the moment.

When people realize how they can save that hour off their morning by not having to do menial things like work assignment on large projects, their eyes light up, realizing that AI can actually save time at the management layer, not just in Amazon warehouses, stacking shelves and things like that.

And that's one of the things I was really impressed by when I first got a bit of a demo. I think Dan gave me a talk about Neuro and some of the capabilities, some of the stats that you managed to achieve as far as accuracy around assigning defects, predicting who to assign the defects to. And also being able to use different data sources, whether it's Jira or ALM, structured data as well as a combination of unstructured data.

Do you find that it takes a bit of time for the accuracy to ramp up once you start taking different data sources from different client sites? Not unless it's a new company, because usually we have some history. Normally a company's done a project before, and they've already got a wealth of structured data that can provide that initial baseline for training.

It's not like when you sign up for a Facebook account and Facebook doesn't really know anything about you until you fill out your profile and start clicking on things. We can profile the team and the organization immediately based on the last thing they did. That's really the difference between consumer and enterprise machine learning.
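
To make the idea concrete, here is a minimal sketch of bootstrapping a defect-assignment model from historical tracker records (my own illustration only; the fields, teams and pipeline are hypothetical and not how Neuro itself is implemented):

```python
# Illustrative sketch: learning "who should own this defect" from historical records
# exported from an issue tracker. All data and labels here are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Historical defects: free-text summary and the team that eventually fixed it.
history_summaries = [
    "NullPointerException in payment gateway callback",
    "Login page layout broken on mobile Safari",
    "Batch settlement job times out overnight",
    "CSS misaligned on checkout button in Firefox",
]
history_assignees = ["backend-team", "frontend-team", "backend-team", "frontend-team"]

model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(history_summaries, history_assignees)

new_defect = "Timeout calling payment gateway during settlement"
print(model.predict([new_defect])[0])           # most likely owner
print(model.predict_proba([new_defect]).max())  # a confidence, not a guarantee
```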

I think enterprise AI is a really interesting one, because the data visualization combined with, like you mentioned, the decision support, automation and analysis, those capabilities give it a single pane of view: the truth based on real data as it's happening. You've not got a whole stack of PMOs harvesting this data, where as soon as they get it, it's instantly out of date.

They're what I class as a proxy, where they're just passing the information across; they're not actually adding any value on top of it. A system like this would save so much time within an organization. Do you think you're going to expand it past project management as well?

Absolutely, we do intend to. We started off in the testing and defect space and now we've expanded to the full SDLC. We've got project management, cost management. We're integrating with things like Git and Jenkins now to give more of a DevOps view. Our next stage is going to be expanding it to other sectors. You can obviously take the paradigm of building software and apply it to a lot of different project constructs. We can apply the same thing to, say, the construction industry.

We can also apply it to non-change constructs. We've got some clients who are talking about building more operational monitoring of business information functionality that gives them insights into whatever their business is, rather than necessarily their change projects. It's really fascinating. I know you split your time between all these additional communities of practice. I know you're key to some of the new standards that are coming through with ISO.

What recommendations would you have for people to go out and get information, understand what those standards are and how they may potentially impact them, and how to reach out to maybe you guys to help them with some of these challenges around testing and AI? I guess, first of all, in terms of standards, I think standards are incredibly important. You hear a lot of people in the community who are not that impressed or don't rate ISTQB, etc. very highly, which is fine, that's their view.

My view is that everybody needs a common language. It's crucial to productivity that people are able to move from one project to another and be able to use the same words to describe the same things and the same methods, at least to a level. If everyone meant different things by performance testing, it would take a lot longer to get things done on change projects.

One of the ways that people learn these things is through training courses and the way training courses develop their content is usually based on best practice and international consensus on things like standards.

I think these things are very important, and I think it's very important that there is a wide range of people involved in them. For people who are detractors of current standards in quality and testing, my ask of them is to come and contribute, to help us improve standards and to build out the next level of standards for the next set of technology problems that we're going to be faced with.

One of the standards that I refer back to with great regularity is SQuaRE, I don't know if you're familiar with it, the ISO/IEC 25000 series, which is all about language for specifying quality and quality requirements. This doesn't currently cater for AI, which it needs to. For example, a lot of people talk about AI robustness, and people mean slightly different things by robustness depending on the angle they're coming at it from.

If you talk to a traditional software tester, they're more likely to think about that as being resilient over time. Whereas in an AI world, whilst it still means that, it also specifically means things around specific types of data that can cause models to react badly, adversarial attacks and things like that. We all need to talk the same language. The only way we can get there is through consensus, and the mechanism that exists in the industry to do that is international standards.
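
As one narrow illustration of what a robustness check against troublesome input data can look like (my own sketch, assuming scikit-learn; it uses simple random perturbations rather than a true adversarial attack):

```python
# Does a trained model's prediction stay stable when small random noise is added to an input?
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

rng = np.random.default_rng(0)
sample = X[0]
baseline = model.predict([sample])[0]

flips, trials = 0, 200
for _ in range(trials):
    perturbed = sample + rng.normal(scale=0.05, size=sample.shape)
    if model.predict([perturbed])[0] != baseline:
        flips += 1

# Report stability as a rate rather than a pass/fail verdict.
print(f"prediction changed in {flips}/{trials} perturbed trials")
```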

I've been involved from an AI and from a quality and testing perspective in that community. One of the things that I'm quite keen on, as I implied there, is improving the ISO 25000 series, SQuaRE, so that it caters better for AI. There is a group working on a technical report, which is an extension of the other standard that I look at a lot, ISO/IEC/IEEE 29119, which is a set of standards that covers software testing.

Now the technical report that's coming out, which I believe is Part 11 of that standard, is an extension upon it to cover AI. There will probably be a future international standard which is specifically about how you can conform with requirements to verify AI quality. In terms of how people can get involved in it, this kind of thing is generally done through your national body. In the UK, we have the British Standards Institution, who decide who inputs to that process.

But there are other ways to get involved, such as through a professional association. As you know, Jonathan, we're both on the committee for the British Computer Society Special Interest Group in Software Testing. We have the means to review new standards and input into standards development through that forum. Anyone that's a member of that Special Interest Group is able to reach out to us and participate.

Anyone that isn't a member either needs to approach their national standardisation body or find a similar organisation that represents their profession and has input. I think that's great advice. I know you're pretty much revitalising the BCS SIGiST. You've come back with a fresh approach, you've revitalised it, and the first event is running this month, which is going to be a virtual event. How excited are you about the new lineup, the new format and maybe a new chapter for the BCS SIGiST?

When I joined the testing industry, which was quite a long time ago, one of the first events I went to was a SIGiST event. It really struck me as being fundamentally different to other events. It wasn't full of people trying to sell me things. It didn't have a commercial feel; it had a feel of experts sharing knowledge. A lot of the testing conferences I go to don't have that feel. There are excellent exceptions, but I think there is a huge space for a non-commercial association for this profession.

That's not just so that we can run events. It's so that we can do things like participating in developing a university syllabus, and giving apprentices who are coming up through the modern apprenticeship scheme, but probably haven't yet developed beyond an initial understanding of testing, a way that they can get more involved in the community. It's also about including more people in the events.

We had a conversation about doing an online-only event specifically to make sure that more people were able to participate. We've been overtaken by events slightly, and now all events are online. That's the sort of thing that we're able to do as a not-for-profit that is much more difficult for a commercial organisation to do. Finally, as well as participating in standards, I really want to develop the links with academia and PhD-level study around the software testing topic.

There are several universities in the United Kingdom that do have a specialisation in testing that I really want us to start working with as a professional body. That's why I've got involved in SIGiST again, if you like: to drive some of these things that are not necessarily commercial in nature and to provide that not-for-profit forum for the profession.

In terms of revitalising it, the team that had been running it had been doing a great job for many years, but I think some fresh energy was needed, particularly given the style of events has moved on a little bit over recent years. We've got a whole bunch of people, including yourself, Jonathan, who joined with similar views to me: people who want to focus on things like inclusion and accessibility, people who want to develop the industry, or the profession I suppose, and its profile.

And people who want to modernise things and give something back. Yes, it's absolutely excellent work. I'd recommend anybody who's not checked out the BCS website for a while to go and type in BCS SIGiST. You can apply easily and it's actually quite low-cost. We're also trying to get inclusion for kind of university students. So you work quite closely with universities, because you're in Spain at the moment, and one of your locations indeed is tied quite closely to the university, isn't it?

Yeah, here in Barcelona we work with Barcelona Tech, and in the UK we also work with a number of other universities. Our focus here in Spain is more about data science, whereas our focus in the UK is more on the quality and testing side. I think it's a great push towards really changing the way that the new generation comes through, so rather than just falling into quality and testing, they actually have a curriculum, have the support.

How did you start? Was this something you did at university, or how did you get into testing? Everyone's got a story, right? Mine's no different, I think. So I started, actually, when I was working at a large financial company doing complex financial administration. They needed somebody to go and do UAT on a particular area that I specialised in. I ended up shipping off to the other end of the country and living in a hotel for the duration of UAT.

As we all know, things don't go to plan, so it ended up being a year in a Norwich airport hotel, which was exactly the same time as Alan Partridge was getting famous. I got quite a lot of jokes from my mates. But at the end of that long UAT phase, the IT department said, you seem pretty good at breaking stuff, do you want to come and do this full time?

That's really when I started taking IT seriously as a career. I've been coding since I was a kid, but hadn't really considered the profession as what I wanted to do until that point. I think what made a difference to me was realising that IT is not just about coding. There are a lot more disciplines, a lot more things involved, and managing the complexity and the ambiguity is actually super interesting.

From then on, I studied various Open University courses to get myself up to speed in various topics. Twenty years later, I consider myself a pretty competent tester and developer. I can definitely vouch for that. And for the Piccadilly Group, there's some incredible talent that you guys have got there.

Just before we wrap up, is there any chance you could share the best ways to get in touch with you, and also how to find out more about things like Neuro and the Piccadilly side of things as well? Sorry, I lost you there. The best way to get in touch with you, for the listeners to find out more about what you've talked about today and also what you're doing with Neuro? The best way to get in touch with me is probably Twitter.

I'm Adam Leon Smith, or you can email me at adam@wearedragonfly.co. Our website, wearedragonfly.co, will tell you about some of the things that we're doing. It will tell you about Neuro. It will also take you to the Piccadilly Group website, which is where we advertise our training. We are actually doing an AI and software testing course in about three weeks that we're going to run fully online.

We're actually doing it just two hours a day, so 8 till 10 in the morning, over a series of sessions, so that people can fit it in a bit more easily around their job. Of course I'm on LinkedIn as well. Awesome, I might actually sign up for that course as well. So thanks so much, Adam. It's been an absolute pleasure, and you're doing some amazing work, not only pushing the technology forwards in the AI space but also for the community. Thanks so much for being on the show. Thank you for having me, Jonathan.

This transcript was generated by Metacast using AI and may contain inaccuracies.