Software Testing: A Craftsman's Approach

Speaker 1

00:00

Welcome to the deep dive, your shortcut to truly understanding complex topics. Today we're taking a fresh look at something often dismissed as, well, tedious, maybe just a box-checking activity: software testing.

Speaker 2

00:14

Yeah, it definitely gets that reputation sometimes.

Speaker 1

00:17

But what if I told you it's actually a deeply creative, intricate craft, almost like what a skilled artisan brings to their masterpiece?

Speaker 2

00:25

It's absolutely true. Software testing is far more than just, you know, finding bugs. It's really about understanding the very fabric of how software is built, how it behaves, and ultimately how it's validated for quality.

Speaker 1

00:37

That's exactly right. Today we're going to pull back the layers on the craft of software testing, drawing insights from Paul C. Jorgensen's Software Testing: A Craftsman's Approach.

Speaker 2

00:46

It's a foundational text in the field.

Speaker 1

00:48

Our goal here is to unpack the fundamental concepts, the diverse techniques, and maybe some surprising real-world applications, so you walk away with a much richer understanding and maybe even a new appreciation for this quiet discipline that really underpins almost everything digital around us.

Speaker 2

01:04

Sounds like a great plan. Where should we start?

Speaker 1

01:06

So let's start right at the beginning, what testing really is. Jorgensen gives us this clear progression of terms that are fundamental. It all begins with a human error, a mistake.

Speaker 2

01:19

Right, someone makes a mistake while coding.

Speaker 1

01:21

That's the error, and that human error then shows up in the software as a fault.

Speaker 2

01:25

Which is the bug or defect people talk about. Yeah, it's a representation of that mistake in the code.

Speaker 1

01:31

And it's crucial to remember this human element, isn't it? We test because we know that we're fallible.

Speaker 2

01:37

Especially in something as complex as software.

Speaker 1

01:39

We make mistakes, and it continues from there. When that code containing the fault actually gets executed.

Speaker 2

01:45

That's when you get a failure. The software doesn't do what it's supposed to do.

Speaker 1

01:48

Okay. And finally, the incident is what the user actually sees?

Speaker 2

01:52

Exactly. It's the symptom, the alert, the thing that makes someone say, hey, something went wrong here. So it's this chain reaction.

Speaker 1

01:58

Error leads to fault, leads to failure, leads to incident.

Speaker 2

02:02

You got it: human mistake, bug in the code, wrong behavior, user notices.

Speaker 1

02:08

So given that chain, a test becomes this deliberate act, right? Exercising the software with specific test cases.

Speaker 2

02:14

Right, and it has two main goals. Either you're trying to find those failures, hunt them down, or you're trying to demonstrate confidently that the software is working correctly under certain conditions.

Speaker 1

02:27

And a test case, it's not just throwing random stuff at the program?

Speaker 2

02:31

No, no, not at all. A craftsman builds a test case carefully. It needs an identity and a clear purpose, like testing a specific business rule. It needs defined preconditions, specific inputs, and, crucially, the exact expected outputs.

Speaker 1

02:47

You need to know what right looks like.

Speaker 2

02:49

Absolutely, and even the expected state of the system after the test runs. Plus you keep track of its execution history. It's a complete, thoughtful construction.
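
As a rough illustration of the parts of a test case listed above, here is a minimal Python sketch. The class name `DocumentedTestCase`, its field names, and the free-shipping rule are assumptions made up for this example; they are not taken from Jorgensen's book.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentedTestCase:
    """One test case record; all field names are illustrative assumptions."""
    case_id: str
    purpose: str
    preconditions: list
    inputs: dict
    expected_outputs: dict
    expected_postconditions: list
    execution_history: list = field(default_factory=list)

    def record_run(self, date, version, actual_outputs):
        """Append one execution to the history and report pass/fail."""
        passed = actual_outputs == self.expected_outputs
        self.execution_history.append(
            {"date": date, "version": version, "passed": passed}
        )
        return passed


# Hypothetical business rule: orders of 100 or more ship for free.
case = DocumentedTestCase(
    case_id="TC-001",
    purpose="Free shipping applies at the order threshold",
    preconditions=["customer is logged in"],
    inputs={"order_total": 100},
    expected_outputs={"shipping_fee": 0},
    expected_postconditions=["order is saved with shipping_fee 0"],
)
print(case.record_run("2024-01-15", "1.0", {"shipping_fee": 0}))  # True
```

The point of the sketch is only that expected outputs are written down before the run, so a pass or fail is a comparison rather than a judgment call.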

Speaker 1

02:57

So that's the what and why. Now how does a craftsman actually approach testing? Jorgensen highlights two sort of fundamental philosophies.

Speaker 2

03:04

Yeah, specification based testing versus code based testing.

Speaker 1

03:07

Let's unpack specification based first. That's also known as black box testing.

Speaker 2

03:11

Right, or functional testing. The core idea is you're designing your tests based purely on the software's requirements, the specs, without looking at the internal code.

Speaker 1

03:21

Like testing a car. You use the steering wheel, the pedals, check the lights.

Speaker 2

03:25

But you don't need to know how the engine or the wiring actually works inside. You just know what it should do based on the user manual, basically.

Speaker 1

03:33

That sounds really useful during development. What are the big advantages there?

Speaker 2

03:37

Well, a huge one is that the test cases are independent of the actual implementation. Meaning even if the developers completely rewrite a section of code, as long as it's supposed to do the same thing according to the spec, your test case is still valid.

Speaker 1

03:52

Oh okay, that's powerful.

Speaker 2

03:53

And it also means testing can start earlier, maybe even happen in parallel with development, which can speed things up.

Speaker 1

04:00

But I'm guessing there's a downside, a catch?

Speaker 2

04:02

There is. Because you're not looking inside, you can end up with redundant tests, multiple tests checking the same underlying logic without realizing it. Wasted effort. And more critically, you can have gaps, big blind spots where parts of the software might just never get tested because the spec didn't explicitly cover some weird internal case.

Speaker 1

04:23

Okay, so that leads us to the other side: code-based testing.

Speaker 2

04:27

Right, sometimes called white box testing. Here you are looking at the source code. The tests are designed based on the program's structure, its paths, its conditions.

Speaker 1

04:37

What's the main limitation there, then? It sounds more thorough.

Speaker 2

04:40

It can be for certain things, but its main weakness is identifying behaviors that were never programmed at all but should have been.

Speaker 1

04:47

Ah, omissions. Things in the requirements that are missing from the code.

Speaker 2

04:51

Exactly. Or think about something malicious, like a Trojan horse someone slipped in. If it wasn't in the spec and it's just extra code doing bad stuff, spec-based testing won't find it, and code-based testing might just test the paths through it without realizing its malicious intent. Each approach struggles with one side: things that aren't there but should be, or things that are there but shouldn't be because they were never specified.

Speaker 1

05:11

So this sounds like a classic debate, black box versus white box? Is one just better?

Speaker 2

05:15

Well, if you step back, you see pretty quickly that neither one alone is really sufficient.

Speaker 1

05:19

Why not?

Speaker 2

05:20

Code-based testing, like we said, won't find requirements that were completely missed in the implementation, and spec-based testing won't find extra, maybe unwanted behaviors that got coded in but weren't in the spec.

Speaker 1

05:33

So the craftsman's approach isn't about picking a side in this great debate.

Speaker 2

05:38

Exactly, it's not either/or. The real answer, the craftsman's answer, is a smart combination.

Speaker 1

05:43

Okay, how does that work?

Speaker 2

05:45

You use both. You design tests based on the specification to ensure functionality, and you use code-based techniques, especially coverage metrics, to see what parts of the actual code those tests are exercising.

Speaker 1

05:57

Ah, so the code coverage tells you about the gaps and redundancies in your spec-based tests.

Speaker 2

06:02

Precisely, it gives you that measurement, that confidence. You get the functional assurance from spec based testing and the structural assurance and efficiency check from code based testing. It's about blending the strengths of both views.

Speaker 1

06:14

That makes a lot of sense, combining perspectives for a fuller picture. Okay, so with these foundational ideas, the error chain and the two approaches, how does the craftsman apply this in a real project? Jorgensen talks about levels of testing, often shown using the V model.

Speaker 2

06:33

Right. The V model is a variation of the older waterfall model, but it's really useful because it visually emphasizes how testing activities should mirror development activities. On the left side of the V, you have the development phases going down: requirements, high-level design, detailed design, coding. And on the right side, going up, you have the corresponding testing levels.

06:52

Unit testing validates the code from detailed design, integration testing validates the interfaces from high-level design, and system testing validates the whole thing against the initial requirements.

Speaker 1

07:05

So each test level connects back to a design level?

Speaker 2

07:08

Exactly. It builds in quality checks at each stage rather than waiting until the very end. The idea is to catch faults as close as possible to where they were introduced.

Speaker 1

07:16

Cheaper to fix them earlier. Much, much cheaper. Okay, so let's dive into that first level, unit testing. This is where the craftsman is working on individual components, right? Like a single function or class.

Speaker 2

07:28

Yep, the smallest testable pieces of the software. And there's a whole toolbox of techniques for this level.

Speaker 1

07:33

Let's start with boundary value testing. You called these the off-by-one detectives earlier.

Speaker 2

07:38

Ah, yeah, that's a good way to think about it. The core idea is simple. Experience shows that programmers often make mistakes right at the edges, the boundaries of input ranges, like using less-than when they meant less-than-or-equal.

Speaker 1

07:48

Or starting a loop counter at one instead of zero.

Speaker 2

07:51

Exactly, those kinds of things, off-by-one errors. So boundary value testing says, okay, if an input is valid between one and one hundred, don't just test fifty.

Speaker 1

08:00

Test the edges.

Speaker 2

08:00

Test the minimum, one; the minimum plus one, two; a nominal value like fifty; the maximum minus one, ninety-nine; and the maximum, one hundred.

Speaker 1

08:09

Makes sense, and what about robust testing?

Speaker 2

08:11

Robust boundary value testing goes one step further. It says, let's also test values just outside the valid range, so minimum minus one, zero in our example, and maximum plus one, one hundred and one.

Speaker 1

08:21

Why test invalid inputs?

Speaker 2

08:23

Because that's often where unexpected crashes or even security vulnerabilities happen. How does the system handle bad data? Robust testing checks that. There's also worst-case testing, which gets pretty complex: testing combinations of boundary values across several variables.
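
The boundary values described in this exchange can be generated mechanically. The sketch below is a minimal illustration; the helper names `boundary_values` and `worst_case`, and the second range (1 to 12), are assumptions for the example. It treats worst-case testing as the Cartesian product of each variable's five boundary values.

```python
from itertools import product


def boundary_values(lo, hi, robust=False):
    """Boundary-value inputs for an integer range [lo, hi]."""
    nominal = (lo + hi) // 2
    values = [lo, lo + 1, nominal, hi - 1, hi]
    if robust:
        # Robust testing adds one value just outside each end of the range.
        values = [lo - 1] + values + [hi + 1]
    return values


def worst_case(ranges):
    """Every combination of the boundary values of several variables."""
    return list(product(*(boundary_values(lo, hi) for lo, hi in ranges)))


print(boundary_values(1, 100))               # [1, 2, 50, 99, 100]
print(boundary_values(1, 100, robust=True))  # [0, 1, 2, 50, 99, 100, 101]
print(len(worst_case([(1, 100), (1, 12)])))  # 25
```

With two variables the worst-case set already has 25 combinations, which is why the number of tests grows quickly as variables are added.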

Speaker 1

08:37

Okay, boundaries are critical. Yeah, but testing every boundary for every variable sounds like it could create a lot of tests.

Speaker 2

08:43

It can. And that leads nicely into the next technique, equivalence class testing.

Speaker 1

08:48

How does that help?

Speaker 2

08:49

It's all about smart simplification. It tackles the potential redundancy you might get from just boundary testing. The idea comes from math, from partitions. You try to identify groups, or classes, of inputs that the program should treat exactly the same way. If you put in three, four, or five, and the code follows the exact same logic path for all of them, they form an equivalence class.

Speaker 1

09:12

Ah, so you don't need to test all three?

Speaker 2

09:14

Exactly. The assumption is if you test one representative value from that class, say four, it tells you how the program behaves for all the other values in that class, three and five.

Speaker 1

09:23

That sounds much more efficient.

Speaker 2

09:25

Hugely efficient. It aims for completeness, meaning every possible input belongs to some class, and non-redundancy, meaning no input belongs to more than one class. Ideally, you get great coverage with fewer tests.

Speaker 1

09:37

Are there different types, like with boundary testing?

Speaker 2

09:40

Yes, similar ideas. Weak normal tests one value from each valid class. Strong normal tests combinations of valid classes. And then weak robust and strong robust add testing for the invalid equivalence classes, inputs that should cause an error.
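
A small sketch of the partition idea, under stated assumptions: the "age" input, its class boundaries, and the helper names below are all hypothetical. It checks completeness and non-redundancy over the valid domain, then picks one representative per class for the weak normal and weak robust variants.

```python
# Hypothetical partition of an integer "age" input; the ranges are assumptions.
VALID_CLASSES = {
    "child": range(0, 13),
    "adult": range(13, 65),
    "senior": range(65, 121),
}
INVALID_CLASSES = {
    "negative": range(-100, 0),
    "too_old": range(121, 200),
}


def is_partition(classes, domain):
    """Complete and non-redundant: each value falls in exactly one class."""
    return all(sum(v in r for r in classes.values()) == 1 for v in domain)


def representative(values):
    """Pick one value from the middle of a class."""
    return values[len(values) // 2]


def weak_normal():
    """One test input per valid equivalence class."""
    return {name: representative(r) for name, r in VALID_CLASSES.items()}


def weak_robust():
    """Weak normal plus one input per invalid equivalence class."""
    tests = weak_normal()
    tests.update({n: representative(r) for n, r in INVALID_CLASSES.items()})
    return tests


print(is_partition(VALID_CLASSES, range(0, 121)))  # True
print(weak_normal())  # {'child': 6, 'adult': 39, 'senior': 93}
print(len(weak_robust()))  # 5
```

Strong variants would combine classes across several input variables, which this single-variable sketch does not attempt.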

Speaker 1

09:55

So boundary values handle the edges. Equivalence classes handle the broad range efficiently. What if the logic itself is really complicated, lots of nested if statements or complex conditions.

Speaker 2

10:06

That's where decision table based testing shines. It's a very rigorous logical way to approach complex decision logic.

Speaker 1

10:12

How does it work? You literally build a table?

Speaker 2

10:14

You do. You list all the conditions, the inputs or system states that affect the decision, and all the possible actions, what the software should do. Then you systematically map out every possible combination of condition outcomes, true or false, and the corresponding action.

Speaker 1

10:29

Sounds like it could get huge.

Speaker 2

10:30

It can initially, but the real power comes when you analyze the table. You often find don't care conditions, situations where the outcome of one condition doesn't actually matter if another condition is met.

Speaker 1

10:43

Ah, so you can simplify the table.

Speaker 2

10:45

Exactly, you collapse rules. Jorgensen uses the NextDate function example, a function to calculate the date after a given date. The initial table might seem to have hundreds of rules when you consider day, month, year, and leap year rules.

Speaker 1

10:59

Yeah yeah, that sounds complicated.

Speaker 2

11:01

But by using decision tables and identifying those don't care conditions they could reduce it down to just a handful of essential test cases that covered all the logic. It shows how testing can actually improve the program's design by clarifying the logic.
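
To make the collapsing step concrete, here is a minimal sketch on a deliberately tiny, hypothetical slice of the NextDate logic with only two conditions. The condition names, the action strings, and the `collapse` helper are assumptions for illustration; this is not Jorgensen's actual NextDate table.

```python
from itertools import combinations, product

CONDITIONS = ("last_day_of_month", "is_december")


def action(last_day_of_month, is_december):
    """Simplified, hypothetical NextDate logic for illustration only."""
    if not last_day_of_month:
        return "increment day"
    if is_december:
        return "reset day and month, increment year"
    return "reset day, increment month"


def full_table():
    """One rule per combination of condition outcomes."""
    combos = product((True, False), repeat=len(CONDITIONS))
    return {combo: action(*combo) for combo in combos}


def collapse(table):
    """Merge rules that share an action and differ in exactly one condition.

    The merged rule gets "-" (don't care) in the differing position.
    """
    rules = dict(table)
    merged = True
    while merged:
        merged = False
        for a, b in combinations(list(rules), 2):
            diff = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
            if rules[a] == rules[b] and len(diff) == 1:
                i = diff[0]
                act = rules.pop(a)
                rules.pop(b)
                rules[a[:i] + ("-",) + a[i + 1:]] = act
                merged = True
                break
    return rules


print(len(full_table()))           # 4
print(len(collapse(full_table()))) # 3
```

Here the month no longer matters once the day is not the last day of the month, so two rules fold into one. Each surviving rule is a natural candidate for one test case.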

Speaker 1

11:15

Testing clarifying the code, not just finding bugs. I like that. Okay, so we've looked at inputs and logic. What about the actual flow of the code, the paths it takes?

Speaker 2

11:24

Right, that's path testing. Here, the craftsman shifts focus to the control flow through the program.

Speaker 1

11:30

How do you visualize that?

Speaker 2

11:31

You often use program graphs. Think of nodes as chunks of code, like statement fragments, and edges as the flow of control between them, like an if statement creating a branch. We often look at DD-paths, decision-to-decision paths.

Speaker 1

11:45

Okay, paths between decisions.

Speaker 2

11:47

The goal is to design tests that exercise different paths through this graph. There are different levels of coverage you might aim for. The simplest is node coverage, or statement coverage: just making sure every single statement in the code gets executed at least once by some test.

Speaker 1

12:04

Seems like a minimum baseline.

Speaker 2

12:06

It is. Stronger is edge coverage, or branch coverage: making sure every possible outcome of every decision, like the true and false branches of an if statement, gets executed at least once.

Speaker 1

12:19

That sounds more thorough.

Speaker 2

12:21

It generally is. And this is where we often hear about cyclomatic complexity, right? V(G).

Speaker 1

12:24

What does that number actually tell us?

Speaker 2

12:26

It's a metric calculated from the program graph: number of edges minus number of nodes plus two, for a single connected graph. Essentially, it gives you the number of independent paths through the code, the size of a basis set of paths. Test those and you've covered every edge at least once. It's a measure of the code's structural complexity.
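
A minimal sketch of the calculation, using McCabe's standard formula V(G) = e - n + 2p, where p is the number of connected components (1 for a single routine). This is equivalent to e - n + 1 once an extra edge from the exit node back to the entry node is added. The example graph, an if/else followed by a while loop, and its node names are hypothetical.

```python
def cyclomatic_complexity(edges, components=1):
    """V(G) = e - n + 2p for a program graph given as (source, target) edges."""
    nodes = {node for edge in edges for node in edge}
    return len(edges) - len(nodes) + 2 * components


# Hypothetical program graph: an if/else followed by a while loop.
EDGES = [
    ("entry", "if"),
    ("if", "then"),
    ("if", "else"),
    ("then", "while"),
    ("else", "while"),
    ("while", "body"),
    ("body", "while"),
    ("while", "exit"),
]

print(cyclomatic_complexity(EDGES))  # 3
```

The result matches the common shortcut for structured code: two decisions (the if and the while) plus one.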

Speaker 1

12:45

Is there a rule of thumb for it?

Speaker 2

12:47

A common guideline is that if the cyclomatic complexity, V(G), gets above ten for a single function or module, that code is getting pretty complex. It'll likely be harder to understand, harder to test, and potentially more prone to errors.

Speaker 1

13:02

So it's a warning sign for developers and testers.

Speaker 2

13:06

Exactly. It suggests maybe breaking the code down into smaller, simpler pieces.

Speaker 1

13:10

But wait, can you have paths in the graph that look possible but you can't actually execute them?

Speaker 2

13:15

Absolutely. Those are called infeasible paths. It's a major headache in path testing. They happen because the structure of the graph doesn't always capture the semantic dependencies. Maybe path A sets a variable to true and path B requires that variable to be false. You can draw the path, but you can never actually make the program follow it. Designing tests for infeasible paths is wasted effort.

Speaker 1

13:37

Tricky, and you mentioned something even more rigorous for critical systems.

Speaker 2

13:41

Yes, for safety-critical stuff like Level A aviation software, they often require Modified Condition/Decision Coverage, or MC/DC.

Speaker 1

13:50

What does that involve?

Speaker 2

13:51

It's pretty intense. For every decision with multiple conditions like A and B or C, you need test cases that show that each individual condition A, B, and C can independently affect the outcome of the entire decision, while the other conditions are held constant.

Speaker 1

14:06

Wow, that's ensuring every part of the logic really matters.

Speaker 2

14:10

Precisely. It's about preventing situations where a condition seems to be tested but its effect is masked by other conditions. Extremely thorough.
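
The independence requirement can be explored by brute force for a small decision. The sketch below assumes the spoken example means `(A and B) or C`; the function names are hypothetical. For each condition it lists the pairs of test vectors that differ only in that condition and produce different outcomes.

```python
from itertools import product


def decision(a, b, c):
    """The example decision, read here as (A and B) or C (an assumption)."""
    return (a and b) or c


def independence_pairs(func, n_conditions):
    """For each condition index, pairs of vectors that differ only in that
    condition and flip the decision outcome."""
    pairs = {i: [] for i in range(n_conditions)}
    for vector in product((False, True), repeat=n_conditions):
        for i in range(n_conditions):
            if vector[i]:
                continue  # visit each pair once, starting from its False side
            flipped = vector[:i] + (True,) + vector[i + 1:]
            if func(*vector) != func(*flipped):
                pairs[i].append((vector, flipped))
    return pairs


pairs = independence_pairs(decision, 3)
for index, name in enumerate("ABC"):
    print(name, pairs[index])
```

For this decision, A and B each have exactly one independence pair and C has three, so four test vectors are enough: for example (F,T,F), (T,T,F), (T,F,F), and (T,F,T).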

Speaker 1

14:18

Okay, so path testing covers the flow, but what about the data itself? What happens to variables as they move along these paths?

Speaker 2

14:24

Excellent question. That brings us to data flow testing. This technique shifts the focus from just the control flow paths to the life cycle of variables within those paths.

Speaker 1

14:34

Life cycle?

Speaker 2

14:35

Yeah. Where does a variable get defined, meaning it gets a value, and where does it get used, meaning its value is read? Data flow testing looks for paths between a definition of a variable and a subsequent use of that same variable. These are called definition-use paths, or du-paths.

Speaker 1

14:48

Why is that important?

Speaker 2

14:51

Well, it helps catch errors like using a variable before it's been initialized, or defining a variable and then never actually using its value. It acts as a kind of reality check on pure path testing.

Speaker 1

15:02

So it connects the control flow with what's actually happening to the data.

Speaker 2

15:07

Exactly. There's a whole hierarchy of data flow coverage criteria. All-defs tests at least one path from every definition. All-uses tests paths to every use. All-du-paths tests every simple definition-use path. It's particularly good for object-oriented code, where data interactions can be complex.
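
A minimal sketch of definition-use analysis, restricted to straight-line code with no branches so that the reaching definition of a variable is simply its most recent definition. The program encoding (line number, variables defined, variables used) and the sample statements are assumptions for illustration; real data flow tools work on full program graphs.

```python
# Each entry: (line number, variables defined, variables used).
# Hypothetical straight-line program; the comments show the imagined source.
PROGRAM = [
    (1, {"price"}, set()),             # price = read_input()
    (2, {"rate"}, set()),              # rate = 0.2
    (3, {"tax"}, {"price", "rate"}),   # tax = price * rate
    (4, {"unused"}, {"tax"}),          # unused = tax + 1
    (5, set(), {"total"}),             # print(total)  <- never defined
]


def analyze(program):
    """Return (du_pairs, uses_without_definition, definitions_never_used)."""
    last_def = {}
    du_pairs, undefined_uses, used_defs = [], [], set()
    for line, defs, uses in program:
        for var in sorted(uses):
            if var in last_def:
                du_pairs.append((var, last_def[var], line))
                used_defs.add((var, last_def[var]))
            else:
                undefined_uses.append((var, line))
        for var in defs:
            last_def[var] = line
    all_defs = {(var, line) for line, defs, _ in program for var in defs}
    return du_pairs, undefined_uses, sorted(all_defs - used_defs)


du_pairs, undefined_uses, dead_defs = analyze(PROGRAM)
print(du_pairs)        # [('price', 1, 3), ('rate', 2, 3), ('tax', 3, 4)]
print(undefined_uses)  # [('total', 5)]
print(dead_defs)       # [('unused', 4)]
```

The two anomaly lists correspond to the errors mentioned above: a use with no prior definition, and a definition whose value is never read.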

Speaker 1

15:24

Makes sense. Okay, one more in the unit testing toolbox: program slicing. This sounds different.

Speaker 2

15:29

It is a bit different, but incredibly useful, especially for debugging and understanding code. A slice of a program relative to a specific variable at a specific point is the subset of program statements that could possibly affect the value of that variable at that point.

Speaker 1

15:45

So it's like highlighting only the relevant code.

Speaker 2

15:48

Exactly. You can do a backward slice: starting from a variable, trace back everything that could have influenced its value. Or a forward slice: starting from where a variable is defined, see everything that it could possibly influence later on.

Speaker 1

16:01

I can see how that would help with debugging. Focuses your attention.

Speaker 2

16:05

Tremendously. It helps eliminate all the irrelevant detail and lets the craftsman focus precisely where the problem might be. Jorgensen even suggests that developing programs in terms of compilable slices could be a powerful way to build and understand complex software.
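
A backward slice can be sketched for straight-line code by walking the statements in reverse and tracking which variables still matter. The program encoding (line, assigned variable, variables read) and the sample statements are hypothetical, and the sketch ignores branches and loops, which a real slicer must handle.

```python
# Each entry: (line number, variable assigned, variables read).
# Hypothetical straight-line program; the comments show the imagined source.
PROGRAM = [
    (1, "a", set()),        # a = read()
    (2, "b", set()),        # b = read()
    (3, "c", {"a"}),        # c = a * 2
    (4, "d", {"b"}),        # d = b + 1
    (5, "e", {"c", "a"}),   # e = c + a
]


def backward_slice(program, variable, at_line):
    """Lines that can affect `variable` at `at_line` (straight-line code only)."""
    relevant = {variable}
    slice_lines = []
    for line, target, reads in reversed(program):
        if line > at_line:
            continue
        if target in relevant:
            slice_lines.append(line)
            # The assignment satisfies the need for `target`; what it reads
            # becomes relevant instead.
            relevant.discard(target)
            relevant |= reads
    return sorted(slice_lines)


print(backward_slice(PROGRAM, "e", 5))  # [1, 3, 5]
print(backward_slice(PROGRAM, "d", 5))  # [2, 4]
```

When debugging a wrong value of `e`, lines 2 and 4 drop out of consideration entirely, which is the focusing effect described above.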

Speaker 1

16:19

Interesting idea. Okay, so that's a lot of techniques just for unit testing: boundary values, equivalence classes, decision tables, path testing, data flow, slicing. How does the craftsman put it all together? You mentioned Jorgensen's testing pendulum earlier.

Speaker 2

16:32

Right, let's bring that back. It's that metaphor of testing swinging between the specification-based view, the black box.

Speaker 1

16:39

High level, functional, requirement focused.

Speaker 2

16:41

And the code-based view, the white box.

Speaker 1

16:44

Low level, structural, implementation focused.

Speaker 2

16:47

And the pendulum highlights that neither extreme is sufficient on its own. The real skill, the craft, lies in the combination.

Speaker 1

16:55

How does that play out practically?

Speaker 2

16:56

A common approach is to start by choosing a specification-based technique, maybe equivalence classes or boundary values, to define an initial set of tests based on what the system should do.

Speaker 1

17:08

Okay, get the functional coverage first.

Speaker 2

17:10

Then you run those tests and use code coverage tools which come from the code based world, to measure which parts of the code were actually executed.

Speaker 1

17:18

Ah, and that measurement reveals the gaps.

Speaker 2

17:21

Exactly. It shows you the parts of the code your spec-based tests didn't reach. It might also reveal redundancies, if multiple tests exercise the exact same code path. Then you can design additional tests, perhaps using path or data flow ideas, specifically to fill those gaps.

Speaker 1

17:36

So it's an iterative refinement using both perspectives.

Speaker 2

17:38

Precisely, you leverage the strengths of both. Jorgensen uses an insurance premium case study to show this: how you'd pick different techniques based on whether variables are physical quantities or logical flags, whether faults are likely independent or interacting, and how to handle exceptions. It really emphasizes choosing the right tool or combination of tools for the job.

Speaker 1

18:01

The hallmark of a craftsman. Okay, let's zoom out. Now we've covered the unit level in detail. How does testing fit into the bigger picture across different software development life cycles?

Speaker 2

18:11

Good question. The traditional waterfall model, as we touched on with the V model, was very linear. Design everything first, then build it, then test it in stages: unit, integration, system. It kind of assumed you could know everything perfectly upfront.

Speaker 1

18:23

Which rarely happens in reality.

Speaker 2

18:25

Exactly. So iterative models emerged, things like incremental development, evolutionary prototyping, the spiral model. The big shift there was moving away from doing everything in one big chunk to doing things in smaller cycles: building and testing parts of the system, getting feedback, and then iterating.

Speaker 1

18:40

More flexible, more adaptive.

Speaker 2

18:43

Much more. And this really paved the way for agile testing.

Speaker 1

18:47

What are the key characteristics of testing in an agile world?

Speaker 2

18:50

Agile testing is fundamentally driven by customer needs and feedback. It's typically bottom up, focusing on delivering working software components early and often, and it absolutely embraces changing requirements.

Speaker 1

19:03

Flexibility is key.

Speaker 2

19:05

Absolutely. Two big examples you hear about are Extreme Programming, XP, and test-driven development, TDD.

Speaker 1

19:13

That's the one where we write the test first.

Speaker 2

19:14

Right, that's the one. It sounds backward, but it's powerful. In TDD, developers work in very short cycles. First, write an automated test case for a tiny piece of functionality that doesn't exist yet. The test will obviously fail.

Speaker 1

19:29

Because the code isn't there.

Speaker 2

19:31

Right. Then write the minimum amount of code needed to make that test pass. And then, importantly, refactor the code: clean it up, improve its design, while continually rerunning all the tests to make sure nothing broke.

Speaker 1

19:43

So the tests act like a specification and a safety net.

Speaker 2

19:47

Precisely. It requires good automated testing frameworks, like JUnit for Java, for instance, and it really keeps the focus on building exactly what's needed and keeping the code base clean.
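
The red, green, refactor cycle can be sketched with Python's built-in `unittest` module, used here as a stand-in for JUnit. The `is_leap_year` function and its test are hypothetical examples. In a real TDD session the test class would be written first and would fail until the function existed; the file below shows the state after the "green" step.

```python
import unittest


# Step 2 (green): the minimum code that makes the test below pass.
def is_leap_year(year):
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# Step 1 (red): in TDD this test is written first and fails at that point,
# because is_leap_year does not exist yet.
class LeapYearTest(unittest.TestCase):
    def test_leap_year_rules(self):
        self.assertTrue(is_leap_year(2024))
        self.assertFalse(is_leap_year(2023))
        self.assertFalse(is_leap_year(1900))
        self.assertTrue(is_leap_year(2000))


# Step 3 (refactor): after any cleanup, rerun the whole suite.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(LeapYearTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())  # True
```

Running the suite after every change is what turns the tests into the safety net the speakers describe.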

Speaker 1

19:57

What about Scrum? How does testing fit in there?

Speaker 2

20:00

Scrum organizes work into sprints, usually two to four weeks long. Teams have daily stand-up meetings to coordinate, and the goal is to produce a potentially shippable increment of software at the end of each sprint. Testing is integrated throughout the sprint, often with continuous integration and automated builds happening daily.

Speaker 1

20:17

So very rapid cycles and feedback.

Speaker 2

20:20

Very rapid. Some say Scrum is mostly new names for old ideas, but that intense focus on short intervals and daily integration really forces testing to be a continuous activity, not an afterthought.

Speaker 1

20:31

Beyond specific methodologies, there's also model-based testing, MBT. What's the idea there?

Speaker 2

20:36

MBT takes a more abstract approach. Instead of deriving tests directly from requirements text or code, you first build a formal model of the system's behavior.

Speaker 1

20:44

Like a flow chart or a state machine?

Speaker 2

20:47

Could be finite state machines, Petri nets, statecharts. There are various modeling languages. The key advantage, Jorgensen argues, is that the act of building the model forces you to gain deeper insights and understanding of the system. You have to really think through the states, transitions, and behaviors.

Speaker 1

21:04

So the model itself is valuable.

Speaker 2

21:07

Immensely. Then you use algorithms or heuristics to automatically or semi-automatically generate test cases from the model. You run the tests, analyze results, and potentially refine the model or the system. Some models are more powerful than others. Peterson's lattice shows a hierarchy where things like Petri nets can express concurrency that simpler finite state machines can't.
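
As one possible illustration of generating tests from a model, the sketch below derives one event sequence per transition of a small finite state machine. The ATM states and events are hypothetical, and "cover every transition once" is just one simple generation strategy chosen for the example, not a method attributed to the book.

```python
from collections import deque

# Hypothetical ATM model: state -> {event: next state}.
FSM = {
    "idle": {"insert_card": "awaiting_pin"},
    "awaiting_pin": {"valid_pin": "menu", "invalid_pin": "idle"},
    "menu": {"request_balance": "menu", "eject_card": "idle"},
}


def shortest_event_path(fsm, start, goal):
    """Breadth-first search for the shortest event sequence from start to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, events = queue.popleft()
        if state == goal:
            return events
        for event, next_state in fsm[state].items():
            if next_state not in seen:
                seen.add(next_state)
                queue.append((next_state, events + [event]))
    return None  # goal is unreachable from start


def transition_tests(fsm, start):
    """One test (event sequence) per transition: reach its source, then fire it."""
    tests = []
    for state, transitions in fsm.items():
        prefix = shortest_event_path(fsm, start, state)
        if prefix is None:
            continue  # skip transitions out of unreachable states
        for event in transitions:
            tests.append(prefix + [event])
    return tests


for test in transition_tests(FSM, "idle"):
    print(" -> ".join(test))
```

Each printed sequence still needs expected outputs attached before it becomes a full test case; the model only supplies the inputs and the expected end state.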

Speaker 1

21:29

Interesting. Okay, moving up the V model, let's talk integration testing. You mentioned earlier this is often seen as tricky.

Speaker 2

21:36

Yeah. It's often called the least well understood and most poorly done phase. This is where you start combining those units that were tested individually, and you're looking for problems at the interfaces between them, where things connect. Maybe unit A passes data incorrectly to unit B, or they make different assumptions about shared data. These kinds of issues only appear when you put them together.

Speaker 1

21:57

How did people traditionally approach this?

Speaker 2

22:00

Strategies were often decomposition based. Like top-down integration, where you start with the main module and use stubs, dummy modules, to simulate the ones it calls. Or bottom-up, where you start with the lowest-level units and use drivers to call them, building upwards. Or sandwich, a mix of both. A key issue with those is that the decomposition tree used for planning might not match the actual call structure of the program.

22:22

This could lead you to design tests for interactions between units that never actually call each other directly.

Speaker 1

22:32

Impossible test pairs. Wasted effort again.

Speaker 2

22:34

Right. So a better approach is call graph-based integration. You look at the real call graph, where units are nodes and calls are edges, and base your integration tests on actual feasible paths between units.

Speaker 1

22:45

Makes more sense. What about for object oriented systems?

Speaker 2

22:48

For OO software, the concept of MM-paths, method/message paths, is useful. An MM-path represents a sequence of method executions within a single object, interspersed with messages sent to other objects, which in turn trigger their methods. It helps trace feasible execution flows across object boundaries during integration.

Speaker 1

23:07

Okay, ensuring those interactions work. Finally, we reach the top of the V: system testing. The whole shebang.

Speaker 2

23:13

Yep, testing the entire integrated system as a black box, usually from the end user's perspective. Here, we often think in terms of atomic system functions, ASFs. An ASF is basically the smallest observable interaction a user can have with the system that accomplishes something meaningful, like inserting your card at an ATM, entering your PIN, requesting a balance. Each of those is an ASF.

Speaker 1

23:37

Got it. And we also use use cases here, right?

Speaker 2

23:40

Heavily. Use cases describe sequences of actions involving the user and the system to achieve a specific goal. They can be high-level user stories or very detailed, step-by-step real use cases specifying inputs, outputs, preconditions, postconditions. They really define the system's behavior from that user viewpoint.

Speaker 1

23:57

So system testing is very much about: does it do what the user needs?

Speaker 2

24:01

Exactly. And to prioritize system testing, we often use operational profiles.

Speaker 1

24:06

What do those tell you?

Speaker 2

24:07

An operational profile tries to estimate how frequently different features or sequences of actions, called threads, will actually be used in the real world. You gather statistics or make educated guesses about usage patterns.

Speaker 1

24:19

And why is that so useful?

Speaker 2

24:21

It allows you to focus your testing effort on the high-traffic areas, the parts of the system that users will interact with most often. Finding and fixing bugs there gives you the biggest quality improvement for the user experience. It's based on probabilities, often derived from models like finite state machines representing usage. Smart prioritization.
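
A minimal sketch of turning an operational profile into a test allocation. The feature names, the usage percentages, and the budget of 200 tests are invented for illustration; a real profile would come from usage statistics or a usage model.

```python
# Hypothetical operational profile: share of real-world usage per feature,
# expressed as whole percentages that sum to 100.
PROFILE = {
    "check_balance": 55,
    "withdraw_cash": 30,
    "deposit": 10,
    "change_pin": 5,
}


def allocate_tests(profile, budget):
    """Split a test budget across features in proportion to expected usage."""
    return {feature: budget * pct // 100 for feature, pct in profile.items()}


allocation = allocate_tests(PROFILE, 200)
print(allocation)
# {'check_balance': 110, 'withdraw_cash': 60, 'deposit': 20, 'change_pin': 10}
```

Pure proportional allocation can starve rare but critical features, so in practice a team might add a minimum number of tests per feature on top of this.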

Speaker 1

24:40

Yeah, and system testing isn't just about function, right? There's non-functional testing too.

Speaker 2

24:44

Absolutely crucial. This includes things like performance testing, security testing, usability testing, and stress testing.

Speaker 1

24:51

Pushing the system to its limit.

Speaker 2

24:53

Yeah, seeing how it behaves under extreme load or adverse conditions. How many users can it handle concurrently? What happens when the database gets huge? How does it recover from network failures? Craftsmen use strategies like compression, simulating large loads with smaller representative test sets, or replication, simulating potentially destructive tests in a controlled environment.

Speaker 1

25:16

Okay, that covers the main levels. Let's shift to some more advanced perspectives. How do we even test systems made of other systems? Systems of systems, SoS?

Speaker 2

25:26

Ah, yes. SoS testing is a huge challenge. These are systems built by integrating large-scale, often independently managed, constituent systems. Think air traffic control or complex financial networks.

Speaker 1

25:38

They weren't necessarily designed to work together initially.

Speaker 2

25:40

Often not. Jorgensen describes different types based on how much central control exists. Directed SoS are built for a specific purpose, like a smart home system. Acknowledged SoS have independent systems that know about each other and work together loosely, like maybe coordinating traffic lights across the city. Collaborative SoS have systems that volunteer to work together for a shared goal, maybe during an emergency response.

26:04

And virtual SoS have no central authority at all. They just interoperate dynamically when needed.

Speaker 1

26:08

Like parts of the Internet. Testing those interactions must be incredibly complex.

Speaker 2

26:12

It is. You have to understand the communication protocols and the potential emergent behaviors that arise from the interactions. Techniques involve modeling the communications, perhaps using things like Petri nets mapped to interface specifications, to understand how these independent giants influence each other.

Speaker 1

26:28

A whole other level of complexity. Now, let's come back to something fundamental, evaluating our own tests. After all this effort, how does a craftsman know if their test cases are actually any good?

Speaker 2

26:39

That's the million dollar question, isn't it? How good are my tests? You need ways to assess their effectiveness. One classic, powerful technique is mutation testing.

Speaker 1

26:49

You mentioned this briefly. How does it work again?

Speaker 2

26:51

You take the original program that presumably passes all your current tests. Then you automatically introduce lots of small, single-change mutants. Each mutant version of the program has one tiny fault deliberately inserted.

Speaker 1

27:05

Like changing a plus to a minus, or one operator to another.

Speaker 2

27:08

Exactly, simple syntactic changes. Then you rerun your entire test suite against each mutant program.

Speaker 1

27:14

What are you looking for?

Speaker 2

27:15

You're looking for your test suite to kill the mutant, meaning at least one test case fails when run against that mutated code. If a mutant runs through your entire test suite and doesn't cause any test to fail, it's called a live mutant.

Speaker 1

27:28

Yeah, that's bad.

Speaker 2

27:29

That's bad. It means your test suite has a blind spot. It wasn't sensitive enough to detect that specific small change, which might represent a real potential bug. The ratio of killed mutants to total mutants gives you a mutation score, a quantitative measure of your test suite's thoroughness.
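
The kill-or-survive bookkeeping described here can be sketched in a few lines of Python. This is a minimal illustration, not a real mutation tool: the function `is_adult`, the threshold of 18, and the helper names are all hypothetical, and mutants are simulated by swapping the comparison operator rather than by rewriting source code.

```python
import operator

def make_is_adult(compare):
    """Build a hypothetical age check around a given comparison operator."""
    def is_adult(age):
        return compare(age, 18)
    return is_adult

# The original program uses >=; each mutant differs by exactly one operator.
original = make_is_adult(operator.ge)
mutants = {
    ">= -> >": make_is_adult(operator.gt),
    ">= -> <": make_is_adult(operator.lt),
    ">= -> <=": make_is_adult(operator.le),
    ">= -> ==": make_is_adult(operator.eq),
}

def passes_all(fn, cases):
    """True if fn returns the expected value for every (input, expected) pair."""
    return all(fn(age) == expected for age, expected in cases)

def mutation_score(mutants, cases):
    """Return (killed / total, names of live mutants) for a test suite."""
    live = [name for name, fn in mutants.items() if passes_all(fn, cases)]
    killed = len(mutants) - len(live)
    return killed / len(mutants), live

# A suite with no boundary case: the original passes, but one mutant survives.
weak_suite = [(30, True), (5, False)]
weak_score, weak_live = mutation_score(mutants, weak_suite)

# Adding the boundary value 18 kills the surviving mutant.
strong_suite = weak_suite + [(18, True)]
strong_score, strong_live = mutation_score(mutants, strong_suite)

print(weak_score, weak_live)
print(strong_score, strong_live)
```

The live mutant points directly at the blind spot: nothing in the weaker suite exercises the boundary value, so replacing `>=` with `>` goes unnoticed until a test at exactly 18 is added.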

Speaker 1

27:46

That urban legend about the Mariner 1 probe.

Speaker 2

27:48

Yeah, the story goes it failed due to a single-character typo in the code, something mutation testing is designed to catch. Whether the legend is perfectly accurate or not, it highlights the principle: small mistakes can have huge consequences, and mutation testing helps find weaknesses in tests that might miss them.

Speaker 1

28:07

So it tests the tests. What about fuzzing? That sounds less systematic.

Speaker 2

28:11

It is less systematic, but surprisingly effective, especially for finding security vulnerabilities and robustness issues. Fuzzing basically involves throwing large amounts of random, semi-random, or malformed data at a program's inputs.

Speaker 1

28:25

Just trying to break it?

Speaker 2

28:27

Pretty much. You're hoping that some unexpected input combination will trigger a crash, an assertion failure, a memory leak, or some other vulnerability. It was famously discovered by accident when random noise on a modem line caused Unix utilities to crash. The noise was effectively fuzzing them.
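
As a rough sketch of the idea, the Python below throws random byte strings at a toy parser and records anything that fails in an undocumented way. The parser `parse_length_prefixed`, its deliberate bug, and the `fuzz` helper are all hypothetical, made up for illustration; real fuzzers are far more sophisticated about generating and mutating inputs.

```python
import random

def parse_length_prefixed(data: bytes) -> bytes:
    """Toy parser (hypothetical): first byte is the payload length."""
    length = data[0]  # deliberate bug: an empty input raises IndexError
    payload = data[1:1 + length]
    if len(payload) != length:
        # Documented rejection of malformed input, not a crash.
        raise ValueError("truncated payload")
    return payload

def fuzz(target, runs=1000, max_len=8, seed=0):
    """Feed random byte strings to target and collect unexpected exceptions."""
    rng = random.Random(seed)  # seeded so a run can be reproduced
    crashes = []
    for _ in range(runs):
        size = rng.randrange(max_len + 1)
        data = bytes(rng.randrange(256) for _ in range(size))
        try:
            target(data)
        except ValueError:
            pass  # expected, handled rejection
        except Exception as exc:
            crashes.append((data, type(exc).__name__))
    return crashes

crashes = fuzz(parse_length_prefixed)
print(len(crashes), "unexpected failures; distinct inputs:",
      {data for data, _ in crashes})
```

Separating the documented `ValueError` from everything else is the key design choice here: a fuzzer is only useful if it can tell a graceful rejection apart from a genuine robustness failure.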

Speaker 1

28:42

Accidental discovery. Okay, and then there's that analogy from ecology: fishing creel counts, or fault insertion.

Speaker 2

28:48

Right. This is another way to estimate test effectiveness, or rather to estimate the number of remaining faults. The analogy is wildlife managers tagging some fish, releasing them, and then using the proportion of tagged fish caught later to estimate the total fish population.

Speaker 1

29:04

How does that apply to software?

Speaker 2

29:06

You tag the software by intentionally inserting a known number of representative faults, seeded faults. These should ideally be similar in nature to the kinds of real faults you expect. Then you run your test suite.

Speaker 1

29:18

And see how many of the seeded faults it finds.

Speaker 2

29:21

Exactly. If your tests find, say, eighty percent of the faults you deliberately inserted, you might infer that they've also likely found about eighty percent of the original wild faults that were already in the code. It gives you a statistical estimate of the remaining defects.
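
The arithmetic behind that inference is simple enough to write down. The sketch below assumes, as the analogy does, that seeded and wild faults are equally easy to detect; the function name and the sample numbers are hypothetical.

```python
def estimate_remaining_faults(seeded_total, seeded_found, wild_found):
    """Estimate undiscovered wild faults from the seeded-fault detection rate.

    Assumes seeded and wild faults are detected at the same rate, which is
    the core (and debatable) assumption of the fish-tagging analogy.
    """
    if seeded_total <= 0 or not 0 < seeded_found <= seeded_total:
        raise ValueError("need 0 < seeded_found <= seeded_total")
    # wild_found / estimated_total is assumed equal to seeded_found / seeded_total
    estimated_total = wild_found * seeded_total / seeded_found
    return estimated_total - wild_found

# Hypothetical numbers: 20 faults seeded, 16 of them caught (80 percent),
# and 40 wild faults found by the same test suite.
remaining = estimate_remaining_faults(seeded_total=20, seeded_found=16, wild_found=40)
print(remaining)
```

With these numbers the estimated total is 50 wild faults, so roughly 10 are presumed still in the code. The estimate is only as good as the seeded faults are representative.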

Speaker 1

29:34

Clever, very clever. Okay, finally, let's talk about the human side of software quality: technical reviews. Are these really a form of testing?

Speaker 2

29:42

Jorgensen argues strongly that they are. They're a form of static testing, done without executing the code. Reviews aim to find faults, errors in code, design documents, and requirements, before they have a chance to become failures during execution.

Speaker 1

29:57

And the economics are compelling, right.

Speaker 2

29:59

Hugely compelling. Barry Boehm's classic research showed graphically that the cost to fix a fault increases exponentially the later in the development cycle.

Speaker 1

30:08

It's found.

Speaker 2

30:09

Finding a fault in a requirements document during a review might cost pennies compared to finding the same underlying misunderstanding after the system is deployed. It's the ultimate stitch in time saves nine.

Speaker 1

30:19

So what kinds of reviews are there?

Speaker 2

30:21

There is a spectrum of formality. Walkthroughs are usually the least formal, often led by the author of the work product, whether that's code or a design doc. Their effectiveness can vary a lot depending on the author's goals. At the other end is the technical inspection, pioneered by Michael Fagan at IBM in the seventies. These are highly structured and generally considered the most effective type of review.

Speaker 1

30:43

What makes them so.

Speaker 2

30:43

Formal inspections involve documented processes, specific roles (author or producer, moderator or leader, reader, recorder, inspectors), formal training for participants, budgeted time, detailed checklists based on common error types, and a focus on finding and logging defects, not fixing them during the meeting. Places like telephone switching labs developed incredibly rigorous inspection processes over decades to achieve near fault-free systems.

Speaker 1

31:10

It sounds like a very disciplined process. What does it take to make reviews work well in an organization? It's not just process, is it?

Speaker 2

31:17

Definitely not. Reviews are a deeply social process. For them to be effective, the culture has to support them. They need to be seen as valuable, a worthwhile investment of time, not just overhead. Time needs to be budgeted for preparation and participation. There needs to be clear etiquette. You review the product, not the producer. The goal is

31:35

defect detection, not solution brainstorming or blaming. Keep it constructive, absolutely. And generally it's best if direct managers don't participate in the actual review meeting itself. That helps create a psychologically safer environment where people feel more comfortable pointing out potential issues without fear of evaluation.

Speaker 1

31:54

Makes sense. Create the right environment for finding faults. So, to sum this up, what does this whole deep dive mean for you, our listener? We've journeyed through this intricate world of software testing.

Speaker 2

32:08

Yeah, from the basic definitions of errors and failures, through all those unit testing techniques, the different life cycle approaches, integration, system testing.

Speaker 1

32:17

And finally looking at advanced topics like systems of systems, evaluating our tests, and the critical role of technical reviews.

Speaker 2

32:24

And throughout it all we've seen that software testing, when done well, really is a craft.

Speaker 1

32:28

What does that mean?

Speaker 2

32:29

Ultimately, it means it requires deep knowledge of the principles and techniques. It requires the skill to choose the right tools for the job, just like any artisan. It requires extensive experience to build intuition, and fundamentally, it requires a dedication to high-quality work, a sense of pride in doing the job well.

Speaker 1

32:46

Well said. It's about building confidence and quality through skill and diligence. So for your provocative thought this week, step back and consider how these principles of software testing might apply beyond just code. Think about identifying faults early, rigorously evaluating the effectiveness of your methods, or understanding complex systems through their inputs, outputs, and interactions. Where in your own work, or even your life, might you benefit from applying a

33:11

bit of this craftsmanlike testing mindset. Where could a deep dive using some of these concepts reveal ways to improve quality or prevent problems?

Speaker 2

33:20

Something to think about

Transcript source: Provided by creator in RSS feed