Have you ever looked at something really complex, maybe a tangled mess of wires, and just, you know, felt something was off? Like it needed a good sort out, even if you couldn't quite say why technically. Yeah, that intuitive feeling. Exactly. Well, that same sense applies to software code. Subtle clues can point to deeper problems. Welcome to the deep dive. We take a whole stack of information and pull out the really important bits for you.
Today we're getting into the world of code smells. We're drawing mainly from Marco Tulio Valente's Software Engineering: A Modern Approach, specifically Section 9.5. Right. And our mission today really is to help you spot these indicators. Sometimes they're subtle, sometimes frankly they're not so subtle. These signs of, well, lower-quality code. We'll look at why they matter, how they impact things, and crucially, what you can actually
do about them. It's about recognizing those red flags, whether you're building software yourself or just, you know, curious about how good systems are put together. OK, let's jump right in then. How would you explain the difference between a code smell and, say, a bug? What's the key thing there? That's a really important distinction. A code smell, sometimes called a bad smell, isn't a bug in the usual sense. It doesn't make the program crash or give the wrong answer.
It's more like an indicator, maybe a symptom, that your code might be tricky to maintain down the line, or hard to understand, modify, or even test properly. Think of it like code that just doesn't smell right. A hint that maybe some refactoring is needed. Refactoring meaning? Ah yes, refactoring is all about improving the internal structure of the code without changing what it actually does from the outside, its external behavior. Cleaning it up internally. Got it.
So it's more about the underlying structure being weak, not an immediate failure. Precisely, and the word indicators is really key. Not every single smell means you have to drop everything and refactor right away. The decision depends on, well, a few things. How critical is that bit of code? How often does it actually get changed or maintained? Is it a core part of the system? So there's some judgement involved. Definitely, but ignoring smells can lead to technical debt piling up.
You know, making future work harder and, frankly, more expensive. OK, let's tackle the first big one you mentioned. It's maybe the most common: duplicated code. Sounds simple, but what makes it so bad for a system? Duplication. It's incredibly common, yes, and it really has a high potential to damage how a system evolves over time. When code is duplicated, maintenance isn't just doubled, it gets much worse. Any change, any bug fix has to be applied in multiple places and...
You might miss one. Exactly. That creates a constant risk of inconsistency. You fix it here, but forget it over there. And it's not just the extra work. It can actually stifle development. People become afraid to make changes because they might miss a duplicated instance. That fear can paralyze things, make important updates seem too risky. Wow, OK. And fundamentally, it just makes the whole code base more complex.
You've got logic scattered around that could and should be nicely contained in one single place. So once you spot these duplicates, what are the typical ways to fix them? What are the refactoring moves? Well, there are specific refactorings for this. For instance, if you see the same block of code inside two or more methods, extract method is usually the way to go. That just means you take that duplicated block, put it into its own new method, and then call that new method from the
original places. OK. And if you see the same attributes and methods popping up in several different classes, then extract class is often the answer: you create a new, smaller class to hold that shared stuff. And another common one: if a method is duplicated across several subclasses, like children inheriting from a parent class, you might use pull up method. That moves the common method up into the parent class, so all the children can share it. The source talks about clones here.
Can you break down the different types it mentions? Yes. These blocks of duplicated code are often called clones, and the source categorizes them into four types, basically based on how similar they are. Type 1 clones are pretty much identical; the only differences might be things like comments or maybe some extra spaces or blank lines. Purely cosmetic. OK, exact copies mostly. Right. Then type 2 clones are like type 1, but they might use different names for variables or
other identifiers. So the structure is the same, but the names are changed. Still pretty similar then. Type 3 clones build on type 2. They have the same structure, maybe different names, but also some minor differences in the actual code statements. Perhaps one has an extra print command for debugging or a slightly altered condition. Small variations. And type 4, you said they were the tricky ones? Ah, yes, type 4. These are the hardest to spot automatically.
They are semantically equivalent, meaning they achieve the exact same result or purpose, but they're implemented using totally different logic or algorithms. They might look completely unalike on the surface. Spotting these often requires more sophisticated analysis tools or just a really deep understanding of what the code is trying to do. Simple text comparison won't find them. That's a really crucial difference. The source uses a factorial
function as an example. Can you, like, describe how these clone types might show up there? Just conceptually, no need for the actual code. Sure, yeah. So imagine a basic function to calculate factorial. You know, n times n minus 1, times n minus 2, and so on. A type 1 clone would be just the same function copied elsewhere, maybe with a different comment or indentation. Simple copy-paste. Exactly.
A type 2 might rename the input variable from n to number, or the result variable from fact to result, but the calculation logic is identical. OK. A type 3 could have a small difference. Maybe it prints the value at the end or at the start, just for logging, while the original doesn't. A minor statement difference. Right. And a type 4 clone? Well, that could be a completely different approach.
Maybe one version uses a for loop to calculate the factorial, while the other uses recursion, where the function calls itself. They both get the factorial, but the way they do it looks totally different. But functionally, they're duplicates. You only need one. Precisely. That underlying duplication is still there, just hidden much better in type 4. And this isn't just theoretical academic stuff, is it? The source mentioned a study. That's right. It's very
practical. Back in 2013, researchers Yamashita and Moonen surveyed, I think it's 85 developers. They asked them about their biggest headaches, their main concerns with code smells, and duplicated code was by far the top answer. It scored almost double the points of the next one down the list, which was long method. So that really highlights how much duplication impacts developers in their actual day-to-day work.
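To make the four clone types from the factorial discussion concrete, here is a rough sketch in Python. It is not the code from the source; the function names (factorial_type1 and so on) are made up for illustration, and only the variable names n, number, fact, and result follow the conversation.

```python
# Original function: iterative factorial.
def factorial(n):
    fact = 1
    for i in range(1, n + 1):
        fact = fact * i
    return fact


# Type 1 clone: identical code, differing only in comments and blank lines.
def factorial_type1(n):
    fact = 1  # running product

    for i in range(1, n + 1):
        fact = fact * i
    return fact


# Type 2 clone: same structure, but the identifiers are renamed.
def factorial_type2(number):
    result = 1
    for k in range(1, number + 1):
        result = result * k
    return result


# Type 3 clone: renamed identifiers plus one extra statement (a log line).
def factorial_type3(number):
    result = 1
    for k in range(1, number + 1):
        result = result * k
    print("factorial computed:", result)
    return result


# Type 4 clone: same result, completely different implementation (recursion).
def factorial_type4(n):
    if n <= 1:
        return 1
    return n * factorial_type4(n - 1)
```

All five functions return the same value for the same non-negative input, which is why a system only needs one of them. A plain text comparison would likely catch the first two clones and miss the recursive one.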
It's a real pain point. OK, so that brings us neatly to the second biggest concern, long method. What exactly makes a method long? And why is that a problem? Well, ideally methods should be short and sweet. They should have names that clearly explain what they do and contain, you know, relatively few lines of code. A long method is a smell because, honestly, it just makes life difficult. It becomes really hard to understand the whole thing, to follow the logic from start to finish.
Cognitive overload. Exactly. Trying to maintain it without accidentally breaking something becomes a challenge. Debugging can turn into a real slog because you're juggling too many steps and variables in your head. The usual fix, again, is extract method. Break it down. Carve out smaller, more focused pieces, each doing just one thing well. Is there a specific number of lines? Like a hard rule, over 50 lines maybe? Not really.
Not an arbitrary number. There's no universal magic number that says this is too long. What counts as long can depend on the language you're using, how important the method is, the complexity of the problem it's solving, things like that. Context matters. It does. However, there is a strong trend in the industry these days towards writing very small methods. Often, you know, fewer than 20
lines, sometimes even less. The goal is really clarity and making sure each method has a single, well-defined responsibility. Right, that single responsibility idea keeps coming up. If methods can be too long, I guess classes can too. That leads us to large class. Precisely. Same principle, just scaled up. Classes, like methods, shouldn't
try to do too many things. A large class is a smell when it takes on too many different responsibilities or offers services that aren't really related to each other. Lacks cohesion. Exactly. Low cohesion. This makes the class hard to understand, hard to maintain, and especially hard to reuse. If a class does 20 different things, you probably only need three of those things in another part of your system, but you have to drag the whole behemoth along.
Often you'll see these large classes have tons of attributes, data fields that don't really belong together conceptually. And the fix for that kind of, well, class bloat? The main approach is extract class. You look for a group of responsibilities or a set of related data within the large class and you pull that out into a new, smaller, more focused class.
The original large class then just holds a reference, an attribute of this new, smaller class type, and delegates work to it. OK, delegation. Yeah, and sometimes these classes get so big they start doing almost everything in a part of the system. They become the central brain. These get a special, rather negative name: the God class, or sometimes a Blob. You often spot them by their very generic names, like SystemManager or MainController or something equally vague.
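The extract class move with delegation, described a moment ago, might look something like this in Python. It is only a sketch: Customer, Address, and their fields are hypothetical names chosen for illustration, not taken from the source.

```python
class Address:
    """Extracted class: groups the address data that used to sit in Customer."""

    def __init__(self, street, city, postal_code):
        self.street = street
        self.city = city
        self.postal_code = postal_code

    def label(self):
        # Behavior that belongs with the address data lives here now.
        return f"{self.street}, {self.city} {self.postal_code}"


class Customer:
    """Original class: keeps a reference to Address and delegates to it."""

    def __init__(self, name, address):
        self.name = name
        self.address = address  # attribute of the new, smaller class type

    def mailing_label(self):
        # Delegation: Customer no longer formats the address itself.
        return f"{self.name}\n{self.address.label()}"
```

The address logic can now be reused anywhere an address appears, without dragging the rest of Customer along.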
God class, I like that. OK, next smell: feature envy. That name is pretty evocative. What's going on there? It is a great name, isn't it? Feature envy happens when a method inside one class seems more interested in the data and methods of another class than its own. Basically, it spends most of its time calling methods or accessing data on an object of a different class. It envies the features of that other class. So it's like a method that's in the wrong place.
It should belong to the class it's constantly interacting with. That's exactly it. It suggests the method is misplaced. If you see this, the common refactoring is move method. You literally move the method over to the class it seems to envy, the one whose data it uses so much. The source gives an example of a method that calls lots of functions on an object called App It, which comes from another class, but barely uses anything from its own class.
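Here is a small Python sketch of feature envy and the move method fix. It is not the example from the source; Account, StatementPrinter, and all the method names are invented for illustration.

```python
class Account:
    def __init__(self, balance, overdraft_limit):
        self.balance = balance
        self.overdraft_limit = overdraft_limit

    # After move method: the calculation lives next to the data it uses.
    def available_funds(self):
        return self.balance + self.overdraft_limit


class StatementPrinter:
    # Before: this method uses nothing from StatementPrinter itself.
    # It only reaches into Account, which is the feature envy smell.
    def available_funds_before(self, account):
        return account.balance + account.overdraft_limit

    # After: the printer simply asks the Account.
    def summary(self, account):
        return f"Available: {account.available_funds()}"
```

Both versions compute the same number. The difference is where the responsibility sits, and who has to change if Account's fields change.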
Clear sign, yeah. A very clear sign that the method probably belongs in the class where AP comes from, which was abstract tool in that example. Moving it improves the overall design and makes responsibilities clearer. OK, shifting gears from location to inputs, we've got long parameters list. Why is having too many parameters to a method considered a smell? Right. Methods ideally should have very few parameters. A long list makes the method signature complicated and harder
to use correctly when you call it. It also often indicates the method might be trying to take on too much responsibility itself. It needs a lot of information because it's doing too many things. Makes sense. What are the solutions? There are two main things to look for. First, can the method figure out one of the parameters itself?
For example, if you're passing in parameter P1 and parameter P2, but inside the method you could always calculate P2 just by using P1, then you don't actually need to pass P2 in; the method can derive it internally. That simplifies the call. OK, eliminate redundant inputs. What's the second approach? The other, often more powerful solution is to group related parameters together into a new
object or class. Like, instead of having a method like processDates(Date start, Date end, String format), maybe you can create a DateRange class that holds the start and end dates, and then the method just takes a single DateRange object. Maybe like processDateRange(DateRange range, String format). It cleans up the signature, makes the code more readable, and often bundles related data with behavior too. That makes a lot of sense. OK, let's talk about global variables.
I've definitely heard programmers warn against using those. Why does the source list them as a code smell? Yes, global variables are generally considered problematic. The main issue is that they create a really bad kind of dependency, sometimes called common coupling. They make it incredibly hard to understand what a function or method actually does just by looking at it. Its behavior can depend on this global value that could be changed by any other part of the program at any time.
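A minimal Python sketch of that hidden dependency, assuming a module-level variable named counter. The function names and the set_counter helper are hypothetical, added only to show the effect.

```python
counter = 0  # global state: any code in the program can change this


def set_counter(value):
    # Stands in for some distant, unrelated code that modifies the global.
    global counter
    counter = value


def total_with_global(values):
    # The result depends on `counter`, which the signature does not reveal.
    return sum(values) + counter


def total_explicit(values, offset):
    # Alternative: the dependency is passed in, so the call site shows it.
    return sum(values) + offset
```

Calling total_with_global([1, 2]) can give different answers at different times without its arguments changing, while total_explicit([1, 2], 10) always returns the same value for the same inputs.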
So hidden dependencies everywhere. Exactly. Imagine a function calculating something and at the end it adds the value of a global variable counter. To know what that function will return, you can't just read the function's code, you have to know the current value of counter, which could have been set or modified by completely unrelated code somewhere else entirely. This makes reasoning about the code extremely difficult and
debugging a nightmare. A bug might appear in one function, but the actual cause could be some distant piece of code messing with that global variable. That sounds fragile. Very. It creates spooky action at a distance. And it's worth pointing out, in languages like Java, static attributes behave essentially like global variables within their scope, so they carry the same risks and are also considered an instance of this smell. Good point. OK, finally let's tackle primitive obsession.
This sounds like maybe using the basic building blocks like numbers and strings too much. That's a great way to think about it, actually. Yes, primitive obsession is when we rely too heavily on these fundamental primitive types, integers, strings, booleans, floats, instead of creating small dedicated classes to represent concepts from our problem domain. Can you give an example of that? Sure, let's say you need to handle a ZIP code in an address.
The quick and dirty way is just to use a string variable, store it as text. Seems simple enough. It seems simple, but the source suggests, and it's good practice, to create a dedicated ZIP code class instead. Why? Well, the main benefit is that this dedicated class can hold not just the data, the ZIP code string itself, but also the behavior associated with it. For instance, the ZIP code class constructor could automatically validate the format. Is it 5 digits or 9 digits with
a hyphen? Does it match known valid ranges? It can handle that logic internally. So you encapsulate the rules with the data. Precisely. You encapsulate that responsibility. Now, other parts of your code don't need to worry about validating ZIP codes, they just use the ZIP code object, trusting it's valid. It makes the code cleaner, safer because of type checking, and more expressive.
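As a rough illustration, a dedicated ZIP code class could look like this in Python. The class name ZipCode and its details are assumptions for the sketch. It checks only the two US formats mentioned in the conversation (5 digits, or 9 digits with a hyphen) and does not check known valid ranges.

```python
import re


class ZipCode:
    """Small domain class that keeps the validation rule next to the data."""

    # Assumed formats: "12345" or "12345-6789".
    _PATTERN = re.compile(r"[0-9]{5}(-[0-9]{4})?")

    def __init__(self, value):
        # Validate once, in the constructor, so every ZipCode object is valid.
        if not isinstance(value, str) or not self._PATTERN.fullmatch(value):
            raise ValueError(f"invalid ZIP code: {value!r}")
        self.value = value

    def __str__(self):
        return self.value
```

Code that receives a ZipCode no longer has to re-validate it, and a function that expects a ZipCode makes that expectation visible in a way a bare string does not.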
You're dealing with a ZIP code, not just some arbitrary string. So the idea is: don't be obsessed with primitives. Elevate these simple values into more meaningful objects when they represent a distinct concept in your domain. And that wraps up our deep dive into the, well, sometimes
fragrant world of code smells. We hope this has given you a much clearer picture of what makes code smell, so to speak, and why spotting these indicators is really quite crucial for building software that's robust and, importantly, maintainable over time. It's all about that underlying quality. Thank you for joining us on this exploration today.
