5 signals that can indicate its time to re-factor

Re-factoring code is a part of any programmers life or should be. Very few people write beautifully clean and well factored code from the start, and I am certainly not one of them. Most times my code starts out a sprawling tangle that takes an additional pass or two to look like something I would consider clean. I find that when programming and designing I start with a rough idea or sketch of what I want to accomplish. Through my process I refine and reshape it into something beautiful. Where programming and art differ is that when programming once it starts ‘working’ there is the temptation to stop tweaking it. If you do stop, then only the people looking at the code will know what a mess has been made. Whereas with art & design, if you try and pass off a rough unfinished piece as something complete and polished you will often be met with raised eyebrows. What follows is part of my personal hit list of things I look for when I clean my code. And while there are far more exhaustive and thorough lists of things to look for. I kept to five because it made for a better title than 7 or 8 signals that indicate its time to re-factor.

1. Methods with too many parameters

Methods that require lots of parameters require you to remember more and make for constant referencing to an API. Its also easier to make mistakes using these methods. A perfect example of this is Model::find() Compare the 1.1 method use vs. 1.2

Show Plain Text
  1. $this->Post->findAll($conditions, $fields, $order, $limit, $page, $recursive);

6 parameters! While all necessary, remembering which goes where was difficult, and required memorization. Contrast that with the 1.2 usage

Show Plain Text
  1. $this->Post->find($type, $options);

Only two parameters makes this method far easier to remember, and far easier to use. I say this method is easier to use as you will make less mistakes by getting the parameter order incorrect. Another thing I’ve grown to dislike is flag parameters. You know those little boolean parameters we tack onto method to make them do more than one thing. The problem with flag parameters is they open you up for methods that do more than one thing, and again they are harder to remember. How many times have you forgotten the little false on the end of Model::bindModel(). Now how often would you make the mistake between Model::bindModel() and Model::bindModelPermanently(). By dropping flag parameters and introducing an additional method you make a trade off though. You end up with more methods, but in exchange you get more readable code.

2. Methods that do too many things.

Often times methods that do many things are often the same as those that have lots of parameters. However, methods that do more than one thing are double trouble. First they are harder to test, and secondly they can be confusing to use. To figure out if a method is doing more than one thing I describe what the function does. If when describing my method, I use and, or or but, then there is a pretty good chance that what I have written should be two methods and not one. At that point, I normally try to split my method into more than one public method. I would also create a protected method if there is some common features shared between the two public methods.

Show Plain Text
  1. public function getNode($id = null, $addView = true){
  2.     $conditions = array();
  3.     if (is_numeric($id)) {
  4.         $conditions['Node.id'] = $id;
  5.     } elseif(is_string($id)) {
  6.         $conditions['Node.slug'] = $id;
  7.     }
  8.     $node = $this->nodeInfo($conditions);
  9.     if ($addView && !empty($node)) {
  10.         $this->id = $node['Node']['id'];
  11.         $this->saveField('views', ($node['Node']['views'] +1));
  12.     }
  13.     return $node;
  14. }

This method both gets a Node object and updates the view count based on a flag parameter. The incrementing of the view count is not very clear from reading the name of the method. While this particular method is not overly difficult to test, it could be confusing to use, as the view count could accidentally be incremented.

Show Plain Text
  1. public function getNode($id){
  2.     $conditions = array();
  3.     if (is_numeric($id)) {
  4.         $conditions['Node.id'] = $id;
  5.     } else {
  6.         $conditions['Node.slug'] = $id;
  7.     }
  8.     return $this->nodeInfo($conditions);
  9. }
  11. public function updateViewCount($id) {
  12.     return $this->updateAll(
  13.         array('Node.views' => 'Node.views + 1'),
  14.         array('Node.id' => $id)
  15.     );
  16. }

There are now 2 separate methods that do one thing each. With this setup, it is easy to figure out what each method is going to do by simply reading the method name. We also benefit from simpler tests as we don’t have as many paths of execution to test. By making methods only do one thing, and giving them appropriate names our code is easier to read, easier to use, and easier to test.

3. Repetitive code blocks.

Repetitive code can happen when you are in a hurry, and just need a loop here, or that method there. Removing these duplicated blocks of code is really important as its easy to forget to update them all when you update one. Ideally when you find repeated code, you want to move it into its own method and make that method flexible enough to handle the variations required. For example:

Show Plain Text
  1. if (count($helpers)) {
  2.     $this->out("Helpers:", false);
  4.     foreach ($helpers as $help) {
  5.         if ($help != $helpers[count($helpers) - 1]) {
  6.             $this->out(ucfirst($help) . ", ", false);
  7.         } else {
  8.             $this->out(ucfirst($help));
  9.         }
  10.     }
  11. }
  13. if (count($components)) {
  14.     $this->out("Components:", false);
  16.     foreach ($components as $comp) {
  17.         if ($comp != $components[count($components) - 1]) {
  18.             $this->out(ucfirst($comp) . ", ", false);
  19.         } else {
  20.             $this->out(ucfirst($comp));
  21.         }
  22.     }
  23. }

Two identical loops. Now how easy would it be to forget to update one when you fix the other. We can more easily express this as

Show Plain Text
  1. $properties = array(
  2.     'helpers' => __("Helpers:", true),
  3.     'components' => __('Components:', true)
  4. );
  5. foreach ($properties as $var => $title) {
  6.     if (count($$var)) {
  7.         $output = '';
  8.         $length = count($$var);
  9.         foreach ($$var as $i => $propElement) {
  10.             if ($i != $length -1) {
  11.                 $output .= ucfirst($propElement) . ', ';
  12.             } else {
  13.                 $output .= ucfirst($propElement);
  14.             }
  15.         }
  16.         $this->out($title . "\n\t" . $output);
  17.     }
  18. }

Not only did we use less code, we also reduced our ability to make mistakes. Removing duplicate code makes things easier to maintain and more homogenous. With fewer lines of code to maintain, and less places to make mistakes bugs are easier to fix.

4. Bad Names

As programmers we have to name many things, classes, methods, variables, and files all need names. Names that are descriptive and befitting of their purpose are the best. They help describe our code and give meaning to what we are trying to communicate. At its root programming is communication. You need to communicate ideas to both the machine and those who will read the code after you. Compare the following:

Show Plain Text
  1. protected function _filterFolders(&$fl, $recurL = true) {
  2.     $cn = count($fl);
  3.     foreach ($this->exDir as $blkLst) {
  4.         if ($recurL) {
  5.             $blkLst = DS . $blkLst . DS;
  6.         }
  7.         for ($i = 0; $i < $cn; $i++) {
  8.             if (isset($fl[$i]) && strpos($fl[$i], $blkLst) !== false) {
  9.                 unset($fl[$i]);
  10.             }
  11.         }
  12.     }
  13.     $fl = array_values($fl);
  14. }

I personally find this code totally opaque, its very hard to discern what the author was trying to do with this code. In comparison the same code with good names makes the code far more accessible.

Show Plain Text
  1. protected function _filterFolders(&$fileList, $recursiveList = true) {
  2.     $count = count($fileList);
  3.     foreach ($this->excludeDirectories as $blackListed) {
  4.         if ($recursiveList) {
  5.             $blackListed = DS . $blackListed . DS;
  6.         }
  7.         for ($i = 0; $i < $count; $i++) {
  8.             if (isset($fileList[$i]) && strpos($fileList[$i], $blackListed) !== false) {
  9.                 unset($fileList[$i]);
  10.             }
  11.         }
  12.     }
  13.     $fileList = array_values($fileList);
  14. }

As a corollary, hungarian notation and single letter variables outside of loop variables should be avoided. Hungarian notation in dynamic languages just does not make sense. If you change the type $iFoo to null at some point. The value of the embedded type is lost as $iFoo is no longer accurate. What you are left with is a more confusing variable than if you had just given the variable a name that accurately describes its purpose in the first place. Lastly, knowing a type is useless if you don’t know what the variable is for. The only place I can see embedding a type into a variable name is when the language supports or requires it, like ruby.

5. Mixed levels of abstraction and muddled areas of concern

Mixed levels of abstraction can be difficult to spot at first. The idea behind the concept is that each object or layer should have one role, and should delegate to other objects the implementation details. By separating your concerns you encapsulate your logic better and create more reusable code. As an added bonus your code becomes easier to test, more focused, and shorter. For example in an MVC application the controller should be responsible for handling user requests, and providing the View layer with the necessary information. It should not be directly talking to the database in terms of SQL. Another example is having inline CSS and Javascript in your HTML. Doing so couples the two layers together creating inflexible and often brittle code.

Show Plain Text
  1. //Posts Controller
  2. function view($id = null) {
  3.     $post = $this->Post->query("SELECT * FROM posts WHERE id = '$id'");
  4.     $this->set('post', $post);
  5. }

Not only is this code terribly insecure, but it demonstrates a mixture of concerns and abstraction levels. The controller is sending SQL directly to the database. If your database structure ever changes you need to update objects that should not even be related to the database domain. A much better solution is to use a model method to encapsulate the read logic. So that each of our objects are only responsible for one role.

Show Plain Text
  1. //Posts Controller
  2. function view($id = null) {
  3.     $post = $this->Post->getPost($id);
  4.     $this->set('post', $post);
  5. }
  7. //Post Model
  8. function getPost($id) {
  9.     $id = Sanitize::sql($id);
  10.     return $this->query("SELECT * FROM posts WHERE id = '$id'");
  11. }

This approach keeps our controller and model concerns separate. We could improve our code further by using Model::read(), as that method better encapsulates and abstracts the database.

So there you go, those are some of the more common things I look for when re-factoring code. Obviously this is not an exhaustive and complete list. My goal in using these re-factoring methods is to create simpler code. Simpler code is easier to maintain, and easier to test. If code is easy to test, it is easy to fix when something goes wrong. I hope you can find these useful in your on going quest against stinky code.


Excellent list with clear and simple examples.

I think I’ll just turn this into a checklist for all my project releases from now on.

In any case, great article!

dr. Hannibal Lecter on 31/5/09

This is a great article, strongly articulates the concepts we all know (in the back of our minds) we should do

Nathan on 31/5/09

As often, a great article.
Now working on “real” websites, i’ll make an effort to take time to write clean code. But i also know that’ll not be easy to do…
It’s difficult (i’m sometimes lazy) when you always works alone to take this time, even if you already know it’s better on long term.

Emmanuel on 31/5/09

I like the comparison between art and code in the introduction. Very adequate. We’re all striving for beauty. The visual appearance of code is important.

David Persson on 31/5/09

Good list! Another thing I try and think about when refactoring is how easy it is to re-use the code in other situations. Although it’s really an extension of your #2 tip, I try to refactor my methods so they are easily overridable. For example:

Instead of doing something like this:

function whatever($other, $thing = ‘else’) { // blah

I might do this:

function whatever($other) { $thing = $this->thing();

function thing() { return ‘else’;

This allows me to override/extend in subclasses without having to change the method arguments (which might also exist in a parent method).

Adam Royle on 1/6/09

In #3, what is the $this->out() method?

Daniel Royer on 2/6/09

Thanks for the feedback everyone, glad you liked the article.

Daniel Royer: That was part of a shell class, so $this->out() writes to stdout.

mark story on 2/6/09

Really great article Mark and some very useful tips/insight into writing cleaner code.

Thanks for sharing!

Chris on 2/6/09

Mark, you’re my favorite Professor on the internet.

Jon on 5/6/09

Excellent ! … these indicators are a “must-know” … and I think most good programmers constantly find more reasons to refactor their code, hard to be content, or may be thats just me … this list is a good starter to give you a programmers itch :)

SayB on 8/6/09

Ah, Code Smells. :) Here’s a couple links that expound on these great tips:

Thanks for posting these, Mark. Refactoring is a lifelong pursuit and one CakePHP in particular could benefit from.

Benjamin Young on 24/6/09

I got one, when you have trouble testing your code you should really re-factor.

fernyb on 16/8/09

Hi Mark,
Very useful and nice post, thanks a lot, i just have one thing that really bothered me all along while reading this pretty post, its something related to if/else, why you keep doing it the way:
if (is_numeric($id)) { $conditions[‘Node.id’] = $id;
} else { $conditions[‘Node.slug’] = $id;

While you can simplify the code and write less of it by simply:
$conditions[‘Node.slug’] = $id;
if (is_numeric($id)) { $conditions[‘Node.id’] = $id;

Wouldnt that be much better approach to take while coding something?!!

Thanks again for the post, and please keep up the great work :-)

Ma'moon Al-Akash on 26/10/09

Have your say:

* You can use Textile markup, but be reasonable