Tuesday

How to refactor without reFing it up

Sometimes technical debt is unavoidable. There are lots of good reason why the code you have today is increasing your maintenance costs and slowing feature development. Sometimes you have to accept a hack in order to ship (shipping is a feature). Or if your product has been around a while, it's probably evolved far beyond the original design, and it just needs to be restructured. Or maybe it's not a good reason but it happened anyways, e.g., you trusted a less than stellar developer, forgot to write unit tests, or adopted a framework that sounds great in theory but just doesn't make any sense in practice. In any case, taking an iteration or more (when you can afford it) to deal with technical debt is just par for the course.

The first mistake made by many developers when they start refactoring is to start by cleaning up code. That way is a rabbit hole that will take you through multiple iterations to increasingly madder tea-parties (read: standups) and end with your PM screaming "Off with his head!" because you've made the codebase unworkable for weeks, all in an effort to make things better. It starts so innocently, with a single class. Then you notice that the problem with that class is related to one of its dependencies, so you chase that dependency down and discover that the general architecture is too small (or too large) so you eat the mushroom or drink the drink and soon you're drowning in your own tears. The caterpillar may console you momentarily by giving you a hit off his hookah, but it's just going to get weirder. Better to stay out of the rabbit hole.

The key is limiting the scope and hammering it into everyone involved that "WE DO NOT TOUCH ANYTHING OUT OF SCOPE". I'll say that again, in all caps WE DO NOT TOUCH ANYTHING OUT OF SCOPE. "But, but, the service layer is still a mess, which is why w-" WE DO NOT TOUCH ANYTHING OUT OF SCOPE.

The first step in a successful refactoring is for a senior developer or lead to sit down and figure out the smallest possible refactoring that can be done. Then create specific instructions on how the refactoring should be done. Literally, instructions like "Move line X to class Y and put Z where X was." You don't want to leave it up to the developer to come up with their own way of doing things. Developers get bored easily, and a single developer can try six different ways of doing things before he decides on one he likes.

Once you have detailed instructions and guidelines, cut your developers loose. After the first commit, be prepared to devote at least 10 minutes of your next standup to explaining the instructions again, because it turns out half of them didn't read them. It's not that you have stupid or bad coders on your team, it's that they're smart and think they don't need instructions. It's more fun to get coding and figure it out as you go, right?

Because you made the original refactoring small, once the effort is underway, you need to start planning the next one. Small and iterative works for features and it works for bugs. It will work for refactoring just as well. Don't waterfall refactor.

It goes against every technical design instinct we have, but the better course of action will almost always be to incrementally improves components of a system rather than redesign it from the ground up. You also have to keep the app running. If HEAD is unusable for even a day because of your refactoring, you're trying to change too much at once. Sometimes that means writing some glue code that is ugly and inelegant so that the component you're refactoring can be pretty and elegant. Your inner architecture astronaut will be screaming as he burns up on rentry, but even NASA sometimes needs duct tape.

Here is an example to get you started: Take your JSP/ASP/GSP or PHP templates. Define model classes for them, based on the needs of the UI, not based on whatever model you have in the database or ORM or the DTOs your services return. You should end up with simple inline expressions. e.g., ${model.foo} rather than ${_POST['foo'].replaceAll("\\.","-")} or ${fooList[i].bar.baz.foo}. Move all that ugly logic into your controller. That part is easy. The hard part is resisting the urge to go back your service layer or even your ORM mapper and develop a factory and mapping system that trasforms the output to perfectly match your templates so you can just write Foo.list(params) in your controller and have it be oh-so-elegant. It will be done in 2012. Remember: Worse is better.

Monday

The Ugly American Programmer: Bill Burke

I originally tried to post this as a comment on Bill Burke's Polyglotism is the worst idea I ever heard, but as far as I know, my comment is still awaiting moderation. So I'll go ahead and dissect it here. Burke starts with an alleged tongue-in-cheek:
Let me first start off with some tongue in cheek…I’m an American. I have no need to speak another language because well, I speak English. 99% of the civilized world speaks English so WTF should I ever learn another language? Case in point, I minored in German in college. For two semesters I went over to Munich and worked at Deutsche Aerospace so that I could learn German better. The thing is, besides the fact that my coworkers spoke damn good English to begin with, all the documentation they put out was in English, the units used were feet, pounds, miles. When the French came over to work with us, we also spoke English. So what is the freakin point of learning German? I was pretty damn disappointed that I wasted all this time in college learning German when in reality it was just a freakin useless exercise…
So we've taken the ugly American attitude and are now applying it to software development. Now, if he was trying to make the point that there is a lot of benefit to learning another language, his tongue-in-cheek disclaimer would make sense. But instead the above ignorant statement is followed by more ignorant rambling about using multiple languages in a project. He seems to think tongue-in-cheek means "I can say something ignorant and not be criticized for it".
Which brings me to the point of this blog. Polyglotism in software has to be the worst idea I ever heard. The idea of it is that you use the language that is best fits the job. Some say this is a huge boon for the developer as they will become more productive. In practice though, I think this is just a big excuse so the developer can learn and play with a new language, or for a language zealot/missionary to figure out a way to weasel in his pet language into a company. Plus, you’d probably end up being average or good at many languages but a master of none….But lets pretend that it is a benefit to the developer. Developers need to realize that there are implications to being polyglot.
Notice his focus on the single developer who is out to ruin things through his own selfishness (he must hate America!) No consideration is given to the benefits that a team and organization can derive from using multiple languages (which we will get to). His next point, maintenance:
So, you’ve added a Ruby module to that big flagship Java application or product your company is so proud of. You did it fast. It works. And management loves you for it. They love you so much for it, they’ve promoted you to software lead and now you are running a brand new project. Now that you’ve left your polyglot project, somebody needs to take over your work. Unfortunately, your group is a bunch of Java developers. For any bug that needs to be fixed, these developers need to be retrained in Ruby, a new Ruby developer needs to be hired, and/or a Ruby consultant/contractor needs to be brought it. Multiply this by each language you’ve introduced to your project.
Oh no, a Java programmer has to learn some Ruby! If you have any talented developers, they'll probably jump at the chance to escape Java hell and learn something new, assuming they haven't already learned Ruby on their own. "Multiply this by each language" is meaningless scare language. This assumes that skills are not transferable between programming languages. It's not at all difficult to be proficient in multiple languages. Many Java/PHP developers are also well versed in JavaScript (if you discount JavaScript at this point, I hope you have a career path that doesn't involve any programming for the web). Knowing multiple languages demonstrates mental flexibility and teaches a developer to think about problems in different ways. I would never hire a programmer that was only proficient in a single language.
The JVM is pretty cool now. We can run Ruby on it, Python on it, and even PHP on it. Your JRuby apps can work with Java APIs. Same with Jython and JPHP. Great. So now your developers can use any one of these language to build out extensions to your Java app. But what happens when you want to refactor one of your re-used Java libraries? OOPS!!!
Groovy and Scala are conspicuously absent from Mr. Burke's list. Could that be because they both compile to bytecode, and the resulting classes can therefore be used by any JVM language? So you could use Java, Scala and Groovy together in the same project in a completely interchangeable way, right now.
Ah, so you’ve weathered through the maintenance and refactoring nightmares and you’ve finally shipped your product. Hmm, but you’ve just added the complexity of installing multiple runtimes on your user base. Its not hugely bad if you’ve used the JVM as your base virtual machine. But you still need to package and install all the appropriate Java libraries, Ruby gems, and Python libraries on the appropriate places of your user’s machine. You also need to hope that all the desparate environments don’t conflict with anything the user has already installed on his machine. And that’s just with languages running in the JVM. Imagine if you used vanilla Ruby, Python and Java all as separate silos!
These new languages are used to reduce maintenance by dramatically reducing the size of the codebase, and simplifying application setup and ramp up times. Give me a domain model and I can have a functional Grails or Rails application running in less than an hour. And the code base will be a fraction of the size of the equivalent Java project. As for packaging, Maven handles Groovy and JRuby quite well. I'm sure PHP and Jython will have support soon enough, if they don't already (I don't know).
A support call comes in for your product or application. Its obviously a bug, but where is the problem? Is it your application code? Your JVM? Your Ruby VM? Your Java library? Your Ruby gem?
What an absurd statement. Where is the bug? I don't know, how do you normally find the bug in a "pure" application? Lets see, the date is parsed incorrectly and we're using some Ruby code to parse dates... Where could the bug be? Is it in the PHP templates? Only if you're an idiot. You could make the same argument about the MVC pattern and Hibernate. 3 layers? How will I know where the bug is? What if the bug is in a library? I better write it all myself in C. (Note that this is the correct way to make a tongue-in-cheek statement).
All and all, let me put it this way. We all work in multi-national environments. What if each developer documented their projects in their own native language, because lets face it, they are most productive in that language. Where would we be? Doing what’s best for oneself isn’t always best for the big picture
This last point about documentation doesn't even make sense. You write for your target audience. The compiler/interpreter is the intended audience of a programming language. You write Ruby code for the Ruby interpreter, Java for the Java compiler, etc.

The whole thing is absurd. You might as well say that SQL is a waste of time. The simple fact is that modern programmers use multiple languages in the same project with regularity, and the only ones who have trouble are the poorly trained & ignorant (who can hardly be called master of even the single language they know), and the old fogies, like Burke, who consider learning something new a waste of time.

Thursday

Buy my House!

My apologies for posting something almost personal, but this is mostly as a note for myself.

If you live in the Austin in area and are looking for a good house, mine is for sale.

Tuesday

When to go dynamic

The most typical complaint I hear from Java developers about Groovy is, "I spent X hours trying to figure out this problem that would've been a compile time error in Java." And they're right. Very little compile time checking is done in Groovy. It can't be done because methods may be available at runtime that aren't resolvable at compile time. That's part of what it means to be a dynamic language.

Some developers seem to think using the optional types will save them. e.g.,

int foo (String bar) {
return bar.size()
}

Looks good, right? Try running this script:

int foo (String bar) {
return bar.size()
}
if(false) {
String s = foo(123)
println s
}
println "Done"

Go ahead. Run that script. No error. Types aren't checked at compile time. If we change false to true, we get a groovy.lang.MissingMethodException at runtime, but that's it. Your compiler can't save you here. Good unit tests will catch some errors if you use the optional types, but you have to know what the error is before you can test for it, so you've already spent the X hours trying to track it down.

What makes Java so good for enterprise work is the static types. Static types enforce a contract. Static types make it possible for Eclipse to take you to the exact method that will be invoked by a particular line in the source. If the method is on an interface, Eclipse can tell you all types that implement that interface method and take you to any implementation you choose immediately. This is extremely useful when you're trying to navigate a complex, unfamiliar system. While a human may be able to easily figure out what file to open, having a tool that can do it for you automatically allows your concentration to remain on what the code does and lets you work at a higher mental level of abstraction.

If I have to learn a new framework that someone has written, I'll get through it 10x faster if it's in Java. No/bad documentation? Not a problem, I'll just look at the code for that method. Whereas if I come upon an undocumented Groovy method, what do I do? Ugh, I have to grep through all of the source to figure out where it's defined, how it's used, yada yada. Not quick. Breaks my concentration.

Java is very good at making large systems manageable. However, the trade off is that
any system that has been under active development for more than 2 weeks quickly becomes a large system. There is nothing concise about Java. There are very few shortcuts. The simplicity of the language and the rigidity of types means you have to write more code and produce more classes to get things done and God help you if you want anything remotely flexible. Code that needs to do something even slightly dynamic can quickly become unreadable and strewn with all kinds of reflection objects, odd exceptions and other nastiness. Or worse. You could end up in XML hell.

Enter Groovy. Dynamic languages do best when there are conventions and consistent ways of doing things. Grails is proof of this. Controllers are in the controllers folder, services in the services folder, etc. Each class follows a certain arechtype and there are good idioms in place that imply what is going on. However, when you start writing plugins, or doing things outside of the normal scope of Grails, things can get messy fast.

Anytime you're coding in uncharted territory, consider going back to Java. Writing a plugin? Implement as much of it as you can in Java. Only go back to Groovy if the equivalent Java code gets too ugly or would be many times longer. Most of your plugin/framework/new invention's plumbing should be in Java. The plumbing is where you connect everything together, and it's important to verify that thingDoer is of type ThingDoer, and that all interceptors properly implement the ThingDoerInterceptor interface. You want your API to fail fast so your users don't end up with a NPE exception deep in your code that they have no hope of debugging.

Keep Groovy at the fringes, handling the nasty implementation details, and protected by static types and good assertions. Make sure your Groovy class implements an interface, or better yet, have a Java class implement the interface and wrap the Groovy implementation so you can jump to it in Eclipse. Use Java to provide enforce the type contracts and provide a well defined framework. Use Groovy to make the implementation readable.

Thursday

Mock object are making your tests stupid

Mock objects were created because when you're writing unit tests (especially in a statically typed language), it can be a pain to create a subclass/implementations to use for your tests. You need a mock whenever the class you are testing depends on another class. You want to guarantee that changes to the other class don't break your tests for this class, because that would defeat one of the purposes of unit testing: isolation of the failure to a particular class/method. So you want an object that, whenever getTemperatureOutside is called, returns 32 degrees.

Sometimes you also want to make sure that the method you are testing causes some side effect. e.g., if the temperature outside is less than 32 degrees, it should call deployIceScraperNarwhals(). So, you want a method that records that deployIceScraperNarwhals() was called (but doesn't actually deploy the Narwhals) and something that checks that it actually was called, failing the test if it wasn't. Thus, the idea of verifying method calls was inextricably bound to the idea of providing fake methods to isolate behavior and every genius-pants writing a mock object framework decided that the verification was the most important part, to be emphasized throughout the documentation.

Then you came along, read the mock object documentation and proceed to go and write unit tests like this (mock object syntax is made up):

class UnderTest {
int calculate(Foo foo,Bar bar) {
int a = foo.getA()
int b = bar.getB()
return a + b
}
}

class UnderTestTest {
void testCalculate() {
Foo foo = createMock(Foo).getA().andReturn(3)
Bar bar = createMock(Bar).getB().andReturn(2)
assertEquals(5,instance.calculate(foo,bar))
}
}
Honestly, I'm looking for the stupid thing you did here, and I can't find it. The test passes, you didn't have to write MockFoo and MockBar classes, and all is right with the world. Until you decide to make an optimization:

class UnderTest {
int calculate(Foo foo,Bar bar) {
if(foo.getA() == bar.getB()) {
return foo.getA() << 1 // a+b = a*2 = a left shifted 1 bit
}
int a = foo.getA()
int b = bar.getB()
return a + b
}
}
Ignoring the fact that this probably hurts performance instead of helping, lets rerun the test and see what we get (error message made up because I'm lazy): "Unexpected call to getA." That's right, you only told it to expect one call to getA, not two. You didn't change the result, you didn't remove an important side effect, in short, you did nothing that should break a test. Everything is still working perfectly. But it broke the test.

This is the part where I get to tell you you're an idiot. Instead of realizing that what you really want is a stub object, you decided that you must have written the test wrong so you rewrite it to expect between 1 and 2 calls to getA(). Because these mock object things can be tricky, you proceed to write all of your unit tests from then on by examining the source of the method you are testing, creating a mock for each dependency, adding a method call for each method call in the source, and basically rewriting the method in mock form. You idiot. You just wrote a test that confirms that the method was coded a particular way, not that the method behaves correctly.

Here's an example:

class OtherUnderTest {
/**
* @return a Map with a result entry, which is a message to the user.
*/
Map deployNarhwals(Foo foo,Bar bar,Narwhals narwhals) {
if( underTest.calculate(foo,bar) < 32 ) {
narwhals.deploy()
return [result: narwhals.getStatus()+' narwhals deployed']
}
return [result:'ok']
}
}

class OtherUnderTestTest {
void testDeployNarwhals() {
Foo foo = createMock(Foo).getA().andReturn(3)
Bar bar = createMock(Bar).getB().andReturn(2)
Narwhals narwhals = createMock(Narwhals).getStatus().andReturn('Deadly')
narwhals.deploy()
Map result = instance.deployNarwhals(foo,bar,narwhals)
assertEquals([result:'Deadly narwhals deployed'],result)
}
}
You did so many things wrong here I'm going to have to make a list:

  • You mocked the parameters, but didn't stub the internal dependency. See that instance of UnderTest in there? You're not testing it's calculate method, you're testing deployNarwhals, so you need to replace it with an instance that returns something like 31 so that you'll always be testing the condition (which should be a part of your API documentation) "if the temperature outside is less than 32 degrees, the narwhals will be deployed." Stupid.

  • You verify that the side effect occurs, but you also couple your test to your implementation.What if I decide that the user doesn't need to know the Narwhal's status, I change the wording of the message, or make an extra call to getStatus()? Broken test. But the test shouldn't break just because your marketing department decided that 'released' was a more green term than 'deployed', or you cleaned up your code and change the number or order of non-side-effect-causing method calls.

  • You check for equality on a MapDo you really care if there are extra keys in the Map? Probably not. Instead, check that all the key value pairs you expect are there.

This is how you should have written your test:

class OtherUnderTestTest {
void testDeployNarwhals() {
boolean deployed = false
Narwhals narwhals = [
getStatus: { return "Deadly" },
deployNarwhals: {
deployed = true
}
] as Narwhals
instance.setUnderTest([calculate:{ return 31 }] as UnderTest)
Map result = instance.deployNarwhals(new Foo(),new Bar(),narwhals)
assertTrue(deployed)
assertNotNull(result.result)
}
}
You could use a mock object framework for the Narwhals object, but you should configure it to only require the call to deployNarwhals, and not to fail when other methods are called. The above test will not fail unless the API contract is broken. I only assert that the expected side effect occurred, and that a non-null message was returned.

The problem is that you morons seem to think that more assertions == better test. That's simply not true. A good test covers as many cases as possible, but only asserts what is defined in the API contract. If you start asserting things that are not part of the API, like getA() is called, you make the test brittle. Even worse, someone could come along and see your assertion and think that getA() being called is part of the API and code accordingly! Now you've introduced a bug somewhere else in the code that your tests wont catch and could become a complete mess if it isn't caught quickly. The reality is you should assert only what you have to to catch a bug. Sometimes tests don't even need assertions, or it's not worth the effort to write them. It's enough that they show that the code under test ran without error, which is no small matter. Knowing that retrieveNarwhals(null,null,null) doesn't throw an exception can be very important.

Now, get back to work and try to stop writing brain dead tests. You thick ninny.

EDIT: Here's a good alternative: Mockito.

Friday

Kill the Singletons. (And those things you thought were Singletons too)

You guys think you're so smart. You read something about design patterns once, and now you can justify any code you write by showing the pattern it implements. Except you only remember the Singleton, and a few others. You probably remember the Factory, because you usually implement it as a Singleton. Maybe you remember the Facade and/or the Adapter pattern. You may even have some code somewhere that implements the Visitor pattern, probably where it was not at all appropriate. But Singleton is the one that everyone knows. Even those of you who never even read a poorly written article on design patterns. And you use it everywhere.

Or you think that you do. Probably you're using it as a Factory or a Locator. Or you have a class with static methods that you use to load resources or something like that. Those are singletons right? No, but you thought they were. There are even some clever and occasionally useful Singletonish things implemented using ThreadLocals, but that's for later. The point is, you've used Singletons as an excuse for what amounts to a bunch of global variables and methods. Bad programmer! No copy and paste for you!

The real problem is in how you're using them. Singletons can't (easily or at all) be proxied, decorated, replaced or mocked. You're losing all kinds of flexibility, without gaining any sort of benefit. This is your code:

public void doSomething() {
MyConfig config = MyConfig.getConfiguration();
config.doSomething();
// etc....
}
This is your code on inversion of control:
private MyConfig config;

public void doSomething() {
config.doSomething();
// etc....
}

public void setConfig(MyConfig config) {
this.config = config;
}
Of course, somewhere else you need to myClassThatDoSomething.setConfig(configInstance);, or have your container do it. This pattern is also called dependency injection, but unlike the Singleton, it's easy to get right. Don't call static lookup methods (or any lookup methods if you can avoid it) and make everything you need an object property. It's a simple change, but it gives you tons of flexibility. The only downside is that you have to learn how to use an IoC container. Fortunately, they're getting more idiot proof by the day, so with a little work, you can probably figure it out.

Now, before you go and delete every last one of your singletons, there are a few situations where IoC wont work. For IoC to work, you have to have control of the creation of the object, or at least be able to get a reference to it before it's used. A good example is logging. Logging has to be available before the container even initalizes, so you can't dependency inject logging components (easily). And sometimes, the thing you need to inject can't be managed by the container. For example, suppose you need access to the current HTTP request object, or the session object? Ignore for a second that your container can usually mange to inject special objects like these and recognize that what makes these object special is that, in a particular scope, there is only one of them at a time. By their very definition, there can only be one HTTP Request and one Session at a time. To have two requests, or two session doesn't make any sense. That means you actually have a good candidate for a singleton. A filter like this can be very helpful (some types & casts omitted for brevity):
private static ThreadLocal request = new ThreadLocal();

public static HttpRequest getRequest() {
return request.get();
}

public void doFilter(request,response,chain) {
request.set(request);
try {
chain.doFilter(request,response);
} finally {
request.clear();
}
}
That lets anything that's executed inside the filter call MyFilter.getRequest() to get the current HTTP request object, regardless of whether the framework was kind enough to pass it in. Even this should only be a last resort. There may be a legitimate case where someone needs to wrap, replace or mock the request, but your singleton code gets in the way. How will you unit test code that accesses the request in this way? There is no real HTTP request when you're running unit tests. Dependency injection solves a lot of problems.

Sometimes you can't avoid the Singleton, but considering the mess that you've made of things so far, maybe you should ask someone smarter than you next time you think you really need a Singleton. You probably don't. Now get back to work, and when you have some time, get the original (and best) design patterns book, read it, and stop abusing them. Design patterns are meant to describe good code, not magical pixie dust that you sprinkle on to make your crappy code suddenly good. Alas, that is another post.

Thursday

JS Tips & Tricks

http://javascript.crockford.com, Secrets of the JavaScript Ninja and JavaScript: The Good Parts will cover almost everything you need to know about JavaScript. Still, there are a few tricks that I use almost every day that I feel are often underrated:
  • && and || aren't your father's boolean operators. Thanks to JavaScript's notion of truth (anything but false, null, 0, undefined and the empty string), they don't have to return boolean value, and will not unless you coerce it into one with !!. That means you can use them inline as part of assignments and in parameter lists. e.g,
    function(someParam) {
    // provide a default value
    someParam = someParam || {};
    // ...
    }

    // guard against null objects
    var count = foo && foo.count || 0; // if foo is null, count == 0, else foo.count
  • function scope:
    (function($){
    $('#foo').doSomething();
    // ...
    })(jQuery);
    The key here is that you're taking a global/wider scoped variable and making it a local variable, which allows you to rename/replace the original variable with little to no impact to your code. For example, suppose you need to use 2 versions of jQuery in your page. If you have the good sense to wrap all of your scripts in a function, you can include jQuery version A, then include your scripts that need version A, then include jQuery version B and your other scripts. Or if you wrote all of your scripts using $ as a reference to jQuery and then wanted to use Prototype in the same page.

    The other nice thing about wrapping your scripts is that any vars and function your define wont bleed into the window scope unless you specifically assign them to window. e.g., window.GLOBAL_FOO = 123;

Sorry for the short post. I have some more code heavy stuff I think I'm working on for next time, I promise.