Who ya gonna call and what ya gonna call it?

So, this is my last planned post on antipatterns, but it was this antipattern that sparked me into writing the mini-series in the first place.

It all started with Ryan and I doing some work on Idea Group Rules a few weeks back. We dived in a little, and suddenly my fury was enflamed. I noticed that this functionality was filled to the brim with my second least-favorite “type” of class, the “Helper” class. (My least favorite are “Manager” classes.)

Nearly every time that I have come across one of these “Helper” classes, the class just doesn’t mean anything.

It is just a lump of code.
It is not part of the domain.
It does not represent a function of the application.

Our Helpers are just a lump of code.

This antipattern is called the Poltergeist. C2 describes the Poltergeist antipattern as:

Unnecessary and redundant navigation paths in the course of development, highly transient associations of a particular class with another one, presence of stateless classes, occurrence of temporary and short duration objects/classes or classes that exist only to invoke other classes through temporary association

That explanation is bit wordy, but I think that the key words in there are “highly transient”, “stateless”, and “only exist to invoke other classes”. They are just little bits of code that flit in and out of existence, push some objects around mysteriously, and make little sense in terms of the rest of world around it.

Ghostbusters

Ghostbusters

I will be the first to admit that Object Oriented Design is hard to do and takes some time. I will also readily admit that we are often pushed to get things done, or feel pressured to just add that one line of code and not think about that yucky legacy code. By adding (or enhancing) these Helpers we are just increasing our code debt, and making it even harder for the next person to figure out what is going on…

For those of you who remember the last antipatterns post may wonder what is the difference between Managers and Poltergeists. Well, there is no difference really. Our Managers are just Poltergeists that are seriously out of control.

Manager and Helpers are symptoms of the same problem: a need to focus on what our TIM/HIP/IDS objects mean and how they interact with each other.

It is hard to do, but let me see if I can start with one tip to get us all moving in the right direction.

Jason’s Object-oriented design (OOD) Tip #1: Get the name right.

Think long and hard on what you are creating and make sure the concept fits into the domain of the rest of the objects. Ask somebody if the name makes sense to them. Remember that I am always there to help you if you need some to talk through it when you get stuck.

Now, I do NOT mean get the name right the first time. You may not, and that is okay. Get it right the next time.

Also, do not be so naive to think that the name will never need changing. All of our codebase is growing, and what is called an Idea today, might be a Basket tomorrow…

Naming applies to classes, methods, fields, and really anything in the code. If you see something that has gotten out of sync with the current naming standards, fix it. There are so many tools in Eclipse to make renaming trivial.

Let us try to get the names right first, and see if that starts to push through the right structures and relationships for better code.

Equals and HashCode in Java

This is one of my pet peeves. Partially because it isn’t really explained that well in Java documentation, but mostly because I’ve seen it wrong so many places. As I’ve come across this twice in the past two days in our code, I felt the need to blog about it.

Let’s look at the contract for hashCode.

  • Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
  • If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
  • It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.

The summary is “if two things are equal (according to equal()), they must have the same hashCode”. Which probably means that we should be considering the same data in the “equals” method as we are in the “hashCode” method.

What’s the deal with HashCode?

Let’s say you need a map. You want to store key/value pairs, and you want to be able to get a value via a key. How would you implement this? Well, one thing you could do is have a list of all the key/value pairs and loop through the whole list. For each key/value pair, you could see if that key matches the one you are looking for. If it does, you return that value. Hooray!

The only problem with this is that it is slow. What if you have a map of a million items? A billion? You potentially have to loop through the *whole* list. On every lookup. That makes puppies sad.

Sad puppy

How could we do better? Well, one way would be to keep a bunch of lists of key/value pairs inside your object. When you add a key/value to your map, you put it in the “right list” somehow. Then when you go to get something from your map, you look in just the right list, and only have to go through all the key/values in that list. If you have a lot of lists, each list should hold very little, so there isn’t much to search through. Now you’re fast. Yay!

What is this magic “somehow”? This is the HashCode. It’s an int that lets you say what “bucket” to stick your key/value pair into.

The HashCode/Equals connection

This “bucketizing” is why equals and hashCode have to match up. Let’s look at an example:

Map capitals = new HashMap();
capitals.put("Alaska", "Juneau");
capitals.put("Arkansas", "Little Rock");
capitals.put("Virginia", "Richmond");
String foo = capitals.get("Virginia");

Let’s imagine for a second that HashMap made 26 buckets, and hashCode returned a number based on the first letter of the string. So, the “A” bucket would have two items in it and the “V” bucket would only have one. When we try to get the capital of Virginia, the hash map would know to only look in the “V” bucket. If we were to ask for the capital of Nevada, it would go straight to the “N” bucket, see it was empty, and return null. For getting the capital of Arkansas, it would go to the “A” bucket and through everything in that bucket, calling equals() on each key.

So that means that if two things are equal but don’t have the same hashcode, HashMap might not look in the right bucket, and therefore wouldn’t find the key in the map, even though it is really there.

Real example
Now let’s look at a real example (only slightly obfuscated for publication):


Message upper = new Message("I like e. e. cummings!");
Message lower = new Message("I like E. E. Cummings!");

HashMap myMap = new HashMap();
myMap.put(upper, "PoetryFan");
System.out.println(myMap.get(lower));
System.out.println(upper.equals(lower));


which outputs:

null
true


So what this means is that these two Messages are equal, but the HashMap doesn’t think so. BAD!!!

The implementation of this is:

    
public boolean equals(Object obj) {
    if (!(obj instanceof Message)) {
        return false;
    }
    if (!caseSensitiveCompare()) {
        return StringUtils.nullSafeEqualsIgnoreCase(
                 this.getString(), 
                 ((Message) obj).getText());
    } else {
        return StringUtils.nullSafeEquals(
                 this.getString(), 
                 ((Message) obj).getText());
    }
}
    
public int hashCode() {
    return string.hashCode();
}

HashCode is hard – let’s go shopping!

As we can see from the above example, it is very easy to get this stuff wrong. Here, the “case insensitivity” was added later, but the hashCode method wasn’t updated to match. Any time I touch anything having to do with equals/hashcode, I have someone else review what I did. We should all probably do that.

So, what makes a good hashCode method? I’ll leave that for the subject of a future post, but for now it suffices to to say that first and foremost it should be correct as stated above. Making it good is a bit harder.