Getting rid of unused code with UCDetector

Introduction

The Java code base of TIM Funds (one of our products here at TIM Group) is around seven years old and in very good shape, considering its age. (This is a highly subjective statement as I joined the company less than a year ago. I am comparing with other Java projects of similar age and size from previous work experience.)

Nevertheless, cruft inevitably builds up over the years, so constant effort is needed to keep it down to a minimum. To that end, we follow the boy scout rule while developing new features – trying to check in code that is a little cleaner than it was when we found it. Furthermore, we have made room for a separate ongoing track of work named Sustainability, which is intended for concentrated efforts on removing problems that we have spotted in the current code or on keeping our automated tests fast and flicker-free.

A couple of weeks ago, I was working on the Sustainability track, trying to refactor CollectionUtils, an old custom utility class with lots of public static methods that we’d like to remove. While doing this, I noticed that many methods had become completely unused over time, probably because developers had replaced calls to them with Guava, our collection library of choice. I found myself repeatedly pressing Ctrl-Shift-G (the Eclipse shortcut for finding references) to see what could be safely deleted. After a short while I got tired of this and assumed that somebody must already have had the same problem. As usual, Stack Overflow knew the answer – several posts around this topic mentioned UCDetector, an Eclipse plugin which detects unnecessary public Java code. It looked like what I needed, so I gave it a try.

Configuration of UCDetector

My first impressions after installing UCDetector were mixed – although I ran it on a package structure containing only around 50 classes, it was quite slow and produced an overwhelming number of warnings, with many false positives. (For example, code called only by means of reflection cannot be detected reliably.) The default settings of the plugin are quite broad: they try to find many possible problems at once, but this way you can’t see the wood for the trees.

Fortunately, the plugin can be configured in many ways. It is worth spending some time finding appropriate settings for your project (in Window→Preferences→UCDetector). The suggestions below should get you started:

  • Ignore all warnings for possible use of final and for visibility (tab Keywords). Your mileage may vary, but these checks increased the number of warnings considerably and gave some false positives for me – I consider this a slightly different aspect of cleanup and would rather do it separately.
  • Ignore code with annotation Override (tab Ignore, section Marked code). This is a compromise that might result in some truly unused code not being detected. You may want to try leaving this out, but without it I got a lot of false positives from the detection of code only called by tests. For example, we implement Guava’s Function interface quite a lot and only call its apply method directly from tests.
    • If using Guice, add Inject to the list of ignored annotations (these members are called by the Guice injector).
    • If using Jersey, add GET, Path, etc. to the list of ignored annotations (these methods are called reflectively by Jersey).
    • Same goes for other annotation-based third-party libraries – the sketch after this list shows why such annotated code looks unused to a static scan.
  • Refine the settings by scanning some packages, examining the warnings, deleting some code and running your tests – you may find more incorrect warnings and need to extend the configuration.
    • For example, I had to ignore all classes extending our custom class PersistableEnumeration (tab Ignore, section Classes), because methods of sub-classes will be called reflectively via Hibernate mappings.
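
To see why these annotation exclusions matter, here is a small hypothetical sketch of my own (PingResource and its Clock dependency are made-up names, not from our code base) of a class that has no callers anywhere in the source, yet is very much alive at runtime:

    import java.time.Clock;
    import javax.ws.rs.GET;
    import javax.ws.rs.Path;
    import com.google.inject.Inject;

    // Nothing in our own source references the constructor or ping(), but
    // Guice calls the constructor and Jersey invokes ping() reflectively
    // for GET /ping – so to a static scan both only *look* unused.
    @Path("/ping")
    public class PingResource {

        private final Clock clock;

        @Inject                    // invoked by the Guice injector
        public PingResource(Clock clock) {
            this.clock = clock;
        }

        @GET                       // invoked reflectively by Jersey
        public String ping() {
            return "pong at " + clock.instant();
        }
    }

Without the exclusions, UCDetector would report both the constructor and ping() as unused.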

After these initial adjustments had been made, UCDetector worked really well for me. The scanning process finished significantly faster and the number of false positives was greatly reduced. I found UCDetector’s feature to warn about code only called by tests especially useful, because these cases are even easier to overlook when checking manually. Two short usage recommendations:

  • For bigger projects, prefer scanning package by package, not everything at once. This way, you get faster feedback and can concentrate on one area of the code (different packages might even need different settings). It is a good idea to keep the check-ins small anyway – I broke the build once or twice when starting out…
  • After removing some code, scan the same package again because more code might have become unused (domino effect).

Findings

My personal “kill count” after two or three hours spent deleting code (spread across two weeks, using the very restrictive configuration mentioned above, and only covering under 50% of the code base): over 40 complete files (classes/interfaces), over 60 public methods, and even one database table (the corresponding Hibernate entity implemented an interface whose methods were all unused).

Although the number of deleted files was surprisingly high for me, I still stand by my statement that the code base is in good shape for its age. So how come there is so much unused stuff anyway? Looking at the source control history for the findings, it seems there are three main scenarios that are likely to leave unused code around:

  • Removal of functionality: Like everywhere else, requirements for TIM Funds change over time. Some features become obsolete and need to be removed. It is easy to overlook deleting some service that was only used by the obsolete feature, because neither Java nor the IDE gives you any warning. The one unused database table belongs in this category.
  • Bigger refactorings: So far, I have had the most “success” deleting code within our file import component. This is an area of the code that people were never completely happy with, and different people had different opinions about how to improve it. Consequently, several attempts to improve the code were started. Now there is a “new way of doing it”, but several “old ways of doing it” had been left lying around. The same was true on a smaller scale in other areas as well.
  • Introduction of new libraries/frameworks: After we introduced Guava and started replacing calls to CollectionUtils, these methods gradually became unused. Another case is replacing the functionality of one third-party library (Apache Commons) with a different one (Guava), which will likely make the old JAR unused over time. We still have to tackle this particular task for our code base; other tools like Classpath Helper are needed to fully help us here.

In hindsight, all this might sound rather obvious, but it’s worthwhile keeping these three scenarios above in mind. They provide good opportunities to start up UCDetector and delete some unused code, which is always useful, often quite educational, and sometimes even fun.

Double considered harmful

or: How the number 1.2 cramped my style.

So, everybody who paid attention in college CS knows this. Heck, I didn’t always pay attention, and I knew it, but it still caught me by surprise when I hit my head on this issue in the real world.

Recapping the background: The double (as we know it in Java) is short for “double precision binary floating-point format”, and it is the float’s more precise big brother. Floats and doubles represent a trade-off where precision is traded for the ability to store them in a fixed amount of space and to perform very fast arithmetic on them. Basically a float or a double is two separate numbers, called a significand and an exponent, which are then subjected to some mathemagic to yield any of a very wide range of numbers.
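
To make the “two separate numbers” a little more concrete, here is a throwaway sketch of my own (not from our code base) that pulls the significand and exponent out of a double using the JDK’s bit-level view of it:

    public class DoubleAnatomy {
        public static void main(String[] args) {
            long bits = Double.doubleToLongBits(1.2);

            long significandBits = bits & 0x000FFFFFFFFFFFFFL;    // low 52 bits
            int exponent = (int) ((bits >>> 52) & 0x7FF) - 1023;  // unbiased

            // For a normal double: value = (1 + significandBits / 2^52) * 2^exponent
            System.out.println("significand bits: " + significandBits);
            System.out.println("exponent: " + exponent);           // 0 for 1.2
        }
    }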

If you don’t need the floating point – and often you don’t – you’re better off storing your numbers as fixed-point (e.g. the DECIMAL column type in most databases). Unfortunately Java (and many other not-so-modern languages) doesn’t have such a type natively.

So doubles aren’t perfectly precise, we know that, but that’s only for large and weird numbers, right? No. For one, a double can’t be 1.2. You wouldn’t know that if you didn’t look for it, because when you print doubles, Java rounds them off for you. And rightly so, the error is exceedingly small: had this been 1.2 meters, the error would amount to a fraction of the size of an atom. Obviously, this is perfectly fine – right up until the point when it’s not.
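
If you want to see this for yourself, here is a minimal snippet of my own: println shows the rounded value, while constructing a BigDecimal from the double preserves the exact binary value the double actually holds.

    import java.math.BigDecimal;

    public class NotQuiteOnePointTwo {
        public static void main(String[] args) {
            double d = 1.2;

            // Double.toString rounds for readability
            System.out.println(d);                  // 1.2

            // the exact value stored in the double
            System.out.println(new BigDecimal(d));
            // 1.1999999999999999555910790149937383830547332763671875
        }
    }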

Our application dabbles quite a bit in arbitrary precision. We (among other things) deal in sequences of daily return ratios and foreign exchange – numbers that are notoriously unwilling to just be nice and round. By the time we get around to adding up your daily FX-adjusted returns for the past year, the error on a simple double-value starts to add up to real money, so we implement these numbers as BigDecimals (specifically our own wrapper for them that adds some useful functionality, including the ability to manage fractions).

This is all very well – until someone decides to stick something like 1.2 into a BigDecimal. Recalling that printing 1.2 looks fine, you’d be excused for assuming that using the double constructor of BigDecimal is fine too. But it’s not, and this has potentially disastrous consequences.

Consider this simple method:


    // Repeatedly add one and multiply by the multiplicand.
    public BigDecimal iterate_multiplication(int iterations,
                                             BigDecimal multiplicand) {
        BigDecimal bd_out = BigDecimal.ONE;
        for (int i = 0; i < iterations; i++) {
            bd_out = bd_out.add(BigDecimal.ONE).multiply(multiplicand);
        }
        return bd_out;
    }

And these two invocations – on the surface they are equivalent:


iterate_multiplication(1000, new BigDecimal(1.2));
iterate_multiplication(1000, new BigDecimal("1.2"));

(BigDecimal will parse a decimal number from a string – IMHO the easiest way to get an exact value into a BigDecimal.)

The latter invocation returns well over 1000 times faster. Three full orders of magnitude. 15 milliseconds in place of 20 seconds. This is because the former is called with an argument that, behind the scenes, has 52 decimal places, which, obviously, is significantly more complicated to multiply.

The two invocations return rather different numbers, but that is due to iterating the multiplication, which amplifies the error of the double. Both numbers have 81 digits before the decimal point and share the first 13. The really interesting – but at this point hardly surprising – difference is after the decimal point: the former has 52,000 digits after the decimal point, the latter just 1,000.
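
To see where those digit counts come from, here is a small sketch of my own: multiply on BigDecimal returns a result whose scale (the number of digits after the decimal point) is the sum of its operands’ scales, so every pass through the loop above adds the multiplicand’s scale to the running total.

    import java.math.BigDecimal;

    public class ScaleGrowth {
        public static void main(String[] args) {
            System.out.println(new BigDecimal(1.2).scale());    // 52
            System.out.println(new BigDecimal("1.2").scale());  // 1

            // 1000 multiplications therefore end with a scale of
            // 52 * 1000 = 52,000 versus 1 * 1000 = 1,000
        }
    }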

Even though fortune-cookie conclusions are often over-general, I will venture into that territory: Don’t ever construct a BigDecimal with a double. It doesn’t do what you think it does.
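
One footnote of my own: if a double is genuinely all you have to start from, BigDecimal.valueOf(double) is the less surprising route – it goes through Double.toString, so it captures the short, rounded value you see when printing rather than the full binary expansion.

    import java.math.BigDecimal;

    public class SaferConversion {
        public static void main(String[] args) {
            System.out.println(new BigDecimal(1.2));      // 1.1999999999999999555910790...
            System.out.println(BigDecimal.valueOf(1.2));  // 1.2
            System.out.println(new BigDecimal("1.2"));    // 1.2
        }
    }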

Lightning Talks

After my recent enthusiasm on the topic, I organized a “lightning talks” session over lunch today. It went even better than I hoped!

We had five presentations:

  • I talked about Testability Explorer
  • Paulo talked about a script he wrote to convert Selenium Core tests to Selenium RC tests
  • Jason talked about “Java Puzzlers vs. Checkstyle” (how Checkstyle can help us catch the type of issues presented in the book)
  • Julian talked about fuzzy logic
  • Andy talked about ‘horizontal vs. vertical slicing’

All of them went really well – I definitely enjoyed the whole thing. We did a quick retrospective afterwards and decided we would start doing them every week, just blocking out half an hour over lunch. We liked the format and are going to stick with it. Even the ‘hold off all questions until the end’ rule went over well – it spurs conversations after the meeting.

Overall, I consider the experiment a success!

Lightning Talks Are Go!

More lightning talks goodness today:

  • Andy talked about how our calculation framework relates to map/reduce
  • Joel talked about alternative ways we could express calculations
  • Jason showed us that XSL has cool dispatch ability
  • I blabbed on about JProbe

I didn’t know beforehand what people were going to talk about, so getting two talks on our calculations was a bit of a surprise – a pleasant one, though. Also, we decided we want to add a “talks we’d like to see” section to the wiki (I added it on the LightningTalks page), so if you want an idea for something to talk about, look there!