Code Debt

I went to a really neat workshop on ‘Code Debt’. The idea behind this term (as I see it) is that you should constantly maintain your software. When you make a change but don’t take the time to refactor the code to play well with your change, you incur code debt. You didn’t ‘pay the price’ at the time, so it stays around. As I read on some blog once, developers understand what this phrase means as soon as they hear it, even if they’ve never heard it before.

It’s really easy to incur code debt. Who hasn’t had to fix a bug where adding the one-off change takes a minute, but “doing what you should do” would take hours? You have some edge case that just doesn’t fit the way the rest of your abstraction works. You’re releasing soon. Hey, it happens.

But just like carrying a monetary debt costs you over time, so does carrying a code debt. In the workshop, they illustrated this in a very vivid way. They broke the class into two groups, and asked each group to add a small piece of functionality to a bit of code. They gave us all party poppers and told us to pull them when we finished. Within five minutes, people on team A started popping their poppers. No one on team B did. Afterwards, they showed us the two code bases, and we all knew why – team B’s starting code was *awful*. I mean, really bad.

I should stress that both code bases were developed using TDD. Both had tests that fully passed (except for the cases where we were adding the new stuff). But when the presenters had written the ‘B’ code, they started from one requirement, and as they added new ones, they just added new cases – no refactoring. They didn’t really change existing code – just tacked on more stuff at the end of the function. They took us through the history of developing it. In each case, you could sort of tell where they were coming from. “This is just a little different from the other case – I’ll just add a boolean to change between the two cases.” By the end, you couldn’t make heads or tails of it.
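
To give a flavour of that style (my own hypothetical sketch, not the workshop’s actual code), imagine a price calculation that grew a flag for every new requirement:

# Hypothetical sketch of the 'B' style: every new requirement is
# another flag and another branch tacked onto the end.
def price(amount, is_member, is_sale, is_staff, use_new_rounding):
    total = amount
    if is_member:
        total = total * 0.9
    if is_sale:
        total = total * 0.8
    if is_staff and not is_sale:
        # the staff discount replaces the others, but only outside a sale
        total = amount * 0.7
    if use_new_rounding:
        total = round(total, 2)
    return total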

The ‘A’ code, on the other hand, was really clean. They had added functionality one bit at a time (just like ‘B’), but when things got complicated, they refactored to make things clearer. They got rid of duplication. They abstracted out the bit that changed between the cases. And the code at the end was great. Moreover, it was easy to modify. What they showed us was that while they may have been able to write the ‘B’ code faster, that productivity wasn’t free – we *had* to pay that debt at some point.
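
By contrast (again, my own hypothetical sketch), the ‘A’ style of the same calculation pulls the bit that varies – the discount rule – out into one place, so each new requirement becomes an entry rather than another branch:

# Hypothetical sketch of the 'A' style: the part that changes
# between cases (the discount rate) is abstracted out, so adding
# a case means adding an entry, not another branch.
DISCOUNT_RATES = {'member': 0.9, 'sale': 0.8, 'staff': 0.7}

def price(amount, customer_type):
    rate = DISCOUNT_RATES.get(customer_type, 1.0)
    return round(amount * rate, 2)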

They went on to describe a continuum on this debt scale – from “aware of debt” to “critical of it” to “addressing it” to “managing it”. The last stage was “unencumbered by it” – the state in which no code debt holds developers up from making changes. Which I think was pretty clearly illustrated by the exercise.

I really like the code debt metaphor, because I think it makes things clear to people, even outside of development. Telling someone that the code “smells” doesn’t really mean anything to them. Telling them that we are incurring debt is a lot clearer. Plus it is really true – changing the code later really does cost more because of the debt.

Greetings from SPA

I’ve been meaning to post since Sunday night, but I don’t think I’ve had a single minute of down time since I’ve gotten here. Even just hanging out after dinner I’ve either been in discussions with people or hacking on code. Or drumming.

So, greetings from SPA – the Software Practice Advancement conference.

Sunday

I got here on Sunday and went to a session on “Functional Finance”. The idea for the session was to introduce the functional language O’Caml, introduce a problem in the financial domain, then solve the problem in the language. As I work in financial software and have developed a renewed interest in functional programming as of late, it seemed like a great choice.

First, let me say that I’ve been doing some Haskell programming as of late, so I figured O’Caml wouldn’t be that hard – I’d just focus on the ‘finance’ side of the coin. However, what I ended up doing was continuously trying to write Haskell in O’Caml. Which almost works, for a little bit. Then you get into what I think is one of the wackier syntaxes I’ve encountered. And then you start cursing at the screen.

I think this session would have been better if it had been longer. The functional paradigm is new to a lot of people, and O’Caml is weird enough that most of us needed more time to really get a handle on things. Also, while the domain problem was interesting, I think it was really over the heads of most people there (me most definitely included). So when we implemented part of the problem based on some C/Python source, I didn’t really understand what the code I was implementing did – I just translated C into O’Caml. And while I did clean up that code a bit, I didn’t really feel like what came out the other side was ‘better’. Though after half an hour of pulling my hair out trying to get it to compile, it did run correctly on the first try.

I’ll post more soon.

Web Server in Python

Wow! I thought Jetty was easy to set up, but a web server in Python takes the cake.

I need a server that responds with the same file no matter what GET request comes in – about the simplest server imaginable. And here it is:


from os import curdir, sep
from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer

# Very simple web server that serves lastrun.txt (the time of the
# last run) for every GET request, whatever the path
class MyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        try:
            f = open(curdir + sep + "lastrun.txt")
            self.send_response(200)
            self.send_header('Content-type', 'text/plain')
            self.end_headers()
            self.wfile.write(f.read())
            f.close()
            return
        except IOError:
            self.send_error(404, 'File Not Found')

try:
    server = HTTPServer(('', 9090), MyHandler)
    print 'started httpserver...'
    server.serve_forever()
except KeyboardInterrupt:
    print '^C received, shutting down server'
    server.socket.close()

Twenty-five lines, and I didn’t even need to import anything outside the standard library!
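
To check it really does ignore the path, a quick probe from another Python prompt should get the same contents back for any URL – something like:

import urllib
# Whatever the path, the response is the contents of lastrun.txt.
print urllib.urlopen('http://localhost:9090/any/path/at/all').read()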

Performance tuning sucks

So, I have been doing a lot of tuning lately, and I will be the first to admit it. Performance tuning sucks, even here at YouDevise.

Let me start by telling you the reasons why performance tuning does NOT suck at YouDevise.

  • We DO have the smarts. Every developer (and most of the non-developers) can solve these performance problems once they understand the nature of the issue.
  • We DO have the tools. Or if we don’t today, Management (i.e. Rich/Colin/Squirrel) is sensible and can give us access to the tools tomorrow.
  • We DO have the diagnostics. Over the years, YouDevise has built some nice diagnostic tools into and around IDS (like the Task Performance report, PerfManager, System Monitors, etc.) that give us great insight into problems when they appear.

Now, why does performance tuning suck at YouDevise?

  • Our application features keep changing. The easiest way to keep performance constant is to not change the code. If no one adds new code that changes the behavior of the previously tuned screen or process, then the previous optimization will not be lost. Now this isn’t fool-proof…
  • Our application’s data profile does not stay constant. A little over a year ago, when we last tuned the main Trade Idea review screen in the TIM, the heaviest user of the system was viewing 500 ideas at a time. Today, we have users regularly viewing 5000+ ideas at a time. An order-of-magnitude difference in data volumes warrants another round of tuning (because our customers always assume constant performance). There are many reasons for the order-of-magnitude jump: simply more users, and the network effects of more TIM users (i.e. the potentially exponential increase of TIM data), but…
  • Our applications are transforming. The TIM and the HIP were both initially designed to be applications quite different from what they are today:
    • The TIM was going to be a simple trade tips system that acted as a basic central repository and transport mechanism. Now, it is morphing into a multi-purpose solution for portfolio management, feeding algorithmic systems, a compliance back-end, and a research tool. Oh, AND a trade tips system.
    • The HIP was initially conceived as a simple portfolio management tool that facilitated communication of position information between fund of funds and hedge funds. That last part has been mostly forgotten, and now the HIP is a full-fledged middle office system with performance, liquidity, and investor relations tools that is being used as a central system for fund of funds and their administrators.

I guess what I am trying to say is that performance tuning a moving target is hard, even with smart guys and good tools.

We will never win the war against performance problems.

There is no single, super-duper strategy that will make performance problems go away. The battlefield keeps changing. What we are fighting for and against keeps changing. You can win one battle, which causes you to lose two other ones.

What are we to do? Even though we will never win, we still must fight this battle.

  • Shore up your defenses. Any ground that you gain you should defend. If you tune a screen to open in 3 seconds for a certain 1000 rows, then you had better create a test that confirms it. Even better, automate that test case and throw it into continuous integration (there is a sketch of one after this list). Then you will know when that one line of code is added that breaks your optimization.
  • Gather as much information as you can about the enemy. Blindly attacking is worse than not attacking at all. Gather metrics and numbers. Never assume that you “know” what the problem is. Make sure you have something (slow SQL queries, profiler results, log files, etc.) to back up your performance optimization before you start on it.
  • Take it one battle at a time. Attack today’s enemies, not the ones you may have tomorrow. Wait for the performance bottleneck to appear. Do not assume that your code needs some fancy Hibernate second-level caching until you have some proof that those DB hits are there.

    Now, there needs to be some common sense applied to this rule. This does NOT mean that you should ignore the performance of your code. Just keep in mind that any optimization generally adds complexity to your code and therefore decreases its maintainability. There is a balance to be had.
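
As a minimal sketch of the kind of test the first bullet describes (the 3-second budget and 1000 rows come from the example above; open_review_screen is a hypothetical stand-in, not our real harness):

import time
import unittest

def open_review_screen(rows):
    # Hypothetical stand-in: call the real screen or report here.
    pass

class ReviewScreenPerformanceTest(unittest.TestCase):
    def test_opens_1000_rows_within_three_seconds(self):
        start = time.time()
        open_review_screen(rows=1000)
        elapsed = time.time() - start
        self.assertTrue(elapsed < 3.0,
                        'screen took %.2fs, budget is 3.00s' % elapsed)

if __name__ == '__main__':
    unittest.main()

Run in continuous integration, a test like this starts shouting on the very day someone’s one-line change blows the budget.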

Unfortunately, all of this is the nature of the business that we are in. If the TIM and the HIP stop innovating and stop providing value to our customers, then we are sunk. So, we need to accept change and realize that we are just going to need to be flexible and agile. Solve the performance problems when they appear and make sure they stay fixed.

The Release Rush

Two blog posts (The Crunch Mode Paradox – Turning Superstars Average and Exception Handling in Software) really reminded me of something this past week.

Fortunately, at YouDevise we have a very strict No Death March policy. Working 40-hour weeks is company policy. (You don’t want to see Squirrel angry, do you?) But, in our past, we have seen the Death March’s sneaky cousin, the Rushed-Out Feature/Bugfix.

The warning signs are often a reasonable (or unreasonable) “customer deadline” coupled with too much other “high priority” work. The developer usually gets the feature assigned to them too late to do the correct thing and spend the time on it that it needs.

Now, all of our developers are smart guys, and they will usually figure out a way to squeeze the feature out in time, but it will always be at the cost of rotting the code around it. Being a smart guy, the developer has good intentions and plans to fix it later. Inevitably though, once the feature is “complete”, all of the refactoring, testing, and clean-up gets lost in the priorities shuffle.

We need to be diligent and focus on doing things correctly, not simply fast. All of us have stories of leftover code that was just slopped together to get it out the door quickly. Then, many months later, the bugs surface, and we end up spending many times more effort to spelunk the code, figure it out, and find a fix that doesn’t cause more problems. And that doesn’t even count paying off the original technical debt – the refactoring, testing, and clean-up that were skipped in the first place.

We need to be diligent and make completing features (and bugfixes) correctly our first priority. Trust me: if we make all of our code changes correctly, making them quickly will usually follow right behind.