Distributed Pair Programming @ TIM Group with Saros

There’s a particular technology used within TIM Group that has been too useful for too long to continue to go uncelebrated. The tool is called “Saros“, it is an Eclipse plugin for distributed pair programming that is used extensively at TIM Group.

Why We Need A Distributed Pairing Solution

We have developers spread across London and Boston offices, and a few “Remoties” working from home in various far flung lands. As one of said Remoties, I believe I would not have the opportunity to continue to work at TIM Group and live where I want to live (hint: not London) if it not for Saros.

Having been entirely colocated just a few years ago, our choice to use pair programming by default didn’t really have technical limitations. With an extra seat, keyboard and mouse at every developer’s workstation, the barrier to pairing was low. Just pull up a chair and pair. When we first started having distributed developers, we didn’t want to give up on pairing for technical reasons, so the search began for how to adapt pair programming to a distributed environment.

Why Other Solutions Don’t Match Up

An oft-cited solution for this problem is to use a screen sharing technology and remote access technology. There’s plenty to choose from: Windows’ Remote Desktop; TeamViewer; LogMeIn; GoTo MyPC; VNC; NX; etc. However, we found them all far too limited for pairing. Just like colocated pairing, there’s only one cursor, so only one person can use the keyboard and mouse at any time. Unlike colocated pairing, there’s a large bias towards the person hosting the session, as they get much faster feedback from typing, since they don’t have to suffer the latency. When it’s (figuratively) painful to type, it’s much easier to shy away from being the “driver”, which hinders the collaboration. We never found a solution that reduced that latency to an acceptable level.

Another cited solution is to use a terminal editor like tmux, that does not have the overhead of screen sharing. The feedback when typing is much better, however, there’s one major drawback: being limited to terminal editors only. I’ve seen people whose terminal environments are well suited for developing in a language like Ruby or JavaScript. For those of us coding in Java and Scala, we didn’t want to give up the powerful features we appreciate in our IDE, so tmux is not suitable.

Saros To The Rescue

Thankfully, we discovered Saros, a research project from the University of Berlin. The most relatable way I’ve found to describe it is as:

Google Docs within the Eclipse IDE

It works by connecting two or more developers through Eclipse, so that when Alice enters some text, Bob sees it appear in his editor. The experience for both users is as if they were editing files locally[0]. Rather than sharing the image of a screen, edit commands are serialised and sent over the wire, changing the other participant’s local copy. This comes with several other benefits over remote access technologies:

  • the feedback when typing is instant, for both parties
  • the latency for seeing your partner’s keystrokes is much lower than when transmitting an image of the screen[1]

There are even benefits over colocated pairing:

  • neither the host nor guest has to leave the comfort of their familiar work environment to pair; you can set up fonts and shortcuts however you like
  • since you are both editing as though it was local, each participant has their own cursor, and can be editing different files, allowing ad-hoc splitting of tasks[1], which can be very handy
  • you can have more people involved (up to 5) which we’ve used for code review and introducing a project to a group of people, sparing the discomfort of hunching around a single desk

There are other distributed pairing solutions, such as Cloud9 IDE, and Kobra.io, but none (that we’ve found) that let you stick with your own IDE setup, retaining parity with how you would develop locally.

There are of course drawbacks to distributed pairing, regardless of technology, which I’m not going to cover here; they’re generally not problems Saros, or any technology, will be able to solve.

IntelliJ Support Is On The Roadmap

After a long search, we have not found anything that really compares with Saros. For me, it really is a killer feature of Eclipse. So much so that I’ve basically kept IntelliJ in a dusty old drawer, even when it would be better for certain tasks (like Scala development). Fortunately, it looks like that’s about to change. Work has just begun to port the Saros platform to IntelliJ, which is really exciting. Whenever I’ve talked to people about Saros, alternative IDE support inevitably arises. If the core Saros technology was agnostic of the IDE, it could be a huge leap forward for collaborative development in general.

At TIM Group we were so keen on the idea that a handful of us spent a “hack week” throwing together the first steps towards an IntelliJ plugin for Saros. We were able to demonstrate a proof-of-concept, but didn’t get anything truly viable. Having brought this effort to the attention of the Saros team, I hope that in some small way it inspired them to start work on it, but I doubt that’s something we can take credit for. Hopefully, during the development of the IntelliJ plugin there will be something that we can contribute, and give something back for our many hours of happy usage.

If you’re looking for an answer to the problem of distributed pairing, I heartily endorse Saros!


[0] for more details of the theory behind this aspect of collaborative editors, see http://en.wikipedia.org/wiki/Operational_transformation
[1] we have acquired a habit of being vocal about whether we are in Driver/Navigator mode, and if your pair is following you, since you can’t assume they are

Released scalaquery-play-iteratees 1.1.0

Announcing the release of scalaquery-play-iteratees version 1.1.0.

This version is for use with Slick 2.0.0+ and Play Framework 2.2.0+ on Scala 2.10.x.

This library provides the “glue” between Play’s concept of Iteratee and Slick’s Query, making it super-easy to process chunks of database query results in a reactive fashion.

Please leave us a comment if you find this library useful — we are in the process of porting it to be part of the play-slick project, and welcome any and all input on syntax, documentation, usage, etc!

TopicalJS: A rose amongst the thorns?

We have a codebase in semi-retirement to which we occasionally have to make limited changes. With active codebases, you are continuously evaluating the state of the art and potentially upgrading/switching third-party libraries as better solutions become available. With inactive codebases, you live with decisions that were made years ago and which are not cost-effective to improve. However, not all old code is bad code and there are some really excellent patterns and home-grown tools within this codebase that have never seen light outside of TIM Group.

It was a great pleasure, therefore, to be able to extract one such gem and make it open-source during the last round of maintenance programming on this codebase. We’ve called it TopicalJS, mostly because this was one of the few words left in the English language not already used for a JavaScript library. Topical is concerned with the management of events within a client-side environment (or indeed server-side if you run server-side JavaScript).

This old codebase uses Prototype and YUI on the front-end and a custom event-passing system internally (which is not very inspiringly) called the “MessageBus”. Our newer codebases use Underscore, jQuery, and Backbone. Backbone comes with its own event system which is built into every view, model, and collection. You can raise an event against any of these types or you can just use a raw Backbone Event instance and use it to pass around events.

Without Backbone, and in fact before Backbone existed, we invented our own system for exchanging events. Unlike Backbone all it does is exchange events. So, it could even be used to complement Backbone’s event system. It’s best feature is that it encourages you to create a single JavaScript file containing all of the events that can be fired, who consumes them, and how they get mapped onto actions and other events. This is effectively declarative event programming for JavaScript, which I think might be unique.

You use it by creating a bus and then adding modules to that bus. These modules declare what events they publish and what events they are interested in receiving. When a module publishes an event it is sent to every module, including the one that published it. Then, if that module subscribes to that event type, its subscription function will be called along with any data associated with the event. Events can be republished under different aliases, multiple events can be awaited before an aggregated event is fired, allowing easy coordination of multiple different events.

As an example, this is what bus configuration code might look like.

topical.MessageBus(
  topical.Coordinate({ 
    expecting: ["leftTextEntered", "rightTextEntered"], 
    publishing: "bothTextsEntered" }),

  topical.Republish({ 
    subscribeTo: "init", 
    republishAs: [ "hello", "clear"] }),
  topical.Republish({ 
    subscribeTo: "reset", 
    republishAs: "clear" }),

  topical.MessageBusModule({
    name: "Hello",
    subscribe: {
      hello: function() {
        alert("This alert is shown when the hello event is fired");
      }
    }
  }))

The Coordinate module waits for two different text boxes to be filled in before firing and event saying that they’re both present. The Republish modules raise the hello and clear event, ultimately causing an annoying alert to be shown, as well as providing another module the ability to react to the clear event and flush out any old data.

Full documentation and a worked example is available here: https://github.com/youdevise/topicaljs.

Feedback or contributions are most welcome.

Report from DevOpsDays London 2013 Fall

This Monday and Tuesday a few of us went to DevOpsDays London 2013 Fall.

We asked for highlights from every attendant and this is what they had to say about the conference:

Francesco Gigli: Security, DevOps & OWASP

There was an interesting talk about security and DevOps and a follow up during one of the open sessions.
We discussed capturing security related work in user stories, or rather “Evil User Stories” and the use of anti-personas as a way to keep malicious users in mind.
OWASP, which I did not know before DevOpsDays, was also nominated: it is an organization “focused on improving the security of software”. One of the resources that they make available is the OWASP Top 10 of the most critical web application security flaws. Very good for awareness.

Tom Denley: Failure Friday

I was fascinated to hear about “Failure Fridays” from Doug Barth at PagerDuty.  They take an hour out each week to deliberately failover components that they believe to be resilient.  The aim is not to take down production, but to expose unexpected failure modes in a system that is designed to be highly available, and to verify the operation of the monitoring/alerting tools.  If production does go down, better that it happens during office hours, when staff are available to make fixes, and in the knowledge of exactly what event triggered the downtime.

Jeffrey Fredrick: Failure Friday

I am very interested in the Failure Fridays. We already do a Failure Analysis for our application where we identify what we believe would happen with different components failing. My plan is that we will use one of these sessions to record our expectations and then try manually failing those components in production to see if our expectations are correct!

Mehul Shah: Failure Fridays & The Network – The Next Frontier for Devops

I very much enjoyed the DevOpsDays. Apart from the fact that I won a HP Slate 7 in the HP free raffle, I drew comfort from the fact that ‘everyone’ is experiencing the same/similar problems to us and it was good to talk and share that stuff. It felt good to understand that we are not far from what most people are doing – emphasizing on strong DevOps communication and collaboration. I really enjoyed most of the morning talks in particular the Failure Fridays and the The Network – The Next Frontier for Devops – which was all about creating a logically centralized program to control the behaviour of an entire network. This will make networks easier to configure, manage and debug. We are doing some cool stuff here at TIM Group (at least from my stand point), but I am keen to see if we can toward this as a goal.

Waseem Taj: Alerting & What science tells us about information infrastructure

At the open space session on alerting, there was a good discussion on adding context to the alert. One of the attendee mentioned that each of the alert they get has a link to a page that describes the likely business impact of the alert (why we think it is worth getting someone out of the bed at 3am), a run book with typical steps to take and the escalation path. We have already started on the path of documenting how to respond to nagios alerts, I believe expanding it to include the perceived ‘business impact of the alert’ and integration with nagios will be most helpful in moments of crisis in the middle of night when the brain just does not want to cooperate.

The talk by Mark Burgress on ‘What science tells us about information infrastructure’ indeed had the intended impact on me, i.e. I will certainly be reading his new book on the subject.

You can find all the Talks and Ignites video on the DevOpsDays site.

High Availability Scheduling with Open Source and Glue

We’re interested in community feedback on how to implement scheduling. TIM Group has a rather large number of periodically recurring tasks (report generation, statistics calculation, close price processing, and so on). Historically, we have used a home grown cyclic processing framework to schedule these tasks to run at the appropriate times. However, this framework is showing its age, and its deep integration with our TIM Ideas and TIM Funds application servers doesn’t work well with our strategy of many new and small applications. Looking at the design of a new application with a large number of scheduled tasks, I couldn’t see it working well within the old infrastructure. We need a new solution.

To provide a better experience than our existing solution, we have a number of requirements:

  • Reliability, resiliency, and high availability: we can’t afford to have jobs missed, we don’t want a single point of failure, and we want to be able to restart a scheduling server without affecting running jobs.
  • Throttling and batching: there are a number of scheduled tasks that perform a large amount of calculations. We don’t want all of these executing at once, as it can overwhelm our calculation infrastructure.
  • Alerting and monitoring: we need to have a historical log of tasks that have been previously processed, and we need to be alerted when there are problems (jobs not completing successfully, jobs taking too long, and so on). We have an existing infrastructure for monitoring and alerting based on Logstash, Graphite, Elastic Search, Kibana and Nagios, so it would be nice if the chosen solution can easily integrate with those tools.
  • Price: we’re happy to pay for commercial software, but obviously don’t want to pay over-the-top for enterprise features we don’t need.
  • Simple and maintainable: our operations staff need to be able to understand the tool to support and maintain it, and our users must be able to configure the schedule. If it can fit in with our existing operating systems and deployment tools (MCollective and Puppet running on Linux), all the better.

There is a large amount of job scheduling software out there. What to choose? It is disappointing to see that most of the commercial scheduling software hidden behind brochure-ware websites and “call for a quote” links. This makes it very difficult to evaluate in the context of our requirements. There was lots of interesting prospects in the open-source arena, though it is difficult finding something that would easily cover all our requirements:

  • Cron is a favourite for a lot of us because of its simplicity and ubiquity, but it doesn’t have high availability out of the box.
  • Jenkins is also considered for use as a scheduler, as we have a large build infrastructure based on Jenkins, but once again there is no simple way to make it highly available.
  • Chronos looked very interesting, but we’re worried by its apparent complexity – it would be difficult for our operations staff to maintain, and it would introduce a large number of new technologies we’d have to learn.
  • Resque looked like very close match to our requirements, but there is a small concern over it’s HA implementation (in our investigation, it took 3 minutes for fail-over to occur, during which some jobs might be missed)
  • There were many others we looked at, but were rejected because they appeared overly complex, unstable or unmaintained.

In the end, Tom Anderson recognised a simpler alternative: use cron for scheduling, with rcron and keepalived for high availability and fail-over, along with MCollective, RabbitMQ and curl for job distribution. Everything is connected with some very simple glue adaptors written in Bash and Ruby script (which also hook in with our monitoring and alerting).

Schmeduler Architecture

This architecture§ fulfils pretty much most of our requirements:

  • Reliability, resiliency, and high availability: Cron server does no work locally, and so is effectively a very small server instance that offloads all of its work ensure that it is much more stable. rcron and keepalived provide high availability.
  • Throttling and batching: available through the use of queues and MCollective
  • Alerting and monitoring: available through the adaptor scripts
  • Price: software is free and open source, though there is a operations and development maintenance cost to the adaptor code.
  • Simple and maintainable: except for rcron, all these tools are already in use within TIM Group and well understood, so there is a very low learning curve.

We’re still investigating the viability of this architecture. Has anyone else had success with this type of scheme? What are the gotchas?

§Yes, the actor in that diagram is actually Cueball from xkcd.

The Summit is Just a Halfway Point

(Title is originally a quote from Ed Viesturs.)

This past week, TIM Group held its Global Summit, where we had nearly all of our Technology, Product, and Sales folks under the same roof in London. For those who aren’t aware, we are quite globally spread. We have technologists in both London and Boston offices (as well as remote from elsewhere in the UK and France), product managers in our NYC/London/Boston offices, and sales/client services folks across those offices as well as Hong Kong and Toronto. This summit brings together teams and departments that haven’t been face-to-face in many months — some teams haven’t been in the same place since the previous summit.

On the Monday, we got together for an unconference (in the Open Spaces format.) This was a great way to kick off the week. We were able to quickly organize ourselves, setup an itinerary for the day, and get to it. We discussed what people truly thought were the most important topics. We had sessions on everything from “Where does our (product) learning go?” (about cataloging the knowledge we gain about our new products), “Deployment next” (about where we take our current infrastructure next), and many other topics across development, product management, DevOps, QA, and beyond.

For the more technically-focused of the group, the rest of the week was filled with self-organized projects. Some of these were devised in the weeks leading to the Summit — some were proposed on the day by attendees. There were so many awesome projects to be worked on last week. We had ones that ranged from investigating our company’s current technical problems (like HA task scheduling), creating tools to solve some of our simple problems (like an online Niko-niko tracker), or solve some bigger-than-just-us problems (like Intellij remote collaboration.) You should expect to see us share more about these projects on this blog. Stay tuned!

To be clear, it wasn’t all fun. We also had a football game, beer tasting, and Brick Lane dinners. Altogether, this was a great opportunity to re-establish the bonds between our teams. While there are many benefits to having distributed teams as we do, there are many challenges as well. We do many things to work past these challenges, and our Global Summit is one of the shining examples. Getting everyone face-to-face builds trust and shared experiences which helps fuels the collaboration when everyone returns to their home locations.

Drawing back to the title, I am optimistic that our teams will build on top of the code, experiences, and collaboration of the summit. We will move forward with a clear view from the Summit, and be able to move into 2014 with more great things in the form of new products, services, or even just great blog posts.

Bring the outside in: gov.uk

Recently, Jake Benilov from the Government Digital Service (GDS) team visited the TIM Group office in London. His presentation on “7 types of feedback that helped make GOV.UK awesome” was a very interesting tour of the key success factors in building the UK central government’s publishing portal.

The talk filled the whole 90 minutes time slot – partly because a lot of ground was covered, but also because we asked many questions in between. One aspect that sparked discussions even in the following days was the definition of “DevOps” in general and the question whether developers should have full access to production systems in particular. The latter seems to work well for the GDS team but is not current practice at TIM Group.

If you want to know more, please head over to Jake’s blog entry explaining the seven types of feedback and/or view the slides of the talk.

Dip Your Toe In Open (Source) Waters

One of the qualities that TIM Group look for when filling vacancies is an interest in contributing to open source projects. We think that when a candidate gets involved in open source, it indicates a passion for software development. If you’re like me, at some point, you have wanted to join a project. Perhaps you wanted to improve your skills, try a different technology, or brighten up your CV. But alas, you didn’t know the best way to get started.

I am trying to provide that exact opportunity, in conjunction with RecWorks’ “Meet-A-Project” scheme. I have an open source project, called Mutability Detector, with issues and features waiting to be completed, specifically earmarked for newcomers to the project. I promise a helpful, friendly environment, in which to dip your toe in open (source) waters.

If you want to know more details, head over to the project blog for a description on the how and why of getting involved.