Like many busy software shops, we often suffer from the cobbler’s-children-have-no-shoes problem at Flex: we’re usually so busy working on new features and customer-driven work that we neglect the internal projects we need to keep doing our work effectively. This often means living with less-than-ideal solutions to internal problems because the customers don’t see those problems.
The most common way this problem manifests itself at Flex is that tasks that could be automated are done manually. For example, we used to do all software updates manually, which in the Java space means logging into the server, downloading a new WAR file and restarting the app server, and this had to be done for every customer.
As the number of customers grew, this process became unwieldy and we wrote crude shell scripts to automate parts of it, beginning with the basic sequence: stop the app server, download the WAR, deploy the WAR, start the app server. This saved some time, but that script still had to be started manually for each customer instance. We’re up to 140+ customers now, so that process would no longer work without an unreasonable expenditure of human effort.
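For readers who like to see it spelled out, here is a minimal sketch of that four-step sequence. It is written in Python rather than shell and is not our actual script: the `run_step` callback and the step labels are hypothetical stand-ins for whatever commands a given app server needs.

```python
from typing import Callable, List

# The four steps named above, in order. These are labels only; the real
# commands depend on the app server and are not shown in this post.
STEPS = ["stop app server", "download war", "deploy war", "start app server"]


def update_instance(customer: str, run_step: Callable[[str, str], bool]) -> List[str]:
    """Run each step for one customer instance, stopping at the first failure.

    run_step(customer, step) is a hypothetical callback that performs the step
    and returns True on success. Returns the steps that completed.
    """
    completed: List[str] = []
    for step in STEPS:
        if not run_step(customer, step):
            break  # do not deploy a WAR that failed to download, and so on
        completed.append(step)
    return completed
```

The point of the sketch is only that the steps are strictly ordered and that one invocation covers exactly one customer instance, which is why someone still had to start it once per customer.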
Then we added a new script that invoked the old script for every customer on a server. This shaved off some time. When the number of servers grew – we now manage around 18 customer and cloud servers – even this became too cumbersome. We broke down and developed a deployment tool called Cluster Bomb that allows all instances of Flex to be updated with a single command using message queues.
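To give a feel for the single-command fan-out, here is a rough sketch that uses Python's in-process `queue.Queue` and threads as a stand-in for a real message queue. It is an illustration under assumptions, not Cluster Bomb itself: the `broadcast_update` name, the message shape and the `update_instance` callback are all hypothetical.

```python
import queue
import threading
from typing import Callable, Dict, List


def broadcast_update(
    version: str,
    servers: Dict[str, List[str]],
    update_instance: Callable[[str, str, str], None],
) -> None:
    """Queue one message per server, then let workers update every instance."""
    messages: queue.Queue = queue.Queue()
    for server, customers in servers.items():
        messages.put((server, customers))

    def worker() -> None:
        # Each worker drains messages until the queue is empty.
        while True:
            try:
                server, customers = messages.get_nowait()
            except queue.Empty:
                return
            for customer in customers:
                update_instance(server, customer, version)

    # One worker per server, so servers are updated in parallel.
    threads = [threading.Thread(target=worker) for _ in servers]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
```

A call such as `broadcast_update("2.0", {"s1": ["acme"], "s2": ["bravo"]}, do_update)` would then touch every instance on every server from one command, which is the property that matters here.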
One aspect of our deployment process that’s been lacking is the notification process: the process for informing customers when an update is pending and what changes or fixes are in the update. Chris does the notifications using the same tool we use for doing newsletters and other forms of marketing spam (Oops, I meant opt-in premium content).
This was better than nothing, but it required close coordination between Chris and the Engineering Team, and as busy as everyone is, especially Chris, it’s sometimes hard to make that work. It was also imprecise. It gave customers a vague notion that an update was coming, but no specific warning when the update started and no confirmation when it completed.
In addition to annoying customers, that imprecision also generated support tickets, which cost us time and support hours.
Yesterday we broke down and added automated notifications to Cluster Bomb. This meant adding distribution lists to each customer instance configuration, a system for assembling release notes, and email text that includes those release notes. (Our first test of this system was responsible for a brief outage yesterday afternoon.)
From now on, without any coordination with Chris, customers will receive a notification email whenever we start the deployment process and another email when the process completes for their instance. The big change here is that it’s now physically impossible to trigger an update without also triggering notifications. The process also requires us to provide release notes or it won’t permit us to start a deployment. And we’ve looped Twitter into the process and will send a status update to our @frssupport Twitter account when a deployment starts, with the version number and a link to the release notes.
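One way to make an update impossible without notifications is to put both in the same code path. The sketch below shows that idea, assuming hypothetical names throughout (`Instance`, `deploy_all`, and the injected `send_email` and `update_instance` callbacks); it is not the Cluster Bomb code, and it leaves out the Twitter status update.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Instance:
    """One customer instance and its distribution list (hypothetical shape)."""
    customer: str
    distribution_list: List[str] = field(default_factory=list)


def deploy_all(
    version: str,
    release_notes: str,
    instances: List[Instance],
    send_email: Callable[[List[str], str, str], None],
    update_instance: Callable[[Instance, str], None],
) -> None:
    """Update every instance, emailing each distribution list before and after."""
    # Refuse to start without release notes.
    if not release_notes.strip():
        raise ValueError("release notes are required before a deployment can start")

    # Tell every customer that the deployment is starting.
    for instance in instances:
        send_email(instance.distribution_list, f"Flex {version} update starting", release_notes)

    # Update each instance and confirm completion for that instance only.
    for instance in instances:
        update_instance(instance, version)
        send_email(instance.distribution_list, f"Flex {version} update complete", release_notes)
```

Because `deploy_all` is the only entry point in this sketch, there is no way to call the update step and skip the emails, and an empty set of release notes stops the run before anything is touched.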
We’re going to seed the notification system by making some logical guesses based on customer email addresses we have on file. This is likely to miss people who’d like to get the notifications and spam people who don’t care about them. I believe Chris is going to do an email blast and ask everyone to provide their preferred distribution list for updates, but just in case, feel free to send an email to firstname.lastname@example.org with your company name and the email addresses that should (or should not) receive notifications. Once that feedback comes in, we’ll fine-tune the list.
Throw Away Code
We’ve talked about doing this before. The reason we didn’t, other than the cobbler’s-children-have-no-shoes problem, is that we’re about to radically change our deployment architecture and any work we do on Cluster Bomb will become obsolete once the new architecture goes live. This is why we didn’t go the Full Monty and provide a web-based way to subscribe to and unsubscribe from notifications. We would also have liked to make pretty HTML release notes and email notifications. Still, high-availability Flex, the TrueCLOUD as we call it, is about a year away, and one day’s work is a small price to pay for a year of better communication.