This week we start deploying version 4.6.16 of Flex. If you haven’t looked at the release notes, check them out here.
While the release notes may not signal a big release, 4.6.16 introduces two new technologies into our stack: RabbitMQ and Riak. Another notable change is that this release requires Ubuntu 13.04. More on that later.
RabbitMQ replaces our old ActiveMQ JMS server for asynchronous messaging. This was a major undertaking because it meant dropping JMS as our messaging standard in favor of AMQP, an open messaging protocol that originated in the financial industry.
We use asynchronous messaging throughout Flex: for QuickBooks integration, pushing data to Production Exchange, eager availability recalculations, search indexing, and many other tasks. Switching to RabbitMQ makes this messaging faster and more reliable. It is also a necessary step toward our high availability architecture, where asynchronous message processing will be handled by dedicated servers.
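For readers new to the pattern, here is a minimal in-process sketch of asynchronous messaging. It uses Python's standard library queue rather than RabbitMQ or AMQP, and the message kinds and payload fields are made up for illustration; it is not how Flex itself is implemented. The point is only that the publisher returns immediately while a separate worker does the slow part.

```python
import queue
import threading


def start_worker(jobs, handled):
    """Consume messages from `jobs` on a background thread until None arrives."""
    def run():
        while True:
            message = jobs.get()
            if message is None:          # sentinel: stop the worker
                jobs.task_done()
                break
            # A real consumer would reindex search, recalculate availability, etc.
            handled.append(message)
            jobs.task_done()

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return worker


def publish(jobs, kind, payload):
    """Enqueue a message and return at once; the caller never waits on the work."""
    jobs.put({"kind": kind, "payload": payload})


jobs = queue.Queue()
handled = []
worker = start_worker(jobs, handled)

publish(jobs, "search_index", {"element_id": 42})        # hypothetical message
publish(jobs, "availability_recalc", {"item_id": 7})     # hypothetical message

jobs.put(None)   # ask the worker to stop once the queue drains
jobs.join()      # block until every queued message has been handled
```

With a real broker, the queue lives in a separate process, which is what lets the consumers run on their own dedicated servers.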
Roger Diller did most of the work on our new Riak search implementation, and he posted a great write-up of Riak earlier today. I’d only add that Riak is perhaps the best of the so-called NoSQL databases that have become popular in recent years as a way around the scalability problems of relational databases.
The new version of Flex uses Riak instead of Lucene to process project element searches and store the indexes. Over time our use of Riak will likely increase. We’re kicking around the idea of using Riak for document storage and will likely leverage it for inventory and contact search before long.
As of this release, we will require all servers to run Ubuntu 13.04. Customers on cloud-based instances have nothing to worry about. We’re already running 13.04 in our Sydney, Chicago, Montreal and Roubaix clusters.
All self-hosted customers will have to upgrade to 13.04 before 4.6.16 can be deployed. If you run Flex on a self-hosted system, we’ll contact you shortly to schedule a maintenance window to perform the Ubuntu upgrade and install Flex 4.6.16. In addition to the basic upgrade process, this will include upgrading MySQL and installing Riak and RabbitMQ. We’ll also archive your Flex data and configuration before any OS upgrade, in case a problem requires reinstalling the operating system from scratch.
Home Grown Fault Tolerance
Something we’d like system administrators to start thinking about is how they might want to deploy a high availability system on site once our software supports it. To get started, you’d need two or more servers, plus some method of distributing load across them and redirecting traffic when a server fails.
The most common way of doing this is with a hardware load balancer. Flex won’t require any particular kind of load balancer, but you’ll get better performance if yours supports sticky sessions (server affinity) that can be tied to configurable HTTP headers or cookies, and most do. Flex doesn’t and won’t use volatile user sessions, which means a server crash won’t interrupt user logins. It will, however, use a hybrid local/remote caching system, and sticky sessions will make this cache more efficient and improve overall performance by reducing cache misses.
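To see why affinity helps the cache, here is a toy replay in Python. Everything in it is an assumption for illustration: the server names, the choice to hash a cookie-like key with SHA-256, and the idea that each server's local cache is a simple set. Real load balancers implement affinity in their own ways, and this is not Flex's caching code.

```python
import hashlib
from itertools import cycle

SERVERS = ["app1", "app2"]  # hypothetical application server names


def sticky_server(session_key, servers=SERVERS):
    """Map a cookie or header value to the same server on every request."""
    digest = hashlib.sha256(session_key.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


def count_cache_misses(requests, route):
    """Replay (user, item) requests against per-server local caches."""
    caches = {name: set() for name in SERVERS}
    misses = 0
    for user, item in requests:
        cache = caches[route(user)]
        if item not in cache:
            misses += 1      # a miss would fall through to the remote tier
            cache.add(item)
    return misses


# Three users each request their own project five times.
requests = [(f"user{u}", f"project{u}") for _ in range(5) for u in range(3)]

rotation = cycle(SERVERS)
round_robin_misses = count_cache_misses(requests, lambda user: next(rotation))
sticky_misses = count_cache_misses(requests, sticky_server)
```

In this replay, sticky routing misses 3 times (once per project) while strict rotation misses 6 times, because every project ends up being loaded into both servers' local caches.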
You could also use a software load balancer if you have an old server lying around. We recommend HAProxy if you go that route.
A poor man’s load balancer would be to simply use round robin DNS. This wouldn’t make server crashes as painless for users as a true load balancer would, since plain round robin DNS keeps handing out the address of a failed server, but it would help break up your user load across multiple servers.
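The trade-off can be sketched in a few lines of Python. This is a simplification under stated assumptions: the addresses are placeholders from the documentation range, each lookup is treated as returning a single address in strict rotation, and client-side DNS caching and retries are ignored.

```python
from itertools import cycle, islice


def round_robin_answers(addresses, lookups):
    """Return the address handed out first on each successive lookup."""
    return list(islice(cycle(addresses), lookups))


def failed_requests(answers, down):
    """Count lookups that would land on a server that is currently down."""
    return sum(1 for address in answers if address in down)


ADDRESSES = ["192.0.2.11", "192.0.2.12"]  # placeholder app server addresses

answers = round_robin_answers(ADDRESSES, 6)
lost = failed_requests(answers, down={"192.0.2.12"})
```

Load is split evenly while both servers are up, but with one server down half of the six lookups still point at it, which is the gap a health-checking load balancer closes.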
You’ll also want to make sure any servers you provision for a multi-server Flex install have two network interfaces, as the second interface will be used to form a private network between the servers for Riak synchronization and database I/O.
And, if you wanted to be really cool, you could deploy a four server Flex configuration: two database servers configured as master and slave, two application servers in front of those, and load balancers in front of the application servers. This configuration would closely mirror the hardware configuration we’ll use in our next generation cloud.
If all this sounds daunting or like overkill, don’t worry. Even once Flex has been modified to run on a multi-server high availability architecture, it will still run just fine on a single server, just like it always has.