Summer of Sockets Part 1: Playing With ZeroMQ and NodeJS

I've spent the last few months working on distributed systems for web and telecom applications. Web applications are notorious for being monolithic, tightly coupled pieces of software under a single code base. This is just fine when your application really just needs to work within itself. When your application needs to share data with other applications, possibly written in different languages or frameworks, or needs to offload long-running tasks, things can quickly become a challenge. To rectify such problems, most developers reach for a central message queue like RabbitMQ. Now, RabbitMQ is a great piece of technology; as the tag line says, `It Just Works`. However, it poses a couple of interesting problems that have turned me off to it.

Single point of failure. When the rabbit goes down, the world goes down. RabbitMQ is a broker-based system, and while you may have decoupled a few things from one application, you usually end up putting all of it on the lofty shoulders of your Rabbit server. RabbitMQ supports clustering, but that also brings its own problems and complexities.

Single Point of Failure

The part of a system that, if it fails, the rest of the system ceases to function as expected.

Needless Complexity. RabbitMQ exposes a lot of features, functionality and out of the box awesome. However, the first time someone explained how to set up and use RabbitMQ, it seemed like a lot of work to send a message. Run a broker, set up exchanges with queues, configure durability and set routing key headers. Bah! I just want to send a message and receive a message.

Enter ZeroMQ. ZeroMQ removes much of the complexity of RabbitMQ and the Advanced Message Queuing Protocol ( AMQP ). ZeroMQ is a networking library. Unlike RabbitMQ and other broker-based solutions, ZeroMQ doesn't require an additional server to be running in order to function. It is just another part of your application. There are no brokers, exchanges or queues with ZeroMQ. Instead, it provides a simple socket API and easy ways to connect sockets in common patterns such as Pub / Sub, Request / Reply and Push / Pull. In fact, it is so easy to build and plug together independent applications that you'll start making up projects to use ZeroMQ. But first things first: we need to get up and running with ZeroMQ. I will be using ZeroMQ 3.1 and NodeJS v0.10.13. Installing ZMQ is pretty simple: download the source, then run `make && sudo make install`.

I Just Want To Send A Message

To kick start the ZeroMQ juices, we're going to set up the simplest scenario: sending a message to some other application. This is the Push / Pull pattern in the ZMQ world. We are going to start by creating a PUSH socket on a TCP address and sending some messages.

// producer.js
var zmq = require( 'zmq' )
  , socket = zmq.socket( 'push' )
  , counter = 0;

// bind to an address and port
socket.bind('tcp://0.0.0.0:9998', function( err ){
    if( err ){
        console.error(err.message);
        // exit with a non-zero code so the failure is visible to the shell
        process.exit(1);
    }
    // send a message every 350 ms
    setInterval(function(){
        socket.send(counter++ );
    },350);
});

The ZeroMQ bindings make it incredibly simple to get started. In this snippet, all we need to do is use the socket factory function to create a PUSH socket, bind to an open address and start sending data. DONE! Easy, straight to the point, and no more complicated than it needs to be.

I Just Want To Receive A Message

Sending a message is, of course, only half of the picture. We still need to get those messages. To do that, we need to set up a socket of the opposite type on the same address. In this case, that is a PULL socket.

// consumer.js
var zmq = require( 'zmq' )
  , socket = zmq.socket( 'pull' );

// messages come via events
socket.on('message', function( msg ){
    console.log('got a message!');
    // messages come as buffers
    console.log(msg.toString('utf8'));
});

// connect to original address
socket.connect('tcp://0.0.0.0:9998');

Again, very simple. In fact, the code is nearly identical. However, on the receiver side we set up a PULL socket and connect to the address rather than bind to it. Actually getting messages is as simple as attaching an event handler to the message event on the socket. All of the data will come across as a Buffer object, so you will need to decode it before you can do much with it.
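Since everything arrives as a Buffer, it helps to settle on an encoding at both ends. Here is a minimal sketch, assuming you choose JSON as the wire format. The `encode` and `decode` helpers are made up for illustration and are not part of the zmq module. On a Node version as old as the one used in this post, `new Buffer( str )` would stand in for `Buffer.from`.

```javascript
// Hypothetical helpers for illustration; not part of the zmq module.
function encode( payload ){
    // serialize to JSON and turn it into bytes, as a socket would carry it
    return Buffer.from( JSON.stringify( payload ), 'utf8' );
}

function decode( msg ){
    // msg is a Buffer, like the one handed to the 'message' handler
    return JSON.parse( msg.toString( 'utf8' ) );
}

var wire = encode({ id: 7, task: 'resize' });
var job = decode( wire );

console.log( Buffer.isBuffer( wire ) ); // true
console.log( job.id, job.task );        // 7 resize
```

In the consumer, `decode( msg )` would take the place of the bare `msg.toString('utf8')` call.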

Bind vs. Connect

So you might be wondering what the difference is between binding to an address and connecting to an address. In general, you should use bind on the "durable" part of the application. This is the part that is assumed to stay up for long periods of time. When you use bind, the address becomes occupied. A single address can only have one application bound to it at any given time. Calling bind on an address more than once will result in an error.

Bind Once - Connect Many

Address binding should be reserved for the application that is the most durable and stable piece of the application topology.

In contrast, you should use connect in transient applications that are expected to come on and off line at unpredictable points in time. Connect does not occupy an address, meaning you can connect as many different applications as many times as you like. This is one way ZeroMQ allows you to scale up your apps: by connecting as many things as you want to a bound address. In our simple example, you can start as many consumer.js applications as you want and they will all start receiving messages in what ZeroMQ calls a "fair-queued" manner.

Fair Queuing

Fair queuing isn't really a queue at all. It can better be thought of as a round-robin distribution of messages to connected clients. This is important to understand when using the PUSH / PULL socket pattern.

  • Not all of the connected clients get all messages
  • Some connected clients may get more messages than others ( late joiners )

If you start 4 consumers ( C1, C2, C3, C4 ), C1 will get the first message, C2 will get the second, C3 will get the third, and so on. If or when one of the clients goes off line, ZeroMQ will distribute the messages evenly among the remaining 3. The opposite is true when more clients come on line. Try it! Start up some more consumers. As you spin up more consumers, each will start getting messages less frequently. The temptation here might be to use this as a simple load balancing setup. However, all it does is distribute work evenly. Busy workers will still continue to get messages.
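The round-robin behaviour described above can be pictured with a toy model. This sketch involves no sockets at all; `distribute` is a made-up function that simply deals messages out in turn, which is an approximation of what the PUSH socket does and not ZeroMQ's actual implementation.

```javascript
// Toy model of round-robin distribution; no sockets involved.
// distribute() is a hypothetical helper, not part of the zmq module.
function distribute( messages, consumers ){
    var received = {};

    consumers.forEach(function( name ){
        received[ name ] = [];
    });

    // deal each message to the next consumer in turn
    messages.forEach(function( msg, index ){
        var target = consumers[ index % consumers.length ];
        received[ target ].push( msg );
    });

    return received;
}

var result = distribute( [ 0, 1, 2, 3, 4, 5, 6, 7 ], [ 'C1', 'C2', 'C3', 'C4' ] );

// C1 gets 0 and 4, C2 gets 1 and 5, C3 gets 2 and 6, C4 gets 3 and 7
console.log( result );
```

Notice that the model never asks whether a consumer is busy. That is the point made above: the distribution is even, not load aware.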

This simple PUSH / PULL pattern is often referred to as a Pipeline: data only goes in one direction, from push to pull. You can, however, have applications that have both PUSH and PULL sockets and build up rather complex pipelines of data to create elaborate distributed systems. And that is really what ZeroMQ enables you to do: build small, simple pieces that can be put together easily in interesting ways.

Devices

In the ZeroMQ world, you will hear people refer to devices. This is another way of saying application. A ZeroMQ device is a fully functional application that can both send and receive messages. More importantly, it sends and receives messages in a reliable pattern. That is, how a device delivers messages is well established, leaving what happens to a message up to your application layer. With that in mind, we can create a simple forwarding device that binds on two separate ports and shuttles messages between them. This will allow any number of producers to push work to any number of consumers.

// device.js
var zmq = require( 'zmq' )
  , backend = zmq.socket( 'pull' )
  , frontend = zmq.socket( 'push' );

// bind a backend port for producers
backend.bind('tcp://0.0.0.0:9998', function( err ){
    if( err ){
        console.error("Backend socket %s ", err.message);
        process.exit(1);
    }
    backend.on( 'message', function( msg ){
        frontend.send( msg );
    });
});

// bind a front end port for consumers
frontend.bind('tcp://0.0.0.0:9999', function( err ){
    if( err ){
        console.error("Front end socket %s", err.message);
        process.exit(1);
    }
});

Our forwarding device doesn't do much. Basically we bind two sockets this time. The back-end socket just sends a message to the front-end socket untouched. This keeps in line with the Pipeline pattern as data flows in one direction. Here is what the other two pieces of the puzzle look like.

// multi_producer.js
var zmq = require( 'zmq' )
  , socket = zmq.socket( 'push' )
  , counter = 0;

// connect to the forwarder's back-end address
socket.connect('tcp://0.0.0.0:9998');

// send a message to forwarder
setInterval(function(){
    // log the same value that is sent, then increment
    console.log('sending %d', counter);
    socket.send( counter++ );
},150);

Notice our producer connects to the address here instead of binding. Finally the consumer will look like this.

// multi_consumer.js
var zmq = require( 'zmq' )
  , socket = zmq.socket( 'pull' );

// messages come via events
socket.on('message', function( msg ){
    console.log("got forwarded message %s", msg.toString());
});

// connect to the forwarder's front-end address
socket.connect('tcp://0.0.0.0:9999');

In this situation, we can start as many consumers and producers as we want, in any order we want, and everything works as expected. However, if the device goes down, messages will queue up on the producer side until a device comes online or the number of queued messages reaches the high water mark. At that point a PUSH socket stops accepting new messages rather than discarding them, so the producer backs up, and anything still buffered is lost if the producer process exits. The forwarder can become the bottleneck and a single point of failure in this setup. That is why it is important to keep any individual device simple and doing as little work as possible.
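The high water mark can be pictured as a cap on how many unsent messages a socket is willing to hold. Below is a toy model of that idea, not how libzmq is implemented. `BoundedOutbox`, `send` and `flush` are names made up for this sketch, and what a real socket does once the cap is reached depends on the socket type.

```javascript
// Toy model of a high water mark; not how libzmq is implemented.
// BoundedOutbox is a hypothetical name used only for this sketch.
function BoundedOutbox( highWaterMark ){
    this.highWaterMark = highWaterMark;
    this.pending = [];
}

// returns true if the message was buffered, false if the buffer is full
BoundedOutbox.prototype.send = function( msg ){
    if( this.pending.length >= this.highWaterMark ){
        return false;
    }
    this.pending.push( msg );
    return true;
};

// simulates the peer coming back: hand over everything that was buffered
BoundedOutbox.prototype.flush = function(){
    var drained = this.pending;
    this.pending = [];
    return drained;
};

var outbox = new BoundedOutbox( 3 );

// try to send five messages while nothing is connected
var accepted = [ 1, 2, 3, 4, 5 ].filter(function( n ){
    return outbox.send( n );
});

console.log( accepted );       // [ 1, 2, 3 ]
console.log( outbox.flush() ); // [ 1, 2, 3 ]
```

The takeaway is that the buffer on the producer side is finite, so a producer that keeps generating work while the device is down needs a plan for the messages that do not fit.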

ZeroMQ makes it very easy to build complex message-based systems in a fraction of the time it usually takes with more traditional solutions like AMQP. Moreover, the event-based, asynchronous nature of NodeJS makes building ZMQ components into your existing application a guilty pleasure. It literally takes a few lines of code to get up and running!

The source code for these examples can be found here, in the pipeline playground directory.