Add Search to Your Ghost Blog With ElasticSearch

One of the things that makes ghost great is its almost forced simplicity. However, with that comes a bit of rigidity. If you want something more than the provided simplicity, you might find yourself pulling your hair out. Search is one of the things that is lacking from the default ghost set up. The last incarnation of my blog had everything indexed in a Xapian search index, and I really wanted to bring back a local search index. Luckily, in the 0.5.x series, they pulled some strings so that you can use ghost as a plain npm module. This means you can build your own app around ghost, which should make search possible.

Xapian is a search library that you basically just import into your code and use like any other. That was nice, but the index chews up local disk space, and that doesn't really fly on shared hosting providers. So I settled on elasticsearch, because most clients talk to it over HTTP and there are plenty of hosted solutions out there.

So the first thing we need to do is set up ghost. You can't really do anything without first configuring the ghost module. Its configuration is still pretty rigid: you basically pass it an object which has a path to a config file. Not ideal, but we can work with that. Under the hood ghost is an express server, and you can effectively chain express apps together. So what we want to do is make our own express app, configure ghost and chain it off of our own server.

Configuring Ghost

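'use strict'

var path = require('path')
var domain = require('domain')
var express = require('express')
var ghost = require('ghost')
var logger = require('dependant-log')

var app = express()
var d = domain.create()
var posts

ghost({
  config: path.join(__dirname, 'config.js')
}).then(function(server) {
  // ghost must be configured before you can require any models.
  posts = require('ghost/core/server/models/post')
  d.run(function() {
    logger.info('subdir: %s', server.config.paths.subdir)
    // mount ghost under its configured subdirectory on our own app
    app.use(server.config.paths.subdir, server.rootApp)
    server.start(app)
  })
})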

This is pretty much all it takes to get your own express instance wrapped around ghost. From here we can add any middleware, plugins and routes to our app, and they will take priority over what is provided by the ghost instance.
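To see why handlers registered first win, here is a stripped-down model of how a middleware chain dispatches a request. It is only a sketch in plain Node, with no express or ghost involved; chain, searchRoute and blog are made-up names standing in for our app, a route of our own and the mounted ghost app.

```javascript
'use strict'

// Simplified model of an express-style chain: handlers run in the
// order they were registered; each one either answers or calls next().
function chain(handlers) {
  return function handle(req, res) {
    var i = 0
    function next() {
      var handler = handlers[i++]
      if (handler) handler(req, res, next)
    }
    next()
  }
}

// hypothetical route registered on our own app
function searchRoute(req, res, next) {
  if (req.url.indexOf('/api/v1/search') !== 0) return next()
  res.body = 'search results'
}

// stand-in for the ghost app; mounted last, it answers everything else
function blog(req, res) {
  res.body = 'ghost page for ' + req.url
}

var app = chain([searchRoute, blog])

var res = {}
app({url: '/api/v1/search?term=node'}, res)
console.log(res.body) // search results
```

Any request the first handler doesn't claim falls through to the blog, which is the behaviour we are relying on when ghost is mounted after our own routes.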

Search Endpoint

Next we need to set up an endpoint to handle searching through the index. I won't go into detail about indexing documents in elasticsearch here. However, my index has the full content, tags, title, summary and the url of every post. There was a name collision between tags and an existing field in elasticsearch, so that field is actually named tagged in my index.
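As a rough picture of what one indexed document could look like, here is a small mapping function. It is a sketch under assumptions: toSearchDoc is a hypothetical helper, and the shape of the post object (title, markdown, meta_description, url, tags with a name) is made up for illustration rather than taken from ghost's models. Only the index field names come from the description above.

```javascript
'use strict'

// Build the document that gets indexed for one post.
// The shape of `post` is an assumption for this sketch.
function toSearchDoc(post) {
  return {
    title: post.title
  , content: post.markdown
  , summary: post.meta_description
  , url: post.url
    // "tags" collided with an existing field, so the index uses "tagged"
  , tagged: (post.tags || []).map(function(tag) { return tag.name })
  }
}

var doc = toSearchDoc({
  title: 'Rebuilding The Codedependant Blog'
, markdown: 'post body goes here'
, meta_description: 'a short summary'
, url: '/2015/01/12/rebuilding-the-codedependant-blog/'
, tags: [{name: 'node.js'}, {name: 'blog'}, {name: 'ghost'}]
})

console.log(doc.tagged)
```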

I want to get partial word matches across the title, tags and content, giving priority to the title and tags. The elasticsearch query object for that looks like this.

Search Query

'use strict'
var searchquery = {
  query: {
    filtered: {
      query: {
        multi_match: {
          query: <TERM>
        , analyzer: 'standard'
        , type: 'phrase'
        , slop: 1
        , fuzziness: 'AUTO'
        , fields: ['title^5', 'tagged^3', 'content^1']
        , tie_breaker: 0.3
        , operator: 'and'
        }
      }
    }
  }
}

A fuzzy, sloppy multi_match search. Perfect.
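The <TERM> above is a placeholder. One way to fill it in is a small function that returns the query object for a given term. This is just a sketch; buildSearchQuery is a hypothetical name, and the trimming of the term is my own addition rather than something the query requires.

```javascript
'use strict'

// Return the query object for one search term.
function buildSearchQuery(term) {
  return {
    query: {
      filtered: {
        query: {
          multi_match: {
            query: String(term || '').trim()
          , analyzer: 'standard'
          , type: 'phrase'
          , slop: 1
          , fuzziness: 'AUTO'
          , fields: ['title^5', 'tagged^3', 'content^1']
          , tie_breaker: 0.3
          , operator: 'and'
          }
        }
      }
    }
  }
}

var q = buildSearchQuery('  file watcher ')
console.log(q.query.filtered.query.multi_match.query) // file watcher
```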

Define An API Resource

I've been working on a side project, tastypie, which is a port of a popular Django project of the same name. This was a perfect opportunity to put it into practice.

Tastypie is all class based, so everything is extensible. A resource represents a finite data source. It defines a set of fields that describe the data, maps HTTP methods to functions you define and handles serialization / deserialization. Out of the box, the base resource is really bare bones. It is mostly empty methods that you need to define. For search, all I care about is GET requests. It would look like this.

'use strict'
/**
 * @module dependant-core/resources/v1/search
 */
const Resource = require('tastypie/lib/resource')
const Class = require('tastypie/lib/class')
const ESClient = require('elasticsearchclient')


// set up the search client
const client = new ESClient({
  host: 'localhost'
, port: 9200
})

/**
 * An api resource to allow for full text searching across posts
 * @constructor
 * @alias dependant-core/resources/v1/search
 */
const SearchResource = new Class({
  inherits: Resource
, options: {
    collection: 'data'
  }

, fields: {
    url: {type: 'ApiField'}
  , title: {type: 'ApiField'}
  , tags: {type: 'ApiField', attribute: 'tagged'}
  }

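, _get_list: function(bundle, cb) {
    // bundle carries the express req, res and next.
    // searchquery is the query object from the previous section
    searchquery.query.filtered.query.multi_match.query = bundle.req.query.term

    client.search('idx-post', 'post', searchquery, (err, data) => {
      // bail out before trying to parse a failed response
      if (err) return cb(err)
      data = JSON.parse(data).hits.hits
        .sort(function(a, b) {
          return a._score < b._score ? 1 : -1
        })
        .map(function(hit) { return hit._source })
      cb(null, data)
    })
  }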
})

That's it! We really just need to define the internal _get_list method to return a list of objects, and tastypie does the rest. In this situation, _get_list executes the search query, sorts the results by score and returns them. The other piece is the set of fields we want to return. You'll notice the tags field has an attribute property. This tells the resource which key to read from the underlying data during the serialization/deserialization cycle, so the tagged field from the index shows up as tags in the response. This way the data will look the way I want it to.
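The two steps described here, turning search hits into a sorted list and then mapping each item through the field definitions, can be modelled in a few lines of plain JavaScript. This is a sketch, not tastypie's implementation: hitsToList and dehydrate are hypothetical helpers, and the sample response is invented to have the same hits.hits / _score / _source shape the resource reads.

```javascript
'use strict'

// Turn a raw elasticsearch response body (a JSON string) into the
// list of objects that _get_list hands to its callback.
function hitsToList(raw) {
  return JSON.parse(raw).hits.hits
    .sort(function(a, b) { return b._score - a._score }) // best match first
    .map(function(hit) { return hit._source })
}

// Hypothetical stand-in for what a resource does with `fields`:
// keep only the declared names, reading from `attribute` when set.
function dehydrate(fields, source) {
  var out = {}
  Object.keys(fields).forEach(function(name) {
    out[name] = source[fields[name].attribute || name]
  })
  return out
}

var fields = {
  url: {type: 'ApiField'}
, title: {type: 'ApiField'}
, tags: {type: 'ApiField', attribute: 'tagged'}
}

// made-up response for illustration
var raw = JSON.stringify({hits: {hits: [
  {_score: 0.4, _source: {title: 'B', url: '/b/', tagged: ['blog'], content: '...'}}
, {_score: 1.2, _source: {title: 'A', url: '/a/', tagged: ['node.js'], content: '...'}}
]}})

var list = hitsToList(raw).map(function(source) {
  return dehydrate(fields, source)
})

console.log(list)
```

The comparator here subtracts scores instead of returning 1 or -1, which also reports equal scores as equal; the ordering for distinct scores is the same, highest first.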

Register The API & Endpoint

Before you pass your express app to ghost, pass it to a tastypie API namespace, and register your search resource.

v1 = new Api('api/v1', app );
v1.register('search', new SearchResource());

Done! We have a fully functional REST endpoint, /api/v1/search, that deals with HTTP method mapping, data preparation, and serialization to json, jsonp, and xml, and it's all a part of the same app as the ghost blog.

We can use the query string parameter term to get data.

curl -H "Accept: application/json" "http://localhost:2368/api/v1/search?term=node" | python -m json.tool

And we'll get back something like this:

{
  meta: {
    count: 10
  , limit: 25
  , next: null
  , offset: 0
  , previous: null
  }
, data: [
    {
      tags: [
        'node.js'
      , 'child process'
      , 'fs'
      , 'watcher'
      ]
    , title: 'Create A File Watcher With NodeJS and Child Process'
    , url: '/2012/04/15/create-a-file-watcher-with-nodejs-and-child-proces/'
    }
  , {
      tags: [
        'node.js'
      , 'blog'
      , 'ghost'
      ]
    , title: 'Rebuilding The Codedependant Blog'
    , url: '/2015/01/12/rebuilding-the-codedependant-blog/'
    }
  , ...
  ]
}

Notice that we only get back title, tags and url. This is because those are the only fields we've defined on the resource. Pretty simple.
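The curl call above uses a single word as the term. Once a term contains spaces or other special characters it has to be escaped in the query string. Here is a small sketch of building the request URL from code; searchUrl is a hypothetical helper, and the host and port are simply the ones from the curl example.

```javascript
'use strict'

// URL is a global in current Node releases; on older ones use
// require('url').URL instead.
function searchUrl(base, term) {
  var url = new URL('/api/v1/search', base)
  url.searchParams.set('term', term) // takes care of escaping
  return url.href
}

console.log(searchUrl('http://localhost:2368', 'child process'))
// http://localhost:2368/api/v1/search?term=child+process
```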

Complete Source

You can find the source for this project on my bitbucket. Here are the source files for the server and the search resource.

var path    = require( 'path' )
  , express = require( 'express' )
  , domain  = require( 'domain' )
  , ghost   = require( 'ghost' )
  , logger  = require( 'dependant-log' )
  , conf    = require( 'dependant-conf' )
  , Api     = require( 'tastypie/lib/api' )
  , d       = domain.create()
  , gconfig = require( 'ghost/core/server/config' )
  , app     = express()
  , SearchResource = require('dependant-core/resources/v1/search')
  , posts   
  , v1
  ;


v1 = new Api('api/v1', app )
v1.register('search', new SearchResource())

ghost({
  config: path.join(__dirname, conf.get('conf'))
}).then(function( server ){
  d.run(function(){
    logger.info('subdir: %s', server.config.paths.subdir )
    app.use(server.config.paths.subdir, server.rootApp);
    server.start( app );
  });
});

'use strict'

/**
 * @module dependant-core/resources/v1/search
 * @author Eric Satterwhite
 * @since 0.0.1
 * @requires tastypie/lib/resource
 */
const Resource = require('tastypie/lib/resource')
const Class = require('tastypie/lib/class')
const conf = require('dependant-conf')
const ESClient = require('elasticsearchclient')



const client = new ESClient(conf.get('search'))

/**
 * An api resource to allow for full text searching across posts
 * @constructor
 * @alias dependant-core/resources/v1/search
 */
const SearchResource = new Class({
  inherits: Resource
, options: {
    collection: 'data'
  }

  // only return url, title and tags in response
  // also remap tagged to tags
, fields: {
    url: {type: 'ApiField'}
  , title: {type: 'ApiField'}
  , tags: {type: 'ApiField', attribute: 'tagged'}
  }

  // internal method that is responsible for getting
  // an array of objects
, _get_list: function(bundle, cb) {

    // bundle has express req, res, next
    var qryObj = {
      query: {
        filtered: {
          query: {
            multi_match: {
              query: bundle.req.query.term
            , analyzer: 'standard'
            , type: 'phrase'
            , slop: 1
            , fuzziness: 'AUTO'
            , fields: ['title^5', 'tagged^3', 'content^1']
            , tie_breaker: 0.3
            , operator: 'and'
            }
          }
        }
      }
    }

    client.search('idx-post', 'post', qryObj, function(err, data) {
      data = JSON.parse(data)
      data = data.hits.hits
        .sort(function(a, b) {
          return a._score < b._score ? 1 : -1
        })
        .map(function(hit) { return hit._source })
      cb(err, data)
    })
  }
})
module.exports = SearchResource