Exact Match OR Null Queries With Elasticsearch

E

lasticsearch is an extremely powerful searchable document store. The more you use it, the more you learn about the more realize how deep the rabbit hole of possibility goes. Except for when it comes null values. The Achilles heel of elasticsearch. What it really boils down to is elasticsearch doesn't index null values or fields that are missing from a document. Even if you have set up an index mapping and told elasticsearch about your field.

As you might think it can get a little tricky searching document for a field value or where that field is null / missing - especially when combined with other field queries. For example lets say I have some documents that looks something like this

[{
  "name": "bill"
, "age": 25
, "fav_color": "blue"
}, {
  "name": "sam"
, "age": "30"
, "fav_color": null
}, {
  "name": "fred"
, "age": "21"
, "fav_color": "red"
}]

If I want to find all documents where age greater than or equal to 25 and fav_color is blue or null you could think of this in terms of SQL

SQL ˈsē-kwəl -n --noun

  1. domain-specific language used in programming and designed for managing data held in a relational database management system
  2. the next installment (as of a speech or story); especially : a literary, cinematic, or televised work continuing the course of a story begun in a preceding one
SELECT 
 id
, name
, age
, fav_color
FROM
  users
WHERE
  age > 25
AND
  (fav_color = blue OR fav_color IS NULL)

You could do this with bool queries, but it gets a little complex - even for small queries.

{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "must": [{
            "range": {
             "age": {
               "gte": 25
             }
            }
          }, {
            "bool": {
             "should": [{
               "bool": {
                "must": {
                  "term": {
                   "fav_color": "blue"
                  }
                }
               }
             }, {
               "bool": {
                 "must_not": {
                   "exist": "fav_color"
                 }
              }
             }]
            }
          }]
        }
      }
    }
  }
}

This style of queries is really nice if you want to create a generic query builder for elasticsearch as everything can be reduced to a bool query at the end of the day. But elasticsearch has a short cut for this with the dis_max query which allows you to combine queries. When the query uses the same field, it is OR'd together, and you can even specify tie breaker values if you need to denote which values are more important. So we could combine the above like so

{
  "dis_max": {
    "queries": [{
      "term": {
        "fav_color": "blue"
      }
    }, {
      "bool": {
        "must_not": {
          "exists": "fav_color"
        }
      }
    }]
  }
}

Unfortunately, there is not a missing query type, so we still have to use a bool with a must_not clause. But, now we can replace most of the nested bool queries with the dis_max query and elasticsearch will do the work of translating it into a bunch of bool queries for us.

{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "must": [{
            "range": {
              "age": {
                "gte": 25
              }
            }
          }, {
            "dis_max": {
              "queries": [{
                "term":{
                  "fav_color":"blue"
                }
              }, {
                "bool": {
                  "must_not": {
                    "exists": "fav_color"
                  }
                }
              }]
            }
          }]
        }
      }
    }
  }
}