Exact Match OR Null Queries With Elasticsearch
lasticsearch is an extremely powerful searchable document store. The more you use it, the more you learn about the more realize how deep the rabbit hole of possibility goes. Except for when it comes null
values. The Achilles heel of elasticsearch. What it really boils down to is elasticsearch doesn't index null
values or fields that are missing from a document. Even if you have set up an index mapping and told elasticsearch about your field.
As you might think it can get a little tricky searching document for a field value or where that field is null / missing - especially when combined with other field queries. For example lets say I have some documents that looks something like this
[{
"name": "bill"
, "age": 25
, "fav_color": "blue"
}, {
"name": "sam"
, "age": "30"
, "fav_color": null
}, {
"name": "fred"
, "age": "21"
, "fav_color": "red"
}]
If I want to find all documents where age
greater than or equal to 25
and fav_color
is blue
or null
you could think of this in terms of SQL
SQL ˈsē-kwəl -n --noun
- domain-specific language used in programming and designed for managing data held in a relational database management system
- the next installment (as of a speech or story); especially : a literary, cinematic, or televised work continuing the course of a story begun in a preceding one
SELECT
id
, name
, age
, fav_color
FROM
users
WHERE
age > 25
AND
(fav_color = blue OR fav_color IS NULL)
You could do this with bool queries, but it gets a little complex - even for small queries.
{
"query": {
"constant_score": {
"filter": {
"bool": {
"must": [{
"range": {
"age": {
"gte": 25
}
}
}, {
"bool": {
"should": [{
"bool": {
"must": {
"term": {
"fav_color": "blue"
}
}
}
}, {
"bool": {
"must_not": {
"exist": "fav_color"
}
}
}]
}
}]
}
}
}
}
}
This style of queries is really nice if you want to create a generic query builder for elasticsearch as everything can be reduced to a bool query at the end of the day. But elasticsearch has a short cut for this with the dis_max query which allows you to combine queries. When the query uses the same field, it is OR
'd together, and you can even specify tie breaker values if you need to denote which values are more important. So we could combine the above like so
{
"dis_max": {
"queries": [{
"term": {
"fav_color": "blue"
}
}, {
"bool": {
"must_not": {
"exists": "fav_color"
}
}
}]
}
}
Unfortunately, there is not a missing
query type, so we still have to use a bool
with a must_not
clause. But, now we can replace most of the nested bool queries with the dis_max
query and elasticsearch will do the work of translating it into a bunch of bool
queries for us.
{
"query": {
"constant_score": {
"filter": {
"bool": {
"must": [{
"range": {
"age": {
"gte": 25
}
}
}, {
"dis_max": {
"queries": [{
"term":{
"fav_color":"blue"
}
}, {
"bool": {
"must_not": {
"exists": "fav_color"
}
}
}]
}
}]
}
}
}
}
}