Is the missing value option officially supported in Date Range aggregations?

Terms and Histogram aggregations support a missing value option, which will default any missing or null values to a specified value. But the documentation does not mention this option for Date Range aggregations.

It looks like support for this feature was added to several aggregations in this pull request, but not Date Range aggregations:

Eventhough the docs do not mention support for it, using this option in a date range agg does seem to work. Is this just not officially supported for Date Range? Is this option safe to use?

I have the following query, which returns any docs with a null user_date_of_birth in the 'Unknown' bucket:

Request:
GET myindex/user/_search
{
  "aggs": {
    "age_groups": {
      "range": {
        "field": "user_date_of_birth",
        "missing": "1899-12-31",
        "ranges": [
          {
            "key": "Unknown", 
            "to": "1900-01-01"
          },
          {
            "from": "1950-01-01"
          }
        ]
      },
      "aggs": {
        "age_group_count": {
          "value_count": {
            "field": "user_id"
          }
        }
      }
    }
  }
}

Response:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "myindex",
        "_type": "user",
        "_id": "2",
        "_score": 1,
        "_source": {
          "user_id": 2,
          "user_date_of_birth": null
        }
      },
      {
        "_index": "myindex",
        "_type": "user",
        "_id": "1",
        "_score": 1,
        "_source": {
          "user_id": 1,
          "user_date_of_birth": "1950-01-02"
        }
      }
    ]
  },
  "aggregations": {
    "age_groups": {
      "buckets": [
        {
          "key": "Unknown",
          "to": -2208988800000,
          "to_as_string": "1900-01-01T00:00:00.000Z",
          "doc_count": 1,
          "age_group_count": {
            "value": 1
          }
        },
        {
          "key": "1950-01-01T00:00:00.000Z-*",
          "from": -631152000000,
          "from_as_string": "1950-01-01T00:00:00.000Z",
          "doc_count": 1,
          "age_group_count": {
            "value": 1
          }
        }
      ]
    }
  }
}
1 Like

In my opinion the missing option is mostly useful on numeric metric aggregations (stats, percentiles, etc.). We also added support for it to the terms aggregation since having a bucket for missing values was a feature that had been requested for a long time. I am less sure about use-cases for bucket aggregations since the missing value would be put in the same bucket as other values. For instance, I don't think the missing option is used a lot on histograms.

In your case, you seem to want to have a bucket that only contains missing values. Maybe it would be better to use the missing aggregation for this?

Thanks for the reply. That was just a simple example. In my real use case I do want missing values to be grouped in the same bucket as other values.

I'm just curious as to why the missing value option works for range aggregations but it is not documented as being supported.

I'd also love clarification on whether this is an actual safe-to-use feature in range aggregations. When bucketing by date ranges, it'd be great to be able to just specify a bucket for documents where that value is null.

What is your real use-case? Documenting this feature would mean that we need to commit to it, so we want to make sure it would be helpful. Having an example use-case would also help give a good example in the docs.

In my use case I want to group clients by age group. Any date of birth less than 1900 or a null/missing date of birth I want to fall into an 'Other' bucket.

Request:
GET myindex/user/_search
{
  "aggs": {
    "age_groups": {
      "date_range": {
        "field": "user_date_of_birth",
        "missing": "1899-12-31",
        "ranges": [
          {
              "key": "Generation Z",
              "from": "2000"
            },
            {
              "key": "Generation Y",
              "from": "1980",
              "to": "2000"
            },
            {
              "key": "Generation X",
              "from": "1965",
              "to": "1980"
            },
            {
              "key": "Baby Boomer",
              "from": "1946",
              "to": "1965"
            },
            {
              "key": "Silent Generation",
              "from": "1925",
              "to": "1946"
            },
            {
              "key": "Greatest Generation",
              "from": "1900",
              "to": "1925"
            },
            {
              "key": "Other",
              "to": "1900"
            }
        ]
      }
    }
  }
}

Response:
{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "myindex",
        "_type": "user",
        "_id": "2",
        "_score": 1,
        "_source": {
          "user_id": 2,
          "user_date_of_birth": "1890-01-01"
        }
      },
      {
        "_index": "myindex",
        "_type": "user",
        "_id": "1",
        "_score": 1,
        "_source": {
          "user_id": 1,
          "user_date_of_birth": "1960-01-01"
        }
      },
      {
        "_index": "myindex",
        "_type": "user",
        "_id": "3",
        "_score": 1,
        "_source": {
          "user_id": 3
        }
      }
    ]
  },
  "aggregations": {
    "age_groups": {
      "buckets": [
        {
          "key": "Other",
          "to": -2208988800000,
          "to_as_string": "1900-01-01T00:00:00.000Z",
          "doc_count": 2
        },
        {
          "key": "Greatest Generation",
          "from": -2208988800000,
          "from_as_string": "1900-01-01T00:00:00.000Z",
          "to": -1420070400000,
          "to_as_string": "1925-01-01T00:00:00.000Z",
          "doc_count": 0
        },
        {
          "key": "Silent Generation",
          "from": -1420070400000,
          "from_as_string": "1925-01-01T00:00:00.000Z",
          "to": -757382400000,
          "to_as_string": "1946-01-01T00:00:00.000Z",
          "doc_count": 0
        },
        {
          "key": "Baby Boomer",
          "from": -757382400000,
          "from_as_string": "1946-01-01T00:00:00.000Z",
          "to": -157766400000,
          "to_as_string": "1965-01-01T00:00:00.000Z",
          "doc_count": 1
        },
        {
          "key": "Generation X",
          "from": -157766400000,
          "from_as_string": "1965-01-01T00:00:00.000Z",
          "to": 315532800000,
          "to_as_string": "1980-01-01T00:00:00.000Z",
          "doc_count": 0
        },
        {
          "key": "Generation Y",
          "from": 315532800000,
          "from_as_string": "1980-01-01T00:00:00.000Z",
          "to": 946684800000,
          "to_as_string": "2000-01-01T00:00:00.000Z",
          "doc_count": 0
        },
        {
          "key": "Generation Z",
          "from": 946684800000,
          "from_as_string": "2000-01-01T00:00:00.000Z",
          "doc_count": 0
        }
      ]
    }
  }
}

Thank you. I opened https://github.com/elastic/elasticsearch/issues/17597.

Awesome, thanks.