Is the missing value option officially supported in Date Range aggregations?

MattQuinn4 · March 30, 2016, 10:09pm

Terms and Histogram aggregations support a missing value option, which will default any missing or null values to a specified value. But the documentation does not mention this option for Date Range aggregations.

It looks like support for this feature was added to several aggregations in this pull request, but not Date Range aggregations:

Eventhough the docs do not mention support for it, using this option in a date range agg does seem to work. Is this just not officially supported for Date Range? Is this option safe to use?

I have the following query, which returns any docs with a null user_date_of_birth in the 'Unknown' bucket:

Request:
GET myindex/user/_search
{
  "aggs": {
    "age_groups": {
      "range": {
        "field": "user_date_of_birth",
        "missing": "1899-12-31",
        "ranges": [
          {
            "key": "Unknown", 
            "to": "1900-01-01"
          },
          {
            "from": "1950-01-01"
          }
        ]
      },
      "aggs": {
        "age_group_count": {
          "value_count": {
            "field": "user_id"
          }
        }
      }
    }
  }
}

Response:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "myindex",
        "_type": "user",
        "_id": "2",
        "_score": 1,
        "_source": {
          "user_id": 2,
          "user_date_of_birth": null
        }
      },
      {
        "_index": "myindex",
        "_type": "user",
        "_id": "1",
        "_score": 1,
        "_source": {
          "user_id": 1,
          "user_date_of_birth": "1950-01-02"
        }
      }
    ]
  },
  "aggregations": {
    "age_groups": {
      "buckets": [
        {
          "key": "Unknown",
          "to": -2208988800000,
          "to_as_string": "1900-01-01T00:00:00.000Z",
          "doc_count": 1,
          "age_group_count": {
            "value": 1
          }
        },
        {
          "key": "1950-01-01T00:00:00.000Z-*",
          "from": -631152000000,
          "from_as_string": "1950-01-01T00:00:00.000Z",
          "doc_count": 1,
          "age_group_count": {
            "value": 1
          }
        }
      ]
    }
  }
}

jpountz · March 31, 2016, 8:13am

In my opinion the missing option is mostly useful on numeric metric aggregations (stats, percentiles, etc.). We also added support for it to the terms aggregation since having a bucket for missing values was a feature that had been requested for a long time. I am less sure about use-cases for bucket aggregations since the missing value would be put in the same bucket as other values. For instance, I don't think the missing option is used a lot on histograms.

In your case, you seem to want to have a bucket that only contains missing values. Maybe it would be better to use the missing aggregation for this?

MattQuinn4 · March 31, 2016, 1:07pm

Thanks for the reply. That was just a simple example. In my real use case I do want missing values to be grouped in the same bucket as other values.

I'm just curious as to why the missing value option works for range aggregations but it is not documented as being supported.

Carmen · April 1, 2016, 4:05pm

I'd also love clarification on whether this is an actual safe-to-use feature in range aggregations. When bucketing by date ranges, it'd be great to be able to just specify a bucket for documents where that value is null.

jpountz · April 4, 2016, 6:50am

What is your real use-case? Documenting this feature would mean that we need to commit to it, so we want to make sure it would be helpful. Having an example use-case would also help give a good example in the docs.

MattQuinn4 · April 4, 2016, 7:03pm

In my use case I want to group clients by age group. Any date of birth less than 1900 or a null/missing date of birth I want to fall into an 'Other' bucket.

Request:
GET myindex/user/_search
{
  "aggs": {
    "age_groups": {
      "date_range": {
        "field": "user_date_of_birth",
        "missing": "1899-12-31",
        "ranges": [
          {
              "key": "Generation Z",
              "from": "2000"
            },
            {
              "key": "Generation Y",
              "from": "1980",
              "to": "2000"
            },
            {
              "key": "Generation X",
              "from": "1965",
              "to": "1980"
            },
            {
              "key": "Baby Boomer",
              "from": "1946",
              "to": "1965"
            },
            {
              "key": "Silent Generation",
              "from": "1925",
              "to": "1946"
            },
            {
              "key": "Greatest Generation",
              "from": "1900",
              "to": "1925"
            },
            {
              "key": "Other",
              "to": "1900"
            }
        ]
      }
    }
  }
}

Response:
{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "myindex",
        "_type": "user",
        "_id": "2",
        "_score": 1,
        "_source": {
          "user_id": 2,
          "user_date_of_birth": "1890-01-01"
        }
      },
      {
        "_index": "myindex",
        "_type": "user",
        "_id": "1",
        "_score": 1,
        "_source": {
          "user_id": 1,
          "user_date_of_birth": "1960-01-01"
        }
      },
      {
        "_index": "myindex",
        "_type": "user",
        "_id": "3",
        "_score": 1,
        "_source": {
          "user_id": 3
        }
      }
    ]
  },
  "aggregations": {
    "age_groups": {
      "buckets": [
        {
          "key": "Other",
          "to": -2208988800000,
          "to_as_string": "1900-01-01T00:00:00.000Z",
          "doc_count": 2
        },
        {
          "key": "Greatest Generation",
          "from": -2208988800000,
          "from_as_string": "1900-01-01T00:00:00.000Z",
          "to": -1420070400000,
          "to_as_string": "1925-01-01T00:00:00.000Z",
          "doc_count": 0
        },
        {
          "key": "Silent Generation",
          "from": -1420070400000,
          "from_as_string": "1925-01-01T00:00:00.000Z",
          "to": -757382400000,
          "to_as_string": "1946-01-01T00:00:00.000Z",
          "doc_count": 0
        },
        {
          "key": "Baby Boomer",
          "from": -757382400000,
          "from_as_string": "1946-01-01T00:00:00.000Z",
          "to": -157766400000,
          "to_as_string": "1965-01-01T00:00:00.000Z",
          "doc_count": 1
        },
        {
          "key": "Generation X",
          "from": -157766400000,
          "from_as_string": "1965-01-01T00:00:00.000Z",
          "to": 315532800000,
          "to_as_string": "1980-01-01T00:00:00.000Z",
          "doc_count": 0
        },
        {
          "key": "Generation Y",
          "from": 315532800000,
          "from_as_string": "1980-01-01T00:00:00.000Z",
          "to": 946684800000,
          "to_as_string": "2000-01-01T00:00:00.000Z",
          "doc_count": 0
        },
        {
          "key": "Generation Z",
          "from": 946684800000,
          "from_as_string": "2000-01-01T00:00:00.000Z",
          "doc_count": 0
        }
      ]
    }
  }
}

jpountz · April 7, 2016, 3:52pm

Thank you. I opened https://github.com/elastic/elasticsearch/issues/17597.

MattQuinn4 · April 7, 2016, 3:53pm

Awesome, thanks.

Topic		Replies	Views
Missing Value option NEST Elasticsearch	1	436	July 3, 2018
Is a way : to take on acount missing value buckets with Rollup search Elasticsearch	3	567	December 3, 2018
How to return whole date range in Date Histogram aggegation Elasticsearch	1	570	February 26, 2018
Aggregation retuns null buckets Elasticsearch	10	1383	July 9, 2020
Suitability of Aggregation framework for traditional OLAP style analytics Elasticsearch	1	360	July 6, 2017

Is the missing value option officially supported in Date Range aggregations?

Related topics