date.getHourOfDay() is giving strange results in aggregation

I am indexing some events and trying to get unique hours but the terms aggregation is giving weird response . I have the following query.

{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "terms": {
            "City": [
              "Chicago"
            ]
          }
        },
        {
          "range": {
            "eventDate": {
              "gte": "2018-06-22",
              "lte": "2018-06-22"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "Hours": {
      "terms": {
        "script": "doc['eventDate'].date.getHourOfDay()"
      }
    }
  }
}

This query produces following response.

"buckets": [
        {
          "key": "19",
          "doc_count": 12
        },
        {
          "key": "9",
          "doc_count": 7
        },
        {
          "key": "15",
          "doc_count": 4
        },
        {
          "key": "16",
          "doc_count": 4
        },
        {
          "key": "20",
          "doc_count": 4
        },
        {
          "key": "12",
          "doc_count": 2
        },
        {
          "key": "6",
          "doc_count": 2
        },
        {
          "key": "8",
          "doc_count": 2
        },
        {
          "key": "10",
          "doc_count": 1
        },
        {
          "key": "11",
          "doc_count": 1
        }
      ]

Now I changed the range to get the events for past one month

{
"range": {
"eventDate": {
"gte": "2018-05-22",
"lte": "2018-06-22"
}
}
}

and the response I got was

"Hours": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 1319,
      "buckets": [
        {
          "key": "22",
          "doc_count": 805
        },
        {
          "key": "14",
          "doc_count": 370
        },
        {
          "key": "15",
          "doc_count": 250
        },
        {
          "key": "21",
          "doc_count": 248
        },
        {
          "key": "16",
          "doc_count": 195
        },
        {
          "key": "0",
          "doc_count": 191
        },
        {
          "key": "13",
          "doc_count": 176
        },
        {
          "key": "3",
          "doc_count": 168
        },
        {
          "key": "20",
          "doc_count": 159
        },
        {
          "key": "11",
          "doc_count": 148
        }
      ]
    }

As you can see I got buckets with key 6,8,9,10 and 12 in the response of first query but not in the second query which is very strange as documents returned by first query is a small subset of the second query. Is this a bug or am I missing something obvious?

Thanks

By default the terms aggregation returns the 10 most common values (based on document count). The most common hour of days in one date range is most likely not the same as in another date range.

You can use the size parameter to get more buckets from the terms aggregation. In your case there should only be 24 unique values, so you could set size to 24 to get all values:

"terms": {
        "size": 24,
        "script": "doc['eventDate'].date.getHourOfDay()"
      }

Thanks Abdon!! setting the size to 24 fixed the problem of missing time buckets.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.