Missing results in aggregations

I have Raspberry Pis sending temperature data to Elasticsearch. Each host has two temperature sensors (high and low). I have a query that groups by host and by sensor into four 6-hour periods (morning, afternoon, evening, night). The idea is to average 30 days of these buckets to get the 30-day average temperature for each part of the day. Then I compare this to the last bucket (this morning, for example) to see how this morning's temperature compares to the 30-day average morning temperature.
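A minimal client-side sketch of that comparison, assuming the 6-hour bucket keys are epoch milliseconds (as `date_histogram` reports them) interpreted in UTC. The mapping of start hours to "night", "morning" and so on, and the helper names `period_of_day` and `compare_to_average`, are hypothetical. The newest bucket is compared against the average of the earlier buckets for the same part of day; the sample data is hand-made.

```python
from datetime import datetime, timezone
from statistics import mean

# Hypothetical mapping from a bucket's start hour (UTC assumed) to the part of day.
PERIODS = {0: "night", 6: "morning", 12: "afternoon", 18: "evening"}


def period_of_day(key_ms):
    """Map a date_histogram bucket key (epoch milliseconds) to a part-of-day label."""
    start = datetime.fromtimestamp(key_ms / 1000, tz=timezone.utc)
    return PERIODS[start.hour - start.hour % 6]


def compare_to_average(buckets):
    """Compare the newest bucket with earlier buckets of the same part of day.

    buckets: time-ordered list of (key_ms, avg_temp); the last entry is the newest.
    Empty buckets (avg_temp is None) are skipped. Raises statistics.StatisticsError
    if no earlier bucket shares the newest bucket's part of day.
    """
    *history, (latest_key, latest_temp) = buckets
    period = period_of_day(latest_key)
    same_period = [t for k, t in history if t is not None and period_of_day(k) == period]
    average = mean(same_period)
    return period, latest_temp, average, latest_temp - average


DAY_MS = 24 * 3600 * 1000
SIX_H_MS = 6 * 3600 * 1000
# Hand-made sample: three consecutive 06:00 UTC buckets at 50, 51 and 52 degrees.
sample = [(d * DAY_MS + SIX_H_MS, 50.0 + d) for d in range(3)]
print(compare_to_average(sample))  # ('morning', 52.0, 50.5, 1.5)
```

Excluding the newest bucket from the average keeps "this morning" from pulling the baseline toward itself; including it instead is an equally valid choice.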

The problem is that if a host goes offline for longer than 6 hours, no bucket is created for that host in that interval. The host could be up for 30 days with a single 6-hour outage in between, and my search results simply omit the entire host bucket just because one interval bucket has no data.

If I have 29 days of data, that's good enough. I don't want my entire dataset to be ignored just because I'm missing 1/30 of my data.
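One way to express "29 days is good enough" on the client side is to average whatever buckets are present and only give up when too many are missing. This is a sketch: `tolerant_average` and its `min_coverage` threshold are hypothetical, and it assumes an empty bucket shows up either as a missing entry or as a null (`None`) average, which is what an `avg` aggregation reports for a bucket with no documents.

```python
from statistics import mean


def tolerant_average(values, expected_count, min_coverage=0.9):
    """Average the non-missing values, or return None if too many are missing.

    values: per-bucket averages for one part of day; None marks an empty bucket.
    expected_count: buckets in a complete series (e.g. 30 mornings for 30 days).
    min_coverage: hypothetical fraction of buckets that must be present.
    """
    present = [v for v in values if v is not None]
    if not present or len(present) / expected_count < min_coverage:
        return None
    return mean(present)


# 29 of 30 mornings present: still averaged.
print(tolerant_average([60.0] * 29 + [None], 30))  # 60.0
# Only 20 of 30 present: below the threshold, so no average.
print(tolerant_average([60.0] * 20 + [None] * 10, 30))  # None
```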

Here's my search query. Perhaps there's a better way?

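query_range = 30  # days of history to average over
query_payload = {
    "from": 0,
    "size": 1000,  # top-level size: number of search hits returned
    "aggs": {
        "hosts": {
            "terms": {"field": "host"},  # one bucket per Raspberry Pi
            "aggs": {
                "sensors": {
                    "terms": {"field": "sensor"},  # high and low sensor
                    "aggs": {
                        "periods": {
                            # 6-hour buckets, four per day
                            "date_histogram": {
                                "field": "@timestamp",
                                "interval": "6h"  # newer versions use "fixed_interval"
                            },
                            "aggs": {
                                "f": {"avg": {"field": "tempf"}},
                                "c": {"avg": {"field": "tempc"}}
                            }
                        }
                    }
                }
            }
        }
    },
    "query": {
        "bool": {
            "must": [{}],
            "filter": {
                "range": {
                    "@timestamp": {
                        "gte": "now-%sd" % query_range,  # e.g. "now-30d"
                        "lte": "now-6h"
                    }
                }
            }
        }
    }
}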

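It turns out it was just a coincidence that the omitted hosts also had missing data. The root of my problem was that the "size" key needed to be in the hosts aggregation, not at the top level. The top-level "size" only controls how many search hits are returned, while a terms aggregation returns just its top 10 buckets unless it is given its own "size".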
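A sketch of that fix applied to a copy of the payload: each `terms` aggregation gets its own `size`, and the top-level `size` is set to 0 because only the aggregations are used (optional; keep it if you need the hits). The helper `with_bucket_sizes` and the limits 100 and 10 are hypothetical; choose values at least as large as your number of hosts and sensors. The payload in the usage example is trimmed to the relevant keys.

```python
import copy


def with_bucket_sizes(payload, max_hosts=100, max_sensors=10, hits=0):
    """Return a copy of the payload with explicit terms-aggregation sizes.

    Without its own "size", a terms aggregation returns only its top 10 buckets.
    max_hosts and max_sensors are hypothetical upper bounds.
    """
    fixed = copy.deepcopy(payload)  # leave the original payload untouched
    fixed["size"] = hits            # number of raw hits; 0 = aggregations only
    hosts = fixed["aggs"]["hosts"]
    hosts["terms"]["size"] = max_hosts
    hosts["aggs"]["sensors"]["terms"]["size"] = max_sensors
    return fixed


payload = {
    "from": 0, "size": 1000,
    "aggs": {"hosts": {"terms": {"field": "host"},
                       "aggs": {"sensors": {"terms": {"field": "sensor"}}}}},
}
fixed = with_bucket_sizes(payload)
print(fixed["aggs"]["hosts"]["terms"])  # {'field': 'host', 'size': 100}
```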

Problem solved.
