I have Raspberry Pis sending temperature data to Elasticsearch. Each host has two temperature sensors (high and low). My query groups the data by host and by sensor into 6-hour buckets, four per day (morning, afternoon, evening, night). The idea is to average 30 days of these buckets to get the 30-day average temperature for each part of the day. I then compare that to the most recent bucket (this morning, for example) to see how this morning's temperature compares to the 30-day average morning temperature.
The problem is that if a host goes offline for longer than 6 hours, no bucket is created for that host. The host could be up for 30 days with a single 6-hour outage in between, and my search results simply omit the entire host bucket just because one small interval bucket has no data.
If I have 29 days of data, that's good enough. I don't want my entire dataset to be ignored just because I'm missing 1/30 of my data.
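To be concrete about what I mean by "good enough", here is a rough sketch of the averaging I'm after. The function name `tolerant_mean` and the `min_coverage` threshold are made up for illustration, and `None` stands in for an interval with no data:

```python
def tolerant_mean(values, min_coverage=0.8):
    """Average the values that exist; None marks a missing interval.

    min_coverage is a hypothetical threshold: the fraction of intervals
    that must have data before the average is considered usable.
    """
    present = [v for v in values if v is not None]
    if not values or len(present) / len(values) < min_coverage:
        return None  # too little data to trust the average
    return sum(present) / len(present)


# 29 mornings with data and one missing morning still give an average.
mornings = [70.0] * 29 + [None]
thirty_day_avg = tolerant_mean(mornings)
```

With a check like this, one missing interval out of 30 would not discard the host, while a host that was down most of the month would still be flagged.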
Here's my search query. Perhaps there's a better way?
query_range = 30
query_payload = {
    "from": 0,
    "size": 1000,
    "aggs": {
        "hosts": {
            "terms": {"field": "host"},
            "aggs": {
                "sensors": {
                    "terms": {"field": "sensor"},
                    "aggs": {
                        "periods": {
                            "date_histogram": {
                                "field": "@timestamp",
                                "interval": "6h"
                            },
                            "aggs": {
                                "f": {"avg": {"field": "tempf"}},
                                "c": {"avg": {"field": "tempc"}}
                            }
                        }
                    }
                }
            }
        }
    },
    "query": {
        "bool": {
            "must": [{}],
            "filter": {
                "range": {
                    "@timestamp": {
                        "gte": "now-%sd" % query_range,
                        "lte": "now-6h"
                    }
                }
            }
        }
    }
}
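For reference, this is roughly how a response to this query could be regrouped by part of day on the client side. It is only a sketch under assumptions: the response is taken to have the usual nested `aggregations` layout with numeric bucket keys in epoch milliseconds, the 6-hour buckets are assumed to start at 00:00, 06:00, 12:00 and 18:00 UTC, and the period names assigned to those hours, as well as the function name `collect_period_values`, are my own choices.

```python
from collections import defaultdict

# Assumed mapping from a bucket's UTC start hour to a part of the day.
PERIOD_NAMES = {0: "night", 6: "morning", 12: "afternoon", 18: "evening"}


def collect_period_values(response, metric="f"):
    """Group each 6-hour bucket's average by (host, sensor, part of day)."""
    grouped = defaultdict(list)
    for host in response["aggregations"]["hosts"]["buckets"]:
        for sensor in host["sensors"]["buckets"]:
            for bucket in sensor["periods"]["buckets"]:
                # Bucket keys are assumed to be epoch milliseconds.
                start_hour = (bucket["key"] // 3_600_000) % 24
                period = PERIOD_NAMES[start_hour]
                # The value is expected to be None for a bucket with no documents.
                value = bucket[metric]["value"]
                grouped[(host["key"], sensor["key"], period)].append(value)
    return dict(grouped)
```

Each list then holds up to 30 values for one host, sensor and part of day, with empty intervals either absent or present as `None`, so they can be skipped when averaging instead of dropping the host.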