Query to get servers that have increased storage usage by X% during last X months?

Is there any way to get a list of servers that have increased storage usage by 20% or X% in last 3 months or X months?

I am indexing daily server storage usage as numeric percentage_used (ej. 80.55) and want to get a list of servers where storage usage is growing by X during Y period.

Hi @agonzalez,

in general, this can be done by composing a few pipeline aggregations. Unfortunately Kibana only supports a too limited set right now, which makes visualization impossible. But you could easily use something like the following query in a custom script that queries Elasticsearch:

GET logstash-*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "*",
            "analyze_wildcard": true
          }
        },
        {
          "range": {
            "@timestamp": {
              "gte": "now-90d",
              "lte": "now",
              "format": "epoch_millis"
            }
          }
        }
      ],
      "must_not": []
    }
  },
  "_source": {
    "excludes": []
  },
  "aggs": {
    "hosts": {
      "terms": {
        "field": "hostname",
        "order": {
          "_term": "desc"
        }
      },
      "aggs": {
        "dates": {
          "date_histogram": {
            "field": "@timestamp",
            "interval": "90d",
            "time_zone": "Europe/Berlin"
          },
          "aggs": {
            "percentage-used-difference": {
              "serial_diff": {
                "buckets_path": "percentage-used"
              }
            },
            "percentage-used": {
              "max": {
                "field": "percentage_used"
              }
            },
            "bucket-filter": {
              "bucket_selector": {
                "buckets_path": {
                  "percentage_used_difference": "percentage-used-difference"
                },
                "script": "(params?.percentage_used_difference ?: 0) > 20"
              }
            }
          }
        },
        "bucket-filter": {
          "bucket_selector": {
            "buckets_path": {
              "date_count": "dates._bucket_count"
            },
            "script": "params.date_count > 0"
          }
        }
      }
    }
  }
}

Replace the 90d with your time interval and the 20 with your minimum difference.

Thanks, i replaced hostname with my field asset_tag and tried to run and i got:

"Fielddata is disabled on text fields by default. Set fielddata=true on [asset_tag] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."

I read in some places the usage of fielddata is not recommended, is this required? how do i enable it for an existing text field?

You're probably indexing the asset_tag as a string, which means it is being analyzed. Since 5.0 there is a dedicated keyword type to store string data without running full-text analysis on them, which allows for efficient aggregation and exact matching. You could either change your mapping of the asset_tag field to type keyword or use the default subfield asset_tag.keyword in your query.

Right asset_tag is being indexed as string by NEST, because in c# it is a string property. How can I tell NEST to index it as keyworkd?

I used asset_tag.keyword and dont get error but just get this. I even lower from 20 to 1 to get more results but i dont see any result.

{
"took": 31,
"timed_out": false,
"_shards": {
"total": 20,
"successful": 20,
"failed": 0
},
"hits": {
"total": 0,
"max_score": 0,
"hits": []
},
"aggregations": {
"hosts": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": []
}
}
}

So your index does not have asset_tag defined as a multi-field by default. In order to change using which type Elasticsearch indexes a certain field, a mapping must be defined before any documents are created in the index. For an assumed index named my_index and document type my_type the following request would create a static mapping with the field asset_tag defined as being of type keyword:

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "asset_tag": {
          "type":  "keyword"
        }
      }
    }
  }
}

Then you should be able to use the asset_tag field in the Terms aggregation. If you're using a rolling index naming schema (like logstash with logstash-[YYYY.MM.DD] for example) an index template would allow you to generalize the above definition.

but i can see the asset_tag.keyword on the index pattern, is not that enough?

You're right, that should be enough. Have you tried lowering (or removing) the thresholds in the bucket-filter aggregations to check whether there are results before the filter are applied?

yes i changed to this and same result :frowning:

GET /ueb-metrics-*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "*",
            "analyze_wildcard": true
          }
        },
        {
          "range": {
            "@timestamp": {
              "gte": "now-90d",
              "lte": "now",
              "format": "epoch_millis"
            }
          }
        }
      ],
      "must_not": []
    }
  },
  "_source": {
    "excludes": []
  },
  "aggs": {
    "hosts": {
      "terms": {
        "field": "asset_tag.keyword",
        "order": {
          "_term": "desc"
        }
      },
      "aggs": {
        "dates": {
          "date_histogram": {
            "field": "@timestamp",
            "interval": "90d",
            "time_zone": "Europe/Berlin"
          },
          "aggs": {
            "percentage-used-difference": {
              "serial_diff": {
                "buckets_path": "percentage-used"
              }
            },
            "percentage-used": {
              "max": {
                "field": "percentage_used"
              }
            },
            "bucket-filter": {
              "bucket_selector": {
                "buckets_path": {
                  "percentage_used_difference": "percentage-used-difference"
                },
                "script": "(params?.percentage_used_difference ?: 0) > 1"
              }
            }
          }
        },
        "bucket-filter": {
          "bucket_selector": {
            "buckets_path": {
              "date_count": "dates._bucket_count"
            },
            "script": "params.date_count > 0"
          }
        }
      }
    }
  }
}

How about removing the two bucket-filter aggregations completely? Are there any buckets in the result?

no changes. like this?

GET /ueb-metrics-*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "*",
            "analyze_wildcard": true
          }
        },
        {
          "range": {
            "@timestamp": {
              "gte": "now-90d",
              "lte": "now",
              "format": "epoch_millis"
            }
          }
        }
      ],
      "must_not": []
    }
  },
  "_source": {
    "excludes": []
  },
  "aggs": {
    "hosts": {
      "terms": {
        "field": "asset_tag.keyword",
        "order": {
          "_term": "desc"
        }
      },
      "aggs": {
        "dates": {
          "date_histogram": {
            "field": "@timestamp",
            "interval": "90d",
            "time_zone": "Europe/Berlin"
          },
          "aggs": {
            "percentage-used-difference": {
              "serial_diff": {
                "buckets_path": "percentage-used"
              }
            },
            "percentage-used": {
              "max": {
                "field": "percentage_used"
              }
            }
          }
        }
      }
    }
  }
}

Ok, so my guess would be that the terms aggregation yields no buckets

"terms": {
  "field": "asset_tag.keyword",
  "order": {
    "_term": "desc"
  }
},

Could you just run that aggregation alone in the dev tools or in a visualization to confirm that?

I retook this and run that query and no results:

query:

GET /ueb-metrics-*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "*",
            "analyze_wildcard": true
          }
        },
        {
          "range": {
            "@timestamp": {
              "gte": "now-90d",
              "lte": "now",
              "format": "epoch_millis"
            }
          }
        }
      ],
      "must_not": []
    }
  },
  "_source": {
    "excludes": []
  },
  "aggs": {
    "hosts": {
      "terms": {
        "field": "asset_tag.keyword",
        "order": {
          "_term": "desc"
        }
      }     
    }
  }
}

result

{
  "took": 11,
  "timed_out": false,
  "_shards": {
    "total": 25,
    "successful": 25,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "hosts": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": []
    }
  }
}

i can see data/doc in ueb-metrics when browsing in kibana->discover.

Any idea?

if i remove time range i get buckets.

    {
      "range": {
        "@timestamp": {
          "gte": "now-90d",
          "lte": "now",
          "format": "epoch_millis"
        }
      }
    }

So either the documents in the index have @timestamp values outside of that range or they do not have a @timestamp field at all. Can you rule out either of the two possibilities?

Also, make sure the system dates of the cluster nodes are correct. Otherwise the now date expressions might evaluate to unexpected values.

ok, found problem field name is timestamp not @timestamp, but now i get just 2 buckets for the full query where percent_used difference is > 1. I have 5000 servers reporting storage usage and 1 million documents and i dont think just 2 servers have increases 1% in last 90 days.

this is sample document:

{
  "_index": "ueb-metrics-2017.04",
  "_type": "object",
  "_id": "e5304b10-6155-4086-bc0f-96e7a4d57569_capacity_2017.04.25_10.00",
  "_score": null,
  "_source": {
    "type": "capacity",
    "asset_tag": "e5304b10-6155-4086-bc0f-96e7a4d57569",
    "timestamp": "2017-04-25T10:00:14.029811-06:00",
    "capacity": {
      "ir_used": 2,
      "used": 0,
      "available": 6000,
      "percent_used": 0
    }
  },
  "fields": {
    "timestamp": [
      1493136014029
    ]
  },
  "highlight": {
    "type.keyword": [
      "@kibana-highlighted-field@capacity@/kibana-highlighted-field@"
    ],
    "type": [
      "@kibana-highlighted-field@capacity@/kibana-highlighted-field@"
    ]
  },
  "sort": [
    1493136014029
  ]
}

this i full query:

GET /ueb-metrics-*/_search
{
  "size": 0,
  "query": {
"bool": {
  "must": [
    {
      "query_string": {
        "query": "type:capacity",
        "analyze_wildcard": true
      }
    },
    {
      "range": {
        "timestamp": {
          "gte": "now-90d",
          "lte": "now",
          "format": "epoch_millis"
        }
      }
    }
  ],
  "must_not": []
}
  },
  "_source": {
"excludes": []
  },
  "aggs": {
"hosts": {
  "terms": {
    "field": "asset_tag.keyword",
    "order": {
      "_term": "desc"
    }
  },
  "aggs": {
    "dates": {
      "date_histogram": {
        "field": "timestamp",
        "interval": "90d",
        "time_zone": "Europe/Berlin"
      },
      "aggs": {
        "percent-used-difference": {
          "serial_diff": {
            "buckets_path": "percent-used"
          }
        },
        "percent-used": {
          "max": {
            "field": "capacity.percent_used"
          }
        },
        "bucket-filter": {
          "bucket_selector": {
            "buckets_path": {
              "percent_used_difference": "percent-used-difference"
            },
            "script": "(params?.percent_used_difference ?: 0) > 1"
          }
        }
      }
    },
    "bucket-filter": {
      "bucket_selector": {
        "buckets_path": {
          "date_count": "dates._bucket_count"
        },
        "script": "params.date_count > 0"
      }
    }
  }
}
  }
}

and result:

{
  "took": 51,
  "timed_out": false,
  "_shards": {
    "total": 25,
    "successful": 25,
    "failed": 0
  },
  "hits": {
    "total": 1130449,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "hosts": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 1129222,
      "buckets": [
        {
          "key": "ff764213-0292-46c7-8ff3-63b52baf4c67",
          "doc_count": 45,
          "dates": {
            "buckets": [
              {
                "key_as_string": "2017-04-24T00:00:00.000+02:00",
                "key": 1492984800000,
                "doc_count": 11,
                "percent-used": {
                  "value": 79.71998596191406
                },
                "percent-used-difference": {
                  "value": 8.467666625976562
                }
              }
            ]
          }
        },
        {
          "key": "ff748769-855a-4f6e-9a93-f627d1d66a61",
          "doc_count": 431,
          "dates": {
            "buckets": [
              {
                "key_as_string": "2017-04-24T00:00:00.000+02:00",
                "key": 1492984800000,
                "doc_count": 10,
                "percent-used": {
                  "value": 101.27615356445312
                },
                "percent-used-difference": {
                  "value": 37.369319915771484
                }
              }
            ]
          }
        }
      ]
    }
  }
}

Good to hear that you found that particular problem. :thumbsup: To debug other potential mistakes in the query I would advise to try out the query with smaller intervals than 90 days and to remove the bucket-filter clauses. That way you can attempt to reproduce the numbers manually and narrow down where the mistakes might be.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.