Latest full document in transform

Hi,

I am using Elasticsearch, Kibana and Metricbeat to gather metrics from my systems. I want to build a few Data Table visualisations that list:

  • hosts that are currently at more than 80% RAM
  • hosts that currently have more than 80% of their disk full
  • hosts whose CPU load average is greater than their CPU count

From what I understand, the best way to do this is to transform the metricbeat-* index to give me the latest document for each value of host.hostname. But I have no idea how to copy the full document into the transform.

The transform below gets the last @timestamp for each host.hostname. Any help in extending it to include the entire document (or even a list of fields from the document) would be much appreciated.

PUT _transform/last-heartbeat-per-hostname
{
  "source": {
    "index": [
      "metricbeat-*"
    ]
  },
  "pivot": {
    "group_by": {
      "host.hostname": {
        "terms": {
          "field": "host.hostname"
        }
      }
    },
    "aggregations": {
      "@timestamp": {
        "max": {
          "field": "@timestamp"
        }
      }
    }
  },
  "description": "last-heartbeat-per-hostname",
  "dest": {
    "index": "last-heartbeat-per-hostname"
  },
  "sync": {
    "time": {
      "field": "@timestamp",
      "delay": "60s"
    }
  }
}

This should be possible with a scripted metric aggregation, e.g. something like this:

"latest_doc": {
  "scripted_metric": {
    "init_script": "state.timestamp_latest = 0L; state.last_doc = ''",
    "map_script": "def current_date = doc['@timestamp'].getValue().toInstant().toEpochMilli(); if (current_date > state.timestamp_latest) {state.timestamp_latest = current_date;state.last_doc = new HashMap(params['_source']);}",
    "combine_script": "return state",
    "reduce_script": "def last_doc = '';def timestamp_latest = 0L; for (s in states) {if (s.timestamp_latest > (timestamp_latest)) {timestamp_latest = s.timestamp_latest; last_doc = s.last_doc;}} return last_doc"
  }
}

I hope this gives you a starting point. If you only want certain fields, you can access them by name in _source.
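As a rough sketch of the "certain fields" variant: only the map_script changes, building a small map by hand instead of copying the whole source. The field name some_field is a hypothetical placeholder, not a real Metricbeat field, and this assumes the value is a scalar that is present in every document.

```
# Sketch only: "some_field" is a hypothetical placeholder field name.
"latest_doc": {
  "scripted_metric": {
    "init_script": "state.timestamp_latest = 0L; state.last_doc = ''",
    "map_script": "def current_date = doc['@timestamp'].getValue().toInstant().toEpochMilli(); if (current_date > state.timestamp_latest) {state.timestamp_latest = current_date; state.last_doc = ['some_field': params['_source']['some_field']];}",
    "combine_script": "return state",
    "reduce_script": "def last_doc = '';def timestamp_latest = 0L; for (s in states) {if (s.timestamp_latest > (timestamp_latest)) {timestamp_latest = s.timestamp_latest; last_doc = s.last_doc;}} return last_doc"
  }
}
```

More fields can be added to the map literal in the same way, separated by commas.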

Note that in the result the latest doc is nested below the latest_doc field. If you want to remove that nesting, you should be able to do so using an ingest pipeline as part of the transform output.
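A minimal sketch of such a pipeline, assuming you only need to lift a few known fields to the top level. The pipeline name flatten-latest-doc and the field some_field are made up for illustration; adjust them to your data.

```
# Sketch only: pipeline name and "some_field" are hypothetical.
PUT _ingest/pipeline/flatten-latest-doc
{
  "description": "Move selected fields out of latest_doc, then drop latest_doc",
  "processors": [
    {
      "rename": {
        "field": "latest_doc.some_field",
        "target_field": "some_field",
        "ignore_missing": true
      }
    },
    {
      "remove": {
        "field": "latest_doc",
        "ignore_missing": true
      }
    }
  ]
}
```

The transform would then reference the pipeline in its dest section:

```
"dest": {
  "index": "last-heartbeat-per-hostname",
  "pipeline": "flatten-latest-doc"
}
```

Add one rename processor per field you want to keep; anything still under latest_doc when the remove processor runs is dropped.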

Hope that helps!

Hi Hendrik,

Sorry for not being clear in my initial request. Assume the metricbeat index has the following documents:

{"_source": {"@timestamp":"2020-03-02T04:05:10.000Z", "host":{"hostname":"metricbeat-1"}, "mem_percent":"70" }}
{"_source": {"@timestamp":"2020-03-02T04:10:10.000Z", "host":{"hostname":"metricbeat-1"}, "mem_percent":"90" }}
{"_source": {"@timestamp":"2020-03-02T04:15:10.000Z", "host":{"hostname":"metricbeat-1"}, "mem_percent":"80" }}
{"_source": {"@timestamp":"2020-03-02T04:25:10.000Z", "host":{"hostname":"metricbeat-1"}, "mem_percent":"50" }}

{"_source": {"@timestamp":"2020-03-02T04:05:20.000Z", "host":{"hostname":"metricbeat-2"}, "mem_percent":"60" }}
{"_source": {"@timestamp":"2020-03-02T04:10:20.000Z", "host":{"hostname":"metricbeat-2"}, "mem_percent":"70" }}
{"_source": {"@timestamp":"2020-03-02T04:20:20.000Z", "host":{"hostname":"metricbeat-2"}, "mem_percent":"80" }}
{"_source": {"@timestamp":"2020-03-02T04:25:20.000Z", "host":{"hostname":"metricbeat-2"}, "mem_percent":"90" }}

{"_source": {"@timestamp":"2020-03-02T04:05:30.000Z", "host":{"hostname":"metricbeat-3"}, "mem_percent":"40" }}
{"_source": {"@timestamp":"2020-03-02T04:20:30.000Z", "host":{"hostname":"metricbeat-3"}, "mem_percent":"80" }}
{"_source": {"@timestamp":"2020-03-02T04:25:30.000Z", "host":{"hostname":"metricbeat-3"}, "mem_percent":"70" }}
{"_source": {"@timestamp":"2020-03-02T04:30:30.000Z", "host":{"hostname":"metricbeat-3"}, "mem_percent":"90" }}

What I am trying to achieve is a Kibana Data Table that tells me:

# Systems with high ram:
- metricbeat-2
- metricbeat-3

because the latest doc for each of those hostnames has mem_percent > 80.

I tried using a scripted metric based on what you shared, but I can't seem to filter by hostname.
The best I am able to get is the latest global value. In this case my table would only contain metricbeat-3, because the latest doc for metricbeat-2 is older than it.

I can get what I want using this search query:

GET metricbeat/_search
{
  "aggs": {
    "instances": {
      "terms": {
        "field": "hostname",
        "size": 1000,
        "order": {
          "_key": "desc"
        }
      },
      "aggs": {
        "latest_mem": {
          "top_hits": {
            "_source": [
              "mem_percent"
            ],
            "size": 1,
            "sort": [
              {
                "@timestamp": {
                  "order": "desc"
                }
              }
            ]
          }
        }
      }
    }
  },
  "size": 0
}

However, I do not know how to use it in Kibana to create a Data Table visualization.
I mention transforms specifically because I am using them to get the latest timestamp from each hostname and save those in a dedicated index which only has the latest values.
I use that index to build a similar Data Table for when the last @timestamp is > 10 min away from now, but I am open to other ways of doing this.
Any suggestions on how to make the scripted query return a document per hostname would be appreciated too.

Thanks for your help.

Reshad

Did you try the aggregation I suggested inside the transform you posted in your first post?

This transform preview, based on the data you shared:

POST _transform/_preview
{
  "source": {
    "index": [
      "metricbeat-*"
    ]
  },
  "pivot": {
    "group_by": {
      "host.hostname": {
        "terms": {
          "field": "host.hostname"
        }
      }
    },
    "aggregations": {
      "@timestamp": {
        "max": {
          "field": "@timestamp"
        }
      },
      "latest_doc": {
        "scripted_metric": {
          "init_script": "state.timestamp_latest = 0L; state.last_doc = ''",
          "map_script": "def current_date = doc['@timestamp'].getValue().toInstant().toEpochMilli(); if (current_date > state.timestamp_latest) {state.timestamp_latest = current_date;state.last_doc = new HashMap(params['_source']);}",
          "combine_script": "return state",
          "reduce_script": "def last_doc = '';def timestamp_latest = 0L; for (s in states) {if (s.timestamp_latest > (timestamp_latest)) {timestamp_latest = s.timestamp_latest; last_doc = s.last_doc;}} return last_doc"
        }
      }
    }
  },
  "description": "last-heartbeat-per-hostname",
  "dest": {
    "index": "last-heartbeat-per-hostname"
  },
  "sync": {
    "time": {
      "field": "@timestamp",
      "delay": "60s"
    }
  }
}

returns e.g.

{
  "preview" : [
    {
      "hostname" : "metricbeat-1",
      "@timestamp" : "2020-03-02T04:25:10.000Z",
      "latest_doc" : {
        "hostname" : "metricbeat-1",
        "@timestamp" : "2020-03-02T04:25:10.000Z",
        "mem_percent" : "50"
      }
    },
    {
      "hostname" : "metricbeat-2",
      "@timestamp" : "2020-03-02T04:25:20.000Z",
      "latest_doc" : {
        "hostname" : "metricbeat-2",
        "@timestamp" : "2020-03-02T04:25:20.000Z",
        "mem_percent" : "90"
      }
    },
    {
      "hostname" : "metricbeat-3",
      "@timestamp" : "2020-03-02T04:30:30.000Z",
      "latest_doc" : {
        "hostname" : "metricbeat-3",
        "@timestamp" : "2020-03-02T04:30:30.000Z",
        "mem_percent" : "90"
      }
    }
  ],
...
}

However, I made a mistake in my script: you need to copy the source map rather than store a reference to it. This example has the fixed version (when getting the source, wrap it in new HashMap(...), which makes a copy of the top-level map).

(I will fix my 2nd post for later readers of this thread)
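Once the transform has been created and started, the destination index can be searched directly for the hosts you are after. A small sketch, assuming latest_doc.mem_percent ends up mapped as a number in the destination index (in your sample data it is a string, so a range query would not compare numerically until that is fixed):

```
# Sketch only: assumes latest_doc.mem_percent is mapped as a numeric field.
GET last-heartbeat-per-hostname/_search
{
  "query": {
    "range": {
      "latest_doc.mem_percent": {
        "gt": 80
      }
    }
  }
}
```

In Kibana, the same condition can be applied to a Data Table built on that index with the filter latest_doc.mem_percent > 80.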


Hi Hendrik,

This seems to do the trick. I did not think of using the scripted metric in a transform.
Thanks for your help.

I will need to play with this a bit to see that everything works as expected.

Thanks,

Reshad
