Split an event containing an object of objects into separate events

I am attempting to use the logstash-filter-metrics plugin to produce metrics split by the source [host] of the events. We have > 1000 different log sources, and being able to track where all the logs are coming from across each of the logstash instances in our cluster is useful. With five logstash receivers and five logstash processor instances, having this breakdown of eps rate_1m per logstash host would be ideal.

This is in some ways a continuation of my comment on Generate one event per metric name #30, trying to work around the logstash-filter-metrics plugin's inability to split metrics by a field value.

@magnusbaeck: I have read all your replies to similar comments; this general topic comes up repeatedly in similar forms, but not quite this scenario as far as I could see.

I'm using logstash-5.4.2 for this test configuration:

# An input for testing
input {
  file {
    path => "/tmp/production-beat-output-log-samples.json"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    ignore_older => "0"
    codec => "json"
  }
}
# Produce some metrics by source host
filter {
  if [message] {
    metrics {
      meter => [ "[metric][%{host}]" ]
      add_tag => "metric"
      rates => 1
    }
  }
  if "metric" in [tags] {
    # Try to split the metrics into unique events, alas, not an array
    split {
      field => "metric"
      target => ""
    }
  }
}
# Output some stuff to stdout
output {
  if "metric" in [tags] {
    stdout {
      codec => json
    }
  }
}

This outputs JSON like this:

{
  "@timestamp":"2017-06-22T03:51:41.942Z",
  "metric":{
    "server1":{"rate_1m":1.0,"count":623},
    "server2":{"rate_1m":2.0,"count":68},
    "server3":{"rate_1m":3.0,"count":873},
    "server4":{"rate_1m":4.0,"count":17},
    "server5":{"rate_1m":5.0,"count":1063},
    "server6":{"rate_1m":7.0,"count":1},
    "server7":{"rate_1m":2.0,"count":336},
    "server8":{"rate_1m":4.0,"count":2295},
    "server9":{"rate_1m":0.1,"count":1132}
  },
  "ls-host":"logstash1",
  "@version":"1",
  "message":"mymachine",
  "tags":["metric"]
}

I want to eventually produce events like:

{
  "@timestamp":"2017-06-22T03:51:41.942Z",
  "host": "server1",
  "rate_1m": 1.0,
  "count": 623,
  "ls-host":"logstash1",
  "@version":"1",
  "message":"mymachine",
  "tags":["metric"]
}
{
  "@timestamp":"2017-06-22T03:51:41.942Z",
  "host": "server2",
  "rate_1m": 2.0,
  "count": 68,
  "ls-host":"logstash1",
  "@version":"1",
  "message":"mymachine",
  "tags":["metric"]
}

I've tried the split, mutate and metricize plugins, but none of them split an object of objects into separate events. I've found an approach where, if I list every possible sub-object, I can clone them out into separate events, but given the vast number of sources coming and going, that's really not practical.
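The transformation I'm after could in principle be expressed in Ruby. Below is a sketch of just the core hash-splitting logic in plain Ruby, not a real Logstash filter; the function name split_metric_events and the sample event hash are illustrative. The hard part remains getting Logstash to emit the resulting hashes as separate events, which is exactly what I was hoping split or metrics could do natively:

```ruby
# Sketch: split a metrics-filter hash into one hash per host.
# Field names ("metric", "host", "rate_1m", "count") match the
# example output above; everything else here is illustrative.

def split_metric_events(event)
  metric = event["metric"] || {}
  # Keep all the shared fields except the nested metric object.
  common = event.reject { |k, _| k == "metric" }
  metric.map do |host, stats|
    # Copy the shared fields, then flatten this host's stats in.
    common.merge("host"    => host,
                 "rate_1m" => stats["rate_1m"],
                 "count"   => stats["count"])
  end
end

event = {
  "@timestamp" => "2017-06-22T03:51:41.942Z",
  "metric" => {
    "server1" => { "rate_1m" => 1.0, "count" => 623 },
    "server2" => { "rate_1m" => 2.0, "count" => 68 }
  },
  "ls-host" => "logstash1",
  "tags"    => ["metric"]
}

split_metric_events(event).each { |e| puts e.inspect }
```

Each resulting hash carries the shared fields plus one host's stats flattened to the top level, matching the desired output above.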

Am I missing something fundamental here, is there an easier better way to get the stats I'm after?
Could we just merge in the improvement to metrics filter: Split Metrics #45?
Or is there an option in another filter to accomplish what I'm looking for?

This might be out of scope depending on your use case or the number of metrics, but maybe the [statsd output plugin](http://www.elastic.co/guide/en/logstash/current/plugins-outputs-statsd.html) could suit your needs better.

The following would create a separate metric for each host (and can be further segregated using namespace and sender), though it would require setting up a statsd server (there are Elasticsearch plugins for statsd, but I haven't tried them).

statsd {
    ...
    increment => [ "event_count_%{host}" ]
    ...
}

Hi @paz,

Thanks for the idea; it turns out that's exactly what I'm attempting to move away from, having used this method for a couple of years. I started with statsd, but now Telegraf has a statsd-compatible input, and I ingest that data into InfluxDB.

The problem with these increment packets is that they are UDP, and increment relies on the counters being incremented reliably, which UDP fundamentally is not. The eps rates from the metrics filter would be far less spammy and would let me send them over TCP as gauge metrics.

It's possible I could send the metrics events to a custom golang service to munge them, but this seems like a function that either the logstash-filter-split or logstash-filter-metrics plugin could have been capable of.

I've had the idea of using a separate Elasticsearch cluster for the log cluster metrics for some time; however, this metrics splitting is one of the blockers for moving away from statsd and InfluxDB.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.