How do I measure/monitor my Logstash servers' performance?

Hi,

With logstash version 1.5.1, my crashing issues went away. Great! Now I'd like to see about making sure my logstash cluster is sized correctly.

I currently have a number of Logstash servers behind a DSR hardware load balancer VIP. How would I know if I had too few servers, or too many? With too few, I'd expect steadily rising latency and maybe memory pressure. I don't see either.

I have a mix of remote syslog to logstash, plus pulling logs from redis.

Any suggestions? Thanks.

This is something we are working on improving by providing monitoring API endpoints into LS.

Currently your best bet would be to monitor your Redis list lengths and to track throughput at a high level, in events per second (EPS), using the metrics filter.
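As a rough illustration of the list-length idea, here is a minimal sketch that turns a few list-length samples into a growth rate. The helper name and the sample data are made up; it assumes you collect the samples yourself, for example by running the Redis LLEN command periodically against the list Logstash reads from.

```ruby
# Hypothetical helper (not part of Logstash or Redis): estimate how fast a
# Redis list backlog is growing, in events per second.
# `samples` is an array of [unix_time_seconds, list_length] pairs,
# ordered oldest to newest.
def backlog_growth_per_sec(samples)
  return 0.0 if samples.size < 2

  (t0, len0), (t1, len1) = samples.first, samples.last
  elapsed = t1 - t0
  return 0.0 if elapsed <= 0

  # Positive: producers outpace Logstash. Negative: the backlog is draining.
  (len1 - len0).to_f / elapsed
end

# Made-up samples taken one minute apart.
samples = [[0, 1_000], [60, 1_600], [120, 2_200]]
puts "backlog growth: #{backlog_growth_per_sec(samples)} events/sec"
```

A backlog that keeps growing over a long window suggests the indexers can't keep up; a list that stays near empty suggests there is headroom.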

Okay, so I set up the example metrics filter to measure the event rate, and I'm sending the 1-minute rate to Graphite. Config below:

filter {

  metrics {
    meter => "events"
    add_tag => "metric"
  }

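  # On metric events, "message" holds the Logstash host name. Dots are
  # replaced below so Graphite does not split the name into path segments.
  if "metric" in [tags] {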
    mutate {
      add_field => { "graphite_hostname" => "%{message}" }
    }
    mutate {
      gsub => [
        "graphite_hostname", "\.", "_"
      ]
    }
  }
}

output {
  if "metric" in [tags] {
    graphite {
      metrics => { "logstash.events.%{graphite_hostname}.rate_1m" => "%{events.rate_1m}" }
    }
  }
}

If I sum the rate_1m values from all of the servers, is that the incoming event rate per second (averaged over a minute)?
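To make the question concrete, this is the sum I have in mind. The host names and values are made up and stand in for the latest rate_1m data points in Graphite.

```ruby
# Made-up host names and values standing in for each server's latest
# logstash.events.<host>.rate_1m data point.
rate_1m_by_host = {
  "logstash01_example_com" => 1250.5,
  "logstash02_example_com" => 980.25,
  "logstash03_example_com" => 1102.0
}

# Sum across all servers to get one cluster-wide figure.
cluster_rate = rate_1m_by_host.values.sum
puts "cluster total: #{cluster_rate}"
```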

Answering my own question.

Yes, it's a per-second rate. logstash-filter-metrics uses the Ruby 'metriks' gem for this, and here is what that gem's documentation says:

one_minute_rate()

Returns the one-minute average rate.

meter = Metriks.meter('requests')
puts "rate: #{meter.one_minute_rate}/sec"
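For intuition about what a "one-minute rate" in events per second can look like, here is a minimal sketch of an exponentially weighted moving average meter. As far as I know this is the general approach metriks takes, but treat that as an assumption: the class name, method names, and the 5-second tick interval are my own illustration, not the gem's source.

```ruby
# Illustrative sketch only, NOT the metriks implementation.
# Class name, method names, and tick interval are assumptions.
# Not thread-safe.
class OneMinuteEwma
  TICK_SECONDS = 5.0                          # assumed sampling interval
  ALPHA = 1 - Math.exp(-TICK_SECONDS / 60.0)  # smoothing factor, 1-minute window

  def initialize
    @uncounted = 0   # events seen since the last tick
    @rate = nil      # smoothed rate, in events per second
  end

  # Record n events.
  def mark(n = 1)
    @uncounted += n
  end

  # Call once every TICK_SECONDS: fold the latest interval into the average.
  def tick
    instant = @uncounted / TICK_SECONDS
    @uncounted = 0
    @rate = @rate.nil? ? instant : @rate + ALPHA * (instant - @rate)
  end

  # Smoothed rate in events per second (0.0 before the first tick).
  def rate
    @rate || 0.0
  end
end

meter = OneMinuteEwma.new
12.times do          # one minute of steady traffic: 500 events per 5 s tick
  meter.mark(500)
  meter.tick
end
puts "rate: #{meter.rate}/sec"
```

Because each host's value is already normalized to events per second, adding the per-host values gives a cluster-wide events-per-second figure. A smoothed value like this lags behind sudden bursts or drops.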