Looking for some help implementing disk and file system (files and directories) usage collection with Logstash and storage in Elasticsearch.
Basically, we have data processing workflows for which we want to evaluate server performance alongside data access and data storage. We are using Topbeat with Logstash, Elasticsearch and Kibana, and it would be good to have a comprehensive time series data set that can all be easily analysed. We will have application logs to store along with the disk and file system usage metrics (and/or logs).
Collectd (df and disk plugins) is storing data in InfluxDB for redundancy.
I have looked at the 'df' and 'du' commands, and they seem to produce the needed information, which could be scripted, logged, parsed and collected. Not sure if this is the way to go.
Hope there are plenty of operations people out there who have addressed this use case.
The other thing is about monitoring all of the data files on the file system.
Logging the size of every file in the file system? No, I don't believe collectd does that. Perhaps you can use Logstash's exec input to run a small script, or write a collectd plugin.
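For example, here's a minimal sketch of the exec-input approach, assuming a df-based command and that you want one event per filesystem. The command, interval and grok pattern are illustrative only:

    input {
      exec {
        # Run df every five minutes; each run becomes one event whose
        # "message" field holds the full command output.
        command => "df -P -B1"
        interval => 300
        type => "df"
      }
    }

    filter {
      if [type] == "df" {
        # Split the multi-line df output into one event per line.
        split { }
        # Parse the data lines; the header line will fail and get tagged.
        grok {
          match => { "message" => "^%{NOTSPACE:filesystem} +%{NUMBER:size_bytes:int} +%{NUMBER:used_bytes:int} +%{NUMBER:avail_bytes:int} +%{NUMBER:used_pct:int}% +%{NOTSPACE:mount}$" }
        }
        # Drop the header line (and anything else that didn't parse).
        if "_grokparsefailure" in [tags] {
          drop { }
        }
      }
    }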
This isn't a very good idea unless you only have Beats-based inputs. The field you reference here won't be set for events from the udp input, so the index name will be e.g. %{[@metadata][beat]}-2016.03.09.
document_type => "%{[@metadata][type]}"
Same thing here.
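One way around that (just a sketch, assuming the Beats and udp inputs share the same pipeline; the localhost:9200 host is a placeholder) is to only use those sprintf references when the metadata fields actually exist:

    output {
      if [@metadata][beat] {
        # Events from Beats carry the metadata fields, so the sprintf
        # references resolve to real values here.
        elasticsearch {
          hosts => ["localhost:9200"]
          index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
          document_type => "%{[@metadata][type]}"
        }
      } else {
        # Everything else (e.g. the udp input) goes to the default index.
        elasticsearch {
          hosts => ["localhost:9200"]
          index => "logstash-%{+YYYY.MM.dd}"
        }
      }
    }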
Disable the elasticsearch output for now and use a simple stdout { codec => rubydebug } output. Once things look as you expect, try enabling the ES output again.
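Something along these lines:

    output {
      # Temporary debugging output; re-enable the elasticsearch output
      # once the events look the way you expect.
      stdout { codec => rubydebug }
    }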
I'm not sure what you're asking. The input plugins are agnostic about the outputs. If your elasticsearch output is configured to send all events to logstash-%{+YYYY.MM.dd}, then that's where all events will go.
Okay, how can I configure Logstash to send each input plugin's events to its own index?
e.g.
syslog plugin to a syslog index
collectd plugin to a collectd index
exec plugin to an exec index
Is that a sensible thing to do?
It seems that would be a good way to monitor index growth for each plugin and its operation, and it would make it easier to tune each plugin for the amount of data collected and the frequency of collection.
It's recommended by the Elastic folks, but keep in mind that having too many shards per ES node is a bad idea, which makes it even more important to review the default shard count of five.
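As for the how: one common approach (a sketch only; the ports, command and host below are placeholders) is to set a distinct type on each input and interpolate it into the index name:

    input {
      # Each input tags its events with a distinct type.
      syslog { port => 5514  type => "syslog" }
      udp    { port => 25826 type => "collectd" codec => collectd { } }
      exec   { command => "df -P" interval => 300 type => "exec" }
    }

    output {
      elasticsearch {
        hosts => ["localhost:9200"]
        # The type field is interpolated into the index name, e.g.
        # syslog-2016.03.09, collectd-2016.03.09, exec-2016.03.09.
        index => "%{type}-%{+YYYY.MM.dd}"
      }
    }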