Gents, you've been awesome. That helps clear up a whole bunch of concepts and sysadmin operations.
I think I have found the problem.
dev-36 had multiple configuration files:
/etc/logstash/conf.d/logstash.conf
/etc/logstash/conf.d/logstash.conf.original
/etc/logstash/conf.d/logstash-topbeats.conf
/etc/logstash/conf.d/logstash.syslog.conf
dev-37 had one configuration file:
/etc/logstash/conf.d/logstash.conf
dev-36 - made the configuration files (logstash.conf and topbeat.yml) identical to those on dev-37
dev-36 - Logstash was still starting and then stopping after a few seconds (problem persisted)
dev-36 - removed all other .conf files from /etc/logstash/conf.d
dev-36 - Logstash then started and was still running after 10 minutes (problem solved, I think)
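My assumption here, not confirmed anywhere in this thread, is that Logstash read every file in /etc/logstash/conf.d as one combined configuration, so the extra files may have declared a second input on a port that was already taken. A rough sketch of how the directory could be checked for that before restarting the service; the function names are hypothetical, and the regular expression is a crude text match rather than a real Logstash config parser (it will also match commented-out lines):

```python
import re
from collections import defaultdict
from pathlib import Path

# Crude match for settings such as `port => 5044`; not a real config parser.
PORT_SETTING = re.compile(r"\bport\s*=>\s*(\d+)")


def ports_by_file(conf_dir):
    """Map each port number to the sorted names of the files that mention it."""
    found = defaultdict(set)
    for path in sorted(Path(conf_dir).iterdir()):
        if not path.is_file():
            continue
        text = path.read_text(errors="replace")
        for match in PORT_SETTING.finditer(text):
            found[int(match.group(1))].add(path.name)
    return {port: sorted(names) for port, names in found.items()}


def duplicated_ports(conf_dir):
    """Return only the ports that are mentioned in more than one file."""
    return {p: f for p, f in ports_by_file(conf_dir).items() if len(f) > 1}


if __name__ == "__main__":
    conf_dir = Path("/etc/logstash/conf.d")  # directory named in this thread
    if conf_dir.is_dir():
        for port, files in duplicated_ports(conf_dir).items():
            print(f"port {port} is configured in: {', '.join(files)}")
    else:
        print(f"{conf_dir} not found on this host")
```

Every file in the directory is scanned regardless of extension, so a leftover such as logstash.conf.original is included in the check.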
It would be great to do some sanity tests for the topbeat data collection and storage in Elasticsearch.
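One simple sanity test might be to count the documents in today's Topbeat index. The sketch below assumes Elasticsearch is reachable at localhost:9200 (the hosts value in logstash.conf) and that the beat name in [@metadata][beat] is "topbeat", so the daily index would be topbeat-YYYY.MM.dd; neither assumption has been verified on these servers, and the helper names are hypothetical. It uses the Elasticsearch _count endpoint over plain HTTP.

```python
import json
import urllib.error
import urllib.request
from datetime import datetime, timezone

# Assumption: same host and port as the `hosts` setting in logstash.conf.
ES_URL = "http://localhost:9200"


def daily_index(beat, day):
    """Index name matching index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"."""
    return f"{beat}-{day:%Y.%m.%d}"


def count_url(index, base=ES_URL):
    """URL of the _count endpoint for one index."""
    return f"{base}/{index}/_count"


def count_documents(index, base=ES_URL, timeout=5):
    """Return the document count for `index`, or None if it cannot be read."""
    try:
        with urllib.request.urlopen(count_url(index, base), timeout=timeout) as resp:
            return json.load(resp)["count"]
    except (urllib.error.URLError, OSError, ValueError, KeyError):
        return None


if __name__ == "__main__":
    # Logstash builds the date part of the index name from the event time in UTC.
    index = daily_index("topbeat", datetime.now(timezone.utc).date())
    print(f"{index}: {count_documents(index)} documents")
```

With period: 10 in topbeat.yml, running this twice more than ten seconds apart should show the count growing if events are reaching Elasticsearch. A result of None means the index could not be read at all.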
The ELK stacks are running on our worker servers (E5v3 16 Core, 1024 GB RAM), which is part of the reason for collecting the performance metrics (syslogs next, and eventually app logs).
We are developing our data processing pipelines and need to evaluate the performance of our internal application and its algorithms (also written in Java).
I guess there are some obvious questions about optimisation of the ELK stack processes for a balance somewhere between performance and reliability.
I noted in the forums for some of the other stacks (Influx, TICK, collectd, rsyslog) that metric data can be dropped under certain circumstances (heavy CPU or memory load).
Any suggestions and guidance on these topics are welcome.
But so far it looks like the Logstash issue on dev-36 has been solved.
Configuration files:
dev-36:/etc/logstash/conf.d/logstash.conf
input {
  beats {
    port => 5044
  }
}
output {
  elasticsearch {
    hosts => "localhost:9200"
    manage_template => false
    index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
  }
}
dev-36:/etc/topbeat/topbeat.yml
input:
  period: 10
  procs: [".*"]
  stats:
    system: true
    proc: true
    filesystem: true
    cpu_per_core: true
output:
  logstash:
    hosts: ["127.0.0.1:5044"]
Does the logging to files help address that issue or concern about potential data loss?
This section is not currently in topbeat.yml:
logging:
  to_files: true
  files:
    path: /var/log/topbeat
    name: topbeat.log
    rotateeverybytes: 10485760
    keepfiles: 7
  level: info
Topbeat is now starting on both dev-38 and dev-39.
Replaced the contents of the topbeat.yml files on both servers, and the service started and seems to be running normally. This looks like it may have been a hidden-characters issue: I was having the same issue on both servers, and the same change appears to have fixed the problem on both.
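If hidden characters were indeed the cause (still only a guess), a check along these lines could confirm it on the next occurrence. This is a minimal sketch that flags tabs, carriage returns, other control characters, and anything outside printable ASCII (a byte-order mark or a non-breaking space, for example). The function name is hypothetical, and the rule that everything non-ASCII is suspicious is my own simplification, not a YAML requirement.

```python
from pathlib import Path


def suspicious_characters(text):
    """Return (line, column, description) for characters that may upset a YAML parser."""
    findings = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch == "\t":
                findings.append((line_no, col, "tab"))
            elif ch == "\r":
                findings.append((line_no, col, "carriage return"))
            elif ord(ch) < 32 or ord(ch) > 126:
                # Control characters, BOM, non-breaking spaces, smart quotes, ...
                findings.append((line_no, col, f"U+{ord(ch):04X}"))
    return findings


if __name__ == "__main__":
    path = Path("/etc/topbeat/topbeat.yml")  # file named in this thread
    if path.is_file():
        # newline="" keeps any \r characters so that they can be reported.
        with open(path, encoding="utf-8", errors="replace", newline="") as handle:
            for line_no, col, what in suspicious_characters(handle.read()):
                print(f"{path}:{line_no}:{col}: {what}")
    else:
        print(f"{path} not found on this host")
```

No output for an existing file means none of these characters were found, in which case the cause was probably something else.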
Alright, next on the list is syslog (rsyslog) collection by Logstash. I have searched and read a few posts about the syslog plug-in.
I am keen to get this operational on our servers as soon as possible, but the documentation did not seem very detailed. Are you able to help with this here, or should I start a new thread?