The fact that 98% of CPU is used does not necessarily mean that only one processing thread is active. Logstash can only process as fast as the downstream systems are able to accept data, so it is useful to verify that Elasticsearch is not the bottleneck here. What do CPU usage, disk I/O and iowait look like on your Elasticsearch cluster? How many indices and shards are you actively writing into? Do you see any errors or warnings in the Elasticsearch logs? What is the specification of your Elasticsearch cluster and what throughput are you seeing?
If this is not the bottleneck, have you verified that you have enough data coming in to keep all pipeline threads busy?
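For reference, a quick way to check both sides of this (assuming the Logstash HTTP monitoring API is enabled on its default port 9600, and that you have `iostat` available on the Elasticsearch nodes) is to compare the event counters over time and watch disk utilisation:

```sh
# Logstash event counters; run twice a minute apart and compare in/filtered/out
curl -s 'localhost:9600/_node/stats/events?pretty'

# Per-pipeline / per-plugin breakdown to see where processing time is spent
curl -s 'localhost:9600/_node/stats/pipelines?pretty'

# On each Elasticsearch node: per-device utilisation and iowait, refreshed every 5s
iostat -x 5
```

If the `in` counter barely moves while CPU sits near 100% of a single core, the input side is starved; if `out` lags far behind `in`, the outputs (Elasticsearch) are the likely bottleneck.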
I have 3 VMs, each running one instance of Logstash, Elasticsearch and Kibana.

- The first machine: 8 vCPUs, 48 GB memory and 200 GB HDD (I am sending data to the Logstash running here, which sends it on to the third machine; my Elasticsearch here is a master)
- The second machine: 4 vCPUs, 48 GB memory and 200 GB HDD (another ES instance running here as a master)
- The third machine: 4 vCPUs, 48 GB memory and 200 GB HDD (another ES instance running here with node.master: false and node.data: true)
My Filebeat ships data from 3 servers and sends it to Logstash (VM1).
I used this config for another Filebeat which shipped data from only one server and it worked fine, but when I have 3 servers I am losing a lot of data in my ES.
As you can see there are only 3 indices, because my Logstash output uses the name of the server as the index name.
How can I check that Logstash is receiving enough data? In this case I am sending 3 times more data compared to my first test.
Having 2 master-eligible nodes in a cluster is bad practice; you should always aim for at least 3. I would therefore recommend making all 3 nodes master-eligible and also making sure you set minimum_master_nodes to 2.
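A minimal sketch of what that could look like in each node's elasticsearch.yml (assuming a pre-7.x cluster, where discovery.zen.minimum_master_nodes is the relevant setting; values shown are illustrative):

```yaml
# elasticsearch.yml on each of the 3 nodes
node.master: true
node.data: true
# with 3 master-eligible nodes, a quorum of 2 prevents split-brain
discovery.zen.minimum_master_nodes: 2
```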
It would be great if you could also answer my other questions, e.g. about disk I/O and the throughput you are seeing.
It may be that your VMs are overcommitted with respect to CPU and/or memory, so the VMs do not actually have access to the resources you have assigned. That could certainly explain why you do not get higher utilisation, and is something you should check.
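One rough way to check this from inside the guest is to watch the steal time the kernel reports; a consistently non-zero value suggests the hypervisor is taking CPU away from the VM (a quick sketch, standard tools only):

```sh
# the 'st' (steal) column shows CPU time reclaimed by the hypervisor
vmstat 5

# in top, check the '%st' figure in the CPU summary line
top
```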
Problem resolved: I am now using another instance with Ubuntu instead of Debian. This instance is running on OpenStack, so it was not a hardware problem. I don't know why the instance with the Debian image couldn't use more than one CPU, but the Ubuntu instance is running very well.
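For anyone hitting something similar, it may help to first confirm how many CPUs the guest kernel actually exposes before blaming Logstash (a small sketch using standard tools):

```sh
# number of online CPUs the kernel sees
nproc

# more detail: sockets, cores and threads per core
lscpu
```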
Hello, just for info, I found out the problem. I was using an old kernel version; I upgraded it and Logstash was able to use more than one CPU. However, systemd-journald was still using 90-100% of one CPU even after the upgrade. There was another problem in my journald configuration; I edited it and now my ELK stack is working very well.
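The post does not say which journald settings were changed, but as a hedged illustration, the kind of edit that typically tames a busy systemd-journald lives in /etc/systemd/journald.conf (values below are examples only, not the poster's actual fix):

```ini
# /etc/systemd/journald.conf -- illustrative values, not the actual fix from this thread
[Journal]
# drop floods from very chatty services instead of logging every message
RateLimitInterval=30s
RateLimitBurst=1000
# cap how much disk the journal may use
SystemMaxUse=500M
```

After editing, the daemon can be restarted with `systemctl restart systemd-journald`.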