I have six ES nodes (ES 2.4.1): two client nodes, two master nodes, and two data nodes.
I have set up Logstash (2.2.4) to push data to the two client nodes. I have also tried having Logstash push the data to both the two client nodes and the two master nodes. Either way, I'm seeing dramatic spikes in the indexing rate: every minute it jumps from 0 events/sec to 5000 events/sec, leaving the average at roughly 2500 events/sec.
Marvel shows the indexing rate like this:
The arrow marks where I changed the config from LS pushing to both the master and client nodes to LS pushing only to the client nodes.
This appears to be some sort of batch processing. I have a couple of questions about this setup:
Is it best practice to push Logstash output (from 4 LS instances) to the two client nodes, the two master nodes, or all four? I've read that you want your data nodes to not participate in directly receiving LS data.
Is there a way to smooth out this indexing rate? My primary goal here is to increase throughput, and it appears that there are downtimes where I could be processing additional data.
This is an issue because the log shippers (Filebeat 5.1.1) are sending data faster than I'm currently ingesting it. I know this because everything is timestamped: there's a two- or three-day delay between the time a log was created and the time it's ingested into ELK.
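To put a number on that delay, here is a rough sketch of how the lag could be measured, assuming each event carries both the original log timestamp and the time it was indexed. The function names, the timestamp format, and the example values are all made up for illustration; they are not taken from my actual data.

```python
from datetime import datetime

# Assumed timestamp layout (ISO-like with a numeric UTC offset); adjust to
# whatever the events actually contain.
TS_FORMAT = "%Y-%m-%dT%H:%M:%S%z"

def ingest_lag_seconds(log_ts: str, ingest_ts: str) -> float:
    """Seconds between when a log line was created and when it was indexed."""
    created = datetime.strptime(log_ts, TS_FORMAT)
    ingested = datetime.strptime(ingest_ts, TS_FORMAT)
    return (ingested - created).total_seconds()

def backlog_growth(ship_rate: float, index_rate: float, seconds: float) -> float:
    """Events that pile up when shippers send faster than the cluster indexes.

    Rates are in events/sec. Returns 0 if indexing keeps up.
    """
    return max(0.0, (ship_rate - index_rate) * seconds)

# Hypothetical example: a log created on the 10th and indexed 2.5 days later.
lag = ingest_lag_seconds("2017-01-10T08:00:00+0000", "2017-01-12T20:00:00+0000")
print(lag / 3600, "hours behind")
```

If the shippers were sending, say, 3000 events/sec against the roughly 2500 events/sec average above (the 3000 figure is only an assumption), `backlog_growth(3000, 2500, 60)` gives the number of events the backlog would grow by each minute.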