Spikes in Indexing Rate - Methods to Increase Indexing Rate?

I have six ES nodes (ES 2.4.1): two client nodes, two master nodes, and two data nodes.

I have set up Logstash (2.2.4) to push data to the two client nodes. I have also tried setting up Logstash to push the data to both the two client nodes and the two master nodes. Either way, I'm seeing dramatic spikes in the indexing rate: every minute it jumps from 0 events/sec to 5,000 events/sec, leaving the average at roughly 2,500 events/sec.
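
For reference, the relevant part of my Logstash output config looks roughly like this (hostnames are placeholders, not my real ones):

```
output {
  elasticsearch {
    # point the output at the two client nodes only
    hosts => ["es-client-1:9200", "es-client-2:9200"]
    index => "logstash-%{+YYYY.MM.dd}"
  }
}
```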

The indexing rate as shown in Marvel:

[Marvel screenshot: indexing rate over time]
The arrow marks where I changed the config from LS pushing to the master/client nodes to LS pushing only to the client nodes.

This appears to be some sort of batch processing. I have a couple of questions about this setup:

  1. Is it best practice to push Logstash output (from 4 LS instances) to the two client nodes, the two master nodes, or all four? I've read that you don't want your data nodes to receive LS data directly.

  2. Is there a way to smooth out this indexing rate (see the config sketch after this list)? My primary goal is to increase throughput, and it appears there are idle periods where I could be processing additional data.
    -- This matters because the log shippers (Filebeat 5.1.1) are sending data faster than I'm currently ingesting it. I know this because everything is timestamped: there's a two- or three-day delay between the time a log line is created and the time it's ingested into ELK.
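
If the sawtooth is just the elasticsearch output flushing in bulk batches, I believe these are the knobs that control it in Logstash 2.2 (the values shown are illustrative, not a recommendation):

```
output {
  elasticsearch {
    hosts => ["es-client-1:9200", "es-client-2:9200"]  # placeholder hostnames
    flush_size      => 500   # events buffered before a bulk request is sent
    idle_flush_time => 1     # seconds to wait before flushing a partial batch
    workers         => 1     # output worker threads per Logstash instance
  }
}
```

I understand the pipeline batch size and worker count can also be raised with the `-b` and `-w` flags when starting Logstash, though I haven't confirmed what values would help here.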

Send requests to the data or client nodes, not to the data nodes.

That seems irregular.
What's the load on ES like?

Two master-eligible nodes is not ideal, as there is no majority for a quorum.
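
With only two master-eligible nodes the quorum maths doesn't work in your favour. As a rough sketch of what I mean (illustrative elasticsearch.yml snippet only):

```
# elasticsearch.yml on the master-eligible nodes
# quorum = (number of master-eligible nodes / 2) + 1
# With 2 masters that is 2, so losing either master halts the cluster;
# dropping it to 1 risks split brain. Three masters avoids both problems.
discovery.zen.minimum_master_nodes: 2
```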

Do you mean 'send requests to the master or client nodes'? I'm curious if there's a logical preference between the two.
I changed my config from sending LS data to both the two masters and the two clients to sending it only to the two client nodes -- I really saw no difference in the indexing rate.

Also, I've read that having too many client nodes can be detrimental to performance, as any data allocation process handled by one client node must be confirmed with the other before data is committed. I'm considering removing one of the client nodes and directing all LS data to the remaining one.

The load doesn't seem too heavy; the master and data nodes all have 24 CPUs, and the VMs each have 8.

@warkolm, while I have your attention, do the numbers above look okay? Is there any tweaking you'd do to get more performance out of my cluster?

I agree, and you've told me that before, but I had no additional physical machines to allocate. At the beginning of the year I'm getting new hardware and will be using three master nodes, as per Elastic's recommendation.

Sorry, I meant "send requests to the data and client nodes, not the master nodes".

Where did you read this?

Yeah that looks ok.

Thanks, I'll send LS data to the client and data nodes to see if that increases throughput -- it held pretty steady when I moved to just the client nodes receiving LS data, compared to both the client and master nodes.

Is there any reason for the spiking in throughput seen in my first post? Is it batching up data or something? I ask because it's still occurring and I'm unsure whether it's the desired behavior.

Ah OK, so that's not really going to impact data allocation; it's more about cluster state changes.

Regarding the sawtooth, it could be a data sampling artifact, or it could be LS batching things up. It might be worth moving this to the X-Pack category to see what the team there thinks.
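
One way to rule out back-pressure on the ES side would be to watch the bulk thread pool during a spike, something like this (host is just a placeholder):

```
curl 'http://es-client-1:9200/_cat/thread_pool?v'
# the bulk.active / bulk.queue / bulk.rejected columns show whether the
# bulk thread pool is saturating or rejecting requests during the spikes
```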

I'll do that, thanks for your help!
