Tuning Logstash Batch Sizes to ECE Deployments

crickes · July 8, 2019, 8:20am

I'm aware of the usual way to test batch size when working out how much data you can send to an Elasticsearch node, i.e. keep increasing the batch size until the ingest rate tops out, then extrapolate. But when sending to ECE there are other factors to consider, not just the final allocator where the data will live.
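
A minimal sketch of that ramp-up test, in Python: step through candidate batch sizes (in Logstash this would be `pipeline.batch.size`) and stop once the ingest rate no longer improves by a meaningful margin. The `measure_rate` callback, the 5% threshold, and the `fake_rate` stand-in are assumptions for illustration; in a real test `measure_rate` would run a load test at that batch size and return the observed events per second.

```python
from typing import Callable, Iterable, Tuple


def find_plateau_batch_size(
    measure_rate: Callable[[int], float],
    candidates: Iterable[int],
    min_gain: float = 0.05,
) -> Tuple[int, float]:
    """Return the last (batch_size, rate) that still gave a real improvement.

    measure_rate is a hypothetical callback: given a batch size, it returns
    the measured ingest rate (events/s) for a test run at that size.
    """
    best_size, best_rate = None, 0.0
    for size in candidates:
        rate = measure_rate(size)
        # Stop once the gain over the previous best drops below min_gain.
        if best_size is not None and rate < best_rate * (1 + min_gain):
            break
        best_size, best_rate = size, rate
    if best_size is None:
        raise ValueError("no candidate batch sizes given")
    return best_size, best_rate


def fake_rate(size: int) -> float:
    """Stand-in for a real measurement: linear growth capped at 20,000 events/s."""
    return min(size * 10.0, 20000.0)


if __name__ == "__main__":
    sizes = [125, 250, 500, 1000, 2000, 4000, 8000]
    print(find_plateau_batch_size(fake_rate, sizes))
```

With ECE in the path, the plateau found this way reflects the whole chain (load balancer, proxy, receiving node, destination shard), so it is worth repeating the test whenever the routing changes.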
When sending large amounts of data to a deployment built in ECE, the data is routed through a load balancer and then an ECE proxy before it hits a node in the deployment, and it may then be routed on to another node if it is destined for a shard there.
In my infrastructure, I have some SSD-equipped servers set to be used as hot ingest nodes in my templates, but in testing I find that although the data eventually ends up on these allocators, it hits other nodes in the deployment first. The proxy is evidently sending the bulk request to any node in the deployment, and that node unpacks the request and forwards the data to the destination nodes/shards. My problem is that the other (non-hot) nodes in my deployment do not have as much RAM, so this is becoming one of the bottlenecks. Ideally, I want the ECE proxy to send all data to the hot nodes, but the only way I have found of ensuring that is to press the 'Stop Routing' button on all the other nodes.

Is there another way to ensure the data is sent only to the hot nodes?

Are there any other considerations to get optimal throughput of data into an ECE deployment?

Alex_Piggott · July 8, 2019, 1:26pm

If you set the "hot" nodes to have the ingest role (via the instance configuration, or alternatively by overriding the node_type field in the advanced editor) and the "warm" nodes not to have the ingest role, then the proxy will preferentially route to the hot nodes.
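
A rough Python sketch of that node_type override, applied to a plan copied out of the advanced editor. The field names (`cluster_topology`, `instance_configuration_id`, `node_type`) reflect the plan JSON as I understand it, and the IDs `hot.ssd` and `warm.hdd` are made up for the example, so check both against your own deployment before relying on this.

```python
import copy


def set_ingest_roles(plan: dict, hot_config_ids: set) -> dict:
    """Return a copy of the plan where only hot topology elements have ingest enabled.

    Assumes each element of plan["cluster_topology"] carries an
    "instance_configuration_id" and a "node_type" dict of boolean roles.
    """
    updated = copy.deepcopy(plan)  # leave the caller's plan untouched
    for element in updated.get("cluster_topology", []):
        node_type = element.setdefault("node_type", {})
        node_type["ingest"] = element.get("instance_configuration_id") in hot_config_ids
    return updated


# Hypothetical plan fragment; the IDs are placeholders, not real ECE defaults.
example_plan = {
    "cluster_topology": [
        {
            "instance_configuration_id": "hot.ssd",
            "node_type": {"master": True, "data": True, "ingest": True},
        },
        {
            "instance_configuration_id": "warm.hdd",
            "node_type": {"master": False, "data": True, "ingest": True},
        },
    ]
}

if __name__ == "__main__":
    new_plan = set_ingest_roles(example_plan, {"hot.ssd"})
    for element in new_plan["cluster_topology"]:
        print(element["instance_configuration_id"], element["node_type"])
```

The function only edits a local copy of the JSON; the result still has to be pasted back into the advanced editor (or applied however you normally update the plan).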

Alex

Randy-312 · July 11, 2019, 7:51pm

We ran into the same problems you've seen.
That's why it's critical to label your allocators from day 1.

We learned this the hard way as part of moving to hot/warm nodes.
EBS-backed volumes do NOT scale well for warm nodes, especially considering the I/O limits that Docker imposes to avoid noisy-neighbor scenarios.
