Elastic ingest node load not balanced with pipeline

Hi,

I have an ECK cluster with 6 data/ingest nodes. I also have an ingest pipeline deployed that processes logs from Filebeat. While it is running, I noticed that only one node is utilized near 100% CPU while the others hover around 25%.

When I dig into the issue with the following command, it clearly shows that only one node is doing the pipeline processing.

GET _nodes/stats/ingest?filter_path=nodes.*.ingest

From this information, it seems the ingest load is not being spread across the nodes appropriately.
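
For reference, here is the same stats call narrowed down to the per-node totals, which makes the comparison easier to read (the filter_path below is just one way to slice the response):

GET _nodes/stats/ingest?filter_path=nodes.*.name,nodes.*.ingest.total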

Is there a configuration in our Helm chart that we missed? Is there a setting we are supposed to apply as part of the ingest pipeline itself?

Thanks,
Scott

Did you try using an array of hosts in the Filebeat elasticsearch output?

See the hosts setting in the Filebeat Elasticsearch output docs:

The list of Elasticsearch nodes to connect to. The events are distributed to these nodes in round robin order. If one node becomes unreachable, the event is automatically sent to another node. Each Elasticsearch node can be defined as a URL or IP:PORT. For example: http://192.15.3.2, https://es.found.io:9230 or 192.24.3.2:9300. If no port is specified, 9200 is used.
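
In filebeat.yml that would look roughly like this (the hostnames are placeholders for your actual ingest nodes):

output.elasticsearch:
  hosts: ["https://es-ingest-0:9200", "https://es-ingest-1:9200", "https://es-ingest-2:9200"]

With a list like this, Filebeat round-robins events across the nodes itself instead of relying on whatever sits in front of them.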

Hi Stephen,

We use a load-balancer / ingress in front of our Elasticsearch nodes. We list that as our host endpoint.

Thanks,
Scott

Then it could be a couple of things; the first that come to mind:

  1. Your load balancer is "sticky" instead of round robin.

  2. You only have 1 primary shard, so the node where that primary shard resides is the hot node. You can check where the primaries live with _cat/shards, as sketched below.
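
This shows which node holds each shard of the Filebeat indices (the index pattern is an assumption; adjust it to your naming):

GET _cat/shards/filebeat-*?v&h=index,shard,prirep,state,node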

We have ensured that the load balancer is not sticky and have 3 primary shards (1 replica) spread across the 6 data nodes.

Any other ideas?

Did you run Hot Threads on that node to see what it is doing?
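
For example, something like this (the node name is a placeholder for the busy node; threads=5 just widens the report a bit):

GET _nodes/<hot-node-name>/hot_threads?threads=5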

Yes, but if I remember correctly we just saw the pipeline processing we expected.

What should we be looking for? I'll re-run that API call

Whatever is taking up the most CPU....

And can you see that the traffic from the ingress load balancer is evenly distributed?

I ask because you can set round robin... but if the balancing is per connection and there is one main long-lived connection, it can get pinned to a single node... I have seen that.
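
If the balancing really is per connection, one mitigation you could sketch out is opening more connections from Filebeat so the load balancer has more to spread around; worker is a standard Filebeat output option, the ingress hostname is a placeholder and the value is only an example:

output.elasticsearch:
  hosts: ["https://my-ingress:9200"]
  worker: 4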

Hi Stephen,

Sorry for the late response. We have moved ingest onto dedicated ingest nodes, which helped: CPU is now spread evenly across the availability zones.

But we are still seeing the unevenness on our data nodes... basically our a-0 node is always at 85%+ while the other nodes are in the low teens. I ran hot threads and got results like the following (sorry, I can't copy/paste the full output in here):

68.5% (342.7ms out of 500ms) cpu usage by thread 'Elasticsearch[eck-Elasticsearch-es-data-zone-a-0][[filebeat-000839][0]: Lucene Merge Thread #594]'

68.5% (339.1ms out of 500ms) cpu usage by thread 'Elasticsearch[eck-Elasticsearch-es-data-zone-a-0][write][T#1]'

64.0% (320.1ms out of 500ms) cpu usage by thread 'Elasticsearch[eck-Elasticsearch-es-data-zone-a-0][write][T#3]'

This means that a merge / force merge is taking place on that node... if your ILM policy requires force merges, that could be the source. Merges happen on the data node where the data resides.
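
If you want to confirm that, you can check whether the ILM policy attached to those indices includes a forcemerge action, and which phase/action each index is currently in (the policy name below is a placeholder):

GET _ilm/policy/<your-filebeat-policy>
GET filebeat-*/_ilm/explain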

Otherwise it looks like there are a lot of writes... if there is a single primary shard, then the writes are all happening on the node that holds it...

You can look at the thread pool stats as well.
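
Assuming that means the per-node write thread pool, something like this shows active threads, queueing and rejections per node:

GET _cat/thread_pool/write?v&h=node_name,active,queue,rejected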

Otherwise I do not have many more suggestions; perhaps someone else will.
