Performance bottleneck enriching documents due to only a single node processing the ingest pipelines

JvSPV · October 6, 2022, 9:30am

We have a cluster of 4 nodes: A, B, C, and D. All four have all default roles assigned, including the ingest role. We also have a separate master node.

We recently started using new ingest pipelines that enrich the documents we send to Elasticsearch with data from one or more enrich indices. When we use these pipelines, only node B becomes very busy processing them (verified using GET _nodes/B/hot_threads); the other three nodes are not involved. Node B alone cannot handle the load these pipelines require (it must process millions of documents, totalling several hundred GB of data), so it becomes a huge performance bottleneck for us.

The index we are writing to has its primary shard on node A and a replica on node C, so nothing on node B. The various enrich indices have shards on all four nodes and their corresponding source indices have shards on nodes A and C or nodes A and D, so again nothing on node B.

What is going on? Why would node B, out of all nodes, become active and how could we fix this and spread the load?

Christian_Dahlqvist · October 6, 2022, 10:47am

How are you sending the data to Elasticsearch? Have you checked whether you are sending data to all nodes?
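One way to check this is to compare per-node ingest counters from `GET _nodes/stats/ingest`. The sketch below is only an illustration: it assumes the response has already been fetched and parsed into a dict, and the `sample` data (node ids and counts) is hypothetical, not taken from the poster's cluster.

```python
def ingest_share(nodes_stats: dict) -> dict:
    """Return each node's share of ingested documents, keyed by node name."""
    counts = {
        node["name"]: node["ingest"]["total"]["count"]
        for node in nodes_stats["nodes"].values()
    }
    total = sum(counts.values())
    # Avoid division by zero when no documents have been ingested yet.
    return {name: (count / total if total else 0.0) for name, count in counts.items()}


# Hypothetical, heavily trimmed response body of GET _nodes/stats/ingest.
sample = {
    "nodes": {
        "id-a": {"name": "A", "ingest": {"total": {"count": 0}}},
        "id-b": {"name": "B", "ingest": {"total": {"count": 1000}}},
        "id-c": {"name": "C", "ingest": {"total": {"count": 0}}},
        "id-d": {"name": "D", "ingest": {"total": {"count": 0}}},
    }
}

for name, share in sorted(ingest_share(sample).items()):
    print(f"node {name}: {share:.0%} of ingested documents")
```

If one node holds nearly all of the count, as node B does in this made-up sample, the client is most likely sending every request to that node.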

leandrojmp · October 6, 2022, 12:29pm

You are using the enrich processor, right?

Can you share your enrich policies?

Also, how are you sending your data to elasticsearch and what are the specs of your nodes?

The recommendation from the documentation is to use dedicated ingest nodes for heavy loads.

JvSPV · October 6, 2022, 12:43pm

It looks like that may well be the issue. The IP we are using to connect to Elasticsearch is precisely that of node B as opposed to those of the other nodes.

How can I send data to all nodes? We are using .NET and NEST version 7.17.4. Is it a matter of supplying the URLs for each of the nodes to NEST's ElasticClient?

Christian_Dahlqvist · October 6, 2022, 12:52pm

I am not a .NET developer, but that is typically how it works for most other clients.
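To illustrate why supplying all node URLs helps, here is a minimal Python sketch of round-robin node selection, which is roughly what a client-side connection pool does. It is a simulation only and does not talk to Elasticsearch; the host names are hypothetical. In NEST 7.x the equivalent would, as far as I know, be passing a connection pool built from several URIs (for example `StaticConnectionPool` or `SniffingConnectionPool`) to `ConnectionSettings`, but check the client documentation for the details.

```python
from collections import Counter
from itertools import cycle


def distribute(node_urls, request_count):
    """Count how many requests each node receives under round-robin selection."""
    pool = cycle(node_urls)  # endlessly iterate over the configured nodes
    return Counter(next(pool) for _ in range(request_count))


# Hypothetical addresses; only node B is configured in the first case.
single = distribute(["http://node-b:9200"], 1000)
spread = distribute(
    [
        "http://node-a:9200",
        "http://node-b:9200",
        "http://node-c:9200",
        "http://node-d:9200",
    ],
    1000,
)

print(single)  # every request lands on node B
print(spread)  # requests are shared evenly across the four nodes
```

With a single configured URL, the node behind it receives every bulk request and, since it has the ingest role, runs the pipeline for all of them. With four URLs each node receives a quarter of the requests.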

JvSPV · October 12, 2022, 3:04pm

It looks like this resolves the ingest only being run on a single node, but I have difficulty properly verifying this, because we have had to temporarily stop using the ingest pipeline because of resource limitations in multiple places. According to the Elasticvue Chrome extension, our Elasticsearch cluster is very constrained in its RAM (but not its heap) resources (all our nodes usually use 95-99% of the RAM they have), even when we're not using the ingest pipeline. Do you have any thoughts on how we might improve Elasticsearch's RAM usage?

Christian_Dahlqvist · October 12, 2022, 3:24pm

It is always recommended to run Elasticsearch on its own dedicated nodes. Elasticsearch relies on the heap (typically no more than 50% of available RAM) but also stores some data off-heap. In addition to this it relies on the operating system page cache for performance. This can quickly use up all available RAM on a host, but that is expected and not a problem since any memory assigned to the page cache can quickly be reclaimed by the operating system if any other process should require it.

What you are describing therefore sounds normal and nothing to worry about, assuming page cache usage is included in your measurement.
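To see how much of the reported usage is page cache, one option on a Linux host is to look at `/proc/meminfo`. The sketch below is a rough illustration that parses text in that format; the sample figures are hypothetical and chosen to resemble a node that reports about 98% RAM usage.

```python
def memory_breakdown(meminfo_text: str) -> dict:
    """Split RAM usage into 'including cache' and 'excluding cache' percentages."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if rest.strip():
            fields[key.strip()] = int(rest.split()[0])  # values are in kB
    total = fields["MemTotal"]
    cache = fields["Cached"] + fields["Buffers"]  # reclaimable by the OS
    used_incl_cache = total - fields["MemFree"]
    return {
        "used_incl_cache_pct": round(100 * used_incl_cache / total, 1),
        "used_excl_cache_pct": round(100 * (used_incl_cache - cache) / total, 1),
        "available_pct": round(100 * fields["MemAvailable"] / total, 1),
    }


# Hypothetical excerpt in /proc/meminfo format (not from the poster's nodes).
SAMPLE_MEMINFO = """\
MemTotal:       65536000 kB
MemFree:         1310720 kB
MemAvailable:   30146560 kB
Buffers:          655360 kB
Cached:         29491200 kB
"""

print(memory_breakdown(SAMPLE_MEMINFO))
```

On a real host you would pass the contents of `/proc/meminfo` instead of the sample. A large gap between the two "used" figures means most of the apparent usage is page cache that the operating system can reclaim.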

system · November 9, 2022, 3:24pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
