We have a cluster of 4 nodes: A, B, C, and D. All four have all default roles assigned, including the ingest role. We also have a separate master node.
We recently started using new ingest pipelines that enrich the documents we send to Elasticsearch with data from one or more enrich indices. When these pipelines run, only node B becomes very active processing them (verified using GET _nodes/B/hot_threads); the other three nodes are not involved. Node B cannot handle this load on its own (the pipelines must process millions of documents, totalling several hundred GB of data), so it becomes a huge performance bottleneck for us.
The index we are writing to has its primary shard on node A and a replica on node C, so nothing on node B. The various enrich indices have shards on all four nodes and their corresponding source indices have shards on nodes A and C or nodes A and D, so again nothing on node B.
What is going on? Why would node B, out of all nodes, become active and how could we fix this and spread the load?
It looks like that may well be the issue. The IP we are using to connect to Elasticsearch is precisely that of node B as opposed to those of the other nodes.
How can I send data to all nodes? We are using .NET and NEST version 7.17.4. Is it a matter of supplying the URLs for each of the nodes to NEST's ElasticClient?
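On the NEST side, Elasticsearch.Net ships connection pool classes such as StaticConnectionPool and SniffingConnectionPool that take a list of node URIs, and ConnectionSettings accepts such a pool, so supplying every node's URL is the usual approach. The node that receives an indexing request runs the ingest pipeline itself when it has the ingest role, which is why pointing the client at a single IP concentrates the work there. The sketch below is not NEST code. It is a small, language-neutral Python illustration of the round-robin idea, and the node URLs and the helper name are made up for the example.

```python
from itertools import cycle

# Hypothetical node addresses; replace with the real URLs of nodes A-D.
NODE_URLS = [
    "http://node-a:9200",
    "http://node-b:9200",
    "http://node-c:9200",
    "http://node-d:9200",
]


def assign_round_robin(node_urls, request_count):
    """Return how many of `request_count` requests each node would receive
    if the client simply rotated through `node_urls` in order."""
    counts = {url: 0 for url in node_urls}
    nodes = cycle(node_urls)
    for _ in range(request_count):
        counts[next(nodes)] += 1
    return counts


# Client configured with only node B: every request (and every pipeline run) lands on B.
single = assign_round_robin(["http://node-b:9200"], 1000)

# Client configured with all four nodes: requests are spread evenly.
spread = assign_round_robin(NODE_URLS, 1000)

print(single)
print(spread)
```

A real client pool also handles dead nodes and retries, which this sketch leaves out on purpose.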
It looks like this resolves the problem of ingest only running on a single node, but I have difficulty properly verifying this, because we have had to temporarily stop using the ingest pipeline due to resource limitations in multiple places. According to the Elasticvue Chrome extension, our Elasticsearch cluster is very constrained in its RAM (but not its heap): all our nodes usually use 95-99% of the RAM they have, even when we're not using the ingest pipeline. Do you have any thoughts on how we might improve Elasticsearch's RAM usage?
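For verifying the ingest distribution once the pipeline is back in use, one option is to compare the per-node ingest counters returned by GET _nodes/stats/ingest before and after a test run. The sketch below assumes the response has already been fetched and parsed into a dict; the function names and the sample numbers are made up for illustration.

```python
def ingest_counts_by_node(nodes_stats):
    """Map node name -> total number of documents that node has run
    through ingest pipelines, from a parsed _nodes/stats/ingest response."""
    counts = {}
    for node_id, node in nodes_stats.get("nodes", {}).items():
        name = node.get("name", node_id)
        counts[name] = node.get("ingest", {}).get("total", {}).get("count", 0)
    return counts


def ingest_delta(before, after):
    """Documents ingested per node between two snapshots."""
    return {name: after[name] - before.get(name, 0) for name in after}


# Trimmed, made-up snapshots standing in for two real API responses.
snapshot_before = {
    "nodes": {
        "id1": {"name": "A", "ingest": {"total": {"count": 0}}},
        "id2": {"name": "B", "ingest": {"total": {"count": 1200000}}},
        "id3": {"name": "C", "ingest": {"total": {"count": 0}}},
        "id4": {"name": "D", "ingest": {"total": {"count": 0}}},
    }
}
snapshot_after = {
    "nodes": {
        "id1": {"name": "A", "ingest": {"total": {"count": 2500}}},
        "id2": {"name": "B", "ingest": {"total": {"count": 1202500}}},
        "id3": {"name": "C", "ingest": {"total": {"count": 2500}}},
        "id4": {"name": "D", "ingest": {"total": {"count": 2500}}},
    }
}

delta = ingest_delta(
    ingest_counts_by_node(snapshot_before),
    ingest_counts_by_node(snapshot_after),
)
print(delta)
```

If the deltas are roughly equal across nodes, the requests (and therefore the pipeline work) are being spread; if only one node's counter moves, the client is still talking to a single node.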
It is always recommended to run Elasticsearch on its own dedicated nodes. Elasticsearch relies on the heap (typically no more than 50% of available RAM) but also stores some data off-heap. In addition to this it relies on the operating system page cache for performance. This can quickly use up all available RAM on a host, but that is expected and not a problem since any memory assigned to the page cache can quickly be reclaimed by the operating system if any other process should require it.
What you are describing therefore sounds normal and nothing to worry about, assuming page cache usage is included in your measurement.
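To check whether page cache is included in a "95-99% used" figure, it can help to look at the host's own numbers. The following is a rough sketch assuming Linux hosts, where /proc/meminfo reports MemTotal, MemFree, Buffers, Cached and MemAvailable in kB. The sample values are invented, and treating Buffers + Cached as "the page cache" is a simplification.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> value in kB."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts:
            values[key.strip()] = int(parts[0])
    return values


def memory_breakdown(meminfo):
    """Return usage percentages with and without buffers/page cache."""
    total = meminfo["MemTotal"]
    free = meminfo["MemFree"]
    cache = meminfo.get("Buffers", 0) + meminfo.get("Cached", 0)
    available = meminfo.get("MemAvailable", free + cache)
    return {
        "used_incl_cache_pct": round(100 * (total - free) / total, 1),
        "used_excl_cache_pct": round(100 * (total - free - cache) / total, 1),
        "available_pct": round(100 * available / total, 1),
    }


# Invented sample; on a real host use: open("/proc/meminfo").read()
SAMPLE = """\
MemTotal:       65536000 kB
MemFree:         1310720 kB
MemAvailable:   31457280 kB
Buffers:          655360 kB
Cached:         30801920 kB
"""

breakdown = memory_breakdown(parse_meminfo(SAMPLE))
print(breakdown)
```

In this made-up case the host looks 98% "used" when cache is counted, but only about half of RAM is held by processes, and the rest can be reclaimed by the operating system on demand.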