Elasticsearch threadpool and index settings in ECK

We are using ECK operator 1.2 and Elasticsearch 7.4.0 for a 3-node cluster with the default settings on Azure Kubernetes Service. We need to update the following Elasticsearch configuration in our cluster:
threadpool.bulk.type: fixed
threadpool.bulk.size: 24
threadpool.bulk.queue_size: 1000
threadpool.search.type: fixed
threadpool.search.size: 24
threadpool.search.queue_size: 5

We have tried adding these under nodeSets.config:

- name: default
  config:
    # most Elasticsearch configuration parameters are possible to set, e.g:
    node.attr.attr_name: attr_value
    node.master: true
    node.data: true
    node.ingest: true
    node.ml: true
    # this allows ES to run on nodes even if their vm.max_map_count has not been increased, at a performance cost
    node.store.allow_mmap: false
    node.threadpool.bulk.type: fixed
    node.threadpool.bulk.size: 24
    node.threadpool.bulk.queue_size: 1000
    node.threadpool.search.type: fixed
    node.threadpool.search.size: 24
    node.threadpool.search.queue_size: 50

but the Elasticsearch instance gets stuck in ApplyingChanges, and the Elasticsearch pods then start crashing with the following error:

"Suppressed: java.lang.IllegalArgumentException: unknown setting [node.threadpool.search.type] please check that any required plugins are installed, or check the breaking changes documentation for removed settings",

What is the best method to make these changes for an Elasticsearch cluster deployed using ECK on Kubernetes?

Thanks in advance.

Further in the stack trace you should find more details that can often guide you to the correct setting to use, for example:
Suppressed: java.lang.IllegalArgumentException: unknown setting [node.threadpool.search.queue_size] did you mean [thread_pool.search.queue_size]?

For the settings you noted, and referencing the 7.4 docs, it should only require:

thread_pool.write.size: 24
thread_pool.write.queue_size: 1000
thread_pool.search.size: 24
thread_pool.search.queue_size: 50
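
With ECK these go under nodeSets.config in the Elasticsearch manifest. Here is a minimal sketch only; the cluster name quickstart is a placeholder, and the version and node count just mirror what you described:

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart            # placeholder cluster name
spec:
  version: 7.4.0
  nodeSets:
  - name: default
    count: 3
    config:
      node.store.allow_mmap: false
      # 7.x names: the old threadpool.bulk.* settings map to thread_pool.write.*
      thread_pool.write.size: 24
      thread_pool.write.queue_size: 1000
      thread_pool.search.size: 24
      thread_pool.search.queue_size: 50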

Note that many of the sizes are calculated from the number of available processors that the node detects, so this may also require you to adjust the processors setting as noted in the documentation. I don't think we would generally recommend changing these settings, especially increasing the write/bulk thread pool if you are already encountering bulk rejections, so please be aware of that.
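
For illustration only, overriding the detected processor count in the same manifest could look something like this; the value of 8 is an assumption and should match the CPU actually granted to the container:

  nodeSets:
  - name: default
    count: 3
    config:
      # override the processor count Elasticsearch detects (illustrative value)
      processors: 8
    podTemplate:
      spec:
        containers:
        - name: elasticsearch
          resources:
            requests:
              cpu: 8
            limits:
              cpu: 8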

We definitely do not recommend increasing thread pool sizes; it just hides the underlying issue.

The topic "Any idea what these errors mean version 2.4.2" is an old but entirely relevant explanation as to why.

Got it. We are trying to find the root cause of the errors below:

invalid NEST response built from a unsuccessful () low level call on POST: /_bulk?refresh=wait_for # Invalid Bulk Items

---> System.IO.IOException: Unable to read data from the transport connection: The I/O operation has been aborted because of either a thread exit or an application request..
---> System.Net.Sockets.SocketException (995): The I/O operation has been aborted because of either a thread exit or an application request.

What are your recommendations?

What do your Elasticsearch logs show at the time of this error?

Hello Guys,

Thanks for your help. I will try to explain briefly what is happening so that you can tell whether our indexing strategy is the source of this issue by overloading the Elasticsearch indexing service, or whether it is something else. The only strange thing is that it works when we host Elasticsearch on our Windows machine by running \bin\elasticsearch.bat, but inside a pod it always fails.

So, the current situation is that we have a bunch of documents with metadata. Each metadata field is a document in the index with a relation field to the parent (document). When we process these documents in parallel, the following happens:

Document 1
     - Request 1:  Index Document Head
     - Request 2:  Index Document Metadata Fields (Bulk Request)
     - Request 3:  Index Document Text Extract

Document 2
     - Request 1:  Index Document Head
     - Request 2:  Index Document Metadata Fields (Bulk Request)
     - Request 3:  Index Document Text Extract

So, if we have 100 documents and process them in parallel batches, that can mean 100 / batch size requests (e.g. with a batch size of 4, 25 requests) writing to the same index at the same time. Requests 1, 2, and 3 (which are also different methods in the code) run sequentially, awaiting each other, so you can treat them as one.

The question is whether this can be the cause of the issue, and if so, whether we should implement a different indexing strategy that works with larger, more composite batches per request.

As an alternative solution, we could save documents and metadata to persistent storage in parallel, while indexing happens sequentially in a dedicated thread that continuously takes available items from a queue, because we want to make documents available for search immediately without having to wait for the last document to be saved.

Update
It seems that the team has managed to fix this issue. The background job calling the microservice endpoint to save and index documents was frequently timing out.
