We are using ECK operator 1.2 and Elasticsearch 7.4.0 for a 3-node cluster with the default settings on Azure Kubernetes Service. We need to update the following Elasticsearch configuration in our cluster:
threadpool.bulk.type: fixed
threadpool.bulk.size: 24
threadpool.bulk.queue_size: 1000
threadpool.search.type: fixed
threadpool.search.size: 24
threadpool.search.queue_size: 5
We have tried adding these settings under nodeSets.config:
name: default
config:
most Elasticsearch configuration parameters are possible to set, e.g:
but the Elasticsearch resource gets stuck in ApplyingChanges, and the Elasticsearch pods then start crashing with the following error:
"Suppressed: java.lang.IllegalArgumentException: unknown setting [node.threadpool.search.type] please check that any required plugins are installed, or check the breaking changes documentation for removed settings",
What's the best way to make these changes for an Elasticsearch cluster deployed using ECK on Kubernetes?
Further down in the stack trace you should find more details, which can often guide you to the correct setting to use, for example: Suppressed: java.lang.IllegalArgumentException: unknown setting [node.threadpool.search.queue_size] did you mean [thread_pool.search.queue_size]?
For your noted settings and referencing the 7.4 docs, it should only require:
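The snippet that followed here did not survive in this copy of the thread, so the fragment below is only a sketch of what such a configuration could look like, not the original answer. It assumes that in 7.x the `bulk` pool is named `write`, that the prefix is `thread_pool` (as the "did you mean" hint suggests), and that the pool `type` entries are dropped because they are not configurable. The resource name `quickstart` is hypothetical.

```yaml
# Hedged sketch of an ECK manifest fragment; not the original poster's file.
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart            # hypothetical name
spec:
  version: 7.4.0
  nodeSets:
  - name: default
    count: 3
    config:
      # "bulk" is assumed to map to the "write" pool in 7.x
      thread_pool.write.queue_size: 1000
      thread_pool.search.queue_size: 5
      # size entries are left out on purpose: pool sizes are bounded by
      # the number of processors the node detects
```

Setting names are given without a `node.` prefix; the error above shows that `node.threadpool.*` is rejected as an unknown setting.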
Note that many of the sizes are calculated from the number of available processors that the node detects, so you may also need to adjust the processors setting as noted in the documentation. I don't think we would generally recommend changing these settings, especially increasing the write/bulk thread pool if you are already encountering bulk rejections, so please be aware of that.
Got it. We are trying to find the root cause of the below errors:
invalid NEST response built from a unsuccessful () low level call on POST: /_bulk?refresh=wait_for # Invalid Bulk Items
---> System.IO.IOException: Unable to read data from the transport connection: The I/O operation has been aborted because of either a thread exit or an application request..
---> System.Net.Sockets.SocketException (995): The I/O operation has been aborted because of either a thread exit or an application request.
Thanks for your help. I will briefly explain what is happening so that you can tell us whether our indexing strategy is overloading Elasticsearch indexing or whether something else is the cause. The only strange thing is that it works when we host Elasticsearch on our Windows machine by running \bin\elasticsearch.bat, but it always fails inside the pod.
So, the current situation is that we have a bunch of documents with metadata. Each metadata field is a document in the index with a relation field to the parent (document). When we process these documents in parallel, the following happens:
Document 1
- Request 1: Index Document Head
- Request 2: Index Document Metadata Fields (Bulk Request)
- Request 3: Index Document Text Extract
Document 2
- Request 1: Index Document Head
- Request 2: Index Document Metadata Fields (Bulk Request)
- Request 3: Index Document Text Extract
So, if we process 100 documents in parallel batches, that can mean 100 / batch size requests (e.g. with a batch size of 4, 25 requests) writing to the same index at the same time. Requests 1, 2 and 3 (which are also different methods in the code) run sequentially, each awaiting the previous one, so you can treat them as one.
The question is whether this could be the cause of the issue and, if so, whether we should implement a different indexing strategy that sends larger, more composite batches per request.
As an alternative, we could save documents and metadata to persistent storage in parallel but index them sequentially, in a dedicated thread that continuously takes available items from a queue, because we want documents to become searchable immediately without waiting for the last document to be saved.
Update
It seems that the team has managed to fix this issue. The background job calling the microservice endpoint to save and index documents frequently times out.