My Elasticsearch cluster: version 7.2.0.
3 data nodes, 1 master node, 2 master+ingest nodes.
Problem: I can see a very large number of rejections on the data nodes, which is causing data loss.
GET _nodes/moss-eck-es-data-2/stats/thread_pool?human&pretty
"write" : {
"threads" : 4,
"queue" : 0,
"active" : 0,
"rejected" : 1361456,
"largest" : 4,
"completed" : 3887827
}
The cluster currently has around 720 shards.
My use case: we are inserting documents with nearly 200 fields each, and we frequently update the same records via bulk requests. A single bulk request may contain updates for multiple indices.
Can anyone suggest what might be going wrong and what might help?
This means that those bulk requests could not be processed because the write thread pool and its queue were full at the time of the index request. This information is returned as part of the bulk API response, so the client can decide what the next step should be: discarding those documents, or waiting and retrying.
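To make that concrete, here is a minimal sketch of scanning a bulk response for items that were rejected (HTTP status 429, `es_rejected_execution_exception`) so the client can retry just those. The response shape follows the bulk API's per-item format; the sample response below is made up for illustration.

```python
# Sketch: find the positions of bulk items that were rejected (429)
# so only those documents need to be resent.

def rejected_items(bulk_response):
    """Return positions of items in a bulk response with status 429."""
    retry = []
    for pos, item in enumerate(bulk_response.get("items", [])):
        # Each item is wrapped in its action type ("index", "update", ...).
        _action, result = next(iter(item.items()))
        if result.get("status") == 429:
            retry.append(pos)
    return retry

# Illustrative response: the second item was rejected.
response = {
    "errors": True,
    "items": [
        {"index": {"_index": "metrics-a", "_id": "1", "status": 201}},
        {"index": {"_index": "metrics-b", "_id": "2", "status": 429,
                   "error": {"type": "es_rejected_execution_exception"}}},
    ],
}

print(rejected_items(response))  # -> [1]
```

Rejected positions map back to the order of the actions in the request, so the client can rebuild a smaller bulk body from just those documents, typically after an exponential backoff.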
Judging by the number of threads, I assume you have four cores. To get faster writes you could either add more nodes or increase the number of cores; it is of course possible that you will then hit another bottleneck such as I/O.
Maybe you can tell us a little more about your cluster. Is it only this node that sees rejections, or is the whole cluster under load or overloaded?
The number of shards you are actively indexing into affects how quickly the queues fill up, and you seem to have quite a large number of shards for your data volume. Are you able to reduce the number of shards?
We push metrics data every 15 minutes. At the moment we flush roughly 10 to 50 bulk requests, varying in size between 2 MB and 64 MB. A single bulk request indexes documents (of nearly 200 fields) into 80 indices (2 shards and 1 replica each), and a bulk request may also contain documents with the same ID multiple times (the update scenario).
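Bulk bodies of up to 64 MB are on the large side; a common practice is to keep individual bulk requests down to a few MB. A minimal sketch of splitting a stream of bulk (action line, source line) pairs into size-bounded NDJSON bodies (function name and limit are illustrative, not from the thread):

```python
def chunk_bulk_actions(pairs, max_bytes=5 * 1024 * 1024):
    """Group (action_line, source_line) pairs into NDJSON bodies of at
    most max_bytes each; a single oversized pair still gets its own body."""
    chunk, size = [], 0
    for action, source in pairs:
        pair_size = len(action.encode()) + len(source.encode()) + 2  # newlines
        if chunk and size + pair_size > max_bytes:
            yield "".join(chunk)
            chunk, size = [], 0
        chunk.append(action + "\n" + source + "\n")
        size += pair_size
    if chunk:
        yield "".join(chunk)

# Example with a tiny limit so the split is visible: three pairs of
# ~55 bytes each, so two fit under 120 bytes and the third spills over.
pairs = [
    ('{"index":{"_index":"metrics-a","_id":"1"}}', '{"value":1}'),
    ('{"index":{"_index":"metrics-a","_id":"2"}}', '{"value":2}'),
    ('{"index":{"_index":"metrics-b","_id":"3"}}', '{"value":3}'),
]
bodies = list(chunk_bulk_actions(pairs, max_bytes=120))
print(len(bodies))  # -> 2
```

Smaller bodies make each request cheaper to retry after a rejection or timeout, at the cost of more round trips.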
Are you sending all those bulk requests in parallel, or are you waiting for some to finish first? Reducing the number of shards, given that number of nodes, sounds like a good idea to me as well.
We are sending four requests in parallel (flush thread count: 4). We wait up to 20 s for a response, and most of the time our bulk requests time out on the client side.
Frequent updates of the same document can have a very negative impact on indexing performance, as they can result in a large number of small flushes. Try to avoid this at all costs.
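Since the same document ID can appear multiple times in one bulk request here, one mitigation is to collapse duplicates client-side before sending, keeping only the last version per (index, id). Note this sketch only applies when each update carries the full document (last write wins), not partial updates; the function name and data are illustrative.

```python
def dedupe_last_write_wins(actions):
    """Collapse (index, doc_id, doc) actions so only the last document
    per (index, doc_id) survives, preserving first-seen order of keys."""
    latest = {}
    for index, doc_id, doc in actions:
        latest[(index, doc_id)] = doc  # later occurrences overwrite earlier
    return [(idx, did, doc) for (idx, did), doc in latest.items()]

actions = [
    ("metrics-a", "1", {"value": 1}),
    ("metrics-a", "1", {"value": 2}),   # update of the same document
    ("metrics-b", "7", {"value": 9}),
]
print(dedupe_last_write_wins(actions))
# -> [('metrics-a', '1', {'value': 2}), ('metrics-b', '7', {'value': 9})]
```

This sends each document to Elasticsearch once per bulk instead of repeatedly overwriting it, which avoids pointless re-indexing of the intermediate versions.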