How are indexing requests routed to data nodes from a client node?

nethis · February 3, 2017, 9:27am

Hi,

I have a cluster of 4 data nodes, 3 master nodes, and 1 client node.
When I look at _cat/thread_pool, I see active and queued requests predominantly on only 2 nodes.

10.158.36.204 10.158.36.204 0  0 0 0 0 0 0 0 0
10.158.36.200 10.158.36.200 0  0 0 0 0 0 0 0 0
10.158.36.211 10.158.36.211 8 60 0 0 0 0 0 0 0
10.158.36.212 10.158.36.212 8 20 0 0 0 0 0 0 0
10.158.36.202 10.158.36.202 0  0 0 0 0 0 0 0 0
10.158.36.199 10.158.36.199 0  0 0 0 0 0 0 0 0
10.158.36.209 10.158.36.209 0  0 0 0 0 0 0 0 0
10.158.36.201 10.158.36.201 0  0 0 0 0 0 0 0 0
10.158.36.208 10.158.36.208 0  0 0 0 0 0 0 0 0
10.158.36.220 10.158.36.220 0  0 0 0 0 0 0 0 0

My config for all the data nodes is below.

cluster.name: ittesprod
node.name: ITTESPROD-DATA4
node.master: false
node.data: true
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.multicast.enabled: false
network.host: _non_loopback_
http.enabled: false
discovery.zen.ping.unicast.hosts: ["10.158.36.220","10.158.36.200","10.158.36.201","10.158.36.202","10.158.36.210","10.158.36.211","10.158.36.212","10.158.36.208"]
threadpool.bulk.queue_size: 500
bootstrap.memory_lock: true
indices.memory.index_buffer_size : 25%
indices.requests.cache.size: 5%
indices.queries.cache.size: 15%
indices.store.throttle.max_bytes_per_sec : 500mb

The client node config is below.

node.master: false
node.data: false
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["10.158.36.220","10.158.36.200","10.158.36.201","10.158.36.202","10.158.36.204","10.158.36.208","10.158.36.211","10.158.36.212","10.158.36.209"]
threadpool.bulk.queue_size: 1000

When I look at the completed bulk request count, it is high on 2 nodes, at 10+ lakh (over 1,000,000), while the other 2 nodes are at 3+ lakh (over 300,000).

What should I check to find out why bulk indexing requests are routed to only 2 nodes instead of being distributed round-robin across all data nodes?

Please suggest what to check here.

warkolm · February 3, 2017, 9:33am

It depends on which nodes hold the shards that the requests need to go to.
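
One way to check this is to count how many primary shards of the index each data node holds. Below is a minimal sketch in Python, assuming you have saved the text output of `GET _cat/shards/<index>?h=index,shard,prirep,state,node`, requested through the client node since HTTP is disabled on the data nodes. The index name `myindex`, the sample rows, and the node names other than the naming pattern from the config above are made up for illustration.

```python
from collections import Counter

def primaries_per_node(cat_shards_text):
    """Count started primary shards per node from `_cat/shards` text output.

    Assumes the columns were requested as h=index,shard,prirep,state,node.
    """
    counts = Counter()
    for line in cat_shards_text.splitlines():
        parts = line.split(None, 4)
        if len(parts) < 5:
            continue  # blank line, or an unassigned shard with no node column
        index, shard, prirep, state, node = parts
        if prirep == "p" and state == "STARTED":
            counts[node.strip()] += 1
    return counts

# Hypothetical sample output, NOT taken from the cluster in this thread.
sample = """\
myindex 0 p STARTED ITTESPROD-DATA1
myindex 0 r STARTED ITTESPROD-DATA2
myindex 1 p STARTED ITTESPROD-DATA1
myindex 1 r STARTED ITTESPROD-DATA3
myindex 2 p STARTED ITTESPROD-DATA2
myindex 3 r UNASSIGNED
"""

for node, count in sorted(primaries_per_node(sample).items()):
    print(node, count)
```

If most primaries sit on one or two nodes, those nodes receive the first write for most documents.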

tanguy · February 3, 2017, 9:34am

Hi,

Are 10.158.36.211 and 10.158.36.212 data nodes?

Do you bulk index your documents into the same index? If so, are this index's shards assigned to all data nodes?

nethis · February 3, 2017, 9:37am

@tanguy

Are 10.158.36.211 and 10.158.36.212 data nodes?

Yes.

Do you bulk index your documents into the same index? If so, are this index's shards assigned to all data nodes?

Yes, this index has 20 shards and 3 replicas, which are distributed across the 4 data nodes.

Is it the case that bulk indexing happens only on the primary shard and is then replicated to the other nodes?

Also, since I have 20 shards, could the data be going to only 5 of them, with those 5 all located on a single node? Is this possible?

tanguy · February 3, 2017, 10:26am

For a bulk index of 100 documents, the coordinating node (i.e. the node which receives the request) analyzes the metadata line of each item in order to find the shard in which the document will be indexed. Then it groups the documents by shard and sends the groups of documents to their target shards in parallel.
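
The grouping step described above can be sketched roughly as follows. This is an illustration, not Elasticsearch's code: `shard_for` is a hypothetical helper that uses CRC32 as a stand-in hash (Elasticsearch applies its own Murmur3-based hash to the routing value, which defaults to the document ID), and the document IDs are invented.

```python
import zlib
from collections import defaultdict

NUM_PRIMARY_SHARDS = 20  # the index in this thread has 20 primary shards

def shard_for(routing_value, num_shards=NUM_PRIMARY_SHARDS):
    """Stand-in routing function: hash(routing value) % number of primaries."""
    return zlib.crc32(routing_value.encode("utf-8")) % num_shards

def group_by_shard(bulk_items):
    """Group bulk items by target shard, as a coordinating node would.

    Each item is a (metadata, source) pair; the routing value falls back
    to the document ID when no explicit routing is given.
    """
    groups = defaultdict(list)
    for metadata, source in bulk_items:
        routing = metadata.get("_routing", metadata["_id"])
        groups[shard_for(routing)].append((metadata, source))
    return groups

# 100 made-up documents with IDs doc-0 .. doc-99
items = [({"_id": "doc-%d" % i}, {"value": i}) for i in range(100)]
groups = group_by_shard(items)
for shard in sorted(groups):
    print("shard %2d: %d documents" % (shard, len(groups[shard])))
```

Each group would then be sent as one shard-level request to the node holding that primary shard.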

If you don't use features like routing and nested documents, the 100 documents will be spread roughly evenly over the 20 shards. This means the coordinating node prepares 20 batches of about 5 documents each and sends each batch to its primary shard. The documents are indexed on the primary shard first and, once that is done, sent to the replica shards to be indexed there too.
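
To put numbers on the primary-then-replica step for the index discussed here (20 primaries, 3 replicas, 4 data nodes), here is a back-of-the-envelope sketch in Python. It assumes all replicas are assigned and evenly spread, which this thread does not confirm.

```python
NUM_PRIMARIES = 20
NUM_REPLICAS = 3
NUM_DATA_NODES = 4

# Every shard exists as one primary plus its replicas.
copies_per_shard = 1 + NUM_REPLICAS
total_shard_copies = NUM_PRIMARIES * copies_per_shard

def index_operations(num_docs):
    """Total indexing operations across all shard copies for num_docs documents."""
    return num_docs * copies_per_shard

print("shard copies in the index:", total_shard_copies)
print("copies per data node if spread evenly:", total_shard_copies // NUM_DATA_NODES)
print("indexing operations for a 100-document bulk:", index_operations(100))
```

Under these assumptions each document is written once on its primary and once on each of the 3 replicas, so replica writes also contribute to the load on the nodes that hold them.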

Yes, it is possible to have "hot" shards when the documents end up targeting only a subset of the shards.
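
A small sketch of how that can happen, assuming custom routing with only a handful of distinct values. As before, CRC32 is a stand-in for Elasticsearch's real hash, and the routing values `customer-a` to `customer-e` are hypothetical.

```python
import zlib
from collections import Counter

NUM_PRIMARY_SHARDS = 20

def shard_for(routing_value, num_shards=NUM_PRIMARY_SHARDS):
    """Stand-in routing function: hash(routing value) % number of primaries."""
    return zlib.crc32(routing_value.encode("utf-8")) % num_shards

def shard_histogram(routing_values):
    """Count how many documents land on each shard."""
    return Counter(shard_for(r) for r in routing_values)

# Case 1: default routing by unique document ID.
by_id = shard_histogram("doc-%d" % i for i in range(10000))

# Case 2: custom routing with only 5 distinct (hypothetical) values.
customers = ["customer-a", "customer-b", "customer-c", "customer-d", "customer-e"]
by_customer = shard_histogram(customers[i % 5] for i in range(10000))

print("shards used with ID routing:", len(by_id))
print("shards used with 5 routing values:", len(by_customer))
```

With only 5 routing values, at most 5 of the 20 shards can ever receive documents, and if those shards' primaries happen to live on one or two nodes, those nodes carry most of the bulk load.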

