How are indexing requests routed to data nodes from the client node?

Hi,

I have a cluster of 4 data nodes, 3 master nodes and 1 client node.
When I look at _cat/thread_pool, I see the active and queue counts predominantly on only 2 nodes.

10.158.36.204 10.158.36.204 0  0 0 0 0 0 0 0 0 
10.158.36.200 10.158.36.200 0  0 0 0 0 0 0 0 0 
10.158.36.211 10.158.36.211 8 60 0 0 0 0 0 0 0 
10.158.36.212 10.158.36.212 8  20 0 0 0 0 0 0 0 
10.158.36.202 10.158.36.202 0  0 0 0 0 0 0 0 0 
10.158.36.199 10.158.36.199 0  0 0 0 0 0 0 0 0 
10.158.36.209 10.158.36.209 0  0 0 0 0 0 0 0 0 
10.158.36.201 10.158.36.201 0  0 0 0 0 0 0 0 0 
10.158.36.208 10.158.36.208 0  0 0 0 0 0 0 0 0 
10.158.36.220 10.158.36.220 0  0 0 0 0 0 0 0 0 
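For readability, the output above can be limited to just the bulk pool columns with something like the following (host and port are placeholders, and the exact column names can vary by Elasticsearch version):

curl -s 'http://localhost:9200/_cat/thread_pool?v&h=host,bulk.active,bulk.queue,bulk.rejected,bulk.completed'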

My config is as below, for all the data nodes.

cluster.name: ittesprod
node.name: ITTESPROD-DATA4
node.master: false
node.data: true
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.multicast.enabled: false
network.host: _non_loopback_
http.enabled: false
discovery.zen.ping.unicast.hosts: ["10.158.36.220","10.158.36.200","10.158.36.201","10.158.36.202","10.158.36.210","10.158.36.211","10.158.36.212","10.158.36.208"]
threadpool.bulk.queue_size: 500
bootstrap.memory_lock: true
indices.memory.index_buffer_size: 25%
indices.requests.cache.size: 5%
indices.queries.cache.size: 15%
indices.store.throttle.max_bytes_per_sec: 500mb

Client node config is mentioned below.

node.master: false
node.data: false
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["10.158.36.220","10.158.36.200","10.158.36.201","10.158.36.202","10.158.36.204","10.158.36.208","10.158.36.211","10.158.36.212","10.158.36.209"]
threadpool.bulk.queue_size: 1000

When I look at the bulk completed request count, it is high on 2 nodes (10+ lakhs, i.e. over a million), while the other 2 nodes show only 3+ lakhs (around 300 thousand).

Why are bulk indexing requests routed to only 2 nodes instead of being round-robined across all data nodes? What should I check?

Please suggest what to check here.

It depends on where the shards that the requests need to go to live.
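You can see which node each shard of your index lives on with something like the following (replace <index> with your index name; host and port are placeholders):

curl -s 'http://localhost:9200/_cat/shards/<index>?v&h=index,shard,prirep,state,node'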

Hi,

Are 10.158.36.211 and 10.158.36.212 data nodes?

Do you bulk index your documents into the same index? If so, are this index's shards assigned to all data nodes?

@tanguy

Are 10.158.36.211 and 10.158.36.212 data nodes?

Yes

Do you bulk index your documents into the same index? If so, are this index's shards assigned to all data nodes?

Yes, this index has 20 shards and 3 replicas, distributed across the 4 data nodes.

Is it the case that the bulk index happens only on the primary shard and is then replicated to the other nodes?

Also, since I have 20 shards, could the data be going to only 5 shards, with those 5 all located on a single node? Is this possible?

For a bulk index of 100 documents, the coordinating node (i.e. the node which receives the request) analyzes the metadata line of each item in order to find the shard in which the document will be indexed. Then it groups the documents by shard and sends the groups of documents in parallel to their target shards.
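As an illustration (hypothetical index, type and field names), in a bulk request like the one below each action line is the metadata the coordinating node inspects to pick the target shard:

curl -s -XPOST 'http://localhost:9200/_bulk' --data-binary '{ "index" : { "_index" : "myindex", "_type" : "doc", "_id" : "1" } }
{ "field1" : "value1" }
{ "index" : { "_index" : "myindex", "_type" : "doc", "_id" : "2" } }
{ "field1" : "value2" }
'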

If you don't use features like routing or nested documents, the 100 documents will be spread over the 20 shards. This means that the coordinating node will prepare 20 batches of 5 documents each and send them to each primary shard. Then the documents are indexed on the primary shard and, once done, they are sent to the replica shards to be indexed there too.
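Roughly speaking, the target shard is derived from the document id, or from the routing value if you provide one:

shard_number = hash(_routing) % number_of_primary_shards   (where _routing defaults to _id)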

Yes, it is possible to have "hot" shards when the documents end up concerning only a subset of shards.
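One way to check for this is to compare per-shard document counts, for example (host and index name are placeholders):

curl -s 'http://localhost:9200/_cat/shards/<index>?v&h=index,shard,prirep,docs,store,node'

If a few shards show much higher doc counts than the rest, the documents are concentrating on those shards.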

