All of the primary shards are on 1 node. Would that bottleneck our writes?

Hi,

We see that almost all of the primary shards are on 1 node out of the 3 nodes in our cluster. Would that bottleneck our writes? Our ES Indexing Service has code to catch what is essentially a "queue full" rejection from the server and resubmit the rejected write requests until they go through. With all the primary shards on one node, I suspect we are not spreading the writes across the 3 nodes, so the number of requests that can be queued is only 1/3 of what it could be.
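As an illustration of that retry path (not necessarily how the service is actually written), here is a minimal sketch assuming a 7.x RestHighLevelClient and its BulkProcessor; package names such as org.elasticsearch.common.unit.TimeValue vary slightly between client versions. The backoff policy automatically resubmits bulk requests whose items were rejected because the server's write queue was full (EsRejectedExecutionException):

```java
import org.elasticsearch.action.bulk.BackoffPolicy;
import org.elasticsearch.action.bulk.BulkProcessor;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.unit.TimeValue;

public class IndexingService {
    public static BulkProcessor buildProcessor(RestHighLevelClient client) {
        return BulkProcessor.builder(
                // Hand each assembled bulk request to the async bulk API.
                (request, bulkListener) ->
                        client.bulkAsync(request, RequestOptions.DEFAULT, bulkListener),
                new BulkProcessor.Listener() {
                    @Override
                    public void beforeBulk(long id, BulkRequest request) { }

                    @Override
                    public void afterBulk(long id, BulkRequest request, BulkResponse response) {
                        if (response.hasFailures()) {
                            // Per-item failures that survived the backoff retries.
                            System.err.println(response.buildFailureMessage());
                        }
                    }

                    @Override
                    public void afterBulk(long id, BulkRequest request, Throwable failure) {
                        // The whole bulk request failed (e.g. a connection error).
                        failure.printStackTrace();
                    }
                })
                .setBulkActions(1000)
                // Retry items rejected with a full write queue: start at 100 ms,
                // back off exponentially, up to 5 retries.
                .setBackoffPolicy(BackoffPolicy.exponentialBackoff(
                        TimeValue.timeValueMillis(100), 5))
                .build();
    }
}
```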

Cluster config:

{
  "persistent": {
    "cluster": {
      "routing": {
        "rebalance": {
          "enable": "all"
        },
        "allocation": {
          "cluster_concurrent_rebalance": "10",
          "node_concurrent_recoveries": "10",
          "enable": "all"
        }
      }
    },
    "indices": {
      "recovery": {
        "max_bytes_per_sec": "200mb"
      }
    },
    "xpack": {
      "monitoring": {
        "collection": {
          "enabled": "true"
        }
      }
    }
  }
}
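(For reference, persistent settings like these are applied through the cluster settings API. A minimal sketch using the low-level Java REST client follows; the node address is a placeholder:)

```java
import org.apache.http.HttpHost;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

public class ApplyClusterSettings {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(
                new HttpHost("10.0.0.1", 9200, "http")).build()) { // placeholder address
            Request request = new Request("PUT", "/_cluster/settings");
            // Persist the same settings shown above, using flat dotted keys.
            request.setJsonEntity(
                "{ \"persistent\": {"
                + "\"cluster.routing.rebalance.enable\": \"all\","
                + "\"cluster.routing.allocation.cluster_concurrent_rebalance\": \"10\","
                + "\"cluster.routing.allocation.node_concurrent_recoveries\": \"10\","
                + "\"cluster.routing.allocation.enable\": \"all\","
                + "\"indices.recovery.max_bytes_per_sec\": \"200mb\","
                + "\"xpack.monitoring.collection.enabled\": \"true\""
                + "} }");
            Response response = client.performRequest(request);
            System.out.println(response.getStatusLine());
        }
    }
}
```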

Do you have replicas set?

Yes, 2 replicas. The 2 replicas are on the other 2 nodes, but all the primary shards are on the first node.

Are you sending all the requests to the node with all the primaries?
How did they all manage to end up on the one node? That's pretty odd; it usually only happens as a result of manual intervention.

No, we are sending requests to all 3 nodes. Is there any workaround to fix this?

The current status is as below:

{
  "cluster_name" : "cluster",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 464,
  "active_shards" : 1244,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
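Note that the cluster health output above does not show where the primaries actually live. Something like the following low-level-client call (a sketch; the node address is a placeholder) lists every shard with its primary/replica flag and host node:

```java
import org.apache.http.HttpHost;
import org.apache.http.util.EntityUtils;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

public class ShardLayout {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(
                new HttpHost("10.0.0.1", 9200, "http")).build()) { // placeholder address
            // "prirep" prints "p" for a primary and "r" for a replica,
            // so it is easy to see which node holds all the primaries.
            Request request = new Request("GET", "/_cat/shards");
            request.addParameter("v", "true");
            request.addParameter("h", "index,shard,prirep,state,node");
            Response response = client.performRequest(request);
            System.out.println(EntityUtils.toString(response.getEntity()));
        }
    }
}
```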

Hi,

Let me jump in here. I work with Jehu. We are trying to determine whether, when writing many records to many indexes, we will see degraded performance if the vast majority of the primary shards sit on one node.

The code writing to the indexes is in Java, using the RestHighLevelClient, with all 3 of the nodes' IP addresses/ports provided. Watching the log files, I occasionally see a substantial number of bulk requests come back with something along the lines of a "request queue full" error, which means the failed records have to be resubmitted until they are processed (there is a small delay before resubmitting the failed requests).

My assertion is that spreading the primary shards across the cluster will give us better write performance and fewer failures, since the write requests would be better distributed across the cluster. Am I correct in this assumption?
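For reference, the client is constructed roughly as in the sketch below (the addresses and ports are placeholders); the underlying low-level client round-robins requests across the configured hosts:

```java
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;

public class ClientFactory {
    public static RestHighLevelClient build() {
        // All 3 data nodes are listed; requests are distributed across
        // these hosts in round-robin fashion by the low-level client.
        return new RestHighLevelClient(
                RestClient.builder(
                        new HttpHost("10.0.0.1", 9200, "http"),
                        new HttpHost("10.0.0.2", 9200, "http"),
                        new HttpHost("10.0.0.3", 9200, "http")));
    }
}
```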

Thanks,
Jim
