All of the primary shards are on 1 node. Would that bottleneck our writes?

Hi,

We see that almost all of the primary shards are on 1 node out of the 3 nodes in our cluster. Would that bottleneck our writes? Our ES Indexing Service has code to catch what is essentially a "queue full" rejection from the server and resubmit the rejected write requests until they go through. With all the primary shards on one node, I suspect we are not spreading the writes across the 3 nodes, so the number of requests that can be queued is only 1/3 of what it could be.
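As an illustration of that retry path (not necessarily how the service is actually written), here is a minimal sketch assuming a 7.x RestHighLevelClient and its BulkProcessor; package names such as org.elasticsearch.common.unit.TimeValue vary slightly between client versions. The backoff policy automatically resubmits bulk requests whose items were rejected because the server's write queue was full (EsRejectedExecutionException):

```java
import org.elasticsearch.action.bulk.BackoffPolicy;
import org.elasticsearch.action.bulk.BulkProcessor;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.unit.TimeValue;

public class IndexingService {
    public static BulkProcessor buildProcessor(RestHighLevelClient client) {
        return BulkProcessor.builder(
                // Hand each assembled bulk request to the async bulk API.
                (request, bulkListener) ->
                        client.bulkAsync(request, RequestOptions.DEFAULT, bulkListener),
                new BulkProcessor.Listener() {
                    @Override
                    public void beforeBulk(long id, BulkRequest request) { }

                    @Override
                    public void afterBulk(long id, BulkRequest request, BulkResponse response) {
                        if (response.hasFailures()) {
                            // Per-item failures that survived the backoff retries.
                            System.err.println(response.buildFailureMessage());
                        }
                    }

                    @Override
                    public void afterBulk(long id, BulkRequest request, Throwable failure) {
                        // The whole bulk request failed (e.g. a connection error).
                        failure.printStackTrace();
                    }
                })
                .setBulkActions(1000)
                // Retry items rejected with a full write queue: start at 100 ms,
                // back off exponentially, up to 5 retries.
                .setBackoffPolicy(BackoffPolicy.exponentialBackoff(
                        TimeValue.timeValueMillis(100), 5))
                .build();
    }
}
```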

Cluster config:

{
  "persistent": {
    "cluster": {
      "routing": {
        "rebalance": {
          "enable": "all"
        },
        "allocation": {
          "cluster_concurrent_rebalance": "10",
          "node_concurrent_recoveries": "10",
          "enable": "all"
        }
      }
    },
    "indices": {
      "recovery": {
        "max_bytes_per_sec": "200mb"
      }
    },
    "xpack": {
      "monitoring": {
        "collection": {
          "enabled": "true"
        }
      }
    }
  }
}
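(For reference, persistent settings like these are applied through the cluster settings API. A minimal sketch using the low-level Java REST client follows; the node address is a placeholder:)

```java
import org.apache.http.HttpHost;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

public class ApplyClusterSettings {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(
                new HttpHost("10.0.0.1", 9200, "http")).build()) { // placeholder address
            Request request = new Request("PUT", "/_cluster/settings");
            // Persist the same settings shown above, using flat dotted keys.
            request.setJsonEntity(
                "{ \"persistent\": {"
                + "\"cluster.routing.rebalance.enable\": \"all\","
                + "\"cluster.routing.allocation.cluster_concurrent_rebalance\": \"10\","
                + "\"cluster.routing.allocation.node_concurrent_recoveries\": \"10\","
                + "\"cluster.routing.allocation.enable\": \"all\","
                + "\"indices.recovery.max_bytes_per_sec\": \"200mb\","
                + "\"xpack.monitoring.collection.enabled\": \"true\""
                + "} }");
            Response response = client.performRequest(request);
            System.out.println(response.getStatusLine());
        }
    }
}
```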

Do you have replicas set?

Yes, 2 replicas. The 2 replicas are on the other 2 nodes, but all the primary shards are on the first node.

Are you sending all the requests to the node with all the primaries?
How did they all manage to end up on the one node? That's pretty odd; it usually only happens as a result of manual intervention.

No, we are sending requests to all 3 nodes. Is there any workaround to fix this?

The current status is as below:

{
  "cluster_name" : "cluster",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 464,
  "active_shards" : 1244,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
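Note that the cluster health output above does not show where the primaries actually live. Something like the following low-level-client call (a sketch; the node address is a placeholder) lists every shard with its primary/replica flag and host node:

```java
import org.apache.http.HttpHost;
import org.apache.http.util.EntityUtils;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

public class ShardLayout {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(
                new HttpHost("10.0.0.1", 9200, "http")).build()) { // placeholder address
            // "prirep" prints "p" for a primary and "r" for a replica,
            // so it is easy to see which node holds all the primaries.
            Request request = new Request("GET", "/_cat/shards");
            request.addParameter("v", "true");
            request.addParameter("h", "index,shard,prirep,state,node");
            Response response = client.performRequest(request);
            System.out.println(EntityUtils.toString(response.getEntity()));
        }
    }
}
```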

Hi,

Let me jump in here. I work with Jehu. We are trying to determine whether, when writing many records to many indexes, we will see degraded performance if the vast majority of the primary shards sit on one node.

The code writing to the indexes is in Java, using the RestHighLevelClient, with all 3 of the nodes' IP addresses/ports provided. Watching the log files, I occasionally see a substantial number of bulk requests come back with something along the lines of a "request queue full" error, which means the failed records have to be resubmitted until they are processed (there is a small delay before resubmitting the failed requests).

My assertion is that spreading the primary shards across the cluster will give us better write performance and fewer failures, since the write requests would be better distributed across the cluster. Am I correct in this assumption?
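For reference, the client is constructed roughly as in the sketch below (the addresses and ports are placeholders); the underlying low-level client round-robins requests across the configured hosts:

```java
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;

public class ClientFactory {
    public static RestHighLevelClient build() {
        // All 3 data nodes are listed; requests are distributed across
        // these hosts in round-robin fashion by the low-level client.
        return new RestHighLevelClient(
                RestClient.builder(
                        new HttpHost("10.0.0.1", 9200, "http"),
                        new HttpHost("10.0.0.2", 9200, "http"),
                        new HttpHost("10.0.0.3", 9200, "http")));
    }
}
```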

Thanks,
Jim
