Appropriate Number of Indices/Shards for My Cluster

I'm building an ES system to back my webapp, which is designed to support multiple data types from many different sources. The cluster I've built has 3 nodes and about 514 shards, which currently support about 200 separate indices.

My question is: Am I approaching a practical limit for the number of indices I can reasonably support while still maintaining decent performance? I'm debating moving all of our data into a single huge index, but that presents its own complications with field conflicts, etc. I've included my cluster statistics for more information:

```json
{
  "_nodes" : {
    "total" : 3,
    "successful" : 3,
    "failed" : 0
  },
  "cluster_name" : "XXXX",
  "cluster_uuid" : "XXXX",
  "timestamp" : 1571679938040,
  "status" : "green",
  "indices" : {
    "count" : 208,
    "shards" : {
      "total" : 514,
      "primaries" : 257,
      "replication" : 1.0,
      "index" : {
        "shards" : {
          "min" : 2,
          "max" : 60,
          "avg" : 2.4711538461538463
        },
        "primaries" : {
          "min" : 1,
          "max" : 30,
          "avg" : 1.2355769230769231
        },
        "replication" : {
          "min" : 1.0,
          "max" : 1.0,
          "avg" : 1.0
        }
      }
    },
    "docs" : {
      "count" : 1275422466,
      "deleted" : 8272996
    },
    "store" : {
      "size_in_bytes" : 847769442547
    },
    "fielddata" : {
      "memory_size_in_bytes" : 736188,
      "evictions" : 0
    },
    "query_cache" : {
      "memory_size_in_bytes" : 11275993,
      "total_count" : 2214100,
      "hit_count" : 991,
      "miss_count" : 2213109,
      "cache_size" : 656,
      "cache_count" : 656,
      "evictions" : 0
    },
    "completion" : {
      "size_in_bytes" : 21634393
    },
    "segments" : {
      "count" : 3927,
      "memory_in_bytes" : 1693171610,
      "terms_memory_in_bytes" : 1290343398,
      "stored_fields_memory_in_bytes" : 317444248,
      "term_vectors_memory_in_bytes" : 0,
      "norms_memory_in_bytes" : 11647744,
      "points_memory_in_bytes" : 63844780,
      "doc_values_memory_in_bytes" : 9891440,
      "index_writer_memory_in_bytes" : 196920116,
      "version_map_memory_in_bytes" : 11946957,
      "fixed_bit_set_memory_in_bytes" : 2571384,
      "max_unsafe_auto_id_timestamp" : 1571652126087,
      "file_sizes" : { }
    }
  },
  "nodes" : {
    "count" : {
      "total" : 3,
      "data" : 3,
      "coordinating_only" : 0,
      "master" : 3,
      "ingest" : 3
    },
    "versions" : [
      "7.2.0"
    ],
    "os" : {
      "available_processors" : 96,
      "allocated_processors" : 96,
      "names" : [
        {
          "name" : "Linux",
          "count" : 3
        }
      ],
      "pretty_names" : [
        {
          "pretty_name" : "CentOS Linux 7 (Core)",
          "count" : 3
        }
      ],
      "mem" : {
        "total_in_bytes" : 803789303808,
        "free_in_bytes" : 675442356224,
        "used_in_bytes" : 128346947584,
        "free_percent" : 84,
        "used_percent" : 16
      }
    },
    "process" : {
      "cpu" : {
        "percent" : 0
      },
      "open_file_descriptors" : {
        "min" : 7100,
        "max" : 7356,
        "avg" : 7219
      }
    },
    "jvm" : {
      "max_uptime_in_millis" : 27873044,
      "versions" : [
        {
          "version" : "12.0.1",
          "vm_name" : "OpenJDK 64-Bit Server VM",
          "vm_version" : "12.0.1+12",
          "vm_vendor" : "Oracle Corporation",
          "bundled_jdk" : true,
          "using_bundled_jdk" : true,
          "count" : 3
        }
      ],
      "mem" : {
        "heap_used_in_bytes" : 11099536424,
        "heap_max_in_bytes" : 96034947072
      },
      "threads" : 800
    },
    "fs" : {
      "total_in_bytes" : 1610578071552,
      "free_in_bytes" : 747566669824,
      "available_in_bytes" : 747566669824
    },
    "plugins" : [
      {
        "name" : "geopoint-clustering-aggregation",
        "version" : "7.2.0.0",
        "elasticsearch_version" : "7.2.0",
        "java_version" : "1.8",
        "description" : "Aggregate clusters from geohash grid aggregation",
        "classname" : "com.opendatasoft.elasticsearch.plugin.GeoPointClusteringAggregationPlugin",
        "extended_plugins" : [ ],
        "has_native_controller" : false
      }
    ],
    "network_types" : {
      "transport_types" : {
        "security4" : 3
      },
      "http_types" : {
        "security4" : 3
      }
    },
    "discovery_types" : {
      "zen" : 3
    },
    "packaging_types" : [
      {
        "flavor" : "default",
        "type" : "rpm",
        "count" : 3
      }
    ]
  }
}
```

The ideal number of shards is not a black-and-white matter, but you are approaching the upper end of the number of shards you'd typically want to have per node. You have about 850 GB of data spread over 514 shards, which works out to an average of 1-2 GB of data per shard. A rule of thumb is that a shard can comfortably hold at least tens of GBs of data, so roughly a factor of 10 fewer shards would likely be more optimal. This blog post has some details and good pointers on optimizing the number of shards.
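To make the arithmetic concrete, here is the average shard size computed from the stats posted above (a rough figure, since the store size includes replica copies):

```python
# Average shard size derived from the cluster stats in the question.
store_bytes = 847_769_442_547   # indices.store.size_in_bytes
total_shards = 514              # indices.shards.total
primaries = 257                 # indices.shards.primaries

avg_per_shard_gb = store_bytes / total_shards / 1e9
avg_per_primary_gb = store_bytes / primaries / 1e9

print(f"avg size per shard:   {avg_per_shard_gb:.2f} GB")   # ~1.65 GB
print(f"avg size per primary: {avg_per_primary_gb:.2f} GB") # ~3.30 GB
```

Both figures are well below the tens-of-GBs-per-shard rule of thumb, which is why consolidating to fewer, larger shards is worth considering.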

Why do you have 200 indices? Do you have 200 different data types? If not, it could be worthwhile to consolidate documents that share the same schema (mapping) into one index. It may also be good to create indices with a single primary shard (which you already seem to be doing for most of them).
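As a sketch of what consolidation could look like, the request bodies below create a single one-shard index and then reindex several small indices into it via the `_reindex` API. All index and field names here are hypothetical examples, not from your cluster:

```python
import json

# Hypothetical body for: PUT /events-consolidated
# One primary shard plus one replica, matching your current replication factor.
create_index_body = {
    "settings": {"number_of_shards": 1, "number_of_replicas": 1},
    "mappings": {
        "properties": {
            "source": {"type": "keyword"},    # tag each doc with its origin index
            "timestamp": {"type": "date"},
        }
    },
}

# Hypothetical body for: POST /_reindex
# Copies documents from several small source indices into the new one.
reindex_body = {
    "source": {"index": ["events-a", "events-b"]},
    "dest": {"index": "events-consolidated"},
}

print(json.dumps(create_index_body, indent=2))
print(json.dumps(reindex_body, indent=2))
```

Keeping a `source` field (or similar) on each document lets you filter by origin after consolidation, which recovers most of what separate indices gave you.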

Finally, if you want to find out what the optimal number of shards is for your hardware, your documents, and your use case, you'd have to do a capacity planning exercise. These two videos on our website will walk you through that process:
