Sharding Issue

Team

We are running a Dockerized ELK cluster and we are also using the Logtrail plugin in Kibana, so we have two dashboards: one for Discover and one for Logtrail.

But our shard limit is set to 1250, so now we are getting an error asking us to increase the shard limit, and Logtrail is not working properly. Could you please help me fix this?

How many shards do you have in the cluster? What is their average size? How many data nodes do you have in the cluster? What is their specification?
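
If it is easier, the _cat APIs give a quick overview of all of this. A minimal sketch, assuming Elasticsearch is reachable on localhost:9200 (adjust the host and port for your Docker setup):

# All shards with their size; totals and averages can be read from this
curl -s 'localhost:9200/_cat/shards?v&h=index,shard,prirep,store'

# One line per index: primary/replica counts, document count and store size
curl -s 'localhost:9200/_cat/indices?v&h=index,pri,rep,docs.count,store.size'

# Shard count and disk usage per data node
curl -s 'localhost:9200/_cat/allocation?v'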

So we have defined a shard limit of 1250,
and we have 4 data nodes.

Which Elasticsearch version are you using? What is the specification of the nodes? How much heap do you have? Can you provide the output of the cluster stats API?
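
You can fetch it with something like this (assuming localhost:9200 again):

# Human-readable cluster stats, as posted below
curl -s 'localhost:9200/_cluster/stats?human&pretty'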

{
  "_nodes": {
    "total": 4,
    "successful": 4,
    "failed": 0
  },
  "cluster_name": "ELK",
  "timestamp": 1540297696630,
  "status": "green",
  "indices": {
    "count": 814,
    "shards": {
      "total": 8132,
      "primaries": 4066,
      "replication": 1,
      "index": {
        "shards": {
          "min": 2,
          "max": 10,
          "avg": 9.99017199017199
        },
        "primaries": {
          "min": 1,
          "max": 5,
          "avg": 4.995085995085995
        },
        "replication": {
          "min": 1,
          "max": 1,
          "avg": 1
        }
      }
    },
    "docs": {
      "count": 616442429,
      "deleted": 0
    },
    "store": {
      "size": "291.4gb",
      "size_in_bytes": 312990804408,
      "throttle_time": "0s",
      "throttle_time_in_millis": 0
    },
    "fielddata": {
      "memory_size": "0b",
      "memory_size_in_bytes": 0,
      "evictions": 0
    },
    "query_cache": {
      "memory_size": "33.1mb",
      "memory_size_in_bytes": 34745520,
      "total_count": 3556933,
      "hit_count": 3290542,
      "miss_count": 266391,
      "cache_size": 9208,
      "cache_count": 9412,
      "evictions": 204
    },
    "completion": {
      "size": "0b",
      "size_in_bytes": 0
    },
    "segments": {
      "count": 41882,
      "memory": "1.6gb",
      "memory_in_bytes": 1754557450,
      "terms_memory": "1.4gb",
      "terms_memory_in_bytes": 1546693704,
      "stored_fields_memory": "127.2mb",
      "stored_fields_memory_in_bytes": 133459088,
      "term_vectors_memory": "0b",
      "term_vectors_memory_in_bytes": 0,
      "norms_memory": "21.1mb",
      "norms_memory_in_bytes": 22138560,
      "points_memory": "13.7mb",
      "points_memory_in_bytes": 14452842,
      "doc_values_memory": "36mb",
      "doc_values_memory_in_bytes": 37813256,
      "index_writer_memory": "0b",
      "index_writer_memory_in_bytes": 0,
      "version_map_memory": "0b",
      "version_map_memory_in_bytes": 0,
      "fixed_bit_set": "0b",
      "fixed_bit_set_memory_in_bytes": 0,
      "max_unsafe_auto_id_timestamp": 1540252810171,
      "file_sizes": {}
    }
  },
  "nodes": {
    "count": {
      "total": 4,
      "data": 4,
      "coordinating_only": 0,
      "master": 4,
      "ingest": 4
    },
    "versions": [
      "5.3.0"
    ],
    "os": {
      "available_processors": 32,
      "allocated_processors": 32,
      "names": [
        {
          "name": "Linux",
          "count": 4
        }
      ],
      "mem": {
        "total": "125.6gb",
        "total_in_bytes": 134960480256,
        "free": "23.8gb",
        "free_in_bytes": 25566994432,
        "used": "101.8gb",
        "used_in_bytes": 109393485824,
        "free_percent": 19,
        "used_percent": 81
      }
    },
    "process": {
      "cpu": {
        "percent": 0
      },
      "open_file_descriptors": {
        "min": 4393,
        "max": 5077,
        "avg": 4895
      }
    },
    "jvm": {
      "max_uptime": "84.5d",
      "max_uptime_in_millis": 7308066849,
      "versions": [
        {
          "version": "1.8.0_121",
          "vm_name": "OpenJDK 64-Bit Server VM",
          "vm_version": "25.121-b13",
          "vm_vendor": "Oracle Corporation",
          "count": 4
        }
      ],
      "mem": {
        "heap_used": "29.4gb",
        "heap_used_in_bytes": 31670701560,
        "heap_max": "47.9gb",
        "heap_max_in_bytes": 51504480256
      },
      "threads": 414
    },
    "fs": {
      "total": "1tb",
      "total_in_bytes": 1162047315968,
      "free": "774.1gb",
      "free_in_bytes": 831253094400,
      "available": "725gb",
      "available_in_bytes": 778508849152,
      "spins": "true"
    },
    "plugins": [],
    "network_types": {
      "transport_types": {
        "netty4": 4
      },
      "http_types": {
        "netty4": 4
      }
    }
  }
}

The nodes are quite heavy, but we are still getting this issue and are not sure why.

You have 8132 shards on 4 nodes with a total of just 48GB heap, which is far, far too much. Please read this blog post for some practical guidance on shard sizes and sharding, then rethink how you shard data and reduce the number of primary shards, consolidate indices and/or switch to e.g. weekly or monthly indices. You can also use the shrink index API to reduce the number of shards for existing indices.
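
For reference, shrinking an existing index works roughly like this (a sketch only; the index name, node name and host are placeholders, not from your cluster). The source index must first be made read-only with a copy of every shard on a single node:

# 1. Require a copy of every shard of the source index on one node and block writes
curl -s -XPUT 'localhost:9200/logstash-2018.10.01/_settings' -H 'Content-Type: application/json' -d '
{
  "index.routing.allocation.require._name": "node-1",
  "index.blocks.write": true
}'

# 2. Shrink into a new index with a single primary shard
curl -s -XPOST 'localhost:9200/logstash-2018.10.01/_shrink/logstash-2018.10.01-shrunk' -H 'Content-Type: application/json' -d '
{
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 1
  }
}'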

But why are we getting this error?

Error! Exception while executing search query :[illegal_argument_exception] Trying to query 1600 shards, which is over the limit of 1200. This limit exists because querying many shards at the same time can make the job of the coordinating node very CPU and/or memory intensive. It is usually a better idea to have a smaller number of larger shards. Update [action.search.shard_count.limit] to a greater value if you really want to query that many shards at the same time.

There is a limit to the number of shards that can be queried in the version you are using. You are however severely oversharded and need to look at correcting that.
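
The error message itself names the setting. As a temporary stopgap you could raise it through the cluster settings API, roughly like this (2000 is only an illustrative value, and this just pushes more load onto the coordinating node), but the real fix is reducing the number of shards:

# Temporary stopgap only: raise the per-search shard limit
curl -s -XPUT 'localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '
{
  "persistent": {
    "action.search.shard_count.limit": 2000
  }
}'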

What is the best way to fix this issue and correct the sharding?

Did you look at the links I provided?

Yes, I have gone through the above link.

Actually, we have more than one year of data in our ELK cluster.

I would recommend doing the following:

  • First change your sharding strategy going forward to prevent it from deteriorating. Consolidate indices with similar data, switch to monthly indices and reduce the number of primary shards to 1 (an index template sketch follows this list).
  • Then start reducing the number of shards already in the cluster. If you intend to keep data for a long time, consolidate indices and reindex into monthly indices with a single primary shard using the reindex API (see the second sketch below). Just using the shrink index API will probably not bring you down far enough.
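
As a rough sketch of the first point, an index template (the template name and pattern are placeholders, 5.x syntax) can force new indices down to a single primary shard; going forward, the Logstash elasticsearch output can also write monthly indices with e.g. index => "logstash-%{+YYYY.MM}":

# Hypothetical template: new logstash-* indices get one primary shard and one replica
curl -s -XPUT 'localhost:9200/_template/logstash_monthly' -H 'Content-Type: application/json' -d '
{
  "template": "logstash-*",
  "order": 1,
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 1
  }
}'

And for the second point, the reindex API can consolidate existing daily indices into a monthly one. The index names below are placeholders; list every daily index for the month in source.index, and with the template above in place the destination index will be created with one primary shard. The daily source indices can be deleted once the reindex has completed and been verified:

# Hypothetical consolidation: copy daily indices for one month into a single monthly index
curl -s -XPOST 'localhost:9200/_reindex' -H 'Content-Type: application/json' -d '
{
  "source": { "index": ["logstash-2018.09.01", "logstash-2018.09.02"] },
  "dest": { "index": "logstash-2018.09" }
}'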

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.