Sharding Issue


(Abhinav Dwivedi) #1

Team

We are using a Dockerized ELK cluster with the Logtrail plugin in Kibana, so we have two dashboards: one for Discover and one for Logtrail.

But our shard limit is set to 1250, and now we are getting an error telling us to increase the shard limit, so Logtrail is not working properly. Could you please help me fix this?


(Christian Dahlqvist) #2

How many shards do you have in the cluster? What is their average size? How many data nodes do you have in the cluster? What is their specification?
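
If it helps, the shard counts and sizes can be pulled straight from the _cat APIs. A minimal sketch in Python with the requests library, assuming the cluster is reachable at http://localhost:9200 (adjust for your Docker setup):

import requests

ES = "http://localhost:9200"  # assumed endpoint; adjust for your setup

# One row per shard: index, shard number, primary/replica, state and on-disk size.
print(requests.get(f"{ES}/_cat/shards?v&h=index,shard,prirep,state,store").text)

# Per-node totals: shard count plus disk used and available.
print(requests.get(f"{ES}/_cat/allocation?v").text)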


(Abhinav Dwivedi) #3

So we have defined the shard limit as 1250, and we have 4 data nodes.


(Christian Dahlqvist) #4

Which Elasticsearch version are you using? What is the specification of the nodes? How much heap do you have? Can you provide the output of the cluster stats API?
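
For reference, the cluster stats output can be fetched with a single GET. A minimal sketch in Python with the requests library, again assuming an http://localhost:9200 endpoint:

import requests

# Fetch cluster-wide statistics; "human" adds readable size/time fields.
resp = requests.get("http://localhost:9200/_cluster/stats?human&pretty")
resp.raise_for_status()
print(resp.text)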


(Abhinav Dwivedi) #5

{
  "_nodes": {
    "total": 4,
    "successful": 4,
    "failed": 0
  },
  "cluster_name": "ELK",
  "timestamp": 1540297696630,
  "status": "green",
  "indices": {
    "count": 814,
    "shards": {
      "total": 8132,
      "primaries": 4066,
      "replication": 1,
      "index": {
        "shards": {
          "min": 2,
          "max": 10,
          "avg": 9.99017199017199
        },
        "primaries": {
          "min": 1,
          "max": 5,
          "avg": 4.995085995085995
        },
        "replication": {
          "min": 1,
          "max": 1,
          "avg": 1
        }
      }
    },
    "docs": {
      "count": 616442429,
      "deleted": 0
    },
    "store": {
      "size": "291.4gb",
      "size_in_bytes": 312990804408,
      "throttle_time": "0s",
      "throttle_time_in_millis": 0
    },
    "fielddata": {
      "memory_size": "0b",
      "memory_size_in_bytes": 0,
      "evictions": 0
    },
    "query_cache": {
      "memory_size": "33.1mb",
      "memory_size_in_bytes": 34745520,
      "total_count": 3556933,
      "hit_count": 3290542,
      "miss_count": 266391,
      "cache_size": 9208,
      "cache_count": 9412,
      "evictions": 204
    },
    "completion": {
      "size": "0b",
      "size_in_bytes": 0
    },
    "segments": {
      "count": 41882,
      "memory": "1.6gb",
      "memory_in_bytes": 1754557450,
      "terms_memory": "1.4gb",
      "terms_memory_in_bytes": 1546693704,
      "stored_fields_memory": "127.2mb",
      "stored_fields_memory_in_bytes": 133459088,
      "term_vectors_memory": "0b",
      "term_vectors_memory_in_bytes": 0,
      "norms_memory": "21.1mb",
      "norms_memory_in_bytes": 22138560,
      "points_memory": "13.7mb",
      "points_memory_in_bytes": 14452842,
      "doc_values_memory": "36mb",
      "doc_values_memory_in_bytes": 37813256,
      "index_writer_memory": "0b",
      "index_writer_memory_in_bytes": 0,
      "version_map_memory": "0b",
      "version_map_memory_in_bytes": 0,
      "fixed_bit_set": "0b",
      "fixed_bit_set_memory_in_bytes": 0,
      "max_unsafe_auto_id_timestamp": 1540252810171,
      "file_sizes": {}
    }
  },
  "nodes": {
    "count": {
      "total": 4,
      "data": 4,
      "coordinating_only": 0,
      "master": 4,
      "ingest": 4
    },
    "versions": [
      "5.3.0"
    ],
    "os": {
      "available_processors": 32,
      "allocated_processors": 32,
      "names": [
        {
          "name": "Linux",
          "count": 4
        }
      ],
      "mem": {
        "total": "125.6gb",
        "total_in_bytes": 134960480256,
        "free": "23.8gb",
        "free_in_bytes": 25566994432,
        "used": "101.8gb",
        "used_in_bytes": 109393485824,
        "free_percent": 19,
        "used_percent": 81
      }
    },
    "process": {
      "cpu": {
        "percent": 0
      },
      "open_file_descriptors": {
        "min": 4393,
        "max": 5077,
        "avg": 4895
      }
    },
    "jvm": {
      "max_uptime": "84.5d",
      "max_uptime_in_millis": 7308066849,
      "versions": [
        {
          "version": "1.8.0_121",
          "vm_name": "OpenJDK 64-Bit Server VM",
          "vm_version": "25.121-b13",
          "vm_vendor": "Oracle Corporation",
          "count": 4
        }
      ],
      "mem": {
        "heap_used": "29.4gb",
        "heap_used_in_bytes": 31670701560,
        "heap_max": "47.9gb",
        "heap_max_in_bytes": 51504480256
      },
      "threads": 414
    },
    "fs": {
      "total": "1tb",
      "total_in_bytes": 1162047315968,
      "free": "774.1gb",
      "free_in_bytes": 831253094400,
      "available": "725gb",
      "available_in_bytes": 778508849152,
      "spins": "true"
    },
    "plugins": [],
    "network_types": {
      "transport_types": {
        "netty4": 4
      },
      "http_types": {
        "netty4": 4
      }
    }
  }
}


(Abhinav Dwivedi) #6

The nodes are quite heavy, but we are still getting the issue and I am not sure why.


(Christian Dahlqvist) #7

You have 8132 shards on 4 nodes with a total of just 48GB heap, which is far, far too much. Please read this blog post for some practical guidance on shard sizes and sharding, then rethink how you shard data and reduce the number of primary shards, consolidate indices and/or switch to e.g. weekly or monthly indices. You can also use the shrink index API to reduce the number of shards for existing indices.
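
To make the shrink step concrete, here is a minimal sketch in Python with the requests library. The endpoint, index name and node name are assumptions, and on 5.x the source index must first be made read-only and fully relocated to a single node before it can be shrunk:

import requests

ES = "http://localhost:9200"  # assumed endpoint
source, target = "logstash-2018.10.01", "logstash-2018.10.01-shrunk"  # hypothetical names

# 1. Make the source read-only and require all shard copies on one (hypothetical) node.
requests.put(f"{ES}/{source}/_settings", json={
    "settings": {
        "index.routing.allocation.require._name": "data-node-1",
        "index.blocks.write": True
    }
})

# 2. Once relocation has finished, shrink down to a single primary shard.
requests.post(f"{ES}/{source}/_shrink/{target}", json={
    "settings": {"index.number_of_shards": 1}
})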


(Abhinav Dwivedi) #8

But why are we getting this error?

Error! Exception while executing search query :[illegal_argument_exception] Trying to query 1600 shards, which is over the limit of 1200. This limit exists because querying many shards at the same time can make the job of the coordinating node very CPU and/or memory intensive. It is usually a better idea to have a smaller number of larger shards. Update [action.search.shard_count.limit] to a greater value if you really want to query that many shards at the same time.


(Christian Dahlqvist) #9

There is a limit to the number of shards that can be queried in the version you are using. You are however severely oversharded and need to look at correcting that.
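
If Logtrail has to keep working while the resharding is under way, the limit named in the error can be raised as a stop-gap via the cluster settings API. A minimal sketch in Python with the requests library; the endpoint and the new value are assumptions, and this only hides the over-sharding rather than fixing it:

import requests

# Temporary workaround only: raise the dynamic per-search shard limit.
requests.put("http://localhost:9200/_cluster/settings", json={
    "transient": {"action.search.shard_count.limit": 2000}
})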


(Abhinav Dwivedi) #10

What is the best way to fix this issue and correct the sharding?


(Christian Dahlqvist) #11

Did you look at the links I provided?


(Abhinav Dwivedi) #12

Yes, I have gone through the above link.

Actually, we have more than one year of data in our ELK cluster.


(Christian Dahlqvist) #13

I would recommend doing the following:

  • First change your sharding strategy going forward to prevent it from deteriorating. Consolidate indices with similar data, switch to monthly indices and reduce the number of primary shards to 1.
  • Then start reducing the number of shards already in the cluster. If you intend to keep data for a long time, consolidate indices and reindex into monthly indices with a single primary shard using the reindex API (a rough sketch follows below). Just using the shrink index API will probably not bring you down far enough.
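
As a rough sketch of that second step in Python with the requests library (the endpoint and the daily Logstash index names are hypothetical, and the list of source indices is abbreviated):

import requests

ES = "http://localhost:9200"  # assumed endpoint

# 1. Create the monthly target index with a single primary shard.
requests.put(f"{ES}/logstash-2018.10", json={
    "settings": {"number_of_shards": 1, "number_of_replicas": 1}
})

# 2. Reindex the month's daily indices into it (full list abbreviated here).
requests.post(f"{ES}/_reindex", json={
    "source": {"index": ["logstash-2018.10.01", "logstash-2018.10.02"]},
    "dest": {"index": "logstash-2018.10"}
})

# 3. Delete the old daily indices only after verifying the document counts match.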

(system) #14

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.