Courier Fetch: N of N shards failed in Kibana

Hi there.

I have been running ELK for about six months, and it has been stable.

But lately I have been getting this message very often:

Courier Fetch: 23 of 420 shards failed.

Could anybody explain this message?

I have no idea how to fix it.

My setup is: app logs streaming > Logstash > AWS Elasticsearch domain > Kibana UI.

What is the output of the cluster stats API?
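For reference, it can be retrieved with something along these lines (the endpoint below is just a placeholder for your AWS Elasticsearch domain; the human and pretty flags only affect formatting):

curl -s 'https://<your-domain-endpoint>/_cluster/stats?human&pretty'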

Here it is, @Christian_Dahlqvist:

{
"_nodes": {
"total": 1,
"successful": 1,
"failed": 0
},
"cluster_name": "891349355538:pgwdev",
"timestamp": 1521446981178,
"status": "yellow",
"indices": {
"count": 88,
"shards": {
"total": 432,
"primaries": 432,
"replication": 0,
"index": {
"shards": {
"min": 1,
"max": 5,
"avg": 4.909090909090909
},
"primaries": {
"min": 1,
"max": 5,
"avg": 4.909090909090909
},
"replication": {
"min": 0,
"max": 0,
"avg": 0
}
}
},
"docs": {
"count": 5390254,
"deleted": 1
},
"store": {
"size": "3.9gb",
"size_in_bytes": 4210638451,
"throttle_time": "0s",
"throttle_time_in_millis": 0
},
"fielddata": {
"memory_size": "845.4kb",
"memory_size_in_bytes": 865744,
"evictions": 0
},
"query_cache": {
"memory_size": "4.4mb",
"memory_size_in_bytes": 4686202,
"total_count": 1546397,
"hit_count": 889013,
"miss_count": 657384,
"cache_size": 36817,
"cache_count": 36821,
"evictions": 4
},
"completion": {
"size": "0b",
"size_in_bytes": 0
},
"segments": {
"count": 2292,
"memory": "31.3mb",
"memory_in_bytes": 32861595,
"terms_memory": "27.6mb",
"terms_memory_in_bytes": 29018539,
"stored_fields_memory": "1.6mb",
"stored_fields_memory_in_bytes": 1768592,
"term_vectors_memory": "936b",
"term_vectors_memory_in_bytes": 936,
"norms_memory": "29.3kb",
"norms_memory_in_bytes": 30016,
"points_memory": "65.5kb",
"points_memory_in_bytes": 67168,
"doc_values_memory": "1.8mb",
"doc_values_memory_in_bytes": 1976344,
"index_writer_memory": "0b",
"index_writer_memory_in_bytes": 0,
"version_map_memory": "0b",
"version_map_memory_in_bytes": 0,
"fixed_bit_set": "0b",
"fixed_bit_set_memory_in_bytes": 0,
"max_unsafe_auto_id_timestamp": 1520593756131,
"file_sizes": {}
}
},
"nodes": {
"count": {
"total": 1,
"data": 1,
"coordinating_only": 0,
"master": 1,
"ingest": 1
},
"versions": [
"5.5.2"
],
"os": {
"available_processors": 1,
"allocated_processors": 1,
"names": [
{
"count": 1
}
],
"mem": {
"total": "1.9gb",
"total_in_bytes": 2093498368,
"free": "141.2mb",
"free_in_bytes": 148090880,
"used": "1.8gb",
"used_in_bytes": 1945407488,
"free_percent": 7,
"used_percent": 93
}
},
"process": {
"cpu": {
"percent": 2
},
"open_file_descriptors": {
"min": 1412,
"max": 1412,
"avg": 1412
}
},
"jvm": {
"max_uptime": "9.8d",
"max_uptime_in_millis": 853696503,
"mem": {
"heap_used": "434.7mb",
"heap_used_in_bytes": 455873608,
"heap_max": "1015.6mb",
"heap_max_in_bytes": 1065025536
},
"threads": 113
},
"fs": {
"total": "11.6gb",
"total_in_bytes": 12548489216,
"free": "7.6gb",
"free_in_bytes": 8254406656,
"available": "7gb",
"available_in_bytes": 7593385984
},
"network_types": {
"transport_types": {
"netty4": 1
},
"http_types": {
"filter-jetty": 1
}
}
}
}

That is a lot of shards given the amount of heap you have available in your cluster. Read this blog post for some guidance on how large your shards should be and how many you should aim to have in your cluster.
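To see how those shards break down per index, something along these lines should work (the _cat indices API is standard; the endpoint is again a placeholder):

curl -s 'https://<your-domain-endpoint>/_cat/indices?v&h=index,pri,rep,docs.count,store.size&s=store.size:desc'

That should make it easy to spot daily indices that hold only a few MB of data each but still carry five primary shards.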

You can use the shrink index API to reduce the shard count by some margin. If you need to go further, you may need to use the reindex API to reindex your data into e.g. monthly indices instead.
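As a rough sketch of the reindex approach (the index names and pattern here are just placeholders; adjust them to match your own daily indices):

curl -s -X POST 'https://<your-domain-endpoint>/_reindex' -H 'Content-Type: application/json' -d '
{
  "source": { "index": "logstash-2018.03.*" },
  "dest": { "index": "logstash-2018.03" }
}'

Creating the monthly index up front with a single primary shard, via an index template or an explicit index creation, would also avoid ending up with five shards per month again.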

If I use the reindex API, will there be a duplication problem?
And will monthly indices still give the same performance as daily indices?

Given how little data you have in the cluster, I don't see monthly indices getting very large, so I would expect them to perform much better. As described in the blog post I linked to, we often recommend shard sizes in the tens of GB, which you seem unlikely to reach even with monthly indices.

While reindexing is going on, you could end up with data being duplicated, as the monthly index would potentially also match the index pattern. Once the reindexing has completed, however, I would expect the daily indices to be deleted. As you have relatively little data in your cluster, I would not expect reindexing to take very long.
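Deleting the daily indices once the reindex has finished could look roughly like this (placeholder names again, and assuming wildcard deletes are allowed on your domain):

curl -s -X DELETE 'https://<your-domain-endpoint>/logstash-2018.03.*'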

If this is not acceptable, you can reindex into an index that does not match the daily index pattern and then create an alias for the index at the time you delete the daily indices. This will reduce the amount of time duplicates can be seen in the system.
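A sketch of that swap using the aliases API, assuming the reindexed data lives in a name like monthly-2018.03 that your Kibana index pattern does not match (all names below are placeholders, with one remove_index entry per daily index):

curl -s -X POST 'https://<your-domain-endpoint>/_aliases' -H 'Content-Type: application/json' -d '
{
  "actions": [
    { "remove_index": { "index": "logstash-2018.03.01" } },
    { "remove_index": { "index": "logstash-2018.03.02" } },
    { "add": { "index": "monthly-2018.03", "alias": "logstash-2018.03" } }
  ]
}'

Because all the actions in one _aliases request are applied atomically, searches against the pattern switch from the daily indices to the alias without a gap.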

@Christian_Dahlqvist, thanks for your reply and advice.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.