I have a 5-node Elasticsearch (6.2.2) cluster for monitoring the performance of around 15 applications. The indices are fed by Logstash, and I have a daily index for each application. One of the indices (4 primaries + 1 replica), which is actually the biggest one, always behaves strangely: one or more of its replica shards open many files and use a lot of disk space, which eventually produces "too many open files" log entries on one or more nodes. If I use _cat/shards/index_name the output looks like this:
index shard prirep state docs store ip node
siapm-nagiosxi-jmx-2018.09.21 2 p STARTED 4066912 2.2gb 10.240.150.16 pracp1_node2
siapm-nagiosxi-jmx-2018.09.21 2 r STARTED 4066912 1.5gb 10.240.150.15 pracp2_node4
siapm-nagiosxi-jmx-2018.09.21 1 r STARTED 4067841 1.5gb 10.240.150.16 pracp1_node2
siapm-nagiosxi-jmx-2018.09.21 1 p STARTED 4067841 2.2gb 10.240.150.17 pracp2_node3
siapm-nagiosxi-jmx-2018.09.21 3 p STARTED 4064477 2.1gb 10.240.150.15 pracp2_node4
siapm-nagiosxi-jmx-2018.09.21 3 r STARTED 4064477 1.5gb 10.240.150.14 pracp1_node1
siapm-nagiosxi-jmx-2018.09.21 0 p STARTED 4066127 2.5gb 10.240.150.14 pracp1_node1
siapm-nagiosxi-jmx-2018.09.21 0 r STARTED 4067861 15.8gb 10.240.150.17 pracp2_node3
In this case I have one strange replica, but other times I have two or more. On pracp2_node3 the open_file_descriptors count steadily grows. For now I have a workaround.
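For reference, this is roughly how I watch the descriptor count per node (host and port are placeholders for one of the cluster nodes):

```
curl -s 'http://10.240.150.17:9200/_nodes/stats/process?filter_path=nodes.*.name,nodes.*.process.open_file_descriptors&pretty'
```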
For an old index, I close the index and reopen it, and I can see via the /_nodes/stats/process API that the open_file_descriptors count normalises. For the current day's index, I throw away the replicas with the number_of_replicas: 0 setting and then bring them back. After all replicas are initialized, the disk usage and the open file count normalise.
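The two workarounds look roughly like this (index names and the host are examples from my cluster; adjust to your own):

```
# Workaround for an old index: close and reopen it
curl -XPOST 'http://10.240.150.17:9200/siapm-nagiosxi-jmx-2018.09.20/_close'
curl -XPOST 'http://10.240.150.17:9200/siapm-nagiosxi-jmx-2018.09.20/_open'

# Workaround for the current day's index: drop the replicas, then bring them back
curl -XPUT 'http://10.240.150.17:9200/siapm-nagiosxi-jmx-2018.09.21/_settings' \
  -H 'Content-Type: application/json' -d '{"index": {"number_of_replicas": 0}}'
curl -XPUT 'http://10.240.150.17:9200/siapm-nagiosxi-jmx-2018.09.21/_settings' \
  -H 'Content-Type: application/json' -d '{"index": {"number_of_replicas": 1}}'
```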