We had a 'too many open files' error on several nodes, and even though we fixed the underlying problem, one shard fell into a sort of limbo.
# Relevant _cluster/allocation/explain output:
node01.example.com = {"in_sync":true,"allocation_id":"xxxx1","store_exception":{"type":"file_system_exception","reason":"/var/lib/elasticsearch/nodes/0/indices/zzzzzzzid/_state: Too many open files"}}
node02.example.com = {"in_sync":false,"allocation_id":"yyyy2"}
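For completeness, the output above comes from an explain request along these lines (the index name and shard number are placeholders for our real ones):

```
GET /_cluster/allocation/explain
{
  "index": "my-index",
  "shard": 0,
  "primary": true
}
```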
The 'usual' POST /_cluster/reroute?retry_failed=true API call does not allocate the shard. The actual data can, more or less, be found on both node01 and node02, but the two copies seem to differ (the Lucene files, translog files, and checkpoints have different file sizes and/or modification times).
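In case the exact invocation matters, this is roughly what we have been running (localhost:9200 stands in for our cluster endpoint):

```
curl -s -X POST "http://localhost:9200/_cluster/reroute?retry_failed=true&pretty"
```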
Is there a way to force a resync without losing data? (Or, failing that, losing as little as possible?)
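One option that would force allocation is the reroute API's allocate_stale_primary command, but it requires accept_data_loss: true, which is exactly what we would like to avoid. A sketch of that call, with the index name and shard number as placeholders:

```
POST /_cluster/reroute
{
  "commands": [
    {
      "allocate_stale_primary": {
        "index": "my-index",
        "shard": 0,
        "node": "node01.example.com",
        "accept_data_loss": true
      }
    }
  ]
}
```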