Repair index after too many open files error

We had a 'too many open files' error on several nodes, and even though we fixed the underlying problem, one shard fell into a sort of limbo.

# Relevant _cluster/allocation/explain output:
node01.example.com = {"in_sync":true,"allocation_id":"xxxx1","store_exception":{"type":"file_system_exception","reason":"/var/lib/elasticsearch/nodes/0/indices/zzzzzzzid/_state: Too many open files"}}
node02.example.com = {"in_sync":false,"allocation_id":"yyyy2"}
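For reference, the output above comes from the allocation explain API for the affected shard; a call roughly like the one below should reproduce it (the index name and shard number here are placeholders, not our real values):

# Sketch -- replace index name and shard number with your own:
POST /_cluster/allocation/explain
{"index": "my-index", "shard": 0, "primary": true}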

The 'usual' POST /_cluster/reroute?retry_failed=true API call does not allocate the shard, and although the actual data can be found (more or less) on both node01 and node02, the two copies appear to differ: the Lucene segment files, translog files, and checkpoints have different file sizes and/or modification times. A sketch of the retry call and a follow-up check is below.
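For completeness, this is the shape of the retry call plus a cat-shards query to confirm the shard is still unassigned (the index name is a placeholder):

# Retry previously failed allocations, then check shard state:
POST /_cluster/reroute?retry_failed=true
GET /_cat/shards/my-index?v&h=index,shard,prirep,state,unassigned.reason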

Is there a way to force a resync without losing data? (Or while losing as little as possible...) The only 'force' option I'm aware of is allocate_stale_primary, sketched below, but that explicitly accepts data loss, so it's not really what I'm after.
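For reference, this is the last-resort reroute command that promotes a stale copy; it requires explicitly accepting data loss. The index name, shard number, and node name below are placeholders for our actual values:

# Last resort -- promotes the stale copy and accepts data loss:
POST /_cluster/reroute
{"commands": [{"allocate_stale_primary": {"index": "my-index", "shard": 0, "node": "node02.example.com", "accept_data_loss": true}}]}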

Restarting node01 resolved the problem. It looks like the allocator was stuck because of the lingering 'too many open files' store exception.
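In case someone else lands here: before (or after) restarting, it may be worth confirming that the raised file descriptor limit actually applied to the Elasticsearch process. A sketch of the checks we would use (standard cat and node-stats endpoints):

# Check current vs. maximum file descriptors per node:
GET /_cat/nodes?v&h=name,file_desc.current,file_desc.max,file_desc.percent
GET /_nodes/stats/process?filter_path=nodes.*.name,nodes.*.process.max_file_descriptors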
