Sometimes, shards will not start up, complaining about having too many documents (more than 2 billion documents per shard):
[WARN ][cluster.action.shard ] [node] [index][shard] received shard failed for [index][shard], node[node_id], [R], s[STARTED], indexUUID [index_id], reason [engine failure, message [refresh failed][IllegalArgumentException[Too many documents, composite IndexReaders cannot exceed 2147483519]]]
This happens when the shard exceeds Lucene's limit on the maximum number of documents per Lucene index. As of LUCENE-5843, that limit is 2,147,483,519 (= Integer.MAX_VALUE - 128). In earlier versions of Lucene, ingesting enough documents to exceed this limit could render the index unusable, leaving the Elasticsearch shard unable to start.
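For reference, the figure in the error message follows directly from that definition; a quick check in Python:

INTEGER_MAX_VALUE = 2**31 - 1              # Java's Integer.MAX_VALUE = 2,147,483,647
LUCENE_MAX_DOCS = INTEGER_MAX_VALUE - 128
print(LUCENE_MAX_DOCS)                     # 2147483519, matching the error above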
SOLUTION
Lucene has since added enforcement checks to prevent an index from being corrupted when it exceeds the maximum document limit:
https://issues.apache.org/jira/browse/LUCENE-5843 (available in ES 1.4 and above)
https://issues.apache.org/jira/browse/LUCENE-6299 (will be available in the next release after 1.4.4)
Keep in mind that even with the Lucene changes above, if you do hit the limit you will not be able to add more documents to the index, though you will still be able to search it.
Until you have both fixes above, it is helpful to monitor shard sizes using the _cat/shards API so that shards do not exceed the Lucene limit and become unusable.
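As an illustration, here is a small Python sketch of such a check; the host address and warning threshold are assumptions, and any HTTP client works the same way:

import requests

# Assumed values for illustration; adjust to your cluster.
ES_HOST = "http://localhost:9200"
LUCENE_MAX_DOCS = 2147483519
WARN_THRESHOLD = int(LUCENE_MAX_DOCS * 0.9)   # flag shards at 90% of the limit

# _cat/shards returns one line per shard; h= selects the columns we need.
# Plain-text output is parsed here because it works across ES versions.
resp = requests.get(ES_HOST + "/_cat/shards",
                    params={"h": "index,shard,prirep,docs"})
resp.raise_for_status()

for line in resp.text.splitlines():
    parts = line.split()
    if len(parts) < 4:        # unassigned shards may report no doc count
        continue
    index, shard, prirep, docs = parts
    if int(docs) >= WARN_THRESHOLD:
        print("WARNING: %s shard %s (%s) has %s docs, nearing the Lucene limit"
              % (index, shard, prirep, docs))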
If you see that shards are getting close to the limit, you can update the index settings to block all writes to the index (see the index.blocks.write setting), then create a new index and direct writes to it, or reindex into a new index with a larger number of shards.
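A minimal sketch of that sequence in Python (the host address, index names, and shard count are hypothetical placeholders):

import requests

# Hypothetical names and sizing for illustration only.
ES_HOST = "http://localhost:9200"
OLD_INDEX = "events-old"
NEW_INDEX = "events-new"

# 1. Block all writes to the full index; searches keep working.
r = requests.put(ES_HOST + "/" + OLD_INDEX + "/_settings",
                 json={"index.blocks.write": True})
r.raise_for_status()

# 2. Create a replacement index with more primary shards so that each
#    shard stays well under Lucene's ~2.14 billion document limit.
r = requests.put(ES_HOST + "/" + NEW_INDEX,
                 json={"settings": {"number_of_shards": 10}})
r.raise_for_status()

# From here, point new writes at NEW_INDEX; if the old data is needed in
# the new index, copy it over (e.g. with a scan/scroll + bulk reindex).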