As explained in the StackOverflow post quoted below, Elasticsearch has a hard limit of roughly two billion documents per shard.
Yes, there is a limit of 2 billion documents per shard, which is a hard Lucene limit.
There is a maximum number of documents you can have in a single Lucene index. As of LUCENE-5843, the limit is
2,147,483,519 (= Integer.MAX_VALUE - 128) documents.
You should consider scaling horizontally.
However, the official Elastic website features a success story about Rabobank, Enhancing the Online Banking Experience with Elasticsearch, which mentions a dataset of over 23 billion transactions.
Not only is Rabobank searching faster than ever, they’re searching through more data than ever. With over 23 billion transactions spanning 80TB of data, Rabobank sees upwards of 200 events per second — over 10 million per day. And each query can span thousands of accounts, with corporate customers having over 5,000 accounts that they can now query at once. And being able to do all of this without adding any extra operations to their costly mainframes has helped save them millions of euros per year. Today, all front-end applications use Elasticsearch for search or for aggregating payment and saving transactions.
We are investigating the feasibility of using Elasticsearch as a secondary data store, kept in sync with the main SQL database. The workload is OLTP (online transaction processing).
Given the success story above, how can we overcome the two-billion-document-per-shard limit in Elasticsearch?
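Our current understanding is that, because the limit applies per shard (each shard being one Lucene index), splitting an index across more primary shards should raise the total ceiling. A rough back-of-the-envelope sketch, assuming documents are distributed evenly across shards by the default routing:

```python
import math

# Hard per-shard ceiling: Integer.MAX_VALUE - 128 (see LUCENE-5843)
LUCENE_MAX_DOCS_PER_SHARD = 2_147_483_519

def min_primary_shards(total_docs: int) -> int:
    """Smallest number of primary shards for which an even spread of
    total_docs keeps every shard under the hard Lucene limit."""
    return math.ceil(total_docs / LUCENE_MAX_DOCS_PER_SHARD)

# For a Rabobank-scale dataset of 23 billion transactions:
print(min_primary_shards(23_000_000_000))
```

Is this reasoning correct, i.e. is configuring a sufficiently large `number_of_shards` at index-creation time (or splitting the data across multiple indices) the intended way to exceed two billion documents in total?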
Any hints and suggestions are highly appreciated.