I've been researching this but was hoping to understand more of the underlying reasons. The official documentation recommends increasing the open file descriptor limit to 32k or 64k. In our cluster we currently have 12,800 shards on 5 data nodes. This leads to approximately 40k file descriptors on each node. The data nodes are CentOS with 64GB of memory.
Are there reasons we don't want to have more than 64k file descriptors open on a node? Is the concern about memory needed for each descriptor or are there other resource issues?
Is the 32k or 64k limit still applicable for nodes with higher amounts of memory?
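For reference, the per-process descriptor limits can be inspected (and the soft limit raised, up to the hard limit) at runtime with Python's standard `resource` module. This is just a minimal sketch of how the limits behave; the 65,536 target below is only an illustration of the 64k guidance, not a recommendation:

```python
import resource

# Query the soft and hard limits on open file descriptors
# (roughly equivalent to `ulimit -n` / `ulimit -Hn`).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft limit: {soft}, hard limit: {hard}")

# An unprivileged process may raise its soft limit up to the hard
# limit; raising the hard limit itself requires root.
target = 65536  # illustrative value from the 64k guidance
if soft < target <= hard:
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
```

The limit itself is cheap bookkeeping; the real cost comes from the kernel structures behind each *open* descriptor, which is why the question about per-descriptor memory is the right one to ask.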
As for the number of shards, why is that an issue?
Right now our query rate is normally around 1 per second, with latency usually around 20 ms. Our indexing averages between 2,000 and 3,000 docs per second, with latency around 0.3 ms. Will the number of shards per node become an issue once query volume grows?
Currently, we have 1,043 indexes with 5 shards per index (a primary set plus 2 replicas). We added another replica some time ago, which is why the math doesn't quite add up to 12,800 shards. There's about 7 TB of data.
Basically, you have over 2,500 shards per node. Each shard is a Lucene instance and requires a certain amount of resources to manage. That's a lot of your heap taken up just managing shards.
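Plugging in the figures stated above gives a quick back-of-the-envelope view (a sketch using only the numbers from this thread; the per-shard descriptor count is a rough average, since segment and translog files vary per shard):

```python
# Figures from the thread
total_shards = 12_800
data_nodes = 5
fds_per_node = 40_000  # approximate open descriptors per node

shards_per_node = total_shards / data_nodes
print(f"shards per node: {shards_per_node:.0f}")      # 2560

# Rough average number of descriptors each shard holds open
# (segment files, translog, etc.)
fds_per_shard = fds_per_node / shards_per_node
print(f"approx. fds per shard: {fds_per_shard:.1f}")  # 15.6
```

At roughly 2,560 Lucene instances per node, the fixed heap overhead per shard adds up well before file descriptors become the binding constraint.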