I'm working on an Elasticsearch implementation project at a company where 80% of the operations are reads and 20% are writes.
The environment is configured with 5 nodes, all of which act as master/data/coordinating nodes. In this environment, what is the best practice for mapping indices?
The cluster currently holds 130GB, and around 2.5GB will be added each month. Would monthly indices be the best option, or can I opt for just one index? Is it ideal to keep 5 shards so they are distributed across all cluster nodes?
Each node currently has 550GB of disk, with 8GB reserved for heap.
A better approach would be to do capacity planning, using a load-testing tool such as JMeter. Start with an index that has a single shard, then index data into it while querying it in parallel. Once response times degrade past your acceptable threshold, note the size of the index.
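Here is a minimal sketch of that benchmark loop using the Python `elasticsearch` client. The cluster URL, index name, document shape, and batch size are all illustrative assumptions; in practice JMeter would drive the indexing and query load at scale, and this only shows where to read the numbers from.

```python
# Sketch: single-shard test index, load documents, sample query latency,
# and read the on-disk primary store size from the stats API.
# Assumes a local cluster and the elasticsearch-py 8.x client.
import time
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# One primary shard, no replicas, so the measured size reflects a single shard.
es.indices.create(
    index="capacity-test",
    settings={"number_of_shards": 1, "number_of_replicas": 0},
)

# Load representative documents (real test data should mirror your
# production mappings and typical document sizes).
for i in range(100_000):
    es.index(index="capacity-test", document={"msg": f"sample event {i}"})

# Sample query latency; JMeter would run these concurrently with indexing.
start = time.monotonic()
es.search(index="capacity-test", query={"match": {"msg": "event"}})
latency_ms = (time.monotonic() - start) * 1000

# When latency degrades past your threshold, note the primary store size.
stats = es.indices.stats(index="capacity-test")
size_bytes = stats["indices"]["capacity-test"]["primaries"]["store"]["size_in_bytes"]
print(f"latency={latency_ms:.1f}ms  size={size_bytes / 2**30:.2f}GiB")
```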
Now divide (total expected data size) / (index size you noted). Round up to the nearest integer so that no shard exceeds the size you measured; this gives a good approximation of the number of shards required.
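As a worked example with the numbers from the question (the 25GB single-shard ceiling is a hypothetical benchmark result, and the 24-month planning window is an assumption):

```python
# Hypothetical sizing: 130GB current data and 2.5GB/month growth come
# from the question; the 25GB per-shard ceiling is an assumed result
# of the single-shard benchmark above.
import math

current_gb = 130
monthly_growth_gb = 2.5
planning_horizon_months = 24      # assumed planning window
shard_ceiling_gb = 25             # size at which the test shard degraded

total_gb = current_gb + monthly_growth_gb * planning_horizon_months  # 190GB
primary_shards = math.ceil(total_gb / shard_ceiling_gb)              # ceil(7.6) = 8
print(primary_shards)
```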
NOTE: The above only gives the number of primary shards required. Other factors, such as shards per node, still need to be decided afterwards.