I've been reading a lot about how many shards / replicas I need, and I'm seeing multiple opinions.
We have a 3 node cluster that mainly receives logs through Logstash from multiple sources. We have around 30 different types going into 15 or so different indices. Logstash breaks these into daily indices, which results in around 320 total indices. Almost all of them were set up from the start with 5 shards and 1 replica. This results in 2,800 shards. We have around 270 million documents, which comes to 590 GB of data across the 3 nodes. Our cluster is constantly receiving data from log sources but doesn't receive many queries: around 12 users look at the data sparingly throughout the day.
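(If it helps to reproduce these numbers on your own cluster, the shard, document, and size figures above are the kind of thing the _cat APIs report; something like the following gives the same view:)

    GET _cat/indices?v&h=index,pri,rep,docs.count,store.size
    GET _cat/allocation?v
    GET _cat/count?v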
What would be the best shard / replica setup for this type of cluster? Can I reduce the shard count per index to 1 per node with 1 replica (3 shards and 1 replica)? Is there a way to keep queries fast but reduce the overall data stored on disk?
Also, what is the best way to test performance if I do change shards / replicas? How do I tell if queries are taking longer or performance is suffering?
Assuming 270 million documents is what you need to index per day, on average, how many documents do you have per index? 270 million / 15 -> roughly 18 million documents per index?
By default, ES creates a new index with 5 shards and 1 replica, and it seems that's what you have. And you said Logstash creates daily indices where each index has 5 shards and 1 replica. This is far more than you need for the amount of data you are indexing per day, since a single shard can hold up to 2 billion documents.
I suggest you do the math, adjust Logstash to create monthly indices instead, and adjust the number of shards and replicas per index before doing a performance test.
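As a rough sketch (the index name pattern, template name, and shard counts below are only placeholders to adjust for your own data and ES version), switching Logstash to monthly indices is just a change to the date pattern in the elasticsearch output:

    output {
      elasticsearch {
        hosts => ["localhost:9200"]
        index => "logstash-%{+YYYY.MM}"
      }
    }

and the default 5 shards / 1 replica can be overridden with an index template before the next index is created:

    PUT _template/logstash
    {
      "template": "logstash-*",
      "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 1
      }
    }

(On newer ES versions the template body uses "index_patterns" instead of "template".)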
Sorry for the confusion: we use Curator to delete the logs after 30 days, and the total doc count for the cluster is 270 million. It looks like we are ingesting around 6 - 9 million events per day.
So would you suggest having 3 shards (one per node)? Or 1 shard and 1 replica? I'm looking to reduce the data but still have redundancy across the 3 nodes.
Note: those 6 - 9 million events are spread across the 15 indices, so each index is relatively small. Some are around 1 - 3 GB, but those are the largest daily indices.
If you have to keep 15 indices, then try each index with 1 shard and 2 replicas. If two servers go down, your index is still intact as long as the 2 replicas are not on the same server and not on the server holding the primary.
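Note that Elasticsearch will not place two copies of the same shard on the same node, so with 3 nodes and 1 primary + 2 replicas each copy ends up on a different node automatically. You can verify where everything landed with something like the following (the index pattern is just an example):

    GET _cat/shards/logstash-*?v&h=index,shard,prirep,state,node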
Could you combine more types into a single index? Since you said each index is relatively small, it may help to have fewer indices per day, for example decreasing from 15 to 5 indices per day.
About shards and replicas, I would first go with 3 shards (because you have 3 ES nodes) and 1 replica, so each node will hold 1 primary shard and 1 replica of each index. IMO, you should be fine as long as each shard stays under 10 to 15 GB. If shard size goes over that threshold, consider increasing to 6 shards (again, with 3 nodes you may want to use 3, 6, or 9 shards so that each node holds the same number of shards). Your nodes' hardware also affects how you allocate shards.
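To keep an eye on that threshold, a quick way to list shard sizes (largest first, assuming your ES version supports sorting _cat output) is something like:

    GET _cat/shards?v&h=index,shard,prirep,store&s=store:desc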
Having an additional replica is more expensive in terms of indexing and storage, so make sure you have a good reason before adding one.
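One thing in your favor: unlike the number of shards, the number of replicas can be changed on a live index at any time, so you can start with 1 replica and only add a second one later if you find you need it. A sketch (the index name is just an example):

    PUT logstash-2017.01.15/_settings
    {
      "index": { "number_of_replicas": 2 }
    }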
If you are using Kibana visualizations, each visualization has a Statistics section that shows query and request duration. That gives you a rough idea of how fast your queries are.
What I do is create a dashboard full of the visualizations I need for normal usage and see how long it takes Kibana to render the dashboard. For example, a request duration under 5000 ms for the last 24 hours of data is acceptable for me.
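Outside of Kibana, you can also time a representative query directly against the cluster before and after changing shards / replicas; the "took" value in the search response is the server-side time in milliseconds. A sketch (the index pattern and query are just examples):

    GET logstash-*/_search
    {
      "size": 0,
      "query": { "match": { "message": "error" } }
    }

Run the same query against the old and new layout and compare the "took" numbers to see whether the change hurt query latency.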