Optimum index size for an Elasticsearch cluster

I'm currently testing storing all kinds of data on an Elasticsearch cluster. I started by sizing my cluster with a minimal, safe configuration: 3 nodes, and my indices with 3 shards and 2 replicas.

It's working fine, but I was wondering whether this is too much for the data I have right now. I have two kinds of data:

  • Twitter input (from Logstash) with a daily index
  • time series with a daily index

For Twitter (with my keywords), one daily index is around:

  • 13,000 documents
  • 26 MB of storage

For time series, one index is around:

  • 300 documents
  • 500 KB of storage

I've seen from different sources that one shard should not exceed 1M documents and 50 GB, but is there also a minimum size to aim for to keep query performance up? Should I keep 3 shards and 2 replicas per index even though I don't have a lot of data per day?

I would like to keep the "backup" that replicas provide, but also get the best search performance given my data size. I think over-dimensioning will slow my searches.
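For reference, this is roughly what the layout described above looks like at index creation time. A minimal sketch using Python's `requests` against the REST API; the cluster address, index name, and date suffix are placeholder assumptions, not values from this thread:

```python
import requests

ES = "http://localhost:9200"  # assumed local cluster address

# Create one daily Twitter index with the 3-shard / 2-replica layout
# described above (index name is a placeholder).
settings = {
    "settings": {
        "number_of_shards": 3,    # fixed at creation time
        "number_of_replicas": 2,  # dynamic, can be changed later
    }
}
resp = requests.put(f"{ES}/twitter-2023.01.01", json=settings)
resp.raise_for_status()
print(resp.json())  # e.g. {'acknowledged': True, ...}
```

Note that `number_of_shards` cannot be changed on an existing index, while `number_of_replicas` can.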

Having 2 replicas for this sort of data may not make a lot of sense unless it's really important.

There's a hard 2 billion document limit per shard due to Lucene. We recommend shards no bigger than 50 GB, but you can go bigger.
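As a concrete illustration of dropping to one replica: the replica count is a dynamic index setting, so it can be lowered on existing indices without reindexing. A minimal sketch with Python's `requests`; the cluster address and index pattern are assumptions:

```python
import requests

ES = "http://localhost:9200"  # assumed local cluster address

# Lower the replica count on all existing daily Twitter indices
# (wildcard pattern is a placeholder). Unlike number_of_shards,
# number_of_replicas can be changed at any time.
body = {"index": {"number_of_replicas": 1}}
resp = requests.put(f"{ES}/twitter-*/_settings", json=body)
resp.raise_for_status()
print(resp.json())  # e.g. {'acknowledged': True}
```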

Thanks for your reply.

So 1 replica should be enough.
But regarding the index pattern, what is the best approach given the amount of data I get per day?

In terms of query performance, is it better to have one index per day even if it is small, or one bigger index per month?
I will run a lot of queries by value, date range, and keywords.
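To make the query types above concrete, a search combining a keyword match with a date-range filter could look like the sketch below. It runs against an index pattern, so it behaves the same whether the underlying indices are daily or monthly. The cluster address, index pattern, and field names (`message`, `@timestamp`) are illustrative assumptions:

```python
import requests

ES = "http://localhost:9200"  # assumed local cluster address

# Keyword + date-range search across all Twitter indices, whether
# they are split per day or per month (pattern and fields are placeholders).
query = {
    "query": {
        "bool": {
            "must": [
                {"match": {"message": "elasticsearch"}}  # keyword / value match
            ],
            "filter": [
                {"range": {"@timestamp": {"gte": "now-30d/d", "lte": "now"}}}
            ],
        }
    },
    "size": 20,
}
resp = requests.post(f"{ES}/twitter-*/_search", json=query)
resp.raise_for_status()
hits = resp.json()["hits"]["hits"]
```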

Given the data volumes you have mentioned, I would recommend using monthly indices. For efficiency, you should ideally look to have an average shard size measured in gigabytes.
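One possible way to switch to monthly indices, as a sketch only: derive the index name from the event's month when writing. In Logstash this is normally done with a date pattern in the elasticsearch output's `index` option (something like `twitter-%{+YYYY.MM}`); the Python version below and its index and field names are illustrative assumptions:

```python
from datetime import datetime, timezone
import requests

ES = "http://localhost:9200"  # assumed local cluster address

def monthly_index(prefix: str, ts: datetime) -> str:
    """Build a monthly index name such as 'twitter-2023.01' (naming is illustrative)."""
    return f"{prefix}-{ts:%Y.%m}"

doc = {
    "@timestamp": datetime.now(timezone.utc).isoformat(),
    "message": "example tweet text",  # placeholder document
}
index = monthly_index("twitter", datetime.now(timezone.utc))

# Elasticsearch creates the monthly index on first write if it does
# not exist yet (an index template can preset shards and replicas).
resp = requests.post(f"{ES}/{index}/_doc", json=doc)
resp.raise_for_status()
```

At roughly 26 MB per day for the Twitter data quoted above, a monthly index lands somewhere near 800 MB, which is much closer to the gigabyte-range shard size suggested here than thirty tiny daily indices would be.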

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.