Getting the right shard and replica counts

Hi,

I'm building a new application using Elasticsearch. It will have to manage a relatively small number of documents (10,000 at the start), but many of them are quite large and can contain multiple big PDF files, which are indexed as full text using the ingest attachment plugin.

The Elasticsearch cluster will contain three nodes at start.

So my question is simply: given these elements, how can I work out the best shard and replica counts?

Best regards,

Thierry

Bonjour Thierry

It's really hard to know in advance.
In general, we aim to keep shards no larger than about 20 GB.
So you should index some of the documents and see what the resulting index size is.

For your use case, I'd dedicate some nodes as ingest nodes, because those nodes might need a lot of heap to run the attachment plugin on big files.
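
To make the ingest side concrete, here is a rough sketch of a pipeline body for the attachment processor, built as a plain Python dict. The field name `data` and the function name are assumptions for illustration; `indexed_chars` caps how many characters are extracted per document (100000 is the documented default, -1 removes the limit at the cost of more heap on big PDFs).

```python
import json


def attachment_pipeline(source_field="data", indexed_chars=100000):
    """Build an ingest pipeline body that extracts text from a base64 field.

    source_field is a hypothetical field name holding the base64-encoded file.
    """
    return {
        "description": "Extract full text from base64-encoded attachments",
        "processors": [
            # The attachment processor writes its output under "attachment" by default.
            {"attachment": {"field": source_field, "indexed_chars": indexed_chars}},
            # Drop the raw base64 payload so it is not stored in the index.
            {"remove": {"field": source_field}},
        ],
    }


# The body would be sent with PUT _ingest/pipeline/<pipeline-name>.
print(json.dumps(attachment_pipeline(), indent=2))
```

Keeping the limit finite is one way to bound the memory a single huge file can consume while it is being parsed.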

Hi David,
With my first 10,000 documents, the index size is about 100 GB, so I would need 5 to 6 shards (allowing for regular growth).
But what about the number of replicas? For better search times, should I just have one replica for each shard?
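
The arithmetic behind that estimate can be sketched as follows. The function name and the `growth_factor` parameter are assumptions for illustration; the 100 GB and 20 GB figures come from this thread, and the 1.2 headroom is only an example value.

```python
import math


def primary_shard_count(index_size_gb, target_shard_gb=20.0, growth_factor=1.0):
    """Smallest number of primaries that keeps each shard at or below the target size."""
    if index_size_gb <= 0 or target_shard_gb <= 0 or growth_factor < 1.0:
        raise ValueError("sizes must be positive and growth_factor >= 1.0")
    return max(1, math.ceil(index_size_gb * growth_factor / target_shard_gb))


print(primary_shard_count(100))                     # 5
print(primary_shard_count(100, growth_factor=1.2))  # 6
```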

Do you want a replica for backup if nothing else?

Increasing the number of replicas increases the search capacity of the cluster, because more shard copies are available to serve queries.
But if the response time is good enough with one replica, don't change it.
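
As a rough sketch of what changing replicas means in practice: `number_of_replicas` is a dynamic index setting, so it can be adjusted on a live index without reindexing. The helper names below are made up for illustration, and the node count assumes all three nodes hold data.

```python
def replica_settings(number_of_replicas):
    """Body for the update index settings API (PUT <index>/_settings)."""
    return {"index": {"number_of_replicas": number_of_replicas}}


def total_shard_copies(primaries, replicas):
    """Each primary plus its replicas counts as one shard copy on the cluster."""
    return primaries * (1 + replicas)


def max_assignable_replicas(data_nodes):
    """Copies of the same shard never share a node, so this is the upper bound."""
    return max(0, data_nodes - 1)


print(replica_settings(1))            # {'index': {'number_of_replicas': 1}}
print(total_shard_copies(5, 1))       # 10
print(max_assignable_replicas(3))     # 2
```

With 5 primaries and one replica, the three nodes would share 10 shard copies; a third replica on a three-node cluster would simply stay unassigned.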

I don't really need a backup; I can reindex all the documents if required (even if it takes quite some time).
What I really need is:

  • first and foremost, a good query response time on mixed term + full-text searches
  • a "not too bad" indexing time (for which dedicated ingest nodes with more RAM, as suggested by David, are probably a good option)

You have to reindex to change the number of shards for an index. I wonder how ILM would handle your growth. I'd be tempted to try ILM with 3 shards and a rollover at a max_primary_shard_size of 20 GB. The 3 shards should be allocated one per data node until they (well, the first one anyway) reach 20 GB; then ILM rolls over to a new index with another 3 shards, and so on until your initial 100 GB is covered. As you grow, more shards would be created.
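
Something along these lines might be a starting point, written as plain Python dicts for the request bodies. The policy name, alias, and index pattern are made up for illustration, and this is untested against a real cluster.

```python
import json


def rollover_policy(max_primary_shard_size="20gb"):
    """ILM policy body: roll over once the largest primary shard reaches the limit."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        "rollover": {"max_primary_shard_size": max_primary_shard_size}
                    }
                }
            }
        }
    }


def index_template(policy_name, rollover_alias, pattern, shards=3, replicas=1):
    """Index template body that attaches the policy to every new backing index."""
    return {
        "index_patterns": [pattern],
        "template": {
            "settings": {
                "index.number_of_shards": shards,
                "index.number_of_replicas": replicas,
                "index.lifecycle.name": policy_name,
                "index.lifecycle.rollover_alias": rollover_alias,
            }
        },
    }


# Hypothetical names; bodies go to PUT _ilm/policy/<name> and PUT _index_template/<name>.
print(json.dumps(rollover_policy(), indent=2))
print(json.dumps(index_template("documents-policy", "documents", "documents-*"), indent=2))
```

With 3 primaries and a 20 GB limit per primary, each index would roll over at roughly 60 GB, assuming documents spread evenly across the shards.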

Elastic seems to recommend larger shards, up to 50 GB, in recent docs. I roll a lot of indices over by size (50 GB) or by age (typically 7-14 days); about half roll by age, so I do have a lot of smaller shards. Of course my index/search pattern is completely different; everyone's is :)
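
To illustrate why age-based rollover leaves smaller shards behind: when a policy sets both a size and an age limit, the rollover fires as soon as either one is met. The function below is only a local model of that rule, not an Elasticsearch API, and the default thresholds are the 50 GB and 7 days mentioned above.

```python
def should_roll_over(primary_shard_gb, index_age_days,
                     max_primary_shard_gb=50.0, max_age_days=7.0):
    """Return True when either the size limit or the age limit has been reached."""
    return primary_shard_gb >= max_primary_shard_gb or index_age_days >= max_age_days


print(should_roll_over(50.0, 1))  # True: size limit reached
print(should_roll_over(3.0, 7))   # True: age limit reached, shard stays small
print(should_roll_over(3.0, 1))   # False: neither limit reached
```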
