Does every index have its own shard?

Hi,

(Sorry if this is a double post, but I could not find an answer.) Does every index have its own shards, or can indices share shards? We know that when we create an index, it has 5 shards by default. So when we create an index, does Elasticsearch create 5 new shards, or does it just assign 5 existing shards to the new index?

Each index has its own shards. No shards are ever shared. If you are getting 5 shards by default, I suspect you may be running a very old version that is EOL, so I would recommend upgrading.

Thanks for the reply, Christian. I read about the 5-shard default here.

That is from an old book that covers version 2 of Elasticsearch. Some information may still apply, but it is generally badly out of date. The default has since been changed to 1 primary shard per index (as of version 7.0).

Wow, thanks for the enlightenment. I could not find version 8.6 in the Elasticsearch Guide search bar. I have another question; it is off topic but important. We have 10000 customers and we want to index their documents. We plan to create an index per customer, and we want to scale Elasticsearch if a customer uses it heavily. How can we scale our product per customer?

I have an idea like this: we use a unique alias per customer and point each alias at that customer's index. If an index grows too large, we create a new index under the same alias and make the new index the alias's write index.
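A minimal sketch of the alias idea described above, assuming the request body is sent to the `_aliases` endpoint with whatever HTTP client is in use. The alias and index names are hypothetical and only illustrate a possible naming scheme.

```python
import json


def move_write_index_actions(alias, old_index, new_index):
    """Build a body for POST /_aliases that switches the alias's write index.

    Both actions go in one request so the switch is applied together.
    """
    return {
        "actions": [
            # The old index stays searchable through the alias but no longer takes writes.
            {"add": {"index": old_index, "alias": alias, "is_write_index": False}},
            # Documents indexed through the alias now land in the new index.
            {"add": {"index": new_index, "alias": alias, "is_write_index": True}},
        ]
    }


# Hypothetical names, for illustration only.
body = move_write_index_actions(
    "customer-42", "customer-42-000001", "customer-42-000002"
)
print(json.dumps(body, indent=2))
```

Searches against the alias would still cover both indices; only writes are redirected to the new one.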

Christian, I have one more question if you have time. Could you share a link about shards (about the "no shards are ever shared" information)? Thank you for your time.

An index has one or more shards. That has been the case since the very first versions of Elasticsearch, so in that respect the Definitive Guide is still correct. You can also find a description in the current docs.

This way of handling multi-tenancy in Elasticsearch tends to scale badly, as indices and shards are not free and come with some overhead. This also applies to aliases.

If the customers all share the same schema, I would recommend grouping customers into shared indices rather than having a dedicated index per customer.
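One way the grouping could look, as a rough sketch only: each customer is mapped to one of a fixed number of shared indices, and every search is wrapped in a filter on a customer field. The `customer_id` field, the `customers-group-` prefix, and the group count of 20 are assumptions made for this example, not something prescribed in this thread.

```python
import hashlib

NUM_GROUP_INDICES = 20  # assumption: arbitrary value for illustration


def group_index_for(customer_id, num_groups=NUM_GROUP_INDICES):
    """Map a customer to one of a fixed set of shared indices.

    A stable hash is used because Python's built-in hash() of a string
    changes between processes.
    """
    digest = hashlib.sha256(customer_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % num_groups
    return f"customers-group-{bucket:03d}"


def customer_search_body(customer_id, user_query):
    """Wrap a query so it only matches documents of one customer."""
    return {
        "query": {
            "bool": {
                "must": [user_query],
                # Assumes every document carries a keyword field "customer_id".
                "filter": [{"term": {"customer_id": customer_id}}],
            }
        }
    }


print(group_index_for("acme"))
print(customer_search_body("acme", {"match": {"title": "invoice"}}))
```

A fixed group count keeps the mapping simple, but changing it later would move customers between indices, so an explicit customer-to-index lookup table may be the safer choice in practice.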

Customers share the same schema, but for data privacy and for easy deletion of data (if a customer cancels the membership), an index per customer seems like a good fit. If a customer cancels the membership, we delete the index. I am not sure which approach is more sensible (an index per customer or grouping).

Having a large number of small indices and shards in Elasticsearch can be problematic and cause issues. There is a limit of 1000 shards per node in place to protect from this, although you can override it. The latest version of Elasticsearch handles large numbers of shards better than before, but I would probably still not recommend going with an index per customer unless the number of customers is quite low. If you have 10000 customers and want resiliency, you need at least one replica shard configured per index, which, with one primary shard each, will result in 20000 shards.
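The arithmetic above can be sketched as follows, assuming one primary shard and one replica per index and the default limit of 1000 shards per node. The node count is only the minimum needed to stay under that limit, not a sizing recommendation.

```python
import math


def total_shards(num_indices, primaries_per_index=1, replicas=1):
    """Each primary shard has `replicas` copies, and all of them count as shards."""
    return num_indices * primaries_per_index * (1 + replicas)


def min_data_nodes(shards, max_shards_per_node=1000):
    """Smallest number of data nodes that keeps the cluster under the shard limit."""
    return math.ceil(shards / max_shards_per_node)


shards = total_shards(10_000)
print(shards)                  # 20000
print(min_data_nodes(shards))  # 20
```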

If you are never going beyond 10000 customers you MAY make this work, but it will not scale well at all if you expect or hope the number of customers grows in the future.

Actually, 10000 is probably the upper bound (we do not have that many customers right now), but we want our product to handle every scenario. Some customers may have very large amounts of data, though; in that case, would it not be hard to delete the data of a customer who cancels the membership?
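For reference, removing one customer's documents from a shared index would typically go through the delete-by-query API rather than an index deletion. The sketch below only builds the request; the `customer_id` field and the index name are assumptions for this example.

```python
def delete_customer_request(index, customer_id):
    """Return (method, path, body) for removing one customer's documents
    from a shared index via the delete-by-query API."""
    path = f"/{index}/_delete_by_query"
    # Assumes every document carries a keyword field "customer_id".
    body = {"query": {"term": {"customer_id": customer_id}}}
    return "POST", path, body


# Hypothetical index and customer names.
method, path, body = delete_customer_request("customers-group-007", "acme")
print(method, path, body)
```

Unlike dropping a whole index, this has to find and delete each matching document, so it is likely to take noticeably longer for a customer with a lot of data.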

If we group our customers into indices, will we add nodes to the cluster when our data grows large? (I assume we should decide the number of shards on the basis that a shard should hold between 10 and 50 GB.)
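Under the 10-50 GB per shard assumption stated above, a rough estimate of the primary shard count could look like this. The 30 GB target is simply a midpoint picked for the example, and the data sizes are made up.

```python
import math


def primary_shards_for(total_gb, target_shard_gb=30):
    """Estimate how many primary shards keep each shard near the target size."""
    if target_shard_gb <= 0:
        raise ValueError("target_shard_gb must be positive")
    # Always at least one shard, even for a very small index.
    return max(1, math.ceil(total_gb / target_shard_gb))


print(primary_shards_for(600))  # 20
print(primary_shards_for(5))    # 1
```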

Sorry if the questions are newbie ones; I have been researching Elasticsearch for just two weeks.