Index discussion - to split or not to split

Hello,
General question here, no problem to solve.
Thanks in advance for your insights!

Everywhere in the forums I see people indexing documents into date-based indexes.
For example:
index => "blabla-%{+YYYY.MM}"

Is there anything wrong with creating just one index for my needs and counting on it to grow and handle the data over time? Yes, the host will get busier after a while.
Is there any reason to split indexes in addition to the built-in Elastic "scale out mechanisms"?

I can add nodes and shards and resources. Why split my index? Why not?

You will eventually need to drop old data. It is recommended to delete an index, rather than trying to delete a set of documents within an index. https://www.elastic.co/guide/en/elasticsearch/guide/2.x/retiring-data.html
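To make the retention point concrete, here is a minimal sketch of how you might pick out which monthly indexes have aged out. It assumes index names follow the "blabla-YYYY.MM" pattern from the question; the function name `expired_monthly_indices` and its parameters are made up for illustration and it does not talk to Elasticsearch at all.

```python
from datetime import date


def expired_monthly_indices(index_names, prefix, today, keep_months):
    """Return index names whose YYYY.MM suffix falls outside the retention window.

    Hypothetical helper: it only inspects names, it does not delete anything.
    """
    # Oldest month still retained, expressed as a running month count.
    cutoff = today.year * 12 + (today.month - 1) - (keep_months - 1)
    expired = []
    for name in index_names:
        if not name.startswith(prefix):
            continue  # not one of our time-based indexes
        suffix = name[len(prefix):]
        try:
            year_text, month_text = suffix.split(".")
            year, month = int(year_text), int(month_text)
        except ValueError:
            continue  # name does not match prefix + YYYY.MM
        if year * 12 + (month - 1) < cutoff:
            expired.append(name)
    return sorted(expired)


names = ["blabla-2017.01", "blabla-2017.02", "blabla-2017.03", "other-2016.01"]
old = expired_monthly_indices(names, "blabla-", date(2017, 3, 15), keep_months=2)
```

Each name returned this way can be dropped with a single delete-index request, which is the cheap operation the linked guide recommends over deleting individual documents.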

There are other reasons, like changing settings (you can't change the number of primary shards after the index is created) or changing mappings. Think of the index as a bucket of data, and of a shard as a unit of work (helping distribute the data among multiple nodes). If you have too many small shards, you are splitting the unit of work unnecessarily. If the shards are very large, search performance can suffer.

Additionally, splitting data by date is not necessarily the most efficient bucketing technique. You might want to check out the rollover API: https://www.elastic.co/blog/managing-time-based-indices-efficiently
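As a rough illustration of the idea behind rollover: instead of switching indexes on a fixed calendar boundary, a new index is started once the current one gets too old or too big. The sketch below is a simplified model of that decision only, not the Elasticsearch implementation; `should_roll_over` and its parameter names are hypothetical.

```python
def should_roll_over(age_days, doc_count, max_age_days=None, max_docs=None):
    """Simplified model: roll over as soon as any configured condition is met."""
    checks = []
    if max_age_days is not None:
        checks.append(age_days >= max_age_days)
    if max_docs is not None:
        checks.append(doc_count >= max_docs)
    # With no conditions configured, nothing triggers a rollover.
    return any(checks)


# A quiet index rolls over on age; a busy one rolls over early on document count.
quiet = should_roll_over(age_days=7, doc_count=1_000, max_age_days=7, max_docs=1_000_000)
busy = should_roll_over(age_days=1, doc_count=2_000_000, max_age_days=7, max_docs=1_000_000)
young = should_roll_over(age_days=1, doc_count=1_000, max_age_days=7, max_docs=1_000_000)
```

The practical effect is that indexes (and therefore shards) end up closer to a uniform size, regardless of how uneven the ingest rate is from one month to the next.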
