Would there be an impact / difference between big and small indices?

Hello and good day!

Just like the title says, I humbly request you guys' expertise and a healthy, bountiful discussion on this...

My team has 2 indices called "web" and "socialmedia"; both get their data / documents from a MySQL database that has collected millions of records since 2015.

Now our IT head is planning to create micro indices that would chop these 2 BIG indices into smaller ones -- we'd be splitting each of them by month.

Here's the idea: for example, the "web" index holds a whole year of 2019 data / documents, and we're planning to chunk that data by month, from the "web" index into "webjan2019", "webfeb2019", "webmar2019", and so on (meaning all documents whose "time" falls in a specific month would belong to that month's micro index).
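The month-routing idea above can be sketched in Python. The hyphenated name `web-2019.01` used here is just the common Elasticsearch date-suffix convention; a `webjan2019`-style scheme would work the same way, and all names are illustrative:

```python
from datetime import datetime

def monthly_index(base: str, ts: datetime) -> str:
    """Pick the monthly index a document belongs to, based on its "time" field."""
    return f"{base}-{ts:%Y.%m}"

# A document with a January 2019 timestamp lands in "web-2019.01":
print(monthly_index("web", datetime(2019, 1, 15)))  # -> web-2019.01
```

The same function would run at ingest time (or in the MySQL-to-Elasticsearch sync job) to decide which index each row is written to.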

Now back to the main questions:

  1. Would there be a big impact or improvement if we implement the idea above?
  2. Would there be heavy drawbacks to doing so?

Thank you very much in advance, I greatly appreciate your invaluable expertise!

We aren't all guys :slight_smile:

Time-based indices 100% make sense for time-based data, rather than a single monolithic index. You should look at using ILM to manage them as well.
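For reference, here is a sketch of what an ILM policy for such indices could look like (Kibana Dev Tools syntax; the policy name and thresholds are placeholders, not recommendations for your cluster):

```
PUT _ilm/policy/monthly-web-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_size": "50gb", "max_age": "30d" }
        }
      },
      "delete": {
        "min_age": "365d",
        "actions": { "delete": {} }
      }
    }
  }
}
```

Note that rollover additionally needs an index template and a write alias pointing at the newest index; with ILM doing the rollover, you don't have to hand-create each monthly index yourself.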

There is a slight overhead with managing more shards, but that should be mitigated by more efficient searches.

The best advice is to keep shard size under 50GB. If that means weekly, monthly, or even yearly indices, that's fine.

+1 to all points Mark mentioned.

Aligning indices with your data expiration period can simplify cleanup: instead of deleting documents from an index, you can drop whole indices.
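That cleanup step can be sketched as follows: with per-month indices, retention becomes a matter of inspecting index names and deleting whole indices, instead of running an expensive delete-by-query. A minimal sketch, assuming a hypothetical `web-YYYY.MM` naming scheme (a date-suffix variant of the `webjan2019` idea above):

```python
from datetime import datetime

def expired_indices(names, now, keep_months=12):
    """Return monthly indices (suffix YYYY.MM) older than the retention window.
    Dropping these whole indices is far cheaper than deleting documents."""
    cutoff = now.year * 12 + now.month - keep_months
    expired = []
    for name in names:
        _, _, suffix = name.rpartition("-")
        try:
            year, month = suffix.split(".")
            if int(year) * 12 + int(month) <= cutoff:
                expired.append(name)
        except ValueError:
            continue  # skip names that don't follow the monthly convention
    return expired

# With 12-month retention as of March 2021, only the 2019 index has expired:
print(expired_indices(["web-2019.01", "web-2020.06", "web-2021.01"],
                      datetime(2021, 3, 1)))  # -> ['web-2019.01']
```

Each name this returns would then be deleted with a single `DELETE <index>` request (or, more simply, by the delete phase of an ILM policy).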

But overall, what is the point, i.e. is there a problem to be solved? Millions of docs is not very large and can easily be purged as needed. Elasticsearch also has no problem searching across all the docs from 2019, so unless you have performance or other issues, why break something that's working? Worst case, split into yearly indices if you really want to purge.

Hello sir, it's troublesome, especially if you need to reindex a lot of documents.

For example, if you want to change the mapping of a certain field, or add a new field with analyzers, you have to reindex roughly 300 million+ documents just to be able to use the index again.

Please note that the 2019 documents are only an example; we actually have around 300m+ docs accumulated since 2015.
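For readers following along: a mapping change on an existing field does indeed require creating a new index with the new mapping and copying everything across with the Reindex API, which is why index size matters so much here. A sketch in Kibana Dev Tools syntax (the index, field, and analyzer names are placeholders):

```
PUT web-v2
{
  "mappings": {
    "properties": {
      "title": { "type": "text", "analyzer": "english" }
    }
  }
}

POST _reindex
{
  "source": { "index": "web" },
  "dest":   { "index": "web-v2" }
}
```

With monthly indices, that `_reindex` only has to copy the months whose mapping actually changed, rather than every document since 2015.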

Sure, but wouldn't you do that for all docs, or only the latest ones? Reindexing is not very common for most people. Though if you do it for only some docs, and fairly often, then sure: split them up and put an alias over the whole set so you don't have to change your code.
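The alias suggestion can be sketched like this (Dev Tools syntax; the alias name is a placeholder, and note an alias cannot share a name with an existing index, so the old monolithic "web" index would need to be gone or renamed first):

```
POST _aliases
{
  "actions": [
    { "add": { "index": "webjan2019", "alias": "web-all" } },
    { "add": { "index": "webfeb2019", "alias": "web-all" } }
  ]
}
```

Searches against `web-all` then span every monthly index behind it, so application code keeps querying a single name no matter how the data is split or reindexed underneath.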

Our indices are always changing depending on what the client or the boss needs, so we're always reindexing whenever something changes... so yeah, it's hard to reindex one BIG index with data going back to 2015...

So with your great expertise, sir, would our concept be a good optimization or improvement of our infrastructure?

Well, I guess if you are always changing them, you'll need smaller indexes and can break them up on that basis. Though instead of reindexing everything, I wonder if you could just index into smaller temporary indexes with the extra fields or analyzers you need; it depends a lot on how you use the data. Constantly changing index mappings is not very common.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.