How to manage rolling indexes with non-static data

jthoni · February 10, 2017, 7:47pm

We have data that we ingest 24x7. The data is all timestamped (by CreatedDate), but we have it all going into a single index. Our shards get too big (90 GB), so we reindex (which we occasionally do anyway due to changes in our product) into a new index with a higher shard count. I would like to leverage rolling time-based indexes. This works fine if you are dealing with static log files, but not so much with our data. The CreatedDate specifies when the source doc we ingested was created, but documents can continue to be updated indefinitely. Our pipeline to Elasticsearch does not know whether a given document is new or an upsert on an existing one.

My current plan is to have an index per month. We can have logic for both indexing and querying that will determine the correct index to target (or indexes if we are working with a date range). If we don't know the CreatedDate when querying, we will target an alias that will hit all indexes within our retention policy.
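To make the routing logic concrete, here is a minimal sketch of how a CreatedDate could be mapped to a monthly index name, and a date range to a list of indexes. The `docs-` prefix, the `docs-all` alias name, and the function names are all hypothetical, chosen only for illustration; nothing here calls Elasticsearch itself.

```python
from datetime import date
from typing import List, Optional

INDEX_PREFIX = "docs-"   # hypothetical naming scheme, e.g. docs-2017.02
ALL_ALIAS = "docs-all"   # hypothetical alias covering every index in retention


def index_for(created: date) -> str:
    """Return the monthly index name for a document's CreatedDate."""
    return f"{INDEX_PREFIX}{created.year:04d}.{created.month:02d}"


def indexes_for_range(start: date, end: date) -> List[str]:
    """Return every monthly index that overlaps the range [start, end]."""
    if start > end:
        raise ValueError("start must not be after end")
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{INDEX_PREFIX}{year:04d}.{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names


def query_target(created: Optional[date] = None) -> str:
    """Target one index when CreatedDate is known, else fall back to the alias."""
    return index_for(created) if created is not None else ALL_ALIAS
```

Because CreatedDate never changes for a given document, the same document always maps to the same index, so an upsert lands where the original was written.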

Does this sound like a good plan of attack?

Also, our current index has 15 shards. If I were to go to rolling indexes, I plan to reduce that to 2 shards for each index. Is there any guidance on where the sweet spot is between having many shards so things could be parallelized vs fewer shards so that all the data you are querying is located in close proximity? I know... it depends on your data and your hardware.

Thanks,
~john

warkolm · February 10, 2017, 10:33pm

Sounds sane.
I'd just index into whatever date index matches, then use querying to make sure you get the latest event. That should be simpler.
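One way to read "get the latest event" is to reduce the hits on the client side to the newest version of each document. The sketch below assumes each hit is a plain dict carrying a document identifier and an update timestamp; the field names `doc_id` and `UpdatedDate` are hypothetical, and the timestamps are assumed to be ISO 8601 strings in one consistent format and time zone, so that string comparison matches chronological order.

```python
from typing import Any, Dict, Iterable, List


def latest_per_document(
    hits: Iterable[Dict[str, Any]],
    id_field: str = "doc_id",            # hypothetical field name
    version_field: str = "UpdatedDate",  # hypothetical field name
) -> List[Dict[str, Any]]:
    """Keep only the most recently updated hit for each document id."""
    latest: Dict[Any, Dict[str, Any]] = {}
    for hit in hits:
        key = hit[id_field]
        current = latest.get(key)
        # Replace the stored hit only if this one is strictly newer.
        if current is None or hit[version_field] > current[version_field]:
            latest[key] = hit
    return list(latest.values())
```

If the same document id is always written to the same index, an overwrite already leaves a single copy and this step is unnecessary; it only matters when several versions of a document can coexist.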

Regarding shards, you would need to test. Because you are using time-based indexes, you can easily increase or decrease the shard count each day until you find the sweet spot.
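As a starting point for that testing, a first guess at the shard count can be derived from the expected size of one index and a target size per shard. This is only back-of-the-envelope arithmetic, not an Elasticsearch recommendation; the function name and any sizes passed to it are hypothetical and should be replaced with measured values.

```python
import math


def initial_shard_guess(expected_index_gb: float, target_shard_gb: float) -> int:
    """Estimate a primary shard count so each shard stays near the target size."""
    if expected_index_gb < 0 or target_shard_gb <= 0:
        raise ValueError("sizes must be positive")
    # Always keep at least one shard, even for very small indexes.
    return max(1, math.ceil(expected_index_gb / target_shard_gb))
```

The result is where testing begins, not where it ends: the count can be adjusted on the next index that gets created and query performance compared between the two.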

system · March 10, 2017, 10:33pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
