Assuming I have a 30-shard index with over 200 million documents in it and I want to split it out into time-based indices, how would I do this without affecting response times? The other issue is storage space, but I could easily scale up the instances before reindexing.
200 million is usually fine. Splitting it into smaller indexes will help
if you can write your queries so they only target the indexes that contain
the docs. In 5.0 we rewrite the query on each target shard, so if an
index doesn't have any docs in the time range the query becomes a match_none,
which is cheap.
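To make that concrete, here's a rough sketch with the Python client. The index names, the @timestamp field, and the date range are all invented for illustration, not something from this thread:

```python
# Rough sketch (untested): index names, the "@timestamp" field, and the
# monthly naming pattern are assumptions for illustration only.
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

time_range = {
    "range": {"@timestamp": {"gte": "2016-10-15", "lt": "2016-11-15"}}
}

# Cheapest: only hit the monthly indexes that can contain matching docs.
narrow = es.search(
    index="events-2016.10,events-2016.11",
    body={"query": time_range},
)

# In 5.0 you can be lazier and hit them all: on shards of an index whose
# docs can't fall in the range, the query is rewritten to match_none.
broad = es.search(index="events-*", body={"query": time_range})
```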
Anyway, yeah, your best bet is to reindex using time ranges in the
filter. I'd add more space to the cluster rather than try to juggle things:
delete-by-query isn't a good way to free space, so you can't easily reclaim
room as you go.
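For the reindex itself, something along these lines with the _reindex API is what I have in mind. Again the index and field names are made up, and the throttle value is just a placeholder you'd tune for your cluster:

```python
# Rough sketch (untested): carve one month out of the big index into a
# monthly index with _reindex. "big-index", "events-2016.11" and
# "@timestamp" are assumptions.
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

es.reindex(
    body={
        "source": {
            "index": "big-index",
            "query": {
                "range": {
                    "@timestamp": {"gte": "2016-11-01", "lt": "2016-12-01"}
                }
            },
        },
        "dest": {"index": "events-2016.11"},
    },
    wait_for_completion=False,   # run as a background task
    requests_per_second=500,     # throttle to limit impact on live traffic
)
```

Repeat per time range until the old index is fully copied, then swap your queries (or an alias) over to the new indexes.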
Thanks very much for this. The other issue is that the index is live, with full read/write access across it. How would I ensure there's no data loss? And wouldn't there be a latency increase across the cluster while I was reindexing the documents?
Oh no! I mistyped. Mis-phoned. Something. ES doesn't have a built-in way to
fork writes to two indexes. That'd be something you'd have to do in your
application. Sorry!
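If you do fork the writes yourself, it's just your application indexing each document twice while the migration runs: once into the existing index and once into the new monthly index it belongs in. A made-up sketch (index names, field, and doc_type are assumptions, written against the 2.x/5.x-era client):

```python
# Rough sketch (untested): dual-write each document to the old live index
# and to the new time-based index during the migration window.
from datetime import datetime, timezone
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

def index_event(doc_id, doc):
    ts = datetime.now(timezone.utc)
    doc = dict(doc, **{"@timestamp": ts.isoformat()})
    # write to the existing live index, as before
    es.index(index="big-index", doc_type="event", id=doc_id, body=doc)
    # and to the monthly index the doc belongs in
    es.index(index=ts.strftime("events-%Y.%m"), doc_type="event",
             id=doc_id, body=doc)

index_event("42", {"message": "hello"})
```

Once the backfill reindex catches up with the point where dual-writing started, the new indexes have everything and you can stop writing to the old one.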