Indexing, querying and bulk updating against time-based indexes

We have a system that collects, processes, and stores customer-generated data. The data comes from multiple streams with varying cadences: some are near real time, others delayed 24 to 48 hours. We are currently storing this in one large index, but it is becoming unwieldy, so I am looking into breaking it into monthly time-based indexes. I am familiar with the documentation on rolling over indexes and using aliases, and that part is straightforward. That scenario seems geared to logging, however, and our situation is a little different.

First, what we care about is when the originating document was created by the customer (i.e. not when the processed document was indexed into ES). Our system is keyed on created_date, and that is what we know deterministically. If the created_date and the index date fall on opposite sides of an index rollover, it is not as straightforward to grab that item. Instead of a direct link to the doc, we would have to manage aliases for it (i.e. look at the dates, determine whether it was close to a rollover date, and possibly include multiple indexes in the query).
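A minimal sketch of that fan-out logic, assuming monthly index names like `data-YYYY.MM` (the prefix and the two-day slack window are hypothetical, not from this post): given a created_date, return the index to search, plus the neighbouring month's index when the date is close enough to a boundary that late-arriving data might have landed on the other side.

```python
from datetime import date, timedelta

def candidate_indexes(created, slack_days=2, prefix="data-"):
    """Monthly index names to search for a doc created on `created`.

    If the creation date falls within `slack_days` of a month boundary,
    the neighbouring month's index is included too, since delayed data
    may have been indexed after the rollover.
    """
    def name(d):
        # e.g. date(2021, 5, 17) -> "data-2021.05"
        return f"{prefix}{d.year}.{d.month:02d}"

    indexes = {name(created)}
    indexes.add(name(created - timedelta(days=slack_days)))
    indexes.add(name(created + timedelta(days=slack_days)))
    return sorted(indexes)

# Near a boundary, two indexes come back; mid-month, just one.
candidate_indexes(date(2021, 5, 1))   # ["data-2021.04", "data-2021.05"]
candidate_indexes(date(2021, 5, 15))  # ["data-2021.05"]
```

The resulting list can be joined with commas into the index part of a search URL, since Elasticsearch accepts multiple index names per request.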

Also, there are times when we bulk update and have only a list of ids. In this scenario (i.e. the documents corresponding to those ids may be spread across monthly indexes covering up to 36 months), how can we bulk update? Can we just query against _all?
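One way to avoid resolving each id to an index is to target a wildcard pattern (e.g. `data-*`, a hypothetical prefix) with `_update_by_query` and an `ids` query. A sketch that just builds the request bodies, batching the ids to keep each request small:

```python
def build_update_by_query(ids, script_source, batch_size=1000):
    """Yield _update_by_query bodies, one per batch of ids.

    Each body would be POSTed to /data-*/_update_by_query, letting
    Elasticsearch fan the search out across the monthly indexes.
    """
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        yield {
            "query": {"ids": {"values": batch}},
            "script": {"source": script_source, "lang": "painless"},
        }

bodies = list(build_update_by_query(
    ["a", "b", "c"], "ctx._source.flag = true", batch_size=2))
# bodies[0] covers ids ["a", "b"], bodies[1] covers ["c"]
```

This works without knowing which index holds which id, but every request still searches all 36 indexes, so it is less efficient than routing each update directly to its index.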

Is there a better way I should be approaching this?


The rollover API is great for immutable data, but it makes it hard to efficiently determine the index for update operations. If you have the created_date as part of the document id, you are probably better off creating time-based indices with the year and month in the index name. This lets you derive the index name from the document id, which will simplify bulk updates.
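A sketch of that derivation, assuming a hypothetical id format that embeds the creation date, e.g. `"20210517-abc123"`, and monthly indexes named `data-YYYY.MM` (both assumptions, not from the thread). Each update action in a `_bulk` request can then be routed straight to its index:

```python
from datetime import datetime

def index_for_id(doc_id, prefix="data-"):
    """Derive the monthly index name from an id like '20210517-abc123'."""
    date_part = doc_id.split("-", 1)[0]
    created = datetime.strptime(date_part, "%Y%m%d")
    return f"{prefix}{created.year}.{created.month:02d}"

def bulk_update_actions(updates):
    """Build the action/source pairs for a _bulk request.

    `updates` is a list of (doc_id, partial_doc) tuples; each pair is
    routed to the index derived from its id, so one _bulk request can
    span many monthly indexes.
    """
    lines = []
    for doc_id, partial_doc in updates:
        lines.append({"update": {"_index": index_for_id(doc_id), "_id": doc_id}})
        lines.append({"doc": partial_doc})
    return lines

actions = bulk_update_actions([("20210517-abc123", {"flag": True})])
# actions[0] routes the update to "data-2021.05"
```

Each dict would be serialized as one line of newline-delimited JSON in the `_bulk` body, so no per-document search is needed to locate the index.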

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.