How to get index rollover last digit

Hello team ES,
In our project, we have a date-named index.
The index uses a policy that rolls over daily. Sometimes we need to update documents in rolled-over indices. For that we have a pipeline that determines the physical index conditionally.

Every time an index is rolled over, the sequence number at the end of its name is incremented. For example:
transactions-2022-07-04-000001
transactions-2022-07-05-000002

My question is: how can we determine that last number in our pipeline, so that we can identify the actual index?

This is the line in question: ctx['_index'] = ctx['_index'] + '-' + payloadDate +'-' + '000001';

Pipeline

PUT _ingest/pipeline/transaction_index_pipeline
{
  "description": "Adds ingest time to document + figures out the destination index if it's a dated transaction",
  "processors": [
    {
      "set": {
        "field": "_source.ingest_time",
        "value": "{{{_ingest.timestamp}}}"
      }
    },
    {
      "set": {
        "field": "_source.initial_ingest_time",
        "value": "{{{_ingest.timestamp}}}",
        "override": false
      }
    },
    {
      "script": {
        "description": "Set index based on `payload_ts` field",
        "lang": "painless",
        "source": """
          long today = new Date().getTime();
          long payloadTs = ctx['payload_ts'];

          LocalDate payloadDate = Instant.ofEpochMilli(payloadTs).atZone(ZoneId.systemDefault()).toLocalDate();
          LocalDate currentDate = Instant.ofEpochMilli(today).atZone(ZoneId.systemDefault()).toLocalDate();

          if(payloadDate.isBefore(currentDate)){
            ctx['_index'] = ctx['_index'] + '-' + payloadDate +'-' + '000001';
          }
        """
      }
    }
  ]
}

Bootstrap index

PUT <transactions-{now/d{yyyy-MM-dd}}-000001>
{
  "aliases": {
    "transactions": {
      "is_write_index": true
    }
  },
  "mappings": {
    "properties": {
      "executingEntityIdCode": { "type": "keyword" },
      "transactionReferenceNumber": { "type": "keyword" },
      "ingest_time": { "type": "date" }
    }
  },
  "settings": {
    "index": {
      "lifecycle.origination_date": 1656892800000
    }
  }
}

You can't. The sequence number increases monotonically irrespective of date.

If you need to update documents and know their timestamps, it would probably make more sense to create daily indices based on the timestamp and not use rollover. That way you would have one index per day and would know exactly which index each document resides in from the timestamp alone.

Index rollover was created to deal with the scenario where large and unpredictable volumes of immutable data are indexed and you want to aim for a uniform index/shard size for performance reasons. If you need to update documents efficiently, or need indices that strictly cover specific time periods, rollover may not be the right tool to use.
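
As a rough illustration of the daily-index suggestion above, the built-in date_index_name ingest processor can derive the target index from a timestamp field. This is only a sketch: the pipeline name transaction_daily_index_pipeline is made up for the example, and it assumes payload_ts holds epoch milliseconds, as in the original script. The processor rounds in UTC unless its timezone option is set, whereas the original script used the node's system default zone.

```
PUT _ingest/pipeline/transaction_daily_index_pipeline
{
  "description": "Sketch: route each document to a daily index derived from payload_ts",
  "processors": [
    {
      "date_index_name": {
        "field": "payload_ts",
        "date_formats": ["UNIX_MS"],
        "index_name_prefix": "transactions-",
        "date_rounding": "d",
        "index_name_format": "yyyy-MM-dd"
      }
    }
  ]
}
```

With this in place, a document whose payload_ts falls on 2022-07-04 (UTC) should be routed to transactions-2022-07-04, with no sequence number to work out.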

Thank you so much for taking time to respond to this.

I have a few follow up questions.

  1. Every time an indexing request is received, we would rely on the pipeline to route it to the correct index. Would that not cause performance overhead?
  2. Based on your experience, do you think it hurts not to have a write index if we're working with an alias for search and indexing requests?
  3. If we do not use rollover, the transition to the warm tier will be based on the index creation date (please correct me if I am wrong). In that case, documents indexed on the last day before the transition will also be moved to the warm tier, even though they were indexed only a day before the hot phase for the index expires.

The pipeline would do what it does now, except for appending the sequence number, so I would not expect much difference. Running a pipeline for each document adds overhead, which is why this is often done before the data is sent to Elasticsearch, e.g. in Logstash.

You could have a default pipeline defined for all indices matching the index pattern and an alias pointing to a single one of these to write to if you want. Changing the index name may result in data being rerouted within the cluster more frequently but may not be a big change compared to what you currently do.

Correct. Isn't that what happens now anyway, although based on the rollover timestamp?

Note that there is always a trade-off, and you need to find the one that best fits your use case.
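
The default-pipeline setup mentioned above could look roughly like the following. It is a sketch under a few assumptions: the template name transactions_template is invented for the example, the pattern transactions-* is inferred from the index names in the question, and the cluster is recent enough to support composable index templates.

```
PUT _index_template/transactions_template
{
  "index_patterns": ["transactions-*"],
  "template": {
    "settings": {
      "index.default_pipeline": "transaction_index_pipeline"
    }
  }
}
```

Every index created under that pattern would then run transaction_index_pipeline on incoming documents unless the request names a different pipeline.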

I had modified the pipeline code to calculate the rollover number by counting the days elapsed since the creation of the bootstrap index. That would be a bit of overkill, I suppose.

          if(payloadDate.isBefore(currentDate)){
            long days = (payloadTs-params['position']) / params['days']+1;
            String rolloverSequence = String.format('%06d', new def[] {days});

            ctx['_index'] = ctx['_index'] + '-' + payloadDate +'-' + rolloverSequence;
          }
        """,
        "params": {
          // Creation date of the bootstrap index (milliseconds since epoch).
          "position": 1656892800000,
          // Number of milliseconds in one day.
          "days": 86400000
        }
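
To check the arithmetic, the calculation can be tried in isolation with the simulate API. This is a sketch only: the field name rollover_sequence is made up for the test, and the sample payload_ts of 1656979200000 is exactly one day after the bootstrap origination date. The formula holds only if exactly one rollover has happened per day since the bootstrap index was created.

```
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "script": {
          "lang": "painless",
          "source": """
            long payloadTs = ctx['payload_ts'];
            // Whole days elapsed since the bootstrap index, plus 1 for the first index.
            long days = (payloadTs - params['position']) / params['days'] + 1;
            // Hypothetical field, written only so the result shows up in the simulate output.
            ctx['rollover_sequence'] = String.format('%06d', new def[] {days});
          """,
          "params": {
            "position": 1656892800000,
            "days": 86400000
          }
        }
      }
    ]
  },
  "docs": [
    { "_index": "transactions", "_source": { "payload_ts": 1656979200000 } }
  ]
}
```

For this document, (1656979200000 - 1656892800000) / 86400000 + 1 = 2, which corresponds to the suffix 000002 in the example index names above.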

That's what makes the difference, isn't it? With rollover, a document stays accessible in the hot tier for x days, albeit in a rolled-over index, before the transition phase begins.

Thank you for giving your time to this, Christian.
I shall be changing the design to create the indices ourselves every day instead of relying on rollover.

Yes, it does not seem like you get any benefit from the flexibility of rollover; you just need to work around its limitations. If possible, determine the index name outside Elasticsearch before you index the data rather than using an ingest pipeline.

We have to use an ingest pipeline to determine the index because our upstream is a Kafka topic, which translates to an index alias in ES.

What reads from Kafka and indexes into Elasticsearch? I would modify that process to determine the index name based on the timestamp.

There are a whole lot of upstream connectors, but Kafka uses the Elasticsearch Service Sink connector to push data into ES. The Kafka topic becomes the index in Elasticsearch.
https://docs.confluent.io/kafka-connect-elasticsearch/current/overview.html
