Index.sort on sequence numbers

We have a use case that would benefit greatly from setting index.sort on the _seq_no meta field of ES, but as of now that does not seem to be possible:

curl -XPUT -H "Content-Type:application/json" elasticsearch.local:9200/myindex -d'{ "settings": { "index": { "sort.field": "_seq_no", "sort.order": "asc" } } }'

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "unknown index sort field:[_seq_no]"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "unknown index sort field:[_seq_no]"
  },
  "status": 400
}

As a workaround, I thought I could use scripted_upsert to read _seq_no from ctx and copy it into the document, but ctx does not expose _seq_no.

Is either of these options doable? If needed, I can temporarily fork ES and make the changes myself!

QQ,
Do documents with a higher seq_no on the same shard become visible for search before documents with a lower seq_no?

Hey,

Maybe you can explain why you think you need this? I'm wondering if there are other ways of making this work, like adding the current time to a field of the document (no need for seq_no then).

I'd consider the sequence number an implementation detail and would thus be wary of using it anyway.

--Alex
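
For illustration, the timestamp idea above could be sketched as two request bodies: an ingest pipeline that stamps each document with its ingest time, and an index that sorts on that field. This is only a sketch under assumptions: the pipeline id `add-ingest-time` and the field `ingested_at` are hypothetical names, typeless mappings (7.x or later) are assumed, and a timestamp does not by itself give a strictly increasing value.

```python
import json

# Hypothetical names for illustration; neither appears in the thread.
PIPELINE_ID = "add-ingest-time"
TIME_FIELD = "ingested_at"


def ingest_time_pipeline(field=TIME_FIELD):
    """Body for PUT _ingest/pipeline/<id>: copy the ingest timestamp into `field`."""
    return {
        "description": "Stamp each document with the time it was ingested",
        "processors": [
            {"set": {"field": field, "value": "{{_ingest.timestamp}}"}}
        ],
    }


def sorted_index_body(field=TIME_FIELD, pipeline_id=PIPELINE_ID):
    """Body for PUT <index>: sort on `field` and run the pipeline by default."""
    return {
        "settings": {
            "index": {
                "sort.field": field,
                "sort.order": "asc",
                "default_pipeline": pipeline_id,
            }
        },
        # Index sorting needs a mapped field with doc values; a date field qualifies.
        "mappings": {"properties": {field: {"type": "date"}}},
    }


if __name__ == "__main__":
    print(json.dumps(ingest_time_pipeline(), indent=2))
    print(json.dumps(sorted_index_body(), indent=2))
```

The printed JSON can be sent with curl in the same way as the request in the first post.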

@spinscale anything that is monotonically increasing could work for us. AFAIK timestamps do not provide that property.

I am trying to implement a consumer protocol to read from Elasticsearch. The index is expected to be insert-only (no updates), and the seq_no works as the offset for this consumer protocol.
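
To make the offset idea concrete, here is a minimal sketch of such a consumer loop. It does not talk to Elasticsearch at all: `InMemoryLog` is a hypothetical stand-in for one insert-only shard, with the list position playing the role of seq_no. It also assumes records become readable in offset order, which is the open question asked earlier in this thread.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryLog:
    """Stand-in for one insert-only shard; the list index plays the role of seq_no."""
    records: list = field(default_factory=list)

    def append(self, doc):
        self.records.append(doc)
        return len(self.records) - 1  # offset assigned to the new record

    def read_after(self, offset, limit=100):
        """Return (offset, doc) pairs whose offset is strictly greater than `offset`."""
        start = offset + 1
        return list(enumerate(self.records[start:start + limit], start=start))


@dataclass
class Consumer:
    log: InMemoryLog
    committed: int = -1  # last offset processed; -1 means nothing has been read

    def poll(self, limit=100):
        """Fetch the next batch and advance the committed offset past it."""
        batch = self.log.read_after(self.committed, limit)
        if batch:
            self.committed = batch[-1][0]
        return [doc for _, doc in batch]
```

Each call to `poll` resumes from the last committed offset, so a consumer that restarts with a stored offset would pick up where it left off.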

Would the Elasticsearch ID autogeneration work? You could store the _id field in an extra field in the document via a pipeline.

_id won't provide me an ordered log, right? (ordering = insert order). Think of this like a Kafka log: the offset in Kafka = seq_no in Elasticsearch.

IIRC this is the ID generation code in Java: elasticsearch/TimeBasedUUIDGenerator.java at master · elastic/elasticsearch · GitHub. Not sure if that works with your workload, as it is optimized for Lucene.

That said, you are probably aware that there is a commercial feature in Elasticsearch doing what you need (namely cross-cluster replication), and it took a significant engineering effort to get there. Sequence IDs played an important part in that, but were just one of the building blocks.

Yes, I'm aware of CCR & it certainly doesn't seem like an easy task whatever I'm doing :frowning:

Interesting stuff about ID generation! Thx let me take a look
