Index.sort on sequence numbers

fhalde · May 31, 2021, 10:37am

We have a use case that would benefit greatly from an index.sort on the _seq_no meta field of ES, but as of now that does not seem to be possible:

curl -XPUT -H "Content-Type:application/json" elasticsearch.local:9200/myindex -d'{ "settings": { "index": { "sort.field": "_seq_no", "sort.order": "asc" } } }'

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "unknown index sort field:[_seq_no]"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "unknown index sort field:[_seq_no]"
  },
  "status": 400
}

As a workaround, I thought I could use scripted_upsert and copy _seq_no from the ctx into the document, but ctx does not expose _seq_no.

Are any of the options I have doable? I can fork off ES temporarily and make those changes!

Quick question: can documents with a higher seq_no on the same shard become visible for search before documents with a lower seq_no?

spinscale · May 31, 2021, 2:30pm

Hey,

Maybe you can explain why you need this? I'm wondering if there are other ways of making this work, like adding the current time to a field of the document (no need for seq_no then).
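The timestamp idea could be sketched roughly as follows: an ingest pipeline stamps each document with the ingest time, and the index is sorted on that field. This is only an illustration under assumptions. The field name `ingested_at` and the pipeline name `stamp-ingest-time` are made up for the example, and the code only builds the request bodies (for `PUT _ingest/pipeline/<name>` and `PUT <index>`) rather than sending them. Note that timestamps can collide, so this gives an approximate insert order, not a strict one.

```python
import json


def ingest_timestamp_pipeline(field="ingested_at"):
    """Body for PUT _ingest/pipeline/<name>.

    Uses the `set` processor to copy the ingest timestamp into `field`.
    The field name is a hypothetical choice for this sketch.
    """
    return {
        "description": "Stamp each document with the ingest time",
        "processors": [
            {"set": {"field": field, "value": "{{_ingest.timestamp}}"}}
        ],
    }


def sorted_index_body(field="ingested_at", pipeline="stamp-ingest-time"):
    """Body for PUT <index>: sort segments on `field`, ascending.

    Index sorting needs the sort field to be mapped at index creation,
    so the mapping is declared here as well.
    """
    return {
        "settings": {
            "index": {
                "sort.field": field,
                "sort.order": "asc",
                # Run the (hypothetical) pipeline on every indexed document.
                "default_pipeline": pipeline,
            }
        },
        "mappings": {"properties": {field: {"type": "date"}}},
    }


if __name__ == "__main__":
    print(json.dumps(ingest_timestamp_pipeline(), indent=2))
    print(json.dumps(sorted_index_body(), indent=2))
```

Unlike `_seq_no`, a regular date field is accepted as an index sort field, which is why the index creation request above should not hit the "unknown index sort field" error.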

I'd consider the sequence number an implementation detail and would thus be wary of using it anyway.

--Alex

fhalde · May 31, 2021, 2:35pm

@spinscale anything that can be monotonically increasing could work for us. afaik timestamps do not provide that property

I am trying to implement a consumer protocol to read from elasticsearch. the index is expected to be insert-only (no updates). the seq_no works as an offset for this consumer protocol
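A minimal sketch of what one polling step of such a consumer might look like, sorting at query time instead of relying on index.sort. Several things here are assumptions: that the index has a single primary shard (sequence numbers are assigned per shard), that `_seq_no` can be used in a range query and as a sort key in a search request, and that documents become searchable in seq_no order, which is exactly the open question from the first post. The function names are hypothetical, and the code only builds the `_search` body and advances the offset; it does not talk to a cluster.

```python
def next_batch_request(last_seq_no, batch_size=100):
    """Build a _search body for documents after the consumer's offset.

    Assumes a single-shard, insert-only index, so that _seq_no behaves
    like a log offset.
    """
    return {
        "size": batch_size,
        # Ask Elasticsearch to return _seq_no and _primary_term per hit.
        "seq_no_primary_term": True,
        "sort": [{"_seq_no": "asc"}],
        "query": {"range": {"_seq_no": {"gt": last_seq_no}}},
    }


def advance_offset(last_seq_no, hits):
    """Return the new offset after consuming `hits` (the hits.hits array).

    Keeps the old offset when the batch is empty.
    """
    return max((hit["_seq_no"] for hit in hits), default=last_seq_no)


if __name__ == "__main__":
    offset = -1  # nothing consumed yet; sequence numbers start at 0
    print(next_batch_request(offset, batch_size=2))
    # Pretend the search returned two hits:
    offset = advance_offset(offset, [{"_seq_no": 0}, {"_seq_no": 1}])
    print(offset)
```

If a higher seq_no can become visible before a lower one, a consumer like this would skip the late document, so the visibility question needs an answer before relying on this pattern.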

spinscale · May 31, 2021, 2:51pm

Would the Elasticsearch ID autogeneration work? You could store the _id field in an extra field in the document via a pipeline.

fhalde · May 31, 2021, 2:53pm

_id won't provide me an ordered log right? (ordering = insert order). think of this like a kafka log. the offset in kafka = seq_no in elasticsearch

spinscale · June 1, 2021, 8:03am

IIRC this is the ID generation code in Java: TimeBasedUUIDGenerator.java in the elastic/elasticsearch repository on GitHub (master branch). Not sure if that works with your workload, as it is optimized for Lucene.

That said, you are probably aware that there is a commercial feature in Elasticsearch doing what you need (namely cross-cluster replication), and it took a significant engineering effort to get there. Sequence IDs played an important part in that, but were just one of the building blocks.

fhalde · June 1, 2021, 8:41am

Yes, I'm aware of CCR & it certainly doesn't seem like an easy task whatever I'm doing

fhalde · June 1, 2021, 8:42am

Interesting stuff about ID generation! Thx let me take a look

