I am working on a set of changes to Elasticsearch that allow storing a filter for the documents matching a query, represented as a list of sequence numbers. I would like to get feedback on my approach.
This is implemented with these changes:
- Add a special query field type and a _stored_filter meta field whose content is a query.
- When a document with _stored_filter_query is encountered, another field is added that stores the sorted sequence numbers of the documents matching the query.
- Introduce stored_filter_query, which finds the document with the id specified in the query, loads its sequence numbers, and uses a points intersection to find the documents with those sequence numbers.
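
To make the idea concrete, here is a rough Python sketch of the store-then-intersect flow described in the list. It is not the actual patch: the function names are made up for illustration, and a binary search over a sorted list stands in for the Lucene points intersection.

```python
from bisect import bisect_left


def build_stored_filter(matching_seq_nos):
    """Return the sorted, de-duplicated sequence numbers to store.

    Hypothetical helper: models the extra field added at index time.
    """
    return sorted(set(matching_seq_nos))


def intersect(stored_seq_nos, shard_docs):
    """Return the ids of shard documents whose sequence number is stored.

    shard_docs is an iterable of (doc_id, seq_no) pairs. The binary search
    is only a stand-in for a points intersection over an indexed field.
    """
    hits = []
    for doc_id, seq_no in shard_docs:
        i = bisect_left(stored_seq_nos, seq_no)
        if i < len(stored_seq_nos) and stored_seq_nos[i] == seq_no:
            hits.append(doc_id)
    return hits


if __name__ == "__main__":
    stored = build_stored_filter([7, 2, 5])
    shard_docs = [("a", 1), ("b", 2), ("c", 5), ("d", 6), ("e", 7)]
    print(intersect(stored, shard_docs))  # ['b', 'c', 'e']
```

The sketch assumes sequence numbers identify documents uniquely within one shard, which is exactly the assumption I ask about further down.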
A few questions arise:
This approach requires that a stored filter document be stored on each shard. This can be done by the client or by the server. Currently, I am relying on the client to partition the matches by shard id and send a request to store the stored filter on that shard. Is there an issue with having index requests or searches return the shard id for the documents they return?
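
For illustration, the client-side partitioning step could look roughly like the Python sketch below. It assumes each hit already carries its shard id and sequence number as a (shard_id, seq_no) pair, which is what the question above is about; the function name and input shape are hypothetical.

```python
from collections import defaultdict


def partition_by_shard(hits):
    """Group sequence numbers by shard id, sorted within each shard.

    hits is an iterable of (shard_id, seq_no) pairs; this shape is an
    assumption for the sketch, not an existing response format.
    """
    per_shard = defaultdict(list)
    for shard_id, seq_no in hits:
        per_shard[shard_id].append(seq_no)
    return {shard: sorted(seq_nos) for shard, seq_nos in per_shard.items()}


if __name__ == "__main__":
    # One stored filter document would then be sent to each shard key.
    print(partition_by_shard([(0, 9), (1, 3), (0, 4)]))  # {0: [4, 9], 1: [3]}
```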
Are sequence numbers unique within a shard or is the pair (primary term, sequence number) the only thing guaranteed to be unique?