I am working on a set of changes to Elasticsearch that allow the set of documents matching a query to be stored as a filter, using sequence numbers. I would like to get feedback on my approach.
This is implemented with these changes:
- Add a special query field type and a _stored_filter meta field whose content is a query.
- When a document with _stored_filter_query is encountered, another field is added that stores the sorted sequence numbers of the documents matching the query.
- Introduce a stored_filter_query that finds the document with the ID specified in the query, loads its sequence numbers, and uses a points intersection to find the documents with those sequence numbers.
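To make the three steps above concrete, here is a minimal in-memory sketch of the idea for a single shard. It is a toy model and not Elasticsearch or Lucene code: the `Shard` class, its method names, and the use of a Python predicate in place of a real query are all assumptions for illustration, and the binary search only stands in for the points intersection.

```python
from bisect import bisect_left


class Shard:
    """Toy stand-in for one shard (hypothetical, not Elasticsearch code)."""

    def __init__(self):
        self.docs = {}            # doc id -> (seq_no, source)
        self.next_seq_no = 0      # every indexing operation gets a new seq_no
        self.stored_filters = {}  # stored filter doc id -> sorted seq_nos

    def index(self, doc_id, source):
        """Index (or re-index) a document and return its sequence number."""
        seq_no = self.next_seq_no
        self.next_seq_no += 1
        self.docs[doc_id] = (seq_no, source)
        return seq_no

    def index_stored_filter(self, filter_id, predicate):
        """Evaluate the query once and store the sorted matching seq_nos.

        `predicate` is a plain function standing in for the stored query.
        """
        matches = sorted(
            seq_no for seq_no, source in self.docs.values() if predicate(source)
        )
        self.stored_filters[filter_id] = matches

    def stored_filter_query(self, filter_id):
        """Return ids of documents whose seq_no is in the stored filter."""
        seq_nos = self.stored_filters[filter_id]

        def in_filter(seq_no):
            # Binary search on the sorted list; a stand-in for the
            # points intersection described above.
            i = bisect_left(seq_nos, seq_no)
            return i < len(seq_nos) and seq_nos[i] == seq_no

        return sorted(
            doc_id for doc_id, (seq_no, _) in self.docs.items() if in_filter(seq_no)
        )


shard = Shard()
shard.index("a", {"color": "red"})
shard.index("b", {"color": "blue"})
shard.index("c", {"color": "red"})
shard.index_stored_filter("f1", lambda source: source["color"] == "red")
matching_ids = shard.stored_filter_query("f1")
```

In this toy model, a document that is re-indexed after the filter was stored receives a new sequence number and therefore no longer matches the stored filter.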
A few questions arise:
- This approach requires that a stored filter document be stored on each shard. This can be done by the client or by the server. Currently, I am relying on the client to partition by shard ID and send a request to store the stored filter on that shard. Is there an issue with having index requests or searches return the shard ID for the documents they return?
- Are sequence numbers unique within a shard, or is the pair (primary term, sequence number) the only thing guaranteed to be unique?
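For the first question, the client-side partitioning I have in mind looks roughly like the sketch below. It assumes each hit already carries a shard ID and a sequence number; the `shard` and `seq_no` keys and the `partition_by_shard` helper are hypothetical names, not an existing response format.

```python
from collections import defaultdict


def partition_by_shard(hits):
    """Group the sequence numbers of matching documents by shard id.

    Each hit is a dict with the hypothetical keys 'shard' and 'seq_no'.
    Returns {shard id: sorted list of seq_nos}, i.e. one stored filter
    payload per shard.
    """
    per_shard = defaultdict(list)
    for hit in hits:
        per_shard[hit["shard"]].append(hit["seq_no"])
    return {shard: sorted(seq_nos) for shard, seq_nos in per_shard.items()}


# Sequence numbers are assigned per shard, so the same value (2 here)
# can appear on more than one shard.
hits = [
    {"id": "a", "shard": 0, "seq_no": 5},
    {"id": "b", "shard": 1, "seq_no": 2},
    {"id": "c", "shard": 0, "seq_no": 2},
]
filters_by_shard = partition_by_shard(hits)
```

The client would then send one stored filter document to each shard in `filters_by_shard`, which is why it needs the shard ID for every matching document.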