We are currently using Lucene and are exploring Elasticsearch for scaling.
We have a requirement to restrict queries to a set of doc ids, and that set
can be quite large. For example, out of a corpus of 10 million documents, a
user can choose a set of 5 million and run a query targeting that subset.
Hence we need to pass in a set of 5 million doc ids so that the query runs
only on those documents rather than on the full index.
I am planning to use a mapped _id field that will be set during index
mapping and then build a filtered query with IdsFilterBuilder. The issue is
that the API takes a list of strings and hence will not scale to millions
of ids - ideally we would like to pass in a bit set containing all the doc
ids.
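
To make the size concern concrete, here is a rough sketch using only
java.util.BitSet (no Elasticsearch calls). It assumes our doc ids are dense
integers in [0, corpus size), which may not hold for every setup, and the
class and helper names (DocIdBitSetSketch, selectEveryOther, encode, decode,
decimalIdChars) are made up for illustration. It compares the byte size of a
bit set against a lower bound on the characters needed to send the same ids
as decimal strings.

```java
import java.util.BitSet;

public class DocIdBitSetSketch {

    // Hypothetical stand-in for a user's selection: every other id in [0, corpusSize).
    static BitSet selectEveryOther(int corpusSize) {
        BitSet selected = new BitSet(corpusSize);
        for (int id = 0; id < corpusSize; id += 2) {
            selected.set(id);
        }
        return selected;
    }

    // Serialize the selection to a compact byte array (one bit per candidate id).
    static byte[] encode(BitSet selected) {
        return selected.toByteArray();
    }

    // Rebuild the selection from the byte array on the receiving side.
    static BitSet decode(byte[] bytes) {
        return BitSet.valueOf(bytes);
    }

    // Lower bound on the payload if each id were sent as a decimal string
    // (digits only; quotes and separators are not counted).
    static long decimalIdChars(BitSet selected) {
        long chars = 0;
        for (int id = selected.nextSetBit(0); id >= 0; id = selected.nextSetBit(id + 1)) {
            chars += Integer.toString(id).length();
        }
        return chars;
    }

    public static void main(String[] args) {
        BitSet selected = selectEveryOther(10_000_000);
        byte[] payload = encode(selected);
        System.out.println("selected ids:       " + selected.cardinality());
        System.out.println("bit set bytes:      " + payload.length);
        System.out.println("decimal id chars:   " + decimalIdChars(selected));
        System.out.println("round trip equal:   " + decode(payload).equals(selected));
    }
}
```

How such a byte array would actually be handed to a filter on the
Elasticsearch side is exactly the open question here; the sketch only covers
the encoding.
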
We will be using the Java API. What is the best way to approach this issue?
I understand that we would need to write a custom API that will accept a
bit set. If we write a plugin, can we access the internal APIs of
Elasticsearch and hence avoid using the SearchRequestBuilder?
Is a plugin the right approach? Any pointers as to where to start?