Hi,
I'm currently trying to prototype a filter plugin. The objectives are:
- Optimize access to nested documents. Currently (in a Painless script, for example) the only way to process nested entities is to process the _source of the parent. I would like to measure whether accessing the doc values of the nested entities gives better performance.
- Make it possible to access the source and doc values of child documents (which currently seems impossible from, e.g., a Painless script).
In a filter plugin, we don't have access to the SearchContext, so I run low-level Lucene queries directly against the LeafReader.
I'm able to query child documents. Because of the parent/child data structure, I have to loop over all the segments of the shard. Something like:
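    // Read the id of the current (parent) document from its doc values.
    SortedSetDocValues sortedSetDocValues = context.reader().getSortedSetDocValues("id");
    // advanceExact returns false when the document has no value for the field.
    if (sortedSetDocValues != null && sortedSetDocValues.advanceExact(doc)) {
        BytesRef bytesRef = sortedSetDocValues.lookupOrd(sortedSetDocValues.nextOrd());
        String id = bytesRef.utf8ToString();
        // Match the "activity" children whose join field points at this parent id.
        BooleanQuery booleanQuery = new BooleanQuery.Builder()
            .add(new TermQuery(new Term("esType", "activity")), org.apache.lucene.search.BooleanClause.Occur.FILTER)
            .add(new TermQuery(new Term("esJoin#case", id)), org.apache.lucene.search.BooleanClause.Occur.FILTER)
            .build();
        // searcher spans all segments of the shard, not only the current leaf.
        TopDocs theChildren = searcher.search(booleanQuery, 10);
    }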
Q: Are there ways to optimize this query? For example, can I exploit the global ordinals cache at this level?
For the nested documents, I've understood that what defines a nested document is its position in the segment relative to its parent. Nested documents are all the documents with a certain _type that appear before their parent and after the previous parent document.
Elasticsearch optimizes nested queries with a Lucene ToParentBlockJoinQuery, which uses a bitset marking all the parent documents in the segment.
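To make the block layout concrete, here is a minimal sketch of how a parent bitset turns a parent doc ID into the range of its nested documents. It uses java.util.BitSet as a stand-in so that it runs without Lucene; Lucene's own org.apache.lucene.util.BitSet offers prevSetBit for the same lookup. The class and method names (NestedBlockSketch, childRange) are hypothetical, and how to obtain the parent bitset inside a plugin is not covered here.

```java
import java.util.BitSet;

final class NestedBlockSketch {

    /**
     * Returns {firstChild, parentDoc}: the nested documents occupy the doc IDs
     * from firstChild (inclusive) to parentDoc (exclusive). The range is empty
     * when the parent has no nested documents.
     */
    static int[] childRange(BitSet parents, int parentDoc) {
        if (!parents.get(parentDoc)) {
            throw new IllegalArgumentException("doc " + parentDoc + " is not a parent");
        }
        // previousSetBit returns -1 when there is no earlier parent,
        // in which case the block starts at doc 0 of the segment.
        int previousParent = parents.previousSetBit(parentDoc - 1);
        return new int[] { previousParent + 1, parentDoc };
    }

    public static void main(String[] args) {
        // Assumed segment layout: docs 0-2 are nested under parent 3,
        // doc 4 is a parent without nested docs, docs 5-6 are nested under parent 7.
        BitSet parents = new BitSet();
        parents.set(3);
        parents.set(4);
        parents.set(7);

        int[] range = childRange(parents, 7);
        System.out.println(range[0] + ".." + (range[1] - 1)); // prints 5..6
    }
}
```

With such a range, the doc values of each nested document could be read by advancing the segment's doc-values iterators to every doc ID in it, without any query.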
Q: In a filter plugin, I only have the ID of the parent document that I'm currently scoring, and the LeafReaderContext.
Is there a good way to find the nested documents? Right now I have the feeling that the only way would be to store a reference to the parent on each nested entity and to query on that reference.
(Message to Adrien & Jim, if you come by: at es-on Paris, you told me that it wasn't a good idea to try to process children in a filter, but I'm stubborn and I'm not sure it won't be useful for our use case.)