I am quite interested in using the top_children query, but I would need to get information about the children, mainly their ids and some highlights. I know this is not implemented and I would like to provide an implementation.
First, is anybody already working on it?
If not, here is a suggested API.
On the query side, we would specify that we want the children to be returned along with their parent data, and optionally request highlighting on the children's fields.
{
  "query": {
    "top_children" : {
      "type": "subcontent",
      "query" : {
        "term" : {
          "name" : "bike"
        }
      },
      "score" : "max",
      "factor" : 5,
      "incremental_factor" : 2
    }
  },
  "children": { // the mere presence of this tag means we want at least the ids of the children
    "size" : 2, // maximum number of children to return per parent
    "full_data" : false, // if true, return not just the id of each child but also its data
    "highlight" : { // same syntax as for normal queries
      "fields" : {
        "name" : {}
      }
    }
  }
}
Then as results we would have something like:
"hits" : {
  "total" : 7,
  "max_score" : 3.366573,
  "hits" : [
    {
      "_index" : "es",
      "_type" : "twitter",
      "_id" : 8001,
      "_score" : 3.366573,
      "children": {
        "total" : 4,
        "hits" : [ // hits on the children of this parent, ordered by score
          {
            "_id" : 654,
            "highlight" : {
              "name" : [ "my lovely bike" ]
            }
          },
          {
            "_id" : 987,
            "highlight" : {
              "name" : [ "my nice bike" ]
            }
          }
        ]
      }
    },
    {
      "_index" : "es",
      "_type" : "twitter",
      "_id" : 8004,
      .... etc....
    }
  ]
}
With this structure I guess we cover it all.
Now about the implementation.
First, the information about the children needs to be kept between the time the query is executed and the time the highlight phase happens. The simplest way to do that seems to be to have the query implementation hold it. Later on, since the highlight phase has access to the context and thus to the query, the child/parent association is accessible, ordered by score.
I am starting to think about an interface implemented by both the TopChildrenQuery and the BlockJoinQuery which would provide child info for a parent doc, ordered by score.
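To make the idea of a shared interface more concrete, here is a minimal sketch in plain Java. Every name in it (ChildHit, ChildHitsProvider, InMemoryChildHits) is hypothetical and does not exist in Elasticsearch or Lucene; it only assumes that a query can record (parent doc, child doc, score) triples while it runs, and hand them back per parent later.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names throughout: none of these types exist in Elasticsearch or Lucene.

/** A matching child document and its score. */
final class ChildHit {
    final int docId;
    final float score;

    ChildHit(int docId, float score) {
        this.docId = docId;
        this.score = score;
    }
}

/** What both TopChildrenQuery and BlockJoinQuery could implement. */
interface ChildHitsProvider {
    /** Total number of matching children recorded for the parent. */
    int totalChildren(int parentDocId);

    /** At most {@code size} children of the parent, best score first. */
    List<ChildHit> topChildren(int parentDocId, int size);
}

/** In-memory holder that a query could fill while it collects child matches. */
final class InMemoryChildHits implements ChildHitsProvider {
    private final Map<Integer, List<ChildHit>> byParent = new HashMap<>();

    void record(int parentDocId, int childDocId, float score) {
        byParent.computeIfAbsent(parentDocId, k -> new ArrayList<>())
                .add(new ChildHit(childDocId, score));
    }

    private List<ChildHit> childrenOf(int parentDocId) {
        List<ChildHit> children = byParent.get(parentDocId);
        return children != null ? children : Collections.<ChildHit>emptyList();
    }

    @Override
    public int totalChildren(int parentDocId) {
        return childrenOf(parentDocId).size();
    }

    @Override
    public List<ChildHit> topChildren(int parentDocId, int size) {
        // Copy before sorting so the recorded order is left untouched.
        List<ChildHit> sorted = new ArrayList<>(childrenOf(parentDocId));
        sorted.sort((a, b) -> Float.compare(b.score, a.score)); // descending score
        int limit = Math.max(0, Math.min(size, sorted.size()));
        return new ArrayList<>(sorted.subList(0, limit));
    }
}
```

The "total" and "hits" entries of the proposed "children" response section would map onto totalChildren and topChildren respectively, with the "size" option passed as the limit.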
We would also need an implementation of FieldQuery whose only job is to look inside TopChildrenQuery and BlockJoinQuery and decompose their sub-queries. Likewise, an interface would expose on both queries an accessor to the query on the children.
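As a rough illustration of that accessor and of the decomposition step, the sketch below uses stand-in query classes so that it compiles without Lucene. All the type names (SketchQuery, SketchTermQuery, SketchBooleanQuery, ChildQueryAccessor, TopChildrenQuerySketch, ChildQueryFlattener) are hypothetical; a real version would work on Lucene's query classes and plug into FieldQuery instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-ins for Lucene query classes; hypothetical and deliberately minimal.
interface SketchQuery {}

final class SketchTermQuery implements SketchQuery {
    final String field;
    final String text;

    SketchTermQuery(String field, String text) {
        this.field = field;
        this.text = text;
    }
}

final class SketchBooleanQuery implements SketchQuery {
    final List<SketchQuery> clauses;

    SketchBooleanQuery(SketchQuery... clauses) {
        this.clauses = Arrays.asList(clauses);
    }
}

/** Hypothetical accessor that both parent/child query types would implement. */
interface ChildQueryAccessor {
    SketchQuery childQuery();
}

/** Stand-in for a top_children query: a child type plus the query run on the children. */
final class TopChildrenQuerySketch implements SketchQuery, ChildQueryAccessor {
    final String childType;
    private final SketchQuery childQuery;

    TopChildrenQuerySketch(String childType, SketchQuery childQuery) {
        this.childType = childType;
        this.childQuery = childQuery;
    }

    @Override
    public SketchQuery childQuery() {
        return childQuery;
    }
}

/** Collects only the terms that apply to children, ignoring parent-level terms. */
final class ChildQueryFlattener {
    static List<SketchTermQuery> flatten(SketchQuery query) {
        List<SketchTermQuery> out = new ArrayList<>();
        collect(query, false, out);
        return out;
    }

    private static void collect(SketchQuery query, boolean insideChild, List<SketchTermQuery> out) {
        if (query instanceof ChildQueryAccessor) {
            // Step into the child query; everything below it targets children.
            collect(((ChildQueryAccessor) query).childQuery(), true, out);
        } else if (query instanceof SketchBooleanQuery) {
            for (SketchQuery clause : ((SketchBooleanQuery) query).clauses) {
                collect(clause, insideChild, out);
            }
        } else if (query instanceof SketchTermQuery && insideChild) {
            out.add((SketchTermQuery) query);
        }
    }
}
```

With the example request above, flattening would yield the single term name:bike, which is what the child highlighter would need to mark up in the children's "name" field.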
As for highlighting the children, the current code of HighlightPhase seems to cover almost 80% of what is needed. I guess a refactoring is needed to extract from that code an ESHighlighter which would be used for both the parent highlighting and the child highlighting.
Since everything is done document by document, I don't see any issue regarding sharding (though I am far from an expert in that area).
I don't know yet how to properly get the data of the children, but I would probably get a lot of inspiration from the fetch phase.
I haven't started to code yet and I don't know the elasticsearch code that well, so does this at least make sense?
Nicolas