Hi all,
I have a question about grouping and sorting. I can provide complete gist
examples, but would first like to check the overall perspective: am I on
the right track, am I missing something, and what is the future direction
here?
My use case: I have a lot of press articles, some of which are very similar
in content. I have to provide a search interface that groups these
duplicates while giving the user a lot of search possibilities on article
content and metadata.
I decided to create a parent child mapping (parent: group, child: article)
for this for the following reasons:
- the grouping will change over time: new articles are added constantly,
  and I do not want to reindex a lot of stuff
- articles have their own visibility restrictions, but should be indexed up
  front
- I strive for simple pagination and do not want to collect groups without
  knowing how many child documents I have to fetch
My current search strategy has two stages:
(1) search with a has_child query for groups
(2) resolve all children for the groups with a has_parent query
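To make the two stages concrete, here is a minimal sketch of the two request bodies, written as Python dicts. The type names "group" and "article" come from my mapping; the function names, the paging parameters and the use of an ids query in stage 2 are only assumptions for illustration, not a tested setup.

```python
def group_query(article_query, size=10, offset=0):
    """Stage 1: find groups that have at least one matching article."""
    return {
        "from": offset,
        "size": size,
        "query": {
            "has_child": {
                "child_type": "article",
                "query": article_query,
            }
        },
    }


def children_query(group_ids):
    """Stage 2: fetch all articles whose parent is one of the given groups."""
    return {
        "query": {
            "has_parent": {
                "parent_type": "group",
                # Assumption: the parents are selected by document id.
                "query": {"ids": {"values": list(group_ids)}},
            }
        }
    }


if __name__ == "__main__":
    import json

    # "title" is a hypothetical article field used only for this example.
    stage1 = group_query({"match": {"title": "election"}}, size=5)
    stage2 = children_query(["g1", "g2"])
    print(json.dumps(stage1, indent=2))
    print(json.dumps(stage2, indent=2))
```

The group ids returned by stage 1 are fed into stage 2, so the page size is defined in groups and not in articles.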
The problem is that I need to sort the parents/groups (the result of the
first has_child query) by values of the children (articles). As far as I
understand, this is currently not possible.
The only workaround I have found is to wrap the has_child query in a
function_score query and use that score for the sorting. Something like
this (the relevant parts are the function_score wrapper, score_type, and
the sort on _score):
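curl -XGET 'http://localhost:9200/index/group/_search?pretty=1' -d '{
  "query" : {
    "has_child" : {
      "query" : {
        "function_score" : {
          "query" : {
            "match_all" : { }
          },
          "functions" : [ {
            "script_score" : {
              "script" : "doc[\"article.publicationNameSort\"].value"
            }
          } ],
          "boost_mode" : "replace"
        }
      },
      "child_type" : "article",
      "score_type" : "max"
    }
  },
  "sort" : [ {
    "_score" : { }
  } ]
}'
(The inner match_all is only a stand-in for the actual article query, and
the field name is quoted with \" because single quotes would end the
shell's quoted string.)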
The problem in my use case is that the sort often needs more than one
field, or even several string values. (Com)pressing these into a single
double is not always possible.
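To illustrate why the compression breaks down, here is a small sketch of one possible packing scheme (my own assumption, not something Elasticsearch provides): the first 6 bytes of a string are turned into a number that still fits exactly into a double. The function name pack_prefix and the publication names are made up for the example.

```python
def pack_prefix(value, width=6):
    """Pack the first `width` bytes of a string into an order-preserving float.

    6 bytes are 48 bits, which fits into the 53-bit mantissa of a double,
    so the conversion to float is exact. Anything after the prefix is lost.
    """
    raw = value.encode("utf-8")[:width].ljust(width, b"\0")
    return float(int.from_bytes(raw, "big"))


if __name__ == "__main__":
    # Names that differ within the first 6 bytes keep their byte order.
    print(pack_prefix("Bild") < pack_prefix("Zeit"))
    # Names that share the first 6 bytes collapse to the same sort value.
    print(pack_prefix("Spiegel") == pack_prefix("Spiegel Online"))
```

So a single string already loses everything beyond a short prefix (and the order is plain byte order, not a proper collation); a second sort field has no room left at all.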
My questions:
(A) Will there be sorting support for has_child queries in the (near)
future?
There are different comments on this in the community. Is this easy (as
supported by Lucene) or very high-hanging fruit?
(B) Is there another way to achieve the grouping?
The grouping could be done by hand: getting child values with a simple
query, scanning the results, gathering some type of 'parent/group' field,
and returning the result once enough groups have been resolved. That is a
nightmare regarding pagination. This looks a lot like the problems
Elasticsearch has already solved in parent-child queries / the
top_children query.
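A rough sketch of what this by-hand grouping would look like on the client side, assuming the article hits arrive already sorted and each hit carries its group id. The function name page_of_groups and the (group_id, article_id) tuple shape are hypothetical.

```python
def page_of_groups(sorted_hits, page, page_size):
    """Return the distinct group ids for one result page.

    `sorted_hits` is an iterable of (group_id, article_id) pairs that is
    already in the desired sort order. Groups are ranked by their first
    (best) article.
    """
    seen = []         # distinct group ids in first-seen order
    seen_set = set()  # same ids, for fast membership checks
    start = page * page_size
    end = start + page_size
    for group_id, _article_id in sorted_hits:
        if group_id not in seen_set:
            seen_set.add(group_id)
            seen.append(group_id)
            if len(seen) == end:
                break  # enough groups resolved for this page
    return seen[start:end]


if __name__ == "__main__":
    hits = [("g1", "a1"), ("g2", "a2"), ("g1", "a3"),
            ("g3", "a4"), ("g2", "a5"), ("g4", "a6")]
    print(page_of_groups(hits, page=0, page_size=2))
    print(page_of_groups(hits, page=1, page_size=2))
```

Every page has to rescan the hits from the very beginning, and there is no way to know up front how many articles must be fetched to fill a page, which is exactly the pagination problem.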
All other comments and suggestions are much appreciated.
Best regards, Wolfgang