Hello,
We have some complicated `terms` aggregation ordering logic that I am not able to implement using Elasticsearch.
Let's suppose that we have documents with fields (`group_id`, `price`, `x`, ...). Our goal is to:

0. run a query that filters the whole document set;
1. group documents by `group_id`;
2. in each group, fetch the document with the highest `x` value; let's call it the group's "representative";
3. take that document's `price` and order all groups by that `price` (i.e. order groups by the `price` of their representatives).
These "documents" are real estate adverts exported or parsed from other sites (we are a legal aggregator), and the "groups" are groups of identical adverts posted on different sites. For each group we choose a representative, e.g. the most descriptive advert: the one with photos, various characteristics, etc.
Step 0 is mentioned to show that our groups change dynamically: they can have different representatives depending on the query that filters the whole document set. Those queries are any filters that our users can apply to the advert listing, e.g. room count, apartment area, etc.
We need to show documents on our website grouped by `group_id`; each group's view is rendered using its current representative's data, and groups are sorted by the `price` of their current representatives.
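To pin down the required semantics independently of Elasticsearch, here is a minimal in-memory sketch of steps 0 to 3 in Python. The function name `order_groups`, the sample adverts and the `rooms` field are hypothetical, used only for illustration; ties on `x` simply keep the first document seen. It also shows why step 0 matters: changing the filter changes a group's representative and therefore its position.

```python
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]


def order_groups(docs: List[Doc], keep: Callable[[Doc], bool]) -> List[Doc]:
    """Return one representative per group_id, ordered by the representative's price."""
    # Step 0: apply the user's filter to the whole document set.
    filtered = [d for d in docs if keep(d)]

    # Steps 1 and 2: group by group_id and keep the document with the highest x.
    representatives: Dict[Any, Doc] = {}
    for d in filtered:
        current = representatives.get(d["group_id"])
        if current is None or d["x"] > current["x"]:
            representatives[d["group_id"]] = d

    # Step 3: order the groups by their representatives' price (ascending).
    return sorted(representatives.values(), key=lambda d: d["price"])


# Hypothetical sample adverts; "rooms" stands in for any user-facing filter.
docs = [
    {"group_id": 1, "price": 100, "x": 5, "rooms": 2},
    {"group_id": 1, "price": 80, "x": 9, "rooms": 3},
    {"group_id": 2, "price": 90, "x": 7, "rooms": 2},
]

# No filter: group 1 is represented by the x=9 advert (price 80), so it comes first.
print([d["group_id"] for d in order_groups(docs, lambda d: True)])  # → [1, 2]
# rooms == 2: group 1 is now represented by the x=5 advert (price 100), so it comes last.
print([d["group_id"] for d in order_groups(docs, lambda d: d["rooms"] == 2)])  # → [2, 1]
```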
I understand that step 1 is done via a `terms` aggregation, and step 2 can be done using `top_hits` sorted by `x`. But I have a problem implementing step 3.
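Steps 0 to 2 might look roughly like the request body below, built as a Python dict. This is a sketch under assumptions: `rooms` is a hypothetical filter field, the bucket count of 20 is arbitrary, and the `_source` filtering is optional. Note that the buckets still come back in the default `terms` order (by document count); nothing here performs step 3.

```python
import json
from typing import Any, Dict, List


def build_request(user_filters: List[Dict[str, Any]], groups: int = 20) -> Dict[str, Any]:
    """Build a search body covering steps 0-2 (filter, group, pick representative)."""
    return {
        "size": 0,  # only the aggregation results are needed
        # Step 0: the user's filters, applied to the whole document set.
        "query": {"bool": {"filter": user_filters}},
        "aggs": {
            "groups": {
                # Step 1: one bucket per group_id.
                "terms": {"field": "group_id", "size": groups},
                "aggs": {
                    # Step 2: the representative is the bucket's document with the highest x.
                    "representative": {
                        "top_hits": {
                            "size": 1,
                            "sort": [{"x": {"order": "desc"}}],
                            "_source": {"includes": ["group_id", "price", "x"]},
                        }
                    }
                },
            }
        },
    }


# "rooms" is a hypothetical field standing in for any user filter.
print(json.dumps(build_request([{"term": {"rooms": 2}}]), indent=2))
```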
The first idea was to use the results of `top_hits` for `terms` ordering, but that is impossible (I know about the `{"max": {"script": "_score"}}` workaround; it is discussed further below).
The second idea was to use `scripted_metric`. I succeeded in calculating the desired metric `representative_price` for each bucket, but when it came to ordering the `terms` buckets by it, I found that to be impossible.
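For reference, a `representative_price` metric of this kind could be sketched as follows. This is not the author's actual script. It assumes Elasticsearch 7.x, where the Painless scripts see `state` (per shard) and `states` (in the reduce phase), and it assumes every document has numeric `x` and `price` fields. As described above, `terms` will not accept such a metric as its `order` key, so it only computes the value per bucket.

```python
import json
from typing import Any, Dict

# Painless scripts (assumed ES 7.x): track the highest x seen and its price.
INIT = "state.x = null; state.price = null;"
MAP = (
    "double x = doc['x'].value;"
    " if (state.x == null || x > state.x) { state.x = x; state.price = doc['price'].value; }"
)
# Each shard returns its best (x, price) pair.
COMBINE = "return state;"
# Pick the pair with the highest x across shards and return its price.
REDUCE = (
    "def bestX = null; def bestPrice = null;"
    " for (s in states) {"
    " if (s != null && s.x != null && (bestX == null || s.x > bestX)) { bestX = s.x; bestPrice = s.price; }"
    " }"
    " return bestPrice;"
)


def representative_price_agg() -> Dict[str, Any]:
    """Sub-aggregation computing the representative's price for each terms bucket."""
    return {
        "scripted_metric": {
            "init_script": INIT,
            "map_script": MAP,
            "combine_script": COMBINE,
            "reduce_script": REDUCE,
        }
    }


aggs = {
    "groups": {
        "terms": {"field": "group_id"},
        "aggs": {"representative_price": representative_price_agg()},
    }
}
print(json.dumps(aggs, indent=2))
```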
The third idea was to hide the representative's price in a document's `_score` so that I could use the `{"max": {"script": "_score"}}` workaround, which is proposed whenever somebody asks about ordering `terms` by `top_hits`. But while trying to design such a scoring function, I concluded that it cannot be built from a document's own data alone, without knowing the characteristics of its bucket. As I understand it, that is impossible to do with a `function_score` query. I found this research about the variables available in the scope of Elasticsearch scripts and gave up on the third idea.
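The shape of that workaround, with a naive scoring attempt plugged in, might look like this sketch. The `function_score` script that scores each document by its own `price` is a hypothetical attempt, `rooms` is a hypothetical filter field, and `top_score` is an arbitrary sub-aggregation name. It illustrates the problem described above: `max(_score)` becomes the highest price of any document in the group, not the price of the highest-`x` document.

```python
import json
from typing import Any, Dict, List


def naive_score_request(user_filters: List[Dict[str, Any]]) -> Dict[str, Any]:
    """The common 'order terms by max(_score)' workaround, with _score = price."""
    return {
        "size": 0,
        "query": {
            "function_score": {
                "query": {"bool": {"filter": user_filters}},
                # Hypothetical scoring attempt: each document scores its own price.
                "script_score": {"script": {"source": "doc['price'].value"}},
                "boost_mode": "replace",
            }
        },
        "aggs": {
            "groups": {
                "terms": {
                    "field": "group_id",
                    # Order buckets by the single-value metric defined below.
                    "order": {"top_score": "desc"},
                },
                "aggs": {
                    # The workaround: expose the best _score in each bucket as a metric.
                    "top_score": {"max": {"script": {"source": "_score"}}},
                    "representative": {
                        "top_hits": {"size": 1, "sort": [{"x": {"order": "desc"}}]}
                    },
                },
            }
        },
    }


print(json.dumps(naive_score_request([{"term": {"rooms": 2}}]), indent=2))
```

For a group containing an advert with price 100 and `x` = 5 and another with price 80 and `x` = 9, `top_score` would be 100 while the representative's price is 80; getting 80 would require the score to depend on the other documents in the bucket.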
I would be thankful if anybody could propose a workaround for this issue.