Calculate a score for every document in every bucket based on base_quality + application.quality (application.quality where application.id = id of bucket)
Get the best scoring document for every bucket
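For example (field names as above, values made up), a document like

{
  "base_quality": 2,
  "applications": [
    { "id": "A", "quality": 3 },
    { "id": "B", "quality": 1 }
  ]
}

should score 2 + 3 = 5 in the bucket for application A and 2 + 1 = 3 in the bucket for application B.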
It's easy to bucket documents by application.id and to get the best quality for every bucket:
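Something along these lines, with the field names assumed from the description above:

{
  "size": 0,
  "aggs": {
    "applications": {
      "nested": { "path": "applications" },
      "aggs": {
        "by_id": {
          "terms": { "field": "applications.id" },
          "aggs": {
            "best_quality": { "max": { "field": "applications.quality" } }
          }
        }
      }
    }
  }
}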
But what I want is the document that produces this quality. Is that possible? What I need is something like the top_hits aggregation, but with custom scoring. Maybe with a scripted_metric aggregation?
Why not use the function_score query in the query section to score the document based on your criteria and then use the top_hits aggregation to get the top doc for each bucket (the top doc will have a score based on your function_score query)?
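Roughly something like this (field names taken from the description above; it assumes the application fields are accessible from the top-level document):

{
  "query": {
    "function_score": {
      "query": { "match_all": {} },
      "script_score": {
        "script": "doc['base_quality'].value + doc['applications.quality'].value"
      }
    }
  },
  "aggs": {
    "by_application": {
      "terms": { "field": "applications.id" },
      "aggs": {
        "top_doc": {
          "top_hits": { "size": 1 }
        }
      }
    }
  }
}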
@colings86 the problem is that the score of a document can be different in every bucket where the document appears (based on the nested document that caused it to be in that bucket).
You can get a score per nested document in the query, but the combined score for the top-level document is used by the top_hits aggregation.
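For example, a nested query can give each application its own score, but those per-application scores are then folded into a single score for the parent document (via score_mode), and that single score is what top_hits sorts on. A sketch:

{
  "query": {
    "nested": {
      "path": "applications",
      "score_mode": "max",
      "query": {
        "function_score": {
          "query": { "match_all": {} },
          "script_score": { "script": "doc['applications.quality'].value" }
        }
      }
    }
  }
}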
Ok, I had missed the nested agg in there. However, you should be able to just use the sort in the top_hits agg to order the documents by the quality field in ascending order, since the base_quality will be the same for all the documents in the same bucket?
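I.e. something along these lines inside the terms agg (field name assumed):

"aggs": {
  "top_doc": {
    "top_hits": {
      "sort": [
        { "applications.quality": { "order": "asc" } }
      ]
    }
  }
}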
I've looked into that. The base_quality can be different for every document, and the application.quality can be different for every nested document. They are bucketed purely on application.id. If I were able to sort them by descending quality + base_quality and then just take the first one, that would be great, but I don't know how.
Also, I assumed sort was only performed on the results actually returned by top_hits, and that those were always determined by _score. If that's not the case, it really is almost exactly what I need, but not quite.
What I think I need is something like:
query: { match_all: {} },
aggs: {
  nested1: {
    nested: { path: 'applications' },
    aggs: {
      terms1: {
        terms: { field: 'applications.id' },
        aggs: {
          best_quality: {
            scripted_metric: {
              init_script: "_agg['results'] = []",
              map_script: "_agg.results.add([source: _source, score: doc['quality'].value + _source.base_quality])",
              # This is pseudo code, I don't know if it can actually be done
              reduce_script: "result = []; for (a in _aggs) { result.add(a.results.sort().first()) }; return result.sort().first()"
            }
          }
        }
      }
    }
  }
}
But I can't figure out how to do the sorting in the reduce script.
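Maybe collecting all the partial results and taking the entry with the highest score would do it (Groovy's max with a closure, assuming each shard state still holds the list built in the map script), but I haven't been able to verify that:

reduce_script: "def all = []; for (a in _aggs) { all.addAll(a.results) }; return all.max { it.score }"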