I'm interested in using aggregations to produce distinct keys for multiple
"term" fields and then getting a "measure" value for those keys. This can
be accomplished by nesting terms aggregations into a tree and applying
whatever "measure" aggregations are needed at the lowest sub-aggregation.
When I get this data back, I recursively flatten the results into a single
list and then apply whatever sorting and limiting I need after the fact.
The bad:
This requires me to request every record from Elasticsearch, which is not
ideal.
So is there a way to do the sorting/limiting in Elasticsearch rather than
after I flatten the data? I saw the "top_hits" aggregation, but I'm not
sure how it applies...
Thoughts, anybody? I saw that you can somewhat do this with "scripts" and
letting the top aggregation encompass all term fields, but is that any more
performant?
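To make the "scripts" idea concrete, here is a minimal Python sketch that builds a request body with a single terms aggregation keyed on a scripted combination of all the term fields. The helper name `combined_key_agg`, the aggregation name `comboAgg`, and the `|` separator are made up for illustration, and the `{"source": ...}` script form is an assumption about the Elasticsearch version in use (older releases take the script differently). With one flat level of buckets, the terms aggregation's `order` and `size` options apply to the combined key. Note that sorting by a sub-aggregation can be approximate across shards, and the sketch says nothing about whether this is more performant.

```python
def combined_key_agg(fields, measure_field, size, descending=True, sep="|"):
    """Build a request body with one scripted terms aggregation over all fields."""
    # Script that concatenates the field values into one key, e.g.
    # doc['genre'].value + '|' + doc['actor'].value
    # (assumed script syntax; adjust for the Elasticsearch version in use).
    joiner = " + '%s' + " % sep
    source = joiner.join("doc['%s'].value" % f for f in fields)
    return {
        "size": 0,  # return no hits, only aggregation results
        "aggs": {
            "comboAgg": {  # hypothetical aggregation name
                "terms": {
                    "script": {"source": source},
                    "size": size,  # the "limit"
                    # the "sort": order buckets by the measure sub-aggregation
                    "order": {"measureAgg": "desc" if descending else "asc"},
                },
                "aggs": {"measureAgg": {"sum": {"field": measure_field}}},
            }
        },
    }


body = combined_key_agg(["genre", "actor"], "docCount", size=1)
```

Each returned bucket key would then be split on the separator to recover the individual field values.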
genre = {Action, Adventure}
actor = {Tom Cruise, Jason Statham}
I'm looking for a way to get the distinct combinations of values with doc
counts, so I use a sub-aggregation:
"aggs": {
  "genreAgg": {
    "terms": {
      "field": "genre"
    },
    "aggs": {
      "actorAgg": {
        "terms": {
          "field": "actor"
        },
        "aggs": {
          "measureAgg": { "sum": { "field": "docCount" } }
        }
      }
    }
  }
}
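The same nesting can be generated for any list of term fields. The following is a rough Python sketch, not tied to any client library; the helper name `build_nested_terms` and the `<field>Agg` naming are assumptions taken from the example above.

```python
def build_nested_terms(fields, measure_field):
    """Nest one terms aggregation per field, with a sum at the lowest level."""
    # Innermost level: the "measure" aggregation.
    aggs = {"measureAgg": {"sum": {"field": measure_field}}}
    # Wrap from the last field outward so the first field ends up on top.
    for field in reversed(fields):
        aggs = {field + "Agg": {"terms": {"field": field}, "aggs": aggs}}
    return {"aggs": aggs}


body = build_nested_terms(["genre", "actor"], "docCount")
```

For `["genre", "actor"]` this produces the same structure as the fragment above, with `genreAgg` on top and `measureAgg` under `actorAgg`.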
When I get the data back, I flatten it into a CSV format:
Action, Tom Cruise, 50
Adventure, Tom Cruise, 40
Action, Jason Statham, 20
Adventure, Jason Statham, 40
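For reference, the flattening step can be sketched in Python roughly as follows. The `response_aggs` dictionary is hand-written to mirror the usual `buckets` / `key` / `value` layout of an aggregation response and is trimmed to the fields the function reads; the function name `flatten` is hypothetical.

```python
def flatten(aggs, agg_names, measure_name, prefix=()):
    """Yield one (key1, key2, ..., measure) tuple per lowest-level bucket."""
    name, rest = agg_names[0], agg_names[1:]
    for bucket in aggs[name]["buckets"]:
        keys = prefix + (bucket["key"],)
        if rest:
            # Descend into the next terms aggregation inside this bucket.
            yield from flatten(bucket, rest, measure_name, keys)
        else:
            # Lowest level: attach the measure value to the collected keys.
            yield keys + (bucket[measure_name]["value"],)


# Hand-written sample shaped like the "aggregations" part of a response.
response_aggs = {
    "genreAgg": {"buckets": [
        {"key": "Action", "actorAgg": {"buckets": [
            {"key": "Tom Cruise", "measureAgg": {"value": 50}},
            {"key": "Jason Statham", "measureAgg": {"value": 20}},
        ]}},
        {"key": "Adventure", "actorAgg": {"buckets": [
            {"key": "Tom Cruise", "measureAgg": {"value": 40}},
            {"key": "Jason Statham", "measureAgg": {"value": 40}},
        ]}},
    ]}
}

rows = list(flatten(response_aggs, ["genreAgg", "actorAgg"], "measureAgg"))
for row in rows:
    print(", ".join(str(col) for col in row))
```

The rows come out grouped by the top-level aggregation, so any other ordering still has to be applied afterwards.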
My question is: is there a better way to do this? I'm not especially worried
about recursively flattening the data. My points of interest are:
Performance - My top aggregation may not be the one with the lowest
cardinality. Can ES handle that for me?
Sorting & Limiting - I have to fetch all the data for these fields. Say
I want to "sort by actor, limit 1". Where do I apply the sort? It can't
be on the genre field. The actor field seems logical, but then I still
can't limit the genre field at all. Fetching all the data and then
flattening works because I can sort correctly and then limit.
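The client-side "sort by actor, limit 1" described above can be sketched in Python like this. It only works because every row has already been fetched; the helper name `sort_and_limit` is hypothetical and the rows are the flattened sample data from this post.

```python
from operator import itemgetter

# Flattened rows: (genre, actor, measure)
rows = [
    ("Action", "Tom Cruise", 50),
    ("Adventure", "Tom Cruise", 40),
    ("Action", "Jason Statham", 20),
    ("Adventure", "Jason Statham", 40),
]


def sort_and_limit(rows, column, limit, descending=False):
    """Sort the flattened rows by one column, then keep the first `limit` rows."""
    return sorted(rows, key=itemgetter(column), reverse=descending)[:limit]


# "sort by actor, limit 1": column 1 holds the actor.
top = sort_and_limit(rows, column=1, limit=1)
```

Because the sort runs over complete rows, the limit applies to the combined (genre, actor) result rather than to either aggregation level on its own.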
I have seen that you can use script fields to return single rows. But can
you sort and limit by a script field?