Hi Elastic,
I want to learn how Elasticsearch computes the simplest single-value metrics aggregations (count, sum, avg, max, min) on large data. Does each shard aggregate its own documents first, with a final node then combining the shard-level aggregates? Or does each shard send all matched documents to one node, which then does the aggregation? If there is a link to the code, I can also read that. Thank you!
Here is an example query.
{
  'aggs' : {
    'ok_rate' : {
      'avg' : {
        'script' : {
          'source' : 'doc.status.value == 200 ? 1 : 0'
        }
      }
    }
  },
  'query' : {
    'bool' : {
      'filter' : [{
        'range' : {
          '@timestamp' : {
            'gte' : time,
            'lte' : time,
            'format' : 'epoch_second'
          }
        }
      }]
    }
  },
  'size' : 0
}
Hi cecoke,
It is the first option. Sending every matched document to one node would not be a scalable approach. Each shard returns interim aggregation state, which a coordinating node merges to produce the final result. Example interim state for average is here: https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/search/aggregations/metrics/InternalAvg.java#L90
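To make the shard-then-merge idea concrete, here is a minimal Python sketch of an average like the ok_rate above. It is only an illustration, not Elasticsearch's actual code: the names PartialAvg, shard_collect and coordinator_reduce are hypothetical, and it assumes the interim state for avg is simply a running sum plus a document count.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class PartialAvg:
    """Hypothetical interim state for avg: a running sum and a doc count."""
    total: float = 0.0
    count: int = 0


def shard_collect(statuses: Iterable[int]) -> PartialAvg:
    """Runs on each shard: aggregate only that shard's matching documents."""
    partial = PartialAvg()
    for status in statuses:
        # Same expression as the script in the query: 1 if status is 200, else 0.
        partial.total += 1 if status == 200 else 0
        partial.count += 1
    return partial


def coordinator_reduce(partials: Iterable[PartialAvg]) -> Optional[float]:
    """Runs on the coordinating node: merge the small per-shard states."""
    partials = list(partials)
    total = sum(p.total for p in partials)
    count = sum(p.count for p in partials)
    # No matching documents anywhere means there is no average to report.
    return total / count if count else None


# Three made-up shards holding the status field of their matching documents.
shards = [[200, 200, 500], [200, 404], []]
partials = [shard_collect(s) for s in shards]
ok_rate = coordinator_reduce(partials)
```

Only a (sum, count) pair per shard crosses the network in this sketch, never the documents themselves. Keeping the sum and count separate until the final merge also matters for correctness: averaging the per-shard averages would weight a small shard the same as a large one. Count, sum, max and min can be merged the same way, with the coordinating step adding the counts or sums, or taking the max or min of the per-shard values.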
Thank you so much for your help, Mark!
Closing.