I need to calculate some metrics for a dashboard view of a ~30 GB index.
As I understand it, the sampler aggregation can be used to perform faster calculations on a small sample of the data. However, even with a very small sample size the performance I get is abysmal, which does not make sense and defeats the purpose of using the sampler aggregation.
The sampler aggregation collects the best-scoring docs. In your example request there is no query, so there is no notion of "best": it simply iterates over every doc in the index looking for the highest-scoring ones (in your example they will all score 1).
You then filter this sample by your date range.
It would make more sense to use the search index and put your range criteria in the query part of the request. That way we would only iterate over the docs that match the criteria.
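A minimal sketch of what that could look like, with the range moved into the query and the sampler only wrapping the metric (the index name, date field, metric field, and `shard_size` here are placeholders, not taken from your request):

```json
POST /my-index/_search
{
  "size": 0,
  "query": {
    "range": {
      "timestamp": { "gte": "now-7d/d", "lte": "now" }
    }
  },
  "aggs": {
    "sample": {
      "sampler": { "shard_size": 200 },
      "aggs": {
        "avg_latency": { "avg": { "field": "latency_ms" } }
      }
    }
  }
}
```

With the range in the query, Lucene uses the index to skip straight to matching docs, and the sampler then caps the per-shard doc count fed into the metric via `shard_size`.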
ELK 7.2.1
(As far as I understand, the score function is evaluated for the entire index during the aggregation, which takes a long time. Uniformly sampling a few documents should be faster.)