I need a way to create a transform that will aggregate the average of a field but without the outliers (let's say all values that falls between 10%-90% percentiles). for example if I have the following values:
[1,2,3,4,5,6,7,8,9,10]
Unfortunately there is no out of the box solution for this.
What I can think of:
You could in addition to you existing group_by further group by histogram. In the aggregation you need to calculate an average for that bucket and you need the count of documents (value_count on one of the group_by fields).
The transform would create a document for every histogram bucket. In a 2nd pass you can query the transform dest index, using a range query to filter out the outliers and aggregate using a weighted average aggregations, this is where you need the count as weight.
The other idea: filter the outliers already in the transform or use a filter aggregation in the transform with a avg child aggregation.
To get the left and right cut off value you can use a percentiles aggregation.
To sum it up, I do not see a solution that can be easily implemented but both ideas require extra work and additional queries or at least 2 transforms.
Apache, Apache Lucene, Apache Hadoop, Hadoop, HDFS and the yellow elephant
logo are trademarks of the
Apache Software Foundation
in the United States and/or other countries.