This script for transformation is working fine, where we want to know any other alternate method to implement the same, because it required multiple scripted fields for multiple use cases in same transform index where it may create loading issues.
I do not see anything wrong with your approach. Getting the last state is one of the top ask for transform and we might have a ootb solution for that in future. Today you need scripted_metric, we know this is complicated and fragile, but as said, there is no alternative at the moment.
Regarding your config: Are you always only interested in data points with duty == 2 (or duty>=2)? If so, I think it is better for performance, if you filter in the query, instead of filtering as part of scripted_metric. It would also be possible to put a filter aggregation right before scripted_metric, in case you do not want to filter globally.
Actually we want to have many max dates based on individual duty codes using multiple scripted_metric fields in same transform index, index filtering or filter aggregation may not suit for our case.
That means you filter out visited!=2 before the scripted_metric (and you can remove the check in the script). Note that the result would have an additional level (can be later moved using an ingest pipeline):
"visited_twice": {
"visitdate_doc": { ... }
}
This solution should work, however I can not say if/how much performance you gain.
(FWIW our benchmark tool rally has support for transform.)
we are creating few fields based on above sub-agg filter with range, where we want to create a fields based on 30 day and 45 day... avg value in transform index.
Thanks for the reply, we are checking earlier transform script on Elasticsearch 7.5. Also now i cross verified the feature with Elasticsearch 7.8 on sample ecommerce data, though transform runs but it throws null value in place of avg aggs,
You are grouping by day_of_week. So, your buckets will be the customer and each individual day of week (like SUNDAY, MONDAY, etc.).
Since you are filtering on now-7d, that means that if that particular customer say John did not place an order on a Monday in the last 7 days, that value will be null as it does not exist.
FWIW, I tried this myself, and did get some results, but yes, there are many null ones. Which makes sense, as not every customer makes an order every day of the week.
Apache, Apache Lucene, Apache Hadoop, Hadoop, HDFS and the yellow elephant
logo are trademarks of the
Apache Software Foundation
in the United States and/or other countries.