To delete old raw data without impacting the continuous lifetime cardinality aggregation transform, it was suggested that I stack two transforms: the first aggregates the raw data into hourly summaries, and its output is fed into a second transform that aggregates the hourly summaries into a lifetime total.
I am not sure how to set this up in practice, because the output of the hourly cardinality transform is just a single integer (the number of unique values seen in that hour), not a data structure like an HLL sketch that can be rolled up to a higher level. Is stacking the right way to do this, or is there a different approach?
I just re-read my previous post and it sounds misleading; sorry for not being clear. It should have said that the stacked approach does not work for cardinality: distinct counts cannot be summarized this way, because a count from one bucket cannot be combined with a count from another without double-counting values that appear in both.
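A minimal sketch of why summing per-bucket distinct counts fails (plain Python, with made-up hourly data): a value that appears in two different hours is counted once per hour, so the sum of the hourly integers exceeds the true lifetime cardinality.

```python
# Hypothetical example data: user IDs seen in two consecutive hours.
hour_1 = ["alice", "bob", "carol"]
hour_2 = ["bob", "carol", "dave"]

# What the stacked transforms would produce: a plain integer per hour...
hourly_counts = [len(set(hour_1)), len(set(hour_2))]  # [3, 3]

# ...and their sum, which double-counts "bob" and "carol".
naive_lifetime = sum(hourly_counts)  # 6

# The true lifetime cardinality needs the underlying values (or a
# mergeable sketch), not just the per-hour integers.
true_lifetime = len(set(hour_1) | set(hour_2))  # 4

print(naive_lifetime, true_lifetime)  # 6 4
```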
There have been ideas about making HLL a native data type in Lucene, so that the sketch itself could be stored and updated rather than only a final integer count being returned.
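To illustrate what a mergeable HLL type would buy, here is a sketch using the Apache DataSketches Python library. This is not something Elasticsearch or Lucene exposes today; the library choice, the `LG_K` precision value, and the sample data are assumptions purely for illustration of the mergeability property the Lucene idea would rely on.

```python
from datasketches import hll_sketch, hll_union

LG_K = 12  # precision parameter (2^12 internal buckets); chosen arbitrarily here

# Build one HLL sketch per hourly bucket from that hour's raw values.
hour_1 = hll_sketch(LG_K)
for value in ["alice", "bob", "carol"]:
    hour_1.update(value)

hour_2 = hll_sketch(LG_K)
for value in ["bob", "carol", "dave"]:
    hour_2.update(value)

# Because HLL sketches are mergeable, the lifetime cardinality can be
# rolled up from the hourly sketches after the raw data is deleted.
union = hll_union(LG_K)
union.update(hour_1)
union.update(hour_2)

print(round(union.get_result().get_estimate()))  # ~4, not 6
```

This is exactly the property the plain integer output lacks: the union of two sketches estimates the cardinality of the combined value set, so duplicates across hours are not double-counted.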