I am looking for a feasible way to roll up data I have stored in Elasticsearch. The records are time series based and can be grouped by timestamp, host, and URL path. What I had in mind was a cron job that looks at all records that are one day old and not yet merged into a granularity. It would then bulk write the new merged records into the same index, and once that completes, run a delete-by-query that removes documents in the specified date range that have no granularity. I would eventually want to configure the cron job to also run at monthly/yearly granularity, and to collapse data to single-document granularity once it reaches a certain age. A rough sketch of what I mean is below.
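To make the question concrete, here is roughly the shape of the daily job I have in mind. This is only a minimal Python sketch using the official elasticsearch client; the index name `metrics`, the fields `timestamp`/`host`/`path`/`hits`, the `granularity` marker field, and the use of a composite aggregation (which needs a reasonably recent cluster) are all placeholders for my actual mapping:

```python
# Daily rollup sketch: aggregate yesterday's raw docs by (day, host, path),
# bulk-index one merged doc per bucket, then delete the raw docs.
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

es = Elasticsearch()
INDEX = "metrics"                                   # placeholder index name
DAY_RANGE = {"gte": "now-2d/d", "lt": "now-1d/d"}   # the day being rolled up

# Raw docs only: anything already rolled up carries a `granularity` field.
RAW_DOCS_QUERY = {"bool": {
    "filter": [{"range": {"timestamp": DAY_RANGE}}],
    "must_not": [{"exists": {"field": "granularity"}}],
}}

def rollup_actions():
    """Page through every (day, host, path) bucket with a composite agg."""
    after = None
    while True:
        composite = {
            "size": 1000,
            "sources": [
                {"day": {"date_histogram": {"field": "timestamp",
                                            "calendar_interval": "1d"}}},
                {"host": {"terms": {"field": "host"}}},
                {"path": {"terms": {"field": "path"}}},
            ],
        }
        if after:
            composite["after"] = after
        resp = es.search(index=INDEX, body={
            "size": 0,
            "query": RAW_DOCS_QUERY,
            "aggs": {"rollup": {
                "composite": composite,
                "aggs": {"total_hits": {"sum": {"field": "hits"}}},
            }},
        })
        agg = resp["aggregations"]["rollup"]
        for bucket in agg["buckets"]:
            yield {
                "_index": INDEX,
                "_source": {
                    "timestamp": bucket["key"]["day"],
                    "host": bucket["key"]["host"],
                    "path": bucket["key"]["path"],
                    "hits": bucket["total_hits"]["value"],
                    "granularity": "day",   # mark the doc as a daily rollup
                },
            }
        after = agg.get("after_key")
        if not after:
            break

# Write the merged documents, then remove the raw ones they replace.
bulk(es, rollup_actions())
es.delete_by_query(index=INDEX, body={"query": RAW_DOCS_QUERY})
```

The monthly/yearly passes would presumably be the same loop with a coarser `date_histogram` interval and a filter on `granularity: "day"` instead of the must_not clause.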
What I am unsure about is the strategy needed to aggregate the data. For an input where there would be millions of records to aggregate, is this something I can handle with a single Elasticsearch query that fetches the aggregations, or will I have to use something such as Hadoop/MapReduce to read and aggregate the data? Would it be better to store each granularity in a separate index to make the rollup job easier?