Re-index aggregation ElasticSearch

Hey,

I have an Elasticsearch cluster with one index per day.
I need to be able to query all the historical data, but storing it requires lots of space.
So I'd like to aggregate old indices by some dimensions into time buckets (e.g. 10 minutes).
Queries against the aggregated data would then have a minimum time resolution equal to the bucket size, but that's fine for me.
I would also like to do this without exporting the data out of Elasticsearch.
Druid, the time series DB, supports this by making a reindex call with rollup.
As far as I know, reindex in Elasticsearch does not support aggregation queries (or at least I was unable to make it work).
Does anyone know whether there is a way to do this in Elasticsearch, and if so, what it is?
There is already a topic with a similar title on a forum (https://discuss.elastic.co/t/historical-data-rollup/25512), but the issue there seems completely different to me.

Thanks

Hi @cfifua,

It's doable but requires watches from x-pack. You can try x-pack for free for 30 days, but after that period your watches will stop executing unless you buy a license. For completeness' sake, this should get you started:

PUT _xpack/watcher/watch/my_rollup
{
   "trigger": {
      "schedule": {
         "monthly" : { "on" : 1, "at" : "midnight" }
      }
   },
   "input": {
      "search": {
         "request": {
            "indices": ["rally-2017"],
            "types": ["metrics"],
            "body": {
               "query": {
                  "match_all": {}
               },
               "size": 0,
               "aggs": {
                  "my_metric_keys": {
                     "terms": {
                        "field": "name"
                     }
                  }
               }
            }
         }
      }
   },
   "actions": {
      "index_payload": {
         "transform": {
            "script": "return ctx.payload.aggregations.my_metric_keys"
         },
         "index": {
            "index": "reports",
            "doc_type": "type"
         }
      }
   }
}

Without x-pack I'd write a small script that executes the query and bulk-indexes the results and just schedule it with cron to run once per month.
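
A minimal sketch of such a script, using only the Python standard library. Several things here are assumptions rather than anything from your setup: the cluster URL, the `@timestamp` field name, the `per_bucket` aggregation name, and the 10 minute `date_histogram` wrapped around the `terms` aggregation from the watch above. It also assumes a 5.x/6.x-style cluster (`interval` in the histogram and `_type` in the bulk action); newer versions use `fixed_interval` and have dropped types.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"   # assumption: local, unsecured cluster
SOURCE_INDEX = "rally-2017"        # same index as in the watch above
TARGET_INDEX = "reports"           # same target index as in the watch above


def build_rollup_query(interval="10m"):
    """Build a search body that groups documents into time buckets,
    then by metric name inside each bucket."""
    return {
        "size": 0,  # only the aggregations are needed, not the hits
        "query": {"match_all": {}},
        "aggs": {
            "per_bucket": {
                # assumption: documents carry their time in "@timestamp"
                "date_histogram": {"field": "@timestamp", "interval": interval},
                "aggs": {
                    # note: terms returns only the top 10 keys by default
                    "my_metric_keys": {"terms": {"field": "name"}}
                },
            }
        },
    }


def buckets_to_bulk(response, index=TARGET_INDEX, doc_type="type"):
    """Turn the nested aggregation buckets into a _bulk NDJSON payload.
    Returns an empty string when there is nothing to index."""
    lines = []
    for time_bucket in response["aggregations"]["per_bucket"]["buckets"]:
        for key_bucket in time_bucket["my_metric_keys"]["buckets"]:
            action = {"index": {"_index": index, "_type": doc_type}}
            doc = {
                "timestamp": time_bucket["key"],  # bucket start, epoch millis
                "name": key_bucket["key"],
                "doc_count": key_bucket["doc_count"],
            }
            lines.append(json.dumps(action))
            lines.append(json.dumps(doc))
    # the bulk API requires a trailing newline after the last line
    return "\n".join(lines) + "\n" if lines else ""


def post(path, body, content_type):
    """POST a string body to the cluster and return the parsed JSON reply."""
    request = urllib.request.Request(
        ES_URL + path,
        data=body.encode("utf-8"),
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as reply:
        return json.loads(reply.read().decode("utf-8"))


def run_rollup():
    """Run the aggregation and index one document per (time bucket, name)."""
    result = post("/%s/_search" % SOURCE_INDEX,
                  json.dumps(build_rollup_query()), "application/json")
    payload = buckets_to_bulk(result)
    if payload:
        post("/_bulk", payload, "application/x-ndjson")
```

To use it you would add a call to `run_rollup()` at the bottom and schedule the file with cron, e.g. `0 0 1 * * python3 /path/to/rollup.py` (the path is a placeholder) to match the monthly schedule of the watch. For a large index a single search with nested buckets can get heavy, so you may want to run it per daily index instead of over everything at once.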

Daniel
