Rollup - Date Histogram aggregation


(Steven Liu) #1

Hi Guys

I have a rollup index (interval: 1d) created. But when do _rollup_search, I cannot use date histogram aggregation for larger interval (e.g. 1M, 1y). Only let me use 1d as defined in the rollup job config. Any ideas?

Cheers
Steve


(Zachary Tong) #2

Hi @Steven_Liu. Do you mean an exception is thrown, or it just doesn't show results/results are empty?

Note that there is a difference in calendar vs. fixed time (unfortunately). We're working on a PR to make this clearer in Rollup. I'm not sure if that's what you're running into, but wanted to mention.


(Steven Liu) #3

Thanks for the reply @polyfractal .

I got the exception thrown once using data histogram other than 1d:

{
    "error": {
        "root_cause": [
            {
                "type": "illegal_argument_exception",
                "reason": "failed to parse setting [source.date_histogram.interval] with value [1M] as a time value: unit is missing or unrecognized"
            }
        ],
        "type": "illegal_argument_exception",
        "reason": "failed to parse setting [source.date_histogram.interval] with value [1M] as a time value: unit is missing or unrecognized"
    },
    "status": 400
}

I was trying to run _rollup_search and aggregate my daily rollup index into monthly. It worked only if use 'Days' (e.g. 1d or 30d) interval which is the same as in rollup group config. I read this link https://www.elastic.co/guide/en/elasticsearch/reference/current/_rollup_is_multi_interval_aware.html thought it is possible to aggregate with lower granular in 6.4??


(Zachary Tong) #4

Hm, not sure off the top of my head. You're correct, it should be possible to aggregate with larger intervals.

Could you paste the Rollup config, the query, and the exception full stack trace? I want to try and recreate on my side to make investigating easier.

Thanks!


(Steven Liu) #5

Cheers @polyfractal

My rollup config:

{
    "index_pattern": "payment_index",
    "rollup_index": "payment_rollup_index",
    "cron": "0 0/1 * 1/1 * ? *",
    "page_size" :1000,
    "groups" : {
      "date_histogram": {
        "field": "aud_datetime",
        "interval": "1d"
      },
      "terms": {
        "fields": ["cust_id", "payment_type", "payment_status"]
      }
    },
    "metrics": [
        {
            "field": "payment_amount_gbp",
            "metrics": ["min", "max", "sum","avg"]
        }
    ]
}

Search query:

{
  "size": 0,
  "aggs": {
  	"group_by_cust": {
        "terms": {
            "field": "cust_id"
        },
        "aggs": {
	        "payment": {
		    	"date_histogram": {
			    	"field": "aud_datetime",
			        "interval": "1M"
					},
					"aggs": {
				    	"total": {
				        	"sum": {
		            		"field": "payment_amount_gbp"
		        			}
				        }
		        		
		        	}
		    	}
    		}
		}
	}
}

As mentioned, I would like to create a monthly rollup index. However, I cannot use '1M' as data histogram interval. It through the same error.


(Zachary Tong) #6

:+1: roger, thanks for the config. Will poke at this on my end and get back to you.


(Zachary Tong) #7

Heya @Steven_Liu, found the issue. There is a bug in the validation code that tries to parse all request intervals as "fixed" time, which is why it fails for your calendar time 1M or month.

This was fixed in the refactor from https://github.com/elastic/elasticsearch/pull/32052. I went ahead and opened a new PR to backport the fix to 6.4.1: https://github.com/elastic/elasticsearch/pull/33284

For now, the fix is as you said: use a lower granularity value to express the month, like 30d. This is only sorta a fix though, because 30d is a "fixed" interval whereas 1M can change duration depending on which month. That's the impetus for the #32052 PR... to make the difference clear because it is confusing and trappy for users.

Even after the fix is backported and 6.4.1 released, I would encourage using only "fixed" time intervals. So instead of rolling up 1M, I would configure it to rollup 30d. You'll lose calendar aspects of variable-days-per-month, but gain a ton of querying flexibility. Calendar units are only available in "single" quantities (1d, 1w, 1M, 1q, 1Y), whereas fixed units allow multiples (5d, 100s, 2M, etc).

See the new 6.x docs here and here (under "Calendar vs Fixed Time Intervals") for some more explanation.


(Steven Liu) #8

Cheers @polyfractal. That makes a log sense now. Thanks very much


(system) closed #9

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.