SQL histogram returns unexpected offset

I'm trying to create a Histogram query using the SQL language. My goal is to get 3 buckets covering 30 days each. The following query does work but gives me an unexpected offset:

POST _sql?format=txt
{ 
  "query": "SELECT HISTOGRAM (\"@timestamp\" , INTERVAL 30 DAY) AS date_event_startperiod, MIN(\"@timestamp\") as minTimestamp, MAX(\"@timestamp\") as maxTimestamp, COUNT (*) as event_count FROM myindex-* WHERE \"@timestamp\" BETWEEN  DATE_TRUNC('day', NOW() - INTERVAL 90 DAY) AND DATE_TRUNC('day', NOW()) GROUP BY date_event_startperiod" }

The firstbucket is indicated as "date_event_startperiod" = '2021-08-01T00:00:00.000Z' and the 2nd 30days later. However the first document in this bucket dates from 2021-08-17 (which corresponds with the " NOW()-INTERVAL 90 DAYS" ).

so this is the full table I get back

 date_event_startperiod |      minTimestamp      |      maxTimestamp      |  event_count  
------------------------+------------------------+------------------------+---------------
2021-08-01T00:00:00.000Z|2021-08-17T00:00:00.021Z|2021-08-30T23:59:59.917Z|42950380       
2021-08-31T00:00:00.000Z|2021-08-31T00:00:00.351Z|2021-09-29T23:59:59.817Z|112407968      
2021-09-30T00:00:00.000Z|2021-09-30T00:00:00.807Z|2021-10-29T23:59:59.201Z|120844970      
2021-10-30T00:00:00.000Z|2021-10-30T00:00:00.140Z|2021-11-14T23:59:59.860Z|55325122  

Any suggestions how to fix this?

This is how the date_histogram works in Elasticsearch (ES SQL uses a date_histogram aggregation for a date sql HISTOGRAM) and the value you see there is the start date of the 30 days bucket in which those documents fall in. You could use the _translate API and see what query DSL we generate and run that in ES itself.

What was your expectation regarding the output of the HISTOGRAM function?

@Andrei_Stefan thanks a lot for reacting to my post!
I was hoping (maybe not so much expecting ;)) that my WHERE clause would effect where the HISTOGRAM would start. Imho the start of the month is somewhat arbritary, why not the start of the year?
So is there a way for me to influence where the HISTOGRAM starts, so the result of my query would give me 3 buckets of each 30days " wide" which would cover exactly the last 90 days?

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.