Getting Doc Counts using a Rollup query

I am currently using an Aggregation query to get Document Counts from a Date Histogram of 1 Minute samples. This is my current query:

scroll_query = {
  "query": {
    "range": {
      "Timestamp": {
        "gte": "now-30d/d",
        "lte": "now"
      }
    }
  },
  "size": 20,
  "aggs": {
    "resample": {
      "date_histogram": {
        "field": "Timestamp",
        "interval": "minute"
      }
    }
  }
}

And I am able to get Document Counts for each 1 minute time window:

{'resample': {'buckets': [{'key_as_string': '2019-09-20T11:30:00.0000000Z',
    'key': 1568979000000,
    'doc_count': 677},
   {'key_as_string': '2019-09-20T11:31:00.0000000Z',
    'key': 1568979060000,
    'doc_count': 648},
   {'key_as_string': '2019-09-20T11:32:00.0000000Z',
    'key': 1568979120000,
    'doc_count': 1873}

I am trying to convert this query into a "Rollup" job. I only need the document counts for the buckets on Timestamp.

I submitted this Rollup job:

rollup_payload = {
    "index_pattern": "cn_index",
    "rollup_index": "cn_rollup",
    "cron": "*/30 * * * * ?",
    "page_size": 1000,
    "groups": {
      "date_histogram": {
        "field": "Timestamp",
        "interval": "minute"
      }
    }
}

When I run a query on this rollup, I get errors:

GET cn_rollup/_rollup_search
{
    "size" : 0
}

I tried this query using different parameters, such as changing size to 1000, but that throws a 400 Error. Is it possible to get the same results using Rollup?

Hi,

Your rollup configuration is missing a metric. If you use value_count on the Timestamp field, you basically get the doc count you are looking for:

    "metrics": [
        {
            "field": "Timestamp",
            "metrics": ["value_count"]
        }
    ]
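
Merged into the job you submitted, the full configuration could look like the sketch below. It is written as a Python dict like your rollup_payload; everything except the "metrics" block is copied from your post, and the job ID mentioned in the comment (cn_job) is only a placeholder.

```python
import json

# Sketch of the rollup job body with the value_count metric added.
# It would be sent with something like: PUT _rollup/job/cn_job
# ("cn_job" is a hypothetical job ID, not taken from the original post).
rollup_payload = {
    "index_pattern": "cn_index",
    "rollup_index": "cn_rollup",
    "cron": "*/30 * * * * ?",
    "page_size": 1000,
    "groups": {
        "date_histogram": {
            "field": "Timestamp",
            "interval": "minute",
        }
    },
    # New part: count the values of Timestamp in every bucket.
    "metrics": [
        {
            "field": "Timestamp",
            "metrics": ["value_count"],
        }
    ],
}

# Serialize to the JSON string that goes into the request body.
rollup_json = json.dumps(rollup_payload, indent=2)
```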

When using _rollup_search, you also need to specify an aggregation: https://www.elastic.co/guide/en/elasticsearch/reference/7.x/rollup-search.html
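
As a rough sketch, the request body for GET cn_rollup/_rollup_search could be built as below. It reuses the "resample" date histogram from your original query and keeps size at 0, since as far as I know rollup search cannot return hits, which would explain the 400 you got with size 1000. The helper names build_rollup_search_body and doc_counts are made up for this example, and I assume the response buckets look like the ones in your post.

```python
def build_rollup_search_body(field="Timestamp", interval="minute"):
    """Body for GET cn_rollup/_rollup_search (hypothetical helper).

    The interval is assumed to match the one configured in the rollup job.
    """
    return {
        "size": 0,  # no hits, only aggregations
        "aggs": {
            "resample": {
                "date_histogram": {"field": field, "interval": interval}
            }
        },
    }


def doc_counts(aggregations):
    """Map each bucket's key_as_string to its doc_count.

    `aggregations` is the dict shown in the question:
    {'resample': {'buckets': [...]}}.
    """
    buckets = aggregations["resample"]["buckets"]
    return {b["key_as_string"]: b["doc_count"] for b in buckets}
```

The same doc_counts helper should also work on the result of your original aggregation query, since both responses are assumed to share the bucket layout.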

However, yours is not the typical rollup use case. You do not actually need _rollup_search, because the rollup job already aggregates your data the way you need it. You can simply search the index you created, "cn_rollup". You do not even need the metric I suggested above, because each rollup document already has a field for the doc count: "Timestamp.date_histogram._count"
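
Here is a minimal sketch of reading those counts from a plain GET cn_rollup/_search response in Python. Only "Timestamp.date_histogram._count" is confirmed above; the bucket start field "Timestamp.date_histogram.timestamp" is my assumption about how the rollup documents are laid out, so check one document in your index first. The helper names are invented for this example.

```python
COUNT_FIELD = "Timestamp.date_histogram._count"
# Assumption: the bucket start time is stored under this name.
TIME_FIELD = "Timestamp.date_histogram.timestamp"


def get_dotted(source, path):
    """Read `path` from _source, whether keys are flat or nested."""
    if path in source:
        return source[path]
    node = source
    for part in path.split("."):
        node = node[part]
    return node


def counts_from_hits(response):
    """Return (bucket_time, doc_count) pairs sorted by bucket time."""
    rows = []
    for hit in response["hits"]["hits"]:
        src = hit["_source"]
        rows.append((get_dotted(src, TIME_FIELD), get_dotted(src, COUNT_FIELD)))
    return sorted(rows)
```

Keep in mind that a normal search returns only "size" hits per request, so for 30 days of 1 minute buckets you would need to page through the results.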

Last but not least, I want to mention another alternative. You might be interested in transform: https://www.elastic.co/guide/en/elasticsearch/reference/7.4/put-transform.html

I think this is better suited to your use case. It seems you are creating some sort of feature index, and transform is made for exactly that.

An example:

PUT _data_frame/transforms/cn_transform
{
  "source": {
    "index": [
      "cn_index"
    ]
  },
  "pivot": {
    "group_by": {
      "time_bucket": {
        "date_histogram": {
          "field": "Timestamp",
          "fixed_interval": "1m"
        }
      }
    },
    "aggregations": {
      "count": {
        "value_count": {
          "field": "Timestamp"
        }
      }
    }
  },
  "dest": {
    "index": "cn_transform"
  },
  "sync": {
    "time": {
      "field": "Timestamp",
      "delay": "60s"
    }
  }
}


POST _data_frame/transforms/cn_transform/_start

This creates a transform and starts it. The transform continuously pivots the data from the source index according to the configuration and writes the results to the specified destination index. Apart from the count, you might want to create more features. Please have a look at the transform documentation; the examples there should help you.

I hope this helps!
