Rollup - Store statistics (just a number)

Hi.

I need to store historical data from my indexes, specifically the number of requests per minute that my web services receive over time. Is it possible to store only this information? If I create a rollup job using only the fields I need to extract this information, the index becomes very large, depending on how many days the job processes. I only want numerical statistics. Is that possible?

Thanks in advance

Víctor

Rollup stores extra meta information in order to provide rollup search. If you do not need rollup search, transform might be an option for you: it only stores what you ask for, and if you want to compress further, you can tweak the mappings to use smaller data types.
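As a rough sketch of the "smaller data types" idea: a transform creates its destination index with deduced mappings if the index does not exist yet, so one option is to create the index beforehand with narrower types. The body below could be sent with PUT to the destination index. The index layout and the field names `minute`, `requests` and `avg_response_msecs` are placeholders, not taken from the thread, and `integer` and `float` are only safe if the values fit into them.

```json
{
  "mappings": {
    "properties": {
      "minute": { "type": "date" },
      "requests": { "type": "integer" },
      "avg_response_msecs": { "type": "float" }
    }
  }
}
```

The field names in the mapping have to match the group_by and aggregation names used in the transform, otherwise the transform's output lands in dynamically mapped fields instead.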

Whether you use rollup or transform, the reduction should mainly depend on the bucket size you choose, in your case the date histogram interval.
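To make the bucket-size point concrete: a 1-minute interval produces at most 1,440 documents per day for each combination of the other group_by values, while a 1-hour interval produces at most 24. A minimal sketch of a request body for POST _transform/_preview that counts requests per minute might look like this. The index name `my_index` and the output names `minute` and `requests` are assumptions for illustration.

```json
{
  "source": {
    "index": "my_index"
  },
  "pivot": {
    "group_by": {
      "minute": {
        "date_histogram": {
          "field": "@timestamp",
          "fixed_interval": "1m"
        }
      }
    },
    "aggregations": {
      "requests": {
        "value_count": { "field": "@timestamp" }
      }
    }
  }
}
```

Changing `fixed_interval` to a larger value such as `1h` is the main lever for shrinking the destination index.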

Hi Hendrik.

Transform is what I need! I created a transform to extract only the information I need, but when I go to Discover and use the new index I created, there's no option to filter by time. What can I do to filter by time?

Kind regards

Víctor

Hi,

that's a current limitation, see this issue. The workaround is to either create the index pattern yourself instead of using the transform wizard, or delete the existing index pattern and create a new one. The limitation in the management UI has its own issue (which contains a third option: manually update the index pattern).
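The workaround is normally done in the Kibana management UI. For anyone who prefers to script it, a possible alternative (an assumption, not something suggested in the thread) is the Kibana saved objects API, assuming a Kibana version that exposes it: a POST to `/api/saved_objects/index-pattern/<id>` on the Kibana host, with a `kbn-xsrf` header, and a body along these lines. The title `my_dest_index` and the time field `timestamp` are placeholders; `timeFieldName` has to name a date field that actually exists in the destination index.

```json
{
  "attributes": {
    "title": "my_dest_index",
    "timeFieldName": "timestamp"
  }
}
```

With a time field set on the index pattern, Discover should show the time filter for that index.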

Hope that fixes it.

Got it!

Thanks a lot Hendrik

Hi Hendrik.

One more question. If I want to filter by time in Discover, is it required to group the index data by timestamp in the transform? Is there another way to do it?

Regards


I think it makes the most sense in the group_by. You can of course have time fields in aggregations, too, e.g. a last_updated field. Still, if you do not group_by time, it will not result in a time series.
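A minimal sketch of the "time field in aggregations" variant, assuming a hypothetical keyword field `service` to group on: the `max` aggregation on `@timestamp` yields a `last_updated` value per group, which should come out as a date in the destination index. Each group still produces a single document, so this gives a "last seen" time rather than a time series.

```json
{
  "pivot": {
    "group_by": {
      "service": {
        "terms": { "field": "service" }
      }
    },
    "aggregations": {
      "last_updated": {
        "max": { "field": "@timestamp" }
      }
    }
  }
}
```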

I am not sure I am getting your question, can you explain what you want to do?

I'm sending all traffic from my F5 load balancer to Elastic. At the moment it is just for 5 services, which are identified by the virtual_ip field. For each service there are many different requests, which I differentiate by the http_path field. All I want is a transform that stores in an index the total requests for a specific virtual_server, the total requests for a specific http_path of that virtual_server, the average response time and, if possible, the status of each request. Here's my transform code:

POST _transform/_preview
{
  "source": {
    "index": "my_index",
    "query": {
      "bool": {
        "must": [
          {
            "match": {
              "virtual_ip": "x.x.x.x"
            }
          }
        ],
        "filter": [
          {
            "term": {
              "http_path.keyword": "my_path/my_file.aspx"
            }
          },
          {
            "range": {
              "@timestamp": {
                "time_zone": "+02:00",
                "gte": "2020-05-07T00:00:00",
                "lte": "2020-05-08T00:00:00"
              }
            }
          }
        ]
      }
    }
  },
  "dest": {
    "index": "my_dest_index"
  },
  "pivot": {
    "group_by": {
      "status.keyword": {
        "terms": {
          "field": "status.keyword"
        }
      }
    },
    "aggregations": {
      "response_msecs.avg": {
        "avg": {
          "field": "response_msecs"
        }
      },
      "count": { "value_count": { "field": "@timestamp" } }
    }
  }
}

I don't want to use a time range because that means I would have to create an index for each day, for example.

Regards

Thanks, always easier to work with examples.

Adding a date_histogram in group_by should work, however you said you want "another way". What's the problem with something like this:

"group_by": {
  "day_bucket": {
    "date_histogram": {
      "field" : "@timestamp",
      "calendar_interval" : "1d"
    }
  },
  "status.keyword": {
    "terms": {
      "field": "status.keyword"
    }
  }
}
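Putting the pieces of this thread together, one possible shape for the complete transform is sketched below, as a body for PUT _transform/<transform_id> followed by POST _transform/<transform_id>/_start. The `sync` section, the 60s delay and the extra `http_path.keyword` grouping are assumptions that go beyond what was posted: `sync` makes the transform continuous, so the fixed date range in the source query, and with it one index per day, is no longer needed, and grouping by path replaces the single-path term filter.

```json
{
  "source": {
    "index": "my_index",
    "query": {
      "match": { "virtual_ip": "x.x.x.x" }
    }
  },
  "dest": {
    "index": "my_dest_index"
  },
  "sync": {
    "time": { "field": "@timestamp", "delay": "60s" }
  },
  "pivot": {
    "group_by": {
      "day_bucket": {
        "date_histogram": { "field": "@timestamp", "calendar_interval": "1d" }
      },
      "http_path.keyword": {
        "terms": { "field": "http_path.keyword" }
      },
      "status.keyword": {
        "terms": { "field": "status.keyword" }
      }
    },
    "aggregations": {
      "response_msecs.avg": { "avg": { "field": "response_msecs" } },
      "count": { "value_count": { "field": "@timestamp" } }
    }
  }
}
```

Each destination document then holds the request count and average response time for one day, one path and one status, and `day_bucket` can serve as the time field in Discover.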

That is what i was looking for! Thank you very much Hendrik!

Kind Regards

Víctor
