Info about aggregation and periodic tasks

I've installed the ELK stack on a CentOS box recently and I'm testing what it's capable of. I've read some docs, but I can't figure out how to:

  • do some advanced aggregations I'm used to in MongoDB
  • add periodic aggregation tasks to elasticsearch and then plot the output data in kibana.

I'll try to explain what I want to achieve.

I have the following JSON data in ES, populated by logstash:

```
{ "@timestamp": "data_timestamp", "user": "DOMAINID/username", "action": "login" }
```

In kibana I've easily plotted some stats, like active users per hour and total users. But there are two things I can't figure out how to do:

  • aggregate users by domain and count unique users on that domain in a time frame
  • make a periodic task (coded in Groovy) which does some aggregation every hour and stores data in elasticsearch
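For the first point, I imagine the query would look something like the following sketch: a `terms` aggregation on a domain field with a nested `cardinality` aggregation for unique users. This assumes a separate `domain` field has already been extracted at index time (the index pattern `logstash-*` and the field names are just illustrative):

```
POST logstash-*/_search
{
  "size": 0,
  "query": {
    "range": { "@timestamp": { "gte": "now-1h" } }
  },
  "aggs": {
    "by_domain": {
      "terms": { "field": "domain" },
      "aggs": {
        "unique_users": { "cardinality": { "field": "user" } }
      }
    }
  }
}
```

Note that `cardinality` gives an approximate unique count, which should be fine for plotting.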

For the aggregation by domain: from the docs I've seen that kibana supports scripted fields, but those exist only on the kibana side, not in ES itself, so if I develop an app that connects directly to ES, that field won't be there. One option I thought about is to dynamically add a "domain" field to every incoming JSON document in logstash, with the help of some ruby code, but I'm wondering if that's the best approach.
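To illustrate the enrichment idea: instead of a ruby filter, a grok filter in logstash could split the `user` field at index time. A minimal sketch (the `domain` and `username` field names are my own choice, and the pattern assumes `user` always looks like `DOMAINID/username`):

```
filter {
  # Split "DOMAINID/username" into two new fields on each event.
  grok {
    match => { "user" => "^%{DATA:domain}/%{GREEDYDATA:username}$" }
  }
}
```

Once `domain` is stored as a real field, it's available both to ES aggregations and to kibana without scripted fields.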

For the periodic task, I need some directions. Is there an integrated scheduling system in ES, or do I have to develop an application which periodically connects to ES, does the aggregation, and then posts the results back to ES? My goal is to have the complex aggregation happen at the ES level and have the output data available in kibana for plotting. Ideally I'd write some Groovy scripts which do the aggregation.

Any suggestions or links to examples are very appreciated :smile:
Thank you

For the periodic task, you may wish to investigate the Elasticsearch Watcher plugin - that's exactly what it was designed for. It can execute complex ES logic on a schedule, and one of its actions is to index additional data into ES.
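As a rough idea of what that looks like, a watch can run a search on an interval and index the response somewhere else. A hedged sketch (the watch name, index names, field names, and aggregation are all made up for illustration; check the Watcher docs for the exact syntax of your version):

```
PUT _watcher/watch/hourly_domain_stats
{
  "trigger": { "schedule": { "interval": "1h" } },
  "input": {
    "search": {
      "request": {
        "indices": [ "logstash-*" ],
        "body": {
          "size": 0,
          "query": { "range": { "@timestamp": { "gte": "now-1h" } } },
          "aggs": {
            "by_domain": { "terms": { "field": "domain" } }
          }
        }
      }
    }
  },
  "actions": {
    "store_results": {
      "index": { "index": "domain-stats" }
    }
  }
}
```

The index action writes the watch payload into the `domain-stats` index, which Kibana can then read like any other index.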

As for your approach of enriching documents with domain information, I think that's a good one - Kibana scripted fields do not currently allow you to parse strings (unless you use Groovy, which comes with a host of complexity), so you're better off doing that in Logstash and storing the domain as a separate field.