Pipeline aggregations on millions of unique terms

Our index holds log events from 100,000 clients, mixed in a flat timeline. Each doc looks like:

{
  "@timestamp": "...",
  "eventId": "Failed to start process foo.exe",
  "clientId": "f68830d5-b1bf-45d6-b54b-abbf2438b709"
}

I'm trying to write a significant_terms query that answers:

"find unusual eventIds across all clientIds that have a given error".

The missing part of the query below is converting the filter query into the list of all clientId terms that contain the searched error (a two-pass workaround is sketched after the query). Can this be done with pipeline aggregations, given that the filter query can produce thousands of unique clientIds?

Or should I rather maintain another index, where each doc holds all events per clientId?

GET /events/_search
{
  "size" : 0,
  "query" : {
    "terms" : { "eventId.keyword": ["Failed to start process dumper at"]
    }
  },
  "aggregations": {
    "top_unusual_errors": {
      "significant_terms": {
        "field": "eventId.keyword",
        "size" : 10
      }
    }
  }
}
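
For context, the two-pass workaround I'm considering looks like this (a sketch only; it assumes clientId is mapped as a keyword, otherwise clientId.keyword would be needed). The first request harvests the clientIds that saw the error:

GET /events/_search
{
  "size": 0,
  "query": {
    "term": { "eventId.keyword": "Failed to start process foo.exe" }
  },
  "aggregations": {
    "matching_clients": {
      "terms": { "field": "clientId", "size": 10000 }
    }
  }
}

The second request feeds the harvested IDs back in as the foreground set for significant_terms:

GET /events/_search
{
  "size": 0,
  "query": {
    "terms": { "clientId": ["f68830d5-b1bf-45d6-b54b-abbf2438b709", "..."] }
  },
  "aggregations": {
    "top_unusual_errors": {
      "significant_terms": { "field": "eventId.keyword", "size": 10 }
    }
  }
}

This second pass is exactly the part that doesn't scale: the terms aggregation has a size cap and the terms query has a clause limit (index.max_terms_count, 65,536 by default), hence the question about pipeline aggregations.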

Yes, that would be the more scalable approach. There are example scripts and a walk-through here.
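
To make the suggestion concrete, here is a minimal sketch of the entity-centric approach, assuming one document per clientId whose eventIds field accumulates that client's distinct events (the index name clients and the field name eventIds are illustrative, not from the original post):

PUT /clients/_doc/f68830d5-b1bf-45d6-b54b-abbf2438b709
{
  "clientId": "f68830d5-b1bf-45d6-b54b-abbf2438b709",
  "eventIds": [
    "Failed to start process foo.exe",
    "..."
  ]
}

With that shape, the original question becomes a single query, because each matching document already represents one client:

GET /clients/_search
{
  "size": 0,
  "query": {
    "term": { "eventIds.keyword": "Failed to start process foo.exe" }
  },
  "aggregations": {
    "top_unusual_errors": {
      "significant_terms": { "field": "eventIds.keyword", "size": 10 }
    }
  }
}

The foreground set is now "clients having the error" rather than "log lines having the error", which is what the significant_terms statistics need, and there is no intermediate list of thousands of clientIds to carry between requests.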
