Hi there
We have a problem with our ES aggregation query: it takes 10-12s to execute.
Here is our cluster information:
- We have 1 client node, 3 master nodes, and 6 data nodes.
- Each data node has 16 cores and 64GB of memory; we assigned 30GB to the heap.
- Each index has 6 shards and 1 replica.
- Index size is around 15GB per day, with 110 million records.
- The maximum number of segments is 1.
- ES version is 2.2, and doc_values is enabled for all fields.
- The query runs across all indices: 114 shards, 2 billion records, and a total index size of 300GB.
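As a rough sanity check on the figures above, here is a minimal Python sketch of the arithmetic. It assumes the 114 shards are primaries only (a search runs against one copy of each shard) and that shards are spread evenly across the data nodes; the function and constant names are just for illustration.

```python
# Rough sanity check of the cluster figures quoted in the list above.
# Assumption: the 114 shards searched are primaries only, spread evenly
# over the data nodes. All names here are illustrative.
PRIMARIES_PER_INDEX = 6
GB_PER_DAY = 15
DOCS_PER_DAY = 110_000_000
SHARDS_SEARCHED = 114
DATA_NODES = 6


def cluster_estimate():
    # Number of daily indices implied by the shard count.
    indices = SHARDS_SEARCHED // PRIMARIES_PER_INDEX
    return {
        "indices": indices,
        "total_gb": indices * GB_PER_DAY,
        "total_docs": indices * DOCS_PER_DAY,
        "shards_per_data_node": SHARDS_SEARCHED / DATA_NODES,
        "gb_per_shard": GB_PER_DAY / PRIMARIES_PER_INDEX,
    }


if __name__ == "__main__":
    for key, value in cluster_estimate().items():
        print(key, value)
```

Under those assumptions the numbers line up: 19 daily indices give about 285GB and about 2.09 billion records, with each data node serving 19 of the searched shards at roughly 2.5GB per shard.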
Here is my query:
{
  "size": 0,
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "terms": {
                "_cache": true,
                "kName": {
                  "index": "client_index",
                  "type": "client",
                  "id": "123",
                  "path": "cId"
                }
              }
            }
          ]
        }
      }
    }
  },
  "aggs": {
    "dateTerms": {
      "terms": { "field": "date" },
      "aggs": {
        "searchD": {
          "terms": { "field": "prodLog", "size": 10 }
        }
      }
    }
  }
}
Partial response:
"took": 11570,
"timed_out": false,
"_shards": {
  "total": 114,
  "successful": 114,
  "failed": 0
},
"hits": {
  "total": 187772187,
  "max_score": 0,
  "hits": [ ]
So, a couple of questions:
- Since we enabled doc_values, do we still need to assign 30GB to the heap?
- When I fire the query, I see CPU usage reach 90%-100% for a couple of seconds. Does that mean CPU is the bottleneck?
- We have to run lots of terms aggregations against the field "prodLog". Should we disable doc_values and use the fielddata cache instead?
- Is there any way to make it faster?
Any comments are appreciated.
Thanks
Alps