Hello,
We want to aggregate the results by time, project, and a particular term. Our goal is mainly the aggregation itself. The current request looks like this:
```json
{
  "query": {
    "bool": {
      "filter": {
        "bool": {
          "filter": [
            {
              "range": {
                "timestamp": {
                  "from": "now-14d/d",
                  "include_lower": true,
                  "include_upper": true,
                  "to": null
                }
              }
            },
            {
              "range": {
                "timestamp": {
                  "from": "now-14d",
                  "include_lower": true,
                  "include_upper": true,
                  "to": null
                }
              }
            },
            {
              "term": {
                "project": "234234234"
              }
            }
          ]
        }
      },
      "must": {
        "match_all": {}
      }
    }
  },
  "aggs": {
    "testaggs": {
      "aggregations": {
        "filters": {
          "aggregations": {
            "mainType": {
              "aggregations": {
                "timesum": {
                  "sum": {
                    "field": "nest.time"
                  }
                },
                "count": {
                  "sum": {
                    "field": "nest.e"
                  }
                },
                "timepercetile": {
                  "percentiles": {
                    "field": "nest.time",
                    "percents": [
                      50,
                      75,
                      90,
                      95,
                      99
                    ]
                  }
                },
                "fetchSources": {
                  "top_hits": {
                    "_source": {
                      "excludes": [],
                      "includes": [
                        "nest.group",
                        "method"
                      ]
                    },
                    "size": 1
                  }
                }
              },
              "terms": {
                "field": "nest.hash",
                "size": 1000
              }
            }
          },
          "filter": {
            "match_all": {}
          }
        }
      },
      "nested": {
        "path": "nest"
      }
    }
  }
}
```
There is a field called `nest`, mapped as a nested type, which contains `hash` (the hash of the group), `group`, `time` and a count, `c`.
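To make explicit what the aggregation tree computes per group, here is a minimal in-memory sketch, assuming documents have already been filtered by project and time range. It walks the same steps: `nested` on `nest`, the match-all `filter`, `terms` on `nest.hash`, then the sum and percentile sub-aggregations. The sample documents, the `aggregate` and `percentile` helpers, and the simplification of `top_hits` to "first entry's group" are all hypothetical; Elasticsearch computes percentiles approximately, whereas this uses an exact nearest-rank percentile.

```python
import math
from collections import defaultdict

# Hypothetical sample documents, assumed already filtered by project and time.
# Field names mirror the request (nest.hash, nest.group, nest.time, nest.e);
# the values are invented for illustration only.
DOCS = [
    {"project": "234234234", "nest": [
        {"hash": "a1", "group": "login", "time": 120, "e": 1},
        {"hash": "b2", "group": "search", "time": 300, "e": 2},
    ]},
    {"project": "234234234", "nest": [
        {"hash": "a1", "group": "login", "time": 80, "e": 1},
    ]},
]


def percentile(values, p):
    """Exact nearest-rank percentile (Elasticsearch approximates percentiles)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def aggregate(docs, percents=(50, 75, 90, 95, 99), size=1000):
    # "nested" on path "nest": each nest entry is aggregated on its own.
    # The "filter" step is match_all, so no entry is dropped here.
    buckets = defaultdict(list)
    for doc in docs:
        for entry in doc["nest"]:
            buckets[entry["hash"]].append(entry)  # "terms" on nest.hash

    # Keep the `size` largest buckets by entry count (ties broken by key).
    ranked = sorted(buckets.items(), key=lambda kv: (-len(kv[1]), kv[0]))[:size]

    result = {}
    for key, entries in ranked:
        times = [entry["time"] for entry in entries]
        result[key] = {
            "doc_count": len(entries),
            "timesum": sum(times),                          # sum of nest.time
            "count": sum(entry["e"] for entry in entries),  # sum of nest.e
            "timepercetile": {p: percentile(times, p) for p in percents},
            # top_hits with size 1, simplified to the first entry's group
            "group": entries[0]["group"],
        }
    return result


if __name__ == "__main__":
    for key, stats in aggregate(DOCS).items():
        print(key, stats)
```

Every nested entry in every matching document is visited once per request, which is the work that a cache would let Elasticsearch skip.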
Will the above query and aggregation be cached correctly? I read that rounding `now` with `/d` makes the query cacheable. Is that correct? What we are seeing is that with around 80 million documents (across all projects), our aggregation query sometimes takes 4-5 seconds, but sometimes 30 or even 70 seconds. The timing seems completely random.
So I would like to make sure that my query is cached and not recalculated every time. Also, would it be better to move the filtering into the aggregation instead of keeping it in the query?
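A small sketch of the date math behind the `/d` question; it models only how the two lower bounds resolve, not Elasticsearch's caches. With `from` plus `include_lower: true` (equivalent to `gte`), Elasticsearch rounds `now-14d/d` down to the start of that day (UTC by default), so every request sent during the same day resolves to the same bound, while `now-14d` resolves to a different instant on each request. Because the query above contains both ranges, the unrounded one still differs between requests. The function `resolve_lower_bound` and the sample timestamps are hypothetical, and only the two expression forms used in the request are handled.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical helper: supports only "now-<N>d" and "now-<N>d/d", in UTC.
_EXPR = re.compile(r"now-(\d+)d(/d)?")


def resolve_lower_bound(expr, now):
    """Resolve an inclusive lower bound ("from" with include_lower: true)."""
    match = _EXPR.fullmatch(expr)
    if match is None:
        raise ValueError(f"unsupported date math: {expr}")
    value = now - timedelta(days=int(match.group(1)))
    if match.group(2):
        # "/d" on an inclusive lower bound rounds down to the start of the day.
        value = value.replace(hour=0, minute=0, second=0, microsecond=0)
    return value


if __name__ == "__main__":
    first = datetime(2019, 5, 20, 10, 0, 0, tzinfo=timezone.utc)
    second = first + timedelta(seconds=5)  # a second request five seconds later
    for expr in ("now-14d/d", "now-14d"):
        same = resolve_lower_bound(expr, first) == resolve_lower_bound(expr, second)
        print(f"{expr}: identical bound across both requests -> {same}")
```

Run as a script, this prints `True` for `now-14d/d` and `False` for `now-14d`.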