We are running some rather large queries (thousands of terms), and they take dozens of seconds. The Profile API shows that scorers are built even though the large queries run in filter context.
Can someone please advise why scorers are built even though the queries run in filter context, and what can be done to speed up a query that may contain thousands of terms or points?
Our query (imei and mail are keyword fields; source_ip is of the ip data type):
{
  "size": "0",
  "query": {
    "bool": {
      "must_not": [
        {
          "terms": {
            "imei": [
              (100 terms)
            ]
          }
        },
        {
          "terms": {
            "mail": [
              (100 terms)
            ]
          }
        }
      ],
      "filter": {
        "terms": {
          "source_ip": [
            (42000 ips)
          ]
        }
      }
    }
  }
}
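For anyone who wants to reproduce the profile, here is a minimal Python sketch of how a request body like the one above could be assembled with profiling switched on. The only addition is the top-level "profile": true flag, which is what makes Elasticsearch return the timing breakdown. The function name, its parameters and the placeholder values are made up for illustration (only the two IPs are taken from the profile description below), and the sketch only builds and prints the body rather than sending it to a cluster.

```python
import json


def build_profiled_search(imeis, mails, source_ips):
    """Build a search body shaped like the query above, with profiling enabled.

    The function name and parameters are hypothetical, for illustration only.
    """
    return {
        "size": 0,
        # "profile": true asks Elasticsearch to include per-shard timings.
        "profile": True,
        "query": {
            "bool": {
                # Exclusion clauses on the two keyword fields.
                "must_not": [
                    {"terms": {"imei": list(imeis)}},
                    {"terms": {"mail": list(mails)}},
                ],
                # Non-scoring clause on the ip field.
                "filter": {"terms": {"source_ip": list(source_ips)}},
            }
        },
    }


# Placeholder values; the real lists hold 100, 100 and 42000 entries.
body = build_profiled_search(
    imeis=["<imei-1>", "<imei-2>"],
    mails=["<mail-1>"],
    source_ips=["10.0.71.61", "10.0.99.20"],
)
print(json.dumps(body, indent=2))
```

The printed JSON can be posted to the index's _search endpoint; the response then carries a "profile" section like the one shown next.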
Result of the Profile API (cropped). Note the build_scorer time of the source_ip query:
{
"took": 52899,
"hits": {
"total": 178603,
"max_score": 0.0,
"hits": []
},
"profile": {
"shards": [
{...}
{
"id": "[J3q3J7lqS4K1BVgeVcttUQ][xdr20181127][0]",
"searches": [
{
"query": [
{
"type": "BooleanQuery",
"description": "-imei:(10036520012720226020065201601...",
"time_in_nanos": 37889706864,
"breakdown": {
"score": 0,
"build_scorer_count": 276,
"match_count": 21506,
"create_weight": 8439,
"next_doc": 26548682,
"match": 17920605,
"create_weight_count": 1,
"next_doc_count": 21671,
"score_count": 0,
"build_scorer": 37845185684,
"advance": 0,
"advance_count": 0
},
"children": [
{...},
{
"type": "PointInSetQuery",
"description": "source_ip:{10.0.71.61 10.0.99.20...}",
"time_in_nanos": 37372365796,
"breakdown": {
"score": 0,
"build_scorer_count": 414,
"match_count": 0,
"create_weight": 2205,
"next_doc": 8767915,
"match": 0,
"create_weight_count": 1,
"next_doc_count": 21671,
"score_count": 0,
"build_scorer": 37363573590,
"advance": 0,
"advance_count": 0
}
}
]
},