How to avoid the calculation for "hits.total" in search?

CharlieChen · August 10, 2016, 3:00am

The search response includes a "total" value. How can I stop Elasticsearch from calculating "hits.total"?
The reason is that we have a native script filter which is very heavy, so we need to avoid unnecessary calculation.

Result sample:
{
  "took": 79,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.9581454,
    "hits": [
      {
        "_index": "test",
        "_type": "object",
        "_id": "001",
        "_score": 0.9581454,
        "fields": {
          "object_name": [
            "nested group object"
          ]
        }
      }
    ]
  }
}

Query sample:
GET /test/object/_search?pretty=true
{
  "size": 1,
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "object_name": "Nested"
          }
        }
      ],
      "filter": {
        "script": {
          "script": "nativefilter",
          "lang": "native",
          "params": {
            "user": "user1",
            "field": "testfield"
          }
        }
      }
    }
  },
  "fields": ["object_name"]
}

dadoonet · August 10, 2016, 7:01am

I don't think it's possible and I don't believe it would change your response time.

CharlieChen · August 10, 2016, 7:21am

Because the native script filter is quite heavy in my implementation, in most cases it would increase the response time unless there is some optimization to avoid the calculation.

dadoonet · August 10, 2016, 8:30am

Sure. It will increase the time, but to what degree? 1ms more?

But maybe you could share your native script so we could help to optimize it?

CharlieChen · August 10, 2016, 9:14am

Thanks, David, for the follow-up. Suppose each native script filter invocation costs 5ms; then computing hits.total has a considerable cost.

ywelsch · August 10, 2016, 9:26am

You get the total hits essentially for free, because Elasticsearch needs to know which documents matched, calculate their scores, and then order the results by score. Even if you set "size":1, all matching documents still need to be scored to determine the "best" document to return.
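
To illustrate the point above, here is a minimal Python sketch of a single-pass top-k collector. It is not Elasticsearch or Lucene code; the function name, the `allowed` and `score` fields, and the toy documents are all assumptions made for the example. It shows that the expensive filter has to run on every candidate just to find the single best hit, and that the total is only a counter incremented along the way.

```python
import heapq

def top_k_with_total(docs, matches, score, k):
    """Collect the k best-scoring matches and count all matches in one pass."""
    heap = []   # min-heap of (score, doc_id), holds at most k entries
    total = 0
    for doc_id, doc in docs:
        if not matches(doc):          # expensive filter runs once per candidate
            continue
        total += 1                    # counting is a single increment
        s = score(doc)
        if len(heap) < k:
            heapq.heappush(heap, (s, doc_id))
        elif s > heap[0][0]:
            heapq.heapreplace(heap, (s, doc_id))
    return sorted(heap, reverse=True), total

# Hypothetical data: 20 documents, every second one passes the filter.
calls = {"n": 0}

def expensive_filter(doc):
    calls["n"] += 1                   # track how often the filter is invoked
    return doc["allowed"]

docs = [(i, {"score": i % 7, "allowed": i % 2 == 0}) for i in range(20)]
best, total = top_k_with_total(docs, expensive_filter, lambda d: d["score"], k=1)
```

In this sketch the filter is invoked for all 20 documents even though only one hit is requested, so dropping the counter would not save any filter calls.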

CharlieChen · August 10, 2016, 10:01am

Thanks, Yannick.

However, according to my understanding and testing, the total hits are not free.

Please confirm this with the following query sample. To count the hits, every single item has to go through the native script filter. If the native script filter costs 5ms per item, the cost would be high if we let Elasticsearch do the hits.total calculation.

GET /test/object/_search?pretty=true
{
  "from":0,
  "size":1,
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "object_name": "object"
          }
        }
      ],
      "filter": {
        "script": {
          "script": "nativefilter",
          "lang": "native",
          "params": {
            "user":"user1",
            "field": "testfield"
          }
        }
      }
    }
  },
  "fields": ["object_name"]
}

colings86 · August 10, 2016, 11:38am

What @ywelsch says is correct. Regardless of whether you use hits.total, you will still need to run the native script on each document to determine whether it matches the query, just to get the top result. Elasticsearch does not give you the first document that matches your query; it attempts to give you the best document matching your query. To find the best matching document, it needs to know all the documents that match the query, so keeping a running total of how many documents match as it goes along does not affect performance significantly. As @ywelsch said, you effectively get this for free.

If you are trying to improve the performance of your query, I would instead try to optimise your native script so its cost per document is reduced.

nik9000 · August 10, 2016, 11:53am

You can use terminate_after to execute the script fewer times. This might
not give you the most relevant results in general, instead giving you the
most relevant results before it terminated. It isn't always what you want,
but it might be ok for you.
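
The trade-off described above can be sketched in plain Python. This is only a simplified single-shard model of early termination, not the actual Elasticsearch implementation; the function name, fields, and toy documents are assumptions. Collection stops once a fixed number of matching documents has been seen, so the filter runs fewer times, but the best hit is only the best among the documents seen before stopping.

```python
def collect_with_terminate_after(docs, matches, score, terminate_after):
    """Stop collecting after `terminate_after` matching docs, in index order."""
    collected = []
    for doc_id, doc in docs:
        if len(collected) >= terminate_after:
            break                     # remaining docs are never filtered or scored
        if matches(doc):
            collected.append((score(doc), doc_id))
    collected.sort(reverse=True)      # best of what was seen, not best overall
    return collected

# Hypothetical data: the globally best match is doc 6 (score 6).
calls = {"n": 0}

def expensive_filter(doc):
    calls["n"] += 1
    return doc["allowed"]

docs = [(i, {"score": i % 7, "allowed": i % 2 == 0}) for i in range(20)]
hits = collect_with_terminate_after(
    docs, expensive_filter, lambda d: d["score"], terminate_after=3
)
```

Here the filter runs only on the first few documents, and the top hit returned is doc 4 rather than the globally best doc 6, which is the "most relevant results before it terminated" effect.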

CharlieChen · August 10, 2016, 3:09pm

Thanks, Nik. I have tried the terminate_after parameter. I assume the document set sent to the filter clause has already been ordered by score; is this correct? If not, how could we achieve this?

CharlieChen · August 10, 2016, 3:32pm

Thanks, Colin, for your detailed comments.

From my understanding of your comments, the current procedure can be described as follows:

1. Get the document set of the match (or other query) clause
2. Send the document set to the native script filter for filtering
3. Return the result based on the score of the query clause

However, I want a procedure that swaps steps #2 and #3 above:

1. Get the document set of the match (or other query) clause
2. Order the result set based on the score of the query clause
3. Send the ordered document set to the native script filter for filtering, and stop filtering once the result reaches the page size
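
The reordered procedure can be sketched in plain Python as follows. This only illustrates the requested behaviour and is not something Elasticsearch is confirmed to offer; the function name, fields, and toy documents are assumptions.

```python
def filter_after_ordering(docs, matches, score, page_size):
    """Rank candidates by score first, then filter lazily until the page is full."""
    ranked = sorted(docs, key=lambda item: score(item[1]), reverse=True)
    page = []
    for doc_id, doc in ranked:
        if matches(doc):              # expensive filter, called only as needed
            page.append(doc_id)
            if len(page) == page_size:
                break                 # stop as soon as the page is filled
    return page

# Hypothetical data: 20 documents, every second one passes the filter.
calls = {"n": 0}

def expensive_filter(doc):
    calls["n"] += 1
    return doc["allowed"]

docs = [(i, {"score": i % 7, "allowed": i % 2 == 0}) for i in range(20)]
page = filter_after_ordering(docs, expensive_filter, lambda d: d["score"], page_size=1)
```

With this ordering the filter runs only until the page is full, but every candidate still has to be scored and sorted first, and no total count of filtered matches is available.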

How could Elasticsearch achieve this? If it is not available yet, could we add this to the feature backlog for Elasticsearch 5.0?

nik9000 · August 10, 2016, 4:05pm

No, it isn't. It is ordered by whatever order Lucene hits the documents. I can't think of a good way to make it ordered by score either.

Ivan · August 10, 2016, 4:39pm

The query rescorer will allow you to use your native script only on the top
n documents returned by the Lucene scorer:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-rescore.html

Ivan
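
As a rough illustration of the rescoring idea, the following Python sketch applies an expensive function only to the top `window_size` hits and multiplies it into the original score. It is a simplified model, not the Elasticsearch rescorer (it ignores query weights and non-matching rescore queries); the function name and the toy hits are assumptions.

```python
def rescore_top_window(hits, rescore, window_size):
    """hits: list of (doc_id, score) already sorted by score, descending."""
    window = [(doc_id, s * rescore(doc_id)) for doc_id, s in hits[:window_size]]
    window.sort(key=lambda h: h[1], reverse=True)   # reorder only the window
    return window + hits[window_size:]              # the tail is left untouched

# Hypothetical first-pass hits and rescore values.
calls = {"n": 0}
factors = {"a": 0.0, "b": 1.0, "c": 2.0}

def expensive_rescore(doc_id):
    calls["n"] += 1
    return factors[doc_id]

hits = [("a", 3.0), ("b", 2.0), ("c", 1.0), ("d", 0.5)]
rescored = rescore_top_window(hits, expensive_rescore, window_size=3)
```

Note that in this sketch a rescore value of 0 pushes a document down with a score of 0 but does not remove it from the results, so rescoring reorders hits rather than filtering them.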

Ivan · August 10, 2016, 4:44pm

Hit send too early. Probably not what you need, but from my understanding,
native scripts are only for scoring, which is post filtering (but not post
post filtering, which is the phase where the query rescorer works).

nik9000 · August 10, 2016, 4:53pm

Yeah, it is close-ish but isn't quite right.

I wonder if post_filter could do the job here. It isn't really built for this but it might do. It is worth investigating.

Ivan · August 10, 2016, 10:09pm

What the OP wants is terminate_after during the post_filter stage. Not
supported AFAIK, but interesting nonetheless. And using a native script as
the filter as well.

CharlieChen · August 11, 2016, 9:20am

Hi Ivan, you are correct: terminate_after primarily (maybe only? could you please confirm) affects the query, not post_filter.
However, I find that size can correctly control post_filter.

GET /test/object/_search?pretty=true
{
  "size": 1,
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "object_name": "object two"
          }
        }
      ]
    }
  },
  "post_filter": {
    "script": {
      "script": "nativefilter",
      "lang": "native",
      "params": {
        "user": "user1",
        "field": "testfield"
      }
    }
  },
  "fields": [
    "object_name"
  ]
}

CharlieChen · August 11, 2016, 4:40pm

Hi Nik, it seems post_filter can work. In 2.3, global filter is the same as post_filter, right?

nik9000 · August 11, 2016, 4:41pm

Yes.

CharlieChen · August 11, 2016, 5:23pm

Hi Ivan, rescore might not help me; results are still returned with "_score": 0.

GET /test/object/_search?pretty=true
{
  "size": 5,
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "object_name": "object two"
          }
        }
      ]
    }
  },
  "rescore": {
    "window_size": 10,
    "query": {
      "score_mode": "multiply",
      "rescore_query": {
        "function_score": {
          "script_score": {
            "script": {
              "script": "nativefilterrescore",
              "lang": "native",
              "params": {
                "user": "user3",
                "field": "testfield"
              }
            }
          }
        }
      }
    }
  },
  "fields": [
    "object_name"
  ]
}

Result sample:
{
  "took": 85,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 14,
    "max_score": 1.691042,
    "hits": [
      {
        "_index": "test",
        "_type": "object",
        "_id": "00000000000000000003",
        "_score": 0.10011096,
        "fields": {
          "object_name": [
            "object three"
          ]
        }
      },
      {
        "_index": "test",
        "_type": "object",
        "_id": "00000000000000000012",
        "_score": 0,
        "fields": {
          "object_name": [
            "multiple user group object two"
          ]
        }
      },
      {
        "_index": "test",
        "_type": "object",
        "_id": "00000000000000000001",
        "_score": 0,
        "fields": {
          "object_name": [
            "object one"
          ]
        }
      },
      {
        "_index": "test",
        "_type": "object",
        "_id": "00000000000000000002",
        "_score": 0,
        "fields": {
          "object_name": [
            "object two"
          ]
        }
      },
      {
        "_index": "test",
        "_type": "object",
        "_id": "00000000000000000004",
        "_score": 0,
        "fields": {
          "object_name": [
            "object four"
          ]
        }
      }
    ]
  }
}
