Most efficient way to query without a score

tpraizler · August 8, 2016, 9:07am

Hey,

I am trying to figure out the most efficient way to query Elasticsearch without scoring. (I assume that scoring adds overhead, so skipping it makes the query faster.)

I also need to be able to exclude documents that match a condition (must_not).

So if I want to build a query that returns, say, the documents that have the string "some_name" in the field companyName, whose creation date is after "2016-07-20", and that do not have "foo" in companyName:

{
   "query": {
      "bool": {
         "filter": { 
            "bool": {
               "must": [
                  { "regexp": { "companyName": ".*some_name.*"} },
                  { "range": { "creationDate": { "from": "2016-07-20" }}}
               ], 
               "must_not": [
                  { "term": { "companyName": "foo" } }
               ]
            }
         }
      }
   },
   "size": 150
}

Is this the most efficient way?
Do I have to use a bool query inside a filter to get the must_not functionality? Is using constant_score better?

I am a little confused..
Thanks!!

val · August 8, 2016, 9:15am

You don't need to wrap bool/must_not inside bool/filter, because bool/must_not is already executed in a filter context. You can simply do it like this:

{
   "query": {
      "bool": {
         "filter": [
            { "regexp": { "companyName": ".*some_name.*" } },
            { "range": { "creationDate": { "from": "2016-07-20" } } }
         ],
         "must_not": [
            { "term": { "companyName": "foo" } }
         ]
      }
   },
   "size": 150
}

It is worth noting, though, that your regexp filter will not be very efficient. If you want to improve the speed of your query, you'll probably want to index companyName with an ngram token filter and then use a term query on it; that will be much more efficient.
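
As a rough sketch of that suggestion (not something from the original reply), the index could define an ngram subfield next to companyName. The subfield name companyName.ngrams, the analyzer and filter names, and the gram lengths 3 to 10 are all assumptions for illustration. The syntax assumes Elasticsearch 7.x or later, where a spread between min_gram and max_gram larger than 1 has to be allowed through index.max_ngram_diff:

{
   "settings": {
      "index": { "max_ngram_diff": 7 },
      "analysis": {
         "filter": {
            "company_ngrams": { "type": "ngram", "min_gram": 3, "max_gram": 10 }
         },
         "analyzer": {
            "company_ngram_analyzer": {
               "type": "custom",
               "tokenizer": "keyword",
               "filter": ["lowercase", "company_ngrams"]
            }
         }
      }
   },
   "mappings": {
      "properties": {
         "companyName": {
            "type": "text",
            "fields": {
               "ngrams": { "type": "text", "analyzer": "company_ngram_analyzer" }
            }
         }
      }
   }
}

The regexp clause can then be swapped for a term filter on the subfield. Because term queries are not analyzed, the search string has to be lowercase and between min_gram and max_gram characters long:

{
   "query": {
      "bool": {
         "filter": [
            { "term": { "companyName.ngrams": "some_name" } },
            { "range": { "creationDate": { "gte": "2016-07-20" } } }
         ],
         "must_not": [
            { "term": { "companyName": "foo" } }
         ]
      }
   },
   "size": 150
}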

tpraizler · August 8, 2016, 9:32am

Thanks!

taras · August 8, 2016, 2:36pm

You also want to use the _doc sort order for the entire query: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-sort.html

tpraizler · August 10, 2016, 11:56am

Why would I want it? @taras

taras · August 12, 2016, 8:57pm

If you're concerned about the cost of scoring and sorting, you can tell Elasticsearch to return documents in index order by sorting on _doc. It is simply a more efficient way to order documents. Note that you will see a real impact only if you're matching a large number of documents.
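
For illustration, Val's query with that sort added might look like this; only the "sort" line is new:

{
   "query": {
      "bool": {
         "filter": [
            { "regexp": { "companyName": ".*some_name.*" } },
            { "range": { "creationDate": { "from": "2016-07-20" } } }
         ],
         "must_not": [
            { "term": { "companyName": "foo" } }
         ]
      }
   },
   "sort": ["_doc"],
   "size": 150
}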

As Val mentioned, your main performance hit will come from the regexp query. As an alternative to replacing it with ngrams, your companyName field could possibly be tokenized based on some boundary rule, or even a regex. That would be more efficient than a regexp query and, in some instances, better than the ngram approach, since it would generate fewer terms per document.
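
A minimal sketch of the boundary-rule idea, assuming company names can be split on runs of non-word characters (the pattern \W+ leaves underscores inside a token, so "some_name" stays whole). The names company_boundary_tokenizer, company_boundary_analyzer and companyName.tokens are hypothetical, and the mapping syntax assumes Elasticsearch 7.x or later:

{
   "settings": {
      "analysis": {
         "tokenizer": {
            "company_boundary_tokenizer": { "type": "pattern", "pattern": "\\W+" }
         },
         "analyzer": {
            "company_boundary_analyzer": {
               "type": "custom",
               "tokenizer": "company_boundary_tokenizer",
               "filter": ["lowercase"]
            }
         }
      }
   },
   "mappings": {
      "properties": {
         "companyName": {
            "type": "text",
            "fields": {
               "tokens": { "type": "text", "analyzer": "company_boundary_analyzer" }
            }
         }
      }
   }
}

A term filter on companyName.tokens for "some_name" would then match only when the string appears as a whole token. Unlike the regexp or ngram approaches, it will not find the string embedded in a longer word, so whether this fits depends on what the names actually look like.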
