Most efficient way to query without a score

Hey,

I am trying to figure out the most efficient way to query Elasticsearch without scoring. (I assume that scoring adds overhead and that skipping it makes queries faster.)

I also need to be able to exclude documents that match a condition (must_not).

Say I want to build a query that returns the documents that have the string "some_name" in the companyName field, have a creation date after "2016-07-20", and do not have "foo" in companyName:

{
   "query": {
      "bool": {
         "filter": { 
            "bool": {
               "must": [
                  { "regexp": { "companyName": ".*some_name.*"} },
                  { "range": { "creationDate": { "from": "2016-07-20" }}}
               ],
               "must_not": [
                  { "term": { "companyName": "foo" } }
               ]
            }
         }
      }
   },
   "size": 150
}

Is this the most efficient way?
Do I have to use a bool query inside a filter to get the must_not functionality? Or is using constant_score better?

I am a little confused.
Thanks!!

You don't need to wrap bool/must_not inside bool/filter as bool/must_not is already executed in a filter context. You can simply do it like this:

{
   "query": {
      "bool": {
         "filter": [
            { "regexp": { "companyName": ".*some_name.*" } },
            { "range": { "creationDate": { "from": "2016-07-20" } } }
         ],
         "must_not": [
            { "term": { "companyName": "foo" } }
         ]
      }
   },
   "size": 150
}
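
On the constant_score part of the question: as far as I know, wrapping the same bool in a constant_score query is an equivalent way to run everything in filter context, so it should not be faster. A minimal sketch, reusing the clauses from above unchanged. The only practical difference I am aware of is the score each hit comes back with: 0 for the filter-only bool, and the boost value (1.0 by default) for constant_score.

```json
{
   "query": {
      "constant_score": {
         "filter": {
            "bool": {
               "filter": [
                  { "regexp": { "companyName": ".*some_name.*" } },
                  { "range": { "creationDate": { "from": "2016-07-20" } } }
               ],
               "must_not": [
                  { "term": { "companyName": "foo" } }
               ]
            }
         }
      }
   },
   "size": 150
}
```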

It is worth noting, though, that your regexp filter will not be very efficient, because of its leading wildcard. If you want to improve the speed of your query, you'll probably want to index companyName with an ngram token filter and then use a term query on companyName, which will be much more efficient.
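
To make the ngram suggestion concrete, here is a rough sketch of index settings and a mapping, not a tuned configuration. The names company_ngrams and company_ngram_analyzer are made up for illustration, the gram sizes (3 to 10) are arbitrary, and the mapping syntax assumes a recent version without mapping types. The index.max_ngram_diff setting is there because recent versions reject a min_gram/max_gram gap larger than 1 by default.

```json
{
   "settings": {
      "index": { "max_ngram_diff": 7 },
      "analysis": {
         "filter": {
            "company_ngrams": { "type": "ngram", "min_gram": 3, "max_gram": 10 }
         },
         "analyzer": {
            "company_ngram_analyzer": {
               "type": "custom",
               "tokenizer": "keyword",
               "filter": ["lowercase", "company_ngrams"]
            }
         }
      }
   },
   "mappings": {
      "properties": {
         "companyName": { "type": "text", "analyzer": "company_ngram_analyzer" }
      }
   }
}
```

With that in place, the regexp clause can become a plain term clause. This sketch uses gte where the original query used from, on the assumption that an inclusive lower bound is what you want:

```json
{
   "query": {
      "bool": {
         "filter": [
            { "term": { "companyName": "some_name" } },
            { "range": { "creationDate": { "gte": "2016-07-20" } } }
         ],
         "must_not": [
            { "term": { "companyName": "foo" } }
         ]
      }
   },
   "size": 150
}
```

Keep in mind that term queries are not analyzed, so the search string has to be lowercase here, and it only matches if its length falls between min_gram and max_gram. The must_not on "foo" now excludes any company name containing "foo" as a substring.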

Thanks!

You also want to use _doc sort order on the entire query: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-sort.html

Why would I want it? @taras

If you're concerned about the cost of scoring and sorting, you can tell Elasticsearch to return documents in index order by sorting on _doc. It is simply the most efficient way to order documents. Note that you will only notice a real impact if you're matching a large number of documents.
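
For example, Val's query with the _doc sort added would look roughly like this (everything except the sort line is copied from above):

```json
{
   "query": {
      "bool": {
         "filter": [
            { "regexp": { "companyName": ".*some_name.*" } },
            { "range": { "creationDate": { "from": "2016-07-20" } } }
         ],
         "must_not": [
            { "term": { "companyName": "foo" } }
         ]
      }
   },
   "sort": ["_doc"],
   "size": 150
}
```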

As Val mentioned, your main performance hit comes from the regexp query. As an alternative to replacing it with ngrams, your companyName field could possibly be tokenized based on some boundary rule, or even a regex. That would be more efficient than a regexp query and, in some cases, better than the ngram approach, since it would generate fewer terms per document.
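
One way to sketch the boundary-rule idea is a pattern tokenizer that splits on runs of non-word characters. The names company_boundary_tokenizer and company_boundary_analyzer are hypothetical, and the pattern is only an assumption about what your company names look like. Since \W does not match underscores, a value like "Acme some_name Ltd." would be indexed as the terms acme, some_name and ltd.

```json
{
   "settings": {
      "analysis": {
         "tokenizer": {
            "company_boundary_tokenizer": { "type": "pattern", "pattern": "\\W+" }
         },
         "analyzer": {
            "company_boundary_analyzer": {
               "type": "custom",
               "tokenizer": "company_boundary_tokenizer",
               "filter": ["lowercase"]
            }
         }
      }
   },
   "mappings": {
      "properties": {
         "companyName": { "type": "text", "analyzer": "company_boundary_analyzer" }
      }
   }
}
```

A term filter on "some_name" would then match, but only when some_name appears as a whole token. If you need to match it in the middle of a longer token, this approach will not work and ngrams are the better fit.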
