Elasticsearch query performance using filter query

I have an index on which I run the following two types of query, and both give me the same results. I want to know which one performs better, as in production there will be millions of searches per day.

EDIT - Elasticsearch version is 5.6.1

The aim is to run a query like:

SELECT * 
FROM MY_INDEX 
WHERE (col1 = "val1" OR col2 = "val2") 
    AND (col3 = "val3" OR col4 = "val4")

From the description of filter context in the Elasticsearch guide, it seems ES first narrows the result set to the records matching the filter criteria and then scores those records using the clauses in query context. Can anyone tell me which of the queries below performs better, and what the theory behind it is?

This query, using two should clauses within two bool queries:

{
    "query" : {
        "bool" : { 
            "must" : [ 
                {
                    "bool" : {
                        "should" : [ 
                            { "match" : { "col1" : "val1" } },
                            { "match" : { "col2" : "val2" } } 
                        ]
                    } 
                },
                {
                    "bool" : {
                        "should" : [ 
                            { "match" : { "col3" : "val3" } },
                            { "match" : { "col4" : "val4" } }  
                        ]   
                    }
                } 
            ]
        }
    }
}

Or this query, with a filter clause:

{
    "query" : {
        "bool" : { 
            "should" : [ 
                { "match" : { "col1" : "val1" } },
                { "match" : { "col2" : "val2" } } 
            ],
            "minimum_should_match" : 1,
            "filter" : {
                "bool" : {
                    "should" : [ 
                        { "match" : { "col3" : "val3" } },
                        { "match" : { "col4" : "val4" } }   
                    ]   
                } 
            }
        }
    }
}

hey,

A filter clause skips the scoring calculation and can be cached (based on some internal Lucene heuristics), so it should be your default choice whenever a clause does not need to contribute to the score.
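
As a minimal sketch of that advice, assuming none of the four conditions needs to influence ranking, both OR groups from the question can be moved into filter context. It reuses the col1 to col4 fields and values from the question; minimum_should_match is spelled out in each inner bool only to make the "at least one" intent explicit. Keep in mind that a bool query containing nothing but filter clauses gives every hit a score of 0, so add an explicit sort if the order of results matters.

```json
{
    "query" : {
        "bool" : {
            "filter" : [
                {
                    "bool" : {
                        "should" : [
                            { "match" : { "col1" : "val1" } },
                            { "match" : { "col2" : "val2" } }
                        ],
                        "minimum_should_match" : 1
                    }
                },
                {
                    "bool" : {
                        "should" : [
                            { "match" : { "col3" : "val3" } },
                            { "match" : { "col4" : "val4" } }
                        ],
                        "minimum_should_match" : 1
                    }
                }
            ]
        }
    }
}
```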

--Alex

Hi,

How do I cache that? Sorry, I am fairly new to ES and it is quite vast. Could you please explain what scoring and caching are?

https://www.elastic.co/guide/en/elasticsearch/reference/6.0/query-filter-context.html
https://www.elastic.co/guide/en/elasticsearch/reference/6.0/query-dsl-bool-query.html#_scoring_with_literal_bool_filter_literal
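
To illustrate the scoring section linked above: a bool query that contains only filter clauses returns every hit with a score of 0, while wrapping a filter in constant_score gives every hit the same fixed score (1.0 unless a boost is set). A small sketch, reusing the col3 and col4 conditions from the question purely for illustration:

```json
{
    "query" : {
        "constant_score" : {
            "filter" : {
                "bool" : {
                    "should" : [
                        { "match" : { "col3" : "val3" } },
                        { "match" : { "col4" : "val4" } }
                    ]
                }
            }
        }
    }
}
```

As for caching, there is nothing to switch on in the query itself; Elasticsearch decides on its own which filter clauses are worth caching.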
