Performance issues with non-restrictive filters

alxmck · February 27, 2019, 9:43am

Hi there!

I've got an index with ~12 million documents. All the documents have the same fields. The field "availability" is defined like this in the mapping:

"availability" : {
    "type" : "long",
    "store" : true
}

Now I wanted to query all documents without any filter. In the first version, I ran the following query with some aggregations:

{
    "query": {
        "bool": {
            "must": [{
                "match_all": {}
            }, {
                "range": {
                    "availability": {
                        "lte": 10
                    }
                }
            }]
        }
    },
    "from": 0,
    "size": 100
}

The thing is that every document has availability < 10, so neither clause excludes anything. We had the match_all and the availability range clause in our query builder as a fallback.

During performance optimization we tried removing these two clauses (which filter nothing) and ran the following query with some aggregations:

{
    "from": 0,
    "size": 100
}

The aggregations were the same in both cases. So why did the first version take 3 to 4 times longer than the second?

Is it because Elasticsearch cannot narrow down the set of documents with these filters? Or does it take longer because Elasticsearch has to calculate the score based on these (useless) clauses? Or are both wrong and there is another reason?

We've improved the performance, but it would be great to understand why the difference is that large.

I'd appreciate any answer!

Thank you.

Alex
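
One way to test the scoring hypothesis from the question is to keep the range clause but move it from "must" (scoring context) to "filter" (non-scoring context) and compare timings. The snippet below is only a sketch that builds the two request bodies as plain Python dicts. The helper names (scored_query, filtered_query, search_body) are made up for illustration, the aggregations are left out because they were not posted, and whether this changes the timing on this index is untested.

```python
import json

# The range clause from the original post.
RANGE_CLAUSE = {"range": {"availability": {"lte": 10}}}


def scored_query():
    """The query as posted: both clauses under bool.must (scoring context)."""
    return {"bool": {"must": [{"match_all": {}}, RANGE_CLAUSE]}}


def filtered_query():
    """Same matching documents, but the range sits under bool.filter,
    so it is not used for scoring."""
    return {"bool": {"filter": [RANGE_CLAUSE]}}


def search_body(query, size=100):
    """Wrap a query in the same paging parameters used in the post."""
    return {"query": query, "from": 0, "size": size}


if __name__ == "__main__":
    # Print the filter-context variant so it can be pasted into a _search request.
    print(json.dumps(search_body(filtered_query()), indent=2))
```

If the filter-context body runs about as fast as the query-less one, scoring is a likely suspect. If it stays slow, the cost is probably somewhere else.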

dadoonet · February 27, 2019, 9:59am

Could you share:

The exact queries you are running
The exact and full response you are getting in both cases
Your Elasticsearch version
What happens when you run it with size: 0?

You can share this on gist.github.com and paste the link here.

Side question (not related), why did you set store: true on that field?
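
To make the size: 0 comparison concrete, here is a rough sketch that derives a "no hits" variant of each request body and compares the "took" values (milliseconds) from the two responses. The function names without_hits and took_ratio are hypothetical helpers, not part of any Elasticsearch client, and the aggregations from the real requests would have to be added to the bodies because they were not posted.

```python
import copy


def without_hits(body):
    """Return a copy of a search body that requests no hits (size: 0),
    so only query matching and aggregations are measured."""
    trimmed = copy.deepcopy(body)
    trimmed["size"] = 0
    trimmed.pop("from", None)  # paging is irrelevant when no hits are returned
    return trimmed


def took_ratio(slow_response, fast_response):
    """Ratio of the 'took' values (ms) reported in two search responses."""
    return slow_response["took"] / max(fast_response["took"], 1)


# The two bodies from the first post (aggregations omitted, as in the post).
first_version = {
    "query": {
        "bool": {
            "must": [
                {"match_all": {}},
                {"range": {"availability": {"lte": 10}}},
            ]
        }
    },
    "from": 0,
    "size": 100,
}
second_version = {"from": 0, "size": 100}

first_no_hits = without_hits(first_version)
second_no_hits = without_hits(second_version)
```

If the gap between the two versions is still there with size: 0, fetching the 100 hits is not what makes the first version slow.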

alxmck · February 27, 2019, 12:09pm

Hi David,

Thank you for your answer. Is there any general reason for performance issues with filters which do not filter at all? I'll export the queries and post them. We're using Elasticsearch 6.6.0.

We store the value so that we can read it back later from the documents we're querying.

Thanks again,
Alex
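
For context on the store: true side question, a field stored that way is read back with the "stored_fields" parameter of the search body, while the same value is normally also available through "_source", assuming _source has not been disabled in the mapping. The sketch below just builds both request bodies; the helper names are invented for illustration.

```python
def stored_fields_body(fields, size=100):
    """Search body that reads values from fields mapped with store: true.
    Hits then carry a 'fields' object instead of '_source'."""
    return {"stored_fields": list(fields), "from": 0, "size": size}


def source_filter_body(fields, size=100):
    """Search body that reads the same values from _source instead
    (assumes _source is enabled, which is the default)."""
    return {"_source": list(fields), "from": 0, "size": size}


via_stored = stored_fields_body(["availability"])
via_source = source_filter_body(["availability"])
```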

dadoonet · February 27, 2019, 2:38pm

"Is there any general reason for performance issues with filters which do not filter at all?"

I don't think so unless you are filtering on tons of fields...

BTW did you try to use the profile API to see where the time is spent in both cases?
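
As a rough sketch of that profile suggestion: adding "profile": true to the search body makes the response include a "profile" section with per-shard timings for the query tree, the collectors and the aggregations. The helpers below (enable_profile, summarize_profile) are hypothetical names, and sample_response is a hand-written stand-in whose type names and numbers are invented for illustration, not output from a real cluster.

```python
def enable_profile(body):
    """Return a copy of a search body with profiling switched on."""
    profiled = dict(body)
    profiled["profile"] = True
    return profiled


def summarize_profile(response):
    """Sum time_in_nanos over all shards for the top-level query nodes,
    the collectors and the aggregations. Only top-level nodes are summed
    because a parent's time already includes its children."""
    totals = {"query": 0, "collector": 0, "aggregations": 0}
    for shard in response.get("profile", {}).get("shards", []):
        for search in shard.get("searches", []):
            totals["query"] += sum(
                node["time_in_nanos"] for node in search.get("query", [])
            )
            totals["collector"] += sum(
                node["time_in_nanos"] for node in search.get("collector", [])
            )
        totals["aggregations"] += sum(
            node["time_in_nanos"] for node in shard.get("aggregations", [])
        )
    return totals


# Hand-written stand-in for a profiled response; all values are invented.
sample_response = {
    "profile": {
        "shards": [
            {
                "searches": [
                    {
                        "query": [
                            {
                                "type": "BooleanQuery",
                                "time_in_nanos": 500,
                                "children": [
                                    {"type": "MatchAllDocsQuery", "time_in_nanos": 100},
                                ],
                            }
                        ],
                        "collector": [
                            {"name": "ExampleCollector", "time_in_nanos": 200}
                        ],
                    }
                ],
                "aggregations": [
                    {"type": "ExampleAggregator", "time_in_nanos": 50}
                ],
            }
        ]
    }
}

if __name__ == "__main__":
    print(summarize_profile(sample_response))
```

Running both versions of the query with profiling enabled and comparing the three totals should show whether the extra time goes into the query itself, the collection of hits, or the aggregations.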
