Why does a range filter in Elasticsearch take much more CPU than full text search?

un1t · October 21, 2015, 5:52am

I have a filtered query.

In the first case I use full text search:

{ 'query': {'filtered': {'filter': {'bool': {'must': [{'term': {'status': 4}}]}},
                    'query': {'bool': {'should': [{'match': {'name': {'operator': 'and',
                                                                      'query': 'dog'}}},
                                                  {'match': {'description': {'boost': 0.9,
                                                                             'operator': 'and',
                                                                             'query': 'dog'}}},
                                                  {'match': {'author': {'boost': 0.8,
                                                                        'operator': 'and',
                                                                        'query': 'dog'}}},
                                                  {'match': {'tags': {'boost': 0.7,
                                                                      'operator': 'and',
                                                                      'query': 'dog'}}}]}}}}}

In the second case I use a range filter:

{'query': {'filtered': {'filter': {'bool': {'must': [{'term': {'status': 4}},
                                                 {'range': {'year_to': {'lte': '1946'}}}]}}}}}

I was very surprised, because the second request takes twice as much CPU as the first one.

What is going on?

My mapping:

"properties": {
    "id": {
        "type": "integer"
    },
    "name": {
        "analyzer": "russian_morphology",
        "type": "string"
    },
    "description": {
        "analyzer": "russian_morphology",
        "type": "string"
    },
    "status": {
        "type": "integer"
    },
    "tags": {
        "analyzer": "russian_morphology",
        "type": "string"
    },
    "year_from": {
        "type": "integer"
    },
    "year_to": {
        "type": "integer"
    }
}

nik9000 · October 21, 2015, 12:45pm

The first query is ultimately translated into a boolean combination of 5 term queries which can use the terms dictionary to jump directly to the documents that they need. The term queries are fast and the bool queries are fast.

The second query is ultimately translated into a term query (fast again) and a numeric range query. The numeric range query has to walk the terms dictionary to find its matches. Part of the work that it does is proportional to the number of distinct values less than or equal to 1946. I don't know if that is the term that dominates the runtime - it could be that there are lots of hits lte 1946. You could certainly try and take stack traces to figure out what exactly is up here - just spam the query with ab and then use jstack and look for stuff like TermRangeFilter (or TermRangeQuery post 2.0). But if you are just looking for an intuitive explanation of why complex-looking queries can be faster - what I wrote above might be good enough.
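
To make the difference concrete, here is a toy model in Python. It is only a sketch of the idea, not how Lucene actually stores or traverses its terms dictionary: the `postings` dict, the `term_lookup` and `range_lookup` helpers, and the sample data are all made up for illustration.

```python
from bisect import bisect_right

# Toy inverted index for a numeric field: value -> sorted list of doc ids.
# Hypothetical data, purely for illustration.
postings = {
    1900: [1, 7],
    1930: [2],
    1946: [3, 4],
    1980: [5],
    2001: [6],
}
sorted_terms = sorted(postings)  # stands in for the sorted terms dictionary


def term_lookup(value):
    """A term query jumps straight to a single postings list."""
    return postings.get(value, [])


def range_lookup(upper):
    """A range query visits every distinct term <= upper and merges the postings."""
    end = bisect_right(sorted_terms, upper)
    visited = sorted_terms[:end]
    docs = set()
    for term in visited:
        docs.update(postings[term])
    return sorted(docs), len(visited)


docs, terms_visited = range_lookup(1946)
print(term_lookup(1946))    # one postings list
print(docs, terms_visited)  # union over every term <= 1946
```

In this model the range lookup does work for each distinct value it visits and for each document it collects, which are the two costs mentioned above.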

un1t · October 21, 2015, 1:57pm

Thanks for the reply.

So it seems {'range':{'lte':1946}} is transformed into a terms filter with about 80 values.

Is it possible to rewrite the second query to make it faster? I tried the "numeric_range" filter, but performance seems the same.

nik9000 · October 21, 2015, 2:16pm

I suppose it depends on your mapping - numeric_range should automatically kick in for numbers. There is a precision_step you can play with but I don't know much about it. What kind of performance are you seeing - like how long is the query taking and how many documents is it hitting? Beyond that I'm not sure I can be much help. Usually this is where I'd break out ab and jstack to figure out what is going on.
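
For reference, a mapping fragment with an explicit precision_step might look like the sketch below, written as a Python dict in the same style as the queries above. The value 4 is an arbitrary illustration, not a recommendation, and the variable name `year_to_mapping` is hypothetical. A smaller step generally indexes more terms per value so that a range has to touch fewer of them, at the cost of a larger index, and changing it on an existing field would most likely require reindexing.

```python
# Hypothetical mapping fragment for the year_to field.
# The precision_step value is chosen only for illustration.
year_to_mapping = {
    "year_to": {
        "type": "integer",
        "precision_step": 4,
    }
}
```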

un1t · October 21, 2015, 2:55pm

The mapping for this field is integer.
The query takes 20 ms.
I have 60k documents in the index and 20 documents in the response. The response "total" attribute is 35k. Unfortunately I don't know much about Java and jstack.
