Which filter query performs best?


#1

Hi, I am new to Elasticsearch and am trying to use it in place of MySQL full-text search.

I am confused about these 3 queries: which one is better than the others, and why?

I am trying them with and without a filter.

$params['body']['query']['bool']['must'][]['match']['category'] = 500;

$params['body']['query']['bool']['filter']['range']['category']['gte'] = 500;
$params['body']['query']['bool']['filter']['range']['category']['lte'] = 500;

$params['body']['query']['bool']['filter']['term']['category'] = 500;

All of the above queries give the same result set.

1. Not using any filter, the first query becomes [time taken: 20]

Array
(
    [index] => dataset
    [type] => content
    [body] => Array
        (
            [query] => Array
                (
                    [bool] => Array
                        (
                            [must] => Array
                                (
                                    [0] => Array
                                        (
                                            [match] => Array
                                                (
                                                    [title] => mother
                                                )
                                        )
                                    [1] => Array
                                        (
                                            [match] => Array
                                                (
                                                    [category] => 500
                                                )
                                        )
                                )
                        )
                )
            [sort] => Array
                (
                    [uploaders] => Array
                        (
                            [order] => desc
                        )
                )
        )
)

2. Using a range filter, the second query becomes [time taken: 8]

Array
(
    [index] => dataset
    [type] => content
    [body] => Array
        (
            [query] => Array
                (
                    [bool] => Array
                        (
                            [must] => Array
                                (
                                    [0] => Array
                                        (
                                            [match] => Array
                                                (
                                                    [title] => mother
                                                )
                                        )
                                )
                            [filter] => Array
                                (
                                    [range] => Array
                                        (
                                            [category] => Array
                                                (
                                                    [gte] => 500
                                                    [lte] => 500
                                                )
                                        )
                                )
                        )
                )
            [sort] => Array
                (
                    [uploaders] => Array
                        (
                            [order] => desc
                        )
                )
        )
)

3. Using a term filter, the third query becomes [time taken: 21]

Array
(
    [index] => dataset
    [type] => content
    [body] => Array
        (
            [query] => Array
                (
                    [bool] => Array
                        (
                            [must] => Array
                                (
                                    [0] => Array
                                        (
                                            [match] => Array
                                                (
                                                    [title] => mother
                                                )
                                        )
                                )
                            [filter] => Array
                                (
                                    [term] => Array
                                        (
                                            [category] => 500
                                        )
                                )
                        )
                )
            [sort] => Array
                (
                    [uploaders] => Array
                        (
                            [order] => desc
                        )
                )
        )
)

Thanks for your time.

My mapping looks like this:

"mappings": {
	"content": { 
		"properties": { 
			"title":{ 
				"type":     "text",
				"fields": {
					"raw": { 
						"type":  "keyword"
					}
				}
			},
			"tags":    		{ "type": "text" },
			"category":     { "type": "short" },
			"sub_category":	{ "type": "short" },
			"size":      	{ "type": "long" },
			"uploaders":      { "type": "integer" },
			"donwloader":		{ "type": "integer" },
			"upload_date":	{
				"type":   "date",
				"format": "yyyy-MM-dd HH:mm:ss"
			},
			"uploader":{
						"type":     "text",
						"fields":	{
									"raw": { 
										"type":  "keyword"
									}
						}
			}
		}
	}
}

I have around 4.6 million documents on a dedicated server with 128 GB RAM and dual Xeon CPUs, running only Elasticsearch, MySQL, nginx and PHP.


(Abdon Pijpelink) #2

It's hard to give general performance advice. It's always best to test how specific queries perform on your documents, like you have done already. But keep in mind that measuring the performance of a single query executed once may not be representative.
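To make the point about single measurements concrete, here is a minimal sketch of a timing helper that runs a request several times after a warm-up and reports the median. The function name `medianMillis` and its parameters are hypothetical; `$search` stands for any callable that executes one request, for example a closure wrapping a call to the PHP client's `search($params)` (assuming the official elasticsearch-php client is in use).

```php
<?php
// Hypothetical helper: runs $search repeatedly and returns the median
// wall-clock time in milliseconds. Requires PHP 7.3+ for hrtime().
function medianMillis(callable $search, int $runs = 20, int $warmup = 3): float
{
    // Warm-up runs let caches fill before anything is measured.
    for ($i = 0; $i < $warmup; $i++) {
        $search();
    }

    $timings = [];
    for ($i = 0; $i < $runs; $i++) {
        $start = hrtime(true);                      // nanoseconds, monotonic clock
        $search();
        $timings[] = (hrtime(true) - $start) / 1e6; // convert to milliseconds
    }

    sort($timings);
    $mid = intdiv(count($timings), 2);
    // Even count: average the two middle values; odd count: take the middle one.
    return count($timings) % 2 === 0
        ? ($timings[$mid - 1] + $timings[$mid]) / 2
        : $timings[$mid];
}
```

Comparing the medians of the three queries over a few dozen runs each should be more telling than comparing one run of each.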

Having said that - let me try to give you some pointers.

In general, if you do not care about a score, using a filter is better than using a bool query's must or should clause. Elasticsearch will not have to calculate a score for filters, so that will be faster. Additionally, some filters can be cached by Elasticsearch, which can have a significant positive impact on your cluster. So, for these two reasons - I would not go with your first query.
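As a rough sketch of what that means for the first query: the title match stays in must, and the category condition moves into filter. The helper name `buildSearchParams` is made up for illustration; the index, type and field names are the ones from the question.

```php
<?php
// Hypothetical helper; index, type and field names are taken from the question.
function buildSearchParams(string $title, int $category): array
{
    return [
        'index' => 'dataset',
        'type'  => 'content',
        'body'  => [
            'query' => [
                'bool' => [
                    // Scored clause: full-text match on the title.
                    'must' => [
                        ['match' => ['title' => $title]],
                    ],
                    // Non-scoring clause: a plain yes/no condition that
                    // Elasticsearch may be able to cache.
                    'filter' => [
                        ['term' => ['category' => $category]],
                    ],
                ],
            ],
            'sort' => ['uploaders' => ['order' => 'desc']],
        ],
    ];
}

$params = buildSearchParams('mother', 500);
```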

With regard to your queries 2 and 3: the category field is mapped as type short. Do you really need that field to be a number? Are you running mathematical operations like sum or average aggregations on that field? If not, it may be better to map that field as type keyword. Then, query 3 is going to be the most performant. As it is, with category mapped as type short, query 2 may be more performant, as numeric types are optimized for range queries.

You can find more general advice in our documentation here: https://www.elastic.co/guide/en/elasticsearch/reference/current/tune-for-search-speed.html


#3

Thanks
I ended up using the 1st query, as there are multiple where clauses and it is easier to build the query in PHP,
i.e.
where category = 100,
where category should be 100 or should be 200

So I am not sure how to use a filter with should-style matching.

Thanks


(Abdon Pijpelink) #4

For should-like behavior (doing an OR between two values), you could go with the 3rd query, but use the terms query instead of the term query (notice the extra "s" in terms). The query would look something like this:

$params['body']['query']['bool']['filter']['terms']['category'] = [100, 200];

(I am not that familiar with PHP, but something like that)
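A slightly fuller sketch of the same idea, combining the terms filter with the title match from the question. The helper name `withCategoryFilter` is hypothetical; the only assumption about the client is that it JSON-encodes the body, which is why the values are passed as a zero-based list.

```php
<?php
// Hypothetical helper: adds a category restriction as a non-scoring filter.
function withCategoryFilter(array $params, array $categories): array
{
    // array_values() guarantees a zero-based list, so the values are
    // encoded as a JSON array rather than a JSON object.
    $params['body']['query']['bool']['filter']['terms']['category'] = array_values($categories);
    return $params;
}

$params = [];
$params['body']['query']['bool']['must'][]['match']['title'] = 'mother';
$params = withCategoryFilter($params, [100, 200]);

// Print the request body to check its shape before sending it.
echo json_encode($params['body']), PHP_EOL;
```

A document passes the filter if its category equals any one of the listed values, which is the OR behavior asked about.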


#5

Thanks, this looks good. Trying it.


#6

Update:
If I use it like this

Array
(
    [index] => dataset
    [type] => content
    [body] => Array
        (
            [query] => Array
                (
                    [bool] => Array
                        (
                            [filter] => Array
                                (
                                    [terms] => Array
                                        (
                                            [category] => Array
                                                (
                                                    [0] => 100,200,300,400,500
                                                )

                                        )

                                )

                            [must] => Array
                                (
                                    [0] => Array
                                        (
                                            [match] => Array
                                                (
                                                    [title] => a
                                                )

                                        )

                                )

                        )

                )

            [sort] => Array
                (
                    [uploaders] => Array
                        (
                            [order] => desc
                        )

                )

            [from] => 0
            [size] => 30
        )

)

I get no results.

But if I use it like this:

Array
(
    [index] => dataset
    [type] => content
    [body] => Array
        (
            [query] => Array
                (
                    [bool] => Array
                        (
                            [filter] => Array
                                (
                                    [terms] => Array
                                        (
                                            [category] => Array
                                                (
                                                    [0] => 100
                                                    [1] => 200
                                                    [2] => 300
                                                    [3] => 400
                                                )

                                        )

                                )

                            [must] => Array
                                (
                                    [0] => Array
                                        (
                                            [match] => Array
                                                (
                                                    [title] => a
                                                )

                                        )

                                )

                        )

                )

            [sort] => Array
                (
                    [uploaders] => Array
                        (
                            [order] => desc
                        )

                )

            [from] => 0
            [size] => 30
        )

)

I get proper results.

So I guess I'll have to use it like the 2nd example.

But [time taken is 32], and I have already set the category and sub_category types to keyword in the mappings.

Is this normal behaviour?

Thanks.


(Abdon Pijpelink) #7

Yes, looks good to me.
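The first request in #6 returns nothing because its terms array holds a single element, the string 100,200,300,400,500, so it is matched as one literal value; the second request passes each category as its own element. If the values arrive as one comma-separated string, a small sketch like the following could split them first. The function name `parseCategoryList` is hypothetical, and casting to integers is an assumption that the category values are numeric, as they are in this thread.

```php
<?php
// Hypothetical helper: turns "100,200,300" into [100, 200, 300].
function parseCategoryList(string $csv): array
{
    $parts = array_map('trim', explode(',', $csv));
    // Drop empty pieces, such as the one produced by a trailing comma.
    $parts = array_filter($parts, function ($p) {
        return $p !== '';
    });
    // Re-index after filtering and cast each piece to an integer.
    return array_values(array_map('intval', $parts));
}

$params = [];
$params['body']['query']['bool']['filter']['terms']['category'] = parseCategoryList('100,200,300,400,500');
```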

