Performance issues with multiple field flt queries


(crunchy) #1

Hi,

I have an index with roughly 16 million documents.

I am running a filtered query composed of a geo_distance filter and a bool
query. The bool query contains three fuzzy_like_this_field (flt_field)
queries combined as should clauses. Some queries take up to 20 seconds, and
I'm wondering what I can do to improve performance.
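To compare variants of this query (for example three fuzzy fields versus one), it can help to generate the body rather than edit it by hand. Below is a minimal Python sketch, assuming the dict-style request bodies used in this thread. The helper `build_place_query` and its defaults are hypothetical. The `filtered`, `and` and `fuzzy_like_this_field` syntax is the older Elasticsearch query DSL; later releases deprecated and removed it.

```python
import json


def build_place_query(fuzzy_fields, lat, lon, distance="0.3 mi",
                      place_types=(2,), max_query_terms=2, prefix_length=2):
    """Hypothetical helper: a filtered query whose filter ANDs a terms filter
    on place_type with a geo_distance filter, wrapped around a bool query with
    one fuzzy_like_this_field should clause per (field, text) pair."""
    should = [
        {"fuzzy_like_this_field": {field: {
            "like_text": text,
            "max_query_terms": max_query_terms,
            "prefix_length": prefix_length,
        }}}
        for field, text in fuzzy_fields.items()
    ]
    return {
        "query": {
            "filtered": {
                "filter": {"and": [
                    {"terms": {"place_type": list(place_types)}},
                    {"geo_distance": {
                        "distance": distance,
                        "location": {"lat": lat, "lon": lon},
                    }},
                ]},
                "query": {"bool": {
                    # At least one of the fuzzy should clauses has to match.
                    "minimum_number_should_match": 1,
                    "should": should,
                }},
            }
        }
    }


if __name__ == "__main__":
    # The three-field query from this thread.
    three = build_place_query(
        {"pretty": "Thomas Auto Sales",
         "phonenumber": "4342345432",
         "address": "123 Water St"},
        lat=37.797435, lon=-121.205995)
    # A single-field variant that fuzzy-matches on "pretty" only.
    one = build_place_query({"pretty": "Thomas Auto Sales"},
                            lat=37.797435, lon=-121.205995)
    print(json.dumps(three, indent=2))
    print(json.dumps(one, indent=2))
```

Because the field list drives the should clauses, timing the one-field and three-field variants against the same index only requires changing the dict passed in.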

My query is based on what was attempted in this thread (Fuzzy
matching against multiple fields):
http://elasticsearch-users.115913.n3.nabble.com/Fuzzy-matching-against-multiple-fields-td2830262.html

Thanks.


(crunchy) #3

The mapping for my index and a sample query are provided below.

Here's the mapping for my index, where place is the document type.

{'place': {'properties': {'address': {'type': 'string'},
                          'aliases': {'type': 'string'},
                          'areacode': {'type': 'string'},
                          'city_id': {'type': 'long'},
                          'collection_id': {'type': 'long'},
                          'country_id': {'type': 'long'},
                          'county_id': {'type': 'long'},
                          'geo_id': {'type': 'long'},
                          'locality_id': {'type': 'long'},
                          'location': {'type': 'geo_point'},
                          'phonenumber': {'type': 'string'},
                          'place_type': {'type': 'long'},
                          'pretty': {'type': 'string'},
                          'province_id': {'type': 'long'}}}}

Here's a sample query:

curl -XGET 'http://localhost:9200/my_places/place/_search' -d '{
  "query": {
    "filtered": {
      "filter": {
        "and": [
          {"terms": {"place_type": [2]}},
          {"geo_distance": {
            "distance": "0.3 mi",
            "location": {"lat": 37.797435, "lon": -121.205995}
          }}
        ]
      },
      "query": {
        "bool": {
          "minimum_number_should_match": 1,
          "should": [
            {"fuzzy_like_this_field": {"pretty": {
              "like_text": "Thomas Auto Sales", "max_query_terms": 2, "prefix_length": 2}}},
            {"fuzzy_like_this_field": {"phonenumber": {
              "like_text": "4342345432", "max_query_terms": 2, "prefix_length": 2}}},
            {"fuzzy_like_this_field": {"address": {
              "like_text": "123 Water St", "max_query_terms": 2, "prefix_length": 2}}}
          ]
        }
      }
    }
  }
}'
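The bodies in this thread look like Python dict literals with single quotes. JSON requires double-quoted strings. Pasting a single-quoted dict inside `curl -d '...'` also collides with the shell's own single quotes. When the body is generated from Python, serializing it with `json.dumps` avoids both problems. The sketch below is trimmed to the "pretty" clause for brevity; the other should clauses follow the same pattern.

```python
import json

# Mirrors the shape of the sample query in this thread (only the "pretty"
# should clause is included here).
query = {
    "query": {
        "filtered": {
            "filter": {"and": [
                {"terms": {"place_type": [2]}},
                {"geo_distance": {
                    "distance": "0.3 mi",
                    "location": {"lat": 37.797435, "lon": -121.205995},
                }},
            ]},
            "query": {"bool": {
                "minimum_number_should_match": 1,
                "should": [
                    {"fuzzy_like_this_field": {"pretty": {
                        "like_text": "Thomas Auto Sales",
                        "max_query_terms": 2,
                        "prefix_length": 2,
                    }}},
                ],
            }},
        }
    }
}

# str()/repr() of a dict uses single quotes, which is not valid JSON.
python_repr = str(query)
# json.dumps() emits double-quoted keys and strings that a JSON parser accepts.
body = json.dumps(query)

if __name__ == "__main__":
    # Save this output to body.json and send it with curl's -d @body.json.
    print(body)
```

Reading the body from a file with `-d @body.json` also sidesteps shell quoting entirely.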

I've tried replacing the bool query of three fuzzy_like_this_field queries
with a single fuzzy_like_this_field query on the "pretty" field, and it feels
like the fuzzy matching is where the query slows down.

I'd appreciate any insight into where I can improve the performance of the
fuzzy string matching.
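One common explanation for slow fuzzy queries is the term expansion step. Each analyzed term of `like_text` is compared against the field's indexed terms to find close variants. `prefix_length` restricts those comparisons to terms that share the first N characters. The toy model below illustrates only that trade-off, under loud assumptions:

- an in-memory term list;
- plain Levenshtein distance with a fixed edit limit, standing in for flt's similarity setting;
- no scoring and no `max_query_terms` selection.

It is not how Lucene implements the query, and the term list is made up.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def expand_fuzzy(term, dictionary, max_edits=1, prefix_length=0):
    """Toy fuzzy expansion. Only dictionary terms that share the first
    `prefix_length` characters of `term` are compared at all; returns
    (number of terms compared, variants within `max_edits`)."""
    prefix = term[:prefix_length]
    candidates = [t for t in dictionary if t.startswith(prefix)]
    variants = [t for t in candidates if levenshtein(term, t) <= max_edits]
    return len(candidates), variants


if __name__ == "__main__":
    # Hypothetical lowercased terms standing in for an index's term dictionary.
    terms = ["thomas", "thomson", "tomas", "thames", "auto", "autos",
             "sales", "salem", "water", "waters", "st"]
    print(expand_fuzzy("thomas", terms, prefix_length=0))  # compares all 11 terms
    print(expand_fuzzy("thomas", terms, prefix_length=2))  # compares only "th..." terms
```

With `prefix_length=2`, only 3 of the 11 terms are compared. The cost is that the variant "tomas" is missed, because its difference falls inside the required prefix.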

Thanks.

