Filtered wildcard query

cleaner · August 7, 2018, 7:35am

Hi,

I need your help/advice to increase performance of wildcard query. The same question was here Performance of filtered wildcard queries , but is closed now.

We have to use wildcard for data that is filtered by customer id. So data for applying wildcard is really small, about 200-300 records and should not be big deal for ES. But time it takes is about 5-10 seconds, while just filtering by customerId is less than second.

What can we do to increase performance?

I need to run query like this one:
SELECT * FROM Transactions WHERE (creditCusomerId = 123 OR debitCustomerId=123) AND search_field LIKE '%FOO%'

Here is ES query:

{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "must": [
            {
              "bool": {
                "should": [
                  {
                    "term": {
                      "payload.creditCustomerId": 123
                    }
                  },
                  {
                    "term": {
                      "payload.debitCustomerId": 123
                    }
                  }
                ]
              }
            },
            {
              "bool": {
                "should": [
                  {
                    "wildcard": {
                      "search_field": "*FOO*"   //type: KEYWORD
                    }
                  }
                ]
              }
            }
          ]
        }
      }
    }
  }
}

Thanks

dadoonet · August 7, 2018, 9:52am

1st of all: don't use wildcard queries.
Then try something more like:

GET index/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "bool": {
            "should": [
              {
                "term": {
                  "payload.creditCustomerId": 123
                }
              },
              {
                "term": {
                  "payload.debitCustomerId": 123
                }
              }
            ]
          }
        }
      ],
      "must": [
              {
                "wildcard": {
                  "search_field": "*FOO*"
                }
              }
      ]
    }
  }
}

cleaner · August 8, 2018, 7:35am

Thanks David,

We know wildcard is expensive feature. But requirements forces us to search for a substring. On other hand we expected that wildcard will be applied to tiny subset, about 300 docs. We tried post_filter, results were just slightly better

I used Profile API and noticed that most time is spent on build_scorer. In our case we use constant_score. Why does build_scorer take so much time, when we do not need it?

{
	"type": "MultiTermQueryConstantScoreWrapper",
	"description": "search_field:*FOO*",
	"time_in_nanos": 10514827001,
	"breakdown": {
		"score": 0,
		"build_scorer_count": 15,
		"match_count": 0,
		"create_weight": 1535,
		"next_doc": 0,
		"match": 0,
		"create_weight_count": 1,
		"next_doc_count": 0,
		"score_count": 0,
		"build_scorer": 10514821338,
		"advance": 4093,
		"advance_count": 19
	}
}

dadoonet · August 8, 2018, 7:42am

That's why I'd encourage you looking at ngrams instead. You'll pay the price at index time (disk space wise and index time) but don't pay it at search time.

Did you try my query proposal?

Could you also try this one?

GET index/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "bool": {
            "should": [
              {
                "term": {
                  "payload.creditCustomerId": 123
                }
              },
              {
                "term": {
                  "payload.debitCustomerId": 123
                }
              }
            ]
          }
        },
        {
          "wildcard": {
            "search_field": "*FOO*"
          }
        }
      ]
    }
  }
}

BTW which version are you using?

dadoonet · August 8, 2018, 7:44am

BTW I forgot, you can use the Rescore API to first filter by ID which should be fast then apply the wildcard in the rescore part.

See https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-rescore.html

cleaner · August 8, 2018, 8:33am

ES version is 6.3.1
Yes, I tried your query - same results.
We have not experience yet with ngrams, not quite understand how it searches longer substrings than ngram length. If you know some good article, please post.
Thanks for Rescore API, will try it also

system · September 5, 2018, 8:33am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Using filtered query with a wildcard Elasticsearch	3	562	July 6, 2017
Wildcard query performance Elasticsearch	8	12187	July 6, 2017
Performance of filtered wildcard queries Elasticsearch	2	2735	June 29, 2018
Slow query Elasticsearch	3	305	July 6, 2017
Query response time not scaling well with large resultsets Elasticsearch	8	438	July 6, 2017

Filtered wildcard query

Related topics