We have to run a wildcard query on data that is already filtered by customer id, so the set the wildcard applies to is really small (about 200-300 records) and should not be a big deal for ES. But the query takes about 5-10 seconds, while filtering by customerId alone takes less than a second.
What can we do to increase performance?
I need to run a query like this one: SELECT * FROM Transactions WHERE (creditCustomerId = 123 OR debitCustomerId = 123) AND search_field LIKE '%FOO%'
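For reference, a sketch of how that SQL might translate to the ES query DSL, with everything in a non-scoring filter context (field names are taken from the SQL above; adjust to the real mapping):

```python
# Hypothetical translation of the SQL above into an Elasticsearch
# bool query. Both clauses go in "filter" so no scoring is needed.
query = {
    "query": {
        "bool": {
            "filter": [
                {
                    # (creditCustomerId = 123 OR debitCustomerId = 123)
                    "bool": {
                        "should": [
                            {"term": {"creditCustomerId": 123}},
                            {"term": {"debitCustomerId": 123}},
                        ],
                        "minimum_should_match": 1,
                    }
                },
                # search_field LIKE '%FOO%'
                {"wildcard": {"search_field": "*FOO*"}},
            ]
        }
    }
}
```

Note that even in a filter context, Elasticsearch does not guarantee it will run the cheap term filters before the wildcard; the leading-wildcard pattern can still be evaluated against the full term dictionary.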
We know wildcard is an expensive feature, but the requirements force us to search for a substring. On the other hand, we expected the wildcard to be applied only to a tiny subset of about 300 docs. We tried post_filter; the results were only slightly better.
I used the Profile API and noticed that most of the time is spent in build_scorer. In our case we use constant_score. Why does build_scorer take so much time when we do not need scoring?
That's why I'd encourage you to look at ngrams instead. You pay the price at index time (in disk space and indexing time) but don't pay it at search time.
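To make that concrete, here is a sketch of index settings for an ngram-based field, under assumed names (`trigram_tokenizer`, `trigram_analyzer` are made up for this example; `_doc` is the 6.x mapping type):

```python
# Hypothetical index settings: analyze search_field into 3-character
# grams at index time, so substring search becomes ordinary term
# matching at query time.
settings = {
    "settings": {
        "analysis": {
            "tokenizer": {
                "trigram_tokenizer": {
                    "type": "ngram",
                    "min_gram": 3,
                    "max_gram": 3,
                }
            },
            "analyzer": {
                "trigram_analyzer": {
                    "tokenizer": "trigram_tokenizer",
                    "filter": ["lowercase"],
                }
            },
        }
    },
    "mappings": {
        "_doc": {
            "properties": {
                "search_field": {
                    "type": "text",
                    "analyzer": "trigram_analyzer",
                }
            }
        }
    },
}
```

You would create the index with this body and reindex; queries against `search_field` are then analyzed with the same trigram analyzer.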
ES version is 6.3.1
Yes, I tried your query - same results.
We have no experience with ngrams yet, and don't quite understand how it can match substrings longer than the ngram length. If you know a good article, please post it.
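The trick is that the query string is tokenized into grams the same way as the indexed text, and all of the query's grams must match. A small self-contained sketch of the idea (plain Python, no ES involved):

```python
def trigrams(text):
    """Split text into overlapping 3-character grams, roughly as an
    ngram tokenizer with min_gram = max_gram = 3 would."""
    text = text.lower()
    return [text[i:i + 3] for i in range(len(text) - 2)]

# The indexed value produces many grams...
doc_grams = set(trigrams("TRANSACTION-FOO-BAR"))

# ...and a query substring longer than 3 chars is tokenized the same
# way at search time. Requiring every query gram to be present
# approximates a substring match, even though each gram is only 3
# characters long.
query_grams = trigrams("FOO-BAR")
assert all(g in doc_grams for g in query_grams)
```

In practice a match_phrase query (which also checks gram positions) avoids the false positives you'd get from merely requiring all grams to be present somewhere in the field.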
Thanks for the Rescore API suggestion, I will try that as well.
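For anyone following along, a hedged sketch of what a rescore attempt could look like here: run the cheap customer-id filter as the main query, then apply the expensive wildcard only to the top hits (window_size is an assumption sized to cover the ~300 expected docs):

```python
# Hypothetical rescore request body: the wildcard runs only against
# the top window_size hits of the cheap main query, not the whole index.
query = {
    "query": {
        "bool": {
            "should": [
                {"term": {"creditCustomerId": 123}},
                {"term": {"debitCustomerId": 123}},
            ],
            "minimum_should_match": 1,
        }
    },
    "rescore": {
        "window_size": 500,
        "query": {
            "rescore_query": {
                "wildcard": {"search_field": "*FOO*"}
            }
        }
    },
}
```

One caveat: rescoring re-ranks hits but does not remove them, so documents that fail the wildcard still come back (with a lower score); it is not a drop-in replacement for the LIKE filter.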