While better hardware will undoubtedly help, I fully endorse (based on personal experience) finding alternatives to the `wildcard` query for partial string/substring matching: it can quickly become slow and resource-intensive, especially with leading wildcards.
Among the options:
- If you need to keep using wildcard queries, consider the `wildcard` field type for the fields you query (assuming you don't query the whole `_source`, and that you're on Elasticsearch v7.9 or later); see Find strings within strings faster with the Elasticsearch wildcard field | Elastic Blog for a nice intro. It's important to note that you need to reindex your dataset for the new mapping to take effect (see the first sketch after this list).
- Use `fuzzy` queries; I haven't used them myself, so I can't speak from experience, but they can reportedly be slow on large datasets, and adjusting the `fuzziness` attribute can help balance accuracy and performance (second sketch below).
- Use `match` queries with custom analyzers that index fields as (edge) n-grams; this is likely one of the most convenient solutions. You can use the analyze API (Test an analyzer | Elasticsearch Guide [8.15] | Elastic) to test your analyzers, and here too you need to reindex your dataset. Watch out for mappings that produce an abundance of tokens on long values (e.g., `min_gram=2`, `max_gram=10`); you can mitigate this by using n-grams with a low `min_gram`/`max_gram` range (e.g., 3 to 10 characters) or by adding a `search_as_you_type` field. A mapping and analyzer sketch follows the list.
In any case, you should thoroughly analyze your requirements to fine-tune the analyzers and queries; for example, maybe not all fields need to be indexed this way.
Good luck!