The use case is a search engine over text documents for the general public.
We were previously using a simple match query, but recently switched to simple_query_string in order to easily support phrase matching.
I'm finding this transition difficult to deal with:
- fuzziness: AUTO is not supported by simple_query_string, but we cannot expect users to manually append ~N to each term in their query. Is it standard practice to modify the query on the server side before querying Elasticsearch? Can this be smoothly accomplished instead through character/token filters or some other analyzer? I tried playing with character filters in custom analyzers for the past couple hours, but it seems like simple_query_string parses the special operators before analyzing the text. The documentation seems to support that:
This query uses a simple syntax to parse and split the provided query string into terms based on special operators. The query then analyzes each term independently before returning matching documents.
- The NOT operator has a similar issue since we're using the default_operator: OR. We can't expect users to write "+-" when they want to exclude terms from their search. What is the best practice?
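To make the first bullet concrete, here is a rough Python sketch of the kind of server-side rewrite I'm asking about. The function names are my own, and the length thresholds are my attempt to mimic the documented AUTO behaviour (terms of 1-2 characters must match exactly, 3-5 allow one edit, longer terms allow two). It only touches bare alphanumeric words; phrases and anything carrying an operator pass through unchanged, so it does not address the NOT issue.

```python
import re

# A token is either a quoted phrase (optionally followed by ~N slop)
# or a run of non-whitespace characters.
TOKEN_RE = re.compile(r'"[^"]*"(?:~\d+)?|\S+')


def auto_edit_distance(term: str) -> int:
    """Approximate fuzziness AUTO: edit distance chosen from term length."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


def add_fuzziness(query: str) -> str:
    """Append ~N to bare words in a simple_query_string query (hypothetical helper)."""
    rewritten = []
    for token in TOKEN_RE.findall(query):
        # Phrases and tokens containing operators (-, +, |, *, ~, parentheses)
        # are not purely alphanumeric, so they are left untouched.
        if not token.isalnum():
            rewritten.append(token)
            continue
        distance = auto_edit_distance(token)
        rewritten.append(f"{token}~{distance}" if distance else token)
    return " ".join(rewritten)


print(add_fuzziness('quick "brown fox" -dog elasticsearch to'))
```

The rewritten string would then be sent as the query of simple_query_string. It works on the raw user input, which is exactly the part that feels like reinventing the wheel.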
Alternatively, should I have stuck with the other full-text queries? Without simple_query_string, it seems like supporting phrase matching and other common search operations would involve parsing the query on the server side, then constructing complex compound queries before sending to Elasticsearch.
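For comparison, this is roughly what I imagine the "parse it ourselves" route would look like: a minimal Python sketch that splits the input into terms, quoted phrases, and "-"-prefixed exclusions, then builds a bool query from match, match_phrase, and must_not clauses. The field name "body", the function name, and the tiny input syntax are assumptions for illustration only, not something we have in production.

```python
import re

# Optional leading "-" (exclude), then either a quoted phrase or a bare term.
TOKEN_RE = re.compile(r'(-?)(?:"([^"]*)"|(\S+))')


def build_bool_query(user_input: str, field: str = "body") -> dict:
    """Turn raw user input into an Elasticsearch bool query body (sketch)."""
    should, must_not = [], []
    for negated, phrase, term in TOKEN_RE.findall(user_input):
        if phrase:
            clause = {"match_phrase": {field: phrase}}
        elif term:
            options = {"query": term}
            if not negated:
                # Only positive terms are fuzzy; exclusions stay exact.
                options["fuzziness"] = "AUTO"
            clause = {"match": {field: options}}
        else:
            continue  # empty quotes, nothing to search for
        (must_not if negated else should).append(clause)

    if not should and not must_not:
        return {"match_all": {}}

    bool_body = {}
    if should:
        # OR semantics between positive clauses, at least one must match.
        bool_body["should"] = should
        bool_body["minimum_should_match"] = 1
    if must_not:
        bool_body["must_not"] = must_not
    return {"bool": bool_body}


print(build_bool_query('quick "brown fox" -dog'))
```

This gets fuzziness: AUTO and plain "-term" exclusion with OR semantics, but it means owning the query syntax ourselves, which is the trade-off I'm unsure about.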
I'm not opposed to constructing complex queries in order to support operations like phrase matching, term exclusion, fuzziness, etc., but I want to ask what's best practice or recommended. I'd like to avoid reinventing the wheel if Elastic/Lucene provides simpler solutions that I'm overlooking.
Thank you for your help!