We are replacing a homegrown search engine written with pure Lucene (currently 4.9.1) with Elasticsearch. After adding the search analyzers we used in the old system, I'm now trying to replicate our queries, beginning with our "starts with" query, which I would like to express with the same syntax in Elasticsearch. It appears that either query_string or simple_query_string should support the original Lucene syntax, but that doesn't seem to be the case.
I have a field that is analyzed using the whitespace tokenizer and both the lowercase and asciifolding filters. I'm trying to run a phrase query whose last term is treated as a prefix.
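As a rough sketch, an analyzer like the one described could be declared in the index settings as shown here. The analyzer name `name_analyzer` is a hypothetical placeholder, and the mapping that attaches it to the "name" field is omitted because its syntax depends on the Elasticsearch version.

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "name_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  }
}
```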
Specifically, we are trying to find documents with one or more names, but I expect it to work in other cases as well. For example, I want to be able to find all documents with a name starting with “Smith John”, such as “Smith John”, “Smith Johnny”, “Smith John A”, etc. If a document has “Smith Barry” and “Wilson John”, but doesn’t have a version of “Smith John”, I don’t want to find that document.
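To make the requirement concrete, here are two illustrative documents, assuming "name" is stored as a multi-valued field (the array layout is an assumption; the values come from the examples above). They are shown as a plain JSON list for illustration only, not as an indexing request. The first document should be found by a "Smith John" prefix search; the second should not.

```json
[
  { "name": ["Smith John A"] },
  { "name": ["Smith Barry", "Wilson John"] }
]
```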
This query will find all documents that have exactly "smith john" in the field "name" but no other variations.
{
  "query": {
    "simple_query_string": {
      "query": "\"smith john\"",
      "fields": ["name"],
      "default_operator": "AND"
    }
  }
}
If I remove the quotes and add a prefix operator, the query will find “Smith John”, “Smith John A”, and “Smith Johnny”, but it will also find any document containing both “Smith Barry” and “Wilson John”, because the two terms are allowed to match in different instances of the name.
{
  "query": {
    "simple_query_string": {
      "query": "smith john*",
      "fields": ["name"],
      "default_operator": "AND"
    }
  }
}
The next query is what I'm trying to use, and it does work in our old system with pure Lucene. The quotes tell it to match within a single instance, and the asterisk (*) tells it to do a prefix query. However, in Elasticsearch that same syntax never produces any results. I’m guessing it is actually looking for the literal token “john*” instead of treating the asterisk as a prefix operator.
{
  "query": {
    "simple_query_string": {
      "query": "\"smith john*\"",
      "fields": ["name"],
      "default_operator": "AND"
    }
  }
}
I have tried variations of query_string as well with similar results.
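For illustration, one such query_string variation might look like the following. This is a sketch of the same quoted-prefix syntax, not necessarily the exact request that was run; `default_field` is used here in place of `fields`.

```json
{
  "query": {
    "query_string": {
      "query": "\"smith john*\"",
      "default_field": "name",
      "default_operator": "AND"
    }
  }
}
```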
I have successfully done this using "match_phrase_prefix" to search for "smith john", but that comes with its own limitations, such as not allowing wildcards and needing to know or guess a value for max_expansions. I found that if I use too small a number I get partial results, and the documentation warns that too large a number hurts performance.
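A minimal sketch of that match_phrase_prefix request is below. The max_expansions value of 50 is only a placeholder (it is the documented default); it is not a recommendation, and the right value depends on how many terms share the prefix in the index.

```json
{
  "query": {
    "match_phrase_prefix": {
      "name": {
        "query": "smith john",
        "max_expansions": 50
      }
    }
  }
}
```

Here the last term, "john", is expanded as a prefix, while "smith" must match exactly and appear immediately before it.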
What do I need to change to get the results I want from this query? Thank you.