Phrase Query with Prefix produces 0 results

myronuecker · March 15, 2019, 2:19pm

We are replacing a homegrown search engine written with pure Lucene (currently 4.9.1) with ElasticSearch. After adding the search analyzers we used in the old system, I'm now trying to replicate our queries. Right now I'm looking at replacing our "starts with" query with the same syntax in ElasticSearch. It appears that either query_string or simple_query_string should support the original Lucene syntax, but that doesn't seem to be the case.

I have a field that is analyzed using whitespace tokenizer and both lowercase and asciifolding filters. I'm trying to run a query that has a phrase query with a prefix.

Specifically, we are trying to find documents with one or more names, but I expect it to work in other cases as well. For example, I want to be able to find all documents with a name starting with “Smith John”, such as “Smith John”, “Smith Johnny”, “Smith John A”, etc. If a document has “Smith Barry” and “Wilson John”, but doesn’t have a version of “Smith John”, I don’t want to find that document.

This query will find all documents that have exactly "smith john" in the field "name" but no other variations.

{
    "query": {
        "simple_query_string": {
            "query": "\"smith john\"",
            "fields": ["name"],
            "default_operator": "AND"
        }
    }
}

If I remove the quotes and add a prefix operator, the query will find “Smith John”, “Smith John A”, and “Smith Johnny”, but it also finds any document with both “Smith Barry” and “Wilson John”, because the two terms are allowed to match in different instances of the name.

{
    "query": {
        "simple_query_string": {
            "query": "smith john*",
            "fields": ["name"],
            "default_operator": "AND"
        }
    }
}

The next query is what I'm trying to use, and it does work in our old system with pure Lucene. The quotes tell it to search within a single instance, and the asterisk (*) tells it to do a prefix query. However, in ElasticSearch that same syntax never produces any results. I’m guessing it is actually looking for the literal term “john*” instead of treating the asterisk as a prefix operator.

{
    "query": {
        "simple_query_string": {
            "query": "\"smith john*\"",
            "fields": ["name"],
            "default_operator": "AND"
        }
    }
}

I have tried variations of query_string as well with similar results.

I have successfully done this using "match_phrase_prefix" to search for "smith john", but that comes with its own limitations such as not allowing wildcards and needing to know or guess at a value for max_expansions. I found that if I use too small of a number I get partial results, and the documentation warns that too large of a number affects performance.

What do I need to change to get the results I want from this query? Thank you.
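As a rough illustration of the match_phrase_prefix workaround described above, the request body can be built like this. It is only a sketch: the helper name `phrase_prefix_query` and the value 200 are made up for the example, and the right `max_expansions` depends on how many distinct terms start with the final word in your index.

```python
import json

def phrase_prefix_query(field, text, max_expansions=50):
    """Build a match_phrase_prefix request body as a plain dict.

    Hypothetical helper for illustration; 50 mirrors the documented
    Elasticsearch default for max_expansions.
    """
    return {
        "query": {
            "match_phrase_prefix": {
                field: {
                    "query": text,
                    # Upper bound on how many terms the last word may expand to.
                    # Too low gives partial results, too high costs performance.
                    "max_expansions": max_expansions,
                }
            }
        }
    }

# Example value only; tune it against your own data.
body = phrase_prefix_query("name", "smith john", max_expansions=200)
print(json.dumps(body, indent=2))
```

The text is analyzed by the field's analyzer, so it does not need to be lowercased by hand here.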

myronuecker · March 15, 2019, 7:55pm

I did figure out that we do have a custom implementation of Query that is the reason this works for us now. However, I would be much happier if there was an out-of-the-box way to do it in ElasticSearch.

safwan · March 15, 2019, 8:40pm

It actually depends on your mapping and the analyzer you are using. Can you please share them as well?

myronuecker · March 15, 2019, 8:46pm

The analyzer we have is:

{
	"MyAnalyzer": {
		"filter": [
			"lowercase",
			"asciifolding"
		],
		"type": "custom",
		"tokenizer": "whitespace"
	}
}

and the field mapping is:

{
	"text_field": {
		"type": "text",
		"fields": {
			"raw": {
				"type": "keyword"
			}
		},
		"analyzer": "MyAnalyzer"
	}
}
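To make it easier to reason about which terms end up in the index, here is a small Python approximation of the analyzer above. It is a sketch, not the real thing: the `analyze` function is hypothetical, and the Unicode-decomposition step only approximates the asciifolding filter (it strips accents but does not cover every character the real filter folds).

```python
import unicodedata

def analyze(text):
    """Approximate MyAnalyzer: whitespace tokenizer, lowercase, ASCII folding."""
    tokens = []
    for raw in text.split():                  # whitespace tokenizer
        lowered = raw.lower()                 # lowercase filter
        # Approximation of asciifolding: decompose, then drop combining marks.
        decomposed = unicodedata.normalize("NFKD", lowered)
        folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
        tokens.append(folded)
    return tokens

print(analyze("Smith José"))   # ['smith', 'jose']
```

Queries that are not analyzed (term-level and span queries, for example) have to be given terms in this already-normalized form.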

safwan · March 15, 2019, 8:54pm

I think you should use an edge_ngram filter in your analyzer at index time and run a match_phrase query with the standard analyzer at search time. You can read more about it here

Regarding your second question, can you explain a bit more?

myronuecker · March 18, 2019, 5:26pm

I tried your solution using edge_ngram. What I found is that it prefixes both names. Instead of just finding "smith john" it also found "smithson john". This makes me think that match_phrase_prefix is still the better solution for us.
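The behaviour described above can be reproduced with a small simulation. This is a sketch under assumptions: the `min_gram` and `max_gram` values are illustrative, the function names are made up, and it assumes the edge n-grams of a token are indexed at that token's position while the query words are not n-grammed.

```python
def edge_ngrams(token, min_gram=1, max_gram=20):
    """Return the leading substrings of token with lengths min_gram..max_gram."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]

def indexed_positions(name):
    """Per-position sets of grams for a whitespace-split, lowercased name."""
    return [set(edge_ngrams(tok)) for tok in name.lower().split()]

def phrase_matches(phrase, name):
    """True if the query words match grams at consecutive positions."""
    words = phrase.lower().split()
    positions = indexed_positions(name)
    for start in range(len(positions) - len(words) + 1):
        if all(w in positions[start + i] for i, w in enumerate(words)):
            return True
    return False

print(phrase_matches("smith john", "Smith Johnny"))    # True (wanted)
print(phrase_matches("smith john", "Smithson John"))   # True (not wanted)
print(phrase_matches("smith john", "Smith Barry"))     # False
```

Because every indexed token carries all of its prefixes, "smith" matches inside "smithson" just as "john" matches inside "johnny", so the prefix behaviour applies to every word of the phrase rather than only the last one.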

safwan · March 19, 2019, 2:28am

Did you use a match query or a match_phrase query? You should use match_phrase rather than match.

myronuecker · March 19, 2019, 12:50pm

My query was:

{
	"query": {
		"match_phrase": {
			"name": "smith john"
		}
	}
}

myronuecker · March 21, 2019, 8:38pm

I found a post that mentioned the "span_near" query. That actually seems to work quite well. I had to first convert the terms to lower case since that is what the analyzer is doing when it stores them. Then I had to use "span_multi" with a prefix query for the final term.

I tried using span_first, but that doesn't work if there are multiple names indexed into the same field as an array, because it will only match the first name in the list.

To get a true "starts with" query I had to index a special character at the beginning of the name. In our system we index "smith john" as "^ smith john" so the prefix query will work as expected.

You end up with a query like this:

{
    "query": {
        "span_near": {
            "clauses": [
                {
                    "span_term": {
                        "name": "^"
                    }
                },
                {
                    "span_term": {
                        "name": "smith"
                    }
                },
                {
                    "span_multi": {
                        "match": {
                            "prefix": {
                                "name": "john"
                            }
                        }
                    }
                }
            ],
            "in_order": true
        }
    }
}
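A query of this shape can be generated from whatever the user typed. The following is a minimal sketch, assuming the "^" marker has been indexed in front of every name as described above; the function name `starts_with_span_query` is hypothetical, and the normalization only lowercases and splits on whitespace, so any ASCII folding would have to be applied separately.

```python
import json

def starts_with_span_query(field, text, anchor="^"):
    """Build a span_near body: anchor, exact terms, then a prefix on the last term."""
    # Span queries are not analyzed, so lowercase and split by hand.
    terms = text.lower().split()
    if not terms:
        raise ValueError("text must contain at least one term")
    *exact, last = terms

    # Optional start-of-name marker, assumed to be indexed as its own token.
    clauses = [{"span_term": {field: anchor}}] if anchor else []
    # Every term except the last must match exactly.
    clauses += [{"span_term": {field: term}} for term in exact]
    # The last term is wrapped in span_multi so it can act as a prefix.
    clauses.append({"span_multi": {"match": {"prefix": {field: last}}}})

    return {"query": {"span_near": {"clauses": clauses, "in_order": True}}}

query = starts_with_span_query("name", "Smith John")
print(json.dumps(query, indent=4))
```

Passing `anchor=None` drops the marker clause, which turns the "starts with" search into a phrase-with-prefix search anywhere in the name.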

system · April 18, 2019, 8:38pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
