Issues trying to search with ngram tokenizer

Hi, we recently upgraded Elasticsearch to 7.10. Since then I've had issues querying a field that uses an ngram tokenizer from the Java API. Specifically, searching for a text value only returns results when the query type is phrase_prefix. For example, a query for 'abcd' returns results for 'abcdefg', as expected, but a search for 'bcdefg' returns no results.

I can work around it, reluctantly, with wildcard queries. However, I also need to search for non-alphanumeric characters such as -, %, and /, and I have not been able to make wildcard queries work with those characters.
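
For the wildcard workaround, here is a minimal sketch of escaping user input before building a "contains" pattern. It assumes the standard wildcard syntax, where only `*`, `?`, and the backslash have special meaning; characters such as -, %, and / need no escaping, so whether they match depends on how the field was analyzed at index time. The class and method names are hypothetical.

```java
class WildcardEscapeSketch {
    // Escapes the characters that wildcard syntax treats specially
    // (backslash, '*', '?') so that they are matched literally.
    static String escapeWildcard(String input) {
        StringBuilder sb = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (c == '\\' || c == '*' || c == '?') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Hypothetical helper: wraps the escaped input in leading and trailing
    // wildcards to express a "contains" match.
    static String containsPattern(String input) {
        return "*" + escapeWildcard(input) + "*";
    }

    public static void main(String[] args) {
        System.out.println(containsPattern("50%-off/a*b?"));
    }
}
```

The resulting pattern could then be passed to something like `QueryBuilders.wildcardQuery(fieldName, pattern)`.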

**Analyzer**

{
  "max_ngram_diff": "7",
  "analysis": {
    "analyzer": {
      "rp_analyzer": {
        "type": "custom",
        "tokenizer": "rp_tokenizer",
        "filter": [
          "lowercase"
        ]
      }
    },
    "tokenizer": {
      "rp_tokenizer": {
        "type": "ngram",
        "min_gram": 3,
        "max_gram": 10,
        "token_chars": [
          "letter",
          "digit",
          "punctuation",
          "symbol",
          "custom"
        ],
        "custom_token_chars": ":+-_/%*?"
      }
    }
  }
}
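
To make the expectation concrete, here is a rough plain-Java sketch of the tokens an ngram tokenizer with min_gram 3 and max_gram 10, followed by a lowercase filter, should emit for one value. It is not Elasticsearch's implementation, and it assumes every character in the input counts as a token character, so no splitting occurs. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class NgramSketch {
    // Emits every substring whose length lies between minGram and maxGram,
    // lowercased first, roughly mirroring ngram tokenizer + lowercase filter.
    static List<String> ngrams(String text, int minGram, int maxGram) {
        String lower = text.toLowerCase(Locale.ROOT);
        List<String> grams = new ArrayList<>();
        for (int start = 0; start < lower.length(); start++) {
            for (int len = minGram;
                 len <= maxGram && start + len <= lower.length();
                 len++) {
                grams.add(lower.substring(start, start + len));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        List<String> grams = ngrams("abcdefg", 3, 10);
        System.out.println(grams);
        // 'bcdefg' is one of the grams, so a match query for it should hit
        // a document containing 'abcdefg' if the analyzer ran at index time.
        System.out.println(grams.contains("bcdefg"));
    }
}
```

If the index really holds these tokens, a plain match on 'bcdefg' should find 'abcdefg' without resorting to a prefix or wildcard query.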

**Query**

// Search every configured field for the trimmed input.
MultiMatchQueryBuilder multiMatchQueryBuilder = new MultiMatchQueryBuilder(searchText.trim());
for (String fieldName : searchableFieldNames) {
    multiMatchQueryBuilder.field(fieldName);
}
multiMatchQueryBuilder.operator(Operator.OR);
multiMatchQueryBuilder.slop(5);
// phrase_prefix matches the terms as a phrase, treating the last term as a prefix.
multiMatchQueryBuilder.type(MultiMatchQueryBuilder.Type.PHRASE_PREFIX);
return multiMatchQueryBuilder;

**Document setup** (also tried FieldType.Text)

@Document(indexName = "reviewable_product")
@Setting(settingPath = "/elasticsearch/settings/ReviewableProductAnalyzer.json")
public class ReviewableProduct {
    @Field(type = FieldType.Keyword, analyzer = "rp_analyzer", searchAnalyzer = "rp_analyzer")
    private String reviewableProductId;

    @Field(type = FieldType.Keyword, analyzer = "rp_analyzer", searchAnalyzer = "rp_analyzer", name = "productName")
    private String productName;
}

I appreciate any help. Thanks all.

Figured it out. I was not creating the index properly, so the ngrams were never built.
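
For anyone hitting the same thing, one way to check whether the index actually has the analyzer is the `_analyze` API. Below is a minimal sketch that builds such a request with the JDK's HTTP client types. The base URL `http://localhost:9200` and the class and method names are assumptions for illustration; the index and analyzer names come from the post above.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class AnalyzeRequestSketch {
    // Minimal JSON string escaping for the two values placed in the body.
    static String jsonEscape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Body for the _analyze API: which analyzer to run and on what text.
    static String analyzeBody(String analyzer, String text) {
        return "{\"analyzer\":\"" + jsonEscape(analyzer)
                + "\",\"text\":\"" + jsonEscape(text) + "\"}";
    }

    // Builds (but does not send) POST <baseUrl>/<index>/_analyze.
    static HttpRequest analyzeRequest(String baseUrl, String index,
                                      String analyzer, String text) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/" + index + "/_analyze"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(analyzeBody(analyzer, text)))
                .build();
    }

    public static void main(String[] args) {
        // Assumed local cluster address; adjust for your environment.
        HttpRequest request = analyzeRequest(
                "http://localhost:9200", "reviewable_product", "rp_analyzer", "abcdefg");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(analyzeBody("rp_analyzer", "abcdefg"));
    }
}
```

Sending the request with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` against a live cluster should list the 3 to 10 character grams. If the index was created without the custom settings, the call would likely fail because rp_analyzer does not exist on it, which points to the index creation step as the culprit.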
