Match_phrase with exact substring without space

I have a dataset like this:

Text: a==b==c==
Text: a==b== c==
Text: a== b==c==

I also have my own analyzer that creates a token for each letter:

"analysis": {
	"analyzer": {
		"my_analyzer": {
			"char_filter": [
				"my_char_filter"
			],
			"tokenizer": "my_tokenizer"
		}
	},
	"char_filter": {
		"my_char_filter": {
			"pattern": "==",
			"type": "pattern_replace",
			"replacement": "==<>"
		}
	},
	"tokenizer": {
		"my_tokenizer": {
			"pattern": [
				"<>",
				" "
			],
			"type": "pattern"
		}
	}
}
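For illustration, here is a rough Python approximation of what this analyzer does to the three documents. It is only a sketch: the `analyze` function is hypothetical, and it assumes the two tokenizer patterns behave like the single regex `<>| ` (split on `<>` or on a space) and that empty tokens are dropped.

```python
import re

# Assumption: the tokenizer's two patterns act like one alternation, "<>" or " ".
SPLIT_PATTERN = re.compile(r"<>| ")


def analyze(text):
    """Approximate my_analyzer: char filter first, then the pattern tokenizer."""
    # pattern_replace char filter: every "==" becomes "==<>".
    filtered = text.replace("==", "==<>")
    # Pattern tokenizer: split on the separators and drop empty tokens.
    return [token for token in SPLIT_PATTERN.split(filtered) if token]


docs = ["a==b==c==", "a==b== c==", "a== b==c=="]
for doc in docs:
    # Under this approximation all three yield ['a==', 'b==', 'c=='].
    print(repr(doc), "->", analyze(doc))
```

Under these assumptions the space is consumed as a separator, so the three documents end up with identical token lists.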

When I search for a==b==c== with match_phrase (which I use because users often search for just the first couple of letters), I get all 3 documents. I only want the document that contains the exact word "a==b==c==", without any whitespace.

This is the query I use:

{
  "query": {
    "bool": {
      "should": [
        {
          "match_phrase": {
            "Text": "a==b==c=="
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}
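A phrase query matches when the query tokens appear in the document at consecutive positions, so the result depends only on the token lists. The sketch below simulates that check in Python. Both `analyze` and `phrase_matches` are hypothetical helpers, and the tokenizer is again assumed to split on `<>` or a space.

```python
import re


def analyze(text):
    """Hypothetical stand-in for my_analyzer (splits on "<>" or a space)."""
    filtered = text.replace("==", "==<>")
    return [token for token in re.split(r"<>| ", filtered) if token]


def phrase_matches(doc_text, query_text):
    """Return True if the query tokens occur consecutively in the document."""
    doc_tokens = analyze(doc_text)
    query_tokens = analyze(query_text)
    n = len(query_tokens)
    if n == 0:
        return False
    # Slide a window of n tokens over the document and compare.
    return any(
        doc_tokens[i:i + n] == query_tokens
        for i in range(len(doc_tokens) - n + 1)
    )


docs = ["a==b==c==", "a==b== c==", "a== b==c=="]
for doc in docs:
    print(repr(doc), phrase_matches(doc, "a==b==c=="))
```

In this simulation every document matches, both for the full query and for a shorter one such as `a==b==`, which is consistent with getting all 3 documents back.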

How can I make a search for "ab" return results containing "abcd", but not "ab c" or "a bc"?

Omit the " " pattern from your pattern tokenizer if possible. Because the tokenizer also splits on spaces, a==b==c== and a==b== c== are analyzed to the same tokens.

The Analyze API will help you check that your texts are analyzed as intended.
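As a minimal sketch, the snippet below only builds and prints the request bodies you could send to the `_analyze` endpoint of the index for each sample text. It does not contact a cluster. The index name `my_index` and the helper `analyze_request` are assumptions made for illustration.

```python
import json

INDEX = "my_index"  # hypothetical index name, replace with your own


def analyze_request(text, analyzer="my_analyzer"):
    """Build the JSON body for an _analyze call using the index's analyzer."""
    return {"analyzer": analyzer, "text": text}


for text in ["a==b==c==", "a==b== c==", "a== b==c=="]:
    # Print the request in the form used by the Kibana Dev Tools console.
    print(f"GET /{INDEX}/_analyze")
    print(json.dumps(analyze_request(text), indent=2))
```

Comparing the `token` and `position` values in the three responses should show whether the documents really differ after analysis.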

It generates the same tokens with or without the " " pattern. I can't remove it anyway, because then any token that follows whitespace would start with a space, e.g.

a==b== c==

will produce tokens: [a,b, c] instead of [a,b,c]
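The effect of dropping the space pattern can be sketched in Python as follows. This is only an approximation, assuming the tokenizer then splits on `<>` alone. The helpers `analyze_without_space` and `contains_phrase` are hypothetical.

```python
import re


def analyze_without_space(text):
    """Hypothetical analyzer variant: split on "<>" only, keep spaces in tokens."""
    filtered = text.replace("==", "==<>")
    return [token for token in re.split(r"<>", filtered) if token]


def contains_phrase(doc_tokens, query_tokens):
    """Return True if query_tokens occur consecutively in doc_tokens."""
    n = len(query_tokens)
    return n > 0 and any(
        doc_tokens[i:i + n] == query_tokens
        for i in range(len(doc_tokens) - n + 1)
    )


query = analyze_without_space("a==b==c==")
for doc in ["a==b==c==", "a==b== c==", "a== b==c=="]:
    tokens = analyze_without_space(doc)
    # The leading space stays inside the token, e.g. ' c==' rather than 'c=='.
    print(repr(doc), "->", tokens, contains_phrase(tokens, query))
```

In this simulation the token with the leading space no longer equals the query token, so only the unspaced document matches the full phrase. The trade-off is the one described above: a search for `c==` on its own would also stop matching a document where it was indexed as ` c==`.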
