Match_phrase with exact substring without space

ansamHox · September 23, 2022, 10:07pm

I have a dataset like

Text: a==b==c==
Text: a==b== c==
Text: a== b==c==

And a custom analyzer that creates a token for each letter:

"analysis": {
	"analyzer": {
		"my_analyzer": {
			"char_filter": [
				"my_char_filter"
			],
			"tokenizer": "my_tokenizer"
		}
	},
	"char_filter": {
		"my_char_filter": {
			"pattern": "==",
			"type": "pattern_replace",
			"replacement": "==<>"
		}
	},
	"tokenizer": {
		"my_tokenizer": {
			"pattern": [
				"<>",
				" "
			],
			"type": "pattern"
		}
	}
}

When I search for a==b==c== with match_phrase (which I use because users often search for only the first few letters), I get all 3 documents, but I want only the exact string ("a==b==c=="), with no whitespace.

This is the query I use:

{
  "query": {
    "bool": {
      "should": [
        {
          "match_phrase": {
            "Text": "a==b==c=="
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}

How can I make a search for "ab" return results containing "abcd", but not "ab c" or "a bc"?

Tomo_M · September 24, 2022, 4:46am

Omit the " " pattern from your pattern tokenizer if possible. That's why a==b==c== and a==b== c== are analyzed to the same tokens.

The Analyze API will help you verify that texts are analyzed as intended.
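To illustrate why match_phrase cannot tell the three documents apart, here is a rough local approximation of my_analyzer in Python. It is only a sketch, not Elasticsearch itself: the helper name `analyze` is made up, and it assumes the two tokenizer patterns behave like the single regex `<>| ` and that empty tokens are dropped. On a real cluster, the `_analyze` endpoint with `"analyzer": "my_analyzer"` is the authoritative check.

```python
import re

def analyze(text, split_pattern=r"<>| "):
    """Approximate my_analyzer: char filter first, then pattern tokenizer."""
    # pattern_replace char filter: every "==" becomes "==<>"
    filtered = text.replace("==", "==<>")
    # pattern tokenizer: split on the pattern and discard empty tokens
    return [tok for tok in re.split(split_pattern, filtered) if tok]

docs = ["a==b==c==", "a==b== c==", "a== b==c=="]
for doc in docs:
    # Each document is reduced to the same token sequence
    print(repr(doc), "->", analyze(doc))
```

Under these assumptions all three texts yield the same tokens in the same order, so a phrase query on those tokens matches every document.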

ansamHox · September 24, 2022, 6:59pm

It generates the same tokens with or without the " " pattern, and I can't remove it anyway, because then any token that follows whitespace would start with a space, e.g.

a==b== c==

will produce tokens: [a,b, c] instead of [a,b,c]
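The leading-space token can be reproduced with a small Python approximation of the char filter and tokenizer. This is a sketch under assumptions, not Elasticsearch output: `tokenize` is a hypothetical helper, the combined pattern is written as the regex `<>| `, and empty tokens are dropped. The post abbreviates tokens to single letters, while this sketch keeps the `==` that the char filter leaves in place.

```python
import re

def tokenize(text, split_pattern):
    """Apply the "==" -> "==<>" char filter, then split on split_pattern."""
    filtered = text.replace("==", "==<>")
    # Discard empty strings produced by adjacent or trailing separators
    return [tok for tok in re.split(split_pattern, filtered) if tok]

text = "a==b== c=="
with_space = tokenize(text, r"<>| ")   # "<>" and " " both act as separators
without_space = tokenize(text, r"<>")  # only "<>" acts as a separator

print(with_space)
print(without_space)
```

With only `<>` as the separator, the third token keeps its leading space, so it no longer equals the token produced for the text without whitespace.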

system · October 22, 2022, 6:59pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
