How to index the word_delimiter itself?


(lea) #1

When analyzing alpha+beta delta, I want the resulting tokens to be [ALPHA+BETA DELTA, ALPHABETADELTA, ALPHA, BETA, DELTA, ALPHA+, ALPHA+BETA]. My analyzer gives me all of these except [ALPHA+, ALPHA+BETA]. How can I include them?

{
  "index": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "word_joiner": {
          "type": "word_delimiter",
          "catenate_all": true,
          "preserve_original": "true"
        }
      },
      "analyzer": {
        "word_join_analyzer": {
          "type": "custom",
          "filter": [
            "word_joiner",
            "uppercase"
          ],
          "tokenizer": "keyword"
        }
      }
    }
  }
}

(Val Crettaz) #2

I'm not certain the word_delimiter token filter can do what you need; you probably need something more involved. In particular, I don't see how you could ever get the ALPHA+ token out of word_delimiter: it splits on the + and never emits the delimiter character as part of a sub-token (preserve_original keeps the whole input and catenate_all strips the delimiters entirely).
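
For what it's worth, the token set you're after is easy enough to generate outside the analysis chain. Here is a minimal Python sketch of the expansion logic (the function name and the delimiter pattern are illustrative, not an Elasticsearch API; you would still need to feed such tokens back in yourself, e.g. by pre-processing the field at index time or writing a custom token filter plugin):

```python
import re

# Delimiters, mirroring word_delimiter's default behavior of
# splitting on runs of non-alphanumeric characters (illustrative).
DELIM = re.compile(r"[^A-Za-z0-9]+")

def expand(text):
    """Return the full desired token set for one input string."""
    text = text.upper()
    tokens = {text}                      # preserve_original
    tokens.add(DELIM.sub("", text))      # catenate_all equivalent
    tokens.update(DELIM.split(text))     # word parts
    # The tokens word_delimiter cannot emit: prefixes cut at each
    # delimiter, once without it and once including it.
    for m in DELIM.finditer(text):
        tokens.add(text[:m.start()].strip())
        tokens.add(text[:m.end()].strip())
    tokens.discard("")
    return tokens
```

For "alpha+beta delta" this yields exactly the seven tokens from your example, including ALPHA+ and ALPHA+BETA.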


(lea) #3

But how can I get the other tokens without using it?


(system) #4

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.