How to index the word_delimiter itself?


(lea) #1

When analyzing alpha+beta delta, I want the resulting tokens to be [ALPHA+BETA DELTA, ALPHABETADELTA, ALPHA, BETA, DELTA, ALPHA+, ALPHA+BETA]. My analyzer gives me all of these except [ALPHA+, ALPHA+BETA]. How can I include them?

{
  "index": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "word_joiner": {
          "type": "word_delimiter",
          "catenate_all": true,
          "preserve_original": "true"
        }
      },
      "analyzer": {
        "word_join_analyzer": {
          "type": "custom",
          "filter": [
            "word_joiner",
            "uppercase"
          ],
          "tokenizer": "keyword"
        }
      }
    }
  }
}

(Val Crettaz) #2

I'm not certain the word_delimiter token filter can do what you need; you probably need something more involved. In particular, I don't see how you could ever get the ALPHA+ token out of word_delimiter: it splits on the + and never emits the delimiter character as part of a sub-token (preserve_original keeps the whole input and catenate_all strips the delimiters entirely).
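
For what it's worth, the token set you're after is easy enough to generate outside the analysis chain. Here is a minimal Python sketch of the expansion logic (the function name and the delimiter pattern are illustrative, not an Elasticsearch API; you would still need to feed such tokens back in yourself, e.g. by pre-processing the field at index time or writing a custom token filter plugin):

```python
import re

# Delimiters, mirroring word_delimiter's default behavior of
# splitting on runs of non-alphanumeric characters (illustrative).
DELIM = re.compile(r"[^A-Za-z0-9]+")

def expand(text):
    """Return the full desired token set for one input string."""
    text = text.upper()
    tokens = {text}                      # preserve_original
    tokens.add(DELIM.sub("", text))      # catenate_all equivalent
    tokens.update(DELIM.split(text))     # word parts
    # The tokens word_delimiter cannot emit: prefixes cut at each
    # delimiter, once without it and once including it.
    for m in DELIM.finditer(text):
        tokens.add(text[:m.start()].strip())
        tokens.add(text[:m.end()].strip())
    tokens.discard("")
    return tokens
```

For "alpha+beta delta" this yields exactly the seven tokens from your example, including ALPHA+ and ALPHA+BETA.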


(lea) #3

But how can I get the other tokens without using it?


(system) #4

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.