Combining multiple Tokenizer features on single _all field

cdekker · September 12, 2018, 9:46am

We have several customer-defined indices on ES 6 with 100+ fields, where each field has a copy_to mapping to an _all field. This allows us to perform full-text search over all user-defined fields in the index.

I have several specific tokenizer requirements for this (and any other) field in those indices:

1. Emails should be tokenized as-is and not broken up: uax_url_email tokenizer
2. Support non-western languages: icu_tokenizer
3. (Company) domain names should be normalized without TLD ('Amazon.com' > 'Amazon'), so they will match queries without the '.com'.

I have currently implemented requirements 2 and 3 in a single analyzer as follows:

"analysis": {
  "filter": {
    "domain_name": {
      "type": "pattern_capture",
      "preserve_original": "true",
      "patterns": [
        "^(?:www\\.)?([^.]{3,})\\.[^.]+"
      ]
    }
  },
  "analyzer": {
    "icu": {
      "filter": [
        "icu_folding",
        "domain_name"
      ],
      "type": "custom",
      "tokenizer": "icu_tokenizer"
    }
  }
}
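To make the intent of the domain_name filter easier to check, here is a rough Python sketch of what the pattern does to a single token. It is only an approximation of pattern_capture with preserve_original enabled, not Elasticsearch itself: the helper name domain_name_tokens is made up for illustration, and the inputs are assumed to be already lowercased, as they would be after icu_folding.

```python
import re

# Same regex as in the "domain_name" filter, with the JSON escaping removed.
DOMAIN_PATTERN = re.compile(r"^(?:www\.)?([^.]{3,})\.[^.]+")


def domain_name_tokens(token):
    """Approximate pattern_capture (preserve_original=true) for one token.

    Hypothetical helper for illustration; not part of any Elasticsearch API.
    """
    tokens = [token]  # preserve_original keeps the incoming token
    match = DOMAIN_PATTERN.search(token)
    if match and match.group(1) != token:
        # The capture group is emitted as an additional token.
        tokens.append(match.group(1))
    return tokens


if __name__ == "__main__":
    for example in ["amazon.com", "www.amazon.com", "amazon"]:
        print(example, "->", domain_name_tokens(example))
```

In this sketch a token without a dot passes through unchanged, while amazon.com and www.amazon.com each yield the original token plus amazon.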

How can I also add requirement 1 to this analyzer, so that email addresses are kept as single tokens? Since an analyzer accepts only one tokenizer, is there a way to 'combine' the behavior of the two tokenizers?

Is there a better way to implement the (company) domain name tokenization?

cdekker · September 18, 2018, 9:39am

Does anyone have an idea on how to achieve the 3 different tokenizations of terms?

system · October 16, 2018, 9:39am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
