Hi folks,
I am currently using the word_delimiter_graph filter to split tokens into subtokens with Elasticsearch 7.9.2. I get an unexpected result when using the delimiter filter with a match query and operator "and", and I am not sure if I understand the behaviour correctly.
So here is the thing:
// mapping and settings
PUT simple_test
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  },
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "filter": [
            "delimiter_filter",
            "lowercase",
            "unique"
          ],
          "tokenizer": "whitespace"
        }
      },
      "filter": {
        "delimiter_filter": {
          "type": "word_delimiter_graph",
          "catenate_all": "false",
          "catenate_numbers": "true",
          "catenate_words": "true",
          "generate_number_parts": "true",
          "generate_word_parts": "true",
          "preserve_original": "true",
          "split_on_case_change": "false",
          "split_on_numerics": "true",
          "stem_english_possessive": "false",
          "adjust_offsets": "false"
        }
      }
    }
  }
}
// index document
PUT simple_test/_doc/1
{
  "title": "30x32"
}
// query returns an empty result
GET simple_test/_search
{
  "query": {
    "match": {
      "title": {
        "query": "30/32",
        "operator": "and"
      }
    }
  }
}
As far as I understand, the delimiter filter creates the following tokens (with positions):
- For the search term "30/32":
  "30/32" (0), "3032" (0), "30" (0), "32" (1)
- For the indexed document with "30x32":
  "30x32" (0), "30" (0), "x" (1), "32" (2)
So if the document contains the term "30x32", I would assume a match query with operator "and" would match the search term "30/32", because there are matches for both "30" and "32". But this is not the case.
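To narrow this down, the way the match query is rewritten can be inspected with the validate API (a sketch against the same simple_test index; explain=true prints the resulting Lucene query):
// show how the query string is parsed into a Lucene query
GET simple_test/_validate/query?explain=true
{
  "query": {
    "match": {
      "title": {
        "query": "30/32",
        "operator": "and"
      }
    }
  }
}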
What am I missing?
Thanks in advance!