Hi folks,
I am currently using the word_delimiter_graph filter to split tokens into subtokens with Elasticsearch 7.9.2. I get an unexpected result when using the delimiter filter with a match query and operator "and", and I am not sure if I understand the behaviour correctly.
So here is the thing:
// mapping and settings
PUT simple_test
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  },
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "filter": [
            "delimiter_filter",
            "lowercase",
            "unique"
          ],
          "tokenizer": "whitespace"
        }
      },
      "filter": {
        "delimiter_filter": {
          "type": "word_delimiter_graph",
          "catenate_all": "false",
          "catenate_numbers": "true",
          "catenate_words": "true",
          "generate_number_parts": "true",
          "generate_word_parts": "true",
          "preserve_original": "true",
          "split_on_case_change": "false",
          "split_on_numerics": "true",
          "stem_english_possessive": "false",
          "adjust_offsets": "false"
        }
      }
    }
  }
}
// index document
PUT simple_test/_doc/1
{
  "title": "30x32"
}
// query returns an empty result
GET simple_test/_search
{
  "query": {
    "match": {
      "title": {
        "query": "30/32",
        "operator": "and"
      }
    }
  }
}
As far as I understand, the delimiter filter creates the following tokens (with positions):
- For the search term "30/32":
  "30/32" (0), "3032" (0), "30" (0), "32" (1)
- For the indexed document with "30x32":
  "30x32" (0), "30" (0), "x" (1), "32" (2)
So if the document contains the term "30x32", I would assume a match query with operator "and" would match the search term "30/32", because there are matches for both "30" and "32". But this is not the case.
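To narrow this down, the way the match query is rewritten can be inspected with the validate API (a sketch against the same simple_test index; explain=true prints the resulting Lucene query):
// show how the query string is parsed into a Lucene query
GET simple_test/_validate/query?explain=true
{
  "query": {
    "match": {
      "title": {
        "query": "30/32",
        "operator": "and"
      }
    }
  }
}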
What am I missing?
Thanks in advance!