Hi,
I have a document containing the following text:
X A1 ABC
When I search for this text using the phrase query
"X A1 ABC"
no results are returned. The index is created as follows:
PUT 6a4d8bd1a4d67152c0edd375c996b319
{
  "mappings": {
    "raw": {
      "_source": {
        "includes": [
          "*"
        ],
        "excludes": [
          "OntoAll"
        ]
      },
      "properties": {
        "OntoID": {
          "type": "keyword"
        },
        "OntoAll": {
          "type": "text"
        },
        "OntoFields": {
          "type": "nested",
          "properties": {
            "key": {
              "type": "keyword"
            },
            "value": {
              "type": "text",
              "copy_to": [
                "OntoAll"
              ]
            }
          }
        }
      }
    }
  },
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "OntoFilter": {
            "split_on_numerics": "true",
            "generate_word_parts": "true",
            "preserve_original": "true",
            "catenate_words": "false",
            "generate_number_parts": "true",
            "catenate_all": "false",
            "split_on_case_change": "true",
            "type": "word_delimiter_graph",
            "catenate_numbers": "false"
          }
        },
        "analyzer": {
          "default": {
            "filter": [
              "OntoFilter",
              "lowercase"
            ],
            "type": "custom",
            "tokenizer": "whitespace"
          }
        }
      }
    }
  }
}
The document is added as follows:
PUT 6a4d8bd1a4d67152c0edd375c996b319/raw/id1
{
  "OntoID": "S8371",
  "OntoFields": {
    "key": "prop",
    "value": "X A1 ABC"
  }
}
Running the query below returns 0 results:
GET 6a4d8bd1a4d67152c0edd375c996b319/_search
{
  "from": 0,
  "size": 10,
  "query": {
    "query_string": {
      "query": "\"X A1 ABC\"",
      "fields": [
        "OntoAll"
      ],
      "tie_breaker": 0,
      "default_operator": "and"
    }
  }
}
Analysis of the phrase doesn't show anything strange about the token positions as far as I can see: there is a clear path through these tokens that should satisfy the phrase query and produce a result. 'A', '1', and 'A1' are tokenized as expected:
GET 6a4d8bd1a4d67152c0edd375c996b319/_analyze
{
  "text": "X A1 ABC",
  "explain": true
}
<<See the second post below for the analyzer output, omitted here owing to the 7000-character limit on posts.>>
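The "clear path through these tokens" reasoning can be sketched in a few lines of Python. This is only an illustration: the (term, position, position length) values below are my assumption of what a word_delimiter_graph filter with preserve_original would emit for this text, not a copy of the actual _analyze output, and the function name enumerate_paths is hypothetical.

```python
from collections import defaultdict

# ASSUMED token attributes (term, start position, position length).
# These are illustrative values, not the real _analyze output.
ASSUMED_TOKENS = [
    ("x", 0, 1),
    ("a1", 1, 2),   # preserved original, assumed to span two positions
    ("a", 1, 1),
    ("1", 2, 1),
    ("abc", 3, 1),
]


def enumerate_paths(tokens):
    """Return every term sequence that walks the token graph start to end."""
    by_start = defaultdict(list)
    for term, start, length in tokens:
        by_start[start].append((term, start + length))
    end = max(start + length for _, start, length in tokens)

    def walk(position):
        if position == end:
            return [[]]
        paths = []
        for term, next_position in by_start[position]:
            for rest in walk(next_position):
                paths.append([term] + rest)
        return paths

    return walk(min(by_start))


for path in enumerate_paths(ASSUMED_TOKENS):
    print(" ".join(path))
```

Under those assumed attributes the graph has two complete paths, one through the preserved original token and one through its split parts, which is why I would expect the phrase to match.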
As an experiment, using a hyphen as a delimiter within A1:
"X A-1 ABC"
works fine, so I suspect Lucene's WordDelimiterGraphFilter may play a part in this issue.
Note that "X A1" returns a result, as does "A1 ABC".
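To make the comparison between the full phrase and the two sub-phrases easy to reproduce, here is a minimal Python sketch that builds the same query_string body for each phrase. The helper name phrase_query_body is hypothetical; the script only prints the console requests and does not contact a cluster.

```python
import json

INDEX = "6a4d8bd1a4d67152c0edd375c996b319"


def phrase_query_body(phrase):
    """Build the query_string search body used above for a quoted phrase."""
    return {
        "from": 0,
        "size": 10,
        "query": {
            "query_string": {
                # Wrap the phrase in double quotes so it is parsed as a phrase.
                "query": '"' + phrase + '"',
                "fields": ["OntoAll"],
                "tie_breaker": 0,
                "default_operator": "and",
            }
        },
    }


# The full phrase (0 results for me) followed by the two sub-phrases (1 result each).
for phrase in ["X A1 ABC", "X A1", "A1 ABC"]:
    print("GET " + INDEX + "/_search")
    print(json.dumps(phrase_query_body(phrase), indent=2))
```

The printed requests can be pasted into the console one at a time. The helper does no escaping, so it assumes the phrase itself contains no double quotes.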
Kind regards
Dan