Hi, I have a weird situation: I expected a phrase query to return a result, but I'm getting nothing.
This happens in both 6.2.3 and 6.3.
The index structure is as follows:
PUT testing
{
  "mappings": {
    "raw": {
      "_source": {
        "includes": [
          "*"
        ],
        "excludes": [
          "OntoAll"
        ]
      },
      "properties": {
        "OntoID": {
          "type": "keyword"
        },
        "OntoAll": {
          "type": "text"
        },
        "OntoFields": {
          "type": "nested",
          "properties": {
            "key": {
              "type": "keyword"
            },
            "value": {
              "type": "text",
              "copy_to": [
                "OntoAll"
              ]
            }
          }
        }
      }
    }
  },
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "OntoFilter": {
            "split_on_numerics": "true",
            "generate_word_parts": "true",
            "preserve_original": "true",
            "catenate_words": "false",
            "generate_number_parts": "true",
            "catenate_all": "false",
            "split_on_case_change": "true",
            "type": "word_delimiter_graph",
            "catenate_numbers": "false"
          }
        },
        "analyzer": {
          "default": {
            "filter": [
              "OntoFilter",
              "lowercase"
            ],
            "type": "custom",
            "tokenizer": "whitespace"
          }
        }
      }
    }
  }
}
The document is added as follows:
PUT testing/raw/id1
{
  "OntoID": "S8371",
  "OntoFields": {
    "key": "prop",
    "value": "SAL_S8371 - SAL SUBTEND 0001"
  }
}
Running the query below returns 0 results:
GET testing/_search
{
  "from": 0,
  "size": 10,
  "query": {
    "query_string": {
      "query": "\"SAL_S8371 - SAL SUBTEND 0001\"",
      "fields": [
        "OntoAll"
      ],
      "tie_breaker": 0,
      "default_operator": "and"
    }
  }
}
I did notice something strange when analysing the query text: the token position skips over 4 (jumping from 3 to 5) when it encounters the '-' character.
GET testing/_analyze
{
  "text": [
    "SAL_S8371 - SAL SUBTEND 0001"
  ]
}
{
  "tokens": [
    {
      "token": "sal_s8371",
      "start_offset": 0,
      "end_offset": 9,
      "type": "word",
      "position": 0,
      "positionLength": 3
    },
    {
      "token": "sal",
      "start_offset": 0,
      "end_offset": 3,
      "type": "word",
      "position": 0
    },
    {
      "token": "s",
      "start_offset": 4,
      "end_offset": 5,
      "type": "word",
      "position": 1
    },
    {
      "token": "8371",
      "start_offset": 5,
      "end_offset": 9,
      "type": "word",
      "position": 2
    },
    {
      "token": "-",
      "start_offset": 10,
      "end_offset": 11,
      "type": "word",
      "position": 3
    },
    {
      "token": "sal",
      "start_offset": 12,
      "end_offset": 15,
      "type": "word",
      "position": 5
    },
    {
      "token": "subtend",
      "start_offset": 16,
      "end_offset": 23,
      "type": "word",
      "position": 6
    },
    {
      "token": "0001",
      "start_offset": 24,
      "end_offset": 28,
      "type": "word",
      "position": 7
    }
  ]
}
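To make the gap explicit, here is a small Python sketch that scans the positions copied from the output above and reports any position that no token occupies. The helper name find_position_gaps is my own and not part of any Elasticsearch client; it only inspects the numbers and does not model how Lucene matches phrases.

```python
# (token, position) pairs copied from the _analyze output above.
tokens = [
    ("sal_s8371", 0),
    ("sal", 0),
    ("s", 1),
    ("8371", 2),
    ("-", 3),
    ("sal", 5),
    ("subtend", 6),
    ("0001", 7),
]


def find_position_gaps(tokens):
    """Return every position in the covered range that no token occupies."""
    seen = {pos for _, pos in tokens}
    return [p for p in range(min(seen), max(seen) + 1) if p not in seen]


gaps = find_position_gaps(tokens)
print(gaps)  # [4]
```

Position 4 is the only one missing, and it sits directly after the '-' token.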
Without the '-' character in the query string, the positions increment as expected: 0-5 with no gaps.
After noticing this, I added "phrase_slop": 1 to the query and I then get the result back, but surely this shouldn't be required?
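For reference, this is the query body with the workaround applied, sketched as a Python dict so it can be dumped to JSON and sent to GET testing/_search. The only change from the original query is the added "phrase_slop" entry; the variable name body is just for illustration.

```python
import json

# Same query_string query as before, with phrase_slop added as a workaround.
body = {
    "from": 0,
    "size": 10,
    "query": {
        "query_string": {
            "query": "\"SAL_S8371 - SAL SUBTEND 0001\"",
            "fields": ["OntoAll"],
            "tie_breaker": 0,
            "default_operator": "and",
            "phrase_slop": 1,  # the only addition
        }
    },
}

# Print the JSON body to paste into the search request.
print(json.dumps(body, indent=2))
```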
For info, the _source excludes in the mapping is there because that field gets populated directly as well as via the copy_to in the nested document. Could this be the cause of the position skipping and the query failing?
Thanks