Hi,
We are testing a server upgrade to 8.12.2 and found that some of our queries stopped working. We use 'query_string' with a wildcard to search for user-requested data. I have prepared some test data to show the problem we are facing.
Let's assume we have an index definition like this:
PUT test
{
"settings": {
"index": {
"number_of_shards": "1",
"number_of_replicas": "1",
"analysis": {
"filter": {
"appendZeros": {
"type": "pattern_replace",
"pattern": "^(a?)(\\d+)(\\D{3})?$",
"replacement": "$1$2\u006100$3 $1$2$3"
},
"appendZero": {
"type": "pattern_replace",
"pattern": "^(a?)(\\d+a\\d)(\\D{3})?$",
"replacement": "$1$20$3 $1$2$3"
},
"divideToken": {
"type": "pattern_capture",
"preserve_original": false,
"patterns": [
"(\\S+) (\\S+)"
]
}
},
"analyzer": {
"currencyAnalyzer": {
"type": "custom",
"tokenizer": "keyword",
"char_filter": [
"replaceSpecialCharacters"
],
"filter": [
"lowercase",
"appendZeros",
"appendZero",
"divideToken"
]
},
"textAnalyzer": {
"type": "custom",
"tokenizer": "standard",
"char_filter": [
"replaceSpecialCharacters"
],
"filter": [
"lowercase"
]
}
},
"char_filter": {
"replaceSpecialCharacters": {
"type": "mapping",
"mappings": [
".=>\u0061",
",=>\u0061"
]
}
},
"normalizer": {
"lowercaseNormalizer": {
"type": "custom",
"filter": [
"lowercase"
]
}
}
}
}
},
"mappings": {
"dynamic": false,
"properties": {
"amount": {
"properties": {
"amount": {
"type": "scaled_float",
"scaling_factor": 100,
"copy_to": "amountValue"
},
"currency": {
"type": "keyword",
"copy_to": "amountValue"
}
}
},
"amountValue": {
"type": "text",
"analyzer": "currencyAnalyzer",
"store": false
}
}
}
}
Let's index some data:
PUT test/_doc/1
{
"amount":{
"amount":123.45,
"currency": "USD"
}
}
PUT test/_doc/2
{
"amount":{
"amount":123.4,
"currency": "EUR"
}
}
Now let's query the data using a wildcard query:
POST test/_search
{
"query":{
"wildcard": {
"amountValue":{
"value": "123a4*"
}
}
}
}
As a result we got: "hits": { "total": {"value": 2, .... }}
Now let's run the same search with query_string:
POST test/_search
{
"query":{
"query_string": {
"fields": ["amountValue"],
"query": "123a4*"
}
}
}
This time we got: "hits": {"total": {"value": 0, ...}}
We tested earlier versions of Elasticsearch, and this behaviour first appears in 8.9.
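To see where the two queries diverge, one option is the validate API with rewrite=true, which should print the Lucene query that each one is rewritten to. This is only a sketch: I am assuming the API behaves the same way on 8.12.2, and I have not included its output here.
GET test/_validate/query?rewrite=true
{
"query":{
"query_string": {
"fields": ["amountValue"],
"query": "123a4*"
}
}
}
Running the same request with the wildcard query from above in place of query_string should make any difference in the rewritten term visible in the "explanations" section of the response.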
Now let's check the tokens produced for the values we inserted:
POST test/_analyze
{
"text":"123,4 123.45",
"field": "amountValue"
}
And we got:
"token": "123a4",
"token": "123a45",
And those are the tokens we are looking for with query_string. So why does the query not return the requested data? Is it some kind of bug, or are my index settings badly defined?
Any advice would be greatly appreciated, as we have been fighting with this for some time.