Query on non-standard characters not working post upgrade from 5.6 to 6.8 on ElasticCloud

Yesterday I upgraded our Elasticsearch and Kibana instances from 5.6 to 6.8 on Elastic Cloud.

We had some records with non-standard characters due to a previous character encoding issue when importing data using Logstash.

I notice that since the upgrade to 6.8 the following query for one of these existing records no longer returns any data:

GET my_search_alias/_search
{
  "query": {
    "match_phrase": {
      "ARTICLE_TITLE": "©"
    }
  }
}

The record is still present, and searching on another field returns it as expected:
GET my_search_alias/_search
{
  "query": {
    "term": {
      "MY_ID": 1234
    }
  }
}
...
"MY_ID" : 1234
"ARTICLE_TITLE": "MarkSE016TEST CAR1Deutscher Titel Wort1 Wort2 Wort3 Wort4 Wort5 320 © Mark Evans"
...

However, on the 5.6 version of Elasticsearch the record is retrieved when searching with the non-standard character string:

GET my_search_alias/_search
{
  "query": {
    "match_phrase": {
      "ARTICLE_TITLE": "©"
    }
  }
}
...
"MY_ID" : 1234
"ARTICLE_TITLE": "MarkSE016TEST CAR1Deutscher Titel Wort1 Wort2 Wort3 Wort4 Wort5 320 © Mark Evans"
...

If I create a NEW record in the 6.8 version by copying and pasting the previous title, the new record IS returned by the character search query, BUT the record I copied the title from is still not returned.

  1. Is there any difference in character searching post upgrade from 5.6 to 6.8?
  2. Is there a way to recompile the index, or to find out why the old record is no longer returned since the upgrade, even though a new record with exactly the same value is returned?
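
One way to start investigating the second question might be the _analyze API, which shows the tokens the field's analyzer produces for a given text. This is only a sketch: it assumes my_search_alias points at a single index, and it shows how the text would be analyzed now, not what was stored in the old segments at index time.

```
# Show the tokens produced for the title fragment by the analyzer mapped to ARTICLE_TITLE
GET my_search_alias/_analyze
{
  "field": "ARTICLE_TITLE",
  "text": "320 © Mark Evans"
}
```

If the "©" character does not appear in the token list, the analyzer is dropping it, which would at least narrow down where the difference comes from.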

Many thanks,
Mark

I have found a workaround, but it involved creating a NEW index with the same settings as the old one and then effectively copying all the data from the old index to the new one:
POST _reindex
{
  "source": {
    "index": "my_search_index_1"
  },
  "dest": {
    "index": "my_search_index_2"
  }
}

Searching the new index with:
GET my_search_index_2/_search
{
  "query": {
    "match_phrase": {
      "ARTICLE_TITLE": "©"
    }
  }
}

then returns the record that the original index does not return.
This is not ideal, as I will then need to add the new index to the alias and drop the old index, which will take time during a critical production downtime window.
Is there a way to refresh the original index rather than copying it into a new index and deleting the old one?
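
On the alias switch itself: the _aliases API accepts several actions in one request and applies them atomically, so the swap should not leave a gap where the alias points at nothing. A minimal sketch, assuming my_search_alias currently points only at my_search_index_1 and that the reindex into my_search_index_2 has already completed:

```
# Move the alias from the old index to the new one in a single atomic request
POST _aliases
{
  "actions": [
    { "remove": { "index": "my_search_index_1", "alias": "my_search_alias" } },
    { "add": { "index": "my_search_index_2", "alias": "my_search_alias" } }
  ]
}
```

The old index could then be deleted separately once the new one has been checked, rather than as part of the switch.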
