What is this red dot in index data?

Hi

I'm seeing this red dot appear in my index data as shown below:

This has also been reported by other people here on the Elasticsearch discussion forum. See here and here.

In neither case did anyone find out why this causes problems, or even how it came about.

I'm bringing this up because it affects the relevance scores of documents, so searches don't return the expected results. In one of the threads linked above, it even caused a field name not to be recognized.

Has anyone else seen this, or found out what is causing those red dots?

Could this come from encoding differences? I use the JavaScript Elasticsearch client to insert the data into the indices. JavaScript uses UTF-16 to store strings and Elasticsearch uses UTF-8, correct?
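One way to test the UTF-16 vs UTF-8 theory is to check whether converting between the two encodings changes any characters. The minimal Python sketch below (the sample string and helper names are hypothetical) round-trips a string through both encodings: the bytes differ, but the decoded text is identical. That suggests a correct conversion alone would not turn a space into a different character, assuming the strings are valid Unicode (a lone UTF-16 surrogate, for instance, cannot be encoded as UTF-8).

```python
def round_trips(text: str) -> bool:
    """True if text decodes back unchanged from both UTF-16 and UTF-8 bytes."""
    via_utf16 = text.encode("utf-16-le").decode("utf-16-le")
    via_utf8 = text.encode("utf-8").decode("utf-8")
    return via_utf16 == via_utf8 == text


def byte_forms(ch: str) -> dict:
    """Hex bytes of one character in each encoding."""
    return {
        "utf-16-le": ch.encode("utf-16-le").hex(),
        "utf-8": ch.encode("utf-8").hex(),
    }


sample = "stephen\u00a0aaa"  # hypothetical text containing a no-break space
print(round_trips(sample))   # True
print(byte_forms("\u00a0"))  # {'utf-16-le': 'a000', 'utf-8': 'c2a0'}
```

What survives the trip is the code point itself, so the useful question is which character was in the string to begin with.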

Perhaps it is this:

Unicode Character “·” (U+00B7)

Name: Middle Dot
Unicode Version: 1.1 (June 1993)

I don't think it is.


The first character is the one you posted; the second is the character I am copying from the data in the index.

This is in Kibana, btw.

Curious: are you displaying that with the Dev Tools in Kibana?

Can you show what that looks like if you run

GET /yourindex/_search

Do you have a Unicode editor to take a look with? Can you paste it into one?

Do you have some sample text that you indexed? I see no answer on that yet, and I cannot reproduce the issue so far.

Also, which version of Elasticsearch / Kibana are you on? (Important for testing.)

And yes, back to your encoding question: I suspect that could be the issue. (Unfortunately, I am not an encoding expert.)

With this request:

POST test/_doc
{
  "name" : "stephen \u0080 aaa "
}

I got something similar, though not exactly the same. I suspect it is a poorly encoded or mis-encoded character.
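The test request writes \u0080 as a JSON escape, so the stored value contains U+0080, which is a C1 control character rather than a printable one. The minimal Python sketch below (the helper name is hypothetical) decodes the same JSON body locally and lists every non-ASCII character with its Unicode general category; it does not talk to Elasticsearch.

```python
import json
import unicodedata


def non_ascii(text: str) -> list:
    """(code point, general category, name) for each non-ASCII character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch, "<no name>"))
        for ch in text
        if ord(ch) > 0x7F
    ]


# Same body as the test request; json.loads turns the \u0080 escape into U+0080.
body = json.loads('{"name": "stephen \\u0080 aaa "}')
print(non_ascii(body["name"]))  # [('U+0080', 'Cc', '<no name>')]
```

Category "Cc" means a control character, and Python's unicodedata has no name for those, hence the fallback string.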

When I copy the text with the red dot and paste it somewhere outside Kibana's Dev Tools (Visual Studio Code, for example), the dot becomes a space, which is what it should be in this case.

The Elasticsearch and Kibana versions are 7.7.0. I'll post some sample data as soon as I can.

And yes, I also suspect the space could have been poorly encoded. I'll keep digging (although I'm no encoding expert either).

I figured it out.

The character is a non breaking space.
U+00A0 : NO-BREAK SPACE [NBSP]

Instead of a normal space
U+0020 : SPACE [SP]

I used this tool to figure it out: I copied the character, pasted it there, and it printed the character's Unicode value.
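The same check can be scripted instead of pasting characters into a web tool. The minimal Python sketch below uses the standard unicodedata module (the describe helper is hypothetical) to print each character's code point and name, which makes look-alikes such as U+00A0 versus U+0020, or the middle dot U+00B7 suggested earlier, easy to tell apart.

```python
import unicodedata
from typing import List


def describe(text: str) -> List[str]:
    """Return one 'U+XXXX NAME' line per character."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<no name>')}" for ch in text]


# A normal space, a no-break space and a middle dot look similar when rendered.
for line in describe(" \u00a0\u00b7"):
    print(line)
# U+0020 SPACE
# U+00A0 NO-BREAK SPACE
# U+00B7 MIDDLE DOT
```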

I guess I will just replace those characters with normal spaces.
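The data here is ingested with the JavaScript client, so the replacement would most naturally happen there before indexing; the Python sketch below only illustrates the idea. It shows two options: a targeted replace of U+00A0 with U+0020, and Unicode NFKC normalization, which also maps U+00A0 to U+0020 but rewrites other compatibility characters too (for example, the ligature U+FB01 becomes "fi"). The document, field name, and helper names are hypothetical, and the cleanup loop assumes every field value is a string.

```python
import unicodedata

NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE


def replace_nbsp(text: str) -> str:
    """Targeted fix: swap only U+00A0 for a normal space (U+0020)."""
    return text.replace(NBSP, " ")


def nfkc(text: str) -> str:
    """Broader fix: NFKC maps U+00A0 to U+0020 but also folds other
    compatibility characters, e.g. the ligature U+FB01 becomes "fi"."""
    return unicodedata.normalize("NFKC", text)


# Hypothetical document; assumes every field value is a string.
doc = {"name": "stephen\u00a0aaa"}
clean = {field: replace_nbsp(value) for field, value in doc.items()}
print(clean)  # {'name': 'stephen aaa'}
```

The targeted replace changes nothing else in the text, which is usually the safer choice when only one stray character is known to be the problem.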

Nicely done!

I'm curious, though: does Elasticsearch handle these spaces differently?

When I get back, I will check in 7.12 to see whether I get the same result; it could just be a display artifact.
