What is this red dot in index data?

Aurel_Drejta · April 8, 2021, 2:57pm

Hi

I'm seeing this red dot appear in my index data as shown below:

This has also been reported by other people here in the discuss section of elasticsearch.
See:
Here
And here

In both cases, nothing was really found out about why this causes problems or even how it came about.

The reason I am bringing this up is that it is affecting the relevancy scores of documents, so searches are not returning the expected results. In one of the threads linked above, the name of a field was not being recognized.

Has anyone else seen this, or found out what is causing those red dots?

Aurel_Drejta · April 8, 2021, 3:18pm

Could this be from encoding differences? I use the JavaScript Elasticsearch client to insert the data into the indexes. JavaScript uses UTF-16 when storing strings and Elasticsearch uses UTF-8, correct?
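
One way to test the encoding theory: converting well-formed text between UTF-16 and UTF-8 changes the bytes but should not change which characters are present. Here is a minimal Python sketch of that round trip. The function name and the sample string are illustrative assumptions, not anything from the client library.

```python
def survives_transcoding(text: str) -> bool:
    """Return True if text is unchanged after a UTF-16 -> UTF-8 round trip."""
    # UTF-16 is how a JavaScript engine represents the string in memory.
    utf16_bytes = text.encode("utf-16-le")
    in_memory = utf16_bytes.decode("utf-16-le")
    # UTF-8 is the encoding assumed here for the JSON request body.
    utf8_bytes = in_memory.encode("utf-8")
    return utf8_bytes.decode("utf-8") == text


# Hypothetical sample containing a non-ASCII character (U+00E9).
print(survives_transcoding("caf\u00e9"))
```

If this holds, an unexpected character in the index was most likely already present in the source data rather than introduced by transcoding.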

stephenb · April 8, 2021, 3:34pm

Perhaps it is this..

Unicode Character “·” (U+00B7)

Name: Middle Dot
Unicode Version: 1.1 (June 1993)

Aurel_Drejta · April 8, 2021, 3:44pm

I don't think it is.

The first character is the one you posted, while the second one is the character I am copying from the data in the index.

This is in Kibana, btw.

stephenb · April 8, 2021, 4:01pm

Curious, what are you displaying that with? The Dev Tools in Kibana?

Can you show what that looks like if you do a

GET /yourindex/_search

Do you have a Unicode editor to take a look? Can you paste it into one?

Do you have some sample text that you have indexed? Yes, I see there is no answer yet... I cannot reproduce it yet.

Also, what version of Elasticsearch / Kibana are you on? (Important to test.)

And yes, back to your encoding question: I suspect that could be an issue. (Unfortunately I am not an encoding expert.)

stephenb · April 8, 2021, 8:07pm

With this:

POST test/_doc
{
  "name" : "stephen \u0080 aaa "
}

I got this, which is not exactly the same... I suspect it is a poorly encoded / mis-encoded character.

Aurel_Drejta · April 8, 2021, 8:31pm

When I copy the text with the red dot and paste it somewhere outside of the Dev Tools of Kibana (Visual Studio Code, for example), the dot becomes a space, which is what it should be in this case.

The Elasticsearch and Kibana versions are both 7.7.0. I'll post some sample data as soon as I can.

And yes I'm also suspecting that the space could have been poorly encoded. I'll keep trying to figure something out here (although I'm no encoding expert either).

Aurel_Drejta · April 8, 2021, 8:39pm

I figured it out.

The character is a non-breaking space.
U+00A0 : NO-BREAK SPACE [NBSP]

Instead of a normal space
U+0020 : SPACE [SP]

I used this tool to figure it out.
I copied the character, pasted it there, and it printed out its Unicode value.
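
The same check can be done locally without an online tool. This is a minimal Python sketch using only the standard library. The `describe` helper and the sample string are illustrative assumptions.

```python
import unicodedata


def describe(text: str) -> list[str]:
    """List the code point and Unicode name of every character in text."""
    return [
        f"U+{ord(ch):04X} : {unicodedata.name(ch, 'UNKNOWN')}"
        for ch in text
    ]


# Hypothetical sample: a no-break space followed by a normal space.
for line in describe("a\u00a0b c"):
    print(line)
```

Pasting a suspicious value from the index into the sample string makes any character that only looks like a space show up with its real name.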

Guess I will just replace those with normal spaces.
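
A rough sketch of that cleanup, applied to a document before it is indexed, could look like the following in Python. The `replace_nbsp` helper and the sample document are hypothetical. It also rewrites dictionary keys, on the assumption that field names can contain the character too.

```python
NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE


def replace_nbsp(value):
    """Recursively replace no-break spaces with normal spaces.

    Handles strings, dicts (keys and values) and lists; any other
    type is returned unchanged.
    """
    if isinstance(value, str):
        return value.replace(NBSP, " ")
    if isinstance(value, dict):
        return {replace_nbsp(k): replace_nbsp(v) for k, v in value.items()}
    if isinstance(value, list):
        return [replace_nbsp(item) for item in value]
    return value


# Hypothetical document with NBSP in a field name and in a value.
doc = {"full\u00a0name": "Jane\u00a0Doe", "tags": ["a\u00a0b"], "age": 30}
print(replace_nbsp(doc))
```

Documents that are already indexed would still need to be reindexed after cleaning, since this only changes the data before it is sent.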

stephenb · April 8, 2021, 8:48pm

Nicely Done!

Aurel_Drejta · April 8, 2021, 8:49pm

I'm curious though... does Elasticsearch handle these spaces differently?

stephenb · April 8, 2021, 8:54pm

When I get back I will check in 7.12 and see if I see the same, it could just be a display artifact.

system · May 6, 2021, 8:54pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
