Indexing Special characters as symbols and searching with unicode values

premkumar3438 · September 16, 2016, 1:08pm

I am trying to index and search for special characters in Elasticsearch.
I used the whitespace tokenizer, and I am able to index special characters and search for them fine.

But I have a situation where I need to index a special character as a symbol and search for it using its equivalent Unicode value.
Example:
I am indexing the document below (the second paragraph begins with a lowercase gamma symbol):

{
   "id": 1,
   "documentId": "334567",
   "fieldValue": [
      {
         "fieldId": 175699,
         "textValue": [{
         "paragraph":"@doc"
         },{
         "paragraph":"γcomp"
         },
         {
         "paragraph":"@Keyboard"
         }
         ],
         "integerValue": "",
         "numericValue": "",
         "modifiedDate": "2010-01-01",
         "modifiedUser": "Tr"
      }
   ]
}

Now I want to search for "γcomp" using the Unicode value of "γ", that is, "&#947comp", but it is not working.

Can anybody please help with this?

polyfractal · September 16, 2016, 1:48pm

There is no automatic conversion of HTML entities into their corresponding Unicode characters. Elasticsearch expects UTF-8 encoded strings, so encoding schemes like HTML entities and URL-encoding just look like valid UTF-8 and are indexed and searched as they are.
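To make the mismatch concrete, here is a small Python illustration (not Elasticsearch code, and the variable names are made up for this sketch) comparing the indexed term with the entity form used in the query. Both are perfectly valid UTF-8, but they are different strings, so the terms never match:

```python
# The term as it was indexed, and the term as it was searched.
symbol_term = "γcomp"
entity_term = "&#947comp"

# Both encode cleanly to UTF-8, but to different byte sequences.
symbol_bytes = symbol_term.encode("utf-8")
entity_bytes = entity_term.encode("utf-8")

print(symbol_bytes)                # b'\xce\xb3comp'
print(entity_bytes)                # b'&#947comp'
print(symbol_term == entity_term)  # False
```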

If you need to convert HTML entities into their corresponding unicode characters, I'd probably just run the conversion in my application.
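As a rough sketch of that application-side conversion, assuming a Python client: the standard library's html.unescape turns character references into the characters they stand for, and it also accepts numeric references that lack the trailing semicolon, as in the query above. The helper name normalize_query is hypothetical.

```python
import html


def normalize_query(text: str) -> str:
    """Replace HTML character references with the characters they represent."""
    return html.unescape(text)


# Decimal, hexadecimal, and named references all resolve to the same symbol.
print(normalize_query("&#947comp"))    # γcomp
print(normalize_query("&#x3b3;comp"))  # γcomp
print(normalize_query("&gamma;comp"))  # γcomp
```

Running the query text through a step like this before sending it to Elasticsearch means the search term is the same "γcomp" that was indexed.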

If you have a relatively small list of characters you need to convert on a regular basis, you can use the char_filter as described here: https://www.elastic.co/guide/en/elasticsearch/guide/current/char-filters.html#_tidying_up_punctuation
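For the char_filter route, the index settings might look roughly like the sketch below, built here as a Python dict and printed as JSON for use as an index-creation request body. The names entity_to_symbol and symbol_analyzer are placeholders invented for this example, and only the gamma entity is mapped; the second rule covers the semicolon-less form used in the question.

```python
import json

# Hypothetical names, chosen for this sketch only.
CHAR_FILTER_NAME = "entity_to_symbol"
ANALYZER_NAME = "symbol_analyzer"

index_settings = {
    "settings": {
        "analysis": {
            "char_filter": {
                CHAR_FILTER_NAME: {
                    "type": "mapping",
                    # Each rule reads "<text to find> => <replacement>".
                    "mappings": [
                        "&#947; => γ",
                        "&#947 => γ",
                    ],
                }
            },
            "analyzer": {
                ANALYZER_NAME: {
                    "type": "custom",
                    # Char filters run before the tokenizer sees the text.
                    "char_filter": [CHAR_FILTER_NAME],
                    "tokenizer": "whitespace",
                }
            },
        }
    }
}

# ensure_ascii=False keeps the gamma symbol readable in the output.
settings_json = json.dumps(index_settings, ensure_ascii=False, indent=2)
print(settings_json)
```

If the paragraph field uses this analyzer, the same filter is applied to analyzed query text as well, so a match query for "&#947comp" would be rewritten to "γcomp" before tokenizing. One caveat with the semicolon-less rule: it would also rewrite the start of a longer number such as "&#9471", so it is only safe if such inputs cannot occur.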
