Confused about ignore_above and how to update

gswartz · August 21, 2020, 8:49pm

I have an index that I created simply by importing a bunch of docs, so Elasticsearch created all the mappings by default. One of the fields is a potentially large text field, and we need the whole thing to be keyword searchable. Here's the mapping for it.

"notes" : {
  "type" : "text",
  "fields" : {
    "keyword" : {
      "type" : "keyword",
      "ignore_above" : 256
    }
  }
}

From what I've read, the ignore_above would limit it to only indexing the first 256 chars. So I then found this command I should be able to run in Kibana to update it.

PUT /notes-index/_mapping
{
  "properties": {
    "notes": {
      "type": "text",
      "ignore_above": 5000
    }
  }
}

When I run that I get an error.

{
  "error": {
    "root_cause": [
      {
        "type": "mapper_parsing_exception",
        "reason": "Mapping definition for [notes] has unsupported parameters:  [ignore_above : 5000]"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "Mapping definition for [notes] has unsupported parameters:  [ignore_above : 5000]"
  },
  "status": 400
}

So I'm confused: is this notes field a text type or a keyword type? I'm wondering if it's a text, and each analyzed word is a keyword? Is that how it works? Thanks.

cheiligers · August 21, 2020, 11:06pm

@gswartz, welcome to the community!
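ignore_above is only applicable to keyword fields, which is why the request above is rejected: it sets ignore_above on the top-level text field instead of on the keyword sub-field. Note that values longer than ignore_above are not truncated; they are simply not indexed in that keyword field.
Elasticsearch tries to help out by creating a mapping if one isn't defined, but it doesn't always get it just the way you want it.
Your notes field is already both: the top level is analyzed text, and notes.keyword is a keyword sub-field that holds the whole value as a single term. If you want both text and keyword, you can use a multi-field mapping as follows: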

PUT /notes-index/_mapping
{
  "properties": {
    "notes": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 5000
        }
      }
    }
  }
}
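
If you end up scripting the update instead of pasting it into Kibana, building the body as a dict and serializing it avoids hand-written JSON slips such as a missing comma or brace. This is only a minimal sketch: keyword_subfield_mapping is a hypothetical helper name, and the resulting string would still have to be sent as the body of PUT /notes-index/_mapping with whatever HTTP client you use.

```python
import json


def keyword_subfield_mapping(field, ignore_above):
    """Hypothetical helper: body for a mapping update on a text field
    that carries a keyword sub-field named "keyword"."""
    return {
        "properties": {
            field: {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword",
                        "ignore_above": ignore_above,
                    }
                },
            }
        }
    }


# Serialize to the JSON text that goes in the request body.
body = json.dumps(keyword_subfield_mapping("notes", 5000), indent=2)
print(body)
```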

I wouldn't go as far as 5000 though, because the keyword type treats each whole value as a single, unique term. The 256 you see is what dynamic mapping sets by default.
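
To make the text versus keyword difference concrete, here is a toy model in Python. It is not Elasticsearch code: text_terms and keyword_terms are made-up names, lowercase-and-split is only a crude stand-in for a real analyzer, and len() counts code points, which only approximates how Elasticsearch measures length.

```python
def text_terms(value):
    # Crude stand-in for a text analyzer: lowercase, split on whitespace.
    return value.lower().split()


def keyword_terms(value, ignore_above=256):
    # A keyword field keeps the whole value as one term.
    # Values longer than ignore_above are skipped, not truncated.
    if len(value) > ignore_above:
        return []
    return [value]


short_note = "Called customer back"
long_note = "x" * 300

print(text_terms(short_note))     # ['called', 'customer', 'back']
print(keyword_terms(short_note))  # ['Called customer back']
print(keyword_terms(long_note))   # []
print(keyword_terms(long_note, ignore_above=5000) == [long_note])  # True
```
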
There is an important note at the bottom of the docs that I'll reiterate here:
"The value for ignore_above is the character count, but Lucene counts bytes. If you use UTF-8 text with many non-ASCII characters, you may want to set the limit to 32766 / 4 = 8191 since UTF-8 characters may occupy at most 4 bytes."

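The arithmetic in that note can be checked with a few lines of Python. This is just an illustration of character count versus UTF-8 byte count; the constant name and the utf8_length helper are my own, and the 32766 figure is taken from the quoted docs note.

```python
LUCENE_MAX_TERM_BYTES = 32766  # byte limit quoted in the docs note


def utf8_length(value):
    # Number of bytes the string occupies once encoded as UTF-8.
    return len(value.encode("utf-8"))


# Worst case is 4 bytes per character, so this many characters always fit.
safe_limit = LUCENE_MAX_TERM_BYTES // 4

ascii_note = "a" * safe_limit           # 1 byte per character
emoji_note = "\U0001F600" * safe_limit  # 4 bytes per character

print(safe_limit)               # 8191
print(utf8_length(ascii_note))  # 8191
print(utf8_length(emoji_note))  # 32764
# One more 4-byte character pushes the value past the byte limit.
print(utf8_length(emoji_note + "\U0001F600") > LUCENE_MAX_TERM_BYTES)  # True
```
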
If you want to do a full-text search and plan to use an analyzer, I suggest you remap the field as text only. There's a great explanation of how to use the analyzers here.
I hope that helps.

gswartz · August 24, 2020, 2:12pm

Thank you!

