Confused about ignore_above and how to update

I have an index that I created simply by importing a bunch of docs, so Elasticsearch created all the mappings by default. One of the fields is a potentially large text field, and we need the whole thing to be keyword searchable. Here's the mapping for it.

"notes" : {
  "type" : "text",
  "fields" : {
    "keyword" : {
      "type" : "keyword",
      "ignore_above" : 256
    }
  }
}

From what I've read, the ignore_above would limit it to only indexing the first 256 chars. So I then found this command I should be able to run in Kibana to update it.

PUT /notes-index/_mapping
{
  "properties": {
    "notes": {
      "type": "text",
      "ignore_above": 5000
    }
  }
}

When I run that I get an error.

{
  "error": {
    "root_cause": [
      {
        "type": "mapper_parsing_exception",
        "reason": "Mapping definition for [notes] has unsupported parameters:  [ignore_above : 5000]"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "Mapping definition for [notes] has unsupported parameters:  [ignore_above : 5000]"
  },
  "status": 400
}

So, I'm confused, is this notes field a text type or keyword type? I'm wondering if it's a text, and each analyzed word is a keyword? Is that how it works? Thanks.

@gswartz, welcome to the community!
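ignore_above is only applicable to keyword fields. In your mapping, notes itself is a text field, and notes.keyword is a separate keyword sub-field that indexes the whole value as one term, so that sub-field is where ignore_above belongs. Note that ignore_above does not truncate: a string longer than the limit is not indexed in the keyword sub-field at all.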
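As a toy illustration of how ignore_above behaves on a keyword field, here is a small Python model. It is not Elasticsearch code, the function name keyword_terms is made up for this example, and it assumes the documented behaviour that a value longer than the limit is skipped rather than cut down to the limit.

```python
def keyword_terms(value: str, ignore_above: int) -> list:
    """Toy model of a keyword field: return the terms that would be indexed.

    Hypothetical helper for illustration only. The length is measured in
    characters, and a value longer than ignore_above yields no term at all.
    """
    if len(value) > ignore_above:
        return []      # skipped entirely, not truncated to the limit
    return [value]     # the whole value becomes one exact term


short_note = "call customer back"
long_note = "x" * 300

print(keyword_terms(short_note, 256))       # ['call customer back']
print(keyword_terms(long_note, 256))        # []
print(len(keyword_terms(long_note, 5000)))  # 1
```

Either way the full text is still in the document's _source and in the analyzed text field; only the entry in the keyword sub-field is missing for over-long values.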
Elasticsearch tries to help one out by creating a mapping if one isn't defined, but it doesn't always get it just the way you want it :)
If you want both text and keyword, you can use a multi-field mapping as follows:

PUT /notes-index/_mapping
{
  "properties": {
    "notes": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 5000
        }
      }
    }
  }
}

I wouldn't go as far as 5000 though, because the keyword type treats each entry as an individual, unique term. The 256 in your mapping is the value that dynamic mapping sets by default.
There is an important note at the bottom of the docs that I'll reiterate here:
"The value for ignore_above is the character count, but Lucene counts bytes. If you use UTF-8 text with many non-ASCII characters, you may want to set the limit to 32766 / 4 = 8191 since UTF-8 characters may occupy at most 4 bytes."
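The arithmetic in that note can be checked with a short Python sketch. The constant and function names are made up for illustration; the 32766-byte figure is simply the one quoted from the docs.

```python
LUCENE_MAX_TERM_BYTES = 32766  # byte limit quoted in the docs note
MAX_UTF8_BYTES_PER_CHAR = 4    # worst case for a single UTF-8 character

# Largest ignore_above that is safe even if every character takes 4 bytes.
SAFE_IGNORE_ABOVE = LUCENE_MAX_TERM_BYTES // MAX_UTF8_BYTES_PER_CHAR


def utf8_size(value: str) -> int:
    """Return how many bytes the value occupies once encoded as UTF-8."""
    return len(value.encode("utf-8"))


ascii_note = "a" * 5000
emoji_note = "\U0001F600" * 5000  # each of these characters needs 4 bytes

print(SAFE_IGNORE_ABOVE)      # 8191
print(utf8_size(ascii_note))  # 5000
print(utf8_size(emoji_note))  # 20000
```

So a limit of 5000 characters stays under the byte limit even in the worst case, since 5000 * 4 = 20000 bytes.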

If you want to do a full-text search and plan to use an analyzer, I suggest you remap the field as text only. There's a great explanation of the analyzers and how to use them here.
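As a rough illustration of the difference between the two field types, the Python sketch below contrasts how each turns a value into searchable terms. It is a toy model: text_terms only approximates an analyzer by lowercasing and splitting on non-alphanumeric characters, and both function names are hypothetical.

```python
import re


def text_terms(value: str) -> list:
    """Crude stand-in for an analyzer: lowercase, then split into words."""
    return re.findall(r"[a-z0-9]+", value.lower())


def keyword_term(value: str) -> str:
    """A keyword field keeps the entire value as a single exact term."""
    return value


note = "Customer called, wants a refund."

print(text_terms(note))    # ['customer', 'called', 'wants', 'a', 'refund']
print(keyword_term(note))  # Customer called, wants a refund.
```

A match query on notes can find this document by the word refund, while a term query on notes.keyword only matches when the whole string is identical.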
I hope that helps.


Thank you!
