String mapping: ignore_above for not_analyzed fields


(BradVido) #1

I have some string fields in ES with mappings like so:

{
  "index": "analyzed",
  "type": "string",
  "fields": {
    "raw": {
      "index": "not_analyzed",
      "ignore_above": 256,
      "type": "string",
      "doc_values": true
    }
  }
}

From the docs, for the ignore_above setting:

The analyzer will ignore strings larger than this size. Useful for generic not_analyzed fields that should ignore long text.

What exactly does this mean? If my field value is 257 characters long, will field.raw not be searchable at all? Will it not show up in aggregations? Or will only the first 256 characters be included?


(Mark Walkom) #2

Only the first 256 chars will be included, everything else will be dropped.


#3

This doesn't seem to be correct. I have a type defined as follows:

              "index": "analyzed",
              "omit_norms": true,
              "type": "string",
              "fields": {
                "raw": {
                  "index": "not_analyzed",
                  "ignore_above": 384,
                  "type": "string"
                }
              }

A terms aggregation on this field's .raw sub-field does not truncate values longer than 384 characters; it returns no buckets at all for those values.
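
For anyone comparing the two readings, here is a small Python sketch of the "skip, don't truncate" behaviour I am seeing. It is only a toy model and does not talk to Elasticsearch: index_raw_terms and terms_buckets are hypothetical helper names, and I am assuming the limit is counted in characters and that a value exactly at the limit is still indexed.

```python
from collections import Counter

def index_raw_terms(values, ignore_above):
    """Toy model of a not_analyzed sub-field: each value is one term,
    and any value longer than ignore_above is skipped entirely."""
    # Assumption: the limit is compared against the character count.
    return [v for v in values if len(v) <= ignore_above]

def terms_buckets(values, ignore_above):
    """Toy stand-in for a terms aggregation over the indexed terms."""
    return dict(Counter(index_raw_terms(values, ignore_above)))

short = "a" * 384   # exactly at the limit: kept as a term
long_ = "b" * 385   # one character over: not indexed at all

buckets = terms_buckets([short, short, long_], ignore_above=384)
print(len(buckets))            # 1 (only the short value has a bucket)
print(long_[:384] in buckets)  # False (no truncated bucket exists)
```

If values were truncated instead, the second print would be True, which is not what the aggregation results suggest.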


(BradVido) #4

I'm still trying to figure out what this means.

If the field is not_analyzed, then why do the docs mention that the analyzer will ignore strings larger than this size...?

Edit: I just found that the new 2.0 documentation better explains ignore_above. That documentation doesn't exist for 1.x versions, but I'm hoping behavior was the same or very similar.

