What are the possible problems if I don't give a keyword mapping to my text type?

Hi,

I have a mapping like the one below:

PUT _template/my_template
{
  "index_patterns": ["mylogs*"],
  "mappings": {
    "doc": {
      "properties": {
        .
        .
        .
        "long_text_field_1": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 10922
            }
          }
        },
        "long_text_field_2": {
          "type": "text"
        },
        .
        .
        .
      }
    }
  }
}

The fields hold long UTF-8 text. What will be the performance and other differences between these two fields, assuming the contents of the two fields are the same size? Which operations are possible or not possible on each?
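For context, here is a sketch of what each variant supports. The field names come from the mapping above; the index name `mylogs-test` is hypothetical:

```json
# Full-text (analyzed) search works on both fields, since both are "text":
GET mylogs-test/_search
{
  "query": { "match": { "long_text_field_2": "error timeout" } }
}

# Exact matching, sorting, and terms aggregations need the keyword
# sub-field, so they work for long_text_field_1 but not long_text_field_2:
GET mylogs-test/_search
{
  "size": 0,
  "aggs": {
    "top_values": { "terms": { "field": "long_text_field_1.keyword" } }
  }
}
```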

Thanks.

Why would you need a single token to be up to 10922 characters in length?

  • Users are unlikely to perform exact-match searches by typing in a 10,000-character string.
  • Users are unlikely to select a 10,000-character string from a drop-down box in a structured form.
  • Users are unlikely to want to perform aggregations where a bar on a bar chart is labelled with a 10,000-character string.

The only scenario I can conceive of is where something like a long URL is stored in a system and you want to perform some analysis on URL usage. You're likely still going to end up with a cripplingly large number of unique strings in your index. Hashes may be a better approach in these circumstances.
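One way to act on the hashing suggestion, assuming the goal is counting distinct values rather than retrieving them, is the `mapper-murmur3` plugin (it must be installed separately). A sketch of such a mapping, with the `url` field name as an illustrative assumption:

```json
PUT _template/url_template
{
  "index_patterns": ["urls*"],
  "mappings": {
    "doc": {
      "properties": {
        "url": {
          "type": "keyword",
          "fields": {
            "hash": { "type": "murmur3" }
          }
        }
      }
    }
  }
}
```

The `url.hash` sub-field stores a hash of each value, which makes `cardinality` aggregations cheaper than running them over the raw keyword values.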


Thanks @Mark_Harwood

My intention was just to search inside those long text fields: not for the entire text, but for keywords within it. In this case, is it OK to go with the config below?

    "long_text_field_1": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    },
    "long_text_field_2": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    },

That's perhaps the misinterpretation then. It's "keyword" singular, not plural. A keyword string is treated as a single untokenized keyword like "science fiction", as opposed to a text string, which is tokenized into the words "science" and "fiction".
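You can see the difference with the `_analyze` API:

```json
# Standard analysis (what a "text" field uses by default) splits the
# string into separate tokens: "science" and "fiction".
GET _analyze
{
  "analyzer": "standard",
  "text": "science fiction"
}

# The keyword analyzer keeps the whole string as a single untokenized
# term: "science fiction".
GET _analyze
{
  "analyzer": "keyword",
  "text": "science fiction"
}
```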

Ok. So it denotes the size of "each keyword" inside my entire text. So I can very well go with 256 :grinning:

Nope. It dictates the maximum length of a string that will be stored as an untokenized keyword.
"Keyword" does not mean substring. The process that creates substrings for search is tokenization, which chops fields of type "text" into individual tokens.
Strings of type "keyword" are not tokenized.
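To make that concrete, here is a sketch of how the two query types behave against the mapping above (the index name `mylogs-test` is hypothetical):

```json
# A match query on the text field finds any document containing the
# word "science" anywhere in the field:
GET mylogs-test/_search
{
  "query": { "match": { "long_text_field_1": "science" } }
}

# A term query on the keyword sub-field only matches if the ENTIRE
# stored string is exactly "science"; a word inside a longer string
# will not match:
GET mylogs-test/_search
{
  "query": { "term": { "long_text_field_1.keyword": "science" } }
}
```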

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.