Kibana Word Cloud on Text Field

I'm trying to create a word cloud on a text field called "message". This is the mapping for the field in Elasticsearch:

{
  "my_index" : {
    "mappings" : {
      "message" : {
        "full_name" : "message",
        "mapping" : {
          "message" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            },
            "fielddata" : true
          }
        }
      }
    }
  }
}

In Kibana, I don't have an option for "message", but I do see one for "message.keyword". However, it shows no data. Other fields like "type.keyword" seem to work. Is there something wrong with my configuration for the "message" text field?

Are the values of message longer than 256 characters? In that case they wouldn't be indexed into message.keyword, because "ignore_above": 256 skips longer values for that subfield.
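To check whether that is what's happening, a query along these lines counts documents that have a message but no indexed message.keyword value, which is what ignore_above produces for values over the limit. This is a sketch that assumes the index is called my_index, as in the mapping above:

```json
# Documents whose message was too long for the keyword subfield
GET my_index/_count
{
  "query": {
    "bool": {
      "filter": { "exists": { "field": "message" } },
      "must_not": { "exists": { "field": "message.keyword" } }
    }
  }
}
```

If this count is close to the total number of documents, then almost every message was skipped by the keyword subfield. That would explain the empty visualization.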

Oh yes, way larger.

Is there a limit to ignore_above, and is it a bad idea to make it very large?

Edit: I tried creating a field with a very large limit just to see if it would work:

{
  "mappings": {
    "properties": {
      "message": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above" : 100000000
          }
        }
      }
    }
  }
}

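It didn't work. According to the documentation, Lucene limits a single term to 32766 bytes. A UTF-8 character can take up to 4 bytes, so the safe maximum for ignore_above is 8191 characters (32766 / 4, rounded down).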

Can you elaborate on your use case? I can't see how a word cloud would render nicely with such large terms. Maybe there is another way to achieve your goal.

I was looking to extract some of the most commonly used terms from a document.

I see. That's not what the terms aggregation on message.keyword will do. It treats the whole message as a single value, so I guess each one will occur exactly once. For example, "This is a very very long message" is a single term; it won't be split into "This", "very", and so on.
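The _analyze API makes the difference easy to see. In this sketch, the sample text is just the example sentence above. The keyword analyzer indexes the string as-is, as a keyword field does, so it returns the whole string as one token. The standard analyzer, which text fields use by default, splits the string into words and lowercases them:

```json
# One token: "This is a very very long message"
POST _analyze
{
  "analyzer": "keyword",
  "text": "This is a very very long message"
}

# Tokens: this, is, a, very, very, long, message
POST _analyze
{
  "analyzer": "standard",
  "text": "This is a very very long message"
}
```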

I think using "significant terms" on the text field (message, not message.keyword) is what you want. It shows you words commonly used in the message field.

Ah, I must have been confused.

One final question. message doesn't show up in the Kibana dropdown. I tried turning on fielddata, but that didn't seem to work. Any idea why it wouldn't appear in Kibana?

Thanks for all the help.

Edit: I saw that the field wasn't aggregatable in the index pattern. I deleted and re-added the pattern. Now it says message is aggregatable, and the field shows up in Kibana. However, it still doesn't extract any significant terms.
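Kibana's index patterns get this information from the field capabilities API, so you can also check it directly. This sketch assumes the index is my_index. The response lists "searchable" and "aggregatable" for each field:

```json
GET my_index/_field_caps?fields=message,message.keyword
```

For the text field message, "aggregatable" should only be true while fielddata is enabled on it.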

Ah, I made a mistake here as well. A bunch of things to clear up:

  • After changing the mapping, you need to refresh the index pattern in Kibana so it picks up the changes (recreating it works as well).
  • In your case "Terms" is probably the right choice. Used on a "text" field with fielddata enabled, it does what you expect: it shows the most common words in your documents. Significant terms is a special case of that. It highlights terms that are unusually common in the results of your current query compared to the rest of the index, so it doesn't make much sense without a query.
  • How did you turn on fielddata? You might need to reindex your data so it's properly populated.
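For reference, you can turn on fielddata for an existing text field with the update mapping API. This sketch assumes the index is my_index. It repeats the existing keyword subfield unchanged so the update doesn't conflict with the current mapping:

```json
PUT my_index/_mapping
{
  "properties": {
    "message": {
      "type": "text",
      "fields": {
        "keyword": { "type": "keyword", "ignore_above": 256 }
      },
      "fielddata": true
    }
  }
}
```

Fielddata is loaded into heap memory the first time you aggregate on the field, so it can become expensive on large indices.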

An example that worked for me:

  • Create the mapping and ingest data
PUT textindex2
{
  "mappings": {
    "properties": {
      "message": {
        "type": "text",
        "fielddata": true
      }
    }
  }
}

POST textindex2/_doc
{ "message": "This is my message" }

POST textindex2/_doc
{ "message": "This is my other message" }
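A request along these lines then runs a terms aggregation, like the one behind a Kibana word cloud, against textindex2. In this sketch, the aggregation name top_words and the size of 20 are arbitrary choices:

```json
GET textindex2/_search
{
  "size": 0,
  "aggs": {
    "top_words": {
      "terms": { "field": "message", "size": 20 }
    }
  }
}
```

Once the index has refreshed, the two documents above should produce buckets for this, is, my and message with a doc_count of 2, and for other with a doc_count of 1. Note that doc_count is the number of documents containing a word, not its total number of occurrences.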

Thank you again. I had to reindex my data, and the terms aggregation works now. It's not very helpful since the top terms are random numbers and similar noise, but I guess that is expected.

The values that end up in the index depend on the mapping. You can configure the token filters the analyzer uses to exclude certain values. For example, the stop filter excludes values from a list: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-stop-tokenfilter.html#analysis-stop-tokenfilter-customize. Check out the sidebar on the same page; there are a lot of filters to clean the output up.
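Here is a sketch of that kind of cleanup. A custom analyzer combines the stop filter (English stop words by default) with a keep_types filter that excludes the <NUM> tokens produced by the standard tokenizer. The names textindex3, word_cloud_analyzer and drop_numbers are made up for illustration:

```json
PUT textindex3
{
  "settings": {
    "analysis": {
      "filter": {
        "drop_numbers": {
          "type": "keep_types",
          "types": ["<NUM>"],
          "mode": "exclude"
        }
      },
      "analyzer": {
        "word_cloud_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "stop", "drop_numbers"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "message": {
        "type": "text",
        "analyzer": "word_cloud_analyzer",
        "fielddata": true
      }
    }
  }
}

# Should return only: error, login, service
POST textindex3/_analyze
{
  "analyzer": "word_cloud_analyzer",
  "text": "Error 404 in the login service"
}
```

The lowercase filter runs before stop because the default stop word list is lowercase. Tokens that mix letters and digits, such as IDs or hashes, are typed <ALPHANUM> by the standard tokenizer, so this sketch would still let them through.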

If you change the mapping, you have to reindex your data for it to take effect.
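Here is a minimal sketch of that step. It assumes a new index, my_index_v2 (a hypothetical name), has already been created with the updated mapping. The _reindex API copies the documents over, and they are analyzed with the new settings as they are indexed. Afterwards, the Kibana index pattern has to match the new index.

```json
POST _reindex
{
  "source": { "index": "my_index" },
  "dest": { "index": "my_index_v2" }
}
```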