Kibana Word Cloud on Text Field

spyderman4g63 · September 30, 2020, 2:33pm

I'm trying to create a word cloud on a text field called "message". This is my configuration for the field in Elasticsearch:

{
  "my_index" : {
    "mappings" : {
      "message" : {
        "full_name" : "message",
        "mapping" : {
          "message" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            },
            "fielddata" : true
          }
        }
      }
    }
  }
}

In Kibana, I don't have an option for "message". I do see an option for "message.keyword", but it shows no data. Other fields like "type.keyword" seem to work. Is there something wrong with my configuration for the "message" text field?

flash1293 · September 30, 2020, 3:26pm

Are the values of message longer than 256 characters? In that case they wouldn't be indexed.
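To make the ignore_above behaviour concrete, here is a minimal Python sketch of the rule. It is not Elasticsearch code, and the helper name keyword_is_indexed is made up for illustration; it only mimics the documented behaviour that strings longer than the limit are skipped for the keyword sub-field.

```python
# Hypothetical helper: mimics the ignore_above rule for a keyword field.
# Strings longer than the limit are not indexed, so they never show up
# as buckets in a terms aggregation on message.keyword.
def keyword_is_indexed(value: str, ignore_above: int = 256) -> bool:
    return len(value) <= ignore_above


short_value = "error"
long_message = "x" * 1000  # stand-in for a long log message

print(keyword_is_indexed(short_value))
print(keyword_is_indexed(long_message))
```

If every message is longer than 256 characters, message.keyword ends up with no values at all, which matches the empty visualization described above.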

spyderman4g63 · September 30, 2020, 3:33pm

Oh yes, way larger.

spyderman4g63 · September 30, 2020, 3:45pm

Is there a limit to ignore_above and is it a bad idea to make it very large?

edit: I tried creating a field with a super large limit just to see if it would work:

{
  "mappings": {
    "properties": {
      "message": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above" : 100000000
          }
        }
      }
    }
  }
}

It didn't work. According to this documentation, Lucene limits a single term to 32766 bytes, which works out to 8191 characters if every character needs 4 bytes in UTF-8.
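The 8191 figure comes from dividing Lucene's per-term byte limit by the maximum size of a UTF-8 character. A small Python sketch of that arithmetic follows; the constant and function names are my own, and the check only approximates what happens at index time.

```python
# Lucene limits a single term to 32766 bytes; ignore_above counts characters.
LUCENE_MAX_TERM_BYTES = 32766
UTF8_MAX_BYTES_PER_CHAR = 4

# Largest ignore_above that is safe even if every character needs 4 bytes.
safe_ignore_above = LUCENE_MAX_TERM_BYTES // UTF8_MAX_BYTES_PER_CHAR
print(safe_ignore_above)  # 8191


def fits_in_one_term(value: str) -> bool:
    """Return True if the UTF-8 encoding of value stays within the byte limit."""
    return len(value.encode("utf-8")) <= LUCENE_MAX_TERM_BYTES


print(fits_in_one_term("a" * 32766))          # ASCII: 1 byte per character
print(fits_in_one_term("\U0001F600" * 8192))  # emoji: 4 bytes per character
```

So a plain-ASCII message could be stored as a keyword up to 32766 characters, while text with many multi-byte characters hits the limit much sooner.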

flash1293 · October 1, 2020, 12:53pm

Can you elaborate on your use case? I can't see how a word cloud would render nicely with such large terms. Maybe there is another way to achieve your goal.

spyderman4g63 · October 1, 2020, 1:36pm

I was looking to extract some of the most commonly used terms from a document.

flash1293 · October 1, 2020, 1:40pm

I see, that's not what the terms aggregation will do - it will match the whole message as a single value (so I guess each one will occur exactly once) - e.g. "This is a very very long message" is a single term, it won't split "This" and "very" and so on.
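A rough Python illustration of the difference, assuming a very simple lowercase word splitter as a stand-in for the real analyzer (the actual standard analyzer is more sophisticated, and the sample messages are invented):

```python
import re
from collections import Counter

messages = [
    "This is a very very long message",
    "This is another long message",
]

# Keyword field: each whole message is one term, so every bucket has count 1.
keyword_terms = Counter(messages)


def simple_tokens(text: str) -> list[str]:
    """Crude stand-in for an analyzer: lowercase and split on non-word characters."""
    return re.findall(r"\w+", text.lower())


# Analyzed text field: count in how many documents each word appears,
# which is how a terms aggregation reports doc_count.
text_terms = Counter(tok for msg in messages for tok in set(simple_tokens(msg)))

print(keyword_terms.most_common(2))
print(text_terms["this"], text_terms["long"], text_terms["very"])
```

The keyword counts are all 1, which would make for a useless word cloud, while the per-word counts are what a word cloud actually needs.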

I think using "significant terms" on the text field (message, not message.keyword) is what you want. It shows you words commonly used in the message field.

spyderman4g63 · October 1, 2020, 2:03pm

Ah, I must have been confused.

One final question. message doesn't show up in the Kibana dropdown. I tried turning on fielddata but that didn't seem to work. Any idea why it wouldn't appear in Kibana?

Thanks for all the help.

Edit: I saw that the field wasn't aggregatable in the index pattern. I deleted and re-added the pattern, and now it says message is aggregatable and it shows up in Kibana, but it still doesn't extract any significant terms.

flash1293 · October 1, 2020, 3:02pm

Ah, I made a mistake here as well. A bunch of things to clear up:

After changing the mapping, you need to refresh the index pattern in Kibana so it picks up the changes (recreating it works as well).
In your case "Terms" is probably the right choice. Used on a "text" field with fielddata enabled, it does what you expect: it shows the most common words in your documents. Significant terms is a special case of that, highlighting terms that are unusually common relative to your current query, so it doesn't make much sense without a query.
How did you turn on fielddata? You might need to reindex your data so it's properly populated.

An example that worked for me:

Create the mapping and ingest data

PUT textindex2
{
  "mappings": {
    "properties": {
      "message": {
        "type": "text",
        "fielddata": true
      }
    }
  }
}

POST textindex2/_doc
{ "message": "This is my message" }

POST textindex2/_doc
{ "message": "This is my other message" }

Create index pattern for this index
Create tag cloud based on "terms" aggregation on the message field

[Screenshot 2020-10-01 at 17.01.08]
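For checking the result outside of the visualization, the tag cloud boils down to a terms aggregation on the message field. Below is a minimal Python sketch that only builds and prints such a request body; the aggregation name top_words is arbitrary, and the exact request Kibana sends will differ in detail.

```python
import json

# Request body for a terms aggregation on the analyzed "message" field.
# "top_words" is an arbitrary aggregation name chosen for this sketch.
terms_request = {
    "size": 0,  # return only aggregation buckets, no search hits
    "aggs": {
        "top_words": {
            "terms": {"field": "message", "size": 10}
        }
    },
}

# Paste the printed JSON after "GET textindex2/_search" in Dev Tools.
print(json.dumps(terms_request, indent=2))
```

With the two sample documents above, the buckets should contain the individual lowercased words (for example "message" and "this") rather than the whole sentences.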

spyderman4g63 · October 1, 2020, 3:28pm

Thank you again. I had to re-index my data. The terms aggregation works now. It's not very helpful since the top terms are random numbers and things but I guess that is expected.

flash1293 · October 2, 2020, 7:21am

The values that end up in the index depend on the mapping. You can configure token filters to exclude certain values (e.g. the stop filter, which excludes values from a list: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-stop-tokenfilter.html#analysis-stop-tokenfilter-customize). Check out the sidebar on the same page; there are a lot of options for cleaning it up.

If you change the mapping, you have to reindex your data for it to take effect.
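As a rough sketch of how that could look, the Python snippet below builds two request bodies: one that creates a new index with a custom stop filter on the message field, and one that copies the existing documents into it. The names my_index_v2, message_stop and message_analyzer and the stopword list are assumptions for illustration, not values from this thread.

```python
import json

# Body for "PUT my_index_v2": a custom analyzer that lowercases tokens and
# drops the words listed in the (hypothetical) stop filter.
index_body = {
    "settings": {
        "analysis": {
            "filter": {
                "message_stop": {"type": "stop", "stopwords": ["the", "is", "my"]}
            },
            "analyzer": {
                "message_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "message_stop"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "message": {
                "type": "text",
                "analyzer": "message_analyzer",
                "fielddata": True,
            }
        }
    },
}

# Body for "POST _reindex": copy documents so they are analyzed with the new mapping.
reindex_body = {"source": {"index": "my_index"}, "dest": {"index": "my_index_v2"}}

print("PUT my_index_v2")
print(json.dumps(index_body, indent=2))
print("POST _reindex")
print(json.dumps(reindex_body, indent=2))
```

The printed requests can be pasted into Dev Tools. A stop filter only removes the listed words, so getting rid of numeric tokens would need one of the other filters mentioned on that documentation page.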
