Visualization of the most frequent words

Hello! I am new to Kibana and Elastic.

I need to display the most frequently used words in a text field across several documents.

Is it possible to do this with Kibana and Elastic? If so, how? With scripted fields? Or tokenizers?

Please help me

Thank you in advance for your answers

I just remembered how to do this and have corrected my reply.

If you set "fielddata": true on the text field, you can use Visualize > Aggregation based > Tag cloud.

PUT /test_field_data
{
  "mappings": {
    "properties": {
      "mytext":{
        "type":"text",
        "fielddata": true
      }
    }
  }
}

POST /test_field_data/_bulk
{"index":{}}
{"mytext": "Visualization of the most frequent words."}
{"index":{}}
{"mytext": "Hello ! i am new to kibana and elastic."}
{"index":{}}
{"mytext": "My need is to display the most used words in a text type field on several documents."}
{"index":{}}
{"mytext": "Is it possible to do this with kibana and elastic? If so, how is this possible? With scripted fields? Or tokenizers?"}
{"index":{}}
{"mytext": "Please help me Thank you in advance for your answers"}
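
To check the counts outside of Kibana, a terms aggregation on the same field should return the same buckets the Tag cloud draws. This is a minimal sketch: the aggregation name top_words and the size of 20 are arbitrary choices, not something from the setup above. Note that doc_count is the number of documents containing a term, so a word repeated inside one document is only counted once.

```console
# Return no hits, only the 20 terms that appear in the most documents.
GET /test_field_data/_search
{
  "size": 0,
  "aggs": {
    "top_words": {
      "terms": {
        "field": "mytext",
        "size": 20
      }
    }
  }
}
```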

Hello, thank you for your answer, it helps me a lot!

Is it possible to add an analyzer to, for example, remove stop words (e.g. "the") from the visualization?

Yes, you can. (I'm not sure why, but setting only the index's default analyzer didn't work for me.) Elasticsearch has many built-in token filters you can use; the custom analyzer below combines the lowercase and stop filters.

PUT /test_field_data
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer":{
          "type": "custom",
          "tokenizer": "standard",
          "filter":[
            "lowercase",
            "stop"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "mytext":{
        "type":"text",
        "fielddata": true,
        "analyzer": "my_analyzer"
      }
    }
  }
}
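
To see what the analyzer actually emits before building the visualization, the _analyze API can be run against the index. This sketch assumes the index was created with the settings above; the analyzer of an existing field cannot be changed in place, so if test_field_data already exists it has to be deleted (DELETE /test_field_data), recreated, and the documents indexed again.

```console
# With the default English stop word list, "of" and "the" should be dropped,
# leaving the tokens: visualization, most, frequent, words.
GET /test_field_data/_analyze
{
  "analyzer": "my_analyzer",
  "text": "Visualization of the most frequent words."
}
```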

If this causes performance or memory problems, the approach below, which I had deleted from my earlier reply, could be the next option.

If your text is in a space-separated language, one simple solution is to use an ingest pipeline with a split processor to split the text on whitespace, store the resulting array in a keyword field, and visualize that field with a Tag cloud. Unfortunately, this method does not apply a stemmer, normalizer, stop word filter, etc.
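
A rough sketch of that pipeline approach follows. The pipeline name split_words, the index name test_split_words, and the target field mytext_words are made up for illustration. Because nothing is analyzed, case is preserved and punctuation stays attached to the words (for example "words." with the trailing period).

```console
# Split mytext on runs of whitespace and store the pieces in a separate field.
PUT _ingest/pipeline/split_words
{
  "description": "Split mytext on whitespace into an array of words",
  "processors": [
    {
      "split": {
        "field": "mytext",
        "separator": "\\s+",
        "target_field": "mytext_words"
      }
    }
  ]
}

# The array of words goes into a keyword field, so fielddata is not needed.
PUT /test_split_words
{
  "mappings": {
    "properties": {
      "mytext": { "type": "text" },
      "mytext_words": { "type": "keyword" }
    }
  }
}

# Index a document through the pipeline.
POST /test_split_words/_doc?pipeline=split_words
{"mytext": "Please help me Thank you in advance for your answers"}
```

The Tag cloud would then be built on mytext_words instead of mytext.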
