Count Each word in a text Field

vishesh08 · March 14, 2022, 11:02am

I have tweets stored in a text field with fielddata set to true. I want to remove the English stopwords and get a count of each word in the tweets.
Data mapping:

{
  "tweets" : {
    "mappings" : {
      "properties" : {
        "_class" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "id" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "text" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          },
          "fielddata" : true
        }
      }
    }
  }
}

Example of data:

{
        "_index" : "tweets",
        "_type" : "_doc",
        "_id" : "1503057742633869318",
        "_score" : 1.0,
        "_source" : {
          "_class" : "com.twitter.elastic.models.Tweet",
          "text" : """RT @Ceszie_: youve worked hard! the show was awesome and fantastic! you did great! 

make sure you rest well okay? 

#ThankyouBTS 💜💜💜 @BTS_…""",
          "id" : "1503057742633869318"
        }
      },
      {
        "_index" : "tweets",
        "_type" : "_doc",
        "_id" : "1503057796983451651",
        "_score" : 1.0,
        "_source" : {
          "_class" : "com.twitter.elastic.models.Tweet",
          "text" : """RT @btsyauRJ2: Summary for D1, D2 &amp; D3 by me, i will treasure these three days forever &lt;3 BTS BTS BTS 💜🥺 

#ThankYouBTS @BTS_twt
#PTD_ON_ST…""",
          "id" : "1503057796983451651"
        }
      }

casterQ · March 15, 2022, 7:31am

vishesh08 · March 15, 2022, 9:33am

But this returns statistics for a single document. I want the term vector of a particular field across all documents combined. For example, if I have 10 documents, I want the term vector of all 10 documents combined.

casterQ · March 15, 2022, 9:37am

vishesh08 · March 15, 2022, 10:05am

I need the combined term vector, not individual ones. As I said, if the word "marry" occurs 10 times across 10 documents, I need its term frequency. What you sent gives the frequency for a single document.

casterQ · March 15, 2022, 10:15am

If term vectors can't meet your needs, you can enable fielddata on the text field and run a terms aggregation on it. Be aware that enabling fielddata can significantly increase memory usage.

vishesh08 · March 15, 2022, 10:39am

OK. Can you help me with the query? My fielddata is already set to true.

casterQ · March 15, 2022, 11:02am

{
  "aggs": {
    "wordCloud": {
      "terms": {
        "field": "text"
      }
    }
  }
}
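
A possible extension of the query above, offered as a sketch rather than a tested answer. The top-level "size": 0 suppresses the search hits so only the aggregation comes back, the inner "size" raises the number of buckets above the default of 10, and "exclude" drops an explicit list of terms. The stopword list and the bucket count of 100 are placeholder assumptions, not values from this thread. The list is lowercase because the mapping sets no analyzer, so the default standard analyzer lowercases tokens. Note also that doc_count in a terms aggregation is the number of documents containing the term, not the total number of occurrences, so a word repeated three times in one tweet still counts once.

```json
{
  "size": 0,
  "aggs": {
    "wordCloud": {
      "terms": {
        "field": "text",
        "size": 100,
        "exclude": ["rt", "a", "and", "by", "did", "for", "i", "the", "was", "you"]
      }
    }
  }
}
```

This would be sent as the body of GET tweets/_search.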
