Count each word in a text field

I have tweets stored in a text field with fielddata set to true. I want to remove English stopwords and get a count of each word across all the tweets.
Data mapping:

{
  "tweets" : {
    "mappings" : {
      "properties" : {
        "_class" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "id" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "text" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          },
          "fielddata" : true
        }
      }
    }
  }
}
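In the mapping above, the text field has no analyzer set, so it uses the default standard analyzer. That analyzer lowercases and tokenizes but does not remove stopwords, so words like "the" and "and" would be counted. Below is a minimal sketch of an index body that drops English stopwords at index time. It assumes a new index (the name tweets_v2 and the analyzer name my_english_stop are both hypothetical), because the analyzer of an existing field cannot be changed in place. Only the text field is shown. The _class and id fields can be copied unchanged from the current mapping.

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_english_stop": {
          "type": "standard",
          "stopwords": "_english_"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "analyzer": "my_english_stop",
        "fielddata": true,
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  }
}
```

This body would be sent as PUT tweets_v2. The existing documents would then be copied over with the _reindex API, using source index tweets and destination index tweets_v2.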

Example documents:

{
        "_index" : "tweets",
        "_type" : "_doc",
        "_id" : "1503057742633869318",
        "_score" : 1.0,
        "_source" : {
          "_class" : "com.twitter.elastic.models.Tweet",
          "text" : """RT @Ceszie_: youve worked hard! the show was awesome and fantastic! you did great! 

make sure you rest well okay? 

#ThankyouBTS 💜💜💜 @BTS_…""",
          "id" : "1503057742633869318"
        }
      },
      {
        "_index" : "tweets",
        "_type" : "_doc",
        "_id" : "1503057796983451651",
        "_score" : 1.0,
        "_source" : {
          "_class" : "com.twitter.elastic.models.Tweet",
          "text" : """RT @btsyauRJ2: Summary for D1, D2 & D3 by me, i will treasure these three days forever <3 BTS BTS BTS 💜🥺 

#ThankYouBTS @BTS_twt
#PTD_ON_ST…""",
          "id" : "1503057796983451651"
        }
      }
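The _analyze API can show which tokens a given analysis chain would keep for text like these tweets, before any reindexing. Here is a small sketch of a POST _analyze body. It uses the standard tokenizer with the lowercase and stop filters, and the stop filter defaults to the English stopword list. The sample sentence is taken from the first tweet above.

```json
{
  "tokenizer": "standard",
  "filter": ["lowercase", "stop"],
  "text": "the show was awesome and fantastic"
}
```

With this chain, "the", "was" and "and" should be dropped as English stopwords, leaving show, awesome and fantastic.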

But this returns statistics for a single document. I want the term vector of one field across all documents combined. For example, if I have 10 documents, I want the term vector of all 10 documents combined.

I need the combined term vector, not the individual ones. With the 10 documents I mentioned, if the word "marry" occurs 10 times in total, I need that term frequency. What you sent gives the frequency for a single document only.
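The term vectors API can return collection-level numbers when term_statistics is enabled. Each term then includes ttf (total term frequency, meaning how often the term occurs across all documents) and doc_freq, in addition to the per-document term_freq. Below is a sketch of the request body, sent as GET tweets/_termvectors/1503057742633869318 using the document id from the sample data.

```json
{
  "fields": ["text"],
  "term_statistics": true,
  "field_statistics": false
}
```

There are two caveats. First, the statistics come only from the shard that holds the requested document, and deleted documents are not excluded. Second, only terms that appear in that one document are listed, so this will not give a complete word list for the whole index.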

If term vectors can't meet your needs, you can enable fielddata on the text field and run a terms aggregation on it. Be aware, though, that enabling fielddata can significantly increase heap memory usage.

OK. Can you help me with the query? My fielddata is already set to true.

{
  "aggs": {
    "wordCloud": {
      "terms": {
        "field": "text"
      }
    }
  }
}
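Here is a slightly fuller version of the same search body, as a sketch. The top-level "size": 0 skips returning the hits themselves. The terms "size" raises the number of buckets above the default of 10, and the value 50 is an arbitrary choice. Each bucket's doc_count is the number of documents that contain the word, not the total number of times it occurs. A word used twice in one tweet therefore still adds only 1. While the field still uses the standard analyzer, stopwords will fill the top buckets. The exclude parameter can drop a few exact terms, and the list below is only illustrative, not a full stopword set.

```json
{
  "size": 0,
  "aggs": {
    "wordCloud": {
      "terms": {
        "field": "text",
        "size": 50,
        "exclude": ["the", "and", "a", "to", "rt"]
      }
    }
  }
}
```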
