Exclude specific terms from a terms aggregation's bucket list

  1. There are lots of special characters in the words field
  2. The words field can contain multiple words and every word has to be a separate "entity" (token)

That's pretty much why I need to use the standard analyzer on a text field with fielddata enabled.
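For reference, the setup described here could look roughly like the sketch below. It is written as plain Python dictionaries that would be sent as JSON request bodies. The aggregation name `word_counts`, the helper function names, and the typeless mapping layout (Elasticsearch 7+) are my assumptions and are not taken from the original query; `exclude` on a `terms` aggregation accepts either a regular expression or a list of exact terms.

```python
import json


def words_mapping():
    """Index body for an analyzed `words` text field (assumed ES 7+ layout)."""
    return {
        "mappings": {
            "properties": {
                "words": {
                    "type": "text",
                    "analyzer": "standard",  # splits the field into word tokens
                    "fielddata": True,       # required to aggregate on a text field
                }
            }
        }
    }


def terms_agg_body(excluded_terms, size=100):
    """Search body: one bucket per token, skipping the listed exact terms."""
    return {
        "size": 0,  # return only aggregation results, no hits
        "aggs": {
            "word_counts": {  # hypothetical aggregation name
                "terms": {
                    "field": "words",
                    "size": size,
                    "exclude": list(excluded_terms),
                }
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(terms_agg_body(["the", "and"]), indent=2))
```

Note that the excluded terms have to match the tokens as the analyzer emits them, so with the standard analyzer they should be lowercase.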

The goal is to count the number of occurrences of each word across all of the words fields on the cluster (that's the query above). I considered and researched other options at length, and this seems like the only one that fits my use case.

That said, do you have any ideas how I could change the query above so that it counts every occurrence of each word within the words field and sorts the words by that count?
The current query produces a doc_count, which is the number of documents that contain the word. Some documents contain a word multiple times, so doc_count undercounts the actual number of occurrences.
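To make the difference concrete, here is a small client-side sketch in plain Python. It is not an Elasticsearch query, the sample documents are invented, and the regex tokenizer is only a rough stand-in for the standard analyzer.

```python
import re
from collections import Counter


def tokenize(text):
    """Rough stand-in for the standard analyzer: lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def doc_counts(docs):
    """Number of documents containing each word (what doc_count reports)."""
    counts = Counter()
    for text in docs:
        counts.update(set(tokenize(text)))  # each word counted once per document
    return counts


def total_occurrences(docs):
    """Number of times each word appears across all documents."""
    counts = Counter()
    for text in docs:
        counts.update(tokenize(text))  # every repetition counted
    return counts


if __name__ == "__main__":
    docs = ["foo bar foo", "foo baz", "bar bar bar"]  # made-up sample data
    print(doc_counts(docs))                       # foo: 2, bar: 2, baz: 1
    print(total_occurrences(docs).most_common())  # [('bar', 4), ('foo', 3), ('baz', 1)]
```

Here "bar" appears in two documents but four times in total, which is exactly the gap between doc_count and the number I am after.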