Exclude specific terms from term aggregation's buckets list

randomuser · May 9, 2018, 12:38pm

There are lots of special characters in the words field.
The words field can contain multiple words, and every word has to be a separate "entity" (token).

That's pretty much why I need to use the standard analyzer on a text field with fielddata turned on.
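
For reference, here is roughly what that mapping looks like, written as a Python dict so it can be printed as JSON. This is only a sketch: the field name `words` comes from my description, the typeless `mappings.properties` layout assumes Elasticsearch 7 or later (older versions nest the properties under a document type), and the index it would be applied to is not shown.

```python
import json

# Sketch of an index body: a text field analyzed with the standard analyzer,
# with fielddata enabled so a terms aggregation can run on the analyzed tokens.
# Assumption: typeless mapping layout (Elasticsearch 7+).
index_body = {
    "mappings": {
        "properties": {
            "words": {
                "type": "text",
                "analyzer": "standard",
                "fielddata": True,  # keeps token data in heap memory, can be expensive
            }
        }
    }
}

# Print the JSON that would be sent as the body of an index creation request
print(json.dumps(index_body, indent=2))
```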

The goal is to count the number of occurrences of each word across all of the words fields on the cluster (that's the query above). I considered and researched other options at length, and this seems like the only one that fits my use case.

On the other hand, do you have any ideas on how I could convert the query above to count every occurrence of each word within the words field and sort the words by that count?
The current query produces a doc_count, which is the number of documents that contain the word. Some documents contain a word multiple times, so doc_count undercounts the actual number of occurrences.
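
To make the difference concrete, here is a small Python sketch of the two numbers I mean. It is not an Elasticsearch query. The `tokenize` helper is a hypothetical, much simplified stand-in for the standard analyzer (it just lowercases and splits on anything that is not an ASCII letter or digit), and the sample documents are made up.

```python
import re
from collections import Counter

def tokenize(text):
    # Hypothetical stand-in for an analyzer: lowercase, split on non-alphanumerics
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]

def count_words(docs):
    doc_count = Counter()    # number of documents containing the word
    total_count = Counter()  # number of occurrences across all documents
    for text in docs:
        tokens = tokenize(text)
        total_count.update(tokens)
        doc_count.update(set(tokens))  # each word counted once per document
    return doc_count, total_count

# Made-up values of the words field from three documents
docs = ["foo bar foo", "foo baz", "bar, bar! bar?"]
doc_count, total_count = count_words(docs)

# Sort by total occurrences, which is the ordering I am after
for word, occurrences in total_count.most_common():
    print(word, occurrences, doc_count[word])
```

Here "bar" appears in two documents but four times in total, so sorting by doc_count and sorting by occurrences give different results.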
