Exclude specific terms from term aggregation's buckets list

randomuser · May 9, 2018, 7:55am

Hello,

Is there any way to filter the searched term from the results?

I'm using a standard tokeniser. The field "word" usually contains values like "green table", "big table", "yellow tables", so my aggregation puts the words "table" and "tables" at the top, as they're the most frequent.

In this example, I don't want the word "table" in the buckets list results.

By the way, regarding the query_string, what's the most efficient way (performance-wise) to search for words that contain a given word?

GET newindex/_search
    {
      "query": {
        "query_string": {
          "default_field": "word", 
          "query" : "*table*"
        }
      },
      "aggs" : {
          "tables" : {
              "terms" : { 
                "field" : "word.s",
                "size" : 100
              }
          }
      },
      "size" : 0
    }

randomuser · May 9, 2018, 8:30am

In case anyone has the same problem, the easiest solution is to add the exclude parameter:

GET newindex/_search
    {
      "query": {
        "query_string": {
          "default_field": "word", 
          "query" : "*table*"
        }
      },
      "aggs" : {
          "tables" : {
              "terms" : { 
                "field" : "word.s",
                "exclude": ".*table.*",
                "size" : 100
              }
          }
      },
      "size" : 0
    }
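To make the effect of the exclude pattern above easier to see, here is a minimal plain-Python sketch. It does not call Elasticsearch. The sample documents and the helper `terms_buckets` are hypothetical, and it assumes the regex has to match the whole term, which is how the exclude regex in a terms aggregation is understood to behave (hence `fullmatch`). Python and Lucene regex syntax differ in general, but `.*table.*` reads the same in both.

```python
import re
from collections import Counter

# Hypothetical sample values of the "word" field (not from the original post)
docs = ["green table", "big table", "yellow tables", "green chair"]

def terms_buckets(docs, exclude=None):
    """Count documents per token, skipping tokens that fully match `exclude`."""
    pattern = re.compile(exclude) if exclude else None
    counts = Counter()
    for doc in docs:
        # set(): each token is counted once per document, like doc_count
        for token in set(doc.lower().split()):
            if pattern and pattern.fullmatch(token):
                continue  # the whole token matched the exclude regex
            counts[token] += 1
    return counts

all_buckets = terms_buckets(docs)
filtered = terms_buckets(docs, exclude=".*table.*")

print(all_buckets.most_common())
print(filtered.most_common())
```

With the pattern applied, both "table" and "tables" drop out of the buckets, while "green", "big", "yellow" and "chair" remain. If only a few exact terms need to be removed, the exclude parameter also accepts an array of exact values, such as ["table", "tables"], instead of a regex.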

dadoonet · May 9, 2018, 8:35am

There is an exclude parameter in aggs.
https://www.elastic.co/guide/en/elasticsearch/reference/6.2/search-aggregations-bucket-terms-aggregation.html#_filtering_values_3

As for wildcard performance, prefer using ngrams. You'll pay the price at index time instead of query time.

randomuser · May 9, 2018, 8:38am

How do I get full-word tokens with ngrams?
With an ngram tokeniser here, the returned tokens would be "tab", "le ", etc. I can't aggregate on those, as the buckets wouldn't make sense.

dadoonet · May 9, 2018, 9:31am

Use multi-fields: https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-fields.html

The same content can be indexed with ngrams for search (as a text field) and without any transformation as a keyword field, on which you can compute aggs.
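The trade-off described above (pay at index time, not at query time) can be sketched in plain Python. This is only an illustration under simplifying assumptions: fixed 3-character grams, hypothetical sample data, and hypothetical helpers `ngrams` and `search`. It is not how Elasticsearch implements its ngram tokenizer.

```python
from collections import defaultdict

def ngrams(text, n=3):
    """Return all character n-grams of `text` (a simplified n-gram tokenizer)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

# Hypothetical sample values of the "word" field, keyed by document id
docs = {1: "green table", 2: "big table", 3: "yellow tables", 4: "green chair"}

# Index time: map every 3-gram to the ids of the documents containing it
index = defaultdict(set)
for doc_id, text in docs.items():
    for gram in ngrams(text):
        index[gram].add(doc_id)

def search(substring):
    """Find candidate documents by intersecting the postings of each gram."""
    grams = ngrams(substring)
    if not grams:
        return set()  # a query shorter than n cannot be answered by this index
    result = set(index.get(grams[0], set()))
    for gram in grams[1:]:
        result &= index.get(gram, set())
    # Note: grams are not checked for adjacency, so in general this is a
    # candidate set that may contain false positives.
    return result

hits = search("table")
print(sorted(hits))
```

The substring query becomes a handful of dictionary lookups instead of a scan over every term, which is the cost a leading wildcard such as `*table*` has to pay. The grams themselves are useless as buckets, which is why the aggregation would run on a separate sub-field of the same content.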

randomuser · May 9, 2018, 9:51am

Yep, I struggled with that dilemma a few days ago. But in the example I gave, a keyword type would create buckets for "green table" instead of for "green" and "table". Or is there a way to achieve the same?

dadoonet · May 9, 2018, 10:07am

You can use a text field with fielddata turned on. That might work.

randomuser · May 9, 2018, 11:13am

I wasn't sure whether I had tried that, so I did it again. It doesn't produce the desired effect.
I know that using a standard analyser on a text field isn't really an optimal approach, but I didn't see any other way.
Thanks for the effort though.

dadoonet · May 9, 2018, 11:32am

Share what you got and what you want.

randomuser · May 9, 2018, 12:38pm

There are lots of special characters in the words field.
The words field can contain multiple words, and every word has to be a separate "entity" (token).

That's pretty much why I need to use a standard analyser on a text field with fielddata turned on.

The goal is to count the number of occurrences of each word across all of the words fields on the cluster (that's the query above). I gave a lot of consideration to other options and researched a lot, and this seems like the only option for my use case.

On the other hand, do you have any idea how I could change the query above to count every occurrence of each word within the words field and sort the words by that count?
The current query produces a doc_count, which is the number of documents that contain the word, but some documents contain a word multiple times, so it isn't very precise.
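The difference between the two counts can be shown with a small plain-Python sketch. The sample documents are hypothetical and whitespace splitting stands in for the analyser. It only illustrates the distinction and is not an Elasticsearch query.

```python
from collections import Counter

# Hypothetical sample values of the words field (not from the original post)
docs = ["table table chair", "green table", "chair"]

doc_count = Counter()   # number of documents containing the word
term_count = Counter()  # total number of occurrences of the word

for doc in docs:
    tokens = doc.split()
    term_count.update(tokens)      # every occurrence counts
    doc_count.update(set(tokens))  # at most once per document

print(doc_count.most_common())
print(term_count.most_common())
```

Here "table" appears in two documents but three times in total, so sorting by doc_count ties it with "chair", whereas sorting by total occurrences puts it clearly first.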

dadoonet · June 1, 2018, 7:10pm

Could you provide a full recreation script, as described in "About the Elasticsearch category"? It will help us better understand what you are doing. Please try to keep the example as simple as possible.

A full reproduction script will help readers to understand, reproduce and if needed fix your problem. It will also most likely help to get a faster answer.

system · June 29, 2018, 7:10pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
