Exclude specific terms from term aggregation's buckets list


#1

Hello,

Is there any way to filter the searched term from the results?

I'm using a standard tokeniser. The field "word" usually contains values like "green table", "big table", "yellow tables", so my aggregation puts the terms "table" and "tables" at the top because they're the most frequent.

In this example, I don't want the word "table" in the buckets list results.

By the way, regarding the query_string, what's the most efficient way (performance-wise) to search for values that contain a given word?

GET newindex/_search
    {
      "query": {
        "query_string": {
          "default_field": "word", 
          "query" : "*table*"
        }
      },
      "aggs" : {
          "tables" : {
              "terms" : { 
                "field" : "word.s",
                "size" : 100
              }
          }
      },
      "size" : 0
    }

#2

If anyone has the same problem, the easiest solution is to add the exclude parameter to the terms aggregation:

GET newindex/_search
    {
      "query": {
        "query_string": {
          "default_field": "word", 
          "query" : "*table*"
        }
      },
      "aggs" : {
          "tables" : {
              "terms" : { 
                "field" : "word.s",
                "exclude": ".*table.*",
                "size" : 100
              }
          }
      },
      "size" : 0
    }
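
A note on why the pattern needs the leading and trailing `.*`: the `exclude` regular expression is matched against the whole term, not against a substring of it. The sketch below imitates that behaviour client-side with Python's `re.fullmatch`. It is only an approximation (Python's regex dialect is not Lucene's), and `apply_exclude` and the sample terms are made up for illustration.

```python
import re

def apply_exclude(terms, pattern):
    """Drop every term that the pattern matches in full, as an anchored regex would."""
    regex = re.compile(pattern)
    return [t for t in terms if regex.fullmatch(t) is None]

# Hypothetical tokens produced from "green table", "big table", "yellow tables"
terms = ["table", "tables", "green", "big", "yellow"]

print(apply_exclude(terms, ".*table.*"))  # drops "table" and "tables"
print(apply_exclude(terms, "table"))      # drops only the exact term "table"
```

When only a handful of known terms have to go, `exclude` also accepts an array of exact values, for example `"exclude": ["table", "tables"]`.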

(David Pilato) #3

There is an exclude parameter in aggs.
https://www.elastic.co/guide/en/elasticsearch/reference/6.2/search-aggregations-bucket-terms-aggregation.html#_filtering_values_3

About wildcard performance: prefer ngrams. You'll pay the price at index time instead of query time.


#4

How do I get full-word tokens with ngrams?
With an ngram tokeniser here, the returned tokens would be "tab", "le " etc. I can't aggregate on those, as the buckets wouldn't make sense.
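
To make the concern concrete, here is a rough sketch of what a character n-gram tokeniser emits for one value. It is plain Python, not Elasticsearch's implementation; `char_ngrams` is a hypothetical helper and the gram length of 3 is an assumption.

```python
def char_ngrams(text, min_gram=3, max_gram=3):
    """Return every substring of length min_gram..max_gram, sliding one character at a time."""
    grams = []
    for start in range(len(text)):
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(text):
                grams.append(text[start:start + size])
    return grams

# Each fragment would become its own bucket if the field were aggregated directly.
print(char_ngrams("green table"))
# ['gre', 'ree', 'een', 'en ', 'n t', ' ta', 'tab', 'abl', 'ble']
```

Fragments like these are useful for substring matching, but none of them is a whole word, which is why they make poor aggregation buckets.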


(David Pilato) #5

Use multi-fields: https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-fields.html

The same content can be indexed with ngrams for search (text type) and, with no transformation, as a keyword type on which you can compute aggs.
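
A sketch of what such a mapping could look like, written as a Python dict so it can be dumped to JSON and used as the body of an index-creation request. The analyzer name `word_ngrams`, the tokenizer name, the sub-field name `raw` and the gram sizes are all illustrative assumptions, not something from this thread. The body uses the typeless form of recent versions; on 6.x the `properties` block would need to sit under a mapping type name.

```python
import json

index_body = {
    "settings": {
        "analysis": {
            "tokenizer": {
                # Character n-grams of length 3 to 4, built from letters and digits only
                "word_ngrams_tokenizer": {
                    "type": "ngram",
                    "min_gram": 3,
                    "max_gram": 4,
                    "token_chars": ["letter", "digit"],
                }
            },
            "analyzer": {
                "word_ngrams": {
                    "type": "custom",
                    "tokenizer": "word_ngrams_tokenizer",
                    "filter": ["lowercase"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "word": {
                "type": "text",            # searched through the ngram analyzer
                "analyzer": "word_ngrams",
                "fields": {
                    "raw": {"type": "keyword"}  # untouched value, usable in aggs
                },
            }
        }
    },
}

print(json.dumps(index_body, indent=2))
```

With a mapping along these lines, searches would target `word` and the terms aggregation would target `word.raw`.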


#6

Yep, I went back and forth on that a few days ago. But in the example I gave, a keyword type would produce buckets for "green table" instead of for "green" and "table". Is there a way to get per-word buckets with it?


(David Pilato) #7

You can use a text field with fielddata enabled. That might work.
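
The difference between the two kinds of buckets can be sketched client-side. This is plain Python that only imitates the behaviour: the regex split is a crude stand-in for the standard analyser, and `keyword_buckets` and `token_buckets` are hypothetical helper names.

```python
import re
from collections import Counter

docs = ["green table", "big table", "yellow tables"]

def keyword_buckets(values):
    """One bucket per whole field value, as with a keyword field."""
    return Counter(values)

def token_buckets(values):
    """One bucket per distinct token; each document counts once per token."""
    counts = Counter()
    for value in values:
        tokens = set(re.findall(r"\w+", value.lower()))  # rough approximation of analysis
        counts.update(tokens)
    return counts

print(keyword_buckets(docs))  # buckets such as "green table"
print(token_buckets(docs))    # buckets such as "green" and "table"
```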


#8

I wasn't sure whether I had tried that, so I did it again: it doesn't produce the desired effect.
I know that using a standard analyser on a text field isn't really an optimal approach, but I didn't see any other way.
Thanks for the effort though. :)


(David Pilato) #9

Share what you got and what you want.


#10
  1. There are lots of special characters in the words field
  2. The words field can contain multiple words and every word has to be a separate "entity" (token)

That's pretty much why I need to use a standard analyser on a text field with fielddata turned on.

The goal is to count the number of occurrences of each word across all of the words fields on the cluster (that's the query above). I gave other options a lot of consideration and did a lot of research; this seems like the only option for my use case.

On the other hand, do you have any idea how I could change the query above to count every occurrence of each word within the words field and sort the words by that count?
The current query produces a doc_count, which is the number of documents that contain the word. Some documents contain a word multiple times, so doc_count undercounts the real number of occurrences.
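
The gap between the two numbers is easy to see on a toy data set. The sketch below is plain Python with made-up documents and a crude regex tokeniser; it only illustrates the difference and does not show how to obtain the second count from Elasticsearch.

```python
import re
from collections import Counter

# Hypothetical field values; the first one repeats "table"
docs = ["table table chair", "green table", "chair"]

def tokenize(text):
    """Lowercase and split on non-word characters (rough stand-in for analysis)."""
    return re.findall(r"\w+", text.lower())

doc_count = Counter()          # documents containing the word
total_occurrences = Counter()  # every occurrence of the word

for doc in docs:
    tokens = tokenize(doc)
    doc_count.update(set(tokens))     # at most one hit per document
    total_occurrences.update(tokens)  # repeated words count every time

print(doc_count["table"], total_occurrences["table"])
```

Here "table" appears in two documents but three times overall, so sorting by one count or the other can give a different order.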


(David Pilato) #11

Could you provide a full recreation script as described in "About the Elasticsearch category"? It will help to better understand what you are doing. Please try to keep the example as simple as possible.

A full reproduction script will help readers to understand, reproduce and if needed fix your problem. It will also most likely help to get a faster answer.

