Stopwords in term aggregation

knob1 · November 7, 2015, 3:48pm

I want to filter the stop words from the term aggregation, so I defined an analyzer with a custom stop word list (around 1000 words) and applied it to my index.

If I query the index to do a term aggregation I get a result where the stopwords are dominating the result.
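For reference, a request of this kind might look like the minimal sketch below. The field name `Inhalt` is taken from the mapping posted later in this thread; the aggregation name `top_terms` and the `size` values are made up for illustration.

```python
import json

# Hypothetical request body for a terms aggregation on the "Inhalt" field.
# "top_terms" is an illustrative aggregation name, not taken from the thread.
agg_request = {
    "size": 0,  # return no hits, only the aggregation buckets
    "aggs": {
        "top_terms": {
            "terms": {"field": "Inhalt", "size": 20}
        }
    },
}

print(json.dumps(agg_request, indent=2))
```

The buckets are built from the terms that were actually indexed for the field, so stopwords only disappear from the result if the stop filter was applied at index time.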

I tested the analyzer with

http://192.168.1.37:9200/myindex/_analyze?analyzer=custom_analyzer

And the analyzer seems to work as it should. But in the term aggregation the stopwords are still shown.

Or should I really use the exclude parameter in the term aggregation for the complete 1000 stopword list for every query? This can't be the way to go...

Did I forget something?!

softwaredoug · November 7, 2015, 4:10pm

Did you reindex after applying the stopwords?

Is this a query analyzer or an index analyzer?

knob1 · November 7, 2015, 4:14pm

I did not specify a query or index analyzer. Should I, and which one should I choose?
I indexed after applying the configuration.

{
  "settings": {
    "analysis": {
      "filter": {
        "my_stop": {
          "type": "stop",
          "name": "german",
          "stopwords_path": ".../stopwords.txt"
        }
      },
      "analyzer": {
        "custom_medaes_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "my_stop",
            "lowercase"
          ]
        }
      }
    },
    "mappings": {
      "text": {
        "properties": {
          "Inhalt": {
            "type": "string",
            "analyzer": "custom_medaes_analyzer"
          }
        }
      }
    }
  }
}

Ivan · November 7, 2015, 4:31pm

The aggregation should only work on the data present in the field.

Was the field indexed with that analyzer? Use the API to determine what the
actual mapping is. When using the analyze API, use the field parameter and
not the analyzer param, so that the actual mapped analyzer is used.

Cheers,

Ivan

knob1 · November 7, 2015, 4:44pm

I looked up the mapping with http://192.168.1.37:9200/tester/_mapping
This is what I got:

{
    "tester": {
        "mappings": {
            "text": {
                "properties": {
                    "Dateiname": {
                        "type": "string"
                    },
                    "Größe": {
                        "type": "long"
                    },
                    "Inhalt": {
                        "type": "string"
                    },
                    {
                        "type": "string"
                    }
                }
            }
        }
    }
}

If I understood you correctly, it seems that the "Inhalt" field wasn't analyzed with the analyzer.
But I don't understand what you mean by

"use the field parameter and not the analyzer param, so that the actual mapped analyzer is used."

Could you elaborate on this?

Thank you for your answer!

knob1 · November 7, 2015, 5:08pm

Ok, I think I found the error.
Thanks for pointing me in the direction of the mapping.
It was just a simple "}" error.
The mappings configuration was placed inside of the settings object...
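A minimal sketch of the corrected layout, assuming the index body is built as a Python dict before being sent: `mappings` sits next to `settings` at the top level instead of inside it. The analysis settings are copied unchanged from the post above, including the elided stopwords path; the variable name `index_body` is only illustrative.

```python
import json

# Corrected create-index body: "settings" and "mappings" are siblings.
# Values are copied from the configuration posted earlier in the thread;
# the stopwords path is elided there and is left as-is here.
index_body = {
    "settings": {
        "analysis": {
            "filter": {
                "my_stop": {
                    "type": "stop",
                    "name": "german",
                    "stopwords_path": ".../stopwords.txt",
                }
            },
            "analyzer": {
                "custom_medaes_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["my_stop", "lowercase"],
                }
            },
        }
    },
    "mappings": {
        "text": {
            "properties": {
                "Inhalt": {
                    "type": "string",
                    "analyzer": "custom_medaes_analyzer",
                }
            }
        }
    },
}

print(json.dumps(index_body, indent=2, ensure_ascii=False))
```

With this layout, the `_mapping` output for the index should list the analyzer on the "Inhalt" field, which it did not in the output posted above.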

So much stress for such a dumb error...

Thank You!

Ivan · November 7, 2015, 5:55pm

Which is why I always tell people to use the API to find out what the
mapping is and NOT what they think the mapping is.

When you used the analyze API, you specified the analyzer to use. Instead,
specify the field you want to analyze so that the exact analyzer defined in
the mapping is used. Look at the last example:

https://www.elastic.co/guide/en/elasticsearch/reference/1.7/indices-analyze.html
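As a rough sketch of the difference, the two request URLs can be built like this. The host, index name, field name, and analyzer name are the ones that appear in this thread; the helper `analyze_url` is hypothetical, and the text to analyze would be sent as the request body, as in the linked documentation.

```python
from urllib.parse import urlencode

HOST = "http://192.168.1.37:9200"  # host used earlier in the thread
INDEX = "tester"                   # index whose mapping was inspected


def analyze_url(host, index, **params):
    """Build an _analyze URL (hypothetical helper; it sends nothing)."""
    return f"{host}/{index}/_analyze?{urlencode(params)}"


# Names the analyzer directly: tests the analyzer definition,
# not what the field is actually mapped to.
by_analyzer = analyze_url(HOST, INDEX, analyzer="custom_medaes_analyzer")

# Names the field: uses whatever analyzer the mapping assigns to it.
by_field = analyze_url(HOST, INDEX, field="Inhalt")

print(by_analyzer)
print(by_field)
```

If the field variant still returns stopwords while the analyzer variant removes them, the analyzer is defined but not attached to the field in the mapping.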

Cheers,

Ivan
