Stopwords in term aggregation


(knob1) #1

I want to filter stop words out of a term aggregation, so I defined an analyzer with a custom stop word list (around 1000 words) and applied it to my index.

When I run a term aggregation against the index, the stop words still dominate the result.
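
For reference, a request of the kind described here might look like the sketch below. The original query was not posted, so the aggregation name `top_terms`, the bucket count, and the choice of the `Inhalt` field (taken from the mapping posted later in this thread) are all assumptions.

```python
import json


def terms_aggregation_body(field, size=20):
    """Build a search body that returns only the most frequent terms of `field`."""
    return {
        "size": 0,  # skip the hits; only the aggregation buckets are of interest
        "aggs": {
            "top_terms": {  # hypothetical aggregation name
                "terms": {"field": field, "size": size}
            }
        },
    }


body = terms_aggregation_body("Inhalt")
print(json.dumps(body, indent=2))
```

The buckets contain whatever terms were written to the index for that field, so stop words only disappear if the stop filter was active when the documents were indexed.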

I tested the analyzer with

http://192.168.1.37:9200/myindex/_analyze?analyzer=custom_analyzer

The analyzer seems to work as it should, but the stop words still show up in the term aggregation.

Or do I really have to pass the complete 1000-word stop word list in the exclude parameter of the term aggregation on every query? That can't be the way to go...

Did I forget something?!


(Doug Turnbull) #2

Did you reindex after applying the stopwords?

Is this a query analyzer or an index analyzer?


(knob1) #3

I did not specify a query or index analyzer. Should I, and which one should I choose?
I indexed the documents after applying this configuration:

{
  "settings": {
    "analysis": {
      "filter": {
        "my_stop": {
          "type": "stop",
          "name": "german",
          "stopwords_path": ".../stopwords.txt"
        }
      },
      "analyzer": {
        "custom_medaes_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "my_stop",
            "lowercase"
          ]
        }
      }
    },
    "mappings": {
      "text": {
        "properties": {
          "Inhalt": {
            "type": "string",
            "analyzer": "custom_medaes_analyzer"
          }
        }
      }
    }
  }
}

(Ivan Brusic) #4

The aggregation should only work on the data present in the field.

Was the field indexed with that analyzer? Use the API to determine what the
actual mapping is. When using the analyze API, use the field parameter and
not the analyzer param, so that the actual mapped analyzer is used.

Cheers,

Ivan


(knob1) #5

I looked up the mapping with http://192.168.1.37:9200/tester/_mapping
This is what I got:

{
    "tester": {
        "mappings": {
            "text": {
                "properties": {
                    "Dateiname": {
                        "type": "string"
                    },
                    "Größe": {
                        "type": "long"
                    },
                    "Inhalt": {
                        "type": "string"
                    },
                    {
                        "type": "string"
                    }
                }
            }
        }
    }
}

If I understood you correctly, it seems that the "Inhalt" field was not mapped to my analyzer.
But I don't get what you mean by

"use the field parameter and not the analyzer param, so that the actual mapped analyzer is used."

Could you elaborate on this?

Thank you for your answer!


(knob1) #6

OK, I think I found the error.
Thanks for pointing me toward the mapping.
It was just a misplaced "}".
The mappings configuration was placed inside the settings object...

So much stress for such a dumb error...

Thank you!
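
A minimal sketch of the difference, reusing the names from the configuration in post #3. The helper `mappings_misplaced` is hypothetical and not part of any Elasticsearch client; the stop word path is left elided exactly as in the original post. Judging by the mapping output in post #5, the nested "mappings" block was not applied to the index.

```python
import json


def mappings_misplaced(body):
    """Return True if "mappings" is nested inside "settings" instead of beside it."""
    return "mappings" in body.get("settings", {})


analysis = {
    "filter": {
        "my_stop": {
            "type": "stop",
            "name": "german",
            "stopwords_path": ".../stopwords.txt",  # elided in the original post
        }
    },
    "analyzer": {
        "custom_medaes_analyzer": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["my_stop", "lowercase"],
        }
    },
}

mappings = {
    "text": {
        "properties": {
            "Inhalt": {"type": "string", "analyzer": "custom_medaes_analyzer"}
        }
    }
}

# As posted in #3: "mappings" sits inside "settings".
broken = {"settings": {"analysis": analysis, "mappings": mappings}}

# Intended layout: "settings" and "mappings" are siblings at the top level.
fixed = {"settings": {"analysis": analysis}, "mappings": mappings}

print(mappings_misplaced(broken))  # True
print(mappings_misplaced(fixed))   # False
print(json.dumps(fixed, indent=2))
```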


(Ivan Brusic) #7

Which is why I always tell people to use the API to find out what the
mapping is and NOT what they think the mapping is.

When you used the analyze API, you specified the analyzer to use. Instead,
specify the field you want to test, so that the exact analyzer defined in
the mapping is used. Look at the last example:

https://www.elastic.co/guide/en/elasticsearch/reference/1.7/indices-analyze.html

Cheers,

Ivan
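
To make the two variants concrete, here is a small sketch that only builds the request URLs and does not contact a server. It assumes the 1.x analyze API linked above, which accepts `analyzer`, `field`, and `text` as query parameters. The index `tester` and the field `Inhalt` come from post #5; the sample text and the helper `analyze_url` are made up for illustration.

```python
from urllib.parse import urlencode

BASE_URL = "http://192.168.1.37:9200"  # host from the original question
SAMPLE_TEXT = "der die das"            # hypothetical sample input


def analyze_url(base_url, index, text, **selector):
    """Build an _analyze URL; `selector` is either analyzer=... or field=..."""
    query = urlencode({**selector, "text": text})
    return f"{base_url}/{index}/_analyze?{query}"


# Names the analyzer directly, so it works even if no field is mapped to it.
by_analyzer = analyze_url(BASE_URL, "tester", SAMPLE_TEXT,
                          analyzer="custom_medaes_analyzer")

# Uses whatever analyzer the mapping really assigns to the field.
by_field = analyze_url(BASE_URL, "tester", SAMPLE_TEXT, field="Inhalt")

print(by_analyzer)
print(by_field)
```

If the second request still returns the stop words as tokens while the first does not, the field is not using the custom analyzer, which is the situation in this thread.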

