I found this in the documentation: "More-specific filters should be placed before less-specific filters in order to exclude as many documents as possible, as early as possible." But the custom language analyzer example has:
"analyzer": {
  "english": {
    "tokenizer": "standard",
    "filter": [
      "english_possessive_stemmer",
      "lowercase",
      "english_stop",
      "english_keywords",
      "english_stemmer"
    ]
  }
}
Here a stemmer is placed before the stopwords and keywords filters. What is the right order?
A filter is totally different from an analyser though.
What is the order of filters in this analyzer?
Am I right that keyword filtering should be done before stemming?
The sentence you quoted from the documentation refers to query filters that you send in a search request, not to the token filters an analyzer applies at index time.
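To make the distinction concrete, here is a rough sketch of where each kind of "filter" lives. The field names (user_id, created) are made up for illustration, the query uses the bool/filter form from more recent Elasticsearch versions (older versions spelled this differently), and the definitions of the custom english_* token filters are left out for brevity.

```python
import json

# Query filters: part of a search request, they decide which documents match.
# Field names here are hypothetical.
search_body = {
    "query": {
        "bool": {
            "filter": [
                {"term": {"user_id": 42}},                   # more specific
                {"range": {"created": {"gte": "now-1y"}}},   # less specific
            ]
        }
    }
}

# Token filters: part of the index settings, they transform the tokens
# produced by the tokenizer. The english_* filter definitions are omitted.
index_settings = {
    "settings": {
        "analysis": {
            "analyzer": {
                "english": {
                    "tokenizer": "standard",
                    "filter": [
                        "english_possessive_stemmer",
                        "lowercase",
                        "english_stop",
                        "english_keywords",
                        "english_stemmer",
                    ],
                }
            }
        }
    }
}

print(json.dumps(search_body, indent=2))
print(json.dumps(index_settings, indent=2))
```

The "more specific first" advice is about the list in the first structure; the order of the list in the second one is about how tokens get rewritten, which is a separate concern.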
The english_keywords filter lets you specify a list of keywords that shouldn't be stemmed (it is empty by default). It has to run before english_stemmer in order to protect those words, so the order is correct.
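If it helps, here is a toy model of the chain in plain Python. It is not Elasticsearch code: the stopword list, the keyword list and the crude "strip -ing" stemmer are all stand-ins I made up, just to show why the keyword step has to come before the stemmer.

```python
# Toy model of a token filter chain. Each token is (text, is_keyword).
STOPWORDS = {"the", "a", "of"}   # stand-in for english_stop
KEYWORDS = {"running"}           # hypothetical protected words

def possessive_stemmer(tokens):
    # Strip a trailing 's, roughly what english_possessive_stemmer does.
    return [(t[:-2] if t.endswith("'s") else t, kw) for t, kw in tokens]

def lowercase(tokens):
    return [(t.lower(), kw) for t, kw in tokens]

def stop(tokens):
    return [(t, kw) for t, kw in tokens if t not in STOPWORDS]

def keyword_marker(tokens):
    # Flag protected words so that later stemmers leave them alone.
    return [(t, kw or t in KEYWORDS) for t, kw in tokens]

def toy_stemmer(tokens):
    # Crude suffix stripper standing in for a real English stemmer.
    def stem(t):
        return t[:-3] if t.endswith("ing") and len(t) > 5 else t
    return [(t if kw else stem(t), kw) for t, kw in tokens]

def analyze(text, chain):
    tokens = [(t, False) for t in text.split()]  # whitespace "tokenizer"
    for token_filter in chain:
        tokens = token_filter(tokens)
    return [t for t, _ in tokens]

# Same order as the analyzer in the question.
DOC_ORDER = [possessive_stemmer, lowercase, stop, keyword_marker, toy_stemmer]
# Keyword marker moved after the stemmer: the protection arrives too late.
WRONG_ORDER = [possessive_stemmer, lowercase, stop, toy_stemmer, keyword_marker]

print(analyze("The Dog's Running Jumping", DOC_ORDER))
print(analyze("The Dog's Running Jumping", WRONG_ORDER))
```

With the documented order, "running" is flagged before the stemmer sees it and survives intact. With the two steps swapped, the stemmer has already changed the word by the time the keyword list is consulted.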
What about english_possessive_stemmer? It runs before the keywords filter. Is that normal?
BTW, what is the main difference between indexing filters and query filters, in a nutshell?
Yes, it will remove the possessive ('s) from any nouns, and it is very unlikely that your keywords would include possessive nouns.
Thanks. Should I also set "term_vector": "yes" for the fields that use this analyzer?