Filter order


#1

I found this in the documentation: "More-specific filters should be placed before less-specific filters in order to exclude as many documents as possible, as early as possible." But the custom language analyzer example has:

"analyzer": {
  "english": {
    "tokenizer": "standard",
    "filter": [
      "english_possessive_stemmer",
      "lowercase",
      "english_stop",
      "english_keywords",
      "english_stemmer"
    ]
  }
}

where a stemmer is placed before the stop words and keywords filters. What is the right way?


(Mark Walkom) #2

A filter is totally different from an analyser though.


#3

What is the order of filters in this analyzer?


#4

Am I right that keyword filtering should be done before stemming?


(Dan Tuffery) #5

The sentence you quoted refers to query filters that you send in a search request, not to the token filters that an analyzer applies at index time.
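
For illustration, here is a rough sketch of what "filter order" means on the query side. It is a search request body with two filter clauses, listed with the more selective one first, as the quoted advice suggests. The field names `example_status` and `example_publish_date` and their values are made up for this example, and the `bool` query with a `filter` array is the syntax of recent Elasticsearch versions; older releases expressed the same idea with a different wrapper.

```json
{
  "query": {
    "bool": {
      "filter": [
        { "term":  { "example_status": "published" } },
        { "range": { "example_publish_date": { "gte": "2015-01-01" } } }
      ]
    }
  }
}
```

Nothing in this request touches analysis: the clauses only decide which documents match, whereas the token filters in an analyzer decide how text is turned into terms.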

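The english_keywords filter is used if you want to specify a list of keywords that shouldn't be stemmed (it is empty by default). It has to run before english_stemmer so that those words are already marked when the stemmer sees them, so the order is correct.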
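
As a sketch of how the pieces fit together, the index settings below define the four custom filters the analyzer refers to. The filter types (`stop`, `keyword_marker`, `stemmer`) are standard Elasticsearch token filters; the analyzer name `rebuilt_english` is arbitrary, and the single keyword `skies` is a hypothetical entry added only to show where protected words would go.

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "english_possessive_stemmer": {
          "type": "stemmer",
          "language": "possessive_english"
        },
        "english_stop": {
          "type": "stop",
          "stopwords": "_english_"
        },
        "english_keywords": {
          "type": "keyword_marker",
          "keywords": ["skies"]
        },
        "english_stemmer": {
          "type": "stemmer",
          "language": "english"
        }
      },
      "analyzer": {
        "rebuilt_english": {
          "tokenizer": "standard",
          "filter": [
            "english_possessive_stemmer",
            "lowercase",
            "english_stop",
            "english_keywords",
            "english_stemmer"
          ]
        }
      }
    }
  }
}
```

Token filters run in the order listed, so `english_keywords` marks its words just before `english_stemmer` runs, and the stemmer skips anything that has been marked.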


#6

What about english_possessive_stemmer? It runs before the keywords filter. Is that normal?


#7

BTW, what is the main difference between indexing filters and query filters, in a nutshell?


(Dan Tuffery) #8

Yes, it will remove the possessive ('s) from any nouns. It is very unlikely that a keywords list would include possessive nouns.
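
One way to check this yourself is the `_analyze` API, which accepts an ad hoc tokenizer and filter chain. The body below is a minimal sketch, assuming a reasonably recent Elasticsearch version that allows inline filter definitions in `_analyze`; the sample text is made up.

```json
{
  "tokenizer": "standard",
  "filter": [
    { "type": "stemmer", "language": "possessive_english" },
    "lowercase"
  ],
  "text": "The fox's tail"
}
```

Sent to the `_analyze` endpoint, this should show `fox's` coming out as `fox`, before any keyword marking or full stemming would take place.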


#9

Thanks. Should I also set "term_vector": "yes" for the fields that use this analyzer?

