Hi,
I've run into an issue with the Elasticsearch shingle filter. When I combine the shingle filter with a stop-word filter, the output_unigrams option, which I set to false, no longer seems to be taken into account.
This is the configuration of the analyzer I'm using:
"shingle_french": {
    "tokenizer": "standard",
    "filter": ["standard", "lowercase", "french_stop", "filter_shingle"]
},
And the filters:
"filter_shingle": {
    "type": "shingle",
    "max_shingle_size": 5,
    "min_shingle_size": 2,
    "output_unigrams": false,
    "filler_token": "",
    "output_unigrams_if_no_shingles": true
},
"french_stop": {
    "type": "stop",
    "stopwords": "_french_"
}
When I analyse the text "porte de garage" with this analyzer, I get the following tokens:
porte
start: 0 end: 9 pos: 1
porte garage
start: 0 end: 15 pos: 1
garage
start: 9 end: 15 pos: 2
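In case it helps to reproduce this, a request body along these lines, sent to the _analyze endpoint of the index (for example POST /my_index/_analyze), should list the tokens the analyzer produces. This is only a sketch: my_index is a placeholder index name, and the JSON-body form is an assumption that depends on the Elasticsearch version (older releases take the analyzer and text as URL parameters instead).

```json
{
  "analyzer": "shingle_french",
  "text": "porte de garage"
}
```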
However, I would like to get only one token: "porte garage".
What am I doing wrong?
Thank you in advance.