I have a field where multiple values are stored:

`field: ["testOneTwo", "testThreeFour"]`

I would like to analyze this field with an edge_ngram filter, but also remove duplicate tokens. I tried both the `unique` and the `remove_duplicates` filter.
Example settings:

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "edgengram_filter": {
          "type": "edgeNGram",
          "min_gram": 1,
          "max_gram": 24
        }
      },
      "tokenizer": {
        "edgengram": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 24,
          "token_chars": ["letter", "digit"]
        }
      },
      "analyzer": {
        "testunique": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["edgengram_filter", "unique"]
        },
        "testremove": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["edgengram_filter", "remove_duplicates"]
        },
        "testedgeunique": {
          "tokenizer": "edgengram",
          "filter": ["unique"]
        },
        "testedgeremove": {
          "tokenizer": "edgengram",
          "filter": ["remove_duplicates"]
        }
      }
    }
  }
}
```
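For reference, the behavior can be reproduced with an `_analyze` request like the one below (the index name `my-index` is just a placeholder; `text` accepts an array to simulate a multi-valued field):

```json
GET my-index/_analyze
{
  "analyzer": "testunique",
  "text": ["testOneTwo", "testThreeFour"]
}
```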
For each of these analyzers, the `_analyze` API returns the tokens `t`, `te`, `tes`, and `test` twice.
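My understanding of why this happens, sketched as a small simulation (not the actual Lucene implementation; `split()` stands in for the standard tokenizer, and `unique` mimics the filter's default stream-wide deduplication): each value of a multi-valued field seems to be analyzed as its own token stream, so the deduplication state resets between values, while a single string is one stream.

```python
def edge_ngrams(token, min_gram=1, max_gram=24):
    # Emit all prefixes of the token between min_gram and max_gram characters.
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]

def analyze_value(value):
    # Rough stand-in for: standard tokenizer + edge_ngram token filter.
    tokens = []
    for word in value.split():
        tokens.extend(edge_ngrams(word))
    return tokens

def unique(tokens):
    # Stream-wide dedup, keeping the first occurrence of each token.
    seen, out = set(), []
    for t in tokens:
        if t not in seen:
            seen.add(t)
            out.append(t)
    return out

# Multi-valued field: each value is its own stream, so the filter state
# resets and shared prefixes survive once per value.
flat = [t for v in ["testOneTwo", "testThreeFour"] for t in unique(analyze_value(v))]
print(flat.count("test"))  # 2

# Single string: one stream, so cross-word duplicates are removed.
single = unique(analyze_value("testOneTwo testThreeFour"))
print(single.count("test"))  # 1
```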
When the field stores both values in a single string, e.g. `"testOneTwo testThreeFour"`, the duplicates are removed as expected. But that is not a solution for me, because I use `copy_to` (which produces a multi-valued field) and apply `edge_ngram` as a token filter. Is there any way to enforce deduplication across the values? Thanks!