I'm using Elasticsearch 7.7 on AWS Elasticsearch Service.
I thought token filters run after the tokenizer, and that the tokenizer does nothing more once the token filters have run (like Tokenizer => Token filters => END).
But it seems the tokenizer runs again after the synonym graph token filter handles synonyms.
I wonder if it works like Tokenizer => Token filters => Tokenizer? => END.
I tested like this:
GET /index/_analyze
{
"tokenizer": "standard",
"filter": [{"type": "synonym_graph", "synonyms":["brown fox => brown fox,black cat"]}],
"text": "brown fox"
}
and got brown, black, fox, cat:
{
"tokens": [
{
"token": "brown",
"start_offset": 0,
"end_offset": 9,
"type": "SYNONYM",
"position": 0
},
{
"token": "black",
"start_offset": 0,
"end_offset": 9,
"type": "SYNONYM",
"position": 0,
"positionLength": 2
},
{
"token": "fox",
"start_offset": 0,
"end_offset": 9,
"type": "SYNONYM",
"position": 1,
"positionLength": 2
},
{
"token": "cat",
"start_offset": 0,
"end_offset": 9,
"type": "SYNONYM",
"position": 2
}
]
}
I expected to get only brown, fox, because the phrase brown fox in the synonym rule does not exist as a single token in the output of the standard tokenizer.
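In case it helps to see each stage separately, the same request can be rerun with "explain": true. As far as I understand the _analyze API, this reports the tokenizer's output and each token filter's output under a detail section. This is only a sketch: I am assuming the AWS-hosted 7.7 domain handles the option the same way as stock Elasticsearch, and I have not included its output here.

```
GET /index/_analyze
{
  "tokenizer": "standard",
  "filter": [{"type": "synonym_graph", "synonyms": ["brown fox => brown fox,black cat"]}],
  "text": "brown fox",
  "explain": true
}
```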
Another example:
{
"tokenizer": "standard",
"filter": [{"type": "synonym_graph", "synonyms":["fox => fox,black cat"]}],
"text": "fox"
}
returned fox, black, cat, but I expected fox, black cat, because I thought the tokenizer does not run after the token filter, so nothing should split black cat into two tokens.
Could you explain the exact order in which the tokenizer and token filters run?