Does the tokenizer run again after the synonym (graph) token filter?

I'm using Elasticsearch 7.7 on AWS Elasticsearch Service.

I thought token filters run after the tokenizer, and that the tokenizer does nothing once the token filters have run
(i.e. Tokenizer => Token filters => END).
But it seems the tokenizer runs again after the synonym graph token filter handles synonyms.
I wonder if it works like Tokenizer => Token filters => Tokenizer? => END.

I tested like this:

GET /index/_analyze
{
  "tokenizer": "standard",
  "filter": [{"type": "synonym_graph", "synonyms":["brown fox => brown fox,black cat"]}], 
  "text": "brown fox"
}

and got brown, black, fox, cat.

{
    "tokens": [
        {
            "token": "brown",
            "start_offset": 0,
            "end_offset": 9,
            "type": "SYNONYM",
            "position": 0
        },
        {
            "token": "black",
            "start_offset": 0,
            "end_offset": 9,
            "type": "SYNONYM",
            "position": 0,
            "positionLength": 2
        },
        {
            "token": "fox",
            "start_offset": 0,
            "end_offset": 9,
            "type": "SYNONYM",
            "position": 1,
            "positionLength": 2
        },
        {
            "token": "cat",
            "start_offset": 0,
            "end_offset": 9,
            "type": "SYNONYM",
            "position": 2
        }
    ]
}

I expected to get only brown and fox, because the phrase brown fox from the synonym rule does not exist as a token in the output of the standard tokenizer.

Another example:

GET /index/_analyze
{
  "tokenizer": "standard",
  "filter": [{"type": "synonym_graph", "synonyms":["fox => fox,black cat"]}], 
  "text": "fox"
}

returned fox, black, cat, but I expected fox, black cat, because I thought the tokenizer would not run after the token filter to split black cat.

Could you explain the exact order in which tokenizers and token filters run?

Hi taichi,
A text analyzer operates in sequence. It has (in order):

  • 0 or more character filters
  • exactly one tokenizer
  • 0 or more token filters, applied in order

In your case, the synonym graph token filter is the last operation.
I suggest reading about token graphs to better understand how token filters work: Token graphs | Elasticsearch Guide [7.14] | Elastic
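
To make the token graph idea concrete, here is a small Python sketch that reads the position and positionLength values from the _analyze response in the question and lists the paths through the graph. It is only an illustration: the function name graph_paths is made up, and it assumes positionLength is 1 wherever the response omits it.

```python
# Tokens copied from the _analyze response in the question, as
# (term, position, positionLength). positionLength is assumed to be 1
# where the response does not include it.
TOKENS = [
    ("brown", 0, 1),
    ("black", 0, 2),
    ("fox", 1, 2),
    ("cat", 2, 1),
]


def graph_paths(tokens):
    """Enumerate every path from position 0 to the end of the token graph."""
    end = max(pos + length for _, pos, length in tokens)
    paths = []

    def walk(node, path):
        if node == end:
            paths.append(" ".join(path))
            return
        # Follow every token that starts at the current position.
        for term, pos, length in tokens:
            if pos == node:
                walk(pos + length, path + [term])

    walk(0, [])
    return paths


print(graph_paths(TOKENS))  # ['brown fox', 'black cat']
```

Read this way, the four tokens are not a flat list of brown, black, fox, cat. They describe two alternative phrases that span the same part of the input.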

Furthermore, the synonyms in synonym_graph are tokenized and analyzed with the chain that precedes this filter. That explains why the "brown fox" synonym is applied. See:
Synonym graph token filter | Elasticsearch Guide [7.14] | Elastic
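
A toy model in Python may help picture this. It is a rough sketch, not how Elasticsearch or Lucene implement it: standard_like_tokenizer is a crude stand-in for the standard tokenizer, and parse_rule and analyze are hypothetical helper names. The point it tries to show is that the tokenizer runs once on the input text, while the synonym rule is split into tokens when the rule is parsed, so the filter compares token sequences with token sequences.

```python
import re


def standard_like_tokenizer(text):
    # Crude stand-in for the "standard" tokenizer: split on non-word characters.
    return re.findall(r"\w+", text)


def parse_rule(rule, tokenizer):
    """Parse 'lhs => alt1,alt2' and run the tokenizer over each side."""
    lhs, rhs = rule.split("=>")
    return tuple(tokenizer(lhs)), [tokenizer(alt) for alt in rhs.split(",")]


def analyze(text, rule, tokenizer):
    """Tokenize the text once, then replace matching token runs with alternatives."""
    tokens = tokenizer(text)
    lhs, alternatives = parse_rule(rule, tokenizer)
    out, i = [], 0
    while i < len(tokens):
        if lhs and tuple(tokens[i:i + len(lhs)]) == lhs:
            # The matched run is replaced by all alternatives (already tokenized).
            out.append(alternatives)
            i += len(lhs)
        else:
            out.append([[tokens[i]]])
            i += 1
    return out


print(analyze("brown fox", "brown fox => brown fox,black cat", standard_like_tokenizer))
# [[['brown', 'fox'], ['black', 'cat']]]
print(analyze("fox", "fox => fox,black cat", standard_like_tokenizer))
# [[['fox'], ['black', 'cat']]]
```

In this model the input text is never tokenized a second time. "black cat" comes out as two tokens because the rule itself was split when it was parsed.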

Thank you!

Furthermore, the synonyms in synonym_graph are tokenized and analyzed with the chain which precedes this filter

I mostly understand that, but I'm still a little confused. The docs say:

Elasticsearch will use the token filters preceding the synonym filter in a tokenizer chain to parse the entries in a synonym file

It only says that it applies the preceding token filters. Does it apply the preceding tokenizer to the synonym entries too?
In other words, I wonder whether the "standard" tokenizer is applied to the synonym entries (like brown fox and black cat).