I want to use a word_delimiter_graph token filter to avoid splitting terms containing "+" and "-", e.g. so that "3+c" stays as ["3+c"] instead of becoming ["3","c"].
I also want to define a list of synonyms to be applied to this field after the delimiter filter runs.
The documentation says:

> If you need to build analyzers that include both multi-token filters and synonym filters, consider using the multiplexer filter, with the multi-token filters in one branch and the synonym filter in the other.
but I don't see how my delimiter can generate multiple tokens, since catenate_all, catenate_numbers, catenate_words, and preserve_original are all false.
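For what it's worth, this is how I would check the delimiter on its own with the _analyze API, using the exact filter definition from my settings (if I understand the type_table correctly, "3+c" should come out as a single token):

```json
POST _analyze
{
  "tokenizer": "keyword",
  "filter": [
    {
      "type": "word_delimiter_graph",
      "type_table": ["+ => ALPHA", "- => ALPHA"],
      "split_on_case_change": false,
      "split_on_numerics": false,
      "stem_english_possessive": false
    }
  ],
  "text": "3+c"
}
```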
But when I tried to create the index without the multiplexer:
```json
"my_analyzer": {
  "type": "custom",
  "tokenizer": "keyword",
  "filter": ["my_delimiter", "my_synonyms"]
}
```
it gives me this error:
Token filter [my_delimiter] cannot be used to parse synonyms
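For completeness, the full request that produced that error looked essentially like this (the index name my_index is just a placeholder):

```json
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["my_delimiter", "my_synonyms"]
        }
      },
      "filter": {
        "my_synonyms": {
          "type": "synonym_graph",
          "synonyms_path": "analysis/synonym.txt"
        },
        "my_delimiter": {
          "type": "word_delimiter_graph",
          "type_table": ["+ => ALPHA", "- => ALPHA"],
          "split_on_case_change": false,
          "split_on_numerics": false,
          "stem_english_possessive": false
        }
      }
    }
  }
}
```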
And when I tried with multiplexer:
```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["my_multiplexer"]
        }
      },
      "filter": {
        "my_multiplexer": {
          "type": "multiplexer",
          "filters": ["my_delimiter", "my_synonyms"]
        },
        "my_synonyms": {
          "type": "synonym_graph",
          "synonyms_path": "analysis/synonym.txt"
        },
        "my_delimiter": {
          "type": "word_delimiter_graph",
          "type_table": ["+ => ALPHA", "- => ALPHA"],
          "split_on_case_change": false,
          "split_on_numerics": false,
          "stem_english_possessive": false
        }
      }
    }
  }
}
```
it still fails, but now with "Increment must be zero or greater: -1".
I also tried to use flatten_graph without success.
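In case the placement matters, I appended flatten_graph after the multiplexer in the analyzer chain, like this (I'm not sure this is the right position for it):

```json
"my_analyzer": {
  "type": "custom",
  "tokenizer": "keyword",
  "filter": ["my_multiplexer", "flatten_graph"]
}
```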
What am I doing wrong? Is my use of the multiplexer filter correct?
Thanks in advance