Trouble with multiple synonym filters in a single analyzer


#1

While experimenting with different schemes for implementing synonyms, I occasionally get a 400 response when creating the index, with the message 'failed to build synonyms'.

Below are settings that replicate the issue:

{
  "settings" : {
    "index" : {
      "analysis" : {
        "analyzer" : {
          "custom_analyzer" : {
            "tokenizer" : "whitespace",
            "filter" : [
              "syn_filter_1",
              "syn_filter_2"
            ],
            "type" : "custom"
          }
        },
        "filter" : {
          "syn_filter_1" : {
            "type" : "synonym",
            "synonyms" : [
                "foo => foo, bar"
            ]
          },
          "syn_filter_2": {
            "type": "synonym",
            "synonyms": [
                "baz => foo"
            ]
          }
        }
      }
    }
  }
}
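For anyone who wants to reproduce this from a script, here is a minimal sketch that builds the same settings and sends them to a node. The base URL, the index name, and the helper names `build_settings` and `create_index` are assumptions for illustration only; the analyzer and filter names match the settings above.

```python
import json
import urllib.error
import urllib.request


def build_settings(synonyms_1, synonyms_2):
    """Build the index settings shown above, with two chained synonym filters."""
    return {
        "settings": {
            "index": {
                "analysis": {
                    "analyzer": {
                        "custom_analyzer": {
                            "type": "custom",
                            "tokenizer": "whitespace",
                            "filter": ["syn_filter_1", "syn_filter_2"],
                        }
                    },
                    "filter": {
                        "syn_filter_1": {"type": "synonym", "synonyms": synonyms_1},
                        "syn_filter_2": {"type": "synonym", "synonyms": synonyms_2},
                    },
                }
            }
        }
    }


def create_index(base_url, index_name, body):
    """PUT the settings to the node; return (status code, parsed JSON response)."""
    request = urllib.request.Request(
        f"{base_url}/{index_name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, json.loads(response.read())
    except urllib.error.HTTPError as err:
        # A 400 lands here; the error body carries the failure message.
        return err.code, json.loads(err.read())


# Hypothetical usage against a local node (URL and index name are assumptions):
# status, reply = create_index(
#     "http://localhost:9200",
#     "synonym_test",
#     build_settings(["foo => foo, bar"], ["baz => foo"]),
# )
```

Keeping the settings in a function makes it easy to swap in different rule lists while narrowing down which combination triggers the failure.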

I have a few questions:

  1. What's wrong with the synonym graph that this generates?
  2. Are there any general principles I need to follow to avoid errors like this? I imagine issues like this could be difficult to debug in larger files.
  3. Is there an easy way to get a more detailed summary of why synonyms would fail to build, or at least which lines of text are problematic?
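On question 3, failing a better error message, the only approach I can think of is a brute-force scan: add rules one at a time and note the first one that makes the build fail. Below is a rough sketch of that idea. The function `first_failing_rule` and the `builds_ok` callback are hypothetical; in practice `builds_ok` would have to try creating a throwaway index with the given rules and report whether that succeeded.

```python
def first_failing_rule(rules, builds_ok):
    """Return the index of the first rule whose addition breaks the build.

    `builds_ok` is a caller-supplied function (an assumption of this sketch)
    that takes a list of rules and returns True if they build successfully.
    Returns None when every prefix of `rules` builds.
    """
    for end in range(1, len(rules) + 1):
        if not builds_ok(rules[:end]):
            return end - 1
    return None


# Stand-in check for demonstration only: pretend any rule mentioning "foo"
# on the right-hand side fails to build.
def fake_builds_ok(rules):
    return all("foo" not in rule.split("=>")[1] for rule in rules)


rules = ["x => y", "baz => foo", "q => r"]
print(first_failing_rule(rules, fake_builds_ok))  # prints 1
```

This costs one build attempt per rule, so for large synonym files a bisection over the prefix length would be the obvious refinement.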

#2

Here's another example that leads me to believe synonyms are not quite working as I expect:

{
  "settings" : {
    "index" : {
      "analysis" : {
        "analyzer" : {
          "custom_analyzer" : {
            "tokenizer" : "whitespace",
            "filter" : [
              "syn_filter_1",
              "syn_filter_2"
            ],
            "type" : "custom"
          }
        },
        "filter" : {
          "syn_filter_1" : {
            "type" : "synonym",
            "synonyms" : [
                "a => b"
            ]
          },
          "syn_filter_2": {
            "type": "synonym",
            "synonyms": [
                "a => c"
            ]
          }
        }
      }
    }
  }
}

If I send the following request to the _analyze endpoint:

{ "analyzer" : "custom_analyzer", "text" : "b" }

I get this response:

{
  "tokens": [
    {
      "token": "c",
      "start_offset": 0,
      "end_offset": 1,
      "type": "SYNONYM",
      "position": 0
    }
  ]
}

Where does this mapping from "b" to "c" come from? Neither filter has a rule with "b" on its left-hand side.
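One hypothesis that would account for both examples: the synonym filter documentation says that the token filters preceding a synonym filter in the chain are used to parse its rules. If so, the rules of syn_filter_2 are themselves run through syn_filter_1 before being stored. The toy model below is only an illustration of that idea, not Elasticsearch's implementation; all function names are made up, and it handles only single-token explicit rules of the form "x => y, z".

```python
def parse_rule(rule):
    """Split an explicit rule like 'foo => foo, bar' into (lhs, [rhs tokens])."""
    lhs, rhs = rule.split("=>")
    return lhs.strip(), [tok.strip() for tok in rhs.split(",")]


def apply_chain(token, filters):
    """Run one token through a list of toy filters (dicts of token -> tokens)."""
    tokens = [token]
    for mapping in filters:
        tokens = [out for tok in tokens for out in mapping.get(tok, [tok])]
    return tokens


def build_filter(rules, preceding):
    """Build a toy synonym filter, analyzing each rule term with `preceding`."""
    mapping = {}
    for rule in rules:
        lhs, rhs = parse_rule(rule)
        analyzed = []
        for term in [lhs] + rhs:
            tokens = apply_chain(term, preceding)
            if len(tokens) != 1:
                # Assumption: a rule term may not expand to several stacked tokens.
                raise ValueError(f"failed to build synonyms: {term!r} -> {tokens}")
            analyzed.append(tokens[0])
        mapping[analyzed[0]] = analyzed[1:]
    return mapping


def analyze(text, filters):
    """Whitespace-tokenize `text` and run each token through the filter chain."""
    return [out for tok in text.split() for out in apply_chain(tok, filters)]


# Example #2: "a => c" is stored as "b => c", because syn_filter_1 rewrites "a".
syn_1 = build_filter(["a => b"], [])
syn_2 = build_filter(["a => c"], [syn_1])
print(analyze("b", [syn_1, syn_2]))  # prints ['c']

# Example #1: the "foo" in "baz => foo" expands to two tokens, so the build fails.
syn_1 = build_filter(["foo => foo, bar"], [])
try:
    build_filter(["baz => foo"], [syn_1])
except ValueError as err:
    print(err)
```

Under this model, example #2 yields "c" for the input "b", and example #1 fails while building the second filter, which matches what I observe. If the hypothesis holds, one general principle would be to keep all rules in a single synonym filter, or to make sure earlier filters never rewrite terms that appear in later filters' rules.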

