Multi-word synonyms with tokens between the synonym's parts

Hi,

Given the following synonym configuration:

"želvárium,aquaterrarium for turtles"
(želvárium is a Czech short term for an aquarium for turtles)

Is there a way to match a document with the following title when searching for "želvárium"?

"pacific aquaterrarium 50x25x19cm for water turtles"

With the settings below, I can only match "pacific aquaterrarium for turtles". The problem is the tokens "50x25x19cm" and "water" that sit between "aquaterrarium" and "for turtles".

Settings for this example:

PUT /syntest
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_synonyms": {
            "tokenizer": "standard",
            "filter": [
                "my_synonym_graph"
            ]
          }
        },
        "filter": {
          "my_synonym_graph": {
            "type": "synonym_graph",
            "lenient": true,
            "synonyms": [
              "želvárium,aquaterrarium for turtles"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
         "type": "text",
         "search_analyzer": "my_synonyms"
      }
    }
  }  
}

POST /syntest/_doc
{
  "title": "pacific aquaterrarium 50x25x19cm for water turtles"
}

GET /syntest/_search
{
  "query" : {
    "match": {
      "title": {
        "query": "želvárium",
        "analyzer": "my_synonyms"
      }
    }
  }
}
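
To check what the search analyzer actually produces for the query term, the _analyze API can be run against the same index. This is only a sketch that reuses the index and analyzer names from the settings above:

GET /syntest/_analyze
{
  "analyzer": "my_synonyms",
  "text": "želvárium"
}

The response should list the tokens emitted by the synonym filter together with their positions, which makes it visible that the multi-word side of the rule is expanded into three consecutive tokens.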

It looks like one possible solution is to use match_phrase with the slop parameter.
In this case:

GET /syntest/_search
{
  "query" : {
    "match_phrase": {
      "title": {
        "query": "želvárium",
        "analyzer": "my_synonyms",
        "slop": 2
      }
    }
  }
}

would allow the query to ignore the two tokens [50x25x19cm] and [water] and return the desired document.
I'm not sure if this is the right way to solve it.
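
Another option that may be worth trying: the match query accepts an auto_generate_synonyms_phrase_query parameter, which defaults to true and turns a multi-term synonym into a phrase query. Setting it to false should make the multi-term synonym match as a conjunction of its terms instead, so the tokens in between would no longer matter. A sketch against the same index, not verified on this data:

GET /syntest/_search
{
  "query": {
    "match": {
      "title": {
        "query": "želvárium",
        "analyzer": "my_synonyms",
        "auto_generate_synonyms_phrase_query": false
      }
    }
  }
}

The trade-off is that this is looser than a phrase with slop: any title containing all three terms, in any order and at any distance, would be expected to match.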
