Taxonomy in ES 7.10 using a search-time analyzer

Hi all!

Since I've been struggling with this for quite a while now, I decided to try my luck here ;)

I'm working on a taxonomy search using managed vocabularies, as described in the OpenSource Connections article "Patterns for Elasticsearch Synonyms: Taxonomies and Managed Vocabularies".

Some background on the problem I want to solve:

  • mapping of the documents:
"synonyms-test" : {
    "mappings" : {
      "properties" : {
        "name" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          },
          "analyzer" : "default",
          "search_analyzer" : "synonym_operator"
        }
      }
    }
  }
  • documents in index:
{ "name": "machineoperator" }
{ "name": "woodoperator" }
{ "name": "welder" }
{ "name": "worker" }
  • synonyms file:
machineoperator => machineoperator, operator, worker 
woodoperator => woodoperator, operator, worker 
welder => welder, handyman, worker
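To make the "=>" semantics concrete, here's a minimal Python sketch (a toy, not Elasticsearch code) of how the explicit mappings in this file read: the token on the left is replaced by every term on the right, and tokens without a rule pass through unchanged. The names `parse_explicit_rules` and `expand` are hypothetical; the parser assumes single-word entries and simply skips comment lines and lines without "=>" (such as comma-only equivalence rules).

```python
def parse_explicit_rules(text: str) -> dict[str, list[str]]:
    """Parse explicit mappings ("lhs1, lhs2 => rhs1, rhs2") into a dict.

    Simplified: assumes single-word terms, skips comment lines and
    lines without "=>" (e.g. comma-only equivalence rules).
    """
    rules: dict[str, list[str]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=>" not in line:
            continue
        lhs, rhs = line.split("=>", 1)
        targets = [t.strip() for t in rhs.split(",") if t.strip()]
        for source in (s.strip() for s in lhs.split(",")):
            if source:
                rules.setdefault(source, []).extend(targets)
    return rules


SYNONYMS = """\
machineoperator => machineoperator, operator, worker
woodoperator => woodoperator, operator, worker
welder => welder, handyman, worker
"""

RULES = parse_explicit_rules(SYNONYMS)


def expand(token: str) -> list[str]:
    # With "=>" the left-hand token is replaced by the right-hand list;
    # tokens without a rule pass through unchanged.
    return RULES.get(token, [token])


print(expand("woodoperator"))  # ['woodoperator', 'operator', 'worker']
print(expand("worker"))        # ['worker']
```

Note that "worker" has no rule of its own, so it never expands upward or sideways; only the more specific terms point to their ancestors.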

The synonyms file is located on an AWS server and loaded into a custom analyzer as follows:

PUT synonyms-test/_settings
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "synonym_operator": {
            "tokenizer": "whitespace",
            "filter": ["lowercase","my_synonyms"]
          }
        },
        "filter": {
          "my_synonyms": {
            "type": "synonym",
            "synonyms_path": "analyzers/F261139661",
            "updateable": true
          }
        }
      }
    }
  }
}
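As a rough mental model of the `synonym_operator` analyzer above, the following Python toy (an assumption, not how Lucene implements it) chains the three steps in order: whitespace tokenizer, lowercase filter, synonym filter. Each synonym is emitted at the same position as the token it replaces, which is how single-word synonyms stack in Elasticsearch. `RULES` is a hand-copied version of the synonyms file, and the function name simply mirrors the analyzer name.

```python
# Hand-copied version of the synonyms file (explicit "=>" mappings).
RULES = {
    "machineoperator": ["machineoperator", "operator", "worker"],
    "woodoperator": ["woodoperator", "operator", "worker"],
    "welder": ["welder", "handyman", "worker"],
}


def synonym_operator(text: str) -> list[tuple[int, str]]:
    """Toy model of the analyzer chain: whitespace -> lowercase -> synonyms.

    Returns (position, term) pairs; synonyms share the position of the
    token they replace.
    """
    out: list[tuple[int, str]] = []
    for pos, token in enumerate(text.split()):   # "whitespace" tokenizer
        token = token.lower()                     # "lowercase" filter
        for term in RULES.get(token, [token]):    # "my_synonyms" filter
            out.append((pos, term))
    return out


print(synonym_operator("WoodOperator welder"))
# [(0, 'woodoperator'), (0, 'operator'), (0, 'worker'),
#  (1, 'welder'), (1, 'handyman'), (1, 'worker')]
```

The authoritative check is running `GET synonyms-test/_analyze` with `"analyzer": "synonym_operator"` and some sample text against the real index.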

I want to order search results using this taxonomy: a woodoperator is an operator, and an operator is a worker. So if I search for "woodoperator", the "woodoperator" document should be pushed to the top, with "machineoperator" in second place (since a machine operator is also an operator and a worker), followed by "welder" and "worker" in third and fourth place. However, when I run this query:

GET synonyms-test/_search 
{
  "query": {
    "bool": {
      "must": [
        {"match": {
          "name" : {
            "query" : "woodoperator", 
            "analyzer" : "synonym_operator"
          }
        }}
      ], 
      "should": [
        {"match": {
          "name": "woodoperator"
        }}
      ]
    }
  }
}

I get the following output:

"hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.3862942,
    "hits" : [
      {
        "_index" : "synonyms-test",
        "_type" : "_doc",
        "_id" : "9FmnH4EB7ewmZOJ5D3O9",
        "_score" : 1.3862942,
        "_source" : {
          "name" : "worker"
        }
      },
      {
        "_index" : "synonyms-test",
        "_type" : "_doc",
        "_id" : "8VmFH4EB7ewmZOJ5VnOF",
        "_score" : 0.5753642,
        "_source" : {
          "name" : "woodoperator"
        }
      }
    ]
  }

That is not the output I expected: only two of the four documents match, and "worker" scores higher than "woodoperator".

My reasoning was as follows:

  • the search string "woodoperator" gets expanded into "woodoperator", "operator", "worker"
  • the values in the "name" field get expanded into their synonyms at runtime, e.g. "machineoperator" becomes "machineoperator", "operator", "worker"
  • the query tries to match as many terms as possible between these two expansions
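The reasoning above can be checked with a toy Python model that scores each document by how many terms it shares with the expanded query. This is a simplification, not Elasticsearch's BM25 scoring, and `rank`/`expand_docs` are hypothetical names. It compares two cases: expansion on both sides (the reasoning above) and expansion on the query side only, which is what the mapping's index analyzer ("default", which has no synonym filter) would give.

```python
RULES = {
    "machineoperator": ["machineoperator", "operator", "worker"],
    "woodoperator": ["woodoperator", "operator", "worker"],
    "welder": ["welder", "handyman", "worker"],
}
DOCS = ["machineoperator", "woodoperator", "welder", "worker"]


def expand(term: str) -> set[str]:
    """Explicit "=>" mapping; terms without a rule pass through unchanged."""
    return set(RULES.get(term, [term]))


def rank(query: str, expand_docs: bool) -> list[tuple[str, int]]:
    """Rank docs by number of shared terms (toy relevance, not BM25)."""
    q_terms = expand(query)
    scored = []
    for doc in DOCS:
        d_terms = expand(doc) if expand_docs else {doc}
        overlap = len(q_terms & d_terms)
        if overlap:                      # non-matching docs are not returned
            scored.append((doc, overlap))
    # sorted() is stable, so ties keep the original document order
    return sorted(scored, key=lambda pair: -pair[1])


# Expansion on both sides (the reasoning above):
print(rank("woodoperator", expand_docs=True))
# [('woodoperator', 3), ('machineoperator', 2), ('welder', 1), ('worker', 1)]

# Expansion on the query side only:
print(rank("woodoperator", expand_docs=False))
# [('woodoperator', 1), ('worker', 1)]
```

The toy only predicts which documents match and a rough order; the actual scores from Elasticsearch also depend on term statistics and on the extra `should` clause.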

In the article, the synonym list is hard-coded into the analyzer. Since my synonym list will change over time, I had to load it from a text file in an updateable filter. It's also not possible to set the index-time analyzer of the "name" or "name.keyword" field to "synonym_operator", since (as far as I understand) that would require reindexing all documents whenever the synonyms file changes.
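The reindexing concern can be illustrated with a small Python toy (hypothetical names, not Elasticsearch internals): terms expanded at index time are stored once, so a changed rule only reaches existing documents after they are rebuilt, while query-time expansion always uses the current rules. The extra "metalworker" synonym is invented purely for this example.

```python
def expand(term: str, rules: dict[str, list[str]]) -> set[str]:
    """Explicit "=>" mapping; terms without a rule pass through unchanged."""
    return set(rules.get(term, [term]))


def build_index(docs: list[str], rules: dict[str, list[str]]) -> dict[str, set[str]]:
    """Index-time expansion: the expanded terms are stored once, at write time."""
    return {doc: expand(doc, rules) for doc in docs}


def search(index: dict[str, set[str]], query: str, rules: dict[str, list[str]]) -> list[str]:
    """Query-time expansion always uses whatever rules are current."""
    q_terms = expand(query, rules)
    return [doc for doc, terms in index.items() if q_terms & terms]


DOCS = ["welder", "worker"]
rules_v1 = {"welder": ["welder", "handyman", "worker"]}
# Hypothetical update to the synonyms file: a welder is now also a "metalworker".
rules_v2 = {"welder": ["welder", "handyman", "worker", "metalworker"]}

index = build_index(DOCS, rules_v1)
print(search(index, "metalworker", rules_v2))  # [] : stored terms are stale
index = build_index(DOCS, rules_v2)             # "reindex" with the new rules
print(search(index, "metalworker", rules_v2))  # ['welder']
```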

So my question really is: is there a way to get these taxonomic results without setting the field's index-time analyzer in the mapping to "synonym_operator", or am I doing something else wrong?

Edit: I've also noticed that the
POST /synonyms-test/_reload_search_analyzer
request does not work for my index. I assume this means that "synonym_operator" is not loaded into the index properly, which may be another cause of the failure.

I hope my question is described clearly; it's a lot to get your head around. Thanks!!
