Synonyms as option at query time


(Kåre Jonsson) #1

My process is to create an index offline and then move it to the production environment. Synonyms are easy enough to implement, but the risk is that they produce a lot of extra hits. I want to offer users the option to search with or without synonyms. The first problems I've come across are:

  • A mapping cannot contain multiple document types
  • The same analyzed field would appear in more than one mapping

I obviously do not want to create two indexes for the cases with and without synonyms.

Is there a best practice or a success story for making synonyms optional at search time?


(Luiz Santos) #2

Hi @karejonsson,

You could create two different analyzers (with and without synonyms) and use multi-fields in the mapping:

PUT my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "synonym": {
          "type": "synonym",
          "synonyms": [
            "universe, cosmos"
          ]
        }
      },
      "analyzer": {
        "analyzer_with_synonyms": {
          "tokenizer": "standard",
          "filter": [
            "synonym"
          ]
        },
        "analyzer_without_synonyms": {
          "tokenizer": "standard",
          "filter": []
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "field1": {
          "type": "text",
          "analyzer": "analyzer_without_synonyms",
          "fields": {
            "synonym": {
              "type": "text",
              "analyzer": "analyzer_with_synonyms"
            }
          }
        }
      }
    }
  }
}

When you search in field1 it won't use synonyms:

POST my_index/doc/1
{
  "field1": "universe"
}

GET my_index/_search
{
  "query": {
    "match": {
      "field1": "cosmos"
    }
  }
}

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

And if you search in field1.synonym it will use synonyms:

GET my_index/_search
{
  "query": {
    "match": {
      "field1.synonym": "cosmos"
    }
  }
}

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.46029136,
    "hits": [
      {
        "_index": "my_index",
        "_type": "doc",
        "_id": "1",
        "_score": 0.46029136,
        "_source": {
          "field1": "universe"
        }
      }
    ]
  }
}
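
To offer this as a user option, the application only needs to pick which field name goes into the query. Here is a minimal sketch in Python, assuming the mapping above (the sub-field is named "synonym"); the function name build_match_query and the use_synonyms flag are made up for illustration and are not part of Elasticsearch:

```python
import json


def build_match_query(field, text, use_synonyms=False):
    """Build a match query body for the plain field or its synonym sub-field.

    Hypothetical helper: it assumes the multi-field mapping shown above,
    where the synonym-analyzed variant lives at "<field>.synonym".
    """
    # Pick the sub-field only when the user asked for synonym expansion.
    target = f"{field}.synonym" if use_synonyms else field
    return {"query": {"match": {target: text}}}


if __name__ == "__main__":
    # Print the request body to send to GET my_index/_search.
    body = build_match_query("field1", "cosmos", use_synonyms=True)
    print(json.dumps(body, indent=2))
```

The resulting dictionary can be sent as the body of the same _search requests shown above, so a single index serves both cases.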

Hope it helps.

Cheers,
Luiz Santos


(Kåre Jonsson) #3

Thanks, Luiz.
I implemented it following your example and I am very satisfied with it. The solution I had found on my own used duplicated fields, each with its own analysis, to fit what I knew about Elasticsearch. This is so much better.
Kind regards,
Kåre Jonsson

