Synonyms Exact match Multiword Search

Hi Team,

We are looking for a solution to search synonyms with an exact match.

For example, the user searches for the string: abc xyz

abc has synonyms - abc1 abc2
xyz has synonyms - xyz1 xyz2

Data in Index
Article1 - test abc1 abc2
Article2 - test abc1
Article3 - test xyz1 xyz2
Article4 - test xyz1
Article4 - test xyz2
Article5 - test xyz1 data xyz2
Article6 - test abc1 data abc2
Article7 - abc abc1

Expected result in the response:

  1. Article1
  2. Article3

Please let me know which approach we should take to achieve this.

Hi @Sahil5

I put together an example, but it splits the search into one match_phrase clause per term.

ex:

GET test/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match_phrase": {
            "name": "abc"
          }
        },
        {
          "match_phrase": {
            "name": "xyz"
          }
        }
      ]
    }
  }
}
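
For longer search strings, the same query shape can be generated instead of written by hand. A minimal Python sketch, assuming the field is called `name` as in the mapping below and that splitting the input on whitespace is good enough; `build_phrase_query` is a hypothetical helper name, not part of any client library:

```python
import json


def build_phrase_query(text, field="name"):
    """Build a bool/should query with one match_phrase clause per word.

    Assumption: words are separated by whitespace. The synonym expansion
    itself happens in Elasticsearch via the field's search_analyzer.
    """
    clauses = [{"match_phrase": {field: word}} for word in text.split()]
    return {"query": {"bool": {"should": clauses}}}


if __name__ == "__main__":
    # Prints a request body with the same structure as the query above.
    print(json.dumps(build_phrase_query("abc xyz"), indent=2))
```

The resulting dict can be sent as the body of `GET test/_search`.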

Index

PUT test
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "synonym_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "synonym_filter"
            ]
          }
        },
        "filter": {
          "synonym_filter": {
            "type": "synonym_graph",
            "expand": true,
            "synonyms": [
              "abc => abc1 abc2",
              "xyz => xyz1 xyz2"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "search_analyzer": "synonym_analyzer"
      }
    }
  }
}

Doc

POST test/_bulk
{"index":{}}
{"name":"Article1 - test abc1 abc2"}
{"index":{}}
{"name":"Article2 - test abc1"}
{"index":{}}
{"name":"Article3 - test xyz1 xyz2"}
{"index":{}}
{"name":"Article4 - test xyz1"}
{"index":{}}
{"name":"Article4 - test xyz2"}
{"index":{}}
{"name":"Article5 - test xyz1 data xyz2"}
{"index":{}}
{"name":"Article6 - test abc1 data abc2"}
{"index":{}}
{"name":"Article7 - abc abc1"}

Results:

    "hits": [
      {
        "_index": "test",
        "_id": "UiYaUYYBXRBApbJcGNqx",
        "_score": 1.9216721,
        "_source": {
          "name": "Article1 - test abc1 abc2"
        }
      },
      {
        "_index": "test",
        "_id": "VCYaUYYBXRBApbJcGNqx",
        "_score": 1.8387749,
        "_source": {
          "name": "Article3 - test xyz1 xyz2"
        }
      }
    ]

Hi @RabBit_BR,

Thanks, but there are some performance issues with this approach.
Suppose the user enters a string of 10 words; we would then need 10 match_phrase clauses, which will slow down the response time or may put a heavy load on the server.

The more words there are, the heavier the query becomes.

Any thoughts on this?

Did you measure the impact of the query with 10 words?
I've written queries with up to 6 clauses and the execution time was quite good.
It is also quite unusual for a user to search for a text with 10 words.
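
One way to check this is to run the 10-clause query a few times and compare the `took` value (milliseconds) that Elasticsearch returns in every search response. A rough sketch, assuming `search` is any callable of yours that sends a body to the `_search` endpoint and returns the parsed JSON response; `median_took_ms` and the placeholder words are hypothetical:

```python
from statistics import median


def median_took_ms(search, body, runs=5):
    """Run the same search `runs` times and return the median `took` in ms.

    `search` is an assumed callable: it takes the request body as a dict
    and returns the search response as a dict containing a "took" field.
    """
    return median(search(body)["took"] for _ in range(runs))


# A 10-word query built the same way as the two-word example,
# using placeholder words w1 ... w10.
words = [f"w{i}" for i in range(1, 11)]
ten_word_body = {
    "query": {
        "bool": {
            "should": [{"match_phrase": {"name": w}} for w in words]
        }
    }
}
```

Calling `median_took_ms(search, ten_word_body)` against a realistic index, and comparing it with the two-word query, gives a measured number instead of a guess.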

You can skip splitting the terms, but then the query would also return the Article7 doc, in addition to the Article1 and Article3 docs your requirement asks for.

Anyway, this was just a quick simulation; you can improve on what I presented.
