Synonym search filter

ronyarmon · November 13, 2018, 7:33am

Hello, I'm testing a search engine that should retrieve documents containing synonyms of the search term.
Version: 6.4.2
Example:
Search term='good'; Synonyms='right, in_effect, proficient, in_force, unspoiled'
Index: syn_search_test

Code:

PUT /syn_search_test
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "search_synonym_filter": {
            "type": "synonym",
            "lenient": true,
            "synonyms": ["right, in_effect, proficient, in_force, unspoiled"]
          }
        },
        "analyzer": {
          "search_synonyms": {
            "type": "custom",
            "tokenizer": "keyword",
            "filter": ["lowercase", "search_synonym_filter"]
          }
        }
      }
    }
  }
}

I'm getting the following error even after deleting and rebuilding the index with the documents:

{
  "error": {
    "root_cause": [
      {
        "type": "resource_already_exists_exception",
        "reason": "index [syn_search_test/1Q38vCelTAuxVagoiKqRrg] already exists",
        "index_uuid": "1Q38vCelTAuxVagoiKqRrg",
        "index": "syn_search_test"
      }
    ],
    "type": "resource_already_exists_exception",
    "reason": "index [syn_search_test/1Q38vCelTAuxVagoiKqRrg] already exists",
    "index_uuid": "1Q38vCelTAuxVagoiKqRrg",
    "index": "syn_search_test"
  },
  "status": 400
}

Can you tell me what I am doing wrong?
Cheers,
Rony
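When these calls are driven from a script, it can help to check the response body for this error type instead of assuming the PUT succeeded. A minimal sketch follows; the helper name `already_exists` is hypothetical, and it only inspects the JSON error shape shown above.

```python
def already_exists(error_body: dict) -> bool:
    """Return True if an Elasticsearch error response says the index already exists."""
    error = error_body.get("error")
    # For structured errors like the one above, "error" is an object with a "type"
    return isinstance(error, dict) and error.get("type") == "resource_already_exists_exception"


# Trimmed copy of the response shown above
sample = {
    "error": {
        "type": "resource_already_exists_exception",
        "reason": "index [syn_search_test/1Q38vCelTAuxVagoiKqRrg] already exists",
        "index": "syn_search_test",
    },
    "status": 400,
}
print(already_exists(sample))  # True
```

With `requests`, this could be applied as `already_exists(response.json())` when `response.status_code == 400`.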

cbuescher · November 13, 2018, 9:28am

This looks like the deletion of the index didn't work as expected. Which steps did you take to delete and reindex? What is the request where you are getting this error?

ronyarmon · November 13, 2018, 10:05am

I'm using requests (Python) as follows:
import requests

# delete the old version
response = requests.delete('http://localhost:9200/syn_search_test?pretty')

# create the new version
response = requests.put('http://localhost:9200/syn_search_test?pretty')
print(response.text)

# check the list of indices
response = requests.get('http://localhost:9200/_cat/indices?v')
print(response.text)

I'm getting the following response, which indicates that a new index was created. I'm using the same statements to delete and re-create the index, and the index is searchable and seems to work fine:

health status index           uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   syn_search_test 1Q38vCelTAuxVagoiKqRrg   5   1          0            0      1.1kb          1.1kb
yellow open   customer        wol9nQkzQ9igbEqLlsA9HA   5   1          0            0      1.2kb          1.2kb
yellow open   emails          fLI6MBkTSEG8qjMcaVnHpQ   5   1          0            0      1.2kb          1.2kb
green  open   .kibana         au3mvglaRF-nEuVIQeaRuw   1   0          3            0     14.8kb         14.8kb
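One way to avoid a second, conflicting PUT is to send the analysis settings in the same request that creates the index. A minimal sketch of a delete-and-recreate step in that style, assuming `session` is either the `requests` module itself or a `requests.Session()` (both provide `delete` and `put`); the function name and the `base_url` parameter are hypothetical.

```python
def recreate_index(session, base_url: str, index: str, body: dict) -> dict:
    """Delete `index` if present, then create it again with its settings in a single PUT.

    `session` can be the requests module itself or a requests.Session().
    """
    url = f"{base_url}/{index}"
    resp = session.delete(url)
    # 404 just means there was nothing to delete
    if resp.status_code != 404:
        resp.raise_for_status()
    # Settings must go with the creating PUT: a later PUT to the same index
    # fails with resource_already_exists_exception.
    resp = session.put(url, json=body)
    resp.raise_for_status()
    return resp.json()


# Example (requires a running cluster):
# import requests
# body = the JSON from the PUT /syn_search_test request in the first post, as a dict
# print(recreate_index(requests, "http://localhost:9200", "syn_search_test", body))
```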

cbuescher · November 13, 2018, 10:27am

So you issue the above PUT statement after you have already created the index programmatically? That won't work, because the index already exists (as the exception says). You either need to create the index programmatically with all the analysis settings included from the start (to be honest, I don't know how this works with the Python client), or you can update the index analysis settings later. For that, you need to close the index, update it via the "_settings" endpoint, and then reopen it, like so:

POST /syn_search_test/_close

PUT /syn_search_test/_settings
{
  "analysis": {
    "filter": {
      "search_synonym_filter": {
        "type": "synonym",
        "lenient": true,
        "synonyms": [
          "right, in_effect, proficient, in_force, unspoiled"
        ]
      }
    },
    "analyzer": {
      "search_synonyms": {
        "type": "custom",
        "tokenizer": "keyword",
        "filter": [
          "lowercase",
          "search_synonym_filter"
        ]
      }
    }
  }
}

POST /syn_search_test/_open
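The same close / update / reopen sequence, sketched in Python for the `requests`-based setup used earlier in the thread. The function name is hypothetical and `session` is assumed to be the `requests` module or a `requests.Session()`. The `try`/`finally` is a design choice so that a rejected settings update does not leave the index closed.

```python
def update_analysis(session, base_url: str, index: str, analysis: dict) -> None:
    """Close `index`, replace its analysis settings, and reopen it."""
    url = f"{base_url}/{index}"
    session.post(f"{url}/_close").raise_for_status()
    try:
        # Analysis settings cannot be changed while the index is open.
        session.put(f"{url}/_settings", json={"analysis": analysis}).raise_for_status()
    finally:
        # Reopen even if the update failed, so the index stays searchable.
        session.post(f"{url}/_open").raise_for_status()
```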

ronyarmon · November 14, 2018, 7:03am

Thanks Christoph, I missed that, and updating did the trick. However, the search doesn't return what I expected. My idea was that when searching for one of the words (say 'right'), I'd get the sentences containing the other synonyms as results.
But executing:
GET /syn_search_test/_search
{
  "query": {
    "match": {
      "text": {
        "query": "in_effect",
        "analyzer": "search_synonyms"
      }
    }
  }
}
I'm getting:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.9808292,
    "hits": [
      {
        "_index": "syn_search_test",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.9808292,
        "_source": {
          "text": "This dog is the in_effect one"
        }
      }
    ]
  }
}

What am I doing wrong?
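For reference, the same query body built in Python; `synonym_match_query` is a hypothetical helper. Setting `analyzer` inside the `match` clause overrides the field's search analyzer for this one query only.

```python
import json


def synonym_match_query(field: str, text: str, analyzer: str = "search_synonyms") -> dict:
    """Build a match query whose text is analyzed with `analyzer` at search time."""
    return {"query": {"match": {field: {"query": text, "analyzer": analyzer}}}}


body = synonym_match_query("text", "in_effect")
print(json.dumps(body))
# Sending it (requires a running cluster):
# requests.post("http://localhost:9200/syn_search_test/_search", json=body)
```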

cbuescher · November 14, 2018, 7:54am

What is the mapping for the "text" field, and what is an example of a document you expect to find with the above query but don't?

ronyarmon · November 14, 2018, 11:15am

To test synonym search, I created the following sentences and loaded each one into the "text" field of its own document:

'\n{\n "text":"This dog is the well one" \n}',
'\n{\n "text":"This dog is the in_force one" \n}',
'\n{\n "text":"This dog is the serious one" \n}',
'\n{\n "text":"This dog is the undecomposed one" \n}',
'\n{\n "text":"This dog is the commodity one" \n}',
'\n{\n "text":"This dog is the honorable one" \n}',
'\n{\n "text":"This dog is the skilful one" \n}',
'\n{\n "text":"This dog is the dependable one" \n}',
'\n{\n "text":"This dog is the expert one" \n}',
'\n{\n "text":"This dog is the honest one" \n}'

The synonyms (updated in the filter): "well, in_force, serious, undecomposed, commodity"

Executing:
GET /syn_search_test/_search
{
  "query": {
    "match": {
      "text": "well"
    }
  }
}

Gets:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.6931472,
    "hits": [
      {
        "_index": "syn_search_test",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.6931472,
        "_source": {
          "text": "This dog is the well one"
        }
      }
    ]
  }
}

I want to also get the documents whose text contains in_force, serious, undecomposed, or commodity.
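Note that the query above does not set "analyzer", so (assuming the field uses the default mapping) "well" is analyzed with the field's standard analyzer and no synonyms are added. The toy model below is plain Python, not Elasticsearch code, and only illustrates what the synonym filter contributes when it is used: with the default `expand: true`, every term in a comma-separated rule is treated as equivalent, and the match query then ORs the expanded terms. Splitting on whitespace is a simplification of real tokenization, and the helper names are hypothetical.

```python
def parse_equivalents(rule: str) -> set[str]:
    """Parse a comma-separated (Solr-format) rule into a set of equivalent terms."""
    return {term.strip().lower() for term in rule.split(",") if term.strip()}


def expand(term: str, rules: list[str]) -> set[str]:
    """Return `term` plus every term from any rule that contains it (expand: true)."""
    term = term.lower()
    expanded = {term}
    for rule in rules:
        group = parse_equivalents(rule)
        if term in group:
            expanded |= group
    return expanded


rules = ["well, in_force, serious, undecomposed, commodity"]
docs = {
    "1": "This dog is the well one",
    "2": "This dog is the in_force one",
    "3": "This dog is the serious one",
    "6": "This dog is the honorable one",
}
terms = expand("well", rules)
# A document matches if it contains any expanded term (OR semantics)
hits = sorted(doc_id for doc_id, text in docs.items() if terms & set(text.lower().split()))
print(hits)  # ['1', '2', '3']
```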

cbuescher · November 14, 2018, 1:38pm

I might have missed it in your last response, but what is the mapping for the "text" field? Or the whole index for that matter (e.g. output of GET /syn_search_test/_mapping)

ronyarmon · November 14, 2018, 2:17pm

{
  "syn_search_test": {
    "mappings": {
      "_doc": {
        "properties": {
          "text": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    }
  }
}
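This mapping sets no `analyzer` on `text`, so (assuming no index-wide `default` analyzer is configured) both indexing and searching use the standard analyzer, and `search_synonyms` only applies when a query names it explicitly. A small sketch that reads this out of a GET _mapping response; the helper name `field_analyzers` is hypothetical.

```python
def field_analyzers(mapping: dict, index: str, doc_type: str, field: str) -> tuple[str, str]:
    """Return (index-time analyzer, search-time analyzer) for a field in a GET _mapping response.

    Assumes no index-level "default" analyzer: an unset analyzer falls back to "standard",
    and an unset search_analyzer falls back to the index-time analyzer.
    """
    props = mapping[index]["mappings"][doc_type]["properties"][field]
    index_analyzer = props.get("analyzer", "standard")
    search_analyzer = props.get("search_analyzer", index_analyzer)
    return index_analyzer, search_analyzer


mapping = {
    "syn_search_test": {
        "mappings": {
            "_doc": {
                "properties": {
                    "text": {
                        "type": "text",
                        "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
                    }
                }
            }
        }
    }
}
print(field_analyzers(mapping, "syn_search_test", "_doc", "text"))  # ('standard', 'standard')
```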

cbuescher · November 14, 2018, 5:46pm

I tried to re-create your whole example now and all seems to work well for me, at least on 6.4.3.
Can you compare it with my reproduction below to see where the differences might be? I haven't asked yet, but which version of ES are you using?

DELETE /syn_search_test

PUT /syn_search_test
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "search_synonym_filter": {
            "type": "synonym",
            "lenient": true,
            "synonyms": [
              "well, in_force, serious, undecomposed, commodity"
            ]
          }
        },
        "analyzer": {
          "search_synonyms": {
            "type": "custom",
            "tokenizer": "keyword",
            "filter": [
              "lowercase",
              "search_synonym_filter"
            ]
          }
        }
      }
    }
  }
}
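If the index is created from Python (as earlier in the thread), the same body can be built as a dict and sent with the creating PUT. A minimal sketch; the helper name `synonym_index_body` is hypothetical. Note that `analyzer` is a sibling of `filter` under `analysis`, not nested inside it.

```python
import json


def synonym_index_body(synonym_line: str) -> dict:
    """Build index settings with a synonym filter and a custom analyzer that uses it."""
    return {
        "settings": {
            "index": {
                "analysis": {
                    # token filters live under "filter"
                    "filter": {
                        "search_synonym_filter": {
                            "type": "synonym",
                            "lenient": True,
                            "synonyms": [synonym_line],
                        }
                    },
                    # analyzers sit next to "filter", not inside it
                    "analyzer": {
                        "search_synonyms": {
                            "type": "custom",
                            "tokenizer": "keyword",
                            "filter": ["lowercase", "search_synonym_filter"],
                        }
                    },
                }
            }
        }
    }


body = synonym_index_body("well, in_force, serious, undecomposed, commodity")
print(json.dumps(body, indent=2))
```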

POST /syn_search_test/_doc/_bulk
{ "index" : { "_id" : "1" } }
{ "text":"This dog is the well one"}
{ "index" : { "_id" : "2" } }
{ "text":"This dog is the in_force one"}
{ "index" : { "_id" : "3" } }
{ "text":"This dog is the serious one" }
{ "index" : { "_id" : "4" } }
{ "text":"This dog is the undecomposed one" }
{ "index" : { "_id" : "5" } }
{ "text":"This dog is the commodity one" }
{ "index" : { "_id" : "6" } }
{ "text":"This dog is the honorable one" }
{ "index" : { "_id" : "7" } }
{ "text":"This dog is the skilful one" }
{ "index" : { "_id" : "8" } }
{ "text":"This dog is the dependable one" }
{ "index" : { "_id" : "9" } }
{ "text":"This dog is the expert one" }
{ "index" : { "_id" : "10" } }
{ "text":"This dog is the honest one" }
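The same bulk request can be generated from Python. The `_bulk` endpoint expects newline-delimited JSON: an action line followed by a source line for each document, with a trailing newline. A minimal sketch; `bulk_body` is a hypothetical helper, and the commented-out call assumes `requests` and a local cluster.

```python
import json


def bulk_body(texts: list[str], start_id: int = 1) -> str:
    """Build an NDJSON _bulk payload with sequential string ids starting at `start_id`."""
    lines = []
    for offset, text in enumerate(texts):
        lines.append(json.dumps({"index": {"_id": str(start_id + offset)}}))
        lines.append(json.dumps({"text": text}))
    # _bulk requires the final line to be terminated by a newline
    return "\n".join(lines) + "\n"


payload = bulk_body(["This dog is the well one", "This dog is the in_force one"])
print(payload, end="")
# requests.post("http://localhost:9200/syn_search_test/_doc/_bulk", data=payload,
#               headers={"Content-Type": "application/x-ndjson"})
```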


GET /syn_search_test/_search
{
  "query": {
    "match": {
      "text": {
        "query": "well",
        "analyzer": "search_synonyms"
      }
    }
  }
}

Gives:

{
  "took": 11,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 5,
    "max_score": 1.2039728,
    "hits": [
      {
        "_index": "syn_search_test",
        "_type": "_doc",
        "_id": "5",
        "_score": 1.2039728,
        "_source": {
          "text": "This dog is the commodity one"
        }
      },
      {
        "_index": "syn_search_test",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.9808292,
        "_source": {
          "text": "This dog is the in_force one"
        }
      },
      {
        "_index": "syn_search_test",
        "_type": "_doc",
        "_id": "4",
        "_score": 0.9808292,
        "_source": {
          "text": "This dog is the undecomposed one"
        }
      },
      {
        "_index": "syn_search_test",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.6931472,
        "_source": {
          "text": "This dog is the well one"
        }
      },
      {
        "_index": "syn_search_test",
        "_type": "_doc",
        "_id": "3",
        "_score": 0.2876821,
        "_source": {
          "text": "This dog is the serious one"
        }
      }
    ]
  }
}
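To compare results like these programmatically, the hits can be reduced to ids, scores, and texts. A minimal sketch; `hit_summary` is a hypothetical helper, and it relies on the 6.x response shape shown above (in 6.x, `hits.total` is a plain number).

```python
def hit_summary(response: dict) -> list[tuple[str, float, str]]:
    """Return (_id, _score, text) for each hit, in the order Elasticsearch returned them."""
    return [(h["_id"], h["_score"], h["_source"]["text"]) for h in response["hits"]["hits"]]


# Copy of the response above, trimmed to three of the five hits
response = {
    "hits": {
        "total": 5,
        "hits": [
            {"_id": "5", "_score": 1.2039728, "_source": {"text": "This dog is the commodity one"}},
            {"_id": "2", "_score": 0.9808292, "_source": {"text": "This dog is the in_force one"}},
            {"_id": "1", "_score": 0.6931472, "_source": {"text": "This dog is the well one"}},
        ],
    }
}
for doc_id, score, text in hit_summary(response):
    print(doc_id, score, text)
```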

ronyarmon · November 15, 2018, 6:53am

Brilliant, problem solved, even though I thought we were using the same statements to (re)create the index. I was on 6.4.2 and upgraded to 6.5, where I ran your script. Could it be a version issue? In any case, many thanks for your help.

cbuescher · November 15, 2018, 9:38am

I used 6.4.3 when trying this, and I don't think it differs at all from 6.4.2 in this respect. So I think something might have been off somewhere else, but great that it works now.

system · December 13, 2018, 9:38am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Related topics

Topic	Category	Replies	Views	Activity
I have got a little Problem with my synonym filter	Elasticsearch	5	565	July 6, 2017
Synonyms not being used in search results	Elasticsearch	6	271	February 22, 2023
Synonym search in dictionary elasticsearch not result	Elasticsearch	3	549	June 15, 2017
Custom Search analyzer	Elasticsearch	4	567	January 12, 2017
Synonyms relevance help	Elasticsearch	7	558	December 27, 2021