Synonym search filter


(Rony Armon) #1

Hello, I'm testing a search engine that should retrieve documents containing synonyms of the search term.
Version: 6.4.2
Example:
Search term='good'; Synonyms='right, in_effect, proficient, in_force, unspoiled'
Index: syn_search_test

Code:

PUT /syn_search_test
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "search_synonym_filter": {
            "type": "synonym",
            "lenient": true,
            "synonyms": ["right, in_effect, proficient, in_force, unspoiled"]
          }
        },
        "analyzer": {
          "search_synonyms": {
            "type": "custom",
            "tokenizer": "keyword",
            "filter": ["lowercase", "search_synonym_filter"]
          }
        }
      }
    }
  }
}

I'm getting the following error even after deleting and rebuilding the index with the documents:

{
  "error": {
    "root_cause": [
      {
        "type": "resource_already_exists_exception",
        "reason": "index [syn_search_test/1Q38vCelTAuxVagoiKqRrg] already exists",
        "index_uuid": "1Q38vCelTAuxVagoiKqRrg",
        "index": "syn_search_test"
      }
    ],
    "type": "resource_already_exists_exception",
    "reason": "index [syn_search_test/1Q38vCelTAuxVagoiKqRrg] already exists",
    "index_uuid": "1Q38vCelTAuxVagoiKqRrg",
    "index": "syn_search_test"
  },
  "status": 400
}

Can you tell me what I am doing wrong?
Cheers,
Rony


(Christoph) #2

This looks like the deletion of the index didn't work as expected. Which steps did you take to delete and reindex? What is the request where you are getting this error?


(Rony Armon) #3

I'm using requests (Python) as follows:

import requests

# delete the old version
response = requests.delete('http://localhost:9200/syn_search_test?pretty')

# create the new version (no request body, so no analysis settings)
response = requests.put('http://localhost:9200/syn_search_test?pretty')
print(response.text)

# check indices list
response = requests.get('http://localhost:9200/_cat/indices?v')
print(response.text)

I get the following response, which indicates that a new index was created. I use the same statements to delete and re-create the index, which is searchable and seems to work fine:

health status index           uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   syn_search_test 1Q38vCelTAuxVagoiKqRrg   5   1          0            0      1.1kb          1.1kb
yellow open   customer        wol9nQkzQ9igbEqLlsA9HA   5   1          0            0      1.2kb          1.2kb
yellow open   emails          fLI6MBkTSEG8qjMcaVnHpQ   5   1          0            0      1.2kb          1.2kb
green  open   .kibana         au3mvglaRF-nEuVIQeaRuw   1   0          3            0     14.8kb         14.8kb


(Christoph) #4

So you issue the above PUT statement after you have programmatically created the index? That won't work because the index already exists, as the exception says. You either need to create the index programmatically with all the analysis settings included (I don't know how this works with the Python client, to be honest), or you can update the index analysis settings later. For that you need to close the index, use the "_settings" endpoint, and reopen the index afterwards, like so:

POST /syn_search_test/_close

PUT /syn_search_test/_settings
{
  "analysis": {
    "filter": {
      "search_synonym_filter": {
        "type": "synonym",
        "lenient": true,
        "synonyms": [
          "right, in_effect, proficient, in_force, unspoiled"
        ]
      }
    },
    "analyzer": {
      "search_synonyms": {
        "type": "custom",
        "tokenizer": "keyword",
        "filter": [
          "lowercase",
          "search_synonym_filter"
        ]
      }
    }
  }
}

POST /syn_search_test/_open
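
For the first option (creating the index with the analysis settings in one request), a minimal sketch with the `requests` library used earlier in this thread might look like the following. The helper names `build_index_body` and `recreate_index` are hypothetical and only illustrate the idea; the settings themselves are copied from the console request above.

```python
def build_index_body(synonym_rule):
    """Return a create-index body holding the synonym filter and the analyzer."""
    return {
        "settings": {
            "index": {
                "analysis": {
                    "filter": {
                        "search_synonym_filter": {
                            "type": "synonym",
                            "lenient": True,
                            "synonyms": [synonym_rule],
                        }
                    },
                    "analyzer": {
                        "search_synonyms": {
                            "type": "custom",
                            "tokenizer": "keyword",
                            "filter": ["lowercase", "search_synonym_filter"],
                        }
                    },
                }
            }
        }
    }


def recreate_index(http, base_url, index, body):
    """Delete the index if present, then create it with its settings in one PUT.

    `http` is anything with requests-style delete/put methods, for example the
    `requests` module itself or a `requests.Session` (an assumption of this sketch).
    """
    url = f"{base_url}/{index}"
    http.delete(url)  # a 404 response here only means the index did not exist yet
    response = http.put(url, json=body)  # settings travel with the creation request
    response.raise_for_status()
    return response.json()


# Usage (needs a running cluster; URL and index name taken from this thread):
#   import requests
#   body = build_index_body("right, in_effect, proficient, in_force, unspoiled")
#   recreate_index(requests, "http://localhost:9200", "syn_search_test", body)
```

Because the settings are sent with the PUT that creates the index, no later close/update/open cycle is needed in this variant.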

(Rony Armon) #5

Thanks Christoph, I had missed that, and updating did the trick. But I cannot use this statement to update the search criteria. My idea was that when searching for one of the words (say 'right'), I would get the sentences containing the other synonyms as results.
But executing:
GET /syn_search_test/_search
{
  "query": {
    "match": {
      "text": {
        "query": "in_effect",
        "analyzer": "search_synonyms"
      }
    }
  }
}
I'm getting:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.9808292,
    "hits": [
      {
        "_index": "syn_search_test",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.9808292,
        "_source": {
          "text": "This dog is the in_effect one"
        }
      }
    ]
  }
}

What am I doing wrong?


(Christoph) #6

What is the mapping for the "text" field, and what is an example of a document you expect to find with the above query but don't?


(Rony Armon) #7

To test synonym search, I produced the following sentences and loaded one into the field text of each document:

'\n{\n "text":"This dog is the well one" \n}',
'\n{\n "text":"This dog is the in_force one" \n}',
'\n{\n "text":"This dog is the serious one" \n}',
'\n{\n "text":"This dog is the undecomposed one" \n}',
'\n{\n "text":"This dog is the commodity one" \n}',
'\n{\n "text":"This dog is the honorable one" \n}',
'\n{\n "text":"This dog is the skilful one" \n}',
'\n{\n "text":"This dog is the dependable one" \n}',
'\n{\n "text":"This dog is the expert one" \n}',
'\n{\n "text":"This dog is the honest one" \n}'
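
As a side note on loading such test sentences, the newline-delimited body expected by the bulk endpoint can be generated from a plain list. This is only a sketch: `bulk_body` is a hypothetical helper, and the sequential string ids are an assumption made for illustration.

```python
import json


def bulk_body(texts, start_id=1):
    """Build a bulk request body: one action line plus one source line per document."""
    lines = []
    for offset, text in enumerate(texts):
        # Action line: index the next document under a sequential id.
        lines.append(json.dumps({"index": {"_id": str(start_id + offset)}}))
        # Source line: the document itself, with the sentence in the "text" field.
        lines.append(json.dumps({"text": text}))
    # The bulk API requires every line, including the last, to end with a newline.
    return "\n".join(lines) + "\n"


sentences = [
    "This dog is the well one",
    "This dog is the in_force one",
]
payload = bulk_body(sentences)
```

The resulting string would be sent to `/syn_search_test/_doc/_bulk` with the header `Content-Type: application/x-ndjson`.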

The synonyms (updated in the filter): "well, in_force, serious, undecomposed, commodity"

Executing:
GET /syn_search_test/_search
{
  "query": {
    "match": {
      "text": "well"
    }
  }
}

Gets:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.6931472,
    "hits": [
      {
        "_index": "syn_search_test",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.6931472,
        "_source": {
          "text": "This dog is the well one"
        }
      }
    ]
  }
}

I want to get the documents whose text contains in_force, serious, undecomposed, or commodity as well.


(Christoph) #8

I might have missed it in your last response, but what is the mapping for the "text" field? Or the whole index for that matter (e.g. output of GET /syn_search_test/_mapping)


(Rony Armon) #9
{
  "syn_search_test": {
    "mappings": {
      "_doc": {
        "properties": {
          "text": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    }
  }
}

(Christoph) #10

I tried to re-create your whole example now, and everything seems to work well for me, at least on 6.4.3.
See my reproduction below to check where the differences might be. I haven't asked yet, but which version of ES are you using?

DELETE /syn_search_test

PUT /syn_search_test
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "search_synonym_filter": {
            "type": "synonym",
            "lenient": true,
            "synonyms": [
              "well, in_force, serious, undecomposed, commodity"
            ]
          }
        },
        "analyzer": {
          "search_synonyms": {
            "type": "custom",
            "tokenizer": "keyword",
            "filter": [
              "lowercase",
              "search_synonym_filter"
            ]
          }
        }
      }
    }
  }
}

POST /syn_search_test/_doc/_bulk
{ "index" : { "_id" : "1" } }
{ "text":"This dog is the well one"}
{ "index" : { "_id" : "2" } }
{ "text":"This dog is the in_force one"}
{ "index" : { "_id" : "3" } }
{ "text":"This dog is the serious one" }
{ "index" : { "_id" : "4" } }
{ "text":"This dog is the undecomposed one" }
{ "index" : { "_id" : "5" } }
{ "text":"This dog is the commodity one" }
{ "index" : { "_id" : "6" } }
{ "text":"This dog is the honorable one" }
{ "index" : { "_id" : "7" } }
{ "text":"This dog is the skilful one" }
{ "index" : { "_id" : "8" } }
{ "text":"This dog is the dependable one" }
{ "index" : { "_id" : "9" } }
{ "text":"This dog is the expert one" }
{ "index" : { "_id" : "10" } }
{ "text":"This dog is the honest one" }


GET /syn_search_test/_search
{
  "query": {
    "match": {
      "text": {
        "query": "well",
        "analyzer": "search_synonyms"
      }
    }
  }
}

Gives:

{
  "took": 11,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 5,
    "max_score": 1.2039728,
    "hits": [
      {
        "_index": "syn_search_test",
        "_type": "_doc",
        "_id": "5",
        "_score": 1.2039728,
        "_source": {
          "text": "This dog is the commodity one"
        }
      },
      {
        "_index": "syn_search_test",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.9808292,
        "_source": {
          "text": "This dog is the in_force one"
        }
      },
      {
        "_index": "syn_search_test",
        "_type": "_doc",
        "_id": "4",
        "_score": 0.9808292,
        "_source": {
          "text": "This dog is the undecomposed one"
        }
      },
      {
        "_index": "syn_search_test",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.6931472,
        "_source": {
          "text": "This dog is the well one"
        }
      },
      {
        "_index": "syn_search_test",
        "_type": "_doc",
        "_id": "3",
        "_score": 0.2876821,
        "_source": {
          "text": "This dog is the serious one"
        }
      }
    ]
  }
}
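
One visible difference between this reproduction and the earlier plain query for "well" is the `analyzer` parameter in the `match` clause. The mapping posted above sets no analyzer on the `text` field, so without that parameter the query string would presumably be analyzed with the field's default analyzer and the synonym filter would not run. The sketch below builds both query bodies so they can be compared, plus a body for the `_analyze` endpoint, which lists the tokens an analyzer emits. The helper names are hypothetical.

```python
def match_query(field, text, analyzer=None):
    """Build a match query body; `analyzer` overrides the search-time analyzer."""
    clause = {"query": text}
    if analyzer is not None:
        clause["analyzer"] = analyzer
    return {"query": {"match": {field: clause}}}


def analyze_request(analyzer, text):
    """Body for POST /<index>/_analyze, to inspect what an analyzer produces."""
    return {"analyzer": analyzer, "text": text}


# Without an analyzer override: the field's own search analyzer is used.
plain = match_query("text", "well")

# With the override: the query string goes through the synonym filter.
expanded = match_query("text", "well", analyzer="search_synonyms")

# Diagnostic body for POST /syn_search_test/_analyze.
check = analyze_request("search_synonyms", "well")
```

Posting `check` to `/syn_search_test/_analyze` is one way to confirm that the analyzer is defined on the index and expands the term as intended before debugging the search itself.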

(Rony Armon) #11

Brilliant, problem solved, even though we used the same statements to (re)produce the index. I was using 6.4.2 and upgraded to 6.5, where I ran your script. Could it be a version issue? In any case, many thanks for your help.


(Christoph) #12

I used 6.4.3 when trying this, and I don't think it differs at all from 6.4.2 in that regard. So I think something might have been off somewhere else, but it's great that it works now.

