Synonyms not being used in search results

Hello, I have an index for services named 'testing_services':

'properties' => [
            'id' => ['type' => 'keyword'],
            'name' => [
                'type' => 'text',
                'analyzer' => 'english_analyser',
                'search_analyzer' => 'english_analyser',
                'fields' => [
                    'keyword' => ['type' => 'keyword'],
                ],
            ],
            'intro' => [
                'type' => 'text',
                'analyzer' => 'english_analyser',
                'search_analyzer' => 'english_analyser',
            ],
            'description' => [
                'type' => 'text',
                'analyzer' => 'english_analyser',
                'search_analyzer' => 'english_analyser',
            ],
            'wait_time' => ['type' => 'keyword'],
            'is_free' => ['type' => 'boolean'],
            'status' => ['type' => 'keyword'],
            'score' => ['type' => 'integer'],
            'organisation_name' => [
                'type' => 'text',
                'analyzer' => 'english_analyser',
                'search_analyzer' => 'english_analyser',
                'fields' => [
                    'keyword' => ['type' => 'keyword'],
                ],
            ],
            'taxonomy_categories' => [
                'type' => 'text',
                'analyzer' => 'english_analyser',
                'search_analyzer' => 'english_analyser',
                'fields' => [
                    'keyword' => ['type' => 'keyword'],
                ],
            ],
            'collection_categories' => ['type' => 'keyword'],
            'collection_personas' => ['type' => 'keyword'],
            'service_locations' => [
                'type' => 'nested',
                'properties' => [
                    'id' => ['type' => 'keyword'],
                    'location' => ['type' => 'geo_point'],
                ],
            ],
            'service_eligibilities' => [
                'type' => 'text',
                'analyzer' => 'english_analyser',
                'search_analyzer' => 'english_analyser',
                'fields' => [
                    'keyword' => ['type' => 'keyword'],
                ],
            ],
        ],

The settings for the index are:

'analysis' => [
                'analyzer' => [
                    'english_analyser' => [
                        'type' => 'custom',
                        'tokenizer' => 'standard',
                        'filter' => [
                            'lowercase',
                            'synonym',
                            'english_stop',
                            'stopwords'
                        ],
                    ],
                ],
                'filter' => [
                    'synonym' => [
                        'type' => 'synonym',
                        'synonyms' => $this->getThesaurus(),
                    ],
                    'english_stop' => [
                        'type' => 'stop',
                        'stopwords' => '_english_'
                    ],
                    'stopwords' => [
                        'type' => 'stop',
                        'stopwords' => $this->getStopWords(),
                    ],
                ],
            ],

The getThesaurus() and getStopWords() methods return arrays of synonyms and stop words respectively.
The test synonyms are:

autism,autistic,asd
not drinking,dehydration,
dehydration,thirsty,drought
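
The second line above ends with a trailing comma, which leaves an empty term in that rule. As a minimal sketch (normaliseSynonymLines() is a hypothetical helper, not part of getThesaurus() or of any Elasticsearch client), the lines could be tidied before being passed to the 'synonyms' setting:

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: tidy raw thesaurus lines before they are used as
 * synonym rules. Trims whitespace and drops empty terms such as the one
 * left behind by a trailing comma.
 */
function normaliseSynonymLines(array $lines): array
{
    $rules = [];
    foreach ($lines as $line) {
        $terms = array_map('trim', explode(',', $line));
        $terms = array_values(array_filter($terms, fn (string $t): bool => $t !== ''));
        // Keep only rules that still relate at least two terms.
        if (count($terms) >= 2) {
            $rules[] = implode(',', $terms);
        }
    }
    return $rules;
}

$rules = normaliseSynonymLines([
    'autism,autistic,asd',
    'not drinking,dehydration,',
    'dehydration,thirsty,drought',
]);
print_r($rules);
```

Whether the trailing comma matters to Elasticsearch is not established in this thread; the sketch only keeps the rule list unambiguous.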

A call to:

http://elasticsearch:9200/testing_services/_analyze
            [
                'field' => 'name',
                'text' => 'Helping asd'
            ]

Shows the synonyms are being used:

"tokens" => array:4 [
    0 => array:5 [
      "token" => "helping"
      "start_offset" => 0
      "end_offset" => 7
      "type" => "<ALPHANUM>"
      "position" => 0
    ]
    1 => array:5 [
      "token" => "asd"
      "start_offset" => 8
      "end_offset" => 11
      "type" => "<ALPHANUM>"
      "position" => 1
    ]
    2 => array:5 [
      "token" => "autism"
      "start_offset" => 8
      "end_offset" => 11
      "type" => "SYNONYM"
      "position" => 1
    ]
    3 => array:5 [
      "token" => "autistic"
      "start_offset" => 8
      "end_offset" => 11
      "type" => "SYNONYM"
      "position" => 1
    ]
  ]
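
To illustrate why this output suggests a search for 'autism' ought to match, here is a toy model of equivalent-synonym expansion in plain PHP. It is a sketch under simplifying assumptions (single-word terms only, no stop words, no positions) and is not how Elasticsearch implements the filter; expandTokens() is a hypothetical name.

```php
<?php
declare(strict_types=1);

/**
 * Toy model: lowercase and split the text, then add every term from any
 * rule that contains the token. Multi-word synonyms are not handled.
 */
function expandTokens(string $text, array $rules): array
{
    $tokens = preg_split('/\W+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    $expanded = [];
    foreach ($tokens as $token) {
        $expanded[] = $token;
        foreach ($rules as $rule) {
            $terms = array_map('trim', explode(',', $rule));
            if (!in_array($token, $terms, true)) {
                continue;
            }
            foreach ($terms as $term) {
                if (!in_array($term, $expanded, true)) {
                    $expanded[] = $term;
                }
            }
        }
    }
    return $expanded;
}

$rules   = ['autism,autistic,asd'];
$indexed = expandTokens('Helping asd', $rules); // tokens for the stored name
$queried = expandTokens('autism', $rules);      // tokens for the search term

print_r($indexed);
// The two token sets overlap, so the analyzer alone should allow a match.
var_dump(array_intersect($indexed, $queried) !== []);
```

Because the same analyzer is used at index time and at search time, both sides expand to overlapping terms in this model.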

If I create a service with a name 'Helping asd' and search with the query 'autism':

"body" => array:3 [
    "query" => array:1 [
      "function_score" => array:2 [
        "query" => array:1 [
          "bool" => array:4 [
            "must" => []
            "should" => array:5 [
              0 => array:1 [
                "match" => array:1 [
                  "name" => array:3 [
                    "query" => "autism"
                    "boost" => 3
                    "fuzziness" => "AUTO"
                  ]
                ]
              ]
              1 => array:1 [
                "match" => array:1 [
                  "organisation_name" => array:3 [
                    "query" => "autism"
                    "boost" => 3
                    "fuzziness" => "AUTO"
                  ]
                ]
              ]
              2 => array:1 [
                "match" => array:1 [
                  "intro" => array:3 [
                    "query" => "autism"
                    "boost" => 2
                    "fuzziness" => "AUTO"
                  ]
                ]
              ]
              3 => array:1 [
                "match" => array:1 [
                  "description" => array:3 [
                    "query" => "autism"
                    "boost" => 1.5
                    "fuzziness" => "AUTO"
                  ]
                ]
              ]
              4 => array:1 [
                "match" => array:1 [
                  "taxonomy_categories" => array:3 [
                    "query" => "autism"
                    "boost" => 1
                    "fuzziness" => "AUTO"
                  ]
                ]
              ]
            ]
            "filter" => array:1 [
              0 => array:1 [
                "term" => array:1 [
                  "status" => "active"
                ]
              ]
            ]
            "minimum_should_match" => 1
          ]
        ]
        "functions" => array:1 [
          0 => array:1 [
            "field_value_factor" => array:3 [
              "field" => "score"
              "missing" => 1
              "modifier" => "ln1p"
            ]
          ]
        ]
      ]
    ]
    "from" => 0
    "size" => 25
  ]

I get no results, yet a query for the term 'asd' returns the correct result.

It looks as though the index is created correctly and the service data is mapped to the fields. The index analyzer is in place and assigned to the fields. A test of the analyzer shows it is using the supplied synonyms, but when I query the index with a synonym I get no results.
Could you give me some insights into where I should look to resolve this?
Thanks

Hi @appsol

Could you provide a sample doc that you expect to be returned when searching for the term "autism"?

Hello @RabBit_BR,
thanks for looking. Sure, the doc I would expect to be returned is:

"id" => "d59ddde4-4f3a-40e0-8a36-392d2198acb1"
  "name" => "Helping asd"
  "intro" => "Id et est ut rerum id vel cupiditate"
  "description" => "Ipsam ea et qui voluptatem quia excepturi"
  "wait_time" => null
  "is_free" => true
  "status" => "active"
  "score" => 1
  "organisation_name" => "Stevens Ltd dolorum 37063"
  "taxonomy_categories" => []
  "collection_categories" => []
  "collection_personas" => []
  "service_locations" => []
  "service_eligibilities" => array:7 [
    0 => "Age Group All"
    1 => "Disability All"
    2 => "Gender All"
    3 => "Income All"
    4 => "Language All"
    5 => "Ethnicity All"
    6 => "Housing All"
  ]

A search for 'asd' will return this doc, but a search for one of the synonyms, e.g. 'autism', does not.

Do you have access to Dev Tools?
I ran a test with the given data and got the document back when the search term was "autism".
Maybe I missed some configuration and that's why I can't reproduce your error.

Mapping

PUT test/
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "synonym",
              "english_stop"
            ]
          }
        },
        "filter": {
          "english_stop": {
            "type": "stop",
            "stopwords": [
              "_english_"
            ]
          },
          "synonym": {
            "type": "synonym",
            "synonyms": [
              "autism,autistic,asd",
              "not drinking,dehydration",
              "dehydration,thirsty,drought"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "my_analyzer",
        "search_analyzer": "my_analyzer"
      }
    }
  }
}

Document

POST test/_doc
{
  "id": "d59ddde4-4f3a-40e0-8a36-392d2198acb1",
  "name": "Helping asd",
  "intro": "Id et est ut rerum id vel cupiditate",
  "description": "Ipsam ea et qui voluptatem quia excepturi",
  "is_free": true,
  "status": "active",
  "score": 1
}

Query

GET test/_search
{
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "filter": [
            {
              "term": {
                "status": "active"
              }
            }
          ],
          "minimum_should_match": 1, 
          "should": [
            {
              "match": {
                "name": {
                  "query": "autism",
                  "fuzziness": "AUTO"
                }
              }
            },
            {
              "match": {
                "organisation_name": {
                  "query": "autism",
                  "fuzziness": "AUTO"
                }
              }
            },
            {
              "match": {
                "intro": {
                  "query": "autism",
                  "fuzziness": "AUTO"
                }
              }
            },
            {
              "match": {
                "description": {
                  "query": "autism",
                  "fuzziness": "AUTO"
                }
              }
            }
          ]
        }
      },
      "functions": [
        {
          "field_value_factor": {
            "field": "score",
            "missing": 1,
            "modifier": "none"
          }
        }
      ]
    }
  }
}

Response

"hits": [
      {
        "_index": "test",
        "_id": "zWVW6YUBQum6OxY14_ER",
        "_score": 0.32088596,
        "_source": {
          "id": "d59ddde4-4f3a-40e0-8a36-392d2198acb1",
          "name": "Helping asd",
          "intro": "Id et est ut rerum id vel cupiditate",
          "description": "Ipsam ea et qui voluptatem quia excepturi",
          "is_free": true,
          "status": "active",
          "score": 1
        }
      }
    ]

@RabBit_BR Thank you so much! The fact that you could use my settings and successfully return the doc meant that I knew it wasn't a settings issue. So I dug into the Elasticsearch implementation, particularly how the models are exposed on the index, and discovered that the model was not registering itself on the correct index when running tests. There is an index prefix for tests (i.e. 'testing_services' rather than 'services'), which was not being applied when the model registered which index it should be added to.
Mea culpa!
Thanks again.
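
For anyone hitting the same thing, a minimal sketch of the idea behind the fix: resolve the index name in one place so that indexing and searching cannot disagree. resolveIndexName() is a hypothetical helper, and the 'testing' prefix with a '_' separator is an assumption based on the index names above, not the project's actual code.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: build the physical index name from a base name and
 * an optional environment prefix (assumed to be joined with '_').
 */
function resolveIndexName(string $baseName, ?string $prefix = null): string
{
    $prefix = trim((string) $prefix);
    return $prefix === '' ? $baseName : $prefix . '_' . $baseName;
}

// The code that registers the model on an index and the code that searches
// both call the same helper instead of hard-coding the name.
$writeIndex  = resolveIndexName('services', 'testing');
$searchIndex = resolveIndexName('services', 'testing');

echo $writeIndex, PHP_EOL;
var_dump($writeIndex === $searchIndex);
```

With no prefix the helper returns the base name unchanged, so the same call works outside the test environment.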

Glad you got it right.
