Requesting help with synonyms combined with a query string

Hi everybody,

I'm just stepping into the world of ES, so apologies if this is a newbie question.

Currently I'm stuck at a point in ES and I can't seem to find an answer online; perhaps one of you could be of assistance.

What I'm trying to do is the following: I would like to use a query string together with a synonym list.

As an example:

Input = 'corporation' + 'garage'
Synonym = corporation > company
Result = 'The garage company'

At this time I've got the synonym list working using the code below.

PUT company
{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonym_filter": {
          "type": "synonym",
          "synonyms_path" : "analysis/synonyms.txt",
          "updateable": true
        }
      },
      "analyzer": {
        "my_synonyms": {
          "tokenizer": "keyword",
          "filter": [
            "lowercase",
            "my_synonym_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "text_field": {
        "type": "text",
        "analyzer": "standard",
        "search_analyzer": "my_synonyms"
      }
    }
  }
}

But when I try to use the following code in a query, I don't get any results.

GET company/_search
{
  "query": {
    "query_string": {
      "query": "(corporation) AND (garage)",
      "default_field": "COMBINED_FIELD",
      "fuzziness": "AUTO",
      "analyzer": "my_synonyms"
    }
  }
}

Can any of you please help me?

Hi Bram,
Can you supply a full recreation script with mapping, synonyms, and example doc?
I don't see the "COMBINED_FIELD" mentioned in your query in your mapping, for instance.

This script is working for me:

DELETE company
PUT company
{
  "settings": {
    "analysis": {
      "filter": {
          "my_synonym_filter": {
            "type": "synonym",
            "lenient": true,
            "synonyms": [ "corporation, company => company" ]
          }
      },
      "analyzer": {
        "my_synonyms": {
          "tokenizer": "keyword",
          "filter": [
            "lowercase",
            "my_synonym_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "text_field": {
        "type": "text",
        "analyzer": "standard",
        "search_analyzer": "my_synonyms"
      }
    }
  }
}
POST company/_doc/1
{
  "text_field":"company garage"
}
GET company/_search
{
  "query": {
    "query_string": {
      "query": "corporation AND garage",
      "default_field": "text_field",
      "fuzziness": "AUTO",
      "analyzer": "my_synonyms"
    }
  }
}

Thanks for the quick reply Mark.

Am I right in thinking you are requesting the mappings that are provided when executing

GET company/

In that case I get the following response.

#! Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.13/security-minimal-setup.html to enable security.
{
  "company" : {
    "aliases" : { },
    "mappings" : {
      "properties" : {
        "COMBINED_FIELD" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "null" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "text_field" : {
          "type" : "text",
          "analyzer" : "standard",
          "search_analyzer" : "my_synonyms"
        }
      }
    },
    "settings" : {
      "index" : {
        "routing" : {
          "allocation" : {
            "include" : {
              "_tier_preference" : "data_content"
            }
          }
        },
        "number_of_shards" : "1",
        "provided_name" : "company",
        "creation_date" : "1628684658444",
        "analysis" : {
          "filter" : {
            "my_synonym_filter" : {
              "type" : "synonym",
              "synonyms_path" : "analysis/synonyms.txt",
              "updateable" : "true"
            }
          },
          "analyzer" : {
            "my_synonyms" : {
              "filter" : [
                "lowercase",
                "my_synonym_filter"
              ],
              "tokenizer" : "keyword"
            }
          }
        },
        "number_of_replicas" : "1",
        "uuid" : "Nz4PJn5xSJ-Jq0hNso5FMQ",
        "version" : {
          "created" : "7130299"
        }
      }
    }
  }
}

At this time I'm unable to provide the data in the COMBINED_FIELD as it contains some sensitive information that I'm not permitted to share.

The synonyms file currently consists of the following words:

"Company","Business firm","Organisation","Corporation"

The input requests and output for the "COMBINED_FIELD" can contain any of the four synonyms mentioned above.

In the spirit of the old "teach a man to fish..." mantra, check out the "analyze" api to see what your choice of analyzer does to document and query text.

GET company/_analyze
{
  "analyzer": "my_synonyms",
  "text": ["company garage"]
}

What it reveals may make you want to change your mapping and the tokenizer used for the synonym filter. It was keyword but it looks like you might want standard:

        "my_synonyms": {
          "tokenizer": "standard"
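
Spelled out in full, that change could look something like the sketch below. It is only a guess at your final setup: the index name company_v2 is made up, and mapping COMBINED_FIELD with the synonym search analyzer is my assumption, based on the field your query targets. As far as I know, the tokenizer of an existing analyzer cannot be swapped on an open index, so creating a fresh index and reindexing into it is usually the simplest route.

```console
# company_v2 is a hypothetical index name; the filter settings are copied from your original request
PUT company_v2
{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonym_filter": {
          "type": "synonym",
          "synonyms_path": "analysis/synonyms.txt",
          "updateable": true
        }
      },
      "analyzer": {
        "my_synonyms": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_synonym_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "COMBINED_FIELD": {
        "type": "text",
        "analyzer": "standard",
        "search_analyzer": "my_synonyms"
      }
    }
  }
}
```

Multi-word entries such as "business firm" may behave better with the synonym_graph filter type, which is intended for search-time use, but try it with the analyze API before relying on it.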

Again, experiment with the analyze API.
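
One way to run that experiment is to send the same text through an ad-hoc analyzer twice, once with each tokenizer. The inline synonym rule below is the one from my script earlier in this thread, not your synonyms file, so treat it as a stand-in.

```console
# Keyword tokenizer: the whole input is kept as one token
GET _analyze
{
  "tokenizer": "keyword",
  "filter": [
    "lowercase",
    { "type": "synonym", "synonyms": [ "corporation, company => company" ] }
  ],
  "text": "corporation garage"
}

# Standard tokenizer: the input is split into words before the synonym filter runs
GET _analyze
{
  "tokenizer": "standard",
  "filter": [
    "lowercase",
    { "type": "synonym", "synonyms": [ "corporation, company => company" ] }
  ],
  "text": "corporation garage"
}
```

I would expect the keyword version to return a single token containing the whole string, leaving the synonym rule nothing to match, and the standard version to return separate tokens. Check the output on your own cluster rather than taking my word for it.
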
When it comes to debugging how the query is processed, the explain API is your friend: give it the index and doc ID of a doc you expect to match and it will tell you why the query does or does not match that doc:

GET company/_explain/1
{
  "query": {
    "query_string": {
      "query": "corporation AND garage",
      "default_field": "text_field",
      "fuzziness": "AUTO",
      "analyzer": "my_synonyms"
    }
  }
}
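
If you would rather inspect the query itself than a single document, the validate API with explain=true returns the query as Elasticsearch rewrote it, which should show whether corporation was turned into company. The field name and query text are copied from my script, so swap in your own.

```console
# Same query as above; the response's "explanations" entries show the rewritten query
GET company/_validate/query?explain=true
{
  "query": {
    "query_string": {
      "query": "corporation AND garage",
      "default_field": "text_field",
      "fuzziness": "AUTO",
      "analyzer": "my_synonyms"
    }
  }
}
```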

Thank you!

With your suggestion of using a different tokenizer along with the /_analyze suggestion I was able to figure it out.

