Query_string not applying synonyms filter

ankur_singla · May 6, 2019, 11:49am

Hi

I am facing an issue with query_string: for strings containing a space it does not apply the synonym filter, although the same input works with a match query.

Index creation script:

    PUT my_index1
    {
      "settings": {
        "analysis": {
          "filter": {
            "content_synonyms": {
              "type": "synonym",
              "synonyms": [
                "air print => airprint"
              ]
            }
          },
          "analyzer": {
            "search_a": {
              "filter": [
                "lowercase",
                "content_synonyms"
              ],
              "type": "custom",
              "tokenizer": "standard"
            },
            "index_a": {
              "filter": [
                "lowercase"
              ],
              "type": "custom",
              "tokenizer": "standard"
            }
          }
        }
      },
      "mappings": {
        "test": {
          "properties": {
            "body": {
              "type": "text",
              "analyzer": "index_a",
              "search_analyzer": "search_a",
              "search_quote_analyzer": "index_a"
            }
          }
        }
      }
    }

    PUT my_index1/test/3
    {"body": "air"}

    PUT my_index1/test/2
    {"body": "print"}

    PUT my_index1/test/1
    {"body": "airprint"}

The query_string query I am using:

GET my_index1/test/_search?explain
{
  "size": 10,
  "query": {
    "query_string": {
      "query": "body:(air print)"
    }
  }
}

query_string response:

[
      {
        "_index": "my_index1",
        "_type": "test",
        "_id": "2",
        "_score": 0.2876821,
        "_source": {
          "body": "print"
        }
      },
      {
        "_index": "my_index1",
        "_type": "test",
        "_id": "3",
        "_score": 0.2876821,
        "_source": {
          "body": "air"
        }
      }
    ]

I was expecting the first document, the one with "airprint".

In the explain output of the query response (weight(body:print in 0) and weight(body:air in 0)) we can see that the input is only split by the tokenizer and the "air print => airprint" synonym is not applied. With the same input in a match query, the synonym is applied.

match query:

GET my_index1/test/_search?explain
{
  "query": {
    "match": {
      "body": "air print"
    }
  }
}

Response:

[
      {
        "_index": "my_index1",
        "_type": "test",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "body": "airprint"
        }
      }
    ]

Here the "air print => airprint" synonym is applied and the first document, the one with "airprint", is returned. The explain output of this query confirms it (weight(body:airprint in 0)).

It seems odd to me that synonyms for strings containing a space do not work with query_string when the same thing works with a match query. I thought both (match query and query_string) worked in almost the same way internally.

Can anyone explain this please?

I am using ES version 5.6.1

dadoonet · May 6, 2019, 12:05pm

Could you provide a full recreation script as described in About the Elasticsearch category? It will help to better understand what you are doing. Please try to keep the example as simple as possible.

A full reproduction script will help readers to understand, reproduce and if needed fix your problem. It will also most likely help to get a faster answer.

ankur_singla · May 6, 2019, 12:20pm

Hi @dadoonet,

I have added all the settings & query scripts in my post.

dadoonet · May 6, 2019, 12:24pm

It would be useful to have a full script that we can just copy and paste to play with. As it stands, people need to write the script by hand to reproduce the problem.
Look again at the example I shared in the post I linked to. It's a complete one that we can just copy, paste and play with.

ankur_singla · May 6, 2019, 12:51pm

Hi @dadoonet

I have edited my post to include runnable scripts. I think this will work.

dadoonet · May 6, 2019, 1:22pm

So I tried your script on 7.0 and adapted it a bit:

DELETE my_index1
PUT my_index1
{
  "settings": {
    "analysis": {
      "filter": {
        "content_synonyms": {
          "type": "synonym",
          "synonyms": [
            "air print => airprint"
          ]
        }
      },
      "analyzer": {
        "search_a": {
          "filter": [
            "lowercase",
            "content_synonyms"
          ],
          "type": "custom",
          "tokenizer": "standard"
        },
        "index_a": {
          "filter": [
            "lowercase"
          ],
          "type": "custom",
          "tokenizer": "standard"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "body": {
        "type": "text",
        "analyzer": "index_a",
        "search_analyzer": "search_a",
        "search_quote_analyzer": "index_a"
      }
    }
  }
}

PUT my_index1/_doc/3
{"body": "air"}
PUT my_index1/_doc/2
{"body": "print"}
PUT my_index1/_doc/1
{"body": "airprint"}

GET my_index1/_search?explain=true
{
  "size": 10,
  "query": {
    "query_string": {
      "query": "body:(air print)"
    }
  }
}

This gives:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.9808292,
    "hits" : [
      {
        "_shard" : "[my_index1][0]",
        "_node" : "bT2ZhV62S9WFG16Hx2mDzQ",
        "_index" : "my_index1",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.9808292,
        "_source" : {
          "body" : "airprint"
        },
        "_explanation" : {
          "value" : 0.9808292,
          "description" : "weight(body:airprint in 0) [PerFieldSimilarity], result of:",
          "details" : [
            {
              "value" : 0.9808292,
              "description" : "score(freq=1.0), product of:",
              "details" : [
                {
                  "value" : 2.2,
                  "description" : "boost",
                  "details" : [ ]
                },
                {
                  "value" : 0.98082924,
                  "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                  "details" : [
                    {
                      "value" : 1,
                      "description" : "n, number of documents containing term",
                      "details" : [ ]
                    },
                    {
                      "value" : 3,
                      "description" : "N, total number of documents with field",
                      "details" : [ ]
                    }
                  ]
                },
                {
                  "value" : 0.45454544,
                  "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                  "details" : [
                    {
                      "value" : 1.0,
                      "description" : "freq, occurrences of term within document",
                      "details" : [ ]
                    },
                    {
                      "value" : 1.2,
                      "description" : "k1, term saturation parameter",
                      "details" : [ ]
                    },
                    {
                      "value" : 0.75,
                      "description" : "b, length normalization parameter",
                      "details" : [ ]
                    },
                    {
                      "value" : 1.0,
                      "description" : "dl, length of field",
                      "details" : [ ]
                    },
                    {
                      "value" : 1.0,
                      "description" : "avgdl, average length of field",
                      "details" : [ ]
                    }
                  ]
                }
              ]
            }
          ]
        }
      }
    ]
  }
}

Which is what you are expecting.
I did not test on 5.6 though.

So I'd recommend upgrading to 7.0. Maybe a more recent version of 5.x would fix this, though?

ankur_singla · May 6, 2019, 1:36pm

Thanks @dadoonet for your quick response.

Yeah, I just tried it in 7.0 and it works fine there. But we can't upgrade to 7.0.

Can you please check whether there is some trick in the query or settings that would give a quick fix in 5.6.1?

dadoonet · May 6, 2019, 1:48pm

At least I'd try with 5.6.15: https://www.elastic.co/fr/downloads/past-releases/elasticsearch-5-6-15

ankur_singla · May 7, 2019, 5:54am

Hi @dadoonet,

I tried it in ES 5.6.16; it does not work there either.

dadoonet · May 7, 2019, 6:12am

I'd upgrade. But maybe @jimczi has another idea for the 5.x series.

jimczi · May 7, 2019, 6:35am

In 5.x, whitespace is treated as an operator in the query_string query. The query string parser interprets your query as a search for "air OR print", while the token stored in your index is actually "airprint". The option split_on_whitespace=false prevents the query string parser from splitting the input and lets the analysis run on the entire input ("air print"). Starting with 6.x, split_on_whitespace is always set to false, which is why your query works there. See the warning in the documentation regarding this behavior:
https://www.elastic.co/guide/en/elasticsearch/reference/6.8/query-dsl-query-string-query.html
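
For the 5.x series, the query from the original post with this option set might look like the request below. This is a minimal sketch, assuming the my_index1 index, the test type, and the mapping shown earlier in the thread; the only change to the original query is the added split_on_whitespace parameter.

```
GET my_index1/test/_search?explain
{
  "size": 10,
  "query": {
    "query_string": {
      "query": "body:(air print)",
      "split_on_whitespace": false
    }
  }
}
```

With the whole input "air print" reaching the search_a analyzer, the "air print => airprint" rule can match, so the explain output would be expected to show weight(body:airprint ...) rather than separate air and print terms. Since split_on_whitespace is always false from 6.x on, the parameter only matters on 5.x.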

ankur_singla · May 7, 2019, 1:45pm

Thanks, @jimczi for your help.

Yeah, I went through the documentation and it worked.

system · June 4, 2019, 1:59pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
