Query_string not applying synonyms filter

Hi

I am facing an issue with the query_string query. For input that contains a space, it does not apply the synonym filter, but the same input works with a match query.

Index creation script:

    PUT my_index1
    {
      "settings": {
        "analysis": {
          "filter": {
            "content_synonyms": {
              "type": "synonym",
              "synonyms": [
                "air print => airprint"
              ]
            }
          },
          "analyzer": {
            "search_a": {
              "filter": [
                "lowercase",
                "content_synonyms"
              ],
              "type": "custom",
              "tokenizer": "standard"
            },
            "index_a": {
              "filter": [
                "lowercase"
              ],
              "type": "custom",
              "tokenizer": "standard"
            }
          }
        }
      },
      "mappings": {
        "test": {
          "properties": {
            "body": {
              "type": "text",
              "analyzer": "index_a",
              "search_analyzer": "search_a",
              "search_quote_analyzer": "index_a"
            }
          }
        }
      }
    }

    PUT my_index1/test/3
    {"body": "air"}

    PUT my_index1/test/2
    {"body": "print"}

    PUT my_index1/test/1
    {"body": "airprint"}

The query_string query I am using:

GET my_index1/test/_search?explain
{
  "size": 10,
  "query": {
    "query_string": {
      "query": "body:(air print)"
    }
  }
}

query_string response (hits):

[
  {
    "_index": "my_index1",
    "_type": "test",
    "_id": "2",
    "_score": 0.2876821,
    "_source": {
      "body": "print"
    }
  },
  {
    "_index": "my_index1",
    "_type": "test",
    "_id": "3",
    "_score": 0.2876821,
    "_source": {
      "body": "air"
    }
  }
]

I was expecting the first document, the one containing "airprint".

The explain output of this query (weight(body:print in 0) and weight(body:air in 0)) shows that the input is only split into separate terms by the tokenizer and the "air print => airprint" synonym is never applied. With a match query, the same input does get the synonym applied.
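One way to check that the search analyzer itself handles the multi-word synonym is to run the whole string through the _analyze API. A sketch, assuming the my_index1 index above exists; if the synonym rule fires, the response should list a single token, airprint, rather than separate air and print tokens:

    GET my_index1/_analyze
    {
      "analyzer": "search_a",
      "text": "air print"
    }

If this returns airprint, the analyzer is configured correctly and the difference lies in how each query feeds text to it.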

Match query:

GET my_index1/test/_search?explain
{
  "query": {
    "match": {
      "body": "air print"
    }
  }
}

Response:

[
  {
    "_index": "my_index1",
    "_type": "test",
    "_id": "1",
    "_score": 0.2876821,
    "_source": {
      "body": "airprint"
    }
  }
]

Here the "air print => airprint" synonym is applied and the first document ("airprint") is returned, as the explain output of this query confirms (weight(body:airprint in 0)).

It seems odd to me that synonyms for input containing a space don't work with query_string, while the same input works with a match query. I thought both queries worked in almost the same way internally.
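To see how each query is rewritten before scoring, the validate API can print the resulting Lucene query for both. A sketch, assuming the same index; going by the explain output above, the query_string variant would be expected to show separate body:air and body:print clauses, and the match variant a single body:airprint term:

    GET my_index1/_validate/query?explain
    {
      "query": {
        "query_string": {
          "query": "body:(air print)"
        }
      }
    }

    GET my_index1/_validate/query?explain
    {
      "query": {
        "match": {
          "body": "air print"
        }
      }
    }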

Can anyone explain this please?

I am using ES version 5.6.1

Could you provide a full reproduction script, as described in the "About the Elasticsearch category" post? It will help us better understand what you are doing. Please try to keep the example as simple as possible.

A full reproduction script will help readers to understand, reproduce and if needed fix your problem. It will also most likely help to get a faster answer.

Hi @dadoonet,

I have added all the settings & query scripts in my post.

It would be useful to have a full script that we can just copy and paste to play with. As it stands, people need to write the script manually to reproduce the issue.
Look again at the example I shared in the post I linked to. It's a complete one that we can just copy, paste and play.

Hi @dadoonet

I have edited my post with runnable scripts. I think they will work now.

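So I tried your script on 7.0 and adapted it a bit (7.0 no longer supports mapping types, so the test type is dropped and documents are indexed through the _doc endpoint):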

DELETE my_index1
PUT my_index1
{
  "settings": {
    "analysis": {
      "filter": {
        "content_synonyms": {
          "type": "synonym",
          "synonyms": [
            "air print => airprint"
          ]
        }
      },
      "analyzer": {
        "search_a": {
          "filter": [
            "lowercase",
            "content_synonyms"
          ],
          "type": "custom",
          "tokenizer": "standard"
        },
        "index_a": {
          "filter": [
            "lowercase"
          ],
          "type": "custom",
          "tokenizer": "standard"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "body": {
        "type": "text",
        "analyzer": "index_a",
        "search_analyzer": "search_a",
        "search_quote_analyzer": "index_a"
      }
    }
  }
}

PUT my_index1/_doc/3
{"body": "air"}
PUT my_index1/_doc/2
{"body": "print"}
PUT my_index1/_doc/1
{"body": "airprint"}

GET my_index1/_search?explain=true
{
  "size": 10,
  "query": {
    "query_string": {
      "query": "body:(air print)"
    }
  }
}

This gives:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.9808292,
    "hits" : [
      {
        "_shard" : "[my_index1][0]",
        "_node" : "bT2ZhV62S9WFG16Hx2mDzQ",
        "_index" : "my_index1",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.9808292,
        "_source" : {
          "body" : "airprint"
        },
        "_explanation" : {
          "value" : 0.9808292,
          "description" : "weight(body:airprint in 0) [PerFieldSimilarity], result of:",
          "details" : [
            {
              "value" : 0.9808292,
              "description" : "score(freq=1.0), product of:",
              "details" : [
                {
                  "value" : 2.2,
                  "description" : "boost",
                  "details" : [ ]
                },
                {
                  "value" : 0.98082924,
                  "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                  "details" : [
                    {
                      "value" : 1,
                      "description" : "n, number of documents containing term",
                      "details" : [ ]
                    },
                    {
                      "value" : 3,
                      "description" : "N, total number of documents with field",
                      "details" : [ ]
                    }
                  ]
                },
                {
                  "value" : 0.45454544,
                  "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                  "details" : [
                    {
                      "value" : 1.0,
                      "description" : "freq, occurrences of term within document",
                      "details" : [ ]
                    },
                    {
                      "value" : 1.2,
                      "description" : "k1, term saturation parameter",
                      "details" : [ ]
                    },
                    {
                      "value" : 0.75,
                      "description" : "b, length normalization parameter",
                      "details" : [ ]
                    },
                    {
                      "value" : 1.0,
                      "description" : "dl, length of field",
                      "details" : [ ]
                    },
                    {
                      "value" : 1.0,
                      "description" : "avgdl, average length of field",
                      "details" : [ ]
                    }
                  ]
                }
              ]
            }
          ]
        }
      }
    ]
  }
}

This is what you are expecting.
I did not test on 5.6, though.

So I'd recommend upgrading to 7.0. Maybe a more recent 5.x release would fix this, though?

Thanks @dadoonet for your quick response.

Yeah, I just tried it in 7.0 and it works fine there, but we can't upgrade to 7.0.

Is there a workaround in the query or the settings that we could use as a quick fix in 5.6.1?

At the very least, I'd try 5.6.15: https://www.elastic.co/fr/downloads/past-releases/elasticsearch-5-6-15

Hi @dadoonet,

I tried ES 5.6.16 and it doesn't work there either.

I'd upgrade, but maybe @jimczi has another idea for the 5.x series.

In 5.x, whitespace is treated as an operator in the query_string query.
The query string parser interprets your query as a search for "air OR print", while the token stored in your index is actually "airprint". Setting split_on_whitespace=false prevents the query string parser from splitting the input and lets the analysis run on the entire input ("air print"). Starting with 6.x, split_on_whitespace is always set to false, which is why your query works in 7.0. See the warning in the documentation regarding this behavior:
https://www.elastic.co/guide/en/elasticsearch/reference/6.8/query-dsl-query-string-query.html
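Applied to the query from the original post, the workaround might look like this. A sketch for 5.x only, assuming the my_index1 index and test type defined above; split_on_whitespace is the option named in the reply, and the rest of the request is unchanged:

    GET my_index1/test/_search?explain
    {
      "size": 10,
      "query": {
        "query_string": {
          "query": "body:(air print)",
          "split_on_whitespace": false
        }
      }
    }

With the input no longer split before analysis, search_a should see "air print" as a whole and the synonym filter can rewrite it to airprint.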


Thanks, @jimczi for your help. :heart_eyes:

Yeah, I went through the documentation and it worked. :+1:
