Query string query: default fuzziness?


(kassnl_87) #1

Hello people,

I've been using ES for quite a while, but I'm stuck on something I cannot explain. I run a simple query like:

{
  "query" : {
    "query_string" : {
      "query" : "daan"
    }
  }
}

with all settings at their defaults, and I get back results containing the term 'dans' instead of 'daan'. I was under the impression that I had to use ~ to enable fuzziness, and that by default I would only get exact matches. Isn't that the case? If not, how can I require exact matches with query string queries? Thank you!
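
For comparison, a minimal sketch of what an explicitly fuzzy query looks like in query string syntax. The index name `my-index` is borrowed from later in this thread, and the edit distance of 1 is only an illustrative choice. Without the `~` suffix, the term is analyzed and then matched exactly against the indexed tokens, so unexpected matches point at analysis rather than fuzziness.

```
GET my-index/_search
{
  "query": {
    "query_string": {
      "query": "daan~1"
    }
  }
}
```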


(Mark Harwood) #2

Use the explain API to figure out which terms/fields are being matched for the docs in question.
If you have trouble interpreting the output, paste it here.


(kassnl_87) #3

Hi @Mark_Harwood!
I ran the explain API on one document that matched without containing the exact term, like this:

GET my-index/test/58de9cf6bfcd0b7cf3ab29d8/_explain
{
  "query" : {
    "function_score" : {
      "query" : {
        "bool" : {
          "must" : [
            {
              "query_string" : {
                "query" : "daan",
                "fields" : [
                  "match_education^1.0",
                  "match_experience^2.0",
                  "match_personal^1.0",
                  "match_skill^2.0"
                ]
              }
            }
          ]
        }
      }
    }
  }
}

and I got back:

{
  "_index": "my-index",
  "_type": "test",
  "_id": "58de9cf6bfcd0b7cf3ab29d8",
  "matched": true,
  "explanation": {
    "value": 26.325321,
    "description": "sum of:",
    "details": [
      {
        "value": 26.325321,
        "description": "sum of:",
        "details": [
          {
            "value": 26.325321,
            "description": "max of:",
            "details": [
              {
                "value": 10.659221,
                "description": "weight(match_education:dan in 38031) [PerFieldSimilarity], result of:",
                "details": [
                  {
                    "value": 10.659221,
                    "description": "score(doc=38031,freq=1.0 = termFreq=1.0\n), product of:",
                    "details": [
                      {
                        "value": 7.654514,
                        "description": "idf(docFreq=33, docCount=70689)",
                        "details": []
                      },
                      {
                        "value": 1.3925406,
                        "description": "tfNorm, computed from:",
                        "details": [
                          {
                            "value": 1,
                            "description": "termFreq=1.0",
                            "details": []
                          },
                          {
                            "value": 1.2,
                            "description": "parameter k1",
                            "details": []
                          },
                          {
                            "value": 0.75,
                            "description": "parameter b",
                            "details": []
                          },
                          {
                            "value": 12.864208,
                            "description": "avgFieldLength",
                            "details": []
                          },
                          {
                            "value": 4,
                            "description": "fieldLength",
                            "details": []
                          }
                        ]
                      }
                    ]
                  }
                ]
              },
              {
                "value": 26.325321,
                "description": "weight(match_experience:dan in 38031) [PerFieldSimilarity], result of:",
                "details": [
                  {
                    "value": 26.325321,
                    "description": "score(doc=38031,freq=5.0 = termFreq=5.0\n), product of:",
                    "details": [
                      {
                        "value": 2,
                        "description": "boost",
                        "details": []
                      },
                      {
                        "value": 6.6921854,
                        "description": "idf(docFreq=85, docCount=68919)",
                        "details": []
                      },
                      {
                        "value": 1.9668703,
                        "description": "tfNorm, computed from:",
                        "details": [
                          {
                            "value": 5,
                            "description": "termFreq=5.0",
                            "details": []
                          },
                          {
                            "value": 1.2,
                            "description": "parameter k1",
                            "details": []
                          },
                          {
                            "value": 0.75,
                            "description": "parameter b",
                            "details": []
                          },
                          {
                            "value": 64.27036,
                            "description": "avgFieldLength",
                            "details": []
                          },
                          {
                            "value": 20.897959,
                            "description": "fieldLength",
                            "details": []
                          }
                        ]
                      }
                    ]
                  }
                ]
              }
            ]
          }
        ]
      },
      {
        "value": 0,
        "description": "match on required clause, product of:",
        "details": [
          {
            "value": 0,
            "description": "# clause",
            "details": []
          },
          {
            "value": 1,
            "description": "_type:talent, product of:",
            "details": [
              {
                "value": 1,
                "description": "boost",
                "details": []
              },
              {
                "value": 1,
                "description": "queryNorm",
                "details": []
              }
            ]
          }
        ]
      }
    ]
  }
} 

What surprises me is that it matches 'dan' instead of 'daan'. Why would that be?


(Mark Harwood) #4

Your choice of analyzer dictates how documents and queries are sliced into tokens.
Can you supply the results of GET my-index/_mapping?
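
One way to check this directly is the `_analyze` API, which returns the tokens an analyzer produces for a given text. The sketch below assumes the index and field names from the explain request earlier in the thread (`my-index`, `match_experience`). The actual mapping was never posted here, so no output is shown.

```
GET my-index/_analyze
{
  "field": "match_experience",
  "text": "daan"
}
```

If the returned token is `dan` rather than `daan`, the field's analyzer is rewriting the term at both index time and query time. That would account for the `match_experience:dan` clause in the explain output without any fuzziness being involved. Which token filter is responsible (a language-specific stemmer is one possibility, but only a guess) would have to be confirmed from the mapping.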


(kassnl_87) #5

Oh yes! Makes total sense now!
Thank you for your time @Mark_Harwood

