Tolerating spelling mistakes


(Maher Ben Taleb Ali) #1

Hi,

I need help optimizing a query so that it returns more matches without hurting accuracy.
What are the best options for tolerating spelling mistakes?

Thank you.


(Peter Dyson) #2

Hi Maher,

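One option is the fuzzy query, which matches terms within a given edit distance of the search term:
https://www.elastic.co/guide/en/elasticsearch/reference/5.3/query-dsl-fuzzy-query.html

The example below indexes four documents, then runs a plain match query followed by a fuzzy query for comparison.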

POST /test-fuzz/fuzztype/
{
  "message": "this is a test"
}

POST /test-fuzz/fuzztype/
{
  "message": "this is a toast"
}

POST /test-fuzz/fuzztype/
{
  "message": "this is a testing time"
}

POST /test-fuzz/fuzztype/
{
  "message": "this is a tester"
}

GET test-fuzz/_search

GET test-fuzz/_search
{
  "query": {
    "match": {
      "message": "test"
    }
  }
}

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1.0126973,
    "hits": [
      {
        "_index": "test-fuzz",
        "_type": "fuzztype",
        "_id": "AVvRtnqQtT7Egp-Jj5Qk",
        "_score": 1.0126973,
        "_source": {
          "message": "this is a test"
        }
      }
    ]
  }
}



GET test-fuzz/_search
{
  "query": {
    "fuzzy": {
      "message": {
        "value": "test",
        "fuzziness": 2
      }
    }
  }
}

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1.0126973,
    "hits": [
      {
        "_index": "test-fuzz",
        "_type": "fuzztype",
        "_id": "AVvRtnqQtT7Egp-Jj5Qk",
        "_score": 1.0126973,
        "_source": {
          "message": "this is a test"
        }
      },
      {
        "_index": "test-fuzz",
        "_type": "fuzztype",
        "_id": "AVvRtuobtT7Egp-Jj5VM",
        "_score": 0.50634867,
        "_source": {
          "message": "this is a toast"
        }
      },
      {
        "_index": "test-fuzz",
        "_type": "fuzztype",
        "_id": "AVvRtuzqtT7Egp-Jj5Wc",
        "_score": 0.14384104,
        "_source": {
          "message": "this is a tester"
        }
      }
    ]
  }
}
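
In the fuzzy results above, "toast" and "tester" are each two edits away from "test", while "testing" is three edits away, so it falls outside "fuzziness": 2. The fuzzy query is also term-level and does not analyze its input. If the search text comes from users, a match query with the fuzziness parameter may be a better fit, since it analyzes the text first. A minimal sketch against the same index (the misspelled input "tost" is only an illustration, and "AUTO" derives the allowed edit distance from the term length):

```console
GET test-fuzz/_search
{
  "query": {
    "match": {
      "message": {
        "query": "tost",
        "fuzziness": "AUTO"
      }
    }
  }
}
```

For a four-letter term, "AUTO" should allow one edit, and "tost" is one edit away from both "test" and "toast". Lower fuzziness generally means fewer false positives, so it is worth trying "AUTO" before a fixed value of 2.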

There's also the term suggester that might be of use:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-suggesters-term.html
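
As a rough sketch of what a term suggester request could look like against the same index (the suggestion name "spelling-check" and the input "tost" are made up for illustration):

```console
GET test-fuzz/_search
{
  "size": 0,
  "suggest": {
    "spelling-check": {
      "text": "tost",
      "term": {
        "field": "message"
      }
    }
  }
}
```

The response lists candidate corrections for each input term rather than matching documents, so it suits a "did you mean" prompt more than widening the query itself.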


(bashar) #3

Hi, you can also use phonetic matching: https://www.elastic.co/guide/en/elasticsearch/guide/current/phonetic-matching.html
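
A minimal sketch of such a setup, assuming the analysis-phonetic plugin is installed and a 5.x-style mapping. The index name "test-phonetic" and the filter/analyzer name "dbl_metaphone" are placeholders, not anything from the posts above:

```console
PUT /test-phonetic
{
  "settings": {
    "analysis": {
      "filter": {
        "dbl_metaphone": {
          "type": "phonetic",
          "encoder": "double_metaphone"
        }
      },
      "analyzer": {
        "dbl_metaphone": {
          "tokenizer": "standard",
          "filter": "dbl_metaphone"
        }
      }
    }
  },
  "mappings": {
    "fuzztype": {
      "properties": {
        "message": {
          "type": "text",
          "fields": {
            "phonetic": {
              "type": "text",
              "analyzer": "dbl_metaphone"
            }
          }
        }
      }
    }
  }
}

GET test-phonetic/_search
{
  "query": {
    "match": {
      "message.phonetic": "tessed"
    }
  }
}
```

Keeping the phonetic version in a sub-field leaves the original "message" field available for exact and fuzzy matching. Phonetic matching tends to raise recall at the cost of precision, so it is probably best combined with the main field rather than used alone.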


(Peter Dyson) #4

Another great tip!

