ELSER2 | Spell check before creating embeddings

Rakesh_Nayak · January 3, 2024, 12:49pm

Hello Team,

Any suggestions for spell-checking a query before creating its embeddings? For example, if the query is misspelt as "toiket rolls" instead of "toilet rolls", can we create the embeddings for "toilet rolls" using the ELSER2 model?

POST _ml/trained_models/elser-model-2-for-ingest-search/_infer
{
  "docs": [
    {
      "text_field": "toiket rolls"
    }
  ]
}

Result:

{
  "inference_results": [
    {
      "predicted_value": {
        "##ike": 2.758485,
        "##t": 2.2162936,
        "roll": 2.0772736,
        "to": 1.8866745,
        "rolls": 1.8158195,
        "rolling": 1.4678012,
        "##uge": 0.92356,
        "bring": 0.9175947,
        "sue": 0.7835926,
        "##k": 0.63172615,
        "technique": 0.61145663,
        "festival": 0.5835248,
        "##te": 0.5779746,
        "dutch": 0.5647326,
        "wheel": 0.5633026,
        "##nt": 0.5176124,
        "roller": 0.5174776,
        "japanese": 0.50211316,
        "flute": 0.49578997,
        "movement": 0.4836977,
        "german": 0.4779577,
        "rake": 0.46293172,
        "cake": 0.44880012,
        "horse": 0.42376143,
        "hand": 0.39447936,
        "dance": 0.3883844,
        "stunt": 0.3841404,
        "craft": 0.35372037,
        "stock": 0.31527817,
        "puppet": 0.29949415,
        "##ts": 0.28825995,
        "film": 0.27967602,
        "hang": 0.27863201,
        "beer": 0.25969,
        "paper": 0.25739628,
        "rice": 0.2504973,
        "rope": 0.20884833,
        "ski": 0.17991425,
        "dodge": 0.17231494,
        "ko": 0.16818042,
        "art": 0.15494661,
        "whip": 0.15116276,
        "foot": 0.14420456,
        "band": 0.14200562,
        "windmill": 0.13235468,
        "welcome": 0.12275867,
        "weaving": 0.10461076,
        "production": 0.07868658,
        "truck": 0.06703148,
        "vehicle": 0.05202646,
        "ride": 0.030120868,
        "build": 0.023026925,
        "french": 0.021837963,
        "fake": 0.019907437,
        "brake": 0.012350509,
        "wright": 0.0088391695,
        "piece": 0.006032948,
        "style": 0.0019135037
      }
    }
  ]
}
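To see why this embedding misses the intended meaning, one can inspect the expansion directly. The minimal Python sketch below copies a handful of the (token, weight) pairs from the response above into a dict (an abbreviated subset, not the full response). Neither the subset nor the full response contains "toilet", and the continuation pieces "##ike" and "##t" alongside "to" suggest the tokenizer split "toiket" into sub-word pieces. That split is an inference from the output, not something the response states explicitly. The helper names are illustrative only.

```python
# Abbreviated subset of the (token, weight) pairs returned by _infer above.
expansion = {
    "##ike": 2.758485,
    "##t": 2.2162936,
    "roll": 2.0772736,
    "to": 1.8866745,
    "rolls": 1.8158195,
    "rolling": 1.4678012,
    "paper": 0.25739628,
}

def subword_pieces(tokens):
    """Return WordPiece continuation tokens (those starting with '##'), sorted."""
    return sorted(t for t in tokens if t.startswith("##"))

def top_terms(weights, k):
    """Return the k highest-weighted tokens, heaviest first."""
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    return [token for token, _ in ranked[:k]]

print("toilet" in expansion)      # False
print(subword_pieces(expansion))  # ['##ike', '##t']
print(top_terms(expansion, 3))    # ['##ike', '##t', 'roll']
```

The heaviest weights go to fragments of the typo rather than to the intended word, which is why the expansion drifts toward unrelated "roll" senses.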

Tom_Veasey · January 3, 2024, 2:07pm

You will definitely get better results with spelling correction when the query is mistyped. It is always a tricky balance to strike, since sometimes the user may intend an exact match on a character sequence, such as a product code, that is not a real word. One could imagine a very simple test for this case: check whether each alphanumeric sequence in the query has an exact match.
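The simple exact-match test described above could look something like this minimal Python sketch. It assumes a hypothetical in-memory set, `INDEXED_TERMS`, standing in for an exact-match lookup against the index (in practice that would more likely be a query to Elasticsearch, for example against a `keyword` field). The helper names are illustrative and not part of any Elastic API.

```python
import re

# Runs of lowercase letters/digits; the query is lowercased before matching.
ALNUM = re.compile(r"[a-z0-9]+")

def alnum_sequences(query: str) -> list[str]:
    """Split a query into its alphanumeric sequences."""
    return ALNUM.findall(query.lower())

def unmatched_sequences(query: str, indexed_terms: set[str]) -> list[str]:
    """Return the sequences that have no exact match among the indexed terms."""
    return [seq for seq in alnum_sequences(query) if seq not in indexed_terms]

def should_spell_correct(query: str, indexed_terms: set[str]) -> bool:
    """Only consider correction if at least one sequence fails to match exactly."""
    return bool(unmatched_sequences(query, indexed_terms))

# Hypothetical stand-in for an exact-match lookup against the index.
INDEXED_TERMS = {"toilet", "rolls", "paper", "tr500"}

print(should_spell_correct("TR500 toilet rolls", INDEXED_TERMS))  # False
print(should_spell_correct("toiket rolls", INDEXED_TERMS))        # True
print(unmatched_sequences("toiket rolls", INDEXED_TERMS))         # ['toiket']
```

This only decides whether correction is worth attempting. The product code "TR500" passes through untouched because it matches exactly, while "toiket" is flagged as a candidate for correction.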

At the moment, any spelling-correction logic would have to live upstream of the query to Elasticsearch. (Detailed advice on how best to do spelling correction is beyond the scope of this forum.) However, models do have some typo tolerance, and we do train ELSER with some typos. We haven't yet tried to systematically improve its robustness to spelling errors through training, which is something we plan to explore. We've also recently been exploring training a seq2seq model to generate "did you mean" suggestions. IMO this is the best way of tackling the problem. This work is still at an early stage, but it is something we are actively researching.
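Because any correction has to happen before the request reaches Elasticsearch, a client-side step might look like the following minimal Python sketch. It swaps in a classic dictionary-based technique (Norvig-style candidates at edit distance 1, ranked by term frequency). This is not something ELSER or Elasticsearch provides. The `WORD_FREQ` table is hypothetical and would in practice come from something like the indexed corpus or query logs. Tokens containing digits are left alone, echoing the product-code caveat above. The last line only builds an `_infer` request body using the array form of `docs`; it does not send anything.

```python
import json
import string

# Hypothetical term frequencies (e.g. derived from the corpus or query logs).
WORD_FREQ = {"toilet": 120, "rolls": 300, "roll": 250, "paper": 400, "tissue": 90}

LETTERS = string.ascii_lowercase

def edits1(word: str) -> set[str]:
    """All strings one delete, transpose, replace, or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in LETTERS]
    inserts = [l + c + r for l, r in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)

def correct_word(word: str) -> str:
    """Replace an unknown, purely alphabetic word with its most frequent neighbour."""
    # Known words and anything with digits (e.g. product codes) stay as-is.
    if word in WORD_FREQ or not word.isalpha():
        return word
    candidates = [w for w in edits1(word) if w in WORD_FREQ]
    return max(candidates, key=WORD_FREQ.get) if candidates else word

def correct_query(query: str) -> str:
    """Lowercase the query and correct it word by word."""
    return " ".join(correct_word(w) for w in query.lower().split())

corrected = correct_query("toiket rolls")
print(corrected)  # toilet rolls

# Body for POST _ml/trained_models/<deployment-id>/_infer (not sent here).
print(json.dumps({"docs": [{"text_field": corrected}]}))
```

Restricting candidates to edit distance 1 keeps the sketch short. A production corrector would typically also consider larger edit distances, surrounding context, and a far larger vocabulary, which is where a learned "did you mean" model could do better.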

system · January 31, 2024, 2:08pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Related topics

Topic	Category / tags	Replies	Views	Activity
Improving search results for misspelled queries with ELSER semantic search	Elasticsearch	0	17	November 22, 2024
Spell check suggestion	Elasticsearch	2	5097	June 1, 2017
ELSER ingest pipeline with ingest processor	Elasticsearch, ingest-pipeline	1	138	April 24, 2024
Semantic search - how to get correct results	Elastic Search, elastic-app-search	2	196	March 27, 2024
Context Error During Reindex with Elser	Elasticsearch, elastic-stack-machine-learning, docker, painless, ingest-pipeline, esre-elasticsearch-relevance-engine	5	253	June 25, 2024