ES partial matching (ngram) use case

mhmtnlr · September 16, 2015, 11:01am

Hi everybody,
I have an index of book records such as:
ElasticSearch Cookbook
ElasticSearch Server
Mastering ElasticSearch
ElasticSearch

I have more than 2M records.
Search cases:

search term --- expected result --- (case)
elastic cook --- ElasticSearch Cookbook --- (partial match)
search cook --- ElasticSearch Cookbook --- (partial match)
ElasticSearhCookBook --- ElasticSearch Cookbook --- (no space)
ekasticsearch --- ElasticSearch --- (typo)

etc.

I tried to write the whole problem here, but there is a character limit for topics.
To check the analyzer, mapping, and query for this problem, please see the following link:

whole problem definition

So, am I doing something wrong, or is this normal?

nik9000 · September 16, 2015, 1:33pm

You are better off linking to some gists.

In the link you ask about response times. Fuzzy matching is my guess for what is taking the time. Phrase matching on ngrams is also expensive because you end up with lots and lots of tokens. So I'm not surprised it's slow.
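To get a feel for the token counts involved, here is a rough Python sketch that counts the character ngrams a single word produces. The gram sizes (2 to 5) are assumptions picked for illustration, not the settings from the original mapping, and the helper only imitates what an ngram tokenizer does to one word.

```python
def char_ngrams(word, min_gram, max_gram):
    """Return every substring of `word` whose length is between min_gram and max_gram."""
    grams = []
    for size in range(min_gram, max_gram + 1):
        for start in range(len(word) - size + 1):
            grams.append(word[start:start + size])
    return grams

# Hypothetical gram sizes; the real analyzer settings are not shown in the thread.
word = "elasticsearch"
full = char_ngrams(word, 2, 5)                 # grams starting at every position
edge = [word[:size] for size in range(2, 6)]   # grams anchored at the start only

print(len(full), "ngram tokens vs", len(edge), "edge ngram tokens")
```

With these assumed sizes, one 13-character word becomes 42 ngram tokens against 4 edge ngram tokens, and a phrase query has to line up positions across all of them.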

There are lots of things you can do about it. For one, don't use fuzzy matching at all. You could try looking into the term, phrase, or completion suggester for spelling correction. Or you could get spelling-tolerant results by going from a phrase query on those ngrams to a terms query. I'm not sure how you'd word it to Elasticsearch, but if you were to ngram the input and require only one of the terms to match, you'd still find the books even with spelling errors. You'd find too many books, though. Hopefully scoring would make the one you wanted come back higher.
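One possible wording of the "ngram the input and require only one term to match" idea is sketched below in Python. It is a guess at the approach rather than a tested recipe: the field name `name.ngram`, the gram size of 3, and the helper names are all assumptions, and it presumes the field was indexed with lowercase ngrams of the same size.

```python
import json

def char_ngrams(text, size):
    """Split `text` into lowercase character ngrams of one fixed size, word by word."""
    grams = []
    for word in text.lower().split():
        for start in range(len(word) - size + 1):
            grams.append(word[start:start + size])
    return grams

def ngram_terms_query(field, user_input, size=3):
    """Build a bool query where any single ngram of the input is enough to match."""
    should = [{"term": {field: gram}} for gram in char_ngrams(user_input, size)]
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}

# "name.ngram" is a hypothetical field assumed to be indexed with 3-character ngrams.
body = ngram_terms_query("name.ngram", "ekasticsearch")
print(json.dumps(body, indent=2))
```

The misspelled input still shares most of its trigrams with "elasticsearch", so the right book can match and score well. Because a single shared trigram is enough, many unrelated titles will match too, which is why ranking matters here.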

Another option is to index the books with a title and also with common misspellings. When you search, you search both of those fields.
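A minimal sketch of the two-field idea, again in Python and under stated assumptions: `name` is the field from the thread, while `name_misspellings` is a hypothetical field that would have to be populated at index time, and the boost of 2 on the real title is an arbitrary choice.

```python
import json

# Hypothetical document shape: the real title plus a list of known misspellings.
example_doc = {
    "name": "ElasticSearch",
    "name_misspellings": ["ekasticsearch", "elasticserach"],
}

def title_or_misspelling_query(user_input):
    """Search the real title and the misspellings field in one request."""
    return {
        "query": {
            "multi_match": {
                "query": user_input,
                # Boost the real title so correct spellings rank above misspellings.
                "fields": ["name^2", "name_misspellings"],
            }
        }
    }

print(json.dumps(title_or_misspelling_query("ekasticsearch"), indent=2))
```

This only helps for misspellings someone has listed in advance, so it complements rather than replaces the ngram approach.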

mhmtnlr · September 17, 2015, 2:28pm

Thanks a lot.
After removing fuzziness and phrase matching from the must query (which I was applying to the ngram field), response time became much better: for a 14-character search term, the maximum response time is 100 ms.
New query:

{
  "bool": {
    "must": {
      "match": {
        "name": {
          "query": "elastic cook",
          "type": "boolean",
          "operator": "OR",
          "minimum_should_match": "1",
          "cutoff_frequency": 0.01
        }
      }
    },
    "should": [
      {
        "match": {
          "name.exact": {
            "query": "elastic cook",
            "type": "phrase",
            "boost": 4
          }
        }
      },
      {
        "match": {
          "name.token": {
            "query": "elastic cook",
            "type": "phrase"
          }
        }
      },
      {
        "match": {
          "name.edgeNGnoSplit": {
            "query": "elastic cook",
            "type": "phrase",
            "fuzziness": "1",
            "max_expansions": 8
          }
        }
      },
      {
        "match": {
          "name.edgeNG": {
            "query": "elastic cook",
            "type": "phrase",
            "fuzziness": "1",
            "max_expansions": 4
          }
        }
      }
    ]
  }
}
