Autocomplete best match is not first in results

When I search for "charles sta", the first result in the list is "U.S. Dept. of State, Office of Logistics Management, Charleston" and the second result is "Charles Stark Draper Laboratory, Inc. (CSDL)".

The second result is a better match. Why is the first result ranked as the best match instead of the second? And how can I change my setup so that the second result ranks first?

Here are my mappings:

companies.json mapping:

{
    "settings": {
        "index": {
            "max_result_window": 1000000,
            "number_of_shards": 4
        },
        "analysis": {
            "normalizer": {
                "case_insensitive": {
                    "filter": "lowercase"
                }
            },
            "analyzer": {
                "autocomplete": {
                    "tokenizer": "autocomplete",
                    "filter": [
                        "lowercase",
                        "asciifolding"
                    ]
                },
                "autocomplete_search": {
                    "tokenizer": "lowercase",
                    "filter": [
                        "asciifolding"
                    ]
                }
            },
            "tokenizer": {
                "autocomplete": {
                    "type": "edge_ngram",
                    "min_gram": 2,
                    "max_gram": 10,
                    "token_chars": [
                        "letter"
                    ]
                }
            }
        }
    },
    "aliases": {
        "companies-loading": {}
    },
    "mappings": {
        "properties": {
            "esIndex": {
                "type": "keyword"
            },
            "name": {
                "type": "text",
                "analyzer": "autocomplete",
                "search_analyzer": "autocomplete_search",
                "fields": {
                    "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                    }
                }
            },
            "name_suggest": {
                "type": "completion",
                "contexts": [
                    {
                      "name": "index_name",
                      "type": "category"
                    }
                ]
            }
        }
    }
}

agency.json mapping:

{
    "settings": {
        "index": {
            "max_result_window": 1000000,
            "number_of_shards": 4
        },
        "analysis": {
            "normalizer": {
                "case_insensitive": {
                    "filter": "lowercase"
                }
            },
            "analyzer": {
                "autocomplete": {
                    "tokenizer": "autocomplete",
                    "filter": [
                        "lowercase",
                        "asciifolding"
                    ]
                },
                "autocomplete_search": {
                    "tokenizer": "lowercase",
                    "filter": [
                        "asciifolding"
                    ]
                }
            },
            "tokenizer": {
                "autocomplete": {
                    "type": "edge_ngram",
                    "min_gram": 2,
                    "max_gram": 10,
                    "token_chars": [
                        "letter"
                    ]
                }
            }
        }
    },
    "aliases": {
        "agency-loading": {}
    },
    "mappings": {
        "properties": {
            "esIndex": {
                "type": "keyword"
            },
            "uid": {
                "type": "integer"
            },
            "name": {
                "type": "text",
                "analyzer": "autocomplete",
                "search_analyzer": "autocomplete_search",
                "fields": {
                    "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                    }
                }
            },
            "name_suggest": {
                "type": "completion",
                "contexts": [
                    {
                      "name": "index_name",
                      "type": "category"
                    }
                ]
            }
        }
    }
}

Search query I'm using:

// url: https://localhost:9200/companies,agency/_search
// elasticsearch 7.9
{
  "_source": [
    "name",
    "type",
    "uid"
  ],
  "size": 2,
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "name": {
              "query": "charles sta"
            }
          }
        }
      ]
    }
  },
  "suggest": {
    "nameSuggestions": {
      "prefix": "charles sta",
      "completion": {
        "field": "name_suggest",
        "skip_duplicates": true,
        "contexts": {
          "index_name": [
            {
              "context": "agency",
              "boost": 2
            },
            {
              "context": "companies",
              "boost": 2
            }
          ]
        },
        "size": 2
      }
    }
  }
}

And the results that come back:

{
    "took": 3,
    "timed_out": false,
    "_shards": {
        "total": 8,
        "successful": 8,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 171,
            "relation": "eq"
        },
        "max_score": 8.716487,
        "hits": [
            {
                "_index": "agency-1626509281",
                "_type": "_doc",
                "_id": "O7OCs3oByp01hMO1rWYM",
                "_score": 8.716487,
                "_source": {
                    "uid": 43638,
                    "name": "U.S. Dept. of State, Office of Logistics Management, Charleston",
                    "type": "agency"
                }
            },
            {
                "_index": "companies-1626509290",
                "_type": "_doc",
                "_id": "vICCs3oBQl5LKuna7PeW",
                "_score": 7.525751,
                "_source": {
                    "name": "Charles Stark Draper Laboratory, Inc. (CSDL)",
                    "type": "companies"
                }
            }
        ]
    },
    "suggest": {
        "nameSuggestions": [
            {
                "text": "charles sta",
                "offset": 0,
                "length": 11,
                "options": []
            }
        ]
    }
}

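You've just run into one of the quirks of full-text search. Because your analysis chain indexes edge n-grams, it is harder to score on full terms, as that information is lost: "charles" is indexed both as the whole word from "Charles" and as a prefix of "Charleston", so a match on it cannot tell the two apart.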
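To see why both names match both query terms, here is a rough Python approximation of the two analyzers from the mapping above. It is a sketch, not Elasticsearch's implementation: it assumes the edge_ngram tokenizer with `token_chars: ["letter"]` splits on every non-letter character and emits prefixes from `min_gram` to `max_gram`, and it skips `asciifolding` because the sample names are plain ASCII.

```python
import re

# Approximates the "autocomplete" index analyzer: split on non-letters
# (token_chars: letter), emit edge n-grams of 2..10 chars, lowercase each gram.
# Words shorter than min_gram produce no grams at all (e.g. "U" and "S" in "U.S.").
def index_tokens(text, min_gram=2, max_gram=10):
    grams = []
    for word in re.findall(r"[^\W\d_]+", text):
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            grams.append(word[:n].lower())
    return grams

# Approximates "autocomplete_search": the lowercase tokenizer splits on
# non-letters and lowercases each token; no n-grams are produced at search time.
def search_tokens(text):
    return [word.lower() for word in re.findall(r"[^\W\d_]+", text)]

agency = "U.S. Dept. of State, Office of Logistics Management, Charleston"
company = "Charles Stark Draper Laboratory, Inc. (CSDL)"

for name in (agency, company):
    grams = set(index_tokens(name))
    # Both names print {'charles': True, 'sta': True}
    print(name, "->", {term: term in grams for term in search_tokens("charles sta")})
```

"charles" comes from "Charleston" in the first name and from "Charles" in the second, and "sta" comes from "State" and "Stark" respectively, so the n-gram field alone gives the scorer no way to prefer the whole-word match.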

How about also indexing your data as a plain text field without any custom analysis, and adding a match query against that field as another should clause in your bool query, so that hits matching full terms of your query score higher?
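A minimal sketch of that idea, written as Python dicts you could serialize into the JSON mapping and query body. The sub-field name `plain` is hypothetical; because it sets no analyzer, it uses the default standard analyzer and indexes whole words only. Existing documents need to be reindexed (or updated with an update-by-query) before a newly added sub-field is populated.

```python
import json

# Mapping for "name" with an extra, hypothetical "plain" sub-field.
# "plain" has no analyzer setting, so the standard analyzer indexes whole words.
name_mapping = {
    "type": "text",
    "analyzer": "autocomplete",
    "search_analyzer": "autocomplete_search",
    "fields": {
        "keyword": {"type": "keyword", "ignore_above": 256},
        "plain": {"type": "text"},
    },
}

def build_query(user_input):
    return {
        "query": {
            "bool": {
                "should": [
                    # Existing clause: prefix-style matching through edge n-grams.
                    {"match": {"name": {"query": user_input}}},
                    # Extra clause: only whole words match here, so "charles"
                    # adds score for "Charles Stark ..." but not for "Charleston".
                    {"match": {"name.plain": {"query": user_input}}},
                ]
            }
        }
    }

print(json.dumps(build_query("charles sta"), indent=2))
```

With only should clauses and no must or filter clause, a document still needs to match at least one clause, so results that match only the n-gram clause are kept; they just rank below documents that also match whole words.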

Even though it is somewhat dated, Elasticsearch: The Definitive Guide is still a really valuable resource for taking a step back and understanding how mapping and full-text search work. See Dealing with Human Language | Elasticsearch: The Definitive Guide [2.x] | Elastic as a start (I highly recommend the chapters before and after it as well).

Also, as a side note, you may want to take a look at the search_as_you_type field type.
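For reference, a sketch of what a search_as_you_type setup could look like, again as Python dicts mirroring the JSON request bodies. It assumes the default `max_shingle_size` of 3, under which Elasticsearch creates the `name._2gram` and `name._3gram` sub-fields (plus `name._index_prefix`) automatically; the field name `name` is reused from the mappings above.

```python
import json

# Mapping sketch: search_as_you_type generates shingle and prefix sub-fields itself.
mapping = {"mappings": {"properties": {"name": {"type": "search_as_you_type"}}}}

def build_sayt_query(user_input):
    # bool_prefix: every term except the last is matched as a whole term,
    # and the last term ("sta") is matched as a prefix ("stark", "state").
    # So "charles" must match a whole word and does not match "charleston".
    return {
        "query": {
            "multi_match": {
                "query": user_input,
                "type": "bool_prefix",
                "fields": ["name", "name._2gram", "name._3gram"],
            }
        }
    }

print(json.dumps(build_sayt_query("charles sta"), indent=2))
```

The design difference from the edge n-gram approach is that whole-word information is kept for the completed terms, and only the term still being typed is treated as a prefix.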

Hope this helps as a start!

Thanks for the info. I'll check that out. My overall goal is to build a good autocomplete. What I'm doing is based on this blog article: Autocompletion for Public Transportation | mimacom

It has its strengths, but it is weak in the scenario I showed. Would it be possible to improve it there by combining what you suggested with what they did? In other words, would it make sense to index another field without any custom analysis and then add it to the query to increase the score?

Thanks for the book. I've read through it before and it was helpful. I have a pretty large application with searching and filtering similar to the Amazon website. Autocomplete is where I'm falling short: it's mostly good, but it breaks down in the example I provided.

Thanks for mentioning the search_as_you_type field. I'm going to try it out to see how it compares to what I have so far. I'm currently using the other autocomplete option, the context suggester, because it lets me boost certain contexts.

Yes, I think boosting the score when someone has typed a full word that matches makes sense. However, test with outliers such as very common terms in your dataset to be sure it works as expected!
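One way to find those outliers is to count, over a sample of your names, how many names each whole word appears in, and then try the top terms as queries. This is a rough sketch: the helper name `most_common_terms` is hypothetical, and the regex only approximates the standard analyzer (lowercased runs of word characters), so treat the counts as a guide rather than Elasticsearch's own term statistics.

```python
import re
from collections import Counter

def most_common_terms(names, top_n=10):
    """Count in how many names each whole word appears (document frequency)."""
    doc_freq = Counter()
    for name in names:
        # A set, so a word repeated within one name is counted once for that name.
        doc_freq.update({word.lower() for word in re.findall(r"\w+", name)})
    return doc_freq.most_common(top_n)

# Replace this sample with names exported from your own indices.
sample = [
    "U.S. Dept. of State, Office of Logistics Management, Charleston",
    "Charles Stark Draper Laboratory, Inc. (CSDL)",
]
print(most_common_terms(sample, top_n=5))
```

Terms like "inc" or "office" that show up in many names are good candidates for checking that the whole-word boost does not reorder results in surprising ways.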
