Autocomplete address search

Hi,
I'm trying to build an index & query for address search autocomplete for City, State & Country.
I have these structured data fields:

  • City
  • State
  • Country
  • Full (a complete field that concatenates City State & Country, separated by space)

I used an n-gram approach, and it worked fine on City + Country. But when the field is longer, it stops working. For example, I get no results for the query term 'san fran', but I do get results for 'san francisco'.

I'm using this for indexing:

{
  "index_patterns": ["address_book*"],
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "tokenizer": "autocomplete",
          "filter": [
            "lowercase"
          ]
        },
        "autocomplete_search": {
          "tokenizer": "lowercase"
        }
      },
      "tokenizer": {
        "autocomplete": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 50,
          "token_chars": [
            "letter"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "full": {
        "type": "text",
        "analyzer": "autocomplete",
        "search_analyzer": "autocomplete_search"
      },
      "location": {
        "type": "geo_point"
      }
    }
  }
}

And the query is:

{
  "query": {
    "match": {
      "full": {
        "query": "san fran",
        "operator": "and"
      }
    }
  }
}

I'm guessing that the indexed field (full) needs to be tokenized, if that is not happening already?
Any suggestions on how I can get autocomplete working on partial matches?

Thank you
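
As a sanity check on the analyzers in the template above, here is a rough Python simulation of what they should produce. It is only an approximation written for illustration, not Elasticsearch itself; the helper names (edge_ngram_tokens, lowercase_tokens, matches_and) are made up for this sketch. Under these assumptions, 'san fran' does match the indexed tokens, which suggests the analyzer settings alone may not explain the missing hit.

```python
import re

# Runs of Unicode letters, mirroring token_chars: ["letter"].
LETTERS = re.compile(r"[^\W\d_]+")

def edge_ngram_tokens(text, min_gram=2, max_gram=50):
    """Approximate the `autocomplete` analyzer: split on non-letters,
    emit the edge n-grams of each word, then lowercase them."""
    tokens = set()
    for word in LETTERS.findall(text):
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            tokens.add(word[:n].lower())
    return tokens

def lowercase_tokens(text):
    """Approximate the `lowercase` tokenizer used at search time."""
    return [word.lower() for word in LETTERS.findall(text)]

def matches_and(query, indexed_tokens):
    """`match` with operator `and`: every query term must be an indexed token."""
    return all(term in indexed_tokens for term in lowercase_tokens(query))

indexed = edge_ngram_tokens("San Francisco California USA US")
print(matches_and("san fran", indexed))   # True
print(matches_and("san diego", indexed))  # False
```

If the real index behaves differently, it may be worth checking that the query targets the same field name as the mapping and that the index was created after the template was in place.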

Did you look at the search_as_you_type field type?

It should work pretty well on the full concatenated address.

As people start to type the address, you have to be a little careful on the front end so that you don't flood your back end with queries. You need a short pause between submissions, but I'm not the front-end expert :)

Oh, and App Search pretty much has this out of the box.

Ah, I didn't realize there is already a search_as_you_type field type! Will give that a try, thanks :)

On the front-end side, I do have a debouncer that waits for 500 ms after the last character before sending the search request.
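
For readers unfamiliar with the idea, waiting for a quiet period after the last keystroke can be sketched as follows. This is a language-agnostic illustration written in Python, not the poster's actual front-end code (which would normally be JavaScript); the Debouncer class and its trigger method are hypothetical names.

```python
import threading

class Debouncer:
    """Call `fn` only after `wait` seconds have passed without a new trigger."""

    def __init__(self, fn, wait=0.5):
        self.fn = fn
        self.wait = wait
        self._timer = None
        self._lock = threading.Lock()

    def trigger(self, *args):
        """Register a new keystroke: cancel the pending call and restart the clock."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self.wait, self.fn, args=args)
            self._timer.daemon = True
            self._timer.start()
```

Calling trigger("s"), trigger("sa"), trigger("san") in quick succession should result in a single call with "san" once typing pauses, so only one search request reaches the back end.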

@stephenb I'm still having the exact same issue with search_as_you_type.
Here is the new index setting:

{
  "index_patterns": ["address_book*"],
  "mappings": {
    "properties": {
      "full": {
        "type": "search_as_you_type"
      },
      "location": {
        "type": "geo_point"
      }
    }
  }
}

Here is a field value: 'San Francisco California USA US'
I get results for: 'san francisco', 'san usa', 'san us'
But no results for: 'san fran', 'san cali'

Hmmmm, I get all the expected results.

PUT discuss-search-as-type
{
  "mappings": {
    "properties": {
      "full": {
        "type": "search_as_you_type"
      },
      "location": {
        "type": "geo_point"
      }
    }
  }
}

POST discuss-search-as-type/_doc
{
  "full" : "San Francisco California USA US"
}

POST discuss-search-as-type/_doc
{
  "full" : "San Diego USA US"
}

Searching for "san":

GET discuss-search-as-type/_search
{
  "query": {
    "multi_match": {
      "query": "san",
      "type": "bool_prefix",
      "fields": [
        "full",
        "full._2gram",
        "full._3gram"
      ]
    }
  }
}

# Result: both documents, as expected
{
  "took" : 67,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "discuss-search-as-type",
        "_id" : "OU77wIABpvjF7kd-eB1V",
        "_score" : 1.0,
        "_source" : {
          "full" : "San Francisco California USA US"
        }
      },
      {
        "_index" : "discuss-search-as-type",
        "_id" : "Ok4BwYABpvjF7kd-Dx1b",
        "_score" : 1.0,
        "_source" : {
          "full" : "San Diego USA US"
        }
      }
    ]
  }
}

I get results for: 'san francisco', 'san usa', 'san us'
and also for: 'san fran', 'san cali'

GET discuss-search-as-type/_search
{
  "query": {
    "multi_match": {
      "query": "san fran",
      "type": "bool_prefix",
      "fields": [
        "full",
        "full._2gram",
        "full._3gram"
      ]
    }
  }
}

Results: both documents, as expected, and scored as expected, with the 'san fran' match (San Francisco) higher.

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 2.1743946,
    "hits" : [
      {
        "_index" : "discuss-search-as-type",
        "_id" : "OU77wIABpvjF7kd-eB1V",
        "_score" : 2.1743946,
        "_source" : {
          "full" : "San Francisco California USA US"
        }
      },
      {
        "_index" : "discuss-search-as-type",
        "_id" : "Ok4BwYABpvjF7kd-Dx1b",
        "_score" : 0.19100355,
        "_source" : {
          "full" : "San Diego USA US"
        }
      }
    ]
  }
}

And for just "fran":

GET discuss-search-as-type/_search
{
  "query": {
    "multi_match": {
      "query": "fran",
      "type": "bool_prefix",
      "fields": [
        "full",
        "full._2gram",
        "full._3gram"
      ]
    }
  }
}

Results

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "discuss-search-as-type",
        "_id" : "OU77wIABpvjF7kd-eB1V",
        "_score" : 1.0,
        "_source" : {
          "full" : "San Francisco California USA US"
        }
      }
    ]
  }
}

It turns out my query was the problem:

{
  "query": {
    "match": {
      "full_address": {
        "query": "san fran",
        "operator": "and"
      }
    }
  }
}

I tried it against your example and it didn't produce any results. I was expecting every search term to match at least partially, so that I wouldn't have to look at score values to discard unwanted results.
For example, for the query 'san fran', the result 'San Diego' is not relevant.
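
The difference between the two query styles in this thread can be illustrated with a small Python simulation. It is a deliberate simplification, not Elasticsearch: standard_terms only lowercases and splits on whitespace, and the function names are invented for this sketch. It assumes a plain match with operator "and" needs every term to equal a whole indexed term, while bool_prefix treats the last term as a prefix.

```python
SF = "San Francisco California USA US"
SD = "San Diego USA US"

def standard_terms(text):
    """Rough stand-in for the standard analyzer: lowercase, split on whitespace."""
    return text.lower().split()

def match_and(query, field_value):
    """Like `match` with operator `and`: every query term must equal an indexed term."""
    indexed = set(standard_terms(field_value))
    return all(term in indexed for term in standard_terms(query))

def bool_prefix_and(query, field_value):
    """Like a bool_prefix query with every term required:
    all but the last term must match exactly, the last one only as a prefix."""
    indexed = set(standard_terms(field_value))
    terms = standard_terms(query)
    if not terms:
        return False
    *head, last = terms
    return all(t in indexed for t in head) and any(t.startswith(last) for t in indexed)

print(match_and("san usa", SF))         # True: both are whole indexed terms
print(match_and("san fran", SF))        # False: "fran" is not a whole term
print(bool_prefix_and("san fran", SF))  # True: "fran" is a prefix of "francisco"
print(bool_prefix_and("san fran", SD))  # False: San Diego is excluded
```

This reproduces the pattern reported earlier ('san usa' matches, 'san fran' does not) for the plain match query. The multi_match documentation lists operator among the parameters supported by the bool_prefix type, so adding "operator": "and" to the multi_match query shown above may be a way to drop San Diego; that has not been verified against the example index here.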

I think the sorted result is good enough for now. Thanks a lot for digging in deep :)
