Elasticsearch completion strange behavior


(pierrem) #1

I'm pretty new to Elasticsearch. I have tried most of the tutorials and searched the forum, but I can't find a good solution. For context, I'm loading the data using R and the elastic package, and the Elasticsearch API is accessed through Laravel/PHP.

I'm trying to create a geocoding index with all addresses in France in order to: 1) autocomplete addresses, 2) geocode addresses.

After many tests I chose nGram, because the other approaches either had too many problems handling queries that combine text and digits, or did not give me the expected behavior or results.

My problem is that completion fails for long queries, or is not tolerant enough.

Let's say that in the autocompletion I want to target "11, rue de douai 75009 Paris".

I get it with the following requests:

11, rue de d

rue de douai

But the following requests fail to return results:

11 douai

11, rue de do

rue de douai 75

rue de douai 11

For "11 rue du faubourg poissonière":

11 rue du: works

11 rue du f: does not work, no result

rue du faubourg: works

rue du faubourg p: does not work, no result

faubourg poissioner: works

faubourg poissionere: does not work, no result

My index config is as follows:


{
    "settings": {
      "analysis": {
        "analyzer": {
          "completion_analyzer": {
            "type": "custom",
            "filter": [
              "lowercase",
              "asciifolding",
              "trim",
              "completion_filter"
            ],
            "tokenizer": "keyword"
          }
        },
        "filter": {
          "completion_filter": {
            "type": "nGram",
            "min_gram": 2,
            "max_gram": 20,
            "token_chars": [ "letter", "digit", "punctuation" ]
          }
        }
      }
    },
    "mappings": {
      "geocoding": {
        "properties": {
          "numero": {
            "type": "long"
          },
          "nom_voie": {
            "type": "text"
          },
          "ville": {
            "type": "text"
          },
          "code_postal": {
            "type": "text"
          },
          "code_insee": {
            "type": "text"
          },
          "lon": {
            "type": "float"
          },
          "lat": {
            "type": "float"
          },
          "full_address": {
            "type": "text"
          },
          "address_suggest": {
            "type": "completion",
            "max_input_length" : 150,
            "analyzer": "completion_analyzer",
            "search_analyzer": "standard",
            "preserve_position_increments": false
          }
        }
      }
    }
}
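
For illustration, here is a minimal Python sketch of what the `completion_analyzer` chain above is configured to do at index time: the `keyword` tokenizer keeps the whole input as a single token, which is then lowercased, ASCII-folded, trimmed, and cut into character n-grams of length 2 to 20. This is only an approximation under stated assumptions: the function names are hypothetical, the folding relies on `unicodedata` rather than Lucene's own `asciifolding` tables, `token_chars` is ignored, and it says nothing about how the completion field stores these tokens.

```python
import unicodedata


def fold_ascii(text: str) -> str:
    """Approximate asciifolding: decompose accents and drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def char_ngrams(token: str, min_gram: int = 2, max_gram: int = 20) -> list[str]:
    """Return every substring of token whose length is in [min_gram, max_gram]."""
    grams = []
    for start in range(len(token)):
        for size in range(min_gram, max_gram + 1):
            if start + size > len(token):
                break  # longer grams from this start would run past the end
            grams.append(token[start:start + size])
    return grams


def analyze(text: str) -> list[str]:
    """Keyword tokenizer (one token), then lowercase, fold, trim, n-gram."""
    token = fold_ascii(text.lower()).strip()
    return char_ngrams(token)


if __name__ == "__main__":
    grams = analyze("11 rue du faubourg poissonière 75008 PARIS")
    print(len(grams), grams[:5])
```

Each gram is a contiguous slice of the indexed string, so a reordered or abbreviated input such as `11 douai` is never among them. Whether that accounts for the missing results is a separate question, since the search side uses the `standard` analyzer.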

I inserted data as follows:


{
    "numero" : 11,
    "nom_voie" : "rue du faubourg poissonière",
    "code_postal" : "75008",
    "code_insee" : "75108",
    "ville" : "PARIS",
    "lon" : 2.37352,
    "lat" : 48.85759,
    "full_address" : "11, rue du faubourg poissonière 75008 PARIS",
    "address_suggest" : "11 rue du faubourg poissonière 75008 PARIS",
    "weight" : 2
}

The request is made as follows:


{
    "_source" : "full_address",
    "suggest" : {
        "text" : query,
        "completion" : {
            "field" : "address_suggest",
            "size" : 5,
            "skip_duplicates" : true,
            "fuzzy" : {
                "fuzziness" : 5
            }
        }
    }
}
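
In the request above, `query` stands for the text typed by the user. As a minimal sketch of how that body could be built and serialized to valid JSON for a given input, here is a Python version using only the standard library. `build_suggest_request` is a hypothetical helper; the body is copied as-is from the request above, without checking it against the suggest API, and nothing is actually sent to Elasticsearch.

```python
import json


def build_suggest_request(query: str) -> str:
    """Serialize the suggest request shown above for a given user query."""
    body = {
        "_source": "full_address",
        "suggest": {
            "text": query,  # the user's partial address
            "completion": {
                "field": "address_suggest",
                "size": 5,
                "skip_duplicates": True,  # serialized as JSON true
                "fuzzy": {"fuzziness": 5},
            },
        },
    }
    # ensure_ascii=False keeps accented characters readable in the payload
    return json.dumps(body, ensure_ascii=False)


if __name__ == "__main__":
    print(build_suggest_request("11 rue du f"))
```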
