Is reindexing to a new index necessary after updating settings and mappings to support a multi-field in Elasticsearch?

Please consider the following scenario.

Existing System

  1. I have an index named contacts_index with 100 documents.
  2. Each document has a property named city with some text value in it.
  3. The index has the following settings:
    {
      "analyzer": {
        "city_analyzer": {
          "filter": [
            "lowercase"
          ],
          "tokenizer": "city_tokenizer"
        },
        "search_analyzer": {
          "filter": [
            "lowercase"
          ],
          "tokenizer": "keyword"
        }
      },
      "tokenizer": {
        "city_tokenizer": {
          "token_chars": [
            "letter"
          ],
          "min_gram": "2",
          "type": "ngram",
          "max_gram": "30"
        }
      }
    }
  4. The index has the following mapping for the city field to support substring search:
    {
       "city" : {
              "type" : "text",
              "analyzer" : "city_analyzer",
              "search_analyzer" : "search_analyzer"
            }
    }

Proposed System

Now we want to support autocomplete on the city field. For example, for a city with the value Seattle, we want to get the document when the user types s, se, sea, seat, seatt, seattl, or seattle, but only when the query is one of these prefixes. The document should not be returned when they type eattle, for example.
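
To illustrate the prefix-only requirement, here is a simplified Python sketch of the tokens the two tokenizer types would produce for a single word. It is an approximation and not Elasticsearch's own implementation: it handles one all-letter word only, and the helper names `edge_ngrams`, `ngrams`, and `matches` are made up for this example. The query is lowercased as a single token, mirroring the keyword-plus-lowercase search analyzer.

```python
def edge_ngrams(word, min_gram, max_gram):
    """Return the prefixes of `word` with lengths min_gram..max_gram."""
    word = word.lower()
    longest = min(max_gram, len(word))
    return [word[:n] for n in range(min_gram, longest + 1)]


def ngrams(word, min_gram, max_gram):
    """Return every substring of `word` with lengths min_gram..max_gram."""
    word = word.lower()
    longest = min(max_gram, len(word))
    return {
        word[start:start + n]
        for n in range(min_gram, longest + 1)
        for start in range(len(word) - n + 1)
    }


def matches(query, indexed_tokens):
    """A query matches if its single lowercased token was indexed."""
    return query.lower() in indexed_tokens


# Gram sizes taken from the two tokenizer definitions in this post.
autocomplete_tokens = edge_ngrams("Seattle", 1, 100)  # edge_ngram, 1..100
substring_tokens = ngrams("Seattle", 2, 30)           # ngram, 2..30

print(autocomplete_tokens)
print(matches("sea", autocomplete_tokens))     # a prefix of the city
print(matches("eattle", autocomplete_tokens))  # not a prefix
print(matches("eattle", substring_tokens))     # a substring
```

Under these assumptions, only prefixes of the city end up in the edge n-gram token list, while the plain n-gram set also contains inner substrings such as eattle.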

We plan to achieve this by adding a multi-field to the city property, also of type text but with a different analyzer.

To achieve this, we have done the following.

  1. Updated the settings to support autocomplete
    PUT /staging-contacts-index-v4.0/_settings?preserve_existing=true
    {
      "analysis": {
        "analyzer": {
          "autocomplete_analyzer": {
            "filter": [
              "lowercase"
            ],
            "tokenizer": "autocomplete_tokenizer"
          }
        },
        "tokenizer": {
          "autocomplete_tokenizer": {
            "token_chars": [
              "letter"
            ],
            "min_gram": "1",
            "type": "edge_ngram",
            "max_gram": "100"
          }
        }
      }
    }
  2. Updated the mapping of the city field with an autocomplete multi-field to support autocomplete
    {
       "city" : {
              "type" : "text",
              "fields" : {
                "autocomplete" : {
                  "type" : "text",
                  "analyzer" : "autocomplete_analyzer",
                  "search_analyzer" : "search_analyzer"
                }
              },
              "analyzer" : "city_analyzer",
              "search_analyzer" : "search_analyzer"
            }
    }

Findings

  1. For any document created after adding the autocomplete multi-field, autocomplete search works as expected.

  2. For existing documents, if the value of the city field changes, for example from seattle to chicago, the document is returned by autocomplete search.

  3. We planned to use the update API to fetch and update the existing 100 documents so that autocomplete works for them as well. However, when trying the update API, we get
    {"result" : "noop"}
    and the autocomplete search still does not work.

I infer that, since the values were not changing, Elasticsearch did not create tokens for the autocomplete field.
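
The noop result is consistent with the update API's `detect_noop` option, which defaults to true and skips the write when the merged document is unchanged. As a hedged sketch, the Python below only builds (and does not send) an update request with `detect_noop` set to false, which should make Elasticsearch re-index the document even when the value is the same. The cluster address, the document id "1", and the helper name `build_forced_update` are assumptions for illustration, and the `/_update/` path shape assumes Elasticsearch 7.x or later.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: address of the cluster


def build_forced_update(index, doc_id, partial_doc, base_url=ES_URL):
    """Build an update request that is not skipped as a noop."""
    body = {"doc": partial_doc, "detect_noop": False}
    return urllib.request.Request(
        f"{base_url}/{index}/_update/{doc_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical document id "1"; the city value is deliberately unchanged.
request = build_forced_update("contacts_index", "1", {"city": "seattle"})
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# urllib.request.urlopen(request) would send it; omitted to avoid side effects.
```

This still means one request per document, so it is closer to the fetch-and-index approach than to a bulk operation.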

Question

From our research, there are two options to make the existing 100 documents searchable by autocomplete.

  1. Use the Reindex API for the existing 100 documents.
  2. Fetch all 100 documents and use the document Index API to index each one again, which will create all the tokens in the process.

Which option is preferable and why?

Thanks for taking the time to read through.

Use reindex as it's a single call to do it.

Okay @warkolm. Thank you for taking the time to reply. The 100 documents were just an example. The cluster has 4 nodes with 70 million documents totaling 1.3 TB. Do you still recommend the same?

The Reindex API is made for this scenario. There is nothing that would work better than this.
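
For reference, a minimal Python sketch of what such a reindex call could look like. It only builds the request and does not send it. The destination index name `contacts_index_v2`, the cluster address, and the helper name `build_reindex_request` are assumptions for illustration; the destination index is assumed to have been created beforehand with the new settings and mapping. For a large index, `wait_for_completion=false` runs the reindex as a background task and `slices=auto` lets Elasticsearch parallelize it.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: address of the cluster


def build_reindex_request(source_index, dest_index, base_url=ES_URL):
    """Build a POST _reindex request copying source_index into dest_index."""
    body = {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
    }
    # Run as a background task and let Elasticsearch choose the slice count.
    url = f"{base_url}/_reindex?wait_for_completion=false&slices=auto"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "contacts_index_v2" is a hypothetical destination index name.
request = build_reindex_request("contacts_index", "contacts_index_v2")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# urllib.request.urlopen(request) would submit the task; omitted here.
```

With `wait_for_completion=false` the response contains a task id that can be polled through the tasks API to follow progress.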

Okay @defalt . Thank you for clarifying.
