How to update a document without affecting indexed fields that are excluded from _source

My template

{
  "mappings": {
    "dynamic": "false",
    "_source": {
      "includes": ["a", "b"]
    },
    "properties": {
      "a": { "type": "keyword" },
      "b": { "type": "keyword" },
      "c": { "type": "keyword" }
    }
  }
}

PUT /your_index_name/_doc/1

{"a": "value_a","b": "value_b","c": "value_c"}

POST /your_index_name/_update/1

{"doc": {"a": "new_value_a","b": "new_value_b"}}

Although field c is not stored in _source, I would expect that after this update the indexed value of c remains unchanged, rather than being cleared.

What exactly is your issue? It is not clear whether you are facing one.

You don’t need to hope, that is the documented and expected behavior! Just try it, e.g. in Kibana DevTools:


# 152: PUT /your_index_name/_doc/1 [201 Created]
# {"a": "value_a","b": "value_b","c": "value_c"}
{
  "_index": "your_index_name",
  "_id": "1",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 1
}

# 154: GET /your_index_name/_doc/1 [200 OK]
{
  "_index": "your_index_name",
  "_id": "1",
  "_version": 1,
  "_seq_no": 0,
  "_primary_term": 1,
  "found": true,
  "_source": {
    "a": "value_a",
    "b": "value_b",
    "c": "value_c"
  }
}

# 155: POST /your_index_name/_update/1 [200 OK]
# {"doc": {"a": "new_value_a","b": "new_value_b"}}
{
  "_index": "your_index_name",
  "_id": "1",
  "_version": 2,
  "result": "updated",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 1,
  "_primary_term": 1
}

# 157: GET /your_index_name/_doc/1 [200 OK]
{
  "_index": "your_index_name",
  "_id": "1",
  "_version": 2,
  "_seq_no": 1,
  "_primary_term": 1,
  "found": true,
  "_source": {
    "a": "new_value_a",
    "b": "new_value_b",
    "c": "value_c"
  }
}

Essentially the doc is updated via

  • getting (current) version N of the doc
  • fields (from that doc and the POST call) are merged
  • version N+1 is stored in the index
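To make the three steps above concrete, here is a toy model in plain Python. It is not Elasticsearch code: `merge` and `partial_update` are hypothetical names made up for this sketch, and the "index" is just a dict holding `_version` and `_source`. It assumes every field is present in _source, as in the console run above.

```python
import copy


def merge(current, partial):
    """Recursively merge `partial` into a copy of `current` (objects merged, other values replaced)."""
    merged = copy.deepcopy(current)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def partial_update(stored, partial_doc):
    """Toy model of a partial update; `stored` looks like {'_version': N, '_source': {...}}."""
    current = stored["_source"]               # step 1: get version N of the doc
    new_source = merge(current, partial_doc)  # step 2: merge fields from the doc and the update call
    # step 3: version N+1 is what gets stored (and reindexed)
    return {"_version": stored["_version"] + 1, "_source": new_source}


stored = {"_version": 1, "_source": {"a": "value_a", "b": "value_b", "c": "value_c"}}
updated = partial_update(stored, {"a": "new_value_a", "b": "new_value_b"})
print(updated)
```

In this model "c" survives only because it was part of the `_source` that step 1 read back.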

@S-Dragon0302 Are you seeing something different?

PUT _index_template/your_template_name
{
  "index_patterns": ["your_index_name"],
  "template": {
    "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 1
    },
    "mappings": {
      "dynamic": false,
      "_source": {
        "includes": ["a", "b"]
      },
      "properties": {
        "a": { "type": "keyword" },
        "b": { "type": "keyword" },
        "c": { "type": "keyword" }
      }
    }
  }
}
PUT /your_index_name/_doc/1
{
  "a": "value_a",
  "b": "value_b",
  "c": "value_c"
}
POST your_index_name/_update/1
{
  "doc": {
    "a": "updated_value_a",
    "b": "updated_value_b"
  }
}

I think the values in the index should be kept, instead of field c becoming null.

Create an index using this template

PUT _index_template/your_template_name
{
  "index_patterns": ["your_index_name"],
  "template": {
    "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 1
    },
    "mappings": {
      "dynamic": false,
      "_source": {
        "includes": ["a", "b"]
      },
      "properties": {
        "a": { "type": "keyword" },
        "b": { "type": "keyword" },
        "c": { "type": "keyword" }
      }
    }
  }
}

Your mapping explicitly leaves the c field out of _source. As described in the documentation, this means that field c is indexed but then removed from the source before it is stored. Updates, like reindexing, are based on the source document, so the behaviour you are seeing is expected.

Elasticsearch is based on Lucene, which uses immutable segments. Updates are therefore not done in place: the source needs to be retrieved, updated and reindexed into a new segment, where the indexed value that was not included in the source is not available.

To ensure the c field remains searchable you need, as far as I know, to either include it in the source or send it in with the update request.
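A rough way to picture this is the toy model below, in plain Python. It is only a sketch of the behaviour described in this post, not how Elasticsearch or Lucene are implemented; `index_document` and `update_document` are hypothetical helpers, and the dicts stand in for the inverted index and the stored _source.

```python
def index_document(doc, source_includes):
    """Toy model: every field in `doc` is indexed, but only included fields are kept in _source."""
    indexed = dict(doc)
    stored_source = {k: v for k, v in doc.items() if k in source_includes}
    return {"indexed": indexed, "_source": stored_source}


def update_document(entry, partial_doc, source_includes):
    """Toy model of an update: rebuild the doc from the stored _source, then reindex it from scratch."""
    rebuilt = {**entry["_source"], **partial_doc}  # the old indexed values are not consulted
    return index_document(rebuilt, source_includes)


includes = {"a", "b"}
v1 = index_document({"a": "value_a", "b": "value_b", "c": "value_c"}, includes)
v2 = update_document(v1, {"a": "new_value_a", "b": "new_value_b"}, includes)

print("c" in v1["indexed"])  # True: c was indexed from the original request
print("c" in v2["indexed"])  # False: c was not in _source, so the reindexed doc has no c
```

Adding "c" to `includes`, or passing it in `partial_doc`, brings it back in `v2["indexed"]`.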

Apologies, I missed the specifics of the template.

But:

The first bullet “getting (current) version N of the doc” translates into retrieving that document ID, parsing _source, and merging this with data from the POST-ed _update call. Since “c” is in neither _source nor the update doc, the N+1 version of this doc has no “c” field.

There are at least 2 ways to get the behavior you hoped for:

a) do the field merging at app level, therefore providing “c” and its current value when doing the partial update

b) don’t exclude “c” from _source
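For option a), the app-level merge could look roughly like the sketch below. Everything here is an assumption for illustration: `build_update_body` and `load_current` are hypothetical names, and the dict `system_of_record` stands in for whatever store holds the authoritative values. The function only builds the body of the `_update` request; sending it is left to whichever client you use.

```python
def build_update_body(changes, unsourced_fields, load_current):
    """Build an _update body that also re-sends the fields missing from _source.

    changes:          fields being changed in this update
    unsourced_fields: mapped fields that are excluded from _source
    load_current:     hypothetical callable returning current values for a list of fields
    """
    # Only fetch the excluded fields that this update does not already provide.
    missing = [f for f in unsourced_fields if f not in changes]
    doc = dict(load_current(missing)) if missing else {}
    doc.update(changes)
    return {"doc": doc}


# Stand-in for the external store that holds the full record (assumption).
system_of_record = {"a": "value_a", "b": "value_b", "c": "value_c"}


def load_current(fields):
    return {f: system_of_record[f] for f in fields}


body = build_update_body(
    {"a": "new_value_a", "b": "new_value_b"},
    unsourced_fields=["c"],
    load_current=load_current,
)
print(body)
```

The cost is one read of the excluded fields per update, which is the trade-off against option b).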

My thinking is that even though the source doesn't have the value, the values in the index are still there. If we have to pass the values in from outside on every update, and updates are frequent, it will put a lot of pressure on the other storage systems.

If I have 100 fields and enable only 2 of them in _source, the remaining 98 fields are indexed but not stored. HBase holds the original 100 fields, and every time I update 2 fields in ES I need to retrieve all 100 fields from HBase. What I want is to update only the 2 fields in ES without retrieving the 100 fields from HBase. The other 98 can stay absent from _source, as long as their values in the index are not affected.

If I have 100 fields and enable only 2 of them in _source, the remaining 98 fields are indexed but not stored. HBase holds the original 100 fields, and every time I update 2 fields in ES I need to retrieve all 100 fields from HBase. What I want is to update only the 2 fields in ES without retrieving the 100 fields from HBase. The other 98 can stay absent from _source, as long as their values in the index are not affected.

If I have 100 fields and enable only 2 of them in _source, the remaining 98 fields are indexed but not stored. HBase holds the original 100 fields, and every time I update 2 fields in ES I need to retrieve all 100 fields from HBase. What I want is to update only the 2 fields in ES without retrieving the 100 fields from HBase. The other 98 can stay absent from _source, as long as their values in the index are not affected.

There is no need to repeat the same response 3 times. As explained earlier, you need to include all 100 fields in the source to get the behaviour you are looking for when using Elasticsearch. @RainTown showed this with the original example, where he used a different mapping.