How to update a document without affecting indexed fields that are excluded from _source

My template

{
  "mappings": {
    "dynamic": "false",
    "_source": { "includes": ["a", "b"] },
    "properties": {
      "a": { "type": "keyword" },
      "b": { "type": "keyword" },
      "c": { "type": "keyword" }
    }
  }
}

PUT /your_index_name/_doc/1
{"a": "value_a","b": "value_b","c": "value_c"}

POST /your_index_name/_update/1
{"doc": {"a": "new_value_a","b": "new_value_b"}}

Although field c is not stored in _source, I would expect that after this update the indexed value of c remains unchanged, rather than being cleared.

What exactly is your issue? It is not clear whether you are facing one.

You don’t need to hope; that is the documented and expected behavior! Just try it, e.g. in Kibana DevTools:


# 152: PUT /your_index_name/_doc/1 [201 Created]
# {"a": "value_a","b": "value_b","c": "value_c"}
{
  "_index": "your_index_name",
  "_id": "1",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 1
}

# 154: GET /your_index_name/_doc/1 [200 OK]
{
  "_index": "your_index_name",
  "_id": "1",
  "_version": 1,
  "_seq_no": 0,
  "_primary_term": 1,
  "found": true,
  "_source": {
    "a": "value_a",
    "b": "value_b",
    "c": "value_c"
  }
}

# 155: POST /your_index_name/_update/1 [200 OK]
# {"doc": {"a": "new_value_a","b": "new_value_b"}}
{
  "_index": "your_index_name",
  "_id": "1",
  "_version": 2,
  "result": "updated",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 1,
  "_primary_term": 1
}

# 157: GET /your_index_name/_doc/1 [200 OK]
{
  "_index": "your_index_name",
  "_id": "1",
  "_version": 2,
  "_seq_no": 1,
  "_primary_term": 1,
  "found": true,
  "_source": {
    "a": "new_value_a",
    "b": "new_value_b",
    "c": "value_c"
  }
}

Essentially the doc is updated via

  • getting (current) version N of the doc
  • fields (from that doc and the POST call) are merged
  • version N+1 is stored in the index
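The three steps above can be sketched in a few lines of Python. This is only a toy model, not Elasticsearch code: the `store` dict and the `partial_update` helper are made up for illustration, and the merge is shallow, which is enough for flat documents like the one in this thread.

```python
def partial_update(store, doc_id, partial_doc):
    """Toy model of a partial update: get, merge, store as a new version."""
    current = store[doc_id]                          # get (current) version N
    merged = {**current["_source"], **partial_doc}   # merge stored fields with the update
    store[doc_id] = {                                # store version N+1
        "_version": current["_version"] + 1,
        "_source": merged,
    }
    return store[doc_id]


# Hypothetical in-memory "index" holding the document from the example.
store = {
    "1": {
        "_version": 1,
        "_source": {"a": "value_a", "b": "value_b", "c": "value_c"},
    }
}

result = partial_update(store, "1", {"a": "new_value_a", "b": "new_value_b"})
print(result)
```

Note that the merge only sees what is in the stored `_source`, so it can carry forward only the fields that are present there.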

@S-Dragon0302 Are you seeing something different?

PUT _index_template/your_template_name
{
  "index_patterns": ["your_index_name"],
  "template": {
    "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 1
    },
    "mappings": {
      "dynamic": false,
      "_source": {
        "includes": ["a", "b"]
      },
      "properties": {
        "a": { "type": "keyword" },
        "b": { "type": "keyword" },
        "c": { "type": "keyword" }
      }
    }
  }
}
PUT /your_index_name/_doc/1
{
  "a": "value_a",
  "b": "value_b",
  "c": "value_c"
}
POST your_index_name/_update/1
{
  "doc": {
    "a": "updated_value_a",
    "b": "updated_value_b"
  }
}

I think the values in the index should be kept, instead of field c becoming null.

In your mapping you have explicitly left the c field out of _source. As described in the documentation, this means that c is indexed but then removed from the source before the source is stored. Updates, like reindexing, are based on the source document, so the behaviour you are seeing is expected.

Elasticsearch is based on Lucene, which uses immutable segments. Updates are therefore not done in place: the source needs to be retrieved, updated and reindexed into a new segment, and the indexed value that was not included in the source is not available at that point.

To ensure the c field remains searchable you need, as far as I know, to either include it in the source or send it along with each update request.
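A small Python sketch may help to see why c disappears. It is a simplified model under stated assumptions, not how Elasticsearch is implemented: `index_document` and `update_document` are hypothetical helpers, "indexed" stands for whatever is searchable, and "stored" stands for the filtered `_source`.

```python
def index_document(doc, source_includes=None):
    """Toy model: every field is indexed, but only the included fields are kept in _source."""
    indexed = dict(doc)
    if source_includes is None:
        stored = dict(doc)
    else:
        stored = {k: v for k, v in doc.items() if k in source_includes}
    return indexed, stored


def update_document(stored_source, partial_doc, source_includes=None):
    """Toy model of an update: merge into the stored _source, then reindex the result."""
    merged = {**stored_source, **partial_doc}
    return index_document(merged, source_includes)


includes = ["a", "b"]  # same as "_source": {"includes": ["a", "b"]} in the template
indexed_v1, source_v1 = index_document(
    {"a": "value_a", "b": "value_b", "c": "value_c"}, includes
)
indexed_v2, source_v2 = update_document(
    source_v1, {"a": "new_value_a", "b": "new_value_b"}, includes
)

print("c" in indexed_v1)  # True: c was indexed from the original request
print("c" in indexed_v2)  # False: c was in neither the stored source nor the update
```

The second version is built only from the stored source plus the update body, so a field that was filtered out of the source has nowhere to come from.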

Apologies, I missed the specifics of the template.

But:

The first bullet, “getting (current) version N of the doc”, translates into retrieving that document ID, parsing _source, and merging this with the data from the POST-ed _update call. Since “c” is in neither _source nor the update doc, the N+1 version of this doc has no “c” field.

There are at least 2 ways to get the behavior you hoped for:

a) do the field merging at app level, therefore providing “c” and its current value when doing the partial update

b) don’t exclude “c” from _source
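Option a) could look roughly like the following Python sketch. `build_update_body` is a hypothetical helper, and where the current value of “c” comes from is left to the application; only the `{"doc": ...}` body shape is taken from the update request shown earlier in the thread.

```python
import json


def build_update_body(changed_fields, unstored_fields):
    """Build an _update body that also carries fields missing from _source.

    changed_fields: the fields the caller actually wants to change.
    unstored_fields: current values of fields excluded from _source, which the
    application has to supply itself (assumption: it can look them up somewhere).
    """
    # Changed fields win if a name appears in both dicts.
    doc = {**unstored_fields, **changed_fields}
    return {"doc": doc}


body = build_update_body(
    {"a": "new_value_a", "b": "new_value_b"},
    {"c": "value_c"},
)
print(json.dumps(body))
```

The resulting JSON would be sent as the body of `POST /your_index_name/_update/1`, so the reindexed document contains “c” again.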

My thinking is that although the source has no value for the field, the values in the index are still there. If we have to pass those values in from outside on every update, and updates are frequent, that puts a lot of pressure on the other storage systems.

Suppose I have 100 fields, of which only 2 are kept in _source in ES; the remaining 98 are indexed but not stored. HBase holds the original values of all 100 fields, so every time I update 2 fields in ES I have to retrieve the complete 100 fields from HBase. What I want is to avoid that: I would update only the 2 fields in ES, the other 98 could stay absent from _source, and their values in the index would at least remain unaffected.

As explained earlier, you need to include all 100 fields in the source to get the behaviour you are looking for when using Elasticsearch. @RainTown showed this with the original example, where he used a different mapping.

I am using an HBase + Elasticsearch architecture. HBase stores the source data, while ES holds only the index data without the source data. To update the index, the source data has to be fetched from HBase, which leads to the problem I described.

That does not work. Elasticsearch needs all the data to be included in the source document it initially stores in order to be able to perform updates with only a select number of fields the way you describe. This was shown in the initial example, where different mappings were used. To get the behaviour you require, you will need to recreate the index after you have removed the "_source" "includes" setting from the index template.

I would recommend that you test this and compare the result to your initial test that failed.

Storing the full source in Elasticsearch will take up more space, but it is required in order to support partial updates. The source documents are compressed, however, so they may not add as much overhead as you expect. If you do not want all fields returned in response to a query, you can specify this at query time.
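As a rough illustration of limiting the returned fields at query time, the Python sketch below builds a search request body that uses the `_source` option of the search API. The `search_body` helper and the term query on c are just examples chosen for this thread, not something taken from the original posts.

```python
import json


def search_body(query, fields_to_return):
    """Build a search body that returns only the listed fields from _source."""
    return {"query": query, "_source": fields_to_return}


# Search on c, but only return a and b in each hit.
body = search_body({"term": {"c": "value_c"}}, ["a", "b"])
print(json.dumps(body, indent=2))
```

This body would be sent with `GET /your_index_name/_search`. The full source stays stored, so partial updates keep working, while clients still only receive the fields they ask for.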

This isn’t accurately describing what your example did.

The only 2 fields you claim to want Elasticsearch to fully care about were a and b; you didn’t want Elasticsearch to care much about c, or the other 97 fields if there were 100. This is what your template implemented. But you did give a specific type for c, so a bit undecided there?

Yet your complaint was specifically about an inaccurate value of c? The one you want Elasticsearch to not give its full attention to? Note that your update handled a and b correctly.

This is IMO a tad illogical.

But even that’s irrelevant. What matters is that @Christian_Dahlqvist has explained how it works. That’s how it works. If you had understood or hoped differently, then your understanding has been improved, or hopes (sadly) dashed.

I sometimes wish the 41 bus went a slightly different route, it would suit me better. The bus company is sadly not persuaded. And said something about other passengers … :rofl:
