This is kinda subtle. The _update API does a realtime get to obtain the latest version of the doc, followed by a write with optimistic concurrency control to ensure that there were no concurrent writes. The refresh you're seeing is coming from the realtime get.
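To make the get-then-conditional-write pattern concrete, here is a minimal sketch in Python. It is a toy in-memory model, not Elasticsearch's implementation: `ToyStore`, `VersionConflict`, `update` and the `if_seq_no` argument are hypothetical names chosen for illustration, and the real API also checks the primary term.

```python
class VersionConflict(Exception):
    """Raised when a conditional write sees a different seq_no than expected."""


class ToyStore:
    """Hypothetical in-memory store; every write gets the next sequence number."""

    def __init__(self):
        self.docs = {}        # doc id -> (seq_no, source)
        self.next_seq_no = 0

    def get(self, doc_id):
        return self.docs[doc_id]

    def write(self, doc_id, source, if_seq_no=None):
        current = self.docs.get(doc_id)
        # Optimistic concurrency control: reject the write if someone else
        # has written the doc since we read it.
        if if_seq_no is not None and (current is None or current[0] != if_seq_no):
            raise VersionConflict(f"expected seq_no {if_seq_no}")
        seq_no = self.next_seq_no
        self.next_seq_no += 1
        self.docs[doc_id] = (seq_no, dict(source))
        return seq_no


def update(store, doc_id, partial):
    """Get the latest doc, merge the partial doc, write back conditionally."""
    seq_no, source = store.get(doc_id)
    return store.write(doc_id, {**source, **partial}, if_seq_no=seq_no)


store = ToyStore()
first = store.write("1", {"foo": "bar"})     # seq_no 0
second = update(store, "1", {"foo": "bar1"})  # seq_no 1

# A writer still holding the stale seq_no 0 is rejected.
try:
    store.write("1", {"foo": "stale"}, if_seq_no=0)
    conflict_raised = False
except VersionConflict:
    conflict_raised = True
```

The point of the sketch is only that the get has to return the very latest version of the doc, otherwise the conditional write would fail; that is why the get must be realtime.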
Realtime gets may read recent operations from the translog, but can only do this if the location of the document in the translog is tracked in memory. This tracking is itself expensive, and it's unnecessary unless you're doing realtime gets, so it's disabled until the shard sees a realtime get happening. If the translog location is unavailable then a realtime get must perform a refresh so it can retrieve the document with a search instead.
Thus we'd expect the first update on a shard that has to read an unrefreshed document to trigger a refresh, as you've observed, because at this point the shard isn't tracking translog locations in memory. That update also flips the shard into tracking-translog-locations mode so that subsequent updates will be able to read docs directly from the translog without needing further refreshes, and indeed that's what we can observe:
DELETE /test
# 200 OK
# {
#   "acknowledged": true
# }
PUT /test
{
  "settings": {
    "number_of_replicas": 0,
    "number_of_shards": 1,
    "refresh_interval": -1
  }
}
# 200 OK
# {
#   "acknowledged": true,
#   "index": "test",
#   "shards_acknowledged": true
# }
NB this creates a single shard with no replicas. Otherwise, with N shard copies, we'd need to flip each copy into tracking-translog-locations mode, which requires N gets and therefore N refreshes.
PUT /test/_doc/1?refresh
{
  "foo": "bar"
}
# 201 Created
# {
#   "_id": "1",
#   "_index": "test",
#   "_primary_term": 1,
#   "_seq_no": 0,
#   "_shards": {
#     "failed": 0,
#     "successful": 1,
#     "total": 1
#   },
#   "_version": 1,
#   "forced_refresh": true,
#   "result": "created"
# }
GET /test/_stats?human&filter_path=_all.primaries.refresh.total
# 200 OK
# {
#   "_all": {
#     "primaries": {
#       "refresh": {
#         "total": 3
#       }
#     }
#   }
# }
POST /test/_update/1?refresh=false
{
  "doc": {
    "foo": "bar1"
  }
}
# 200 OK
# {
#   "_id": "1",
#   "_index": "test",
#   "_primary_term": 1,
#   "_seq_no": 1,
#   "_shards": {
#     "failed": 0,
#     "successful": 1,
#     "total": 1
#   },
#   "_version": 2,
#   "result": "updated"
# }
GET /test/_stats?human&filter_path=_all.primaries.refresh.total
# 200 OK
# {
#   "_all": {
#     "primaries": {
#       "refresh": {
#         "total": 3
#       }
#     }
#   }
# }
NB no refresh was needed for this update because the index was already fully refreshed by the ?refresh on the preceding index request.
POST /test/_update/1?refresh=false
{
  "doc": {
    "foo": "bar2"
  }
}
# 200 OK
# {
#   "_id": "1",
#   "_index": "test",
#   "_primary_term": 1,
#   "_seq_no": 2,
#   "_shards": {
#     "failed": 0,
#     "successful": 1,
#     "total": 1
#   },
#   "_version": 3,
#   "result": "updated"
# }
GET /test/_stats?human&filter_path=_all.primaries.refresh.total
# 200 OK
# {
#   "_all": {
#     "primaries": {
#       "refresh": {
#         "total": 4
#       }
#     }
#   }
# }
NB a refresh was needed for this update because the shard was not in tracking-translog-locations mode, so it had to fetch the result of the previous update using a search. But now the shard is in tracking-translog-locations mode ...
POST /test/_update/1?refresh=false
{
  "doc": {
    "foo": "bar3"
  }
}
# 200 OK
# {
#   "_id": "1",
#   "_index": "test",
#   "_primary_term": 1,
#   "_seq_no": 3,
#   "_shards": {
#     "failed": 0,
#     "successful": 1,
#     "total": 1
#   },
#   "_version": 4,
#   "result": "updated"
# }
GET /test/_stats?human&filter_path=_all.primaries.refresh.total
# 200 OK
# {
#   "_all": {
#     "primaries": {
#       "refresh": {
#         "total": 4
#       }
#     }
#   }
# }
... so this update needs no refresh ...
POST /test/_update/1?refresh=false
{
  "doc": {
    "foo": "bar4"
  }
}
# 200 OK
# {
#   "_id": "1",
#   "_index": "test",
#   "_primary_term": 1,
#   "_seq_no": 4,
#   "_shards": {
#     "failed": 0,
#     "successful": 1,
#     "total": 1
#   },
#   "_version": 5,
#   "result": "updated"
# }
GET /test/_stats?human&filter_path=_all.primaries.refresh.total
# 200 OK
# {
#   "_all": {
#     "primaries": {
#       "refresh": {
#         "total": 4
#       }
#     }
#   }
# }
... and nor does this one.
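The sequence of refresh counts above can be reproduced with a small toy model. This is only a sketch of the behaviour described here, not Elasticsearch's actual code: `ToyShard` and all of its attributes are hypothetical names, and the counter starts at 0 rather than at the value a real shard reports after index creation.

```python
class ToyShard:
    """Toy model of one shard copy and its lazy translog-location tracking."""

    def __init__(self):
        self.refresh_total = 0
        self.tracking_translog_locations = False  # off until a realtime get needs it
        self.searchable = {}    # doc id -> source visible to search
        self.unrefreshed = {}   # doc id -> (source, translog location known?)

    def refresh(self):
        # Make every pending write visible to search.
        for doc_id, (source, _) in self.unrefreshed.items():
            self.searchable[doc_id] = source
        self.unrefreshed.clear()
        self.refresh_total += 1

    def index(self, doc_id, source, refresh=False):
        # The translog location is only remembered if tracking is already on.
        self.unrefreshed[doc_id] = (dict(source), self.tracking_translog_locations)
        if refresh:
            self.refresh()

    def realtime_get(self, doc_id):
        pending = self.unrefreshed.get(doc_id)
        if pending is None:
            # Nothing unrefreshed for this doc, so a search is already up to date.
            return self.searchable[doc_id]
        source, location_known = pending
        if location_known:
            return source  # read directly from the translog, no refresh
        # Location unknown: switch tracking on for future writes, then refresh.
        self.tracking_translog_locations = True
        self.refresh()
        return self.searchable[doc_id]

    def update(self, doc_id, partial):
        current = self.realtime_get(doc_id)
        self.index(doc_id, {**current, **partial})


shard = ToyShard()
totals = []
shard.index("1", {"foo": "bar"}, refresh=True)
totals.append(shard.refresh_total)
for value in ["bar1", "bar2", "bar3", "bar4"]:
    shard.update("1", {"foo": value})
    totals.append(shard.refresh_total)

print(totals)  # [1, 1, 2, 2, 2]
```

The counts follow the same pattern as the stats output (3, 3, 4, 4, 4): no refresh for the first update, one refresh for the second, and none after that.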