Hi Kai,
There are a few ways to avoid duplication in general.
But if you supply your own document IDs rather than letting Elasticsearch autogenerate them, you can update an existing doc by indexing to the same document ID again.
For example, compare indexing one doc without an ID:
POST /testing-updates/doc/
{
  "user": "user1",
  "enabled": true
}
versus indexing a second doc with an explicit ID of user2:
POST /testing-updates/doc/user2
{
  "user": "user2",
  "enabled": true
}
If we then search the index and look at the _id of each doc:
GET /testing-updates/_search
the response is:
{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "testing-updates",
        "_type": "doc",
        "_id": "AV1kA52WMIr8LISA3BqI",
        "_score": 1,
        "_source": {
          "user": "user1",
          "enabled": true
        }
      },
      {
        "_index": "testing-updates",
        "_type": "doc",
        "_id": "user2",
        "_score": 1,
        "_source": {
          "user": "user2",
          "enabled": true
        }
      }
    ]
  }
}
You'll see that user2 has an _id of user2, whereas user1 has an autogenerated _id, because we specified an ID for user2 but not for user1.
So if we try to update both users the same way, for example by setting the "enabled" field to false, user2 is updated in place, but there are now two user1 docs, because a new one was created with a new autogenerated ID. The new one has the updated value, and the old one still has the original value for "enabled":
POST /testing-updates/doc/
{
  "user": "user1",
  "enabled": false
}
POST /testing-updates/doc/user2
{
  "user": "user2",
  "enabled": false
}
Running the same search again (GET /testing-updates/_search) returns:
{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "testing-updates",
        "_type": "doc",
        "_id": "AV1kBhENMIr8LISA3ByZ",
        "_score": 1,
        "_source": {
          "user": "user1",
          "enabled": false
        }
      },
      {
        "_index": "testing-updates",
        "_type": "doc",
        "_id": "AV1kA52WMIr8LISA3BqI",
        "_score": 1,
        "_source": {
          "user": "user1",
          "enabled": true
        }
      },
      {
        "_index": "testing-updates",
        "_type": "doc",
        "_id": "user2",
        "_score": 1,
        "_source": {
          "user": "user2",
          "enabled": false
        }
      }
    ]
  }
}
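To make that behaviour concrete, here's a rough in-memory model of the indexing calls above, written in Python purely as an illustration. It is not the Elasticsearch client or how Elasticsearch works internally: the `FakeIndex` class and its `index`/`search` methods are hypothetical, and `uuid4` just stands in for Elasticsearch's own ID generator. The point it shows is that a call without an ID always creates a new doc, while a call with an ID replaces whatever is already stored under that ID.

```python
import uuid


class FakeIndex:
    """Hypothetical stand-in for an index: maps _id -> _source."""

    def __init__(self):
        self.docs = {}

    def index(self, source, doc_id=None):
        # Like POST /testing-updates/doc/ with no id: a fresh id is
        # generated, so every call creates a new document.
        if doc_id is None:
            doc_id = uuid.uuid4().hex
        # Like POST /testing-updates/doc/user2: store under the given id,
        # replacing any existing document with that id.
        self.docs[doc_id] = dict(source)
        return doc_id

    def search(self, **filters):
        # Return (_id, _source) pairs whose fields match every filter.
        return [
            (doc_id, src)
            for doc_id, src in self.docs.items()
            if all(src.get(k) == v for k, v in filters.items())
        ]


idx = FakeIndex()
idx.index({"user": "user1", "enabled": True})
idx.index({"user": "user2", "enabled": True}, doc_id="user2")

# "Update" both users the same way as in the requests above.
idx.index({"user": "user1", "enabled": False})
idx.index({"user": "user2", "enabled": False}, doc_id="user2")

print(len(idx.docs))                  # 3
print(len(idx.search(user="user1")))  # 2
print(idx.search(user="user2"))       # [('user2', {'user': 'user2', 'enabled': False})]
```

Just like the real search results, user1 ends up with two docs while user2 has a single, updated one.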
So in your case, if each document represents a video, you could use the video ID as the document ID.
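If you're loading videos in batches, one possible sketch of that idea is to build a bulk request body where each action line carries the video's own ID as `_id`. The `videos` list, its `video_id` and `title` fields, and the `videos` index name are all hypothetical. The body could then be sent to something like `POST /videos/doc/_bulk` (or `POST /videos/_bulk` on 7.x and later, where mapping types are gone). Re-running the same load should then overwrite the existing docs rather than duplicate them.

```python
import json


def build_bulk_body(videos):
    """Build an NDJSON _bulk body that uses each video's id as the _id.

    Each document is preceded by an "index" action line. Because the
    _id is fixed, indexing the same video again replaces the stored doc
    instead of creating a duplicate.
    """
    lines = []
    for video in videos:
        # Hypothetical field name: use whatever uniquely identifies a video.
        doc_id = str(video["video_id"])
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        lines.append(json.dumps(video))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


videos = [
    {"video_id": "abc123", "title": "First video"},
    {"video_id": "def456", "title": "Second video"},
]
print(build_bulk_body(videos), end="")
```

This keeps video_id in the _source as well, so it stays searchable alongside the other fields.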