I wrote a query to find duplicate documents using aggregations:
GET log/_search
{
  "profile": false,
  "size": 0,
  "query": {
    "range": {
      "@timestamp": {
        "gte": "now-2d/d",
        "lt": "now-1d/d",
        "time_zone": "-04:00"
      }
    }
  },
  "aggs": {
    "duplicateCount": {
      "terms": {
        "script": "doc['user_id'].value + doc['event'].value + Math.round(doc['@timestamp'].value.getMillis() / 600000) + doc['action'].value",
        "size": 5000,
        "min_doc_count": 2
      },
      "aggs": {
        "duplicateDocuments": {
          "top_hits": {
            "size": 10
          }
        }
      }
    }
  }
}
This works fine, but I would like to break the process down into smaller steps. I would like to store the scripted value as a field on the document itself and then delete later. My question is how to use an update-by-query request to add the field permanently. I am having problems with the following request:
POST log/_update_by_query?conflicts=proceed
{
  "script": {
    "lang": "painless",
    "source": "ctx._source.de_dupe_key = ctx._source.user_id + ctx._source.event + ctx._source.action + Math.round(ctx._source['@timestamp'].date.millis()/600000)"
  }
}
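To make the intended result concrete, here is a minimal sketch in Python of the key I am trying to store. It is only an illustration outside Elasticsearch, not the Painless script itself. It assumes that `@timestamp` is an ISO-8601 string with an explicit UTC offset, that the division by 600000 is meant as integer division (10-minute buckets), and that the field values can simply be concatenated as strings. The function name `de_dupe_key` and the sample documents are made up for this example; the concatenation order follows the update script above.

```python
from datetime import datetime, timedelta, timezone

BUCKET_MS = 600_000  # 10 minutes in milliseconds, same constant as in the scripts
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def de_dupe_key(doc: dict) -> str:
    """Build the de-duplication key for one document (hypothetical helper)."""
    # Assumption: @timestamp is an ISO-8601 string with an explicit offset.
    ts = datetime.fromisoformat(doc["@timestamp"])
    # Exact epoch milliseconds, computed without floating point.
    millis = (ts - EPOCH) // timedelta(milliseconds=1)
    # Integer division puts all events of the same 10-minute window in one bucket.
    bucket = millis // BUCKET_MS
    return f"{doc['user_id']}{doc['event']}{doc['action']}{bucket}"


# Made-up sample documents: the first two fall in the same 10-minute window.
docs = [
    {"user_id": "u1", "event": "login", "action": "click", "@timestamp": "2023-01-01T00:01:00+00:00"},
    {"user_id": "u1", "event": "login", "action": "click", "@timestamp": "2023-01-01T00:08:00+00:00"},
    {"user_id": "u1", "event": "login", "action": "click", "@timestamp": "2023-01-01T00:11:00+00:00"},
]
keys = [de_dupe_key(d) for d in docs]
print(keys[0] == keys[1])  # same window, same key
print(keys[0] == keys[2])  # different window, different key
```

Documents that share the same user, event, and action within one 10-minute window should end up with the same `de_dupe_key`, which is what the `terms` aggregation with `min_doc_count: 2` groups on in the first query.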