Adding a field with a script to all documents for de-duplicating records

I wrote a query to find duplicate documents using aggregations:

GET log/_search
{
  "profile": false,
  "size": 0,
  "query": {
    "range": {
      "@timestamp": {
        "gte": "now-2d/d",
        "lt": "now-1d/d",
        "time_zone": "-04:00"
      }
    }
  },
  "aggs": {
    "duplicateCount": {
      "terms": {
        "script": "doc['user_id'].value + doc['event'].value + Math.round(doc['@timestamp'].value.getMillis() / 600000) + doc['action'].value",
        "size": 5000,
        "min_doc_count": 2
      },
      "aggs": {
        "duplicateDocuments": {
          "top_hits": {
            "size": 10
          }
        }
      }
    }
  }
}
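To make the grouping concrete, here is a minimal Python sketch of what the terms aggregation above computes, run over an in-memory list instead of the index. The document shape (a timezone-aware datetime in `@timestamp`) and the helpers `dedupe_key` and `find_duplicates` are hypothetical. The key mirrors the script: user_id, event, the timestamp in 10-minute buckets (millis / 600000), then action. Groups with fewer than `min_doc_count` members are dropped, and `top_hits` caps how many documents are kept per group.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

BUCKET_MS = 600_000  # 10 minutes, the divisor used in the terms script
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def dedupe_key(doc: dict) -> str:
    """Mirror of the terms script: user_id + event + 10-minute bucket + action."""
    # Exact integer milliseconds since the epoch (the datetime must be timezone-aware).
    millis = (doc["@timestamp"] - EPOCH) // timedelta(milliseconds=1)
    return f"{doc['user_id']}{doc['event']}{millis // BUCKET_MS}{doc['action']}"


def find_duplicates(docs, min_doc_count=2, top_hits=10):
    """Group docs by key and keep only groups with at least min_doc_count members."""
    groups = defaultdict(list)
    for doc in docs:
        groups[dedupe_key(doc)].append(doc)
    return {
        key: members[:top_hits]  # like top_hits: at most top_hits docs per bucket
        for key, members in groups.items()
        if len(members) >= min_doc_count  # like min_doc_count
    }


if __name__ == "__main__":
    def at(hh, mm):
        return datetime(2024, 1, 1, hh, mm, tzinfo=timezone.utc)

    docs = [
        {"user_id": "u1", "event": "login", "action": "click", "@timestamp": at(10, 1)},
        {"user_id": "u1", "event": "login", "action": "click", "@timestamp": at(10, 8)},
        {"user_id": "u1", "event": "login", "action": "click", "@timestamp": at(10, 12)},
    ]
    for key, members in find_duplicates(docs).items():
        print(key, len(members))
```

Two things the sketch makes visible. The buckets are fixed 10-minute windows aligned to the epoch, so events at 10:09 and 10:11 land in different buckets even though they are only two minutes apart. And because the fields are concatenated without a separator, different combinations such as user_id "ab" + event "c" and user_id "a" + event "bc" produce the same key; putting a separator such as "|" between the parts avoids that.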

This works fine, but I would like to break the process down into smaller steps: first add the scripted key as a field on each document itself, then handle the deletion later. My question is how to use an update by query to add the field permanently. I am having problems with this:

POST log/_update_by_query?conflicts=proceed
{
  "script": {
    "lang": "painless",
    "source": "ctx._source.de_dupe_key = ctx._source.user_id + ctx._source.event + ctx._source.action  + Math.round(ctx._source['@timestamp'].date.millis()/600000)"
  }
}


You haven't said what the problem is. Does the update by query you showed return an error, or does the new field simply not appear on any documents? Note that an update by query without a query runs against every document in the index (it defaults to match_all), so a missing query is not the cause. Adding one is still worthwhile: an exists query on de_dupe_key, inside a bool must_not, limits the update to documents that have not had the key added yet, which makes the process idempotent if you have to re-run it after a hiccup.
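The exists query suggested above could look like the following minimal Python sketch, which only builds the request body. `build_update_by_query_body` is a hypothetical helper, the field name `de_dupe_key` is taken from the script in the question, and the script source in the usage block is a placeholder. The printed JSON is the body you would send with POST log/_update_by_query?conflicts=proceed.

```python
import json


def build_update_by_query_body(script_source: str, key_field: str = "de_dupe_key") -> dict:
    """Build an _update_by_query body that skips documents already carrying key_field."""
    return {
        "query": {
            "bool": {
                # must_not + exists: match only documents where key_field is missing,
                # so re-running after a failure does not redo finished documents.
                "must_not": [{"exists": {"field": key_field}}]
            }
        },
        "script": {"lang": "painless", "source": script_source},
    }


if __name__ == "__main__":
    # Placeholder script source; substitute the Painless script that builds the key.
    body = build_update_by_query_body("ctx._source.de_dupe_key = ...")
    print(json.dumps(body, indent=2))
```

With conflicts=proceed, documents that hit a version conflict are counted and skipped instead of aborting the request; because of the must_not filter, a second run picks them up without touching the documents that already have the key.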

Apologies, I get this error:

"script": "...... Math.round(ctx._source['@timestamp'].date.millis()/600000)",
"lang": "painless",
"caused_by": {
  "type": "illegal_argument_exception",
  "reason": "Unable to find dynamic field [date] for class [java.lang.String]."
}
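The error says that `ctx._source['@timestamp']` is a `java.lang.String`. In an update script, `_source` holds the raw JSON, so the timestamp is the original text rather than the date value that `doc['@timestamp'].value` gives you in the search script, and it has to be parsed first. Assuming the timestamps are ISO-8601 strings with an offset (for example `2024-01-01T10:01:00.000Z`), the Painless expression `ZonedDateTime.parse(ctx._source['@timestamp']).toInstant().toEpochMilli()` should give the epoch milliseconds in place of `.date.millis()`. The minimal Python sketch below mirrors that parse-then-bucket step so the key can be checked outside Elasticsearch; `key_from_source` is a hypothetical helper and the sample values are made up.

```python
from datetime import datetime, timedelta, timezone

BUCKET_MS = 600_000  # 10-minute buckets, as in the original script
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def key_from_source(source: dict) -> str:
    """Build the de-dupe key from a raw _source dict whose @timestamp is a string."""
    raw = source["@timestamp"]  # a str, like ctx._source['@timestamp'] in Painless
    # Older Python versions' fromisoformat() reject a trailing "Z", so normalise it.
    ts = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    if ts.tzinfo is None:
        # ZonedDateTime.parse likewise rejects strings without an offset.
        raise ValueError(f"timestamp has no offset: {raw!r}")
    millis = (ts - EPOCH) // timedelta(milliseconds=1)
    # Same field order as the update script: user_id + event + action + bucket.
    return f"{source['user_id']}{source['event']}{source['action']}{millis // BUCKET_MS}"
```

Because the string is converted to an absolute instant before bucketing, `2024-01-01T10:01:00Z` and `2024-01-01T12:01:00+02:00` produce the same key.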
