Convert two repeated values in array into a string

Hello,

I have some old documents where a field holds an array with the same value repeated twice, something like this:

          "task" : [
            "first_task",
            "first_task"
          ],

I'd like to convert this array into a string, since both values are identical. I've seen the thread "Convert array with 2 equal values to single value", but in my case the problem can't be fixed through Logstash because it only affects old documents that are already stored.

Is there a way to modify the existing data in Elasticsearch and convert the field to a string using DSL queries?

I was thinking of doing something like this:

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "script": {
          "description": "Change task field from array to first element of this one",
          "lang": "painless",
          "source": """
            if (ctx['task'][0] == ctx['task'][1]) {
                ctx['task'] = ctx['task'][0];
            }
          """
        }
      }
    ]
  },
  "docs": [
    {
        "_index" : "tasks",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "@timestamp" : "2022-05-03T07:33:44.652Z",
          "task" : ["first_task", "first_task"]
        }
    }
  ]
}

The result document is the following:

{
  "docs" : [
    {
      "doc" : {
        "_index" : "tasks",
        "_type" : "_doc",
        "_id" : "1",
        "_source" : {
          "@timestamp" : "2022-05-03T07:33:44.652Z",
          "task" : "first_task"
        },
        "_ingest" : {
          "timestamp" : "2022-05-11T09:08:48.150815183Z"
        }
      }
    }
  ]
}

We can see the task field is reassigned and we have the first element of the array as a value.

Thanks.

Here's the solution, from the Stack Overflow question "Convert two repeated values in array into a string".

I'll copy it here.

You can achieve this with the _update_by_query endpoint. Here is an example:

POST tasks/_update_by_query
{
  "script": {
    "source": """
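      // Guard: leave documents alone when task is missing or already a string
      def task = ctx._source['task'];
      if (task instanceof List && task.size() == 2 && task[0] == task[1]) {
          ctx._source['task'] = task[0];
      }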
    """,
    "lang": "painless"
  },
  "query": {
    "match_all": {}
  }
}

The match_all query is optional: without a query, _update_by_query processes every document in the index. To update only some documents, replace it with a query that matches just those documents.
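
As a rough sketch of such a filter, the request below only touches documents that have a task field and are older than a cutoff date. The cutoff "2022-05-01" is a placeholder I made up, and I'm assuming @timestamp is mapped as a date in your index, so adjust both to your data. The script also sets ctx.op to "noop" for documents that don't need changing, so they are skipped rather than rewritten.

```console
POST tasks/_update_by_query
{
  "script": {
    "lang": "painless",
    "source": """
      def task = ctx._source['task'];
      if (task instanceof List && task.size() == 2 && task[0] == task[1]) {
        // Both entries are identical: keep a single string value
        ctx._source['task'] = task[0];
      } else {
        // Nothing to fix: skip the document instead of reindexing it
        ctx.op = 'noop';
      }
    """
  },
  "query": {
    "bool": {
      "filter": [
        { "exists": { "field": "task" } },
        { "range": { "@timestamp": { "lt": "2022-05-01" } } }
      ]
    }
  }
}
```

Skipped documents should show up in the "noops" counter of the response, which gives a quick check of how many documents were actually changed.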

Keep in mind that running a script against every document in the index can degrade cluster performance while the update is in progress.
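
One way to soften that impact, sketched here with assumed values, is to throttle the request and run it in the background. requests_per_second=500 is an arbitrary number to tune for your cluster, conflicts=proceed keeps the job going if a document changes while it runs, and wait_for_completion=false returns a task ID right away instead of blocking.

```console
POST tasks/_update_by_query?conflicts=proceed&wait_for_completion=false&requests_per_second=500
{
  "script": {
    "lang": "painless",
    "source": """
      def task = ctx._source['task'];
      if (task instanceof List && task.size() == 2 && task[0] == task[1]) {
        ctx._source['task'] = task[0];
      } else {
        ctx.op = 'noop';
      }
    """
  }
}

# Check progress with the task ID returned by the request above
# (<task_id> is a placeholder, not a real ID)
GET _tasks/<task_id>
```

If I remember correctly, you can also change the throttle of a running job with the _rethrottle endpoint, but I'd double-check the docs for your version.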
