Painless Script: Splitting a string by '/'

Hello,

Here's my scenario. I have an index full of documents, and I'm trying to add a new field called parent_folder_path that is derived from the existing path field.

For example, if path is example/file/path/document.pdf, then parent_folder_path should be example/file/path/.

I'm using an update-by-query script to build parent_folder_path from the path field with Painless, but I'm having difficulty finding my way around the Painless documentation. I'll share what I have so far; any help would be appreciated!

Setup:

PUT my-index
PUT my-index/_mapping
{
    "properties": {
        "path": {"type": "keyword"},
        "parent_folder_path": {"type": "keyword"}
    }
}

PUT my-index/_doc/doc_1
{
  "path": "files/uploads/doc_1.pdf"
}

PUT my-index/_doc/doc_2
{
  "path": "files/uploads/subdirectory/doc_2.pdf"
}

Update by query attempt:
I know what I need to do, but I don't know the right syntax for it...

POST my-index/_update_by_query?conflicts=proceed
{
  "script": {
    "lang": "painless",
    "source": """
      def temp=ctx._source['parent'];
      def items= temp.splitOnToken('/');
      def temp2=Arrays.toString(items[:-1]);
      ctx._source['parent_folder_path'] = temp2;
    """
  }
}

End goal (the response I'd like to get back from this search):

GET my-index/_search
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my-index",
        "_type" : "_doc",
        "_id" : "doc_1",
        "_score" : 1.0,
        "_source" : {
          "path": "files/uploads/doc_1.pdf",
          "parent_folder_path": "files/uploads/"
        }
      },
      {
        "_index" : "my-index",
        "_type" : "_doc",
        "_id" : "doc_2",
        "_score" : 1.0,
        "_source" : {
          "path": "files/uploads/subdirectory/doc_2.pdf",
          "parent_folder_path": "files/uploads/subdirectory/"
        }
      }
    ]
  }
}

OK, I was able to figure out a script that works. Here it is:

POST my-index/_update_by_query?conflicts=proceed
{
  "script": {
    "lang": "painless",
    "source": """
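      // Split the path into segments, e.g. ["files", "uploads", "doc_1.pdf"].
      def split_path = ctx._source['path'].splitOnToken('/');
      String parent_folder = '';
      // Re-join every segment except the last one (the file name), keeping a trailing '/'.
      for (int i = 0; i < split_path.length - 1; i++) {
        parent_folder += split_path[i] + '/';
      }
      ctx._source['parent_folder_path'] = parent_folder;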
    """
  }
}

And here are the beautiful results!

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my-index",
        "_type" : "_doc",
        "_id" : "doc_1",
        "_score" : 1.0,
        "_source" : {
          "path" : "files/uploads/doc_1.pdf",
          "parent_folder_path" : "files/uploads/"
        }
      },
      {
        "_index" : "my-index",
        "_type" : "_doc",
        "_id" : "doc_2",
        "_score" : 1.0,
        "_source" : {
          "path" : "files/uploads/subdirectory/doc_2.pdf",
          "parent_folder_path" : "files/uploads/subdirectory/"
        }
      }
    ]
  }
}
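
For anyone landing here later: the loop in the working script above is probably not strictly necessary. A minimal sketch of an alternative, assuming every path value is a single string (not an array), is to cut the string at its last '/' using the standard Java String methods lastIndexOf and substring, which Painless exposes. The null check and the ctx.op = 'noop' line are my own additions to skip documents that have no path field; they are not part of the original script. I'd expect this to give the same values for the two sample documents, but I haven't compared the two scripts on edge cases such as paths that end in '/'.

```
POST my-index/_update_by_query?conflicts=proceed
{
  "script": {
    "lang": "painless",
    "source": """
      def path = ctx._source['path'];
      if (path == null) {
        // Assumption: documents without a path should simply be left untouched.
        ctx.op = 'noop';
      } else {
        // Index of the last '/', or -1 if the path contains no '/'.
        int cut = path.lastIndexOf('/');
        // Keep everything up to and including the last '/'.
        // With no '/' at all, cut + 1 is 0 and the result is an empty string.
        ctx._source['parent_folder_path'] = path.substring(0, cut + 1);
      }
    """
  }
}
```

With the sample data, files/uploads/doc_1.pdf has its last '/' at index 13, so substring(0, 14) yields files/uploads/.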
