Stringify object during reindexing

Hi Team,

I am looking for a way to convert an object to a string during reindexing. I looked for a solution using Painless scripting but could not find one, as it seems Painless doesn't support JSON (I might be wrong here).

Can anyone help with this?

Please share a summary of the document mapping, your current script and its result, and the desired result.

Say we have a document in an index, index-a:

{
  "req":{
    "body":{
      "a":1,
      "b":3,
      "c":{
        "d":0,
        "e":1
      }
    }
  }
}

I want to stringify req.body while reindexing into another index, say index-b.

I tried with

POST _reindex
{
  "source": {
    "index": "index-a"
  },
  "dest": {
    "index": "index-b"
  },
  "script": {
    "inline": "ctx._source.req.body = ctx._source.req.body.toString()",
    "lang": "painless"
  }
}

But this will not stringify the object correctly.
toString() will produce {a:1,b:3,c:{d:0,e:1}}

whereas the desired result is '{"a":1,"b":3,"c":{"d":0,"e":1}}'
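
To make the target format concrete, here is a minimal Python sketch (outside Elasticsearch, purely for illustration) of the serialization being asked for. It assumes the sample req.body from above and uses only the standard json module; the variable names are mine.

```python
import json

# The req.body object from the sample document in index-a.
body = {"a": 1, "b": 3, "c": {"d": 0, "e": 1}}

# Compact separators drop the spaces that json.dumps inserts by default,
# which gives the exact string shown as the desired result.
desired = json.dumps(body, separators=(",", ":"))
print(desired)  # {"a":1,"b":3,"c":{"d":0,"e":1}}
```

Any Painless solution would need to reproduce this quoting of keys and string values, which a plain toString() does not do.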

As you said, there is no class for handling JSON in Painless. If you have only numeric fields, the easiest way might be to replace '{' with '{"' and so on. If it is not that simple, you may have to write your own (possibly recursive) stringifier function for the map.
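
The replacement idea can be sketched as follows. This is Python rather than Painless, only to show the transformation; the function name quote_keys is hypothetical, and the input is the unquoted string reported above. It assumes every value is a number or a nested object, and it would likely break on string values that contain '{', ',' or ':'.

```python
import json
import re


def quote_keys(text: str) -> str:
    """Wrap bare keys in double quotes.

    Only safe when all values are numbers or nested objects.
    """
    # A key is a word that follows '{' or ',' and is followed by ':'.
    return re.sub(r'([{,])\s*(\w+)\s*:', r'\1"\2":', text)


unquoted = "{a:1,b:3,c:{d:0,e:1}}"
quoted = quote_keys(unquoted)
print(quoted)  # {"a":1,"b":3,"c":{"d":0,"e":1}}

# The result now parses as JSON.
parsed = json.loads(quoted)
```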

Thanks @Tomo_M for the reply. Yes, it looks like we need to implement the stringify function in a Painless script. Our object contains complex data types in addition to numeric ones.

I was wondering if there is any other approach?
We thought of enforcing the mapping on the new index (index-b in the above example):

PUT index-b/_mapping/_doc
{
    "properties" : {
      "req.body" : {
        "type" : "object",
        "enabled": false
      }
    }
}

But in this case, the field req.body cannot be searched. Is there a way to enforce, at index time, that the field req.body is indexed as a string?

We don't have control over the data coming from the source, so we can't stringify req.body there. We need a solution at the Elasticsearch layer.

Below is a sample Painless jsonify script, but I don't recommend using such a script, as it reinvents the wheel. I don't know of another way to do it purely in Elasticsearch.

The easier way is to use a client (e.g. Python): retrieve the documents, serialize the object to JSON using a library, and update the documents with the new JSON text field.
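
A rough sketch of that client-side approach, assuming the official elasticsearch Python package and the index names from the question. The function names (stringify_body, build_actions, copy_index) are made up for illustration, and older clusters may also need a "_type" in each action, so treat this as a starting point rather than a tested recipe.

```python
import json


def stringify_body(source):
    """Return a copy of _source with req.body serialized to a JSON string."""
    req = source.get("req")
    if not isinstance(req, dict) or not isinstance(req.get("body"), (dict, list)):
        return source  # nothing to stringify
    new_req = dict(req, body=json.dumps(req["body"], separators=(",", ":")))
    return dict(source, req=new_req)


def build_actions(hits, dest_index):
    """Turn search hits into bulk index actions for the destination index."""
    for hit in hits:
        yield {
            "_op_type": "index",
            "_index": dest_index,
            "_id": hit["_id"],
            "_source": stringify_body(hit["_source"]),
        }


def copy_index(es_url, source_index="index-a", dest_index="index-b"):
    """Scroll through source_index and bulk-index transformed docs."""
    # Imported lazily so the functions above work without the client installed.
    from elasticsearch import Elasticsearch, helpers

    es = Elasticsearch(es_url)
    hits = helpers.scan(es, index=source_index)
    return helpers.bulk(es, build_actions(hits, dest_index))
```

Calling copy_index with the cluster URL would write the documents to index-b with req.body as a string, which can then be mapped as text or keyword there.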

Parameters

{
  "_source":{
    "my_object":{
      "parent":{
        "child":100,
        "childA":1.27,
        "childB": "2022-01-01T050505:000Z",
        "childC": [{
          "A":"B",
          "B":"C"
        },{
          "A":"B",
          "B":"C"
        }]
      }
    }
  }
}

Script

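// Recursively serialize maps, lists, strings and other values to a JSON string.
// Note: keys and string values are not escaped, so embedded quotes or
// backslashes will produce invalid JSON.
String jsonify(def object) {
    if (object instanceof Map) {
        String out = "{";
        // Sort the keys so the output order is deterministic.
        List keyList = new ArrayList(object.keySet());
        Collections.sort(keyList);
        for (int i = 0; i < keyList.size(); i++) {
            String key = keyList[i];
            out = out + "\"" + key + "\"";
            if (object[key] instanceof String) {
                out = out + ": \"" + object[key] + "\"";
            } else {
                out = out + ":" + jsonify(object[key]);
            }
            if (i < keyList.size() - 1) {
                out = out + ",";
            }
        }
        out = out + "}";
        return out;
    } else if (object instanceof List) {
        String out = "[";
        for (int i = 0; i < object.size(); i++) {
            out = out + jsonify(object[i]);
            if (i < object.size() - 1) {
                out = out + ",";
            }
        }
        out = out + "]";
        return out;
    } else if (object instanceof String) {
        return "\"" + object + "\"";
    } else if (object == null) {
        // Avoid a null pointer error on missing values.
        return "null";
    } else {
        // Numbers and booleans serialize as-is.
        return object.toString();
    }
}
jsonify(params._source)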

Output

{"my_object":{"parent":{"child":100,"childA":1.27,"childB": "2022-01-01T050505:000Z","childC":[{"A": "B","B": "C"},{"A": "B","B": "C"}]}}}
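
One way to sanity-check output like this is to parse it back and compare it with the input. The sketch below does that in Python with the sample parameters and output from above; it also illustrates a case the sample does not cover, namely a string containing double quotes, where simple concatenation stops producing valid JSON.

```python
import json

source = {
    "my_object": {
        "parent": {
            "child": 100,
            "childA": 1.27,
            "childB": "2022-01-01T050505:000Z",
            "childC": [{"A": "B", "B": "C"}, {"A": "B", "B": "C"}],
        }
    }
}

script_output = (
    '{"my_object":{"parent":{"child":100,"childA":1.27,'
    '"childB": "2022-01-01T050505:000Z",'
    '"childC":[{"A": "B","B": "C"},{"A": "B","B": "C"}]}}}'
)

# The script output parses back to the original document.
round_trip_ok = json.loads(script_output) == source

# Wrapping a string in quotes without escaping breaks on embedded quotes.
naive = '"' + 'say "hi"' + '"'
try:
    json.loads(naive)
    naive_is_valid = True
except json.JSONDecodeError:
    naive_is_valid = False

print(round_trip_ok, naive_is_valid)  # True False
```

So for data that may contain quotes or backslashes in string values, the stringifier would also need an escaping step.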
