Convert field data type while re-indexing an index in Elasticsearch

Hello All,

Is there any way to convert a string field into an object or nested type, and vice versa, while re-indexing an index with the POST _reindex API in Elasticsearch?

Regards,
Avinash Kumawat

With a rename processor in an ingest pipeline.

Hi David, my question is about converting a document field's data type while re-indexing the index, for example string to integer. For that case I know there is the Convert processor:

https://www.elastic.co/guide/en/elasticsearch/reference/master/convert-processor.html#convert-processor

But I want to convert a string into an object/nested type and vice versa.

Could you share a document before and what it should look like after?

Hi David,

I have created an index with the following mapping:

Step 1

PUT /test_field_type/
{
    "mappings" : {
        "properties" : {
            "field1" : { 
              "type" : "integer" 
            },
            "field2" : { 
              "type" : "text" 
            },
            "field3" : { 
              "type" : "float" 
            },
            "field4" : { 
              "type" : "double" 
            },
            "field5" : { 
              "type" : "long" 
            },
            "field6" : { 
              "type" : "boolean" 
            }
        }
    }
}

Step 2: Insert one document into the index:


POST /test_field_type/_doc
{
  "field1":111,
  "field2":"sample_text1",
  "field3":11.10,
  "field4":1111.100200,
  "field5":9876543231,
  "field6":true
}

Step 3: Create the ingest pipeline to convert the field types:


PUT _ingest/pipeline/string_convert_pipline
{
  "description": "converts the content of field1, field5 and field6 to string type",
  "processors" : [
    {
      "convert" : {
        "field" : "field1",
        "type": "string"
      }
    },
    {
      "convert" : {
        "field" : "field6",
        "type": "string"
      }
    },
    {
      "convert" : {
        "field" : "field5",
        "type": "string"
      }
    }
  ]
}

In the above pipeline I am converting field1, field5, and field6 to the string type.
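
Before reindexing, the pipeline can be checked against a sample document with the _simulate API. This is only a sketch: the sample document below is assumed to have the same shape as the one inserted in Step 2, and nothing is written to any index.

```
# Dry-run the convert pipeline on one sample document (values are illustrative)
POST /_ingest/pipeline/string_convert_pipline/_simulate
{
  "docs": [
    {
      "_source": {
        "field1": 111,
        "field2": "sample_text1",
        "field3": 11.10,
        "field4": 1111.100200,
        "field5": 9876543231,
        "field6": true
      }
    }
  ]
}
```

In the response, field1, field5, and field6 should come back as quoted strings, while field2, field3, and field4 stay as they were.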

Step 4: Create the second index with this mapping:

PUT /test_field_type_2/
{
    "mappings" : {
        "properties" : {
            "field1" : { 
              "type" : "text" 
            },
            "field2" : { 
              "type" : "text" 
            },
            "field3" : { 
              "type" : "float" 
            },
            "field4" : { 
              "type" : "double" 
            },
            "field5" : { 
              "type" : "text" 
            },
            "field6" : { 
              "type" : "text" 
            }
        }
    }
}

As you can see, field1, field5, and field6 are of type "text" here, whereas in the first index they are integer, long, and boolean respectively.

Step 5: Now, with the reindex API, I am copying the documents of the index "test_field_type" into the new index "test_field_type_2":

POST _reindex
{
  "source": {
    "index": "test_field_type"
  },
  "dest": {
    "index": "test_field_type_2",
    "pipeline": "string_convert_pipline"
  }
}

Here I have provided the pipeline in "dest" so that the data types are converted while reindexing.

Step 6: When you search "test_field_type_2", the document looks like this:

GET /test_field_type_2/_search

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "test_field_type_2",
        "_type" : "_doc",
        "_id" : "vB378nABNcAqCQrXDvLF",
        "_score" : 1.0,
        "_source" : {
          "field1" : "111",
          "field6" : "true",
          "field3" : 11.1,
          "field2" : "sample_text1",
          "field5" : "9876543231",
          "field4" : 1111.1002
        }
      }
    ]
  }
}


As you can see, field1, field5, and field6 have been converted to the string type.

Now let's say I have a field "skillset" in the index:

"skillset":  [ "node.js", "mongodb", "mysql" ]


What I want to achieve is to convert "skillset" into an array of objects while reindexing, like this:

"skillset": [{
		"name": "node.js"
	}, {
		"name": "mongodb"
	}, {
		"name": "mysql"
	}]

So is there any way I can do that using the Elasticsearch reindex and pipeline functionality?

I see. Thanks for the detailed explanation.

I believe that you need to use an ingest script processor in such a complex case.

Thanks for the quick reply, I will try out the ingest script processor.

Hello David,

Thanks, I have tried the ingest script processor and it worked for me.

Step 1


PUT _ingest/pipeline/array_string_to_array_object
{
    "description": "convert an array of strings into an array of objects",
    "processors": [
      {
        "script": {
          "source": """
            def skillset_new = []; 
            for (def i = 0; i < ctx.skillset.length; i++) {
              def obj = [params.key_name: ctx.skillset[i]];
              skillset_new.add(obj);
            }
            ctx.skillset = skillset_new;
          """,
          "params": {
            "key_name":"name"
          }
        }
      }
    ]
}

Step 2: Test the pipeline using the _simulate API

POST /_ingest/pipeline/array_string_to_array_object/_simulate
{
  "docs": [
    {
      "_index": "skills",
      "_source": {
        "skillset":["node.js", "mongodb", "mysql"]
      }
    }
  ]
}

Result:

{
  "docs" : [
    {
      "doc" : {
        "_index" : "skills",
        "_type" : "_doc",
        "_id" : "_id",
        "_source" : {
          "skillset" : [
            {
              "name" : "node.js"
            },
            {
              "name" : "mongodb"
            },
            {
              "name" : "mysql"
            }
          ]
        },
        "_ingest" : {
          "timestamp" : "2020-03-23T09:55:34.2929144Z"
        }
      }
    }
  ]
}
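
To apply the same conversion during an actual reindex, the pipeline can be passed in "dest", in the same way as in Step 5 of my earlier post. A rough sketch only: the index names "skills" and "skills_v2" are placeholders I made up for illustration, and the "nested" mapping with a keyword "name" is one possible choice ("object" would be declared the same way).

```
# Hypothetical destination index whose mapping expects an array of objects
PUT /skills_v2
{
  "mappings": {
    "properties": {
      "skillset": {
        "type": "nested",
        "properties": {
          "name": { "type": "keyword" }
        }
      }
    }
  }
}

# Copy the documents and run each one through the script pipeline on the way
POST _reindex
{
  "source": {
    "index": "skills"
  },
  "dest": {
    "index": "skills_v2",
    "pipeline": "array_string_to_array_object"
  }
}
```

The script assumes every source document has "skillset" as an array of strings, so documents without that field would probably need a null check in the script first.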


Regards,
Avinash Kumawat

Amazing. Thanks for sharing your solution!
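
For the "vice versa" part of the original question (an array of objects back to an array of strings), a similar script processor should work. This is an untested sketch: the pipeline name "array_object_to_array_string" is made up, and it assumes every element of "skillset" is an object that carries the key given in params.key_name.

```
# Hypothetical reverse pipeline: [{"name": "node.js"}, ...] -> ["node.js", ...]
PUT _ingest/pipeline/array_object_to_array_string
{
    "description": "convert an array of objects into an array of strings",
    "processors": [
      {
        "script": {
          "source": """
            // collect the value stored under params.key_name from each object
            def names = [];
            for (def item : ctx.skillset) {
              names.add(item[params.key_name]);
            }
            ctx.skillset = names;
          """,
          "params": {
            "key_name": "name"
          }
        }
      }
    ]
}
```

It can be checked with the _simulate API in the same way as above before it is used in "dest" of a _reindex request. An element that lacks the key would end up as null in the new array, so that case may need extra handling.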
