Elasticsearch: Data Migration for Dates


(Manjunatha Nakshathri) #1

Hello All,

We have documents indexed in Elasticsearch. Each document has date fields stored without a timezone (the values are in IST, but no offset is recorded with them). This is clearly not right, because the stored values are ambiguous.

We plan to store dates in UTC from now on and to migrate all existing documents so that their dates are in UTC as well. After going through the documentation, I could not find a definitive guide for doing this.

I would appreciate expert advice on how to solve this. Every opinion is welcome.

Regards,
Manjunath


(David Pilato) #2

I wonder if you could use the reindex API with an ingest pipeline and a date processor?

Would that work?
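
As a rough sketch of that idea, the body of a POST _reindex request that sends every document through an ingest pipeline might look like the following. The index names and the pipeline id are placeholders rather than names from this thread, and the pipeline itself would have to be created beforehand.

```json
{
  "source": {
    "index": "source-index-placeholder"
  },
  "dest": {
    "index": "dest-index-placeholder",
    "pipeline": "pipeline-id-placeholder"
  }
}
```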


(Manjunatha Nakshathri) #3

Hello David,

Thanks for the suggestion. We got this working using the Reindex API with an ingest pipeline that combines the date, rename, remove, and script processors.

Request data:
{
      "startTime" : "2017-12-30 10:00:00", 
      "endTime"  : "2017-12-30 11:00:00",
      "activities" : [
            {
                    "startTime" : "2017-12-30 10:00:00",
                    "endDtime" : "2017-12-30 10:15:00"
            }
      ]
}

Script processor: appends the default timezone offset to the date fields.
Date processor: converts the value to UTC and writes it to a new field.
Rename and remove processors: restore the original document structure by dropping the intermediate fields created during processing.
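
Put together, the root-level part of such a pipeline could be sketched as below (the body of a PUT _ingest/pipeline/<id> request). This is an illustration under assumptions, not necessarily the exact pipeline we used: the date pattern yyyy-MM-dd HH:mm:ss Z is chosen to match the sample data above plus the appended offset, the intermediate field names are illustrative, and only startTime is shown (endTime would need the same steps). For the sample value, 10:00:00 at +0530 corresponds to 04:30:00 UTC.

```json
{
  "description": "Sketch: treat root-level startTime as IST and rewrite it in UTC",
  "processors": [
    {
      "script": {
        "lang": "painless",
        "source": "ctx.startTimeWithOffset = ctx.startTime + params.default_offset",
        "params": {
          "default_offset": " +0530"
        }
      }
    },
    {
      "date": {
        "field": "startTimeWithOffset",
        "target_field": "startTime_utc",
        "formats": ["yyyy-MM-dd HH:mm:ss Z"],
        "timezone": "UTC"
      }
    },
    { "remove": { "field": "startTimeWithOffset" } },
    { "remove": { "field": "startTime" } },
    { "rename": { "field": "startTime_utc", "target_field": "startTime" } }
  ]
}
```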

But we are facing one challenge. The date fields exist both at the document root and inside the nested activities objects. How can we use the script processor to set the default timezone offset at the nested activities level? I would really appreciate a quick suggestion on this.

Regards,
Manjunath


(David Pilato) #4

I'm not a painless user so I don't really know. Can't you iterate on the nested objects as well?


(Manjunatha Nakshathri) #5

I can iterate over activities with the foreach processor, as below:

{
	"foreach": {
		"field": "activities",
		"processor": {
			"date": {
				"field": "_ingest._value.startTime",
				"target_field": "_ingest._value.startTime_utc",
				"formats": ["dd/MM/yyyy hh:mm:ss Z"],
				"timezone": "UTC"
			}
		}
	}
}

But what I need is:

{
	"foreach": {
		"field": "activities",
		"processor": {
			"script": {
				"lang": "painless",
				"source": "ctx.startTimeWithOffset = ctx.startTime + params.default_offset",
				"params": {
					"default_offset": " +0530"
				}
			}
		}
	}
}

After this transformation, I want to apply the date processor to the modified field, as below:

{
	"foreach": {
		"field": "activities",
		"processor": {
			"date": {
				"field": "_ingest._value.startTimeWithOffset",
				"target_field": "_ingest._value.startTime_utc",
				"formats": ["dd/MM/yyyy hh:mm:ss Z"],
				"timezone": "UTC"
			}
		}
	}
}

Where the issue is:

The foreach processor combined with the script processor does not iterate over the array elements the way the date processor does, and we get exceptions from this line:

    "source": "ctx.startTimeWithOffset = ctx.startTime + params.default_offset",

(David Pilato) #6

@Igor_Motov do you know by any chance?


(Igor Motov) #7

@Nakshathri which error are you getting? Could you post here the simulate pipeline command with your test data and pipeline, so I can take a look?


(Manjunatha Nakshathri) #8

@Igor_Motov here are the scenarios:

  1. We are able to iterate over activities.startTime, but the new field, activityStartTimeWithOffset, is added to the parent object, above activities, rather than to each activity.

    "foreach": {
      "field": "activities",
      "processor": {
        "script": {
          "lang": "painless",
          "source": "ctx.activityStartTimeWithOffset = ctx.startTime + params.default_offset",
          "params": {
            "default_offset": " +0530"
          }
        }
      }
    }
    
  2. If we change the script as shown below, we get an error:

    "foreach": {
      "field": "activities",
      "processor": {
        "script": {
          "lang": "painless",
          "source": "_ingest._value.activityStartTimeWithOffset = _ingest._value.startTime + params.default_offset",
          "params": {
            "default_offset": " +0530"
          }
        }
      }
    }
    

Error :

{
    "error": {
        "root_cause": [
            {
                "type": "script_exception",
                "reason": "compile error",
                "script_stack": [
                    "_ingest._value.activitySt ...",
                    "^---- HERE"
                ],
                "script": "_ingest._value.activityStartTimeWithOffset = _ingest._value.startTime + params.default_offset",
                "lang": "painless",
                "header": {
                    "processor_type": "foreach",
                    "property_name": "source"
                }
            }
        ],
        "type": "script_exception",
        "reason": "compile error",
        "script_stack": [
            "_ingest._value.activitySt ...",
            "^---- HERE"
        ],
        "script": "_ingest._value.activityStartTimeWithOffset = _ingest._value.startTime + params.default_offset",
        "lang": "painless",
        "caused_by": {
            "type": "illegal_argument_exception",
            "reason": "Variable [_ingest] is not defined."
        },
        "header": {
            "processor_type": "foreach",
            "property_name": "source"
        }
    },
    "status": 500
}

I think I am making a syntax mistake in the second case, but I could not work out the right syntax for my need from the available documentation.

Really appreciate your feedback.
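
One variant that may be worth trying, assuming the Elasticsearch version in use exposes ingest metadata to Painless through ctx. Painless scripts in an ingest pipeline only see ctx and params as variables, which would explain the "Variable [_ingest] is not defined" error in scenario 2. Inside foreach, ctx still refers to the whole document, which would explain why scenario 1 writes to the parent object. Addressing the current array element as ctx._ingest._value might therefore behave as intended. The sketch below has not been run against this data; the date pattern is an assumption chosen to match the sample data (yyyy-MM-dd HH:mm:ss plus the appended offset).

```json
{
  "description": "Untested sketch: append the offset and convert to UTC inside each activity",
  "processors": [
    {
      "foreach": {
        "field": "activities",
        "processor": {
          "script": {
            "lang": "painless",
            "source": "ctx._ingest._value.startTimeWithOffset = ctx._ingest._value.startTime + params.default_offset",
            "params": {
              "default_offset": " +0530"
            }
          }
        }
      }
    },
    {
      "foreach": {
        "field": "activities",
        "processor": {
          "date": {
            "field": "_ingest._value.startTimeWithOffset",
            "target_field": "_ingest._value.startTime_utc",
            "formats": ["yyyy-MM-dd HH:mm:ss Z"],
            "timezone": "UTC"
          }
        }
      }
    }
  ]
}
```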


(Igor Motov) #9

@Nakshathri could you help me here? Instead of describing the issue piece by piece, please post the simulate pipeline command, with your test data and pipeline, that reproduces the issue, so I can run it locally and see what's going on.
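
For illustration, a self-contained reproduction of scenario 2 could be assembled from the snippets already posted in this thread, as the body of a POST _ingest/pipeline/_simulate request. The test document is the request data from post #3 and the processor is the failing foreach from post #8; the description text is made up for this sketch.

```json
{
  "pipeline": {
    "description": "Reproduction sketch for the compile error in scenario 2",
    "processors": [
      {
        "foreach": {
          "field": "activities",
          "processor": {
            "script": {
              "lang": "painless",
              "source": "_ingest._value.activityStartTimeWithOffset = _ingest._value.startTime + params.default_offset",
              "params": {
                "default_offset": " +0530"
              }
            }
          }
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "startTime": "2017-12-30 10:00:00",
        "endTime": "2017-12-30 11:00:00",
        "activities": [
          {
            "startTime": "2017-12-30 10:00:00",
            "endDtime": "2017-12-30 10:15:00"
          }
        ]
      }
    }
  ]
}
```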

