Issue with date format change while reindexing


(Meenakshi) #1

Hi There,

I am new to Elasticsearch and am getting the following error while reindexing my data to a new index. I am changing the format of a date field from "yyyy-MM-dd'T'HH:mm:ss.SSSZ" to "yyyy-MM-dd". Could you please help me solve this issue?

{
  "index": "index111",
  "type": "type111",
  "id": "1935",
  "cause": {
    "type": "mapper_parsing_exception",
    "reason": "failed to parse [person.birthDate]",
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": """Invalid format: "1955-09-23T12:00:00.000Z" is malformed at "T12:00:00.000Z""""
    }
  },
  "status": 400
}

Thanks


(David Pilato) #2

Maybe use a date processor in an ingest pipeline.
An ingest pipeline can be used while reindexing data.


(Meenakshi) #3

Thanks for the quick reply. Could you please share an example, if you have one?


(David Pilato) #4

I don't, but here is the doc:

From this last page, here is an example of calling a pipeline from the reindex API:

POST _reindex
{
  "source": {
    "index": "source"
  },
  "dest": {
    "index": "dest",
    "pipeline": "some_ingest_pipeline"
  }
}

(Meenakshi) #5

I created the pipeline as follows and tried the reindex, but I am still facing an issue:

PUT _ingest/pipeline/my-pipeline-id
{
  "description": "date pipeline ",
  "processors": [
    {
      "date": {
        "field": "person.birthDate",
        "target_field": "timestamp",
        "formats": [
          "yyyy-MM-dd"
        ]
      }
    }
  ]
}

POST _reindex?wait_for_completion=false
{
  "source": {
    "index": "source_index"
  },
  "dest": {
    "index": "index111",
    "pipeline": "my-pipeline-id"
  }
}

and I am now getting the following error:

{
  "index": "index111",
  "type": "type111",
  "id": "29808",
  "cause": {
    "type": "exception",
    "reason": "java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: unable to parse date [1935-06-17T12:00:00.000Z]",
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "java.lang.IllegalArgumentException: unable to parse date [1935-06-17T12:00:00.000Z]",
      "caused_by": {
        "type": "illegal_argument_exception",
        "reason": "unable to parse date [1935-06-17T12:00:00.000Z]",
        "caused_by": {
          "type": "illegal_argument_exception",
          "reason": """Invalid format: "1935-06-17T12:00:00.000Z" is malformed at "T12:00:00.000Z""""
        }
      }
    },
    "header": {
      "processor_type": "date"
    }
  },
  "status": 500
},


(David Pilato) #6

Please format your code, logs or configuration files using the </> icon, as explained in this guide, and not the citation button. It will make your post more readable.

Or use markdown style like:

```
CODE
```

There's a live preview panel for exactly this reason.

Lots of people read these forums, and many of them will simply skip over a post that is difficult to read, because it's just too large an investment of their time to try and follow a wall of badly formatted text.
If your goal is to get an answer to your questions, it's in your interest to make it as easy to read and understand as possible.
Please update your post.

Here you read from person.birthDate and write to timestamp, but you did not change or remove person.birthDate, which is therefore still in the wrong format.

You should use the _simulate API to see exactly what you are doing before running the reindex operation.
And it's probably not as straightforward as you think it is. The date processor helps to parse a string and turn it into a date. It does not help to render the field in the format you wish, AFAIK.

But why do you want to change the date format after all? Could you explain it?
Is that for display purposes? Can't you solve that at render time? I mean that a date is a date in Elasticsearch, whatever its format in the _source document.

If it's about inserting data, then your field can support multiple formats. See https://www.elastic.co/guide/en/elasticsearch/reference/master/date.html#multiple-date-formats. In that case you don't even need to reindex the data.
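
As a rough illustration of the render-time idea, here is a minimal Java sketch. It is plain java.time, not Elasticsearch code, and the class and method names are made up for this example. It assumes the stored value is an ISO 8601 UTC timestamp like the ones in this thread, and renders the same stored string with two different patterns without modifying it.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: the stored value stays as-is, only the rendering changes.
class RenderTimeFormatting {

    // Parses an ISO 8601 instant and renders it in UTC with the given pattern.
    static String render(String stored, String pattern) {
        Instant instant = Instant.parse(stored);
        return DateTimeFormatter.ofPattern(pattern)
                .withZone(ZoneOffset.UTC)
                .format(instant);
    }

    public static void main(String[] args) {
        String stored = "1950-01-01T12:00:00.000Z";
        System.out.println(render(stored, "yyyy-MM-dd")); // prints 1950-01-01
        System.out.println(render(stored, "dd/MM/yyyy")); // prints 01/01/1950
    }
}
```

The point is only that the display format can be chosen by whatever reads the data, independently of how the value is written in _source.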


(Meenakshi) #7

Hi David,

This is my _simulate request for the above issue, but I am still getting the same error. I am not sure whether I am missing something in the pipeline.

POST _ingest/pipeline/_simulate
{
  "pipeline" :{
  "description": "date pipeline ",
  "processors": [
    {
      "date": {
        "field": "person.birthDate",
        "target_field": "timestamp",
        "formats": [
          "yyyy-MM-dd"
        ]
      }
    }
  ]},
  "docs": [
    {
      "_index": "index",
      "_type": "_doc",
      "_id": "id",
      "_source": {
        "identifiers": {
          "partyIdentifier": "20619"
        },
        "person": {
          "personIdentifier": {
            "id": "20619",
            "idScope": "ENT",
            "idContext": "B182"
          },
          "birthDate": "1950-01-01T12:00:00.000Z"
        }
      }
    }
  ]
}

Error:

        {
          "docs": [
            {
              "error": {
                "root_cause": [
                  {
                    "type": "exception",
                    "reason": "java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: unable to parse date [1950-01-01T12:00:00.000Z]",
                    "header": {
                      "processor_type": "date"
                    }
                  }
                ],
                "type": "exception",
                "reason": "java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: unable to parse date [1950-01-01T12:00:00.000Z]",
                "caused_by": {
                  "type": "illegal_argument_exception",
                  "reason": "java.lang.IllegalArgumentException: unable to parse date [1950-01-01T12:00:00.000Z]",
                  "caused_by": {
                    "type": "illegal_argument_exception",
                    "reason": "unable to parse date [1950-01-01T12:00:00.000Z]",
                    "caused_by": {
                      "type": "illegal_argument_exception",
                      "reason": """Invalid format: "1950-01-01T12:00:00.000Z" is malformed at "T12:00:00.000Z""""
                    }
                  }
                },
                "header": {
                  "processor_type": "date"
                }
              }
            }
          ]
        }

(Tan Vinh Nguyen) #8

Use ISO8601 as the format.

"formats": [ "ISO8601" ]

Your input doesn't match your format: the value is a full ISO 8601 timestamp, but the processor only declares "yyyy-MM-dd".

Invalid format: "1950-01-01T12:00:00.000Z"
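
To see the mismatch outside Elasticsearch, here is a small Java sketch using java.time. It is only an analogy: the ingest date processor uses its own parser, and the class and method names below are hypothetical. A date-only pattern rejects the full timestamp, while an ISO 8601 parser accepts it.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical demo class, not part of Elasticsearch.
class FormatMismatch {

    // True only if the whole string matches the date-only pattern.
    static boolean parsesAsDateOnly(String value) {
        try {
            LocalDate.parse(value, DateTimeFormatter.ofPattern("yyyy-MM-dd"));
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    // True if the string is a valid ISO 8601 instant such as 1950-01-01T12:00:00.000Z.
    static boolean parsesAsIsoInstant(String value) {
        try {
            Instant.parse(value);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String input = "1950-01-01T12:00:00.000Z";
        System.out.println(parsesAsDateOnly(input));   // prints false
        System.out.println(parsesAsIsoInstant(input)); // prints true
    }
}
```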

(David Pilato) #9

Can you also answer my questions?

But why do you want to change the date format after all? Could you explain it?
Is that for display purposes? Can't you solve that at render time? I mean that a date is a date in Elasticsearch, whatever its format in the _source document.


(Meenakshi) #10

The source and destination are in yyyy-MM-dd format, so we want this change in order to have a uniform data format. We also have some other fields with changes in mappings and settings, so we would like to make all the changes in one shot by reindexing.


(David Pilato) #11

Here is a way to do it:

POST _ingest/pipeline/_simulate
{
  "pipeline" :{
  "description": "date pipeline ",
  "processors": [
    {
        "script": {
          "source": """
             SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd");
             ctx.person.birthDate = format.format(format.parse(ctx.person.birthDate));
          """
        }
    }
  ]},
  "docs": [
    {
      "_index": "index",
      "_type": "_doc",
      "_id": "id",
       "_source": {
          "person": {
            "birthDate": "1950-01-01T12:00:00.000Z"
          }
       }
    }
  ]
}

This gives:

{
  "docs" : [
    {
      "doc" : {
        "_index" : "index",
        "_type" : "_doc",
        "_id" : "id",
        "_source" : {
          "person" : {
            "birthDate" : "1950-01-01"
          }
        },
        "_ingest" : {
          "timestamp" : "2019-02-26T17:35:45.52603Z"
        }
      }
    }
  ]
}
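
For what it's worth, the script relies on a property of Java's SimpleDateFormat: parse(String) reads from the start of the string and does not have to consume the whole text, so the trailing "T12:00:00.000Z" is simply ignored. Here is a minimal standalone Java sketch of the same two calls. The class and method names are hypothetical, and the time zone is pinned to UTC only to keep the example deterministic.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

// Hypothetical standalone version of the two calls used in the Painless script.
class BirthDateTruncation {

    // Parses the leading yyyy-MM-dd part and formats it back as a date-only string.
    static String toDateOnly(String value) {
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd");
        // Assumption for this sketch: fix the zone so results do not depend on the JVM default.
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            // parse(String) stops after the pattern is satisfied; the rest of the text is not read.
            return format.format(format.parse(value));
        } catch (ParseException e) {
            throw new IllegalArgumentException("not a yyyy-MM-dd prefixed value: " + value, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toDateOnly("1950-01-01T12:00:00.000Z")); // prints 1950-01-01
    }
}
```

Because the time and zone parts are never read, no time zone conversion takes place: the date is kept exactly as written in the source string.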

HTH


(Meenakshi) #12

Thank you David, that's very helpful :)


(system) closed #13

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.