Custom date format: making the same date value searchable in any format

Hi,

I am facing the following 2 problems, please help if someone could :slight_smile:

  1. I have created an index with a date field of type date and a custom format. When I try to insert a value it gives an error (even though I have provided the format).

Index creation:

PUT date_index_both
{
  "mappings": {
    "my_type": {
      "properties": {
        "date": {
          "type": "date",
          "format": "MM-dd-yyyy||M-dd-yyyy||dd-M-yyyy||dd-MM-yyyy||d-MM-yyyy||d-M-yyyy||dd-MMM-yyyy||d-MMM-yyyy||MMM-dd-yyyy||MMM-d-yyyy||yyyy-MM-dd||yyyy-M-d||yyyy-MMM-d||yyyy-dd-MMM||MM/dd/yyyy||M/d/yyyy||dd/M/yyyy||dd/MM/yyyy||d/MM/yyyy||d/M/yyyy||dd/MMM/yyyy||d/MMM/yyyy||MMM/dd/yyyy||MMM/d/yyyy||yyyy/MM/dd||yyyy/M/d||yyyy/MMM/d||yyyy/dd/MM"
        }
      }
    }
  }
}

Inserting value:

PUT date_index_both/my_type/1
{ "date": "20/01/2017" }

Output:

{
  "error": {
    "root_cause": [
      {
        "type": "mapper_parsing_exception",
        "reason": "failed to parse [date]"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "failed to parse [date]",
    "caused_by": {
      "type": "illegal_field_value_exception",
      "reason": "Cannot parse \"20/01/2017\": Value 20 for monthOfYear must be in the range [1,12]"
    }
  },
  "status": 400
}

Note: I have provided the format (dd/MM/yyyy) in the mapping.

  2. Some date values I inserted are not searchable with all the formats provided in the mapping from the Kibana search bar.

Example:

Inserted value:

PUT date_index_both/my_type/1
{ "date": "22-jan-2017" }

I get the required output with << date: (01-22-2017) >> in the Kibana search bar, but I get a shard error and no results for any search using the forward-slash formats (i.e. date:22/01/2017 or date:(22/jan/2017)).

Please provide a helpful suggestion if anyone has an idea!! :frowning:

Thanks in advance

That looks like a bug to me.
Could you open an issue?

As a workaround for now, you might want to try the ingest date processor:

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "date": {
          "field": "date",
          "target_field": "new_date",
          "formats": [
            "MM-dd-yyyy","M-dd-yyyy","dd-M-yyyy","dd-MM-yyyy","d-MM-yyyy","d-M-yyyy","dd-MMM-yyyy","d-MMM-yyyy","MMM-dd-yyyy","MMM-d-yyyy","yyyy-MM-dd","yyyy-M-d","yyyy-MMM-d","yyyy-dd-MMM","MM/dd/yyyy","M/d/yyyy","dd/M/yyyy","dd/MM/yyyy","d/MM/yyyy","d/M/yyyy","dd/MMM/yyyy","d/MMM/yyyy","MMM/dd/yyyy","MMM/d/yyyy","yyyy/MM/dd","yyyy/M/d","yyyy/MMM/d","yyyy/dd/MM"
          ]
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "date": "20/01/2017"
      }
    }
  ]
}

Which gives:

{
  "docs": [
    {
      "doc": {
        "_index": "_index",
        "_type": "_type",
        "_id": "_id",
        "_source": {
          "date": "20/01/2017",
          "new_date": "2017-01-20T00:00:00.000Z"
        },
        "_ingest": {
          "timestamp": "2017-11-30T11:58:33.309Z"
        }
      }
    }
  ]
}

I don't think this is a bug as such. The problem here is that you are specifying both MM/dd/yyyy and dd/MM/yyyy as date formats. That is a problem because there is no way to differentiate between the two formats when parsing a date and pick the correct one, since they look the same. Imagine I have the date 3/4/2017: which format is the correct one to use when parsing it? The date I get will depend on which format I choose.

With the current approach we try each format in order until we find one whose pattern matches, but we error if the values in the parts of the pattern don't match what we expect. This keeps us consistent across documents: if the pattern matches the first format (MM/dd/yyyy) then we will always parse using that, rather than being in the trappy situation where sometimes we parse with MM/dd/yyyy and sometimes with dd/MM/yyyy, never knowing which format the input document actually intended.
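The ambiguity above can be illustrated outside Elasticsearch; this is a minimal Python sketch (not Elasticsearch's own parser) showing that the same string yields two different dates depending on which pattern you pick:

```python
from datetime import datetime

# "3/4/2017" is ambiguous: March 4th under MM/dd/yyyy, April 3rd under dd/MM/yyyy.
value = "3/4/2017"

as_mm_dd = datetime.strptime(value, "%m/%d/%Y")  # MM/dd/yyyy
as_dd_mm = datetime.strptime(value, "%d/%m/%Y")  # dd/MM/yyyy

print(as_mm_dd.date())  # 2017-03-04
print(as_dd_mm.date())  # 2017-04-03
```

Both parses succeed, so no format list can resolve the ambiguity on its own; only the data source knows which reading was intended.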

I would instead suggest that you either normalise your data at the source so all documents have the same date format, or, if you can't do that, add some logic to your indexing application that looks at the source, determines what the format is (for example by looking at a locale field), and normalises the date to a single format there.
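Such a normalisation step in the indexing application might look like the following Python sketch. The format list and function name are hypothetical, and the list is only an illustrative subset of the formats in the mapping above; note that the try-in-order approach still resolves ambiguous inputs by list order, so truly ambiguous formats should be disambiguated by a locale field first:

```python
from datetime import datetime

# Hypothetical list of accepted input formats, tried in order.
INPUT_FORMATS = ["%d/%m/%Y", "%d-%b-%Y", "%Y-%m-%d"]

def normalise_date(value: str) -> str:
    """Return the date as yyyy-MM-dd, or raise if no known format matches."""
    for fmt in INPUT_FORMATS:
        try:
            return datetime.strptime(value, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {value!r}")

print(normalise_date("20/01/2017"))   # 2017-01-20
print(normalise_date("22-Jan-2017"))  # 2017-01-22
```

With every document normalised to a single format, the Elasticsearch mapping can declare just that one format and the ambiguity disappears.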

Elasticsearch accepts multiple formats to make ingestion easier in the simple cases where you have truly different formats, but in this case I think the rules are complex enough that they warrant logic outside of Elasticsearch, in the indexing application.


Thank you so much, I will look into your suggestion :slight_smile:

I'd expect the date mapper to behave the same way as the date ingest processor: if it fails with one format, it should try the next one, and the next, until there is nothing else to try.

Thanks @colings86 for the solution, even I was getting confused by these two formats :slight_smile: :+1:

Yes, it should have checked all the formats until it found an exact match, but it is not working as expected :slightly_frowning_face: :slightly_frowning_face:

I actually think the date ingest processor should behave the same way as the date mapper currently does. I think it's too trappy to rely on Elasticsearch to "do the right thing" with some magic, because "the right thing" is completely ambiguous to Elasticsearch. I think it's right instead to push the decision back to somewhere a definite decision on the ambiguity can be made.

I disagree with that.
Otherwise we will have people building complicated pipelines that use the pipeline error handling to handle such cases.

If the date must absolutely respect a single format, then there is no need for a parser; we could just tell users to send milliseconds since the epoch. Which would be scary.

The processor is here to reconcile dates that might come in different formats. We should not change this, IMO.

The question is a bit different for the date mapper, but if we support more than one format it means we should try them all. Otherwise there is no need to support more than one.

IMO it's a bug, not a feature.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.