Index documents with '0' in a date field as 'Not available'

Hello!

I have a large data set from VirusTotal that needs to be indexed. I would like a few of its fields to be mapped as dates instead of keywords/strings.
While processing, I discovered that in hundreds of records these fields do not contain a date but hold the value "0" to mark that no date is available.

How can I process these values so that Elasticsearch knows that no date is available and either leaves the field blank or marks it somehow as "N/A"?

There are a bunch of fields in the data set which should be interpreted as type "date", but when the value "0" is encountered, it does not match the date mapping and ES raises an error.

Is there a way to avoid this kind of behavior?

For example, this value is processed properly and does not raise an error:
"2018:03:17 12:12:12"

This value raises an error and stops my script from running or pushing further data to ES:
"0"

I think VirusTotal marks fields that don't have a date with a "0" to say the date is not available.

Hey,

You could remove that field with an ingest processor if its value is "0". Check this example, which simulates the pipeline against one document with a valid date and one with "0":

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "remove": {
          "field": "date",
          "if": "ctx.date == '0'"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "date": "2019-12-12T12:34:56.789Z"
      }
    },
    {
      "_source": {
        "date": "0"
      }
    }
  ]
}
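
Since the question mentions several date fields, here is a rough Python sketch that generates the same kind of pipeline body with one conditional `remove` processor per field. The helper name `build_remove_zero_pipeline` and the field names `first_seen` and `last_seen` are made up for illustration, and the sketch assumes the fields are top-level and their names contain no quote characters.

```python
import json


def build_remove_zero_pipeline(date_fields, placeholder="0"):
    """Build an ingest pipeline body that removes each listed field
    when its value equals the placeholder string."""
    processors = []
    for name in date_fields:
        processors.append({
            "remove": {
                "field": name,
                # Painless condition: true only when the field holds the
                # placeholder; a missing field compares as null and is skipped.
                "if": f"ctx['{name}'] == '{placeholder}'",
            }
        })
    return {
        "description": "Drop date fields that hold the placeholder value",
        "processors": processors,
    }


# Hypothetical field names; replace them with the real ones from your data.
pipeline = build_remove_zero_pipeline(["first_seen", "last_seen"])
print(json.dumps(pipeline, indent=2))
```

The printed JSON could then be stored with a `PUT _ingest/pipeline/<name>` request and applied by adding `?pipeline=<name>` to the index or bulk requests. Documents that had a "0" simply end up without that field, which is how a missing date is usually represented in a field mapped as `date`.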
