Conditional processing of "date" processor when empty value

Hello,

My logs are parsed with a dissect processor which works flawlessly.
Then I have some processors to enrich my data.
One of these processors is a "date" processor on a field called "date_begin", which takes a string value and converts it to a date type.

In some cases, my date_begin field is empty.
In this case, the date processor fails (of course).

I tried to add a condition to my processor, like this:

"date": {
            "field": "date_begin",
            "if": "date_begin != null && ctx.lgr.meta.fmt == 'stat'",
            "target_field": "date_begin",
            "formats": [
                "yyMMdd HHmmss"
            ],
            "timezone": "Europe/Paris"
        }

But it gives me a compile error.

I tried adding :

ignore_failure : true

But errors are still there:

error.caused_by.reason: cannot parse empty date
error.caused_by.type: illegal_argument_exception

Any idea on how to deal with this issue ?

[Elasticsearch 7.10]

I think this is not correct:

"if": "date_begin != null && ctx.lgr.meta.fmt == 'stat'"

It should be something like this:

"if": "ctx.date_begin != null && ctx.lgr.meta.fmt == 'stat'"

Have you tried to change it?

Yes, I tried with no success.
Still have the error "Cannot parse empty date"

(Also tried with other syntax : ctx[date_begin] and ctx['date_begin'])

Maybe I should give more context.
Here is the concerned log line:

"1","O","I","240912 100015","C","166","0","PHSE","WEFTDEV","RVIQPM01","/var/opt/data/flat/IN/inaxvgw1/","3909254-54URVI05","54URVI05","54U09","RVI08","","240912 100015","0"

The field date_begin which contains the "empty date" is the one between "RVI08" and "240912 100015"

Maybe the field is not considered as "null" ?

Another idea would be :
If date_begin field is empty, then copy the data of the "date_end" field into "date_begin" (with a set processor ?), and then do the date processor on date_begin field.

Is that a good idea ?
(it's not a problem for our purpose to have the same date in date_end and date_begin, when date_begin is empty)

I would prefer the initial solution (handle the conditional processing of this date processor), but if not possible, then the second solution is conceivable

Update :

I tried with this syntax :

"if": "ctx.date_begin != '' && ctx.lgr.meta.fmt == 'stat'",

I obtain a different error than the previous try which was :

"if": "ctx.date_begin != null && ctx.lgr.meta.fmt == 'stat'",

The error is now :

error.caused_by.reason / cannot parse empty date
error.caused_by.type / illegal_argument_exception
error.reason / failed to parse field [date_begin] of type [date] in document with id 'IuWj6pEBkPVGqgwVkGXt'. Preview of field's value: ''
error.type / mapper_parsing_exception

Does that help to find what is wrong with my condition ?
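The two errors come from different stages. "cannot parse empty date" with illegal_argument_exception is what the date processor raises, while mapper_parsing_exception is raised at indexing time. That suggests the `!= ''` condition did skip the date processor, and the empty string then reached a field that is mapped as a date. One possible way around it, as a minimal sketch assuming it is acceptable to drop date_begin when it is empty, is a remove processor guarded by the opposite check (the `tag` value is only an illustrative label):

```json
{
  "remove": {
    "tag": "illustrative-drop-empty-date-begin",
    "if": "ctx.date_begin == ''",
    "field": "date_begin"
  }
}
```

When the field is missing altogether, `ctx.date_begin` is null, the comparison with `''` is false, and the remove processor is skipped.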

@zebu14

Please share two documents as Elasticsearch JSON, one with the field and one without. Null checks can be tricky.

Did you check the null safety section of the docs?

Give me a couple of docs and I will show you.

Correct syntax example

This first checks for null then empty

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "set": {
          "if": "ctx?.date_begin != null && ctx?.date_begin != ''",
          "field": "foo",
          "value": "bar"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "field_a": "value_a1"
      }
    },
    {
      "_source": {
        "field_a": "value_a2",
        "date_begin": "date_value"
      }
    },
    {
      "_source": {
        "field_a": "value_a3",
        "date_begin": ""
      }
    }
  ]
}

# Results
{
  "docs": [
    {
      "doc": {
        "_index": "_index",
        "_version": "-3",
        "_id": "_id",
        "_source": {
          "field_a": "value_a1"
        },
        "_ingest": {
          "timestamp": "2024-09-13T21:26:56.645514306Z"
        }
      }
    },
    {
      "doc": {
        "_index": "_index",
        "_version": "-3",
        "_id": "_id",
        "_source": {
          "date_begin": "date_value",
          "field_a": "value_a2",
          "foo": "bar"
        },
        "_ingest": {
          "timestamp": "2024-09-13T21:26:56.645605526Z"
        }
      }
    },
    {
      "doc": {
        "_index": "_index",
        "_version": "-3",
        "_id": "_id",
        "_source": {
          "date_begin": "",
          "field_a": "value_a3"
        },
        "_ingest": {
          "timestamp": "2024-09-13T21:26:56.645614003Z"
        }
      }
    }
  ]
}

Here are two docs.

First with correct date and parsing

"1","O","I","240917 153530","E","0","12757","PHSE","OP20BBA1","PRCNAW01","/var/opt/data/flat/AxwayGateway/IN/opaxvgw1/","9561280-20BCNA01","20BCNA01","20B11","CNA08","240917 153530","240917 153530","0"

{
  "_index": "lgr-axv-axway-default-2024.09.17-000074",
  "_type": "_doc",
  "_id": "9vcyAJIBD6Sy3UurCQil",
  "_version": 1,
  "_score": null,
  "_source": {
    "date": "240917 153530",
    "agent": {
      "hostname": "opaxvgw1",
      "name": "xxxx",
      "id": "895e074c-03d7-44c0-91bc-5b52a390612a",
      "type": "filebeat",
      "ephemeral_id": "17a8e074-93ee-4294-978b-6ba5fa6fdf5b",
      "version": "7.7.1"
    },
    "code": "0",
    "dst": "PRCNAW01",
    "log": {
      "file": {
        "path": "/opt/application/.../stat.dat"
      },
      "offset": 24981
    },
    "penta_cible": "CNA08",
    "lgr_uniq_id": "b5wyAJIBmHWdItooCP4t",
    "type": "1",
    "mode": "I",
    "path": "/var/opt/data/flat/.../opaxvgw1/",
    "protocol": "PHSE",
    "ecs": {
      "version": "1.5.0"
    },
    "lgr": {
      "meta": {
        "component": "default",
        "cloudid": "[...]",
        "logsize": 902,
        "fmt": "stat",
        "env": "prd"
      }
    },
    "@version": "1",
    "plan": "prd_ods_l_02",
    "direction": "O",
    "file_component": "9561280-20BCNA01",
    "penta_source": "20B11",
    "src": "OP20BBA1",
    "idf": "20BCNA01",
    "date_end": "2024-09-17T15:35:30.000+02:00",
    "received": "2024-09-17T13:35:32.246223Z",
    "message": "\"1\",\"O\",\"I\",\"240917 153530\",\"E\",\"0\",\"12757\",\"PHSE\",\"OP20BBA1\",\"PRCNAW01\",\"/var/opt/data/flat/AxwayGateway/IN/opaxvgw1/\",\"9561280-20BCNA01\",\"20BCNA01\",\"20B11\",\"CNA08\",\"240917 153530\",\"240917 153530\",\"0\"",
    "tags": [
      "beats_input_codec_plain_applied",
      "beats_input_codec_json_applied",
      "routage-main-ods",
      "routage-main-cloudid"
    ],
    "input": {
      "type": "log"
    },
    "retries": "0",
    "@timestamp": "2024-09-17T15:35:30.000+02:00",
    "size": "12757",
    "date_begin": "2024-09-17T15:35:30.000+02:00",
    "kafka": {
      "headers": [],
      "partition": 0,
      "offset": 489122125,
      "topic": "...",
      "key": ""
    },
    "fields": {
      "produit": "stats_flow",
      "program": "axv_stat_prd"
    },
    "status": "E"
  },
  "fields": {
    "date_end": [
      "2024-09-17T13:35:30.000Z"
    ],
    "received": [
      "2024-09-17T13:35:32.246Z"
    ],
    "@timestamp": [
      "2024-09-17T13:35:30.000Z"
    ],
    "date_begin": [
      "2024-09-17T13:35:30.000Z"
    ]
  },
  "sort": [
    1726580130000
  ]
}

Second with "empty" date and error

"1","O","I","240917 153502","C","166","0","PHSE","44O06","DRP26B01","/var/opt/data/flat/AxwayGateway/IN/opaxvgw1","9561121-44O06","44OZZZ03","44O06","26B06","","240917 153502","0"

{
  "_index": "lgr-axv-axway-reject-2024.09.17-000074",
  "_type": "_doc",
  "_id": "o_ExAJIBD6Sy3UuronvT",
  "_version": 1,
  "_score": null,
  "_source": {
    "date": "240917 153502",
    "agent": {
      "hostname": "opaxvgw1",
      "name": "...",
      "id": "895e074c-03d7-44c0-91bc-5b52a390612a",
      "ephemeral_id": "17a8e074-93ee-4294-978b-6ba5fa6fdf5b",
      "type": "filebeat",
      "version": "7.7.1"
    },
    "code": "166",
    "dst": "DRP26B01",
    "log": {
      "file": {
        "path": "/opt/application/.../stat.dat"
      },
      "offset": 1575
    },
    "penta_cible": "26B06",
    "lgr_uniq_id": "fpwxAJIBmHWdItoonq3W",
    "type": "1",
    "error": {
      "message": "cannot parse empty date"
    },
    "mode": "I",
    "path": "/var/opt/data/flat/.../opaxvgw1",
    "protocol": "PHSE",
    "ecs": {
      "version": "1.5.0"
    },
    "lgr": {
      "meta": {
        "component": "default",
        "cloudid": "[...]",
        "logsize": 879,
        "fmt": "stat",
        "env": "prd"
      }
    },
    "@version": "1",
    "plan": "prd_ods_l_02",
    "direction": "O",
    "file_component": "9561121-44O06",
    "penta_source": "44O06",
    "src": "44O06",
    "idf": "44OZZZ03",
    "date_end": "240917 153502",
    "message": "\"1\",\"O\",\"I\",\"240917 153502\",\"C\",\"166\",\"0\",\"PHSE\",\"44O06\",\"DRP26B01\",\"/var/opt/data/flat/AxwayGateway/IN/opaxvgw1\",\"9561121-44O06\",\"44OZZZ03\",\"44O06\",\"26B06\",\"\",\"240917 153502\",\"0\"",
    "tags": [
      "beats_input_codec_plain_applied",
      "beats_input_codec_json_applied",
      "routage-main-ods",
      "routage-main-cloudid"
    ],
    "input": {
      "type": "log"
    },
    "retries": "0",
    "@timestamp": "2024-09-17T15:35:02.000+02:00",
    "size": "0",
    "date_begin": "",
    "kafka": {
      "headers": [],
      "partition": 0,
      "offset": 489101404,
      "topic": "...",
      "key": ""
    },
    "fields": {
      "produit": "stats_flow",
      "program": "axv_stat_prd"
    },
    "status": "C"
  },
  "fields": {
    "@timestamp": [
      "2024-09-17T13:35:02.000Z"
    ]
  },
  "sort": [
    1726580102000
  ]
}

Did you see my solution... Did you try it?

Yes, I saw your solution, but it only shows how to use a "set" processor, which is not what I was looking for in the first place.

You asked for two examples of logs, one which is OK and one which is KO, that's why I gave them.

If there is no solution to the initial problem, then I'll look into the "set" processor to replace the value contained in "date_begin" with some data.

But for now, I'd prefer to find a solution to the "empty" or "null" processing.

Oh and I have another question please :
What's the difference between : ctx?.___ and ctx.___ ?

Update: I tested your example pipeline.
If I understand correctly, a "null" field is one that does not exist at all in the document, whereas an empty field exists but holds an empty value.
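On the `ctx?.___` versus `ctx.___` question: in Painless, `?.` is the null-safe operator. `a?.b` evaluates to null when `a` is null, where `a.b` would throw a null pointer exception. Since `ctx` is the document itself, the operator matters most on nested objects that may be missing, such as `lgr` or `lgr.meta`. A small sketch, where the target field `fmt_checked` and the `tag` value are made up for illustration:

```json
{
  "set": {
    "tag": "illustrative-null-safe-check",
    "if": "ctx.lgr?.meta?.fmt == 'stat'",
    "field": "fmt_checked",
    "value": "yes"
  }
}
```

For a document without an `lgr` object, the condition evaluates to `null == 'stat'`, which is false, so the processor is skipped. Written as `ctx.lgr.meta.fmt`, the same document would make the condition fail with a null pointer error.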

Hi @zebu14

The point of my example was not the set processor; it was the if condition, which lets you safely access that field when it exists and is not empty. For the case where the field doesn't exist or is empty, you would use the opposite logic in a second processor (there is no built-in "else").

Once you can safely access the field, you can use the date processor or whichever processor you like.

For documents without the field you will have to decide what value you want to put in that field.
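Applied to the date processor from the original question, the same guard might look like the sketch below. The field names, format, and timezone are taken from the question; the `tag` value is only an illustrative label, and the null-safe access on `lgr` is an added precaution rather than something the thread confirmed is needed.

```json
{
  "date": {
    "tag": "illustrative-guarded-date",
    "if": "ctx.date_begin != null && ctx.date_begin != '' && ctx.lgr?.meta?.fmt == 'stat'",
    "field": "date_begin",
    "target_field": "date_begin",
    "formats": [
      "yyMMdd HHmmss"
    ],
    "timezone": "Europe/Paris"
  }
}
```

The guard only skips the processor. An empty date_begin is left as an empty string, so it still needs a replacement value (or has to be dropped) before it is indexed into a field mapped as a date.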

Ok.

I decided to go with this solution :

    {
        "set": {
            "if": "ctx?.date_begin == '' && ctx.lgr.meta.fmt == 'stat'",
            "field": "date_begin",
            "value": "{{date_end}}"
        }
    }

So if my date is empty, then I use the end date as the begin date.
This way I have no more error during parsing.
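Put together with the date processor, the pipeline body could look roughly like this. It is a sketch under two assumptions: the set processor runs before any date processor has converted date_end, so the copied value is still in the raw "yyMMdd HHmmss" form, and the description text is only a placeholder.

```json
{
  "description": "Illustrative sketch: fall back to date_end when date_begin is empty, then parse date_begin",
  "processors": [
    {
      "set": {
        "if": "ctx?.date_begin == '' && ctx.lgr.meta.fmt == 'stat'",
        "field": "date_begin",
        "value": "{{date_end}}"
      }
    },
    {
      "date": {
        "if": "ctx.date_begin != null && ctx.date_begin != '' && ctx.lgr.meta.fmt == 'stat'",
        "field": "date_begin",
        "target_field": "date_begin",
        "formats": [
          "yyMMdd HHmmss"
        ],
        "timezone": "Europe/Paris"
      }
    }
  ]
}
```

The guard on the date processor stays in place for the case where date_end is empty as well, since the copied value would then still be an empty string.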
