Cannot parse "message" in pipeline with Grok -- illegal_argument_exception

When attempting to Grok the "message" field in a filebeat pipeline from Kibana, I get the following error:

{
  "docs": [
    {
      "error": {
        "root_cause": [
          {
            "type": "illegal_argument_exception",
            "reason": "field [message] of type [java.util.ArrayList] cannot be cast to [java.lang.String]"
          }
        ],
        "type": "illegal_argument_exception",
        "reason": "field [message] of type [java.util.ArrayList] cannot be cast to [java.lang.String]"
      }
    }
  ]
}

The sample document I am using is:

[
  {
    "_source": {
      "message": [
        "2023-09-29 08:08:16"
      ]
    }
  }
]

If I remove the array brackets from the message, making it just:

[
  {
    "_source": {
      "message": "2023-09-29 08:08:16"
    }
  }
]

It works fine and produces the output I want:

{
  "docs": [
    {
      "doc": {
        "_index": "_index",
        "_id": "_id",
        "_version": "-3",
        "_source": {
          "message": "2023-09-29 08:08:16",
          "ftp": {
            "access": {
              "time": "2023-09-29 08:08:16"
            }
          }
        },
        "_ingest": {
          "timestamp": "2023-10-01T13:41:46.152104543Z"
        }
      }
    }
  ]
}

The brackets (array) are the default for the filebeat/elasticsearch output/input, though, so I'm not sure how I would remove the brackets from the filebeat output in order to process these messages properly.

The Grok pattern is simply:

%{TIMESTAMP_ISO8601:ftp.access.time}
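
For context, here is a minimal sketch of how that pattern sits in the pipeline definition (the real pipeline, copied from the IIS module, has more processors that are left out here; only the pattern and the pipeline name come from my setup):

# Dev Tools Console -- sketch showing only the grok processor
PUT _ingest/pipeline/filebeat-8.10.2-ftp-access-pipeline
{
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": [
          "%{TIMESTAMP_ISO8601:ftp.access.time}"
        ]
      }
    }
  ]
}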

Interesting to note -- the IIS pipeline, which is a built-in module, processes these messages just fine, and my pipeline is basically a copy of it with slight modifications (custom ftp fields instead of iis).

Any input would be GREATLY appreciated.

Thanks!

Where are you using this? Are you simulating the ingest pipeline in the Dev Tools on Kibana?

Please provide a little more context of what you are doing and how.

I am testing it from the pipeline itself, in Kibana under "Ingest Pipelines" > "Test pipeline". The Dev Tools Grok Debugger works perfectly with the entire Grok statement (screenshot attached). The actual scenario I am attempting is to export logs from a Windows server using filebeat and ingest them directly into Elasticsearch.

filebeat.inputs:
- type: filestream
  id: filestream-ftp-id
  enabled: true
  paths:
    - "C:/filestream/*.log"

I also tried the deprecated "log" input type, with the same results. The output configuration is:

output.elasticsearch:  
  hosts: ["https://192.168.10.4:9200"]
  pipeline: "filebeat-8.10.2-ftp-access-pipeline"
  indices:
    - index: "filebeat-ftp"

  protocol: "https"
  ssl.verification_mode: none
  username: "elastic"
  password: <edited>

The issue, though, looks like Grok doesn't recognize the "message" field as the "text" type it is pre-defined as in the filebeat template/index, but rather as an array object, and balks at it. Again, though, this appears to be how the IIS and mssql modules do their intake (I'm using both as well) and they work fine. I should also mention that I've created brand new pipelines for testing this, with a sample document in the same format:

[
  {
    "_source": {
      "message": [
        "could_be_anything"
      ]
    }
  }
]

This also results in the type mismatch error -- removing the brackets makes it work again, though.

Thanks for the quick reply, Leandro! Happy to provide any more information.

It is because your message field is an array and the processor is expecting a string.

This:

[
  {
    "_source": {
      "message": [
        "could_be_anything"
      ]
    }
  }
]

Should be like this:

[
  {
    "_source": {
      "message": "could_be_anything"
    }
  }
]

Every processor in an ingest pipeline will expect a string field, not an array field. Filebeat will also send your log lines as strings, not arrays.

Any reason to use the source as an array?

I am not changing it to an array -- that is what is being sent by default. In my first post I also show that it works if I remove the array brackets, but I have no idea why they are being added when I have made no modifications to do so, and the "message" field is defined as a "text" type in the template/index itself.
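
(For reference, the field's mapping can be double-checked from Dev Tools with the field mapping API; the filebeat-ftp index name below is taken from the output config earlier in the thread.)

# Dev Tools Console -- show the mapping of just the "message" field
GET filebeat-ftp/_mapping/field/message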

You said that you are using the Test Pipeline feature in Kibana, right? This interface.

So, in this interface you need to provide a sample document to test. The sample document you should use needs to be something like this, where the message field is not an array:

[
  {
    "_source": {
      "message": "could_be_anything"
    }
  }
]

I fail to understand where the array in the message field is coming from, since the sample document is provided by the user.

Leandro,

I was just using that as an example -- the actual document I am trying to process comes from filebeat and is processed by elasticsearch. If I view one of them in Discover and click "interact with cell content", I can see that ALL of the fields are coming in as arrays:

{
  "@timestamp": [
    "2023-10-01T13:11:12.162Z"
  ],
  "agent.ephemeral_id": [
    "d1f0c894-7a37-4431-b0f7-dd24ee8e1925"
  ],
  "agent.hostname": [
    "lab-w22-server"
  ],
  "agent.id": [
    "2f7b5607-1239-48c4-a378-f50997527ca7"
  ],
  "agent.name": [
    "lab-w22-server"
  ],
  "agent.type": [
    "filebeat"
  ],
  "agent.version": [
    "8.10.2"
  ],
  "ecs.version": [
    "8.0.0"
  ],
  "host.architecture": [
    "x86_64"
  ],
  "host.hostname": [
    "lab-w22-server"
  ],
  "host.id": [
    "a0beffff-9920-4382-bbc6-49d52338f872"
  ],
  "host.ip": [
    "192.168.10.20"
  ],
  "host.mac": [
    "00-0C-29-43-52-0F"
  ],
  "host.name": [
    "lab-w22-server"
  ],
  "host.os.build": [
    "20348.1970"
  ],
  "host.os.family": [
    "windows"
  ],
  "host.os.kernel": [
    "10.0.20348.1970 (WinBuild.160101.0800)"
  ],
  "host.os.name": [
    "Windows Server 2022 Standard"
  ],
  "host.os.name.text": [
    "Windows Server 2022 Standard"
  ],
  "host.os.platform": [
    "windows"
  ],
  "host.os.type": [
    "windows"
  ],
  "host.os.version": [
    "10.0"
  ],
  "input.type": [
    "filestream"
  ],
  "log.file.idxhi": [
    65536
  ],
  "log.file.idxlo": [
    108195
  ],
  "log.file.path": [
    "C:\\filestream\\u_ex230920.log"
  ],
  "log.file.vol": [
    3872396159
  ],
  "log.offset": [
    138568
  ],
  "message": [
    "2023-09-20 23:00:23 192.168.0.10 53692 - FTPSVC2 lab-ftp-server - 192.168.0.20 ControlChannelClosed - - 0 0 27 0 203 fcadbdf0-fe07-443e-bbec-606635adac49 - -"
  ],
  "_id": "kYZd64oB0EkB5yQ5RbfZ",
  "_index": ".ds-filebeat-ftp-2023.10.01-000001",
  "_score": null
}

If I then cut and paste this into the test pipeline, I get my error message -- but this is the REAL data I am trying to process.

Oh yeah, that's where the confusion comes from.

This happens because Kibana uses the fields API, and this API always returns the fields as arrays even if there is just one value in the field.

This can lead to a lot of confusion sometimes.

If you look at the _source field at the beginning of the document in the JSON tab in Kibana, you will see the real structure of your document in Elasticsearch; fields with just one value will not be shown as arrays.

In short, the _source field in Kibana Discover, shown when you expand a document and go to the JSON tab, reflects the real structure of your document. The fields section in that same place shows all fields as arrays and will not match the real structure of your document.

In this case you can't just copy and paste; you need to remove the brackets from the arrays so it matches how the _source field will look.

For example, if you want to copy the message field:

  "message": [
    "2023-09-20 23:00:23 192.168.0.10 53692 - FTPSVC2 lab-ftp-server - 192.168.0.20 ControlChannelClosed - - 0 0 27 0 203 fcadbdf0-fe07-443e-bbec-606635adac49 - -"
  ]

You need to remove the [ and ] before testing in your ingest pipeline, so you should use this:

  "message": "2023-09-20 23:00:23 192.168.0.10 53692 - FTPSVC2 lab-ftp-server - 192.168.0.20 ControlChannelClosed - - 0 0 27 0 203 fcadbdf0-fe07-443e-bbec-606635adac49 - -"  

Then you can test your ingest pipeline.
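
If you prefer Dev Tools over the Kibana UI, the equivalent test is the _simulate API. A sketch, using the pipeline name from your Filebeat config and your real log line as a plain string:

# Dev Tools Console -- simulate the pipeline against a single document
POST _ingest/pipeline/filebeat-8.10.2-ftp-access-pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "message": "2023-09-20 23:00:23 192.168.0.10 53692 - FTPSVC2 lab-ftp-server - 192.168.0.20 ControlChannelClosed - - 0 0 27 0 203 fcadbdf0-fe07-443e-bbec-606635adac49 - -"
      }
    }
  ]
}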

The data collected by filebeat and sent to elasticsearch will not be sent as arrays. You can ignore the arrays; it is just how Elastic chose to show the values in some places in Kibana.
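
You can also see the difference for yourself with a search that returns both representations (a sketch; filebeat-ftp is the index name from your output config):

# Dev Tools Console -- compare _source with the fields API
GET filebeat-ftp/_search
{
  "size": 1,
  "_source": ["message"],
  "fields": ["message"]
}

In the response, hits.hits[0]._source.message is a plain string, while hits.hits[0].fields.message is a single-element array. That array form is what Discover shows in the table and in "interact with cell content".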

Leandro,

Sorry for the initial confusion! You were right -- in actual production it worked just fine. One of those nuances that we ELK newbies must learn, I guess. :slight_smile: Thankfully it appears this forum is filled with EXTREMELY helpful people. Appreciate your knowledge!
