URI Parts Processor decodes an encoded & in URL query

Hello,

I'm fairly new to the Elastic Stack. I've set up a cluster that receives Tomcat access logs from Filebeat.
I'm using the default ingest pipeline created by Filebeat's Apache module, which uses the URI parts processor to split the full URL into path, extension, query and so on. I need to split the query into variables and their values (e.g. var1=valX&var2=valY), and for that I'm using the KV processor.
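As a rough illustration of the kind of split the URI parts processor performs, here is a sketch using Python's standard `urllib.parse.urlsplit` on the sample URL from this post. This is only an analogy, not the code the processor runs. Note that `urlsplit` leaves the query percent-encoded, while the processor's url.query output is decoded.

```python
import posixpath
from urllib.parse import urlsplit

# Sample URL taken from field1 in the simulated document.
original = "//Some-Path/File.ext?query=%2A&someVar=SomeVal&SomeOtherVar=Something+%26+SomeOtherThing&YetAnotherVar=desc&format=json"

parts = urlsplit(original)
# Extension: text after the last '.' in the path, comparable to field2.extension.
extension = posixpath.splitext(parts.path)[1].lstrip(".")

print("domain:", parts.netloc)   # Some-Path
print("path:", parts.path)       # /File.ext
print("extension:", extension)   # ext
print("query:", parts.query)     # still percent-encoded (%26 intact)
```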

I've found that some of the URLs we receive contain encoded "&" characters (%26). URI parts decodes these when it writes the url.query field, and the decoded "&" then makes the KV processor fail with a parse error.
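The problem comes down to ordering: decoding before splitting turns %26 into a real separator. A small Python sketch makes this visible. `decode_then_split` and `split_then_decode` are hypothetical helper names for illustration only. The first mimics the decode-first order observed here. The second splits the raw query before decoding, using the standard `urllib.parse.parse_qsl`, which also turns '+' into a space.

```python
from typing import List, Tuple
from urllib.parse import parse_qsl, unquote

# Raw (still encoded) query string from the example URL in this post.
raw_query = "query=%2A&someVar=SomeVal&SomeOtherVar=Something+%26+SomeOtherThing&YetAnotherVar=desc&format=json"


def decode_then_split(query: str) -> List[str]:
    """Decode %XX escapes first, then split on '&' (the order seen in the pipeline)."""
    # unquote() leaves '+' untouched, matching the url.query value shown in this post.
    return unquote(query).split("&")


def split_then_decode(query: str) -> List[Tuple[str, str]]:
    """Split on '&' and '=' first, then decode each key and value."""
    # parse_qsl splits before decoding, so %26 stays inside the value.
    return parse_qsl(query, keep_blank_values=True)


if __name__ == "__main__":
    print(decode_then_split(raw_query))
    print(split_then_decode(raw_query))
```

With the decode-first order, one of the fragments is "+SomeOtherThing", which has no "=" at all.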

This is an example simulated from Kibana:

[
  {
    "_source": {
      "@timestamp": "2021-12-09T09:40:54.487393416Z",
      "field1": "//Some-Path/File.ext?query=%2A&someVar=SomeVal&SomeOtherVar=Something+%26+SomeOtherThing&YetAnotherVar=desc&format=json"
    }
  }
]

After the URI parts processor runs on field1 (target field2):

{
  "_index": "_index",
  "_type": "_doc",
  "_id": "_id",
  "_source": {
    "field1": "//Some-Path/File.ext?query=%2A&someVar=SomeVal&SomeOtherVar=Something+%26+SomeOtherThing&YetAnotherVar=desc&format=json",
    "@timestamp": "2021-12-09T09:40:54.487393416Z",
    "field2": {
      "path": "/File.ext",
      "extension": "ext",
      "original": "//Some-Path/File.ext?query=%2A&someVar=SomeVal&SomeOtherVar=Something+%26+SomeOtherThing&YetAnotherVar=desc&format=json",
      "scheme": null,
      "domain": "Some-Path",
      "query": "query=*&someVar=SomeVal&SomeOtherVar=Something+&+SomeOtherThing&YetAnotherVar=desc&format=json"
    }
  },
  "_ingest": {
    "pipeline": "_simulate_pipeline",
    "timestamp": "2021-12-21T08:31:51.998010821Z"
  }
}

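Because field2.query now contains a literal "&" (the "+&+" in "Something+&+SomeOtherThing"), the KV processor splits there and ends up with the fragment "+SomeOtherThing", which has no "=". It returns: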

{
  "docs": [
    {
      "error": {
        "root_cause": [
          {
            "type": "illegal_argument_exception",
            "reason": "field [field2.query] does not contain value_split [=]"
          }
        ],
        "type": "illegal_argument_exception",
        "reason": "field [field2.query] does not contain value_split [=]"
      }
    }
  ]
}
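The error can be reproduced outside Elasticsearch with a simplified stand-in for the KV processor. `kv_split` is a hypothetical function, not the processor's actual code, and it assumes field_split "&" and value_split "=". It raises on the first fragment that lacks the value separator, which here is "+SomeOtherThing".

```python
def kv_split(value: str, field: str = "field2.query",
             field_split: str = "&", value_split: str = "=") -> dict:
    """Hypothetical, simplified stand-in for the KV processor's splitting step."""
    result = {}
    for pair in value.split(field_split):
        if value_split not in pair:
            # Same wording as the reason returned by the simulate call above.
            raise ValueError(f"field [{field}] does not contain value_split [{value_split}]")
        key, val = pair.split(value_split, 1)
        result[key] = val
    return result


# Decoded query exactly as produced in field2.query.
decoded_query = "query=*&someVar=SomeVal&SomeOtherVar=Something+&+SomeOtherThing&YetAnotherVar=desc&format=json"

try:
    kv_split(decoded_query)
except ValueError as err:
    print(err)
```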

Is this the intended behaviour?

Any ideas on how I could work with such URLs?

Thanks in advance,
Patricio
