The problem we are experiencing is that an ingest error from the Apache integration (Elastic Agent) is causing an enrich processor in a separate pipeline to fail with the same error that the Apache event failed with. This is with Elasticsearch 8.8.1, Elastic Agent 8.8.1, and Apache integration 1.8.2.
The failure event in the Elastic Agent logs is:
{"type":"document_parsing_exception","reason":"[1:1287] failed to parse field [network.forwarded_ip] of type [ip] in document with id 'bSV9SokBpaQYVFWLbmGU'. Preview of field's value: 'server.com'","caused_by":{"type":"illegal_argument_exception","reason":"'server.com' is not an IP string literal."}}, dropping event!
That tells me the agent module couldn't properly parse the Apache log entry. That's not surprising for this particular server: it is a proxy server used by our library, and it gets hammered with garbage requests. I'm assuming this is coming from the agent module rather than the logs-apache.access-1.8.2 pipeline, because the pipeline would have failed with a grok parsing failure and never made it to the geolocation processor.
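For what it's worth, that error string is what Elasticsearch returns whenever a non-IP value lands in a field mapped as ip (which is how network.forwarded_ip is mapped in the logs-apache.access data stream, per the error above). Something like the following against a throwaway index should reproduce it; the index name here is just for illustration:

# Throwaway index with network.forwarded_ip mapped as ip
PUT test-forwarded-ip
{
  "mappings": {
    "properties": {
      "network": {
        "properties": {
          "forwarded_ip": { "type": "ip" }
        }
      }
    }
  }
}

# Indexing a hostname into that field is rejected with the same
# "'server.com' is not an IP string literal" message
POST test-forwarded-ip/_doc
{
  "network": { "forwarded_ip": "server.com" }
}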
After enabling the Apache integration for this particular server, I started seeing lots of errors from a completely separate pipeline that I use to handle local geolocation. This pipeline is called from various @custom pipelines and uses an enrich processor to look up geolocation information based on source.ip, or on network.forwarded_ip if that is available. The Apache pipeline can only call it if the rest of the pipeline succeeds, so with respect to the event mentioned above, this custom geolocation pipeline is never called.
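For context, the hand-off from the @custom pipelines is nothing fancy, just a pipeline processor along these lines (the pipeline name and condition here are illustrative rather than my exact config):

{
  "pipeline": {
    "name": "local-geoip",
    "if": "ctx?.source?.ip != null || ctx?.network?.forwarded_ip != null"
  }
}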
The errors in the custom pipeline are showing up for events coming from completely different integrations and different servers, but they all end up with the same type of error that the Apache integration had. The error message itself shows that the source.ip field on those events is a valid IP, for example:
add-local-geoip-source.ip-step1: 'server.com' is not an IP string literal. source.ip: 10.45.45.89
The relevant part of the enrich processor:
"enrich": {
"field": "source.ip",
"policy_name": "localgeoip-policy",
"target_field": "source.geo",
"ignore_missing": true,
"if": "ctx?.network?.forwarded_ip == null",
"on_failure": [
{
"set": {
"field": "error.message",
"value": "add-local-geoip-source.ip-step1: {{ _ingest.on_failure_message }} source.ip: {{source.ip}}"
}
}
]
}
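In case it helps anyone trying to reproduce this, the custom pipeline can be exercised in isolation with the simulate API and one of the affected source IPs (again, the pipeline name here is illustrative):

POST _ingest/pipeline/local-geoip/_simulate
{
  "docs": [
    {
      "_source": {
        "source": { "ip": "10.45.45.89" }
      }
    }
  ]
}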
When logs from the problem server are being ingested, the number of ingest errors on the custom pipeline skyrockets for other servers. Even after stopping the ingest of logs from the problem server, these errors continue to appear randomly for other servers, though not nearly as often as when the problem server's logs are being ingested.
I'm at a loss on this one. There's nothing in the Elasticsearch logs on the ingest or data nodes to indicate a problem. How is it possible for an error from a failed pipeline to impact a completely separate pipeline?