Tracking down which pipelines fail to parse

I went through a pretty bumpy upgrade process, and now that the stack has stabilized I've noticed that some pipelines aren't parsing any data. Since a lot of information is parsed and I've got a dozen pipelines, it isn't easy to figure out which pipelines failed to parse the data and which simply didn't receive any.

Is there any solution in the Elastic Stack to monitor grok parsing failures? Something like building a dashboard, or a feature I'm not thinking of?

The node stats API can be used to fetch ingest usage statistics, both globally and on a per-pipeline basis. That will at least help you figure out which pipelines are failing.
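As a concrete starting point, a request along these lines returns the per-pipeline counters (the exact response shape can vary a bit between versions; the `filter_path` is just there to trim the output):

```
GET _nodes/stats/ingest?filter_path=nodes.*.ingest.pipelines
```

Each pipeline entry includes `count`, `failed`, `current` and `time_in_millis`, so a pipeline with a high `failed` count relative to `count` is a good candidate to investigate.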

There are also several ways to handle failures in pipelines, are you doing anything with the failures or just discarding the documents/outcome?

Currently I'm doing nothing with the failures.

It's a really good idea to do something with failures; otherwise the failed documents simply don't get ingested. Maybe you don't care much about losing that data, but at the very least recording some failure metrics would be helpful (though, as noted, you can get basic metrics from the node stats API).

Let's say I would like to store the failures in their own index, using a failure pipeline that stores the message and the original pipeline. Is it possible? Is it a good idea?

Is it a good idea?

I certainly think so. Otherwise the documents don't get indexed, and if you're using something like Filebeat, the indexing will fail and Filebeat will stop sending logs while it waits for the indexing to succeed (which it never will).

Let's say I would like to store the failures in their own index, using a failure pipeline that stores the message and the original pipeline. Is it possible?

Certainly, and personally that's the route I'd go. If I were in your situation, I would basically keep track of the input and the resulting error in a new index, so that I could go back and retry it later. And once you know the failing input, you can use the Simulate API to tweak the pipeline until it no longer generates an error.
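As a rough illustration, assuming a hypothetical grok pattern and sample log line (not taken from your actual pipelines), a simulate request looks something like this:

```
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "test a grok pattern against a failing input",
    "processors": [
      {
        "grok": {
          "field": "message",
          "patterns": ["%{IP:client} %{WORD:method} %{URIPATHPARAM:request}"]
        }
      }
    ]
  },
  "docs": [
    { "_source": { "message": "55.3.244.1 GET /index.html" } }
  ]
}
```

If the pattern doesn't match, the response contains the error for that document instead of the parsed fields, which makes it a quick feedback loop for fixing the pattern.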

The docs have a whole section on handling errors using the on_failure parameter. I'm no expert in ingest pipelines, but my understanding is that you can handle errors in two ways: per processor, and as a kind of global catch-all for the whole pipeline. I believe an on_failure block works pretty much like any other list of processors, meaning you can use Painless to modify the original document, append to it, and do whatever else you need.
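As a hedged sketch (the pipeline name, grok pattern and field names below are placeholders, not your actual configuration), the per-processor and pipeline-level variants look roughly like this:

```
PUT _ingest/pipeline/my-pipeline
{
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": ["%{IP:client} %{WORD:method} %{URIPATHPARAM:request}"],
        "on_failure": [
          { "set": { "field": "parse_error", "value": "grok pattern did not match" } }
        ]
      }
    }
  ],
  "on_failure": [
    { "set": { "field": "error.message", "value": "{{ _ingest.on_failure_message }}" } }
  ]
}
```

The per-processor on_failure only fires for that one processor, while the pipeline-level block catches anything not already handled; the `_ingest.on_failure_message` metadata field is only available inside on_failure blocks.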

It's probably also worth noting that Elasticsearch's object type fields support an enabled setting; when it's set to false, the input isn't mapped to any fields and is only kept in the _source. With this in mind, my naive approach would be to have two disabled fields, one for the original input and one for the resulting error object, and on_failure I'd just shove both pieces into a new document in some kind of failure index and come back to it later. There are probably better options, and I'm sure if I read the "Handling failures in Pipelines" docs I might come up with a better plan :wink: . But the takeaway here is that by handling errors you stop a failure from stopping ingestion, and by keeping track of the failures you can investigate and fix the problem (either in the pipeline itself or at the source of the data).
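A minimal sketch of that idea, assuming a hypothetical `parse-failures` index and placeholder field names: the mapping disables the raw fields so they live only in _source, and the pipeline-level on_failure copies the original message and the error, then reroutes the document by overwriting the `_index` metadata field.

```
// Failure index: raw payloads kept in _source only, not mapped
PUT parse-failures
{
  "mappings": {
    "properties": {
      "original": { "type": "object", "enabled": false },
      "error":    { "type": "object", "enabled": false },
      "pipeline": { "type": "keyword" }
    }
  }
}

// Pipeline whose global on_failure redirects failed documents to parse-failures
PUT _ingest/pipeline/my-pipeline
{
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": ["%{IP:client} %{WORD:method} %{URIPATHPARAM:request}"]
      }
    }
  ],
  "on_failure": [
    { "set": { "field": "original.message", "value": "{{ message }}" } },
    { "set": { "field": "error.message", "value": "{{ _ingest.on_failure_message }}" } },
    { "set": { "field": "pipeline", "value": "my-pipeline" } },
    { "set": { "field": "_index", "value": "parse-failures" } }
  ]
}
```

Failed documents then end up in `parse-failures` with everything they had at the moment of failure plus the error, and once the pattern is fixed you can reprocess them, for example by reindexing them through the corrected pipeline.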

Well, for the time being I can't create or change pipelines at all because I'm getting timeouts.

A maintenance window is planned soon, and I will add the on_failure option to the grok processors directly through a script.
