Is it a good idea?
I certainly think so. Otherwise the documents don't get indexed, and if you're using something like Filebeat, the indexing will fail and Filebeat will stop sending logs while it waits for the index operation to succeed (which it never will).
Let's say I would like to store failures in their own index, using a failure pipeline that stores the message and the original pipeline. Is it possible?
Certainly, and personally that's the route I'd go. If I were in your situation, I would basically just keep track of the input and the resulting error in a new index, so that I could go back and retry it later. And once you know the failing input, you can use the Simulate API to tweak things until it doesn't generate an error anymore.
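As a rough sketch of what a Simulate call looks like (the grok pattern and the sample document here are just made-up placeholders, not anything from your setup):

```json
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "grok": {
          "field": "message",
          "patterns": ["%{IP:client_ip} %{WORD:method} %{URIPATHPARAM:path}"]
        }
      }
    ]
  },
  "docs": [
    { "_source": { "message": "10.0.0.1 GET /search" } }
  ]
}
```

The response shows either the transformed document or the error each processor produced, so you can iterate on the pipeline definition without touching real data.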
The docs have a whole section on handling errors using the on_failure parameter. I'm no expert in ingest pipelines, but my understanding is that you can handle errors in two ways: per-processor, and as a kind of global catch-all at the pipeline level. I believe an on_failure block works pretty much like any other list of processors, meaning you can use Painless to modify or append things to the original document and do whatever you need.
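To illustrate both flavors (the pipeline name and fields are invented for the example): a per-processor on_failure that fires only when that one processor fails, and a pipeline-level on_failure that catches anything unhandled. Inside either handler you can read metadata such as _ingest.on_failure_message:

```json
PUT _ingest/pipeline/my-pipeline
{
  "processors": [
    {
      "rename": {
        "field": "provider",
        "target_field": "cloud.provider",
        "on_failure": [
          { "set": { "field": "error.message", "value": "rename failed" } }
        ]
      }
    }
  ],
  "on_failure": [
    { "set": { "field": "error.message", "value": "{{ _ingest.on_failure_message }}" } }
  ]
}
```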
It's probably also worth noting that Elasticsearch's object type fields support an enabled setting; when it's set to false, the field's contents aren't parsed or indexed at all, they're just kept in the _source. With this in mind, my naive approach would be to have two disabled fields, one for the original input and one for the resulting error object, and on_failure I'd just shove both pieces into a new document in some kind of failure index and come back to it later. There are probably better options, and I'm sure if I read the "Handling failures in Pipelines" docs I might come up with a better plan. But the takeaway here is that by handling errors you keep a pipeline failure from stopping ingestion entirely, and by keeping track of the failures you can investigate the issue and fix the problem (either in the pipeline itself or at the source of the data).
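Sketching that naive approach (the index name, field names, and the grok processor are all my own invention): a failure index whose payload fields are disabled so any shape of document is accepted, plus a pipeline-level on_failure that records the error and reroutes the document there by overwriting _index:

```json
PUT failed-events
{
  "mappings": {
    "properties": {
      "original":  { "type": "object", "enabled": false },
      "error":     { "type": "object", "enabled": false },
      "failed_at": { "type": "date" }
    }
  }
}

PUT _ingest/pipeline/my-pipeline
{
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": ["%{IP:client_ip} %{WORD:method} %{URIPATHPARAM:path}"]
      }
    }
  ],
  "on_failure": [
    { "set": { "field": "error.message",   "value": "{{ _ingest.on_failure_message }}" } },
    { "set": { "field": "error.processor", "value": "{{ _ingest.on_failure_processor_type }}" } },
    { "set": { "field": "_index",          "value": "failed-events" } }
  ]
}
```

Setting _index inside on_failure is what redirects the failed document into the failure index instead of its original destination; from there you can replay it through the Simulate API once the pipeline is fixed.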