Elasticsearch (sort of) losing pipelines on restart

hi.... here's my setup:

  • ES cluster of three nodes: one is the ingest node, the other two are data nodes.
  • It has been set up with 'ansible-elasticsearch'.
  • All works fine: Filebeat sends data to the ingest node, where a pipeline processes it before it is stored on the data nodes.

Once I restart the ingest node, the Elasticsearch log repeatedly shows this:
java.lang.IllegalArgumentException: pipeline with id [httpd-access] does not exist

Yet,

curl -XGET 'localhost:9200/_ingest/pipeline'

shows it exists with the correct pipeline steps.
My current workaround is to delete and recreate it:

curl -XDELETE 'localhost:9200/_ingest/pipeline/httpd-access'
curl -XPUT 'localhost:9200/_ingest/pipeline/httpd-access' -H 'Content-Type: application/json' -d'...'

As soon as I've recreated it, data is processed again and the errors in the log stop appearing.
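For anyone following along, the recreate step looks roughly like this. The pipeline body below is a hypothetical illustration (a simple grok processor), not the actual httpd-access definition from my setup:

```shell
# Hypothetical httpd-access pipeline body; the grok pattern is an
# illustration, not the original pipeline definition.
cat > /tmp/httpd-access.json <<'EOF'
{
  "description": "Parse Apache access logs",
  "processors": [
    { "grok": { "field": "message", "patterns": ["%{COMMONAPACHELOG}"] } }
  ]
}
EOF

# Sanity-check the JSON before sending it to the cluster:
python3 -m json.tool /tmp/httpd-access.json > /dev/null && echo "valid JSON"

# Then delete and recreate (against a live cluster):
# curl -XDELETE 'localhost:9200/_ingest/pipeline/httpd-access'
# curl -XPUT 'localhost:9200/_ingest/pipeline/httpd-access' \
#   -H 'Content-Type: application/json' --data-binary @/tmp/httpd-access.json
```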

I've been trying to find out where the pipelines are stored/persisted so I could investigate there, but so far to no avail. Before attempting to solve this by rebuilding the cluster, I'm hoping to get some answers here. Any help is greatly appreciated.

That is unexpected.

The ingest pipelines are stored in the cluster state. For whatever reason, the cluster state, including these pipelines, is not being registered in the ingest node's pipeline store.
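You can inspect what the cluster state actually holds for ingest. The response shape below is an abridged, illustrative example of the `metadata.ingest` section, not your actual data:

```shell
# On a live cluster, the pipelines recorded in the cluster state can be seen with:
#   curl -s 'localhost:9200/_cluster/state/metadata?filter_path=metadata.ingest'
# An abridged, illustrative example of what that returns:
STATE='{"metadata":{"ingest":{"pipeline":[{"id":"httpd-access","config":{"processors":[]}}]}}}'
echo "$STATE" | python3 -c '
import json, sys
state = json.load(sys.stdin)
# List every pipeline id present in the cluster state:
for p in state["metadata"]["ingest"]["pipeline"]:
    print(p["id"])
'
```

If a pipeline id shows up here but the node still logs "does not exist", the node's in-memory pipeline store is out of sync with the cluster state.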

I will try to reproduce this.

Are all your nodes master eligible? What do you have minimum_master_nodes set to?

Thanks for your reply.
Yes, all nodes are master eligible:

curl -XGET 'localhost:9200/_cat/nodes?v&pretty'
ip          heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
172.31.1.22           64          81   0    0.01    0.02     0.05 md        -      node02
172.31.1.21           24          80   1    0.00    0.01     0.05 md        *      node01
172.31.1.20           13          99   0    0.00    0.01     0.05 mi        -      node00

minimum_master_nodes is set to 2 on all three nodes:

discovery.zen.minimum_master_nodes: 2

That is great. I was wondering if it could be the effect of some kind of split-brain scenario, but that does not seem to be the case.

I'm still trying to understand and made some progress:

  • Cluster is set up from scratch; all is fine.
  • If I add all my pipelines and then restart Elasticsearch, I'm back to the error:

java.lang.IllegalArgumentException: pipeline with id [xpack_monitoring_2] does not exist

yet it's there:

curl -XGET 'localhost:9200/_ingest/pipeline/xpack_monitoring_2?pretty'
{
  "xpack_monitoring_2" : {
    "description" : "2: This is a placeholder pipeline for Monitoring API version 2 so that future versions may fix breaking changes.",
    "processors" : [ ]
  }
}

So I started adding my pipelines one by one and found that the errors start as soon as I add a pipeline containing a Painless script. Adding a pipeline with a Painless script renders all my pipelines unusable after a restart of the ingest node, until I delete and re-add all pipelines to the running node.

I'll try to simplify my setup to then post a sample script & pipeline here.
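In the meantime, a pipeline of that general shape might look like the sketch below. The Painless source here is a trivial, hypothetical stand-in, not my actual script:

```shell
# Minimal pipeline containing a script processor; the Painless source
# is a hypothetical stand-in for the real script.
cat > /tmp/with-script.json <<'EOF'
{
  "description": "Pipeline with a Painless script processor",
  "processors": [
    { "script": { "lang": "painless", "inline": "ctx.copied = ctx.message" } }
  ]
}
EOF

# Sanity-check the JSON before sending it to the cluster:
python3 -m json.tool /tmp/with-script.json > /dev/null && echo "valid JSON"

# Against a live cluster:
# curl -XPUT 'localhost:9200/_ingest/pipeline/test-script' \
#   -H 'Content-Type: application/json' --data-binary @/tmp/with-script.json
```

(On 5.x the script body goes under "inline"; later versions renamed it to "source".)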

Additionally, this might be related to these topics:

We're still encountering this problem. At least once a week someone on our team runs into this issue and usually re-creating the pipeline resolves it. Occasionally an environment will be more stubborn and will require some combination of node restarts and deletion/recreation of the pipeline to sort things out. Again, we're on 5.1.1 and would love if someone came up with a solution for this (or at least identified/recognized the problem).

This hit me yesterday when I added new nodes to the cluster and decided to just trust a yum update. When I ended up with mixed versions, I decided to update the other nodes as well. Now none of my pipelines are working. I had been troubleshooting from the Filebeat side, thinking the problem was there, but obviously it's not.