How to add the name of the ingest node that ran the pipeline to a special field

Hello,

I am trying to get the name of the ingest node that ran the pipeline which produced the final output document, and append it to the document by setting the value of a special extra field, "ingested_on", that was created for that purpose.

The issue is that I don't know how to get the node name value. I know there is _ingest metadata that holds the timestamp and pipeline information, but is there any particular metadata field that can be used to extract the name/ID of the node that ran the pipeline?
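For illustration, here is roughly what I have in mind. This is only a sketch: the pipeline name is made up, {{_ingest.timestamp}} is the metadata field I already use, and {{_ingest.node_name}} is purely hypothetical, i.e. the kind of field I am asking whether it exists:

# note: _ingest.timestamp is real; _ingest.node_name below is hypothetical (what I am looking for)
PUT _ingest/pipeline/add-ingest-metadata
{
  "processors": [
    {
      "set": {
        "field": "ingest_timestamp",
        "value": "{{{_ingest.timestamp}}}"
      }
    },
    {
      "set": {
        "field": "ingested_on",
        "value": "{{{_ingest.node_name}}}"
      }
    }
  ]
}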

Hi @Ayd_Asraf Welcome to the community.

Apologies, but no, as far as I can tell the node name is not available. Can you perhaps help us understand what you are trying to accomplish? Just to be clear, even if you could capture the node the ingest pipeline ran on, that is not guaranteed to be the node where the document is actually written.

Are you trying to debug?

Elasticsearch is a distributed data store / data processing platform; where any one operation is executed is typically not the focus...

Hello @stephenb !

Thank you for the kind welcome. I am looking forward to being as much a contributor as I am a consumer, and to adding some value as well.

Thank you for confirming my suspicion that the field is not available. I know and understand that aspect of Elasticsearch, but as you said, I am trying to debug and find the root cause of the issue described below:

So basically we have a high-end, mid-sized cluster (4 hot/warm/cold + 2 frozen + 5 ingest (8 CPU / 10 GB RAM) + 3 master + 3 client nodes). Data is sent from various sources to Logstash, which in turn sends it to the ingest nodes in Elasticsearch to run pipelines that build the final document based on the source. The whole cluster is containerized (running as StatefulSets on AWS EKS). The ingest / client nodes are load-balanced through two different ALBs, and all consumers connect to those LBs. The current ingest rate averages ~8K events/sec, with 12-15K events/sec at peak times.

The area of concern is the ingest nodes and pipelines. We frequently face an issue where ingestion completely stops, and when tracing we found that a couple of ingest nodes run at 100% CPU and request an insane amount of extra CPU (we can see xx t and xx b numbers under throttled CPU in Kibana Stack Monitoring). What puzzles me is that one or two other ingest nodes run at 60-70% while the rest sit at 100%, sometimes not even able to send their monitoring data to Kibana (we see N/A at times). The issue is normally resolved by completely stopping and then restarting the Logstash instances.

My first suspect would be that the ingest nodes are undersized; however, when we restart the Logstash instances the utilization does not go above 25% for a good hour, then it gradually climbs again until the nodes are stuck at 100% indefinitely, causing the issue over and over.

Here is a sample graph of this behavior on one of the ingest nodes:

Now, the logs do not show anything; they are so clean, with almost no errors. The only error we get is

"message": "failed to download database [GeoLite2-City.mmdb]", 
"stacktrace": ["org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];"

but even those are not correlated with the times the issue happens. From the Logstash perspective, it reports the following error, though I can still call the ingest node URL and access the Elasticsearch APIs at the time of the issue:

"message":"Encountered a retryable error (will retry with exponential backoff)","code":504,"url":"https://ingest-xxx.xxx.net:443/_bulk","content_length":4962342

So my guess is that there is a certain pipeline that, when invoked at scale, causes this chaos. However, with hundreds of pipelines, in order to track it down I need to correlate pipeline runs with the node that actually executed them and see what matches when a node goes down.

I already tried GET _nodes/stats/ingest?filter_path=nodes.*.ingest, but the stats are not very helpful; with this volume of events and pipelines you cannot get much useful information out of them (especially since there is no visualization for this in Kibana :frowning: ). I also tried building dashboards that sum events per data source per time slot, but still no clear pattern, especially since the data does not seem to be sent at all when this happens, just stuck in some pipeline.
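For reference, here is a variant of that call that also pulls the node name and the per-pipeline counters; this is just a sketch, and the filter_path is only my guess at the most useful subset:

GET _nodes/stats/ingest?human&filter_path=nodes.*.name,nodes.*.ingest.total,nodes.*.ingest.pipelines
# nodes.<id>.ingest.total        -> count / time / current / failed per node
# nodes.<id>.ingest.pipelines.*  -> the same counters broken down per pipeline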

Any leads to follow from your experience, or other ideas, are welcome!

Interesting, and thanks for sharing your architecture. Yeah, not so easy to track down.

I do think there will be some more enhanced ingest monitoring in the future but I do not have an ETA.

What version of the stack?

Is there a lot of scripting in some / all of the pipelines?

Are you using the enrich processor?

Have you used hot_threads to take a look?
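Something like this as a starting point; the node name is a placeholder and the parameters are just suggestions you can tune:

GET _nodes/hot_threads
# or zoom in on one suspect ingest node and take a few samples
GET _nodes/<ingest-node-name>/hot_threads?threads=5&interval=500ms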

If that graph was memory it would suggest a leak, but CPU hmmm.

And are you sure you are not backing up on writes on the actual hot nodes?
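One quick way to sanity-check that (just a suggestion) is to watch the write thread pool queue and rejections while the problem is happening:

GET _cat/thread_pool/write?v&h=node_name,name,active,queue,rejected,completed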

And in Logstash, are you setting the elasticsearch output to all of the ingest nodes?
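i.e. something along these lines; the hostnames are placeholders, and ssl / auth / pipeline options are omitted. Listing the nodes explicitly lets Logstash spread the bulk requests across them itself, rather than relying on how the ALB balances the connections:

output {
  elasticsearch {
    # one entry per ingest node (hostnames here are made up)
    hosts => ["https://ingest-1.example.net:443", "https://ingest-2.example.net:443", "https://ingest-3.example.net:443"]
  }
}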

On another note:

This is related to trying to download the latest GeoIP database; that should only happen on startup. I guess this cluster does not have internet access.

You can disable that behavior with the geoip downloader cluster setting.
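For example, via the cluster settings API (this setting exists in 7.14+ / 8.x; adjust for your version):

PUT _cluster/settings
{
  "persistent": {
    "ingest.geoip.downloader.enabled": false
  }
}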
