Ingest Lag or Pipeline not working?

Ely_96 · September 11, 2022, 7:36pm

Hi guys,
I have a doubt about my pipelines.

I have 2 pipelines that let me see (in Kibana) the most recent requests on a platform. Each request is a JSON document with a few fields, including:

req-id
req-timestamp

My pipelines:

The first pipeline (pipeline_temp) populates the "temp" index, which receives all documents in real time;
The second pipeline (pipeline_main) populates the "main" index. It is scheduled to run every 10 minutes and takes the temp index as input; for each document it checks that there is no document with the same req-id and a greater req-timestamp, and then it empties the "temp" index.
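A minimal in-memory sketch of the rule pipeline_main is described as applying (not a Logstash configuration): for each req-id, only the document with the greatest req-timestamp is kept. The function name is hypothetical, and it assumes req-timestamp values compare correctly as written, e.g. ISO 8601 strings or epoch numbers. A day-first string such as '10092022' would not sort chronologically across months.

```python
from typing import Dict, Iterable, List


def latest_per_req_id(docs: Iterable[dict]) -> List[dict]:
    """Keep only the document with the greatest req-timestamp for each req-id.

    Assumes req-timestamp values are directly comparable
    (ISO 8601 strings or epoch numbers).
    """
    latest: Dict[str, dict] = {}
    for doc in docs:
        key = doc["req-id"]
        current = latest.get(key)
        # Replace the stored doc only if this one is strictly newer.
        if current is None or doc["req-timestamp"] > current["req-timestamp"]:
            latest[key] = doc
    return list(latest.values())


batch = [
    {"req-id": "12345", "req-timestamp": "2022-09-10T10:00:00Z"},
    {"req-id": "12345", "req-timestamp": "2022-09-11T10:00:00Z"},
    {"req-id": "67890", "req-timestamp": "2022-09-11T09:00:00Z"},
]
print(latest_per_req_id(batch))
```

Applied to a whole batch at once, this rule keeps exactly one document per req-id. The difficulty in a real pipeline is that the comparison is made against what the main index can already return, which is not necessarily everything that has been written.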

My "main" index currently has about 12 million documents, and I see that it also contains req-ids with different req-timestamps (not just the most recent one).

The loading of these documents seems random: the pipeline appears to work correctly about 80% of the time, but fails about 20% of the time.

Could it be a data ingestion delay problem? Maybe the main pipeline checks whether there are req-ids with more recent req-timestamps, but if the newer document has not been ingested yet, the check fails.
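To illustrate the timing hypothesis: Elasticsearch search is near real-time, so a newly indexed document only becomes visible to searches after a refresh (index.refresh_interval, 1s by default). In addition, Logstash runs its filters over a batch of events before the output sends that batch. A lookup made while processing one event may therefore miss documents written for other events in the same run. The toy model below uses hypothetical class and function names and is not the Elasticsearch API. It shows how a check-then-index step lets two versions of the same req-id through when earlier writes are not yet visible.

```python
from typing import Dict, List

# Toy model only: NOT the Elasticsearch or Logstash API.
class NearRealTimeIndex:
    """An index whose writes become searchable only after refresh()."""

    def __init__(self) -> None:
        self.searchable: List[Dict[str, str]] = []  # visible to searches
        self.pending: List[Dict[str, str]] = []     # indexed, not yet refreshed

    def index(self, doc: Dict[str, str]) -> None:
        self.pending.append(doc)

    def refresh(self) -> None:
        self.searchable.extend(self.pending)
        self.pending.clear()

    def has_newer(self, req_id: str, ts: str) -> bool:
        # Like a search: only refreshed documents are consulted.
        return any(d["req-id"] == req_id and d["req-timestamp"] > ts
                   for d in self.searchable)


def run_main_pipeline(temp_docs, main, refresh_each=False):
    """Check-then-index each temp document, as described for pipeline_main."""
    for doc in temp_docs:
        if not main.has_newer(doc["req-id"], doc["req-timestamp"]):
            main.index(doc)
            if refresh_each:
                main.refresh()  # make the write visible before the next check
    main.refresh()


# The newer request is processed before an older one for the same req-id.
run_docs = [
    {"req-id": "12345", "req-timestamp": "2022-09-11T10:00:00Z"},
    {"req-id": "12345", "req-timestamp": "2022-09-10T10:00:00Z"},
]

lagging = NearRealTimeIndex()
run_main_pipeline(run_docs, lagging)                   # both pass the check
eager = NearRealTimeIndex()
run_main_pipeline(run_docs, eager, refresh_each=True)  # older one is skipped
print(len(lagging.searchable), len(eager.searchable))  # 2 1
```

The model also shows a second gap. The check only rejects a document when a *newer* one is already visible, and it never removes an older document that is already in main. If the older request lands in one run and the newer one in a later run, both stay in the index unless the output overwrites documents by req-id.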

Thanks in advance
Ely

Rios · September 11, 2022, 8:01pm

Why does it fail? A timeout?
Is req-id a unique value?
Are there 12 million records in total in the main index?
How many docs does the temp index usually have?
Are you using ILM for temp?

Ely_96 · September 11, 2022, 8:23pm

Hi Rios,
Thanks a lot for your answer.

I don't know why it fails... that is the purpose of my topic.
Yes, req-id is a unique value. The main index grows every 10 minutes, when pipeline_main runs, and it now has 12 million docs.

And no, I'm not using ILM (Elasticsearch uses the default settings).

Thanks!!

Rios · September 11, 2022, 8:49pm

Is there any error in /var/log/logstash/logstash-plain.log?
With the temp index, are you trying to avoid duplicate records based on unique req-ids?
If req-id=12345 with req-timestamp='10092022' is already in the main index, and the temp index gets req-id=12345 with a newer req-timestamp='11092022', will the full record for req-id=12345 be updated in the main index, or just req-timestamp?
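On this question, one common alternative to check-then-insert is to use req-id as the document ID. Each req-id then maps to exactly one document, and with the default `index` action every write replaces the full document. In Logstash this would mean setting `document_id => "%{[req-id]}"` on the elasticsearch output. To stop a late, older event from overwriting a newer one, the output's `version` and `version_type => "external"` options can be set from a numeric copy of req-timestamp; worth verifying against the plugin docs for your version. The Python below is only a toy model of that rule. The function name is hypothetical, it assumes req-timestamp has been converted to epoch milliseconds, and it does not call Elasticsearch.

```python
from typing import Dict


def upsert_if_newer(store: Dict[str, dict], doc: dict) -> bool:
    """Keep one document per req-id; accept a write only if its version is
    strictly greater than the stored one (the rule Elasticsearch applies with
    version_type "external"). Returns True if the document was stored."""
    key = doc["req-id"]
    current = store.get(key)
    if current is not None and doc["req-timestamp"] <= current["req-timestamp"]:
        return False  # comparable to a version conflict: the stored doc wins
    store[key] = doc  # the full document is replaced, not just req-timestamp
    return True


main: Dict[str, dict] = {}
# req-timestamp as epoch milliseconds (illustrative values)
newer = {"req-id": "12345", "req-timestamp": 1662940800000}
older = {"req-id": "12345", "req-timestamp": 1662854400000}
print(upsert_if_newer(main, newer))    # True
print(upsert_if_newer(main, older))    # False: late, older event is rejected
print(main["12345"]["req-timestamp"])  # 1662940800000
```

Because the decision is made by the write itself rather than by an earlier search, this approach does not depend on the newer document already being visible.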

Ely_96 · September 12, 2022, 6:46am

Hi Rios,

I just asked for access to that log... let's see once I get it. What could it contain?

In the temp index I load (and gradually empty) everything that arrives. Cleaning by timestamp only happens in the main index.

Rios · September 12, 2022, 1:02pm

Search for error or timeout.

system · October 10, 2022, 1:03pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
