Ingest Lag or Pipeline not working?

Hi guys,
I have a question about my pipelines.

I have 2 pipelines that let me see (in Kibana) the most recent requests within a platform. Each request is a JSON document with a few fields, including:

  • req-id
  • req-timestamp

My pipelines:

  1. The first pipeline (pipeline_temp) populates the "temp" index, which receives all the documents in real time;

  2. The second pipeline (pipeline_main) populates the "main" index. It is scheduled to run every 10 minutes and takes the temp index as input: for each document, it checks that there is no document with the same req-id and a greater req-timestamp, and then it empties the "temp" index.
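The check pipeline_main performs can be modelled in a few lines. This is a minimal in-memory sketch, not the actual Logstash configuration: the `Request` type, the `run_main_batch` function and the assumption that documents are appended to main (rather than overwriting an existing record) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    req_id: str
    req_timestamp: int  # assumed numeric and sortable, e.g. epoch milliseconds
    payload: dict = field(default_factory=dict)


def run_main_batch(temp: list, main: list) -> None:
    """Hypothetical model of one pipeline_main run.

    For each document in temp, add it to main unless main already holds a
    document with the same req-id and a greater req-timestamp; then empty
    temp. Main is modelled as an append-only list (an assumption).
    """
    for doc in temp:
        newer_exists = any(
            m.req_id == doc.req_id and m.req_timestamp > doc.req_timestamp
            for m in main
        )
        if not newer_exists:
            main.append(doc)
    temp.clear()  # "empties the temp index"
```

Under this assumption, the check only keeps *older* versions out of main. If main already holds an older version of a req-id, a newer one is appended next to it rather than replacing it, so whether the real pipeline appends or overwrites matters.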

In my "main" index I currently have about 12 million documents, and I see that it also contains req-ids with different req-timestamps (not just the most recent one).

Which documents get loaded correctly seems random: the pipeline appears to work correctly about 80% of the time, but fails the other 20% or so.

Could it be a data ingestion delay problem? Maybe the main pipeline checks whether there are req-ids with more recent req-timestamps, but if the newer document has not been ingested yet, the check fails.
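The ingest-lag hypothesis can be tried out on a toy model before digging into logs. This sketch is hypothetical throughout: `Event`, `simulate`, the 10-unit batch interval and the `ingested_at` field (the moment a document becomes visible in temp) are assumptions, and main is again modelled as append-only.

```python
from typing import List, NamedTuple


class Event(NamedTuple):
    req_id: str
    req_timestamp: int  # when the request happened
    ingested_at: int    # when it became visible in temp (assumed known)


def simulate(events: List[Event], batch_every: int = 10, horizon: int = 30) -> List[Event]:
    """Toy model of the ingest-lag hypothesis (all names are hypothetical).

    Every `batch_every` time units, pipeline_main sees only the documents
    already visible in temp, keeps each one unless main holds a newer
    version of the same req-id, and empties temp. Main is append-only here.
    """
    main: List[Event] = []
    pending = sorted(events, key=lambda e: e.ingested_at)
    for run_at in range(batch_every, horizon + 1, batch_every):
        # Split pending docs into those visible at this run and those still in flight.
        batch = [e for e in pending if e.ingested_at <= run_at]
        pending = [e for e in pending if e.ingested_at > run_at]
        for doc in batch:
            if not any(m.req_id == doc.req_id and m.req_timestamp > doc.req_timestamp
                       for m in main):
                main.append(doc)
    return main
```

In this model the result depends on which version becomes visible first: a *newer* version that arrives late ends up next to the older one, while an *older* version that arrives late is filtered out. That order dependence could look random from the outside, but whether the real pipeline behaves this way depends on how it writes to main.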

Thanks in advance
Ely

Why does it fail? A timeout?
Is req-id a unique value?
12 million records in total in the main index?
How many docs does the temp index usually have?
Are you using ILM for temp?

Hi Rios,
Thanks a lot for your answer.

I don't know why it fails... that is the purpose of my topic 🙂
Yes, req-id is a unique value. The main index grows every 10 minutes, when pipeline_main runs, and right now it holds 12 million docs.

And no, I'm not using ILM (Elasticsearch uses the default settings).

Thanks!!

Is there any error in /var/log/logstash/logstash-plain.log?
With the temp index, are you trying to avoid duplicate records based on unique req-ids?
If the main index has req-id=12345 with req-timestamp='10092022', and the temp index then gets req-id=12345 with a newer req-timestamp='11092022', will the full record for req-id=12345 be updated in the main index, or just req-timestamp?
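The difference between those two outcomes can be sketched with a dict keyed by req-id, standing in for a main index whose document `_id` is the req-id (an assumption; the thread does not say how main is keyed). Both function names and the `status` field are hypothetical. The sketch uses ISO dates because strings like '10092022' do not sort chronologically across months or years, so they would need parsing before any "newer than" comparison.

```python
def replace_full_record(main: dict, doc: dict) -> None:
    """Overwrite the whole stored record when doc is newer (hypothetical)."""
    current = main.get(doc["req-id"])
    if current is None or doc["req-timestamp"] > current["req-timestamp"]:
        main[doc["req-id"]] = dict(doc)


def update_timestamp_only(main: dict, doc: dict) -> None:
    """Bump only req-timestamp; other fields keep their old values (hypothetical)."""
    current = main.get(doc["req-id"])
    if current is None:
        main[doc["req-id"]] = dict(doc)
    elif doc["req-timestamp"] > current["req-timestamp"]:
        current["req-timestamp"] = doc["req-timestamp"]
```

Either way, keying by req-id leaves exactly one record per request; the two strategies differ only in whether the other fields (here the hypothetical `status`) come from the newer document or stay as they were.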

Hi Rios,

I just asked for access to that log... let's see once I get it. What could it contain?

In the temp index I load (and gradually empty) everything that arrives. The cleanup by timestamp only happens in the main index.

Search for errors or timeouts.
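Once access to /var/log/logstash/logstash-plain.log is granted, that search is a case-insensitive scan for the two keywords (the same idea as `grep -iE 'error|timeout'`). A minimal Python sketch; the function names are hypothetical, and plain keyword matching will also catch harmless lines, so the hits still need reading.

```python
import re
from typing import Iterable, Iterator, List

# Case-insensitive match on the two keywords suggested above.
SUSPICIOUS = re.compile(r"error|timeout", re.IGNORECASE)


def suspicious_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield the lines that mention an error or a timeout."""
    for line in lines:
        if SUSPICIOUS.search(line):
            yield line.rstrip("\n")


def scan_log(path: str) -> List[str]:
    """Read a log file and return its suspicious lines."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return list(suspicious_lines(fh))
```

For example, `scan_log("/var/log/logstash/logstash-plain.log")` returns the matching lines in file order.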