Pipeline sending to itself

Hello all,

We have the following use case:
We want to read data from a service provider that requires paging when polling data (they also support pushing data, but we are not reachable from the internet). I found a few threads about http_poller and paging, but no perfect solution.

At first, we made a straightforward attempt to read the data with the http_poller, but it only returned a partial result because the provider requires paging for larger results (only 100 entries per page) and returns a token for the next page.

My next try was to split the logic into two pipelines:

One pipeline request-init:

  • uses the heartbeat input to regularly initiate the workflow
  • uses the lumberjack output to send to pipeline request-execution.

Pipeline request-execution:

  • has the Beats input (as the documentation of the lumberjack input recommends using the Beats input instead) to receive the trigger
  • uses the http filter to poll the REST API. If the event contains a nextPageToken, it is sent along with the request.
  • does a bit of processing (like splitting the response)
  • uses the elasticsearch output to send the data to Elasticsearch
  • uses the lumberjack output to send the nextPageToken back to pipeline request-execution, triggering the read of the next page.
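As a rough sketch, the two pipelines could look like the configuration below. The port, certificate paths, API URL, topic names, and field names (response, entries, nextPageToken paths) are placeholders, not the real configuration, and handling of the very first request, which has no nextPageToken yet, is omitted:

```
# --- request-init ---
input {
  heartbeat { interval => 60 }        # regularly trigger the workflow
}
output {
  lumberjack {
    hosts => ["localhost"]            # request-execution's beats input
    port  => 5044
    ssl_certificate => "/path/to/cert.pem"
  }
}

# --- request-execution ---
input {
  beats { port => 5044 }              # receives triggers (and the self-loop)
}
filter {
  http {
    url  => "https://provider.example/api/items"     # placeholder URL
    verb => "GET"
    query => { "pageToken" => "%{nextPageToken}" }   # assumes sprintf works in query values
    target_body => "response"
  }
  split { field => "[response][entries]" }           # placeholder field name
}
output {
  elasticsearch {
    hosts => ["https://localhost:9200"]
    index => "provider-data"
  }
  if [response][nextPageToken] {
    lumberjack {                      # loop back into this pipeline's own beats input
      hosts => ["localhost"]
      port  => 5044
      ssl_certificate => "/path/to/cert.pem"
    }
  }
}
```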

Unfortunately, I cannot start pipeline request-execution because of an error:
[2021-01-07T08:06:41,580][ERROR][logstash.outputs.lumberjack] All hosts unavailable, sleeping ...
So it looks like the output is opened before the input is open. Is there a way to work around that?

If possible, I would like to use Logstash only, without calling an external program to handle the paging.

Best regards
Wolfram

That would be request-init, not request-execution, right?

Why not just wait ten seconds and let it retry?

No, request-init also does not start, but that is fine because pipeline request-execution is indeed not yet running. My idea was that pipeline request-execution could call itself to support the paging feature, and that seems to be my problem: the input plugin is not yet running when the output is started - or so it seems.

Best regards
Wolfram

Got it. So request-execution tries to start the output, which connects to its own input, before that input has started, so it never completes startup.

You could use a different input/output pair where the startup order does not matter. A file input and output would work, but would never clean up the old entries. A kafka input and output would also work. There may be others.
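A minimal sketch of the kafka variant, assuming a local broker and a hypothetical paging-trigger topic: both ends connect to the broker rather than to each other, so the startup order of input and output no longer matters.

```
# request-execution, with the lumberjack/beats self-loop replaced by kafka
input {
  kafka {
    bootstrap_servers => "localhost:9092"   # placeholder broker address
    topics => ["paging-trigger"]            # hypothetical topic name
    codec  => "json"
  }
}
# http filter, split, etc. unchanged
output {
  # elasticsearch output as before, then:
  kafka {
    bootstrap_servers => "localhost:9092"
    topic_id => "paging-trigger"            # write the nextPageToken event back
    codec    => "json"
  }
}
```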