Pipeline sending to itself

Hello all,

We have the following use case:
We want to read data from a service provider that requires paging when polling data (they also support pushing data, but we are not reachable from the internet). I found a few threads about http_poller and paging, but no perfect solution.

At first, we made a straightforward attempt to read the data with the http_poller, but it only returned a partial result because the provider requires paging for larger results (only 100 entries per page) and returns a token for the next page.

My next try was to split the logic into two pipelines:

One pipeline request-init:

  • uses the heartbeat input to regularly initiate the workflow
  • uses the lumberjack output to send to pipeline request-execution.

Pipeline request-execution:

  • has the Beats input (as the documentation of the lumberjack input recommends using the Beats input instead) to receive the trigger
  • uses the http filter to poll the REST API. If the event contains a nextPageToken, it is sent along with the request.
  • does a bit of processing (like splitting the response)
  • uses the elasticsearch output to send the data to Elasticsearch
  • uses the lumberjack output to send the nextPageToken back to pipeline request-execution, triggering the read of the next page.
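As a rough sketch, the two pipelines could look like the configuration below. The port, certificate paths, API URL, topic names, and field names (response, entries, nextPageToken paths) are placeholders, not the real configuration, and handling of the very first request, which has no nextPageToken yet, is omitted:

```
# --- request-init ---
input {
  heartbeat { interval => 60 }        # regularly trigger the workflow
}
output {
  lumberjack {
    hosts => ["localhost"]            # request-execution's beats input
    port  => 5044
    ssl_certificate => "/path/to/cert.pem"
  }
}

# --- request-execution ---
input {
  beats { port => 5044 }              # receives triggers (and the self-loop)
}
filter {
  http {
    url  => "https://provider.example/api/items"     # placeholder URL
    verb => "GET"
    query => { "pageToken" => "%{nextPageToken}" }   # assumes sprintf works in query values
    target_body => "response"
  }
  split { field => "[response][entries]" }           # placeholder field name
}
output {
  elasticsearch {
    hosts => ["https://localhost:9200"]
    index => "provider-data"
  }
  if [response][nextPageToken] {
    lumberjack {                      # loop back into this pipeline's own beats input
      hosts => ["localhost"]
      port  => 5044
      ssl_certificate => "/path/to/cert.pem"
    }
  }
}
```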

Unfortunately, I cannot start pipeline request-execution because of an error:
[2021-01-07T08:06:41,580][ERROR][logstash.outputs.lumberjack] All hosts unavailable, sleeping ...
So it looks like the output is opened before the input is open. Is there a way to work around that?

If possible, I would like to use Logstash only, without calling an external program to handle the paging.

Best regards
Wolfram

That would be request-init, not request-execution, right?

Why not just wait ten seconds and let it retry?

No, request-init also does not start, but that is fine because pipeline request-execution is indeed not yet running. My idea was that pipeline request-execution could call itself to support the paging feature, and that seems to be my problem: the input plugin is not yet running when the output is started - or so it seems.

Best regards
Wolfram

Got it. So request-execution tries to start the output, which connects to its own input, before that input has started, so it never completes startup.

You could use a different input/output pair where the startup order does not matter. A file input and output would work, but would never clean up the old entries. A kafka input and output would also work. There may be others.
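A minimal sketch of the kafka variant, assuming a local broker and a hypothetical paging-trigger topic: both ends connect to the broker rather than to each other, so the startup order of input and output no longer matters.

```
# request-execution, with the lumberjack/beats self-loop replaced by kafka
input {
  kafka {
    bootstrap_servers => "localhost:9092"   # placeholder broker address
    topics => ["paging-trigger"]            # hypothetical topic name
    codec  => "json"
  }
}
# http filter, split, etc. unchanged
output {
  # elasticsearch output as before, then:
  kafka {
    bootstrap_servers => "localhost:9092"
    topic_id => "paging-trigger"            # write the nextPageToken event back
    codec    => "json"
  }
}
```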