Hello all,
We have the following use case:
We want to read data from a service provider that requires paging when data is polled (they also support pushing data, but we are not reachable from the internet). I found a few threads about http_poller and paging, but no complete solution.
At first, we simply used the http_poller input to read the data, but it only returned a partial result because the provider pages larger results (only 100 entries per page) and requires a token to fetch the next page.
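To illustrate why a single request only yields a partial result, here is a minimal Python sketch of token-based paging. It is not Logstash configuration and not the provider's real API: `fetch_page`, the `entries` field, and the token format (a stringified offset) are assumptions made up for the example; only the page size of 100 and the name nextPageToken come from the description above.

```python
from typing import Dict, Iterator, Optional

PAGE_SIZE = 100  # page size stated by the provider

# Hypothetical stand-in for the provider's data: 250 fake entries.
_ENTRIES = [{"id": i} for i in range(250)]


def fetch_page(next_page_token: Optional[str] = None) -> Dict:
    """Return one page. The token format (stringified offset) is an assumption."""
    start = int(next_page_token) if next_page_token else 0
    end = start + PAGE_SIZE
    page = {"entries": _ENTRIES[start:end]}
    if end < len(_ENTRIES):
        # More data is available, so hand out a token for the next page.
        page["nextPageToken"] = str(end)
    return page


def read_all() -> Iterator[Dict]:
    """Follow nextPageToken until the provider stops returning one."""
    token = None
    while True:
        page = fetch_page(token)
        yield from page["entries"]
        token = page.get("nextPageToken")
        if token is None:
            break


first_page_only = fetch_page()["entries"]  # what a single poll sees
everything = list(read_all())              # what following the tokens sees
print(len(first_page_only), len(everything))
```

A single poll corresponds to `first_page_only`; the full result is only available when every response's token is fed into the next request.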
My next attempt was to split the logic into two pipelines:
Pipeline request-init:
- uses the heartbeat input to regularly initiate the workflow
- uses the lumberjack output to send the trigger to pipeline request-execution
Pipeline request-execution:
- has the Beats input to receive the trigger (the documentation of the lumberjack input recommends using the Beats input instead)
- uses the http filter to poll the REST API. If the incoming trigger contained a nextPageToken, it is sent with the request as well.
- does a bit of processing (like splitting the response)
- uses the elasticsearch output to send the data to Elasticsearch
- uses the lumberjack output to send an event with the nextPageToken back to pipeline request-execution, which triggers reading the next page
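The intended event flow between the two pipelines can be modelled with a small Python sketch. This is only a toy model of the design, not Logstash configuration: the queue stands in for the Beats input of request-execution, `call_provider` is a hypothetical stand-in for the http filter's REST call, and the response fields and token format are assumptions.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

PAGE_SIZE = 100
_ENTRIES = list(range(230))  # hypothetical provider data


def call_provider(token: Optional[str]) -> Dict:
    """Hypothetical REST call; the token is assumed to be a stringified offset."""
    start = int(token) if token else 0
    end = start + PAGE_SIZE
    response = {"entries": _ENTRIES[start:end]}
    if end < len(_ENTRIES):
        response["nextPageToken"] = str(end)
    return response


def run_once() -> Tuple[int, List[int]]:
    beats_input = deque()  # stands in for the Beats input of request-execution
    indexed = []           # stands in for the elasticsearch output
    requests = 0

    # request-init: the heartbeat emits one trigger without a token.
    beats_input.append({})

    # request-execution: handle triggers until no follow-up trigger is queued.
    while beats_input:
        trigger = beats_input.popleft()
        response = call_provider(trigger.get("nextPageToken"))  # http filter
        requests += 1
        indexed.extend(response["entries"])  # split the response and index it
        if "nextPageToken" in response:
            # Feed a new trigger back into the pipeline's own input.
            beats_input.append({"nextPageToken": response["nextPageToken"]})
    return requests, indexed


requests, indexed = run_once()
print(requests, len(indexed))
```

In this model the loop ends on its own once a response carries no nextPageToken, which is the termination condition the two-pipeline design relies on.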
Unfortunately, I cannot start pipeline request-execution because of an error:
[2021-01-07T08:06:41,580][ERROR][logstash.outputs.lumberjack] All hosts unavailable, sleeping ...
So it looks like the output is opened before the input is open. Is there a way to work around that?
If possible, I would like to use Logstash only, without calling an external program to handle the paging.
Best regards
Wolfram