Given a web service that supports a typical paging construct (specify the record to start from and a page size), is using the "pipeline" input and output to call a config file recursively a viable option, or will it result in a memory bomb?
Here is how I currently have the recursive algorithm implemented:
- An "initial" config file is set to run on a schedule. It creates an event with some initial values (start_at_record = 0, page_size = 10), and outputs to another pipeline, "worker".
- "worker" uses the HTTP filter to retrieve a batch of data from the web service using start_at_record, and page_size.
- One of the resulting records is marked "keep_going". (I use a bit of Ruby to do this.)
- The "split" filter is used to break the set of records into individual records for processing.
- When the "keep_going" flag is detected, start_at_record is incremented by page_size.
- The 'output' section has an if statement so that wen 'keep_going' is detected, 'pipeline' is used to call the "worker" config again.
- Goto 2.
- Eventually, there are no more records returned, no record gets marked "keep_going" and everything falls out the bottom.
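To make the question concrete, here is roughly the shape of the scheduling side. This is a simplified sketch, not my exact config: both pipelines are registered in pipelines.yml, the heartbeat interval stands in for my real schedule, and the hard-coded values are just the initial-page defaults described above.

```
# initial.conf -- fires on a schedule and seeds the first page request
input {
  heartbeat {
    interval => 3600   # placeholder for the real schedule
  }
}
filter {
  # Seed the paging fields as integers so later arithmetic works
  ruby {
    code => '
      event.set("start_at_record", 0)
      event.set("page_size", 10)
    '
  }
}
output {
  pipeline { send_to => ["worker"] }
}
```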
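And the worker, where the recursion happens. Again a simplified sketch: the URL, the field layout under [records], and the final output are placeholders, and I am assuming %{...} sprintf substitution works in the HTTP filter's query values, so treat that part as illustrative rather than exact.

```
# worker.conf -- fetch one page, split it, and loop while pages are full
input {
  pipeline { address => "worker" }
}
filter {
  # Fetch one batch; URL and query substitution are assumptions
  http {
    url => "https://example.com/api/records"
    query => {
      "start" => "%{start_at_record}"
      "size"  => "%{page_size}"
    }
    target_body => "records"
  }
  # Mark the last record of a full page so the loop continues
  ruby {
    code => '
      recs = event.get("records")
      if recs.is_a?(Array) && recs.length >= event.get("page_size").to_i
        recs.last["keep_going"] = true
        event.set("records", recs)
      end
    '
  }
  split { field => "records" }
  # Advance the cursor on the one event that will re-enter this pipeline
  if [records][keep_going] {
    ruby {
      code => '
        event.set("start_at_record",
                  event.get("start_at_record").to_i + event.get("page_size").to_i)
      '
    }
  }
}
output {
  if [records][keep_going] {
    # Re-enter this same pipeline with the incremented start_at_record
    pipeline { send_to => ["worker"] }
  } else {
    # Normal records fall out the bottom (real destination omitted)
    stdout { codec => rubydebug }
  }
}
```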
Is this clever? Or courting disaster?