How to run queries from multiple JDBC inputs sequentially

madebydna · March 2, 2017, 10:04pm

My pipeline consists of three JDBC inputs, a set of common filters and one output stage to Elasticsearch. The three JDBC inputs fetch similar data from three different tables, the aggregate filter groups the data by a field that is present in all three tables, and the output stage dynamically sets the document type based on the table the data came from.

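input {
  jdbc {
    # connection settings (jdbc_connection_string, jdbc_user,
    # jdbc_driver_library, jdbc_driver_class) omitted here
    statement => "Select foo FROM table_a ORDER BY foo;"
  }
  jdbc {
    statement => "Select foo FROM table_b ORDER BY foo;"
  }
  jdbc {
    statement => "Select foo FROM table_c ORDER BY foo;"
  }
}

filter {
  aggregate {
    task_id => "%{foo}"
    code => "...."
    push_previous_map_as_event => true
  }
}

output {
  elasticsearch {
    action => "index"
    document_id => "%{foo}"
    # [@metadata][type] has to be set upstream (not shown), otherwise
    # the literal string "%{[@metadata][type]}" is used as the type
    document_type => "%{[@metadata][type]}"
  }
}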

My problem is that the three queries are not executed sequentially but in parallel, or at least in quick succession. As a result, the events arrive interleaved: a few events from table_a, followed by events from table_b, then back to table_a, and so on. This interferes with the aggregate filter, which expects all events from table_a ordered by foo, then all events from table_b ordered by foo, etc.

I've been setting the worker count to 1 when running the pipeline, but upon closer reading of the documentation I realized that this setting only affects the filter and output stages of the pipeline.

I read that "Each input stage in the Logstash pipeline runs in its own thread." In my particular case, do I have other options besides running each input separately?

A similar question was asked before but didn't receive an answer.

warkolm · March 3, 2017, 4:35am

You'll want to tag each input, then use an aggregate filter inside a conditional for each tag to make sure the events from each input are grouped separately.
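A minimal sketch of that suggestion, keeping the connection settings omitted as in the original configuration: each jdbc input gets a tag, and each tag routes its events to its own aggregate filter. The tag names, the `a-`/`b-`/`c-` task_id prefixes, the `source_table` field and the aggregation code are hypothetical stand-ins for the elided `code => "...."`. The prefixes are there because, as far as I understand the plugin, the aggregate filter keeps its maps per task_id pattern, so three filters that all use `%{foo}` would still share state. The aggregate filter documentation also asks for a single pipeline worker (`-w 1`).

```
input {
  jdbc {
    # connection settings omitted, as in the original configuration
    statement => "Select foo FROM table_a ORDER BY foo;"
    tags => ["table_a"]
  }
  jdbc {
    statement => "Select foo FROM table_b ORDER BY foo;"
    tags => ["table_b"]
  }
  jdbc {
    statement => "Select foo FROM table_c ORDER BY foo;"
    tags => ["table_c"]
  }
}

filter {
  # Each input's events go to their own aggregate filter. Distinct task_id
  # patterns (hypothetical prefixes) keep the three aggregations' maps apart
  # even though events from the three inputs arrive interleaved.
  if "table_a" in [tags] {
    aggregate {
      task_id => "a-%{foo}"
      code => "
        map['foo'] ||= event.get('foo')
        map['source_table'] = 'table_a'
        event.cancel()
      "
      push_previous_map_as_event => true
    }
  } else if "table_b" in [tags] {
    aggregate {
      task_id => "b-%{foo}"
      code => "
        map['foo'] ||= event.get('foo')
        map['source_table'] = 'table_b'
        event.cancel()
      "
      push_previous_map_as_event => true
    }
  } else if "table_c" in [tags] {
    aggregate {
      task_id => "c-%{foo}"
      code => "
        map['foo'] ||= event.get('foo')
        map['source_table'] = 'table_c'
        event.cancel()
      "
      push_previous_map_as_event => true
    }
  }
}

output {
  elasticsearch {
    action => "index"
    document_id => "%{foo}"
    # the pushed map carries source_table, which stands in for [@metadata][type]
    document_type => "%{source_table}"
  }
}
```

Within each branch the events are still ordered by foo, which is all that push_previous_map_as_event needs; the interleaving between tables no longer matters because each branch tracks its own current task.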

Randy_BoBandy · March 21, 2017, 2:17pm

Does this mean it is not possible to run separate inputs sequentially? I'm seeing indexing performance drop by a factor of 10 when the number of inputs increases.
I've tried limiting the worker processes, but this seems to have no effect on query execution.

warkolm · March 21, 2017, 8:18pm

Yes.
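Since separate inputs can't be forced to run one after another, one possible workaround, assuming the three tables expose compatible columns, is to fold the three queries into a single jdbc input. One input runs its statement in one thread, so with a single pipeline worker the rows should reach the aggregate filter in the ORDER BY order. The `source_table` column and the aggregation code are hypothetical; connection settings are omitted as in the original configuration.

```
input {
  jdbc {
    # connection settings omitted, as in the original configuration
    # A single statement in a single input: rows are emitted in ORDER BY order,
    # all of table_a first, then table_b, then table_c.
    statement => "SELECT 'table_a' AS source_table, foo FROM table_a
                  UNION ALL SELECT 'table_b', foo FROM table_b
                  UNION ALL SELECT 'table_c', foo FROM table_c
                  ORDER BY source_table, foo"
  }
}

filter {
  aggregate {
    # source_table is part of the task id, so switching tables also starts
    # a new task and pushes the previous map as an event
    task_id => "%{source_table}-%{foo}"
    code => "
      map['foo'] ||= event.get('foo')
      map['source_table'] ||= event.get('source_table')
      event.cancel()
    "
    push_previous_map_as_event => true
  }
}

output {
  elasticsearch {
    action => "index"
    document_id => "%{foo}"
    document_type => "%{source_table}"
  }
}
```

The trade-off is that all three tables are read by one query and one thread, so this only helps when the combined result set is acceptable to fetch in one go.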

system · April 18, 2017, 8:18pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
