We are using Logstash to pull data from an Oracle database via the JDBC input, with a SQL query based on an ID and a timestamp. The Oracle table receives around 15 billion rows per month and data is inserted into it continuously.
That is roughly 1 million new records per minute.
Logstash cannot keep up with the rate at which data is inserted into the table.
Can you suggest a solution to this issue?
How complex is the query? How long does it take to run the query and retrieve the results if you use e.g. a script? If running the query and extracting the data is not the bottleneck, would it be possible to partition the query and have multiple pipelines each process a subset of the data, thereby increasing parallelism?
Hi Christian_Dahlqvist
The query joins 3 tables and the result is inserted into Elasticsearch. The Oracle database already has partitions and indexes. How can I set up multiple pipelines?
If you have a natural way to partition your data, you can create multiple pipelines where each has a jdbc input that contains a WHERE clause that ensures the inputs read different data.
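For example, here is a minimal sketch of what that could look like. The connection details, table/column names (my_table, id), file paths, and the modulus-based split are assumptions for illustration, not taken from your setup; adjust them to however your data naturally partitions (ID ranges, time buckets, etc.):

```yaml
# pipelines.yml - one pipeline per data slice (paths are hypothetical)
- pipeline.id: oracle_slice_0
  path.config: "/etc/logstash/conf.d/oracle_slice_0.conf"
- pipeline.id: oracle_slice_1
  path.config: "/etc/logstash/conf.d/oracle_slice_1.conf"
```

```
# oracle_slice_0.conf - reads only rows where MOD(id, 2) = 0.
# The second pipeline is identical except it uses MOD(id, 2) = 1
# and its own last_run_metadata_path.
input {
  jdbc {
    jdbc_connection_string => "jdbc:oracle:thin:@//dbhost:1521/ORCL"  # assumed host/service
    jdbc_user => "logstash"
    jdbc_password => "${ORACLE_PASSWORD}"
    jdbc_driver_library => "/opt/ojdbc8.jar"
    jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
    schedule => "* * * * *"
    use_column_value => true
    tracking_column => "id"
    tracking_column_type => "numeric"
    last_run_metadata_path => "/var/lib/logstash/.jdbc_last_run_slice_0"
    statement => "SELECT * FROM my_table WHERE MOD(id, 2) = 0 AND id > :sql_last_value ORDER BY id"
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "my_index"
    document_id => "%{id}"
  }
}
```

Note that each pipeline must have its own last_run_metadata_path so the tracking values do not collide, and selecting only the columns you actually need (instead of SELECT *) can also reduce the time spent retrieving results.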