Logstash delta pull from Postgres

I have a simple Logstash pipeline that runs a SQL query to pull data from a Postgres DB into ES.

The initial pull was 1.5 million rows; it ran last night and took about 5 minutes. But the daily delta pull is only about 4,000 rows.

I expected the delta pull to take just a few seconds... however, today's run also took about 5 minutes.

I am wondering why.
I hope it is only pulling the delta data... and if so, why is it taking almost as long as pulling all the data?

Thanks

Maybe it is pulling all of the data again. What does the configuration look like?

I have pasted the sample from the 7.8 docs.

In my case, I have to filter for delta data based on multiple columns; this example shows just one.

How do I go about setting the WHERE clause and the tracking_column?

input {
  jdbc {
    statement => "SELECT id, mycolumn1, mycolumn2 FROM my_table WHERE id > :sql_last_value"
    use_column_value => true
    tracking_column => "id"
    # ... other configuration bits
  }
}
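Roughly, what I am aiming for is something like this sketch (the extra predicate and the column names here are just illustrative, since I have not gotten my real query working yet):

input {
  jdbc {
    # Sketch only: delta filter on a timestamp column plus one extra predicate.
    # :sql_last_value holds the highest tracked value seen on the previous run.
    statement => "SELECT id, mycolumn1, mycolumn2 FROM my_table
                  WHERE updated_at > :sql_last_value AND status = 'ACTIVE'
                  ORDER BY updated_at"
    use_column_value => true
    tracking_column => "updated_at"
    tracking_column_type => "timestamp"
    # ... other configuration bits
  }
}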

I tried this approach, but I do not think it worked well.

My pipeline configuration...
Note that I have commented out the --'2019-04-15' portion after the '?'.

statement=>"select initcap(b.company_name) company_name, a.symbol,a.day_close,a.day_high,a.day_low,
a.trade_date,a.rs,a.high_52week,a.low_52week
from stock_daily_data a,company_listing b where a.trade_date> '?' --'2019-04-15'
and a.symbol=b.symbol"
use_column_value => true
tracking_column => "a.trade_date"
prepared_statement_bind_values => [":sql_last_value"]
prepared_statement_name => "sbsectorsymbolonee"
use_prepared_statements => true
tracking_column_type => "timestamp"
last_run_metadata_path => "/dockermnt/logstash/lastvalues/xx.yml"

Logstash log snippet:

2020-06-25T23:11:59.023551191Z [2020-06-25T23:11:59,021][INFO ][logstash.inputs.jdbc ][sector_symbol_one][c564e52a52c7396ffa0533a145f2e4d228e32c50e285fcef8a7f2c04cc23171c] (0.056661s) PREPARE xx: select initcap(b.company_name) company_name, a.symbol,a.day_close,a.day_high,a.day_low,
2020-06-25T23:11:59.023648903Z a.trade_date,a.rs,a.high_52week,a.low_52week
2020-06-25T23:11:59.023760102Z from stock_daily_data a,company_listing b where a.trade_date> '?' --'2019-04-15'
2020-06-25T23:11:59.023774379Z and a.symbol=b.symbol
2020-06-25T23:11:59.204127319Z [2020-06-25T23:11:59,203][WARN ][logstash.inputs.jdbc ][sector_symbol_one][c564e52a52c7396ffa0533a145f2e4d228e32c50e285fcef8a7f2c04cc23171c] Exception when executing JDBC query {:exception=>org.postgresql.util.PSQLException: The column index is out of range: 1, number of columns: 0.}

I tried a few other changes to the config file, but I was not successful.

I am either getting the Postgres column index error, or Logstash is running out of heap (OOM).
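For the heap issue, I am also thinking of enabling paging on the jdbc input so the result set is fetched in chunks (a sketch; the page size is just a guess):

    jdbc_paging_enabled => true   # fetch results page by page instead of all at once
    jdbc_page_size => 50000       # rows per page; tune for your data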

Please let me know how to run a SQL statement with a multi-table join and multiple filter predicates, and still perform delta pulls.

For anyone that might stumble upon this issue and come here searching for answers...

I was able to get it working. Surprisingly (after 2 days), it turned out to be pretty simple.

Here is the Logstash config snippet that made the difference...

  1. Set the timezone to UTC.
  2. Pay close attention to the SQL statement.
  3. For the first run, to get all the data, set clean_run => true. Once the first run is complete, remove this setting.
  4. Ordering by the date column is VERY IMPORTANT.

jdbc_default_timezone => "UTC"
statement => "select a.trade_date, a.low_52week
              from stock_daily_data a, company_listing b
              where a.trade_date > :sql_last_value and a.symbol = b.symbol
              order by trade_date"
use_column_value => true
tracking_column => "trade_date"
tracking_column_type => "timestamp"
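For completeness, here is roughly how my whole jdbc input block looks now; the connection settings and schedule below are placeholders, not my real values. My understanding of the earlier error is that the '?' was inside single quotes, so the driver treated it as a string literal and found zero bind parameters to set; using :sql_last_value directly avoids prepared-statement binding altogether.

input {
  jdbc {
    # Placeholder connection settings -- substitute your own driver path, URL, and credentials.
    jdbc_driver_library => "/path/to/postgresql.jar"
    jdbc_driver_class => "org.postgresql.Driver"
    jdbc_connection_string => "jdbc:postgresql://localhost:5432/mydb"
    jdbc_user => "myuser"
    jdbc_password => "mypassword"
    jdbc_default_timezone => "UTC"

    # clean_run => true       # only for the very first full pull; remove after that run
    schedule => "0 1 * * *"   # placeholder: run once a day at 01:00

    statement => "select a.trade_date, a.low_52week
                  from stock_daily_data a, company_listing b
                  where a.trade_date > :sql_last_value and a.symbol = b.symbol
                  order by trade_date"

    use_column_value => true
    tracking_column => "trade_date"
    tracking_column_type => "timestamp"
    last_run_metadata_path => "/dockermnt/logstash/lastvalues/xx.yml"
  }
}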
