Unnecessary hits from Logstash to Elasticsearch index

When using the JDBC input plugin with paging enabled instead of TOP and ORDER BY, sql_last_value is not updated after every fetch; it is only updated after all records have been pushed from the database to the Elasticsearch index. This causes Logstash to make multiple database query hits and to re-index the same data in Elasticsearch repeatedly, which is unnecessary.

I face the same issue.

For example, below is my jdbc input plugin configuration. I think the issue is with the JDBC offset. If I have 8 lakh (800,000) records, in the first iteration sql_last_value starts with the default 1970 date, but during the second iteration sql_last_value is not the "TOP 1 ORDER BY updatedTS DESC" value; instead it is updated with some seemingly random date. Hence the JDBC plugin fetches data from the database again and feeds it to Elasticsearch, again some 3 lakh (300,000) records. I need help from the team to know whether I am missing any config needed to update sql_last_value correctly when there is a JDBC offset added by Logstash to the queries.

jdbc {
	type => "sometype"
	jdbc_connection_string => "${CONNSTRING}"
	jdbc_driver_class => "${JDBCDRIVER}"
	jdbc_user => "${JDBCUID}"
	statement_filepath => "query.sql"
	schedule => "/2 * * * * *"
	use_column_value => true
	tracking_column => "updatedts"
	tracking_column_type => "timestamp"
	jdbc_paging_enabled => true
	jdbc_page_size => 1000
	record_last_run => true
	clean_run => false
	last_run_metadata_path => "${CONFIG_FOLDER}\.logstash_jdbc_last_run"
}

Can I get help on this issue, please?

Did you sort the query by updatedts? Show the query.

No, I did not sort the data in the query, because ORDER BY requires a TOP clause.

SELECT a.col1 AS col1
     , a.col2 AS col2
     , a.updated_ts AS updatedTs
     , a.col3 AS col3
FROM sampleTable a WITH (NOLOCK)
WHERE updated_ts > :sql_last_value

"TOP + ORDER BY" will not work in my case because records might share the same updatedTS (they might have been inserted by a bulk DML), hence I went with JDBC paging.

You have to find a way to get the results sorted. sql_last_value caches the last record's value between scheduled runs; if that value is not the latest of all the records processed in that run, you will not get continuous data: you get overlaps or gaps.
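To illustrate why the ordering matters (a sketch only; the exact SQL the plugin generates depends on the database adapter), with jdbc_paging_enabled Logstash wraps your statement in a paging query, roughly:

```sql
-- Hypothetical sketch of the paging wrapper (actual generated SQL may differ).
SELECT * FROM (
    SELECT a.col1, a.col2, a.updated_ts AS updatedTs, a.col3
    FROM sampleTable a WITH (NOLOCK)
    WHERE updated_ts > :sql_last_value
) AS t1
OFFSET 0 ROWS FETCH NEXT 1000 ROWS ONLY;  -- OFFSET advances by the page size on each page
```

Without an ORDER BY on updatedts, each page can contain arbitrary rows, so the updatedts of the last row processed is not guaranteed to be the maximum, and sql_last_value ends up wrong.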

Can you define an ordered View and then page on that?
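A sketch of that idea (the view name is made up; note that SQL Server does not honour ORDER BY inside a view definition, so the deterministic position has to be materialised as a column, e.g. with ROW_NUMBER(), and the selecting query sorts on it):

```sql
-- Hypothetical view exposing a stable row position to sort and page on.
CREATE VIEW dbo.sampleTable_ordered AS
SELECT a.col1,
       a.col2,
       a.updated_ts AS updatedTs,
       a.col3,
       ROW_NUMBER() OVER (ORDER BY a.updated_ts ASC) AS rn
FROM sampleTable a WITH (NOLOCK);
```

The Logstash statement would then select from the view, filter on updatedTs > :sql_last_value, and order by rn.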

Thank you, will try this and let you know.

@guyboertje, I managed to sort the data with "TOP 100 PERCENT + ORDER BY" and the data is sorted, but the hits are increasing again and the sql_last_value is wrong.

I have now achieved this using ROW_NUMBER() sorting in the query; "TOP 100 PERCENT + ORDER BY" does not actually sort the data.
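For anyone landing here with the same problem, a minimal sketch of the ROW_NUMBER() approach, using the column names from the earlier query (the tie-breaker column is an assumption; use a unique key from your own table):

```sql
-- ROW_NUMBER() gives every row a deterministic position, even when several
-- rows share the same updated_ts, so paging becomes reproducible and the
-- last row processed carries the true maximum updatedTs.
SELECT col1, col2, updatedTs, col3
FROM (
    SELECT a.col1,
           a.col2,
           a.updated_ts AS updatedTs,
           a.col3,
           ROW_NUMBER() OVER (ORDER BY a.updated_ts ASC, a.col1 ASC) AS rn
    FROM sampleTable a WITH (NOLOCK)
    WHERE a.updated_ts > :sql_last_value
) AS ordered
ORDER BY rn
```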

Thank you @guyboertje for your assistance.


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.