Logstash JDBC Input Plugin for streaming data


(Navneet Mathpal) #1

Can we use the logstash-jdbc-input plugin for streaming data? For example, if I have data in my database and run the JDBC input plugin, it indexes that data into ES. If more data arrives in the database some time later, can the JDBC input plugin index that new data without restarting Logstash?


(Magnus Bäck) #2

Yes, you can run it periodically and have it pick up new data. See the State section of the documentation.


(Navneet Mathpal) #3

@magnusbaeck My concern is this: if I have a very large data set that has already been indexed into ES, and the scheduled query runs again, will it try to index the whole data set again?
If it does, that would put unnecessary load on the application, wouldn't it?
(I know we can handle the duplicate rows, but how feasible is this from a performance point of view?)

Thanks :smiley:


(Magnus Bäck) #4

As the documentation I linked to tries to explain, your query is served with a parameter that contains the timestamp when the query was run the last time. You can use that to select only the rows that have been updated since the last run. Duplicates in the Elasticsearch output can be avoided by setting the document id to e.g. the primary key from the source database.
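
A minimal sketch of what such a pipeline might look like. It assumes a MySQL table named `orders` with a primary key `id` and an `updated_at` timestamp column (all hypothetical names), and a plugin version that exposes the last-run time as `:sql_last_value` (older releases used a different parameter name, so check the documentation for your version). The connection details, paths, and index name are placeholders.

```
input {
  jdbc {
    # Hypothetical connection details; adjust for your database and driver.
    jdbc_driver_library => "/path/to/mysql-connector-java.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"
    jdbc_user => "logstash"
    # Run the query once a minute (cron-like syntax), with no Logstash restart.
    schedule => "* * * * *"
    # Fetch only rows changed since the previous run instead of the whole table.
    statement => "SELECT * FROM orders WHERE updated_at > :sql_last_value"
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "orders"
    # Reusing the primary key as the document id means a row that is read
    # again overwrites its existing document instead of creating a duplicate.
    document_id => "%{id}"
  }
}
```

With a filter like this, each scheduled run touches only the rows modified since the last one, so a large table that is already indexed should not be re-read in full.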

