Is the data being logged to Elasticsearch?
If yes, you could calculate an MD5 hash of your result and use it as the document ID. All the duplicates will then be recorded in Elasticsearch under the same document ID, and only the _version will increase.
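A minimal sketch of that idea using the Logstash fingerprint filter (field names such as `id` and `updated_at`, the host, and the index name are assumptions; adjust them to your schema):

```
filter {
  fingerprint {
    # assumed identifying fields; pick the columns that uniquely define a record
    source => ["id", "updated_at"]
    target => "[@metadata][doc_id]"
    method => "MD5"
    concatenate_sources => true
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]   # assumed host
    index => "my-index"                  # assumed index
    # re-indexed duplicates overwrite the same document; only _version increases
    document_id => "%{[@metadata][doc_id]}"
  }
}
```

Using `[@metadata]` keeps the hash out of the stored document itself.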
This only avoids duplicates, though.
I do not want the pipeline to run unnecessarily, keep re-indexing, and consume a lot of RAM and CPU by checking the document ID. Once it has loaded the 10,000 records, it should not load any data at all.
Hello.
What about the document_id
in the elasticsearch output settings?
If you don't specify it, each time the request runs, it creates new documents.
Give it the unique ID you have in the database; that should solve your issue.
I have 2 mn records and I am using a persistent queue.
This is what is happening with me:
page 0: 0.5 mn
page 1: 0.5 mn (1 mn)
page 2: 0.5 mn (1.5 mn)
page 3: 0.5 mn (2 mn)
page 4: 0.5 mn (2.5 mn)
page 5: 0.5 mn (3.0 mn)
...
...
...
This is going in a continuous loop. How can I avoid that? I don't have any problem using document_id, I am already aware of that solution, but how can I avoid this continuous loop?
I want it to stop after the 2 mn records are done instead of starting a second loop, and I should be able to schedule it for the next day.
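One way to get that behaviour with the jdbc input is to combine a cron `schedule` with `sql_last_value` tracking, so each run only fetches rows newer than the last run instead of re-reading all 2 mn records. A hedged sketch (connection string, driver, table, and column names are assumptions for illustration):

```
input {
  jdbc {
    jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"  # assumed DB
    jdbc_user => "user"
    jdbc_driver_library => "/path/to/mysql-connector.jar"         # assumed path
    jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
    # only rows with an id greater than the last recorded value are fetched,
    # so the pipeline stops once it has caught up instead of looping
    statement => "SELECT * FROM records WHERE id > :sql_last_value ORDER BY id"
    use_column_value => true
    tracking_column => "id"          # assumed monotonically increasing column
    jdbc_paging_enabled => true
    jdbc_page_size => 500000
    # cron expression: run once a day at 01:00 instead of continuously
    schedule => "0 1 * * *"
  }
}
```

Logstash persists the last seen `id` (by default in `.logstash_jdbc_last_run`), so the next scheduled run resumes from there rather than from page 0.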