Duplicate entries into Elastic Search

Ravindra_Nath · May 6, 2019, 1:25pm

Problem:
I am using ELK 6.71. I am able to load log data into Elasticsearch from a MySQL database with an input plugin, and I have scheduled the SQL query to run every minute. But every time Logstash runs the query, it also fetches the previously fetched documents (duplicates).
Is there any way to fetch only the new entries each time I run the SQL query? Every document contains a unique log number field.

Thanks in advance.

Badger · May 6, 2019, 2:12pm

If you are using the jdbc input and you have a sequence in a column then the input can manage state.
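
A minimal sketch of what such a state-tracking jdbc input might look like. It assumes the sequence column is named logid (the name the poster gives later in the thread) and the table is table_name; the driver path, connection string, and credentials are placeholders, not values from this thread.

```
input {
  jdbc {
    # Placeholder connection details (assumptions, adjust to your setup).
    jdbc_driver_library => "/path/to/mysql-connector-java.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"
    jdbc_user => "logstash"
    jdbc_password => "changeme"

    # Run the query once a minute (cron-style schedule).
    schedule => "* * * * *"

    # Fetch only rows whose logid is above the last value seen.
    statement => "SELECT * FROM table_name WHERE logid > :sql_last_value ORDER BY logid ASC"

    # Track the logid column instead of the time of the last run.
    use_column_value => true
    tracking_column => "logid"
    tracking_column_type => "numeric"
  }
}
```

With these settings the input stores the highest logid it has read and substitutes it for :sql_last_value on the next run, so rows that were already fetched should not be selected again.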

elasticforme · May 6, 2019, 2:32pm

A few months ago, when I started with ELK, I had the same problem.
We use jdbc 90% of the time.

Please show us your Logstash input section.
Does your database have a unique field?

Ravindra_Nath · May 7, 2019, 4:24am

Yes, my database has a unique "logid" field. It is auto-generated every time a new entry comes in.

elasticforme · May 7, 2019, 1:38pm

What does your output section look like?
Use
document_id => "%{logid}"

and you won't have duplicates.
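
For context, a sketch of where that setting could sit in an elasticsearch output. The hosts and index values are placeholders (assumptions), not taken from this thread; only the document_id line comes from the advice above.

```
output {
  elasticsearch {
    # Placeholder host and index name (assumptions).
    hosts => ["localhost:9200"]
    index => "mysql-logs"

    # Use the unique logid as the Elasticsearch document ID, so a row
    # that is read again overwrites its existing document instead of
    # creating a duplicate.
    document_id => "%{logid}"
  }
}
```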

But the problem I see here is that you are running a full query against table_name on every scheduled run, which is not good for a large database.
Doing this also makes Elasticsearch work overtime.

For example, say you have
logid and name as database fields, and you have 10 records.
You read them the first time and Elasticsearch inserts them into its index.

Now let's say that two seconds later you have 10 more records in your MySQL database.
Logstash will read all 20 records.
Elasticsearch will replace the 10 it already has and index all 20 again.

I hope this makes sense.

Ravindra_Nath · May 8, 2019, 7:21am

I tried it and it is not working. Logstash does not start at all.

Ravindra_Nath · May 8, 2019, 9:23am

Thanks @Badger and @elasticforme. The solution is working perfectly for me.

elasticforme · May 8, 2019, 8:54pm

Glad to hear that. Come back to this site often and share your experience.

