Logstash adding duplicate rows in every run

saafianihan · September 14, 2017, 5:26am

I am using Logstash to pull data from a MySQL database. I am using two different configuration files to create two indexes with different MySQL queries. I run the conf files as

logstash -f path\to\bin\*.conf

Running the command above solves the duplication problem, but one additional row from the second result is added to the first result, and vice versa. Why?

My configuration files are:

input {
  jdbc {
    jdbc_driver_library => "C:/New_ELK/logstash-5.5.2/mysql-connector-java-5.1.42/mysql-connector-java-5.1.42-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/emob_report"
    jdbc_user => "root"
    jdbc_password => ""
    schedule => "* * * * *"
    statement => "select attendance_fact.id as attendance_id,date(attendance_fact.punchin_time) as clocked_in_date,org_dimension.name as organization,org_dimension.id as org_id,
      concat(user_dimenson.first_name,' ',user_dimenson.last_name) as user
      from attendance_fact
      inner join user_dimenson on attendance_fact.user_fk=user_dimenson.ID
      inner join org_dimension on user_dimenson.org_fk=org_dimension.ID
      where date(attendance_fact.punchin_time)=curdate()"
  }
}

output {
  elasticsearch {
    index => "users_clocked_in"
    hosts => ["localhost:9200"]
    document_id => "%{[attendance_id]}"
  }

  stdout { codec => json_lines }
}

input {
  jdbc {
    jdbc_driver_library => "C:/New_ELK/logstash-5.5.2/mysql-connector-java-5.1.42/mysql-connector-java-5.1.42-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/emob_report"
    jdbc_user => "root"
    jdbc_password => ""
    schedule => "* * * * *"
    statement => "select complaint_fact.id as complaint_id,workflow_dimension.name as workflow,work_location_dimension.name as worklocation,date(complaint_fact.complaint_create_time) as jobsadded_createddate,
      org_dimension.name as organization,org_dimension.id as org_id
      from complaint_fact
      left join workflow_dimension on complaint_fact.workflow_fk=workflow_dimension.id
      left join work_location_dimension on complaint_fact.worklocation_fk=work_location_dimension.id
      inner join org_dimension on complaint_fact.org_fk = org_dimension.id
      where date(complaint_fact.complaint_create_time)=curdate()"
  }
}

output {
  elasticsearch {
    index => "new_jobs_count"
    hosts => ["localhost:9200"]
    document_id => "%{[complaint_id]}"
  }

  stdout { codec => json_lines }
}

saafianihan · September 14, 2017, 6:15am

No hints from anyone? I am still looking for an answer. Please help!

magnusbaeck · September 17, 2017, 5:05pm

Logstash has a single event pipeline and doesn't care if you have multiple configuration files. All events from all inputs will be sent through all filters and then to all outputs. If you don't want that you need to wrap the filters and outputs in conditionals to route the events correctly.

This question pops up here at least once a week.
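
To make the point above concrete: when -f matches several files, Logstash 5.x concatenates them into one configuration. A rough sketch of what the two files from the first post effectively become is shown below. The jdbc settings are replaced by comments for brevity, so this is an illustration and not a file to run as-is.

```
# Effective single pipeline after the two .conf files are merged
# (illustrative sketch; jdbc settings omitted, copy them from the original files)
input {
  jdbc {
    # attendance query: rows carry attendance_id, no complaint_id
  }
  jdbc {
    # complaint query: rows carry complaint_id, no attendance_id
  }
}

output {
  # Every event from BOTH inputs reaches ALL four outputs below.
  elasticsearch {
    index => "users_clocked_in"
    hosts => ["localhost:9200"]
    document_id => "%{[attendance_id]}"
  }
  stdout { codec => json_lines }
  elasticsearch {
    index => "new_jobs_count"
    hosts => ["localhost:9200"]
    document_id => "%{[complaint_id]}"
  }
  stdout { codec => json_lines }
}
```

Every attendance row therefore also reaches the new_jobs_count output, and every complaint row also reaches users_clocked_in. A row that lacks the field named in document_id most likely leaves the reference unresolved, so the literal text %{[complaint_id]} (or %{[attendance_id]}) ends up as the document ID. That would explain why a single extra document appears in each index instead of many.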

saafianihan · September 18, 2017, 4:25am

I am new to ELK. Could you please give me an example of wrapping the filters and outputs in conditionals?

saafianihan · September 18, 2017, 4:26am

When I execute the configuration file, Logstash adds one additional row to the query result. The additional row contains the "document_id" details. I am using "document_id" to prevent duplicate data on every run. The duplication issue is solved, but one additional row with the document_id details is added to the result. Why?

magnusbaeck · September 18, 2017, 5:27am

I am new to ELK. Could you please give me an example of wrapping the filters and outputs in conditionals?

See the "Accessing event data and fields" page of the Logstash Reference for examples.

When I execute the configuration file, Logstash adds one additional row to the query result. The additional row contains the "document_id" details.

It's not clear what you mean by this. Please give an example. The output from your stdout output would be useful to see.

saafianihan · September 18, 2017, 5:36am

My configuration file is:

input {
  jdbc {
    jdbc_driver_library => "C:/New_ELK/logstash-5.5.2/mysql-connector-java-5.1.42/mysql-connector-java-5.1.42-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/emob_report"
    jdbc_user => "root"
    jdbc_password => ""
    schedule => "* * * * *"
    statement => "select attendance_fact.id as attendance_id,date(attendance_fact.punchin_time) as clocked_in_date,org_dimension.name as organization,org_dimension.id as org_id,
      concat(user_dimenson.first_name,' ',user_dimenson.last_name) as user
      from attendance_fact
      inner join user_dimenson on attendance_fact.user_fk=user_dimenson.ID
      inner join org_dimension on user_dimenson.org_fk=org_dimension.ID
      where date(attendance_fact.punchin_time)=curdate()"
  }
}

output {
  elasticsearch {
    index => "users_clocked_in"
    hosts => ["localhost:9200"]
    document_id => "%{[attendance_id]}"
  }

  stdout { codec => json_lines }
}

My stdout output is:

Look at the output: the second row from the end is an additional row, appended to my result with the "document_id" details. I do not need that row in my result. Why is it appended?

saafianihan · September 18, 2017, 7:34am

Could you please help me keep that unnecessary row from being fetched and shown?

magnusbaeck · September 18, 2017, 10:08am

That event was probably produced by one of your other jdbc inputs. Given how the configuration above looks, it couldn't possibly have produced the event you've circled in red.

saafianihan · September 18, 2017, 10:25am

When I execute the Logstash configuration file above on its own, I do not get the additional row. But when I execute it together with the two other configuration files, I get the result shown.

magnusbaeck · September 18, 2017, 11:31am

Right, and that's because you're not wrapping your outputs in conditionals so that events from the input in configuration file A are routed only to file A's output.

saafianihan · September 18, 2017, 11:34am

How can I do that? Could you please give me an example?

magnusbaeck · September 18, 2017, 11:36am

Did you look at the documentation link I sent? You can set a field or a tag in the jdbc input and look at that field or tag in the conditional block in the output.
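
As a minimal sketch of that suggestion, assuming the two files from the first post: each jdbc input gets its own tag, and each elasticsearch output is wrapped in a conditional that checks for it. The tag names "attendance" and "complaint" are made up for illustration, and the jdbc settings are left as comments because they stay exactly as in the original files.

```
# File A (attendance), sketch only
input {
  jdbc {
    # jdbc_* settings, schedule and statement unchanged from the original file
    tags => ["attendance"]   # hypothetical tag name
  }
}

output {
  # Only events tagged by the input above are indexed here
  if "attendance" in [tags] {
    elasticsearch {
      index => "users_clocked_in"
      hosts => ["localhost:9200"]
      document_id => "%{[attendance_id]}"
    }
  }
}

# File B (complaints), sketch only
input {
  jdbc {
    # jdbc_* settings, schedule and statement unchanged from the original file
    tags => ["complaint"]    # hypothetical tag name
  }
}

output {
  if "complaint" in [tags] {
    elasticsearch {
      index => "new_jobs_count"
      hosts => ["localhost:9200"]
      document_id => "%{[complaint_id]}"
    }
  }
}
```

With this in place, an attendance event should no longer reach the new_jobs_count output, and vice versa. A stdout output left outside the conditionals would still print events from both inputs, which is expected. The extra documents already indexed under the unresolved IDs are not removed by this change and would need to be deleted separately.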

system · October 16, 2017, 11:36am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
