Logstash adding duplicate rows in every run

I am using Logstash to pull data from a MySQL database. I am using two different configuration files to create two indexes with different MySQL queries. I run the conf files as:

logstash -f path\to\bin*.conf

Running the above command solves the duplication problem, but one additional row from the second result gets added to the first result, and vice versa. Why?

Here are my two configuration files.

First configuration file:

input {
  jdbc {
    jdbc_driver_library => "C:/New_ELK/logstash-5.5.2/mysql-connector-java-5.1.42/mysql-connector-java-5.1.42-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/emob_report"
    jdbc_user => "root"
    jdbc_password => ""
    schedule => "* * * * *"
    statement => "select attendance_fact.id as attendance_id, date(attendance_fact.punchin_time) as clocked_in_date, org_dimension.name as organization, org_dimension.id as org_id,
      concat(user_dimenson.first_name,' ',user_dimenson.last_name) as user
      from attendance_fact
      inner join user_dimenson on attendance_fact.user_fk=user_dimenson.ID
      inner join org_dimension on user_dimenson.org_fk=org_dimension.ID
      where date(attendance_fact.punchin_time)=curdate()"
  }
}

output {
  elasticsearch {
    index => "users_clocked_in"
    hosts => ["localhost:9200"]
    document_id => "%{[attendance_id]}"
  }

  stdout { codec => json_lines }
}

Second configuration file:

input {
  jdbc {
    jdbc_driver_library => "C:/New_ELK/logstash-5.5.2/mysql-connector-java-5.1.42/mysql-connector-java-5.1.42-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/emob_report"
    jdbc_user => "root"
    jdbc_password => ""
    schedule => "* * * * *"
    statement => "select complaint_fact.id as complaint_id, workflow_dimension.name as workflow, work_location_dimension.name as worklocation, date(complaint_fact.complaint_create_time) as jobsadded_createddate,
      org_dimension.name as organization, org_dimension.id as org_id
      from complaint_fact
      left join workflow_dimension on complaint_fact.workflow_fk=workflow_dimension.id
      left join work_location_dimension on complaint_fact.worklocation_fk=work_location_dimension.id
      inner join org_dimension on complaint_fact.org_fk = org_dimension.id
      where date(complaint_fact.complaint_create_time)=curdate()"
  }
}

output {
  elasticsearch {
    index => "new_jobs_count"
    hosts => ["localhost:9200"]
    document_id => "%{[complaint_id]}"
  }

  stdout { codec => json_lines }
}

No hints from anyone yet. I am still looking for an answer. Please help me!

Logstash has a single event pipeline and doesn't care if you have multiple configuration files. All events from all inputs will be sent through all filters and then to all outputs. If you don't want that you need to wrap the filters and outputs in conditionals to route the events correctly.

This question pops up here at least once a week.

I am new to ELK. Could you please give me an example of wrapping the filters and outputs in conditionals?

While executing the configuration files, Logstash adds one additional row to the query result. The additional row contains the "document_id" details. I am using "document_id" to prevent duplication of data on every run. The duplication issue is solved, but one additional row with the document_id details is added to the result. Why?


See the Accessing event data and fields page in the Logstash reference for examples.

While executing the configuration files, Logstash adds one additional row to the query result. The additional row contains the "document_id" details.

It's not clear what you mean by this. Please give an example. The output from your stdout output would be useful to see.

My configuration file is:

input {
  jdbc {
    jdbc_driver_library => "C:/New_ELK/logstash-5.5.2/mysql-connector-java-5.1.42/mysql-connector-java-5.1.42-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/emob_report"
    jdbc_user => "root"
    jdbc_password => ""
    schedule => "* * * * *"
    statement => "select attendance_fact.id as attendance_id, date(attendance_fact.punchin_time) as clocked_in_date, org_dimension.name as organization, org_dimension.id as org_id,
      concat(user_dimenson.first_name,' ',user_dimenson.last_name) as user
      from attendance_fact
      inner join user_dimenson on attendance_fact.user_fk=user_dimenson.ID
      inner join org_dimension on user_dimenson.org_fk=org_dimension.ID
      where date(attendance_fact.punchin_time)=curdate()"
  }
}

output {
  elasticsearch {
    index => "users_clocked_in"
    hosts => ["localhost:9200"]
    document_id => "%{[attendance_id]}"
  }

  stdout { codec => json_lines }
}

My stdout output is shown in the attached screenshot.

Just look at the output: the second row from the end is an additional row appended to my result, containing the "document_id" details. That row is not needed in my result. Why is it appended?

Could you please help me keep that unnecessary row from being fetched?

That event was probably produced by one of your other jdbc inputs. Judging by the configuration above, it couldn't possibly have produced the event you've circled in red.

When I execute the above Logstash configuration file on its own, I do not get the additional row. But when I execute both configuration files together, I get the result shown.

Right, and that's because you're not wrapping your outputs in conditionals so that events from the input in configuration file A are routed only to file A's output.

How can I do that? Could you please give me an example?

Did you look at the documentation link I sent? You can set a field or a tag in the jdbc input and look at that field or tag in the conditional block in the output.
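
For instance, here is a rough sketch of what that could look like for your first configuration file. The "attendance" tag is just an arbitrary label I picked, and the jdbc settings and statement elided with "..." are the ones you already posted; adapt both to your setup.

input {
  jdbc {
    # ... your existing jdbc_* settings, schedule and attendance statement ...
    tags => ["attendance"]    # arbitrary label, used only for routing below
  }
}

output {
  # Only events carrying the "attendance" tag reach these outputs,
  # so rows produced by the other file's input are ignored here.
  if "attendance" in [tags] {
    elasticsearch {
      index => "users_clocked_in"
      hosts => ["localhost:9200"]
      document_id => "%{[attendance_id]}"
    }
    stdout { codec => json_lines }
  }
}

Do the same in the second file with a different tag (for example "complaints") around its new_jobs_count output, and each event will only be written to the index belonging to the file whose input produced it.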

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.