I am using Logstash to pull data from a MySQL database. I am using two different configuration files to create two indexes with different MySQL queries. I run the conf files as
logstash -f path\to\bin*.conf
While running the above command the duplication problem is solved, but one additional row from the second result is added to the first result, and vice versa. Why?
My configuration files are:
input {
  jdbc {
    jdbc_driver_library => "C:/New_ELK/logstash-5.5.2/mysql-connector-java-5.1.42/mysql-connector-java-5.1.42-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/emob_report"
    jdbc_user => "root"
    jdbc_password => ""
    schedule => "* * * * *"
    statement => "select attendance_fact.id as attendance_id, date(attendance_fact.punchin_time) as clocked_in_date, org_dimension.name as organization, org_dimension.id as org_id,
                  concat(user_dimenson.first_name, ' ', user_dimenson.last_name) as user
                  from attendance_fact
                  inner join user_dimenson on attendance_fact.user_fk = user_dimenson.ID
                  inner join org_dimension on user_dimenson.org_fk = org_dimension.ID
                  where date(attendance_fact.punchin_time) = curdate()"
  }
}
input {
  jdbc {
    jdbc_driver_library => "C:/New_ELK/logstash-5.5.2/mysql-connector-java-5.1.42/mysql-connector-java-5.1.42-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/emob_report"
    jdbc_user => "root"
    jdbc_password => ""
    schedule => "* * * * *"
    statement => "select complaint_fact.id as complaint_id, workflow_dimension.name as workflow, work_location_dimension.name as worklocation, date(complaint_fact.complaint_create_time) as jobsadded_createddate,
                  org_dimension.name as organization, org_dimension.id as org_id
                  from complaint_fact
                  left join workflow_dimension on complaint_fact.workflow_fk = workflow_dimension.id
                  left join work_location_dimension on complaint_fact.worklocation_fk = work_location_dimension.id
                  inner join org_dimension on complaint_fact.org_fk = org_dimension.id
                  where date(complaint_fact.complaint_create_time) = curdate()"
  }
}
Logstash has a single event pipeline and doesn't care if you have multiple configuration files. All events from all inputs will be sent through all filters and then to all outputs. If you don't want that you need to wrap the filters and outputs in conditionals to route the events correctly.
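For example, a minimal sketch of that routing, assuming each jdbc input is given a distinct type (the "attendance" and "complaint" values and the index names below are illustrative, not taken from your configuration):

# file A
input {
  jdbc {
    # ... jdbc_* settings and statement as in your first file ...
    type => "attendance"
  }
}
output {
  if [type] == "attendance" {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "attendance"
    }
  }
}

# file B follows the same pattern with type => "complaint"
# and an output guarded by if [type] == "complaint".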
While executing the configuration files, Logstash is adding one additional row to the query result. The additional row contains the "document_id" details. I am using "document_id" to prevent duplication of data on every run. The duplication issue is solved, but one additional row with the "document_id" details is added to the result. Why?
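For reference, the deduplication is done in the elasticsearch output along these lines (a sketch; the host, index name, and id column are assumptions):

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "attendance"
    # reusing the primary key as the document id makes reruns overwrite
    # existing documents instead of creating duplicates
    document_id => "%{attendance_id}"
  }
}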
It's not clear what you mean by this. Please give an example. The output from your stdout output would be useful to see.
Just see the output: the second row from the last is an additional row appended to my result with "document_id" details. That row is not needed in my result. Why is it appended?
That event was probably produced by one of your other jdbc inputs. Judging by how the configuration above looks, it couldn't possibly have produced the event you've circled in red.
When I execute the above Logstash configuration file I do not get the additional row, but when I execute two more configuration files I get the result shown.
Right, and that's because you're not wrapping your outputs in conditionals so that events from the input in configuration file A are routed only to file A's output.
Did you look at the documentation link I sent? You can set a field or a tag in the jdbc input and look at that field or tag in the conditional block in the output.
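A sketch of that tag-based variant (the "attendance" tag is illustrative):

input {
  jdbc {
    # ... jdbc_* settings and statement as before ...
    tags => ["attendance"]
  }
}
output {
  if "attendance" in [tags] {
    elasticsearch {
      # this output only receives events tagged "attendance"
    }
  }
}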