Logstash not creating right number of documents when passing folder as an argument


(saisn) #1

Hi,

What I'm doing: I created 4 conf files under a root directory called XYZ. Each conf file imports 1000 rows from SQL Server, and the tables being imported are unique across all 4 files. When I run each conf file separately, each index ends up with 1000 documents, but when I run Logstash with the root folder as the argument, the counts in the indexes are not 1000.
I've also observed that each index is picking up documents from the other tables. I'm also using a template for each index, and the template names are all different.

I've given different index names and different document IDs in all the config files. But somehow the number of documents created in each index differs from what I expect.
However, when I run the config files separately, the number of docs created is correct.


(João Duarte) #2

I'm not sure what is happening without seeing the configuration files, but remember that in Logstash, if you run it with multiple configuration files, they will all be concatenated and evaluated as a single one.
So, make sure you don't have unnecessarily duplicated input/filter/output sections.
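
To illustrate what concatenation means here (file names and index names below are illustrative, not taken from the actual configs): if two files in the directory each define their own input and output, the effective pipeline Logstash runs is the union of both, and every event flows through every output.

  # file1.conf and file2.conf are concatenated, so Logstash effectively runs:
  input {
    jdbc { ... }   # from file1.conf
    jdbc { ... }   # from file2.conf
  }
  output {
    elasticsearch { index => "index1" }   # from file1.conf
    elasticsearch { index => "index2" }   # from file2.conf
  }
  # every event from either jdbc input is sent to BOTH elasticsearch outputs

This is why each index can end up with documents from tables it was never meant to receive.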


(saisn) #3

I can't share the whole config file, but each conf file fetches data from a single table and writes to a single index. I have 4 conf files like the one below, one per table, under a root directory.

input {
  jdbc {
    jdbc_driver_library => "/h"
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    jdbc_connection_string => "jdbc:sqlserver://"
    jdbc_user => "ReadOnly"
    jdbc_password => ""
    #lowercase_column_names => false
    #schedule => "*/10 * * * *"
    clean_run => true
    use_column_value => true
    tracking_column => ***
    record_last_run => true
    # used for incremental updates of records by keeping a reference point in ../jinfo
    last_run_metadata_path => "/etc/logstash/run_metadata.d/"
    statement => "SELECT *
      FROM [a] where [a] > :sql_last_value order by [amd] asc"
    #jdbc_paging_enabled => "true"
    #jdbc_page_size => "50000"
    #statement_filepath => "query.sql"
  }
}
filter {
}
output {
  elasticsearch {
    user => ""
    password => ""
    #ssl => true
    #ssl_certificate_verification => true
    truststore => ""
    truststore_password => ""
    hosts => [""]
    index => "h1"
    document_type => ""
    document_id => "%{cd}"
    #protocol => "http"
  }
}


(João Duarte) #4

When you say the count is different than expected, is it more or less?


(saisn) #5

It's more.


(João Duarte) #6

Does each of your individual configuration files have the same structure as the one below?

input {
  jdbc {
    # ...
  }
}
filter {
}
output {
  elasticsearch {
     # ...
  }
}

If so, when you execute bin/logstash -f *.conf, all your N files will be merged into one, which means you now have N jdbc blocks but also N elasticsearch blocks. So for each event one of the jdbc blocks produces, you're sending it to Elasticsearch N times instead of once.

You need 1 file with:

filter {

}
output {
  elasticsearch {
    user => ""
    password => ""
    truststore => ""
    truststore_password => ""
    hosts => [""]
    index => "h1"
    document_type => ""
    document_id => "%{cd}"
    #protocol => "http"
  }
}

and then N files, each with just the jdbc input:

input {
  jdbc {
  }
}
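
An alternative, if you'd rather keep each file self-contained with its own input and output, is to tag the events in each jdbc input and wrap each elasticsearch output in a conditional, so an output only receives events from its matching input. A sketch (the tag and index names here are illustrative):

  input {
    jdbc {
      # ... connection settings as before ...
      tags => ["table_a"]
    }
  }
  output {
    if "table_a" in [tags] {
      elasticsearch {
        index => "index_a"
        # ... credentials and hosts as before ...
      }
    }
  }

This still runs as one merged pipeline, but each event is only written once, to the index meant for its source table.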

(system) #7

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.