I'm sending data from a Microsoft SQL Server database into Elasticsearch using Logstash 8.x. It works well, except that Logstash keeps creating indexes with no documents, which drives the shard count up: the cluster reaches 3000 total shards in little over an hour.
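For illustration, this is how I can see the empty indexes, sorted by document count (Kibana Dev Tools):

GET _cat/indices/customer-*?v&h=index,docs.count,pri,rep&s=docs.count:asc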
To retrieve the data from the SQL Server database I am using a view. A field from the database is used to build the index name.
My input configuration:
input {
  jdbc {
    jdbc_driver_library => "${jdbc_path}"
    jdbc_driver_class => "${jdbc_driver_class}"
    jdbc_connection_string => "${jdbc_connection_string}"
    jdbc_user => "${jdbc_user}"
    jdbc_password => "${jdbc_pwd}"
    statement_filepath => "/opt/logstash/logstash-jdbc-sqls/view_call/transaction_customers.sql"
    tracking_column => "unix_ts_in_secs"
    use_column_value => true
    tracking_column_type => "numeric"
    schedule => "*/2 * * * *"
    last_run_metadata_path => "/opt/logstash/metadata/customers.txt"
    jdbc_page_size => 5000
  }
}
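For reference, the statement file runs the usual incremental query against the view. Simplified, it looks like this (the view name is a placeholder; unix_ts_in_secs is the tracking column from the config above, and :sql_last_value is filled in by the JDBC input):

SELECT *
FROM dbo.transaction_customers_view
WHERE unix_ts_in_secs > :sql_last_value
ORDER BY unix_ts_in_secs ASC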
My output configuration is below. The database field "index_day" is a string in MM-YYYY format, so, for example, the April 2024 documents should end up in the index customer-04-2024.
output {
  elasticsearch {
    hosts => [ "${es_node1}", "${es_node2}", "${es_node3}" ]
    index => "customer-%{index_day}"
    ssl_certificate_authorities => "/etc/logstash/certs/ca.crt"
    ssl_enabled => true
    api_key => "${api_key}"
    document_id => "%{document_id}"
    data_stream => false
    doc_as_upsert => true
    template_name => "customer-template"
  }
}
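The "customer-template" it refers to is along these lines (simplified sketch; the shard and replica counts shown here are illustrative):

PUT _index_template/customer-template
{
  "index_patterns": ["customer-*"],
  "template": {
    "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 1
    }
  }
}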
As per the documentation, the cluster setting action.auto_create_index defaults to true. I restricted it to allow only the index name patterns I actually need, and that works, but I am not sure how to avoid these zero-document indexes while keeping the default settings. I also went through the Logstash logs in debug mode and found no clue there.
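The restriction I applied is roughly this (the pattern list here is illustrative):

PUT _cluster/settings
{
  "persistent": {
    "action.auto_create_index": "customer-*,transaction-*"
  }
}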
I am running 8 different pipelines on the same machine, pulling data from different tables on different schedules. The total volume is about 50k transactions per day.
Is there a way I can figure out what is causing these empty indexes?