I'm sending data from a Microsoft SQL Server database into Elasticsearch using Logstash 8.x. It works well, except that Logstash keeps creating indexes with no documents, which drives the shard count up: the cluster reaches 3000 total shards in little over an hour.
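For illustration, this is how I can see the empty indexes, sorted by document count (Kibana Dev Tools):

GET _cat/indices/customer-*?v&h=index,docs.count,pri,rep&s=docs.count:asc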
To retrieve the data from the SQL Server database I am using a view. A field from the database is used to build the index name.
My input configuration:
input {
  jdbc {
    jdbc_driver_library => "${jdbc_path}"
    jdbc_driver_class => "${jdbc_driver_class}"
    jdbc_connection_string => "${jdbc_connection_string}"
    jdbc_user => "${jdbc_user}"
    jdbc_password => "${jdbc_pwd}"
    statement_filepath => "/opt/logstash/logstash-jdbc-sqls/view_call/transaction_customers.sql"
    tracking_column => "unix_ts_in_secs"
    use_column_value => true
    tracking_column_type => "numeric"
    schedule => "*/2 * * * *"
    last_run_metadata_path => "/opt/logstash/metadata/customers.txt"
    jdbc_page_size => 5000
  }
}
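For reference, the statement file runs the usual incremental query against the view. Simplified, it looks like this (the view name is a placeholder; unix_ts_in_secs is the tracking column from the config above, and :sql_last_value is filled in by the JDBC input):

SELECT *
FROM dbo.transaction_customers_view
WHERE unix_ts_in_secs > :sql_last_value
ORDER BY unix_ts_in_secs ASC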
My output configuration is below. The database field "index_day" is a string in MM-YYYY format, so, for example, the April 2024 documents should end up in the index customer-04-2024.
output {
  elasticsearch {
    hosts => [ "${es_node1}", "${es_node2}", "${es_node3}" ]
    index => "customer-%{index_day}"
    ssl_certificate_authorities => "/etc/logstash/certs/ca.crt"
    ssl_enabled => true
    api_key => "${api_key}"
    document_id => "%{document_id}"
    data_stream => false
    doc_as_upsert => true
    template_name => "customer-template"
  }
}
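The "customer-template" it refers to is along these lines (simplified sketch; the shard and replica counts shown here are illustrative):

PUT _index_template/customer-template
{
  "index_patterns": ["customer-*"],
  "template": {
    "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 1
    }
  }
}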
As per the documentation, the cluster setting action.auto_create_index defaults to true. I restricted it to allow only the index name patterns I actually need, and that works, but I am not sure how to avoid these zero-document indexes while keeping the default settings. I also went through the Logstash logs in debug mode and found no clue there.
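The restriction I applied is roughly this (the pattern list here is illustrative):

PUT _cluster/settings
{
  "persistent": {
    "action.auto_create_index": "customer-*,transaction-*"
  }
}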
I am running 8 different pipelines on the same machine, pulling data from different tables on different schedules. The total volume is about 50k transactions per day.
Is there a way I can figure out what is causing these empty indexes?