To answer your question: I am using an alias to query this data. I want to avoid duplicates, which is why I have to set document_id.
A little background on the data: it is the audit log for one of our products, which is why it will be huge. I am thinking of partitioning the retrieval process by splitting it into multiple queries, as shown below.
The production system will have a dedicated machine for Logstash with 32 GB RAM, an 8-core CPU, and at least 300 GB of HDD, but is this kind of load still feasible with 2 CPU cores, 8 GB RAM, and 70 GB HDD?
How many smaller indices would be considered an acceptable number?
Also, after some research I made further changes; will these improve my chances of getting data into Elasticsearch faster without overwhelming it and crashing either system?
input {
  jdbc {
    jdbc_connection_string => "jdbc:oracle:thin:@<host:port>/sname"
    jdbc_user => "xx"
    jdbc_password => "xxxx"
    jdbc_validate_connection => true
    jdbc_driver_library => "/home/app_config/logstash-6.2.4/jdbc_drivers/OJDBC-Full/ojdbc7.jar"
    jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
    schedule => "53 11 * * *"
    statement_filepath => "/home/app_config/logstash-6.2.4/bin/queries/user_log_pod1.sql"
    jdbc_paging_enabled => true
    jdbc_page_size => 1000
    jdbc_fetch_size => 1000
    clean_run => false
  }
  jdbc {
    jdbc_connection_string => "jdbc:oracle:thin:@<host:port>/sname"
    jdbc_user => "xx"
    jdbc_password => "xxxx"
    jdbc_validate_connection => true
    jdbc_driver_library => "/home/app_config/logstash-6.2.4/jdbc_drivers/OJDBC-Full/ojdbc7.jar"
    jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
    schedule => "53 11 * * *"
    statement_filepath => "/home/app_config/logstash-6.2.4/bin/queries/user_log_pod2.sql"
    jdbc_paging_enabled => true
    jdbc_page_size => 1000
    jdbc_fetch_size => 1000
    clean_run => false
  }
}
filter {
  ruby {
    code => "event.set('updatetime_str', event.get('updatetime').time.localtime.strftime('%Y_%m_%d'))"
  }
}
output {
  elasticsearch {
    hosts => "<host:port>"
    index => "user_log_index_%{updatetime_str}"
    document_id => "%{user_log_key}"
    document_type => "org_user_log"
  }
}
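For reference, the two statement files could split the rows on the key so each jdbc input pulls a disjoint half. This is only a sketch under assumptions: the table name user_log and the column list are hypothetical (only user_log_key and updatetime appear in the config above), so substitute your real schema:

```sql
-- user_log_pod1.sql (sketch; table and column names are assumed)
SELECT user_log_key, updatetime /* , other audit columns */
  FROM user_log
 WHERE MOD(user_log_key, 2) = 0
 ORDER BY user_log_key

-- user_log_pod2.sql would be the same query with MOD(user_log_key, 2) = 1
```

Since jdbc_paging_enabled wraps the statement in paged subqueries, a deterministic ORDER BY helps keep pages stable across fetches; because document_id is the key, any row that does slip into both pods would simply overwrite itself rather than duplicate.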
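The ruby filter's date logic can be sketched in plain Ruby outside Logstash. One assumption here: inside Logstash, event.get('updatetime') is a LogStash::Timestamp whose .time method returns a Ruby Time, so a plain Time stands in for it below:

```ruby
# Sketch of what the ruby filter computes, assuming 'updatetime'
# behaves like a Ruby Time (a stand-in for LogStash::Timestamp#time).
updatetime = Time.utc(2018, 6, 15, 9, 30, 0)  # hypothetical sample value

# Same expression as the filter: convert to local time, format as Y_m_d.
updatetime_str = updatetime.localtime.strftime('%Y_%m_%d')

puts updatetime_str
```

Because the index name is user_log_index_%{updatetime_str}, this produces one index per day, so the total index count grows with your retention window; that daily count is the number to watch for the "how many smaller indices" question.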