To answer your question: I am using an alias to query this data. I want to avoid duplicates, which is why I have to set document_id.
A little background on the data: it is the audit log for one of our products, which is why it will be huge. I am thinking of partitioning the retrieval process by splitting it into multiple queries, as shown below.
The production system will have a dedicated machine for Logstash with 32 GB RAM, an 8-core CPU, and at least 300 GB of HDD, but is this kind of load still feasible with 2 CPU cores, 8 GB RAM, and 70 GB HDD?
How many smaller indices would be considered an acceptable number?
Also, after some research I made further changes; will these improve my chances of getting data into Elasticsearch faster without overwhelming it and crashing either system?
input {
  jdbc {
    jdbc_connection_string => "jdbc:oracle:thin:@<host:port>/sname"
    jdbc_user => "xx"
    jdbc_password => "xxxx"
    jdbc_validate_connection => true
    jdbc_driver_library => "/home/app_config/logstash-6.2.4/jdbc_drivers/OJDBC-Full/ojdbc7.jar"
    jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
    schedule => "53 11 * * *"
    statement_filepath => "/home/app_config/logstash-6.2.4/bin/queries/user_log_pod1.sql"
    jdbc_paging_enabled => true
    jdbc_page_size => 1000
    jdbc_fetch_size => 1000
    clean_run => false
  }
  jdbc {
    jdbc_connection_string => "jdbc:oracle:thin:@<host:port>/sname"
    jdbc_user => "xx"
    jdbc_password => "xxxx"
    jdbc_validate_connection => true
    jdbc_driver_library => "/home/app_config/logstash-6.2.4/jdbc_drivers/OJDBC-Full/ojdbc7.jar"
    jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
    schedule => "53 11 * * *"
    statement_filepath => "/home/app_config/logstash-6.2.4/bin/queries/user_log_pod2.sql"
    jdbc_paging_enabled => true
    jdbc_page_size => 1000
    jdbc_fetch_size => 1000
    clean_run => false
  }
}
filter {
  ruby {
    code => "event.set('updatetime_str', event.get('updatetime').time.localtime.strftime('%Y_%m_%d'))"
  }
}
output {
  elasticsearch {
    hosts => "<host:port>"
    index => "user_log_index_%{updatetime_str}"
    document_id => "%{user_log_key}"
    document_type => "org_user_log"
  }
}
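For reference, the two statement files could split the rows on the key so each jdbc input pulls a disjoint half. This is only a sketch under assumptions: the table name user_log and the column list are hypothetical (only user_log_key and updatetime appear in the config above), so substitute your real schema:

```sql
-- user_log_pod1.sql (sketch; table and column names are assumed)
SELECT user_log_key, updatetime /* , other audit columns */
  FROM user_log
 WHERE MOD(user_log_key, 2) = 0
 ORDER BY user_log_key

-- user_log_pod2.sql would be the same query with MOD(user_log_key, 2) = 1
```

Since jdbc_paging_enabled wraps the statement in paged subqueries, a deterministic ORDER BY helps keep pages stable across fetches; because document_id is the key, any row that does slip into both pods would simply overwrite itself rather than duplicate.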
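The ruby filter's date logic can be sketched in plain Ruby outside Logstash. One assumption here: inside Logstash, event.get('updatetime') is a LogStash::Timestamp whose .time method returns a Ruby Time, so a plain Time stands in for it below:

```ruby
# Sketch of what the ruby filter computes, assuming 'updatetime'
# behaves like a Ruby Time (a stand-in for LogStash::Timestamp#time).
updatetime = Time.utc(2018, 6, 15, 9, 30, 0)  # hypothetical sample value

# Same expression as the filter: convert to local time, format as Y_m_d.
updatetime_str = updatetime.localtime.strftime('%Y_%m_%d')

puts updatetime_str
```

Because the index name is user_log_index_%{updatetime_str}, this produces one index per day, so the total index count grows with your retention window; that daily count is the number to watch for the "how many smaller indices" question.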