Hello,
I have stumbled upon weird performance inconsistency with logstash jdbc plugin. I have defined sql query which is supposed to pull some columns from database and inject them into elasticsearch. If i define query to return columns, like:
SELECT col1, col2, col3, col4, ..., col20 FROM table
The performance with indexing is reaching 120 docs/s
If i'll change the SELECT to create json in the query as follows:
SELECT '{ "col1": ' || col1 || '",' || ', "col2": ' || col2 || ... || ', "col20": ' || col20 || '"}' as attr_list FROM table
and then parse it with logstash json filter it i have a performance of indexing 10000 docs/s.
Every time i use the same pipeline settings:
pipeline.workers: 3
pipeline.batch.size: 10000
Logstash output workers number in logstash.yml is set to 10 and there is almost no work done by logstash apart from being glue between database and elasticsearch:
input {
# download newly checked-in labels incrementally using create_date column for reference
jdbc {
jdbc_driver_library => "/etc/logstash/toolbox/connectors/ojdbc6.jar"
jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
jdbc_connection_string => "jdbc:oracle:thin:@XXX:1521/mms"
jdbc_user => "XZY"
jdbc_password => "ABC"
jdbc_paging_enabled => "true"
jdbc_page_size => "100000"
statement_filepath => "/etc/logstash/toolbox/mms_metrics/query.sql"
last_run_metadata_path => "/usr/share/logstash/dev/metrics_sql_last_run"
schedule => "*/10 * * * *"
}
}
filter {
json {
source => "attr_list"
remove_field => ["attr_list"]
}
mutate {
# move selected fields to metadata
add_field => { "[@metadata][document.key]" => "%{job.key}" }
rename => {
"job_error" => "job.error"
"job_detailed_error" => "job.detailederror"
}
# remove fields
remove_field => ["job.key"]
}
}
output {
elasticsearch {
hosts => ["127.0.0.1:9200"]
index => "metric"
document_type => "default"
document_id => "%{[@metadata][document.key]}"
action => "update"
doc_as_upsert => true
user => "ZZZ"
password => "FFF"
retry_on_conflict => 2
}
}
The number of records for initial query is ~80k.
Do You have some hints about why there is such big difference in performance (10000 vs 100 docs/s) ?