Inconsistent performance with jdbc query

Hello,

I have stumbled upon a weird performance inconsistency with the Logstash jdbc input plugin. I have defined an SQL query that is supposed to pull some columns from a database and inject them into Elasticsearch. If I define the query to return the columns directly, like:

SELECT col1, col2, col3, col4, ..., col20 FROM table

the indexing performance reaches 120 docs/s.

If I change the SELECT to build JSON inside the query instead, as follows:

SELECT '{"col1": "' || col1 || '", "col2": "' || col2 || '", ..., "col20": "' || col20 || '"}' AS attr_list FROM table

and then parse it with the Logstash json filter, indexing performance reaches 10000 docs/s.
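To illustrate what the second variant produces (column values here are hypothetical), each row arrives as a single attr_list string, and the json filter essentially performs this parse per event to expand it back into individual fields:

```python
import json

# hypothetical attr_list value, as built by string concatenation in the SELECT
attr_list = '{"col1": "v1", "col2": "v2", "col20": "v20"}'

# the Logstash json filter performs essentially this parse for every event
event = json.loads(attr_list)
print(event["col2"])  # prints v2
```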

I use the same pipeline settings every time:

pipeline.workers: 3
pipeline.batch.size: 10000

The Logstash output workers number in logstash.yml is set to 10, and there is almost no work done by Logstash apart from being glue between the database and Elasticsearch:

input {
    # download newly checked-in labels incrementally, using the create_date column for reference
    jdbc {
        jdbc_driver_library => "/etc/logstash/toolbox/connectors/ojdbc6.jar"
        jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
        jdbc_connection_string => "jdbc:oracle:thin:@XXX:1521/mms"
        jdbc_user => "XZY"
        jdbc_password => "ABC"
        jdbc_paging_enabled => "true"
        jdbc_page_size => "100000"
        statement_filepath => "/etc/logstash/toolbox/mms_metrics/query.sql"
        last_run_metadata_path => "/usr/share/logstash/dev/metrics_sql_last_run"
        schedule => "*/10 * * * *"
    }
}

filter {
    json {
        source => "attr_list"
        remove_field => ["attr_list"]
    }

    mutate {
        # move selected fields to metadata
        add_field => { "[@metadata][document.key]" => "%{job.key}" }
        rename => {
            "job_error" => "job.error"
            "job_detailed_error" => "job.detailederror"
        }

        # remove fields
        remove_field => ["job.key"]
    }
}

output {
    elasticsearch {
        hosts => ["127.0.0.1:9200"]
        index => "metric"
        document_type => "default"
        document_id => "%{[@metadata][document.key]}"
        action => "update"
        doc_as_upsert => true
        user => "ZZZ"
        password => "FFF"
        retry_on_conflict => 2
    }
}

The initial query returns ~80k records.
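To put the two rates in perspective for that initial ~80k-row load, simple arithmetic on the figures above gives:

```python
rows = 80_000
slow, fast = 120, 10_000  # docs/s, as measured above

print(rows / slow)  # about 667 s, roughly 11 minutes
print(rows / fast)  # 8 s
```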

Do you have any hints about why there is such a big difference in performance (10000 vs 120 docs/s)?
