I am trying to get data from a Postgres database using JDBC. I managed to get data to stdout with a query that selects only a few fields, but it fails when the query has a large number of fields.
Here is the output from running the query with many fields:
WARNING: Could not find logstash.yml which is typically located in $LS_HOME/config or /etc/logstash. You can specify the path using --path.settings. Continuing using the defaults
Could not find log4j2 configuration at path /usr/share/logstash/config/log4j2.properties. Using default config which logs errors to the console
[INFO ] 2020-11-08 02:07:06.227 [main] runner - Starting Logstash {"logstash.version"=>"7.9.3", "jruby.version"=>"jruby 9.2.13.0 (2.5.7) 2020-08-03 9a89c94bcc OpenJDK 64-Bit Server VM 25.272-b10 on 1.8.0_272-8u272-b10-0ubuntu1~20.04-b10 +indy +jit [linux-x86_64]"}
[WARN ] 2020-11-08 02:07:06.952 [LogStash::Runner] multilocal - Ignoring the 'pipelines.yml' file because modules or command line options are specified
[INFO ] 2020-11-08 02:07:10.906 [Converge PipelineAction::Create] Reflections - Reflections took 87 ms to scan 1 urls, producing 22 keys and 45 values
[INFO ] 2020-11-08 02:07:12.070 [[main]-pipeline-manager] javapipeline - Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>8, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50, "pipeline.max_inflight"=>1000, "pipeline.sources"=>["/etc/logstash/conf.d/payments_transferFaliure.conf"], :thread=>"#<Thread:0x48330599 run>"}
[INFO ] 2020-11-08 02:07:14.041 [[main]-pipeline-manager] javapipeline - Pipeline Java execution initialization time {"seconds"=>1.96}
[INFO ] 2020-11-08 02:07:14.503 [[main]-pipeline-manager] javapipeline - Pipeline started {"pipeline.id"=>"main"}
[INFO ] 2020-11-08 02:07:14.612 [Agent thread] agent - Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[INFO ] 2020-11-08 02:07:15.334 [Api Webserver] agent - Successfully started Logstash API endpoint {:port=>9600}
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/rufus-scheduler-3.0.9/lib/rufus/scheduler/cronline.rb:77: warning: constant ::Fixnum is deprecated
It will not stop; the prompt just keeps blinking for hours.
Here is the statement that works (only two fields):
statement => "SELECT id as id, created as created FROM payments_transfer"
And this is the query with many fields that is not working (same settings as the query above):
statement => "SELECT id as id, type as type, dwolla_id as dwolla_id, sender_id as sender_id, receiver_id as receiver_id, status as status, amount as amount, fees as fees, created as created, metadata as metadata, funding_source_from_id as funding_source_from_id, funding_source_to_id as funding_source_to_id, is_parent as is_parent, destination_type as destination_type, source_type as source_type FROM payments_transfer"
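(One way to rule out the query itself is to run the full statement directly against Postgres, e.g. in psql, with a row limit added just for the test — if this returns promptly, the problem is on the Logstash side rather than in the query or the table:)

```sql
SELECT id, type, dwolla_id, sender_id, receiver_id, status, amount, fees,
       created, metadata, funding_source_from_id, funding_source_to_id,
       is_parent, destination_type, source_type
FROM payments_transfer
LIMIT 10;
```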
Please make sure you format your code and error output as follows, and share your config: /etc/logstash/conf.d/payments_transferFaliure.conf
WARNING: Could not find logstash.yml which is typically located in $LS_HOME/config or /etc/logstash. You can specify the path using --path.settings. Continuing using the defaults
Could not find log4j2 configuration at path /usr/share/logstash/config/log4j2.properties. Using default config which logs errors to the console
[INFO ] 2020-11-08 02:07:06.227 [main] runner - Starting Logstash {"logstash.version"=>"7.9.3", "jruby.version"=>"jruby 9.2.13.0 (2.5.7) 2020-08-03 9a89c94bcc OpenJDK 64-Bit Server VM 25.272-b10 on 1.8.0_272-8u272-b10-0ubuntu1~20.04-b10 +indy +jit [linux-x86_64]"}
[WARN ] 2020-11-08 02:07:06.952 [LogStash::Runner] multilocal - Ignoring the 'pipelines.yml' file because modules or command line options are specified
[INFO ] 2020-11-08 02:07:10.906 [Converge PipelineAction::Create] Reflections - Reflections took 87 ms to scan 1 urls, producing 22 keys and 45 values
[INFO ] 2020-11-08 02:07:12.070 [[main]-pipeline-manager] javapipeline - Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>8, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50, "pipeline.max_inflight"=>1000, "pipeline.sources"=>["/etc/logstash/conf.d/payments_transferFaliure.conf"], :thread=>"#<Thread:0x48330599 run>"}
[INFO ] 2020-11-08 02:07:14.041 [[main]-pipeline-manager] javapipeline - Pipeline Java execution initialization time {"seconds"=>1.96}
[INFO ] 2020-11-08 02:07:14.503 [[main]-pipeline-manager] javapipeline - Pipeline started {"pipeline.id"=>"main"}
[INFO ] 2020-11-08 02:07:14.612 [Agent thread] agent - Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[INFO ] 2020-11-08 02:07:15.334 [Api Webserver] agent - Successfully started Logstash API endpoint {:port=>9600}
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/rufus-scheduler-3.0.9/lib/rufus/scheduler/cronline.rb:77: warning: constant ::Fixnum is deprecated
SELECT id AS id,
type AS type,
dwolla_id AS dwolla_id,
sender_id AS sender_id,
receiver_id AS receiver_id,
status AS status,
amount AS amount,
fees AS fees,
created AS created,
metadata AS metadata,
funding_source_from_id AS funding_source_from_id,
funding_source_to_id AS funding_source_to_id,
is_parent AS is_parent,
destination_type AS destination_type,
source_type AS source_type
FROM payments_transfer
input {
  jdbc {
    jdbc_connection_string => "jdbc:postgresql://ldatabase_rds_path?useSSL=true"
    jdbc_user => "username"
    jdbc_password => "password"
    jdbc_driver_library => "/home/z/Documents/postgresql-42.2.18.jar"
    jdbc_driver_class => "org.postgresql.Driver"
    tracking_column => "id"
    tracking_column_type => "numeric"
    use_column_value => true # without this, :sql_last_value holds the last run time instead of the last id
    clean_run => true
    schedule => "0 */1 * * *"
    jdbc_page_size => 50000
    last_run_metadata_path => "/path/to/last_run_value.txt"
    statement => "SELECT id as id, type as type, z_id as z_id, sender_id as sender_id, receiver_id as receiver_id, status as status, amount as amount, fees as fees, created as created, metadata as metadata, funding_source_from_id as funding_source_from_id, funding_source_to_id as funding_source_to_id, is_parent as is_parent, destination_type as destination_type, source_type as source_type FROM payments_transfer WHERE id > :sql_last_value"
  }
}
The jdbc_page_size limits the result set each time the schedule runs; you can adjust it to match the capacity of your machine. last_run_metadata_path is the file that stores the last value of the id from each query. WHERE id > :sql_last_value reads the value from that file; on the first run it will be 0, and the result is limited by pagination.
I suggest limiting the result of the query, because if it runs on a schedule and the previous query has not yet finished, Logstash will re-query the same values and slow your machine down.
I also suggest running Logstash more frequently and fetching a smaller result set each time, because pulling a large amount of data in one go will strain your machine.
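As a sketch of that suggestion (the schedule and page size below are illustrative, not prescriptive), note that per the jdbc input plugin docs, jdbc_page_size only takes effect when jdbc_paging_enabled is set:

```
jdbc {
  # ... connection and statement settings as in the config above ...
  schedule => "*/15 * * * *"     # smaller, more frequent runs instead of one large hourly pull
  jdbc_paging_enabled => true    # required for jdbc_page_size to have any effect
  jdbc_page_size => 10000        # smaller pages, lighter load per run
}
```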
I tried your solution, adding jdbc_page_size and last_run_metadata_path, and I also changed the schedule to 0 */12 * * * (every twelve hours at minute 0). Still not working; it just runs for hours without doing anything.
Here is my console, which stays like this for hours:
WARNING: Could not find logstash.yml which is typically located in $LS_HOME/config or /etc/logstash. You can specify the path using --path.settings. Continuing using the defaults
Could not find log4j2 configuration at path /usr/share/logstash/config/log4j2.properties. Using default config which logs errors to the console
[INFO ] 2020-11-11 16:28:22.850 [main] runner - Starting Logstash {"logstash.version"=>"7.9.3", "jruby.version"=>"jruby 9.2.13.0 (2.5.7) 2020-08-03 9a89c94bcc OpenJDK 64-Bit Server VM 25.272-b10 on 1.8.0_272-8u272-b10-0ubuntu1~20.04-b10 +indy +jit [linux-x86_64]"}
[WARN ] 2020-11-11 16:28:24.130 [LogStash::Runner] multilocal - Ignoring the 'pipelines.yml' file because modules or command line options are specified
[INFO ] 2020-11-11 16:28:28.736 [Converge PipelineAction::Create<main>] Reflections - Reflections took 99 ms to scan 1 urls, producing 22 keys and 45 values
[INFO ] 2020-11-11 16:28:31.128 [[main]-pipeline-manager] elasticsearch - Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[http://localhost:9200/]}}
[WARN ] 2020-11-11 16:28:31.591 [[main]-pipeline-manager] elasticsearch - Restored connection to ES instance {:url=>"http://localhost:9200/"}
[INFO ] 2020-11-11 16:28:31.980 [[main]-pipeline-manager] elasticsearch - ES Output version determined {:es_version=>7}
[WARN ] 2020-11-11 16:28:31.985 [[main]-pipeline-manager] elasticsearch - Detected a 6.x and above cluster: the `type` event field won't be used to determine the document _type {:es_version=>7}
[INFO ] 2020-11-11 16:28:32.195 [[main]-pipeline-manager] elasticsearch - New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["//localhost:9200"]}
[INFO ] 2020-11-11 16:28:32.576 [[main]-pipeline-manager] javapipeline - Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>8, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50, "pipeline.max_inflight"=>1000, "pipeline.sources"=>["/etc/logstash/conf.d/payments_transferFaliure_with_large_capacity.conf"], :thread=>"#<Thread:0x4fcc21a9 run>"}
[INFO ] 2020-11-11 16:28:34.619 [[main]-pipeline-manager] javapipeline - Pipeline Java execution initialization time {"seconds"=>2.04}
[INFO ] 2020-11-11 16:28:34.909 [[main]-pipeline-manager] javapipeline - Pipeline started {"pipeline.id"=>"main"}
[INFO ] 2020-11-11 16:28:35.119 [Agent thread] agent - Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[INFO ] 2020-11-11 16:28:35.882 [Api Webserver] agent - Successfully started Logstash API endpoint {:port=>9600}
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/rufus-scheduler-3.0.9/lib/rufus/scheduler/cronline.rb:77: warning: constant ::Fixnum is deprecated
I run it using the command line: sudo /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/payments_transferFaliure_with_large_capacity.conf --config.reload.automatic
If I run more than one shell, I specify a separate directory for the data.
Here is my pipelines.yml config:
- pipeline.id: main
path.config: "/etc/logstash/conf.d/*.conf"
I tried with a command line that specifies both the data and settings paths: --path.data /usr/share/logstash/data/[directory] --path.settings /etc/logstash, and it is still not working.
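(Two quick checks that can help narrow this down; the file path below is the placeholder from the config, so substitute your real last_run_metadata_path.)

```shell
# Inspect the tracked value the jdbc input has persisted (a small YAML file, e.g. "--- 0")
cat /path/to/last_run_value.txt

# Re-run Logstash with verbose logging to see what the jdbc input and scheduler are doing
sudo /usr/share/logstash/bin/logstash \
  -f /etc/logstash/conf.d/payments_transferFaliure_with_large_capacity.conf \
  --log.level=debug
```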
input {
  jdbc {
    jdbc_connection_string => "jdbc:postgresql://ldatabase_rds_path?useSSL=true"
    jdbc_user => "username"
    jdbc_password => "password"
    jdbc_driver_library => "/home/z/Documents/postgresql-42.2.18.jar"
    jdbc_driver_class => "org.postgresql.Driver"
    tracking_column => "id"
    tracking_column_type => "numeric"
    use_column_value => true # without this, :sql_last_value holds the last run time instead of the last id
    clean_run => true
    schedule => "* * * * *"
    jdbc_page_size => 50000
    last_run_metadata_path => "/path/to/last_run_value.txt"
    statement => "SELECT id as id, type as type, z_id as z_id, sender_id as sender_id, receiver_id as receiver_id, status as status, amount as amount, fees as fees, created as created, metadata as metadata, funding_source_from_id as funding_source_from_id, funding_source_to_id as funding_source_to_id, is_parent as is_parent, destination_type as destination_type, source_type as source_type FROM payments_transfer WHERE id > :sql_last_value"
  }
}
filter {
}
output {
  stdout { codec => rubydebug }
}
Then paste the error message from stdout here, so we can analyze the error in detail.
Thanks. I added the debug output line, but the cursor also just keeps blinking for hours without any output:
> Using bundled JDK: /usr/share/logstash/jdk
> OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.
> WARNING: An illegal reflective access operation has occurred
> WARNING: Illegal reflective access by org.jruby.ext.openssl.SecurityHelper (file:/tmp/jruby-8752/jruby3327211400598248584jopenssl.jar) to field java.security.MessageDigest.provider
> WARNING: Please consider reporting this to the maintainers of org.jruby.ext.openssl.SecurityHelper
> WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
> WARNING: All illegal access operations will be denied in a future release
> WARNING: Could not find logstash.yml which is typically located in $LS_HOME/config or /etc/logstash. You can specify the path using --path.settings. Continuing using the defaults
> Could not find log4j2 configuration at path /usr/share/logstash/config/log4j2.properties. Using default config which logs errors to the console
> [INFO ] 2020-11-21 23:28:24.783 [main] runner - Starting Logstash {"logstash.version"=>"7.10.0", "jruby.version"=>"jruby 9.2.13.0 (2.5.7) 2020-08-03 9a89c94bcc OpenJDK 64-Bit Server VM 11.0.8+10 on 11.0.8+10 +indy +jit [linux-x86_64]"}
> [WARN ] 2020-11-21 23:28:25.113 [LogStash::Runner] multilocal - Ignoring the 'pipelines.yml' file because modules or command line options are specified
> [INFO ] 2020-11-21 23:28:26.966 [Converge PipelineAction::Create<main>] Reflections - Reflections took 43 ms to scan 1 urls, producing 23 keys and 47 values
> [INFO ] 2020-11-21 23:28:27.535 [[main]-pipeline-manager] javapipeline - Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>8, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50, "pipeline.max_inflight"=>1000, "pipeline.sources"=>["/etc/logstash/conf.d/payment_transfer.conf"], :thread=>"#<Thread:0x23ef9f58 run>"}
> [INFO ] 2020-11-21 23:28:28.239 [[main]-pipeline-manager] javapipeline - Pipeline Java execution initialization time {"seconds"=>0.7}
> [INFO ] 2020-11-21 23:28:28.388 [[main]-pipeline-manager] javapipeline - Pipeline started {"pipeline.id"=>"main"}
> [INFO ] 2020-11-21 23:28:28.449 [Agent thread] agent - Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
> [INFO ] 2020-11-21 23:28:28.693 [Api Webserver] agent - Successfully started Logstash API endpoint {:port=>9601}
> /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/rufus-scheduler-3.0.9/lib/rufus/scheduler/cronline.rb:77: warning: constant ::Fixnum is deprecated
I also found that it works within seconds after commenting out the scheduler line, which means there is something wrong with the scheduler, but I can't figure out what!
If you schedule an input to run at hours 0 and 12, I believe it will wait for one of those times before it runs for the first time. Similarly, if you configure it to run every 12 hours, it will wait 12 hours before running.
If you want it to run unconditionally at startup, use a second input configured without a schedule option. But note that if you start Logstash at 11:50, the query might then run twice within ten minutes. That may or may not be a problem for your use case.
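A sketch of that two-input arrangement (connection and statement settings elided; the same jdbc options as in the configs above would go in both blocks):

```
input {
  jdbc {
    # ... connection and statement settings ...
    # no schedule option: this input runs once, immediately at startup
  }
  jdbc {
    # ... same connection and statement settings ...
    schedule => "0 */12 * * *"   # then every 12 hours at minute 0
  }
}
```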