Logstash Jdbc input not updating last run

karthik_s1 · January 11, 2018, 2:26pm

Hi,

We are pulling data from mysql and pushing to elasticsearch through logstash.

Following is the configuration for jdbc input:
jdbc_driver_library => '/home/software/mysql-connector-java-5.1.42/mysql-connector-java-5.1.42-bin.jar'
jdbc_driver_class => 'com.mysql.jdbc.Driver'
jdbc_connection_string => 'jdbc:mysql://url:ip/xx?tinyInt1isBit=false'
jdbc_user => 'XXXX'
jdbc_password => 'XXXX'
schedule => '*/1 * * * *'
statement => "select something, id from infi_buzz_es_data where id > :sql_last_value order by id"
last_run_metadata_path => '/etc/logstash/.logstash_jdbc_last_run_es'
record_last_run => true
clean_run => false
use_column_value => true
tracking_column => 'id'
tracking_column_type => 'numeric'
jdbc_paging_enabled => 'true'
jdbc_page_size => '100000'

In logs i can see that the sql queries are being executed with any errors, but '/etc/logstash/.logstash_jdbc_last_run_es doesn't get updated with the last run value. Its always executing the query with id > 0.

when i manually created the file and entered a value(say, 100) it executes the query with id >100, but still doesn't get updated.

i had disabled es output and enabled stdout for debugging, there was (correction) output at stdout.

logstash version: 5.6.5

whey doesn't the file get updated?

EDIT
logstash was started using initctl command of amazon linux server.

EDIT
More info:

We were connecting to mariadb with mysql jdbc connector. We have another setup of es, logstash, mariadb where it works fine. In this case, the setup is in aws. We changed the connector to mariadb driver and it started creating the file. But, only if we pass "sessionVariables=sql_mode=ANSI_QUOTES" in jdbc string we are able to execute the sql query otherwise it throws syntax error. Here, if we pass this variable logstash won't create/update the last run file.

guyboertje · January 12, 2018, 4:29pm

Assuming permissions are ok...

Looking at the code, the sql_last_value write occurs synchronously after the events are written to the queue.
When the filters/outputs are slow we get backpressure and backpressure == no file write until backpressure ceases.
Now you enable paging and have the jdbc_page_size at 100 000. Meaning one input db call will retrieve 100 000 records per page. If there is say 4 pages then we are talking about 400 000 records.
If you have 8 workers with 125 events per batch == 1000 events inflight.
This means that it will take 400 worker loops before the file is written.
If the out-going events per second is say, 100 events per second, then it will take 4000 seconds before the file is written to.

YMMV.

karthik_s1 · January 15, 2018, 3:33am

@guyboertje thanks for your views.

We had even tried with a page size of 10, still it doesn't create/ update the file. So, can we rule out page size issue?

We do have another similar setup, which has a page size of 1L and does the work.

As per my observation, i believe the file should be created before events arrive at logstash. If that observation is correct, then at-least the file should be created.

guyboertje · January 15, 2018, 10:00am

Turn off paging and then add your own size to the sql statement.

select something, id from infi_buzz_es_data where id > :sql_last_value order by id limit 100

Check whether the file is updated in this case. Also verify, via the logs, that the sql_last_value gets bigger in leaps of 100.

karthik_s1 · January 15, 2018, 2:06pm

@guyboertje thanks!

it looks good now

guyboertje · January 15, 2018, 3:31pm

Great.

Now that you have confidence that the file is being updated, you can increase the size of the limit clause to get more records for each network call.

Ideally you should balance the schedule run frequency with the processing speed of the records fetched.

Your schedule (above) is every minute. Say your Logstash performance is 10k events per second, then you could theoretically fetch 600 000 (10 000 X 60) records in each run. However, taking network, DB, LS processing and ES indexing latency variations into account - you scale that number down to a half or one third of the theoretical amount.
Another example, if LS runs at say 2000 events per second then the theoretical max is 2000 X 60 or 120 000 and a scaled down LIMIT would be 60 000 or 40 000 records.

Measure your throughput using the LS metrics API.

Events per second = "[events][out]" divided by "[events][duration_in_millis]" multiplied by 1000.

karthik_s1 · January 16, 2018, 6:22am

@guyboertje thanks for the explanation, i shall try this out. As of now, i am stuck with another issue with logstash which is blocking progress in this.

system · February 13, 2018, 6:22am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Logstash input-jdbc work logic and last run sql_last_value Logstash	1	909	November 13, 2018
Logstash JDBC_last _ not updating Logstash	3	307	November 3, 2021
JDBC does not create .logstash_jdbc_last_run Logstash	1	2449	July 6, 2017
How :sql_last_value works with jdbc_page_size? Logstash	1	804	May 6, 2019
How should I use sql_last_value in logstash? Logstash	11	27204	July 6, 2017

Logstash Jdbc input not updating last run

Related topics