Logstash Jdbc input not updating last run


(Karthik S) #1

Hi,

We are pulling data from mysql and pushing to elasticsearch through logstash.

Following is the configuration for jdbc input:
jdbc_driver_library => '/home/software/mysql-connector-java-5.1.42/mysql-connector-java-5.1.42-bin.jar'
jdbc_driver_class => 'com.mysql.jdbc.Driver'
jdbc_connection_string => 'jdbc:mysql://url:ip/xx?tinyInt1isBit=false'
jdbc_user => 'XXXX'
jdbc_password => 'XXXX'
schedule => '*/1 * * * *'
statement => "select something, id from infi_buzz_es_data where id > :sql_last_value order by id"
last_run_metadata_path => '/etc/logstash/.logstash_jdbc_last_run_es'
record_last_run => true
clean_run => false
use_column_value => true
tracking_column => 'id'
tracking_column_type => 'numeric'
jdbc_paging_enabled => 'true'
jdbc_page_size => '100000'

In logs i can see that the sql queries are being executed with any errors, but '/etc/logstash/.logstash_jdbc_last_run_es doesn't get updated with the last run value. Its always executing the query with id > 0.

when i manually created the file and entered a value(say, 100) it executes the query with id >100, but still doesn't get updated.

i had disabled es output and enabled stdout for debugging, there was (correction) output at stdout.

logstash version: 5.6.5

whey doesn't the file get updated?

EDIT
logstash was started using initctl command of amazon linux server.

EDIT
More info:

We were connecting to mariadb with mysql jdbc connector. We have another setup of es, logstash, mariadb where it works fine. In this case, the setup is in aws. We changed the connector to mariadb driver and it started creating the file. But, only if we pass "sessionVariables=sql_mode=ANSI_QUOTES" in jdbc string we are able to execute the sql query otherwise it throws syntax error. Here, if we pass this variable logstash won't create/update the last run file. :frowning:


(Guy Boertje) #2

Assuming permissions are ok...

Looking at the code, the sql_last_value write occurs synchronously after the events are written to the queue.
When the filters/outputs are slow we get backpressure and backpressure == no file write until backpressure ceases.
Now you enable paging and have the jdbc_page_size at 100 000. Meaning one input db call will retrieve 100 000 records per page. If there is say 4 pages then we are talking about 400 000 records.
If you have 8 workers with 125 events per batch == 1000 events inflight.
This means that it will take 400 worker loops before the file is written.
If the out-going events per second is say, 100 events per second, then it will take 4000 seconds before the file is written to.

YMMV.


(Karthik S) #3

@guyboertje thanks for your views.

We had even tried with a page size of 10, still it doesn't create/ update the file. So, can we rule out page size issue?

We do have another similar setup, which has a page size of 1L and does the work.

As per my observation, i believe the file should be created before events arrive at logstash. If that observation is correct, then at-least the file should be created.


(Guy Boertje) #4

Turn off paging and then add your own size to the sql statement.

select something, id from infi_buzz_es_data where id > :sql_last_value order by id limit 100

Check whether the file is updated in this case. Also verify, via the logs, that the sql_last_value gets bigger in leaps of 100.


(Karthik S) #5

@guyboertje thanks!

it looks good now :slight_smile:


(Guy Boertje) #6

Great.

Now that you have confidence that the file is being updated, you can increase the size of the limit clause to get more records for each network call.

Ideally you should balance the schedule run frequency with the processing speed of the records fetched.

Your schedule (above) is every minute. Say your Logstash performance is 10k events per second, then you could theoretically fetch 600 000 (10 000 X 60) records in each run. However, taking network, DB, LS processing and ES indexing latency variations into account - you scale that number down to a half or one third of the theoretical amount.
Another example, if LS runs at say 2000 events per second then the theoretical max is 2000 X 60 or 120 000 and a scaled down LIMIT would be 60 000 or 40 000 records.

Measure your throughput using the LS metrics API.

Events per second = "[events][out]" divided by "[events][duration_in_millis]" multiplied by 1000.


(Karthik S) #7

@guyboertje thanks for the explanation, i shall try this out. As of now, i am stuck with another issue with logstash which is blocking progress in this.


(system) #8

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.