Logstash JDBC unable to run multiple statements and ingest data from different tables

Hi,

I'm running Logstash version 8.10.2 in a Docker container to ingest data from a MySQL DB into Elastic.

I don't know why the first statement executes but the second one does not.

Below is my logstash config:

input {

  jdbc {
          clean_run => true
          jdbc_driver_library => "/usr/share/logstash/config/mysql-connector-j-8.1.0.jar"
          jdbc_driver_class => "com.mysql.jdbc.Driver"
          jdbc_connection_string => "jdbc:mysql://10.10.10.1:3306/myDB"
          jdbc_user => "username"
          jdbc_password => "password"
          schedule => "* * * * *"
          statement => "select * from contacts where icn > :sql_last_value"
          use_column_value => true
          tracking_column => "icn"
          type => "contacts"
          last_run_metadata_path => "/usr/share/logstash/data/import-contacts.yml"
      }
      
  jdbc {
          clean_run => true
          jdbc_driver_library => "/usr/share/logstash/config/mysql-connector-j-8.1.0.jar"
          jdbc_driver_class => "com.mysql.jdbc.Driver"
          jdbc_connection_string => "jdbc:mysql://10.10.10.1:3306/myDB"
          jdbc_user => "username"
          jdbc_password => "password"
          schedule => "* * * * *"
          statement => "select * from jobs where jobid > :sql_last_value"
          use_column_value => true
          tracking_column => "jobid"
          type => "jobs"
          last_run_metadata_path => "/usr/share/logstash/data/import-jobs.yml"
      }
}

filter {
    
}

output {

  stdout { 
    id => "all_output"
    codec => rubydebug 
  }
  

  elasticsearch {
    hosts => ["https://10.1.10.26:9200"]
    index => "z4_db"
    user => "elastic"
    password => "changeme"
    ssl_verification_mode => "none"
  }

}

Both import-contacts.yml and import-jobs.yml get created in the container; however, only the import-contacts.yml file has the last value in it.

Also, when I restart the container it re-runs the first statement from the beginning, causing duplicates in Elasticsearch. For some reason, the import-contacts.yml file is not persisting between container restarts.

Are you bind mounting the path /usr/share/logstash/data/ to a path on your host?

I believe I am mounting the path in my docker-compose.yml file:

version: '3.8'
services:
  logstash-mysql:
    image: logstash:8.10.2
    user: 1000:1000
    volumes:
      - "./logstash.yml:/usr/share/logstash/config/logstash.yml"
      - "./logstash.conf:/usr/share/logstash/pipeline/logstash.conf"
      - "./mysql-connector-j-8.1.0.jar:/usr/share/logstash/config/mysql-connector-j-8.1.0.jar"
      # - "./import-contacts:/usr/share/logstash/data/import-contacts.yml"
      # - "./import-jobs:/usr/share/logstash/data/import-jobs.yml"
      # - "./import_skills:/usr/share/logstash/config/import_skills"
      - mysqldata:/usr/share/logstash/data
volumes:
  mysqldata:
    driver: local

I also tried mounting a local file, import-contacts, to /usr/share/logstash/data/import-contacts.yml, but had no luck.

When I bind mount the import-contacts file from my local host to /usr/share/logstash/data/import-contacts.yml inside the container, I get a permission denied error:

exception=>#<Errno::EACCES: Permission denied - /usr/share/logstash/data/import-contacts.yml>
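Since the container runs as uid 1000 (the user: setting above, which also matches the image's logstash user), presumably the bind-mounted file on my host was not writable by that user. Bind mounting a whole directory owned by uid 1000 and pointing last_run_metadata_path inside it would probably avoid this; a sketch (the jdbc-last-run directory name is made up):

    volumes:
      - "./jdbc-last-run:/usr/share/logstash/jdbc-last-run"

I didn't pursue that, though.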

I'm not sure if I'm running into a bug or not. I have now changed my docker-compose.yml to this:

version: '3.8'
services:
  logstash-mysql:
    image: logstash:8.10.2
    volumes:
      - "./logstash.yml:/usr/share/logstash/config/logstash.yml"
      - "./conf.d:/usr/share/logstash/pipeline/"
      - "./mysql-connector-j-8.1.0.jar:/usr/share/logstash/config/mysql-connector-j-8.1.0.jar"
      - mysqldata:/usr/share/logstash/data
volumes:
  mysqldata:
    driver: local

I have split my original logstash.conf file into two separate conf files (logstash1.conf and logstash2.conf) and stored them locally inside the conf.d directory.

I put one JDBC input from above into each file, so logstash1.conf queries the contacts table and logstash2.conf queries the jobs table.

I then just mount the conf.d directory to /usr/share/logstash/pipeline/.

I'm loading both conf files from my logstash.yml file like this:

config.reload.automatic: true
config.reload.interval: 5s
# log.level: debug

xpack.management.pipeline.id:
  - logstash*

I was initially seeing an error message indicating I was using a duplicate id in the stdout output plugin in my logstash2.conf file, so I changed that id to "all_output_logstash2_conf", which made the error go away.

Even though I have two separate conf files running, for some reason Logstash treats them as a single config with two stdout output plugins using the same id value.

Currently, both conf files are running, but only the JDBC input from logstash1.conf is executing; the JDBC input from logstash2.conf is not.

To confirm both conf files are running, I changed the index value in logstash2.conf to "z4_db_logstash2_conf", and I see indices from both conf files being written to ES. However, the data from logstash1.conf is also being stored in the index from logstash2.conf.

There are a couple of issues here.

First, separating your config into multiple files will not make Logstash run them as different pipelines unless you use pipelines.yml, which you do not seem to be using since you aren't bind mounting it; check this question for an example.

You would need to have a file like this:

- pipeline.id: pipeline-1
  path.config: "/usr/share/logstash/pipeline/logstash1.conf"
- pipeline.id: pipeline-2
  path.config: "/usr/share/logstash/pipeline/logstash2.conf"

And bind mount this file as /usr/share/logstash/config/pipelines.yml.
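In your docker-compose.yml that is just one more line under volumes (assuming pipelines.yml sits next to the compose file):

    volumes:
      - "./pipelines.yml:/usr/share/logstash/config/pipelines.yml"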

If you do not do that, Logstash will merge both files and run them as a single pipeline named main; it is the same as having just one configuration file.

I suggest that you create this file and mount it as pipelines.yml so that Logstash runs your configuration as two different pipelines.

The id option on a plugin is optional and mostly used to troubleshoot performance issues; you may remove it if you want.

You should also remove the xpack.management.pipeline.id setting from your logstash.yml. It is used when you have Centralized Pipeline Management configured and manage all your pipelines from Kibana; that is a paid feature that only works if you have at least a Platinum license.
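With that block removed, a logstash.yml like this would be enough for what you are doing:

config.reload.automatic: true
config.reload.interval: 5s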

As mentioned before, you are not running two independent pipelines; you are running a single pipeline that is a merge of your two configurations, so this is expected, since you do not have any conditionals in your output.
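If you wanted to keep a single pipeline instead, you could route on the type field you already set in each jdbc input; a minimal sketch, reusing your connection settings (the index names here are just placeholders):

output {
  if [type] == "contacts" {
    elasticsearch {
      hosts => ["https://10.1.10.26:9200"]
      index => "z4_db_contacts"
      user => "elastic"
      password => "changeme"
      ssl_verification_mode => "none"
    }
  } else if [type] == "jobs" {
    elasticsearch {
      hosts => ["https://10.1.10.26:9200"]
      index => "z4_db_jobs"
      user => "elastic"
      password => "changeme"
      ssl_verification_mode => "none"
    }
  }
}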

I see no errors in your jdbc configuration, so the only reasons I can think of for it not working are:

  • There is no data being returned by your query.
  • There is data being returned, but it somehow conflicts with the data returned by the first jdbc query, which would make Elasticsearch reject the documents; that would generate a log entry, though, and you didn't share anything like that.

Can you share a sample of the data returned by both of those queries?

The fact that the second jdbc input is not writing anything to its last run metadata file suggests that it is not returning any data.
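One way to verify: with clean_run => true and use_column_value => true on a numeric tracking column, sql_last_value starts at 0, so on its first run the plugin should effectively execute:

select * from jobs where jobid > 0;

If that returns nothing when run directly in MySQL while select * from jobs returns rows, then the jobid values themselves (NULLs, for example) would be the likely culprit.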

Also, can you share your Logstash logs from when you start your docker-compose, to see if any WARN/ERROR lines are generated?

This was it. For some reason, Logstash receives no data when the query "select * from jobs where jobid > :sql_last_value" runs inside the JDBC plugin.
However, if I log into the DB and run select * from jobs, I get 59 rows back.
