Beginner question: can I skip copying id into [@metadata][_id]?

I followed this tutorial: https://www.elastic.co/blog/how-to-keep-elasticsearch-synchronized-with-a-relational-database-using-logstash

The configuration file for the pipeline contains the following:

    input {
      jdbc {
        jdbc_driver_library => "<path>/mysql-connector-java-8.0.16.jar"
        jdbc_driver_class => "com.mysql.jdbc.Driver"
        jdbc_connection_string => "jdbc:mysql://<MySQL host>:3306/es_db"
        jdbc_user => "<my username>"
        jdbc_password => "<my password>"
        jdbc_paging_enabled => true
        tracking_column => "unix_ts_in_secs"
        use_column_value => true
        tracking_column_type => "numeric"
        schedule => "*/5 * * * * *"
        statement => "SELECT *, UNIX_TIMESTAMP(modification_time) AS unix_ts_in_secs FROM es_table WHERE (UNIX_TIMESTAMP(modification_time) > :sql_last_value AND modification_time < NOW()) ORDER BY modification_time ASC"
      }
    }
    filter {
      mutate {
        copy => { "id" => "[@metadata][_id]"}
        remove_field => ["id", "@version", "unix_ts_in_secs"]
      }
    }
    output {
      # stdout { codec =>  "rubydebug"}
      elasticsearch {
          index => "rdbms_sync_idx"
          document_id => "%{[@metadata][_id]}"
      }
    }
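
To check what the filter actually hands to the output, the commented-out stdout line can be enabled with the rubydebug codec's metadata option. This is a minimal sketch that assumes the rest of the pipeline stays unchanged. By default rubydebug does not print @metadata fields, so without this option [@metadata][_id] is invisible in the debug output even though document_id can still read it.

    output {
      # print every event, including the normally hidden @metadata fields
      stdout { codec => rubydebug { metadata => true } }
      elasticsearch {
          index => "rdbms_sync_idx"
          document_id => "%{[@metadata][_id]}"
      }
    }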

My questions are about the filter. If I understand correctly, each database row needs its own unique document_id so that a changed row updates its existing document instead of creating a new one.
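
For comparison, here is a minimal sketch of the direct variant. It assumes the primary-key column is named id, as in es_table. The important change is that id must stay in the event. The mutate above removes it, and as far as I know a %{id} reference to a missing field is left as the literal text %{id}. In that case every row would overwrite the same document. The trade-off is that id is then also stored in each document's _source. The @metadata copy avoids this, because @metadata fields are not sent to outputs.

    filter {
      mutate {
        # "id" is deliberately NOT removed so the output can still reference it
        remove_field => ["@version", "unix_ts_in_secs"]
      }
    }
    output {
      elasticsearch {
          index => "rdbms_sync_idx"
          # use the row's primary key directly as the Elasticsearch _id
          document_id => "%{id}"
      }
    }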

  • Does copying the value into [@metadata][_id] achieve anything? The output simply passes it on to document_id, so why not use the ID directly?
  • Tables with a composite primary key (several columns) have no single ID column. My solution was to combine all primary-key columns in document_id (document_id => "%{col1}%{col2}..."). Is this the correct approach? The data seems to be updated correctly.
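
Here is a minimal sketch of the composite-key variant that follows the tutorial's @metadata pattern. col1 and col2 are placeholder column names. The | separator is an assumption. Without some separator, different key pairs can produce the same ID: col1 = "1" with col2 = "23" and col1 = "12" with col2 = "3" both give "123". The separator should therefore be a character that cannot appear in the key values.

    filter {
      mutate {
        # build the composite key under @metadata so it is not stored in _source
        # ("col1", "col2" and the "|" separator are placeholders)
        add_field => { "[@metadata][_id]" => "%{col1}|%{col2}" }
        remove_field => ["@version", "unix_ts_in_secs"]
      }
    }
    output {
      elasticsearch {
          index => "rdbms_sync_idx"
          document_id => "%{[@metadata][_id]}"
      }
    }

If the key values could contain any character, the fingerprint filter is an alternative. With source => ["col1", "col2"], concatenate_sources => true and target => "[@metadata][_id]", it writes a hash of the combined key columns. The cost is that the IDs are no longer human-readable.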

Thank you for your help.
