Logstash is sending garbage values to ES

I'm using Logstash to read data from an Oracle DB & ingest it into a Kafka topic, then read the data from that topic & ingest it into ES (basically Oracle -> Kafka, Kafka -> ES). The query captures data for a specific timestamp & fetches 52 rows. Everything works as expected till here, but after inserting all the rows, Logstash starts inserting the garbage value below:

2020-08-27T13:40:49.122Z %{host} 2020-08-27T13:40:48.799Z %{host} 2020-08-27T13:40:48.612Z %{host} 2020-08-27T13:40:48.323Z %{host} 2020-08-27T13:40:48.123Z %{host} 2020-08-27T13:40:47.723Z %{host} 2020-08-27T13:40:47.220Z %{host} %{message}

Configs that I'm using:

Oracle -> Kafka:

    input {
      jdbc {
        jdbc_driver_library => "/usr/share/logstash/ojdbc8.jar"
        jdbc_driver_class => "Java::oracle.jdbc.OracleDriver"
        jdbc_connection_string => "jdbc:oracle:thin:@redacted"
        jdbc_user => "redacted"
        jdbc_password => "redacted"
        tracking_column => "timestamp"
        use_column_value => true
        tracking_column_type => "timestamp"
        jdbc_default_timezone => "Asia/Kolkata"
        schedule => "*/10 * * * *"
        statement_filepath => "redacted"
      }
    }

    output {
      kafka {
       topic_id => "audit_data"
       bootstrap_servers => "redacted"
       acks => "0"
       jaas_path => "/usr/share/logstash/jaas.conf"
       sasl_kerberos_service_name => "kafka"
       kerberos_config => "redacted"
       codec => plain
       security_protocol => "SASL_PLAINTEXT"
      }
    }
Kafka -> ES:

    input {
        kafka{
    #    group_id => "logstash"
        jaas_path => "/usr/share/logstash/jaas.conf"
        sasl_kerberos_service_name => "kafka"
        kerberos_config => "redacted"
        auto_offset_reset => "latest"
        topics => ["audit_data"]
        codec => plain
    bootstrap_servers => "redacted"
        security_protocol => "SASL_PLAINTEXT"
    #    type => "syslog"
        decorate_events => true
        }
    }


    output {
        #stdout { codec =>  "json"}
        elasticsearch {
            hosts => ["redacted"]
            user => "redacted"
            password => "redacted"
            cacert => ["redacted"]
            action => "index"
            index => "kafka_logstash"
        }
    }

I checked the Kafka topic data from the consumer console & could see the garbage values continuously flowing in, so I removed the old data from the topic (set the retention to 1000ms) and used the same query & config parameters, this time directly from Oracle to ES. It worked fine without any garbage value. Below is the config I used:

    input {
      jdbc {
        jdbc_driver_library => "/usr/share/logstash/ojdbc8.jar"
        jdbc_driver_class => "Java::oracle.jdbc.OracleDriver"
        jdbc_connection_string => "jdbc:oracle:thin:@redacted"
        jdbc_user => "redacted"
        jdbc_password => "redacted"
        tracking_column => "timestamp"
        use_column_value => true
        tracking_column_type => "timestamp"
        jdbc_default_timezone => "Asia/Kolkata"
        schedule => "*/10 * * * *"
        statement_filepath => "/usr/share/logstash/oracle.sql"
      }
    }

    output {
        #stdout { codec =>  "json"}
        elasticsearch {
            hosts => ["redacted"]
            user => "redacted"
            password => "redacted"
            cacert => ["redacted"]
            action => "index"
            index => "test_kafka_logstash"
        }
    }

Please suggest how we can fix this.
Thanks!

@stephenb @Badger Could you please suggest something that I might've missed?

Hi @Himanshii

1st, it is not really best practice / polite to call on specific people to help with questions. @Badger and I (even though I am an Elastic team member) are volunteers on this forum and participate in our free time.

2nd, I would check to see if the Oracle -> Kafka Logstash pipeline is actually writing the bad lines. You can put another output in the output section; that way you will see whether Logstash is actually writing those lines or whether it is something in Kafka:

    stdout { codec => rubydebug }
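
For example, in the Oracle -> Kafka pipeline the output section could look something like this (just a sketch based on your config above; the kafka settings stay exactly as you have them, and the stdout output is only there temporarily for debugging):

    output {
      kafka {
        topic_id => "audit_data"
        bootstrap_servers => "redacted"
        acks => "0"
        jaas_path => "/usr/share/logstash/jaas.conf"
        sasl_kerberos_service_name => "kafka"
        kerberos_config => "redacted"
        codec => plain
        security_protocol => "SASL_PLAINTEXT"
      }
      # temporary: dump every event to stdout so you can compare it with what lands in Kafka
      stdout { codec => rubydebug }
    }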

@stephenb Sorry, I wasn't sure how this works; I thought my question got skipped. Noted for future reference :slight_smile: Thanks for your help! Appreciate it!
I tried the config below in the output section along with Kafka:

    file {
      path => "/tmp/logstash-kafka.txt"
      codec => rubydebug
    }

I had the consumer console open alongside this file & could see garbage data flowing into Kafka, but correct data in logstash-kafka.txt. The same happened when I tried Kafka along with ES; the data was ingested correctly into ES via Logstash whenever Kafka wasn't in between.

Hi @Himanshii no worries

I am not sure what is happening... it looks like something on the Kafka ingest side (I am not a Kafka expert).

Also, I would write the file with the same codec => plain to make sure all things are the same.
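
Something like this (the same file output you used, just swapping the codec so it matches the kafka output):

    file {
      path => "/tmp/logstash-kafka.txt"
      codec => plain
    }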

Also, just to set expectations... not all questions in the forum actually get answered, there are too many... the community does its best, but there is no guarantee.

If you need commercial support, I would suggest considering a commercial license, which comes with support.

@stephenb I checked the file; the data seemed consistent, if that's what you're suggesting here.

Another thing I observed in the Kafka console was that the garbage value below appeared in 52 lines (it occurred 52 times, which is the number of rows fetched from Oracle), & then it stopped sending any more data/garbage:

2020-08-29T20:57:32.574Z %{host} %{message}

Thanks for the suggestion. We're considering a commercial license as our requirements & cluster are growing, but that might take time...
