JDBC plugin ignores and duplicates entities


#1

Hi everyone,

I tried to move my data from a single PostgreSQL table to Elasticsearch.
I have about 6.6 million rows, and the same number of documents is created in my index.
The problem is that Logstash creates 4 or 5 duplicates of the first rows and ignores the last rows completely.

My simple-out.conf for Logstash 5.5 is:

input {
    jdbc {
        # Postgres jdbc connection string to our database, mydb
        jdbc_connection_string => "jdbc:postgresql://xxxx.int:5438/D9SBX"
        # The user we wish to execute our statement as
        jdbc_user => "xxx"
        jdbc_password => "xxx"
        # The path to our downloaded jdbc driver
        jdbc_driver_library => "/home/mlu/JDBC/postgresql-42.1.3.jar"
        # The name of the driver class for Postgresql
        jdbc_driver_class => "org.postgresql.Driver"
        jdbc_paging_enabled => true
        # our query
        statement => "SELECT * from etl_swlch.etl_export_person"
    }
}
output {
    elasticsearch {
        index => "suvch"
        document_type => "persons"
    }
}

The index I created is configured for 2 nodes as follows:

{
    "settings" : {
        "index" : {
            "number_of_shards" : 2, 
            "number_of_replicas" : 0 
        }
    }
}

Every document represents a single person with a personal ID ranging from 20000000260 to 29018953116.
So when I search for the ID 20000000260, for example, I get 5 duplicate documents back, but when I search for 29018953116, I don't find a single person with that ID.

Can someone please explain what went wrong here?
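
One thing that might be worth checking: with jdbc_paging_enabled the statement is fetched in pages using LIMIT/OFFSET, and PostgreSQL does not guarantee the same row order from one query to the next unless there is an ORDER BY. Pages could then overlap (duplicates) while other rows are never returned, which would fit the symptoms above. Below is a minimal sketch of a config that addresses this, not a confirmed fix. It assumes the personal ID column is called person_id, which is a hypothetical name, and that its values are unique.

```
input {
    jdbc {
        jdbc_connection_string => "jdbc:postgresql://xxxx.int:5438/D9SBX"
        jdbc_user => "xxx"
        jdbc_password => "xxx"
        jdbc_driver_library => "/home/mlu/JDBC/postgresql-42.1.3.jar"
        jdbc_driver_class => "org.postgresql.Driver"
        jdbc_paging_enabled => true
        # Assumption: person_id is a hypothetical name for the unique personal ID column.
        # ORDER BY gives every page query the same, stable row order.
        statement => "SELECT * FROM etl_swlch.etl_export_person ORDER BY person_id"
    }
}
output {
    elasticsearch {
        index => "suvch"
        document_type => "persons"
        # Reusing the (assumed) person_id as the document ID means a row that is
        # sent twice overwrites itself instead of creating a duplicate document.
        document_id => "%{person_id}"
    }
}
```

The ORDER BY targets the missing and repeated rows, while document_id only makes the indexing idempotent, so the two changes are independent and either one can be tried on its own.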

