Hi everyone,
I am trying to move my data from a single PostgreSQL table to Elasticsearch.
I have about 6.6 million rows, and the same number of documents gets created in my index.
The problem is that Logstash creates 4 or 5 duplicates of the first rows and ignores the last rows completely.
My simple-out.conf for Logstash 5.5 is:
input {
  jdbc {
    # PostgreSQL JDBC connection string to our database
    jdbc_connection_string => "jdbc:postgresql://xxxx.int:5438/D9SBX"
    # The user we wish to execute our statement as
    jdbc_user => "xxx"
    jdbc_password => "xxx"
    # The path to our downloaded JDBC driver
    jdbc_driver_library => "/home/mlu/JDBC/postgresql-42.1.3.jar"
    # The name of the driver class for PostgreSQL
    jdbc_driver_class => "org.postgresql.Driver"
    jdbc_paging_enabled => true
    # Our query
    statement => "SELECT * FROM etl_swlch.etl_export_person"
  }
}
output {
  elasticsearch {
    index => "suvch"
    document_type => "persons"
  }
}
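In case it matters: as far as I understand, with jdbc_paging_enabled Logstash wraps the statement in a subquery and pages through it with LIMIT/OFFSET (jdbc_page_size defaults to 100000), so the queries it actually sends should look roughly like this (my reconstruction, I haven't verified it in the logs):

-- What I think Logstash sends with paging enabled
SELECT * FROM (SELECT * FROM etl_swlch.etl_export_person) AS t1 LIMIT 100000 OFFSET 0;
SELECT * FROM (SELECT * FROM etl_swlch.etl_export_person) AS t1 LIMIT 100000 OFFSET 100000;
-- ... and so on, with no ORDER BY anywhere, since my statement has none

Since there is no ORDER BY, I'm not sure Postgres is even guaranteed to return the rows in a stable order across these paged queries.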
The index was created on our 2-node cluster with these settings:
{
  "settings" : {
    "index" : {
      "number_of_shards" : 2,
      "number_of_replicas" : 0
    }
  }
}
Every document represents a single person with a personal ID; the IDs range from 20000000260 to 29018953116.
So when I search for the ID 20000000260, for example, I end up with 5 duplicate documents as a result, but when I search for 29018953116, I don't find a single person with that ID.
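To rule out the source data, I ran checks along these lines on the Postgres side (person_id stands in for the actual name of the ID column here):

-- Row count matches the document count in the index (~6.6 million)
SELECT count(*) FROM etl_swlch.etl_export_person;
-- Confirms the ID range from 20000000260 to 29018953116
SELECT min(person_id), max(person_id) FROM etl_swlch.etl_export_person;
-- Comes back empty, so the duplicates are not already in the source table
SELECT person_id FROM etl_swlch.etl_export_person GROUP BY person_id HAVING count(*) > 1;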
Can someone please explain to me what went wrong here?