I have a PostgreSQL table "relou" with more than 2.8m records:
> select count(*) from relou;
> count
> ---------
> 2853566
I'm indexing this table into Elasticsearch using Logstash:
input {
  jdbc {
    jdbc_connection_string => "jdbc:postgresql://***"
    jdbc_user => "***"
    jdbc_password => "***"
    jdbc_driver_library => "/path/postgresql-42.1.4.jar"
    jdbc_driver_class => "org.postgresql.Driver"
    statement => "select * from relou"
    jdbc_paging_enabled => "true"
    jdbc_fetch_size => "100000"
  }
}
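A side note on the input above: with jdbc_paging_enabled, Logstash wraps the statement in LIMIT/OFFSET pages (this is visible in the log excerpt further down), and PostgreSQL gives no stable row order for a bare select * from relou, so successive pages can in principle skip or repeat rows. A hedged sketch of a more deterministic statement, assuming the table has a unique key column (here called id, a hypothetical name):

```sql
-- Hypothetical: order the paged statement by a unique key so every
-- LIMIT/OFFSET page sees the rows in the same order. "id" is an
-- assumed column name; substitute the table's actual primary key.
select * from relou order by id
```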
filter {
  mutate {
    add_field => [ "[location][lat]", "%{lat}" ]
    add_field => [ "[location][lon]", "%{lon}" ]
  }
  mutate {
    convert => [ "[location][lat]", "float" ]
    convert => [ "[location][lon]", "float" ]
  }
  mutate { remove_field => [ "lat", "lon" ] }
}
output {
  elasticsearch {
    hosts => ["***"]
    index => "test"
    document_type => "test"
    document_id => "%{type}%{id_catalog_shop_sku}"
    user => "logstash"
    password => "***"
  }
}
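One thing worth noting about the output above: document_id => "%{type}%{id_catalog_shop_sku}" means any two rows sharing the same type + id_catalog_shop_sku combination are written to the same Elasticsearch document, the later row silently overwriting the earlier one. A quick way to check this on the PostgreSQL side (a sketch using the two columns referenced in the config):

```sql
-- If this count is lower than 2,853,566, the difference is rows that
-- collapsed onto the same document_id and were overwritten, not lost.
select count(*)
from (select distinct type, id_catalog_shop_sku from relou) as ids;
```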
I run the corresponding conf file, everything goes smoothly (no errors), and the log file shows that all 2.8m rows from the "relou" table were fetched:
[2018-04-14T19:45:26,326][INFO ][logstash.inputs.jdbc ] (3.085797s) SELECT * FROM (select * from relou) AS "t1" LIMIT 100000 OFFSET 2800000
[2018-04-14T19:46:20,974][INFO ][logstash.pipeline ] Pipeline has terminated {:pipeline_id=>"relou", :thread=>"#<Thread:0x17a53624@/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:246 run>"}
[2018-04-14T19:46:21,329][INFO ][logstash.pipeline ] Pipeline has terminated {:pipeline_id=>".monitoring-logstash", :thread=>"#<Thread:0x428b6a84@/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:246 run>"}
In Kibana, under Monitoring, the "relou" pipeline reports:
Events Received: 2.8m events
Events Emitted: 2.8m events
But when I run a count on my test Elasticsearch index, I only have 1.8m documents:
GET /test/_count
{
  "count": 1805997,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  }
}
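Overwrites are also visible from the Elasticsearch side: reindexing a document under an existing _id counts internally as a delete plus an insert, and the deleted versions show up in the index stats until segments merge. For example:

```
GET /test/_stats/docs
```

If docs.deleted in the response is large (roughly the gap between 2.8m and 1.8m), that would point to document_id collisions rather than dropped events.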