Hi everyone,
I'm relatively new to the ES/Logstash world, but I'll do my best to describe the problem I'm facing.
We have ES running on an AWS instance (multiple ones in a cluster, I believe; I didn't set it up, but I can ask for specifics if it matters), and Logstash also runs in AWS.
So far we have been testing all this by creating the Logstash config files and then running them with logstash -f . Everything looked good and we got matching results between the input and Elasticsearch.
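Concretely, the invocation looks roughly like this (the path is just an example, ours may differ):

  cd /path/to/configs    # the folder holding all our .conf files
  logstash -f .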
However, we have been facing an issue since we started to use schedules, and due to my inexperience I'm having a hard time getting to the bottom of it.
Here is what one of our typical configs looks like:
input {
  jdbc {
    # PostgreSQL JDBC connection string to our database
    jdbc_connection_string => "jdbc:postgresql://<connection_url>"
    # The user we wish to execute our statement as
    jdbc_user => "<user>"
    # The path to our downloaded JDBC driver
    jdbc_driver_library => "/usr/share/logstash/postgresql-42.0.0.jar"
    # Schedule for the query (every 30 minutes)
    schedule => "*/30 * * * *"
    # Do not persist the last-run state
    record_last_run => false
    # The name of the driver class for PostgreSQL
    jdbc_driver_class => "org.postgresql.Driver"
    # Our query
    statement => "SELECT row_number() OVER (ORDER BY product_line) AS id, <other columns> FROM <table>"
  }
}
output {
  elasticsearch {
    index => "test1"
    document_type => "test1"
    document_id => "%{id}"
    hosts => "http://<ElasticSearch URL>"
  }
}
We have multiple configs in the same folder, but in each case:
- the statement is different and points to a different table
- the output index name is unique
- the output document type is different for each index
(A sketch of what a second config might look like follows below.)
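For illustration only, a second config in the same folder would look roughly like this (the table, index, and type names here are made up):

input {
  jdbc {
    jdbc_connection_string => "jdbc:postgresql://<connection_url>"
    jdbc_user => "<user>"
    jdbc_driver_library => "/usr/share/logstash/postgresql-42.0.0.jar"
    jdbc_driver_class => "org.postgresql.Driver"
    schedule => "*/30 * * * *"
    record_last_run => false
    # different statement, pointing at a different table
    statement => "SELECT row_number() OVER (ORDER BY product_line) AS id, <other columns> FROM <another table>"
  }
}
output {
  elasticsearch {
    # different index and document type per config
    index => "test2"
    document_type => "test2"
    document_id => "%{id}"
    hosts => "http://<ElasticSearch URL>"
  }
}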
However, later on, when Logstash picks up the config due to its schedule, we see a mismatch between the row count in the jdbc input and the document count in ES.
For example, this particular statement returns 2856 rows in PostgreSQL, but in ES we see 2868 documents after Logstash runs:
https://<ElasticSearch URL>/test1/_count
{
  "count": 2868,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  }
}
If I check the last expected document in the index, that one looks fine ( https://<ElasticSearch URL>/test1/test1/2856 ), but after that I see documents coming from inputs defined in different config files.
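This is how we inspect one of those extra documents (the ID 2857 is just an example of an ID past the expected last row):

  # fetch a document beyond the expected last ID and look at its _source fields
  curl -XGET "https://<ElasticSearch URL>/test1/test1/2857?pretty"

The _source fields of such documents match columns from another config's statement, not this one's.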
Just to reconfirm: even though we have multiple configs, each has a different query statement on the input side and a different index/doc type on the output side, so essentially each index in ES should be a copy of its source table from PostgreSQL.
If we leave Logstash running, the document count in ES keeps growing over time, while we are sure the underlying PostgreSQL data did not change. To me it looks like inputs from different config files are being pushed to multiple indices. When I check the _cat/indices REST endpoint, I see many indices with the exact same document count, which reinforces my suspicion.
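This is the check we run (the ?v flag just adds column headers to the output):

  # list all indices with their document counts
  curl -XGET "https://<ElasticSearch URL>/_cat/indices?v"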
Is there anything we are doing wrong? Does anyone have any pointers on how to troubleshoot this issue?
Many thanks in advance,
Norbert