Hi everyone,
I'm relatively new to the ES/Logstash world, but I'll do my best to describe the problem I'm facing.
We have ES running on an AWS instance (multiple ones in a cluster, I believe; I didn't set it up, but I can ask for specifics if it matters), and Logstash also runs in AWS.
So far we have been testing all this by creating the Logstash config files and then running them with logstash -f . Everything looked good and we got matching results between the input and Elasticsearch.
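Concretely, the invocation looks roughly like this (the path is just an example, ours may differ):

  cd /path/to/configs    # the folder holding all our .conf files
  logstash -f .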
However, we have been facing an issue since we started to use schedules, and due to my inexperience I'm having a hard time getting to the bottom of it.
Here is what one of our typical configs looks like:
input {
  jdbc {
    # PostgreSQL JDBC connection string to our database
    jdbc_connection_string => "jdbc:postgresql://<connection_url>"
    # The user we wish to execute our statement as
    jdbc_user => "<user>"
    # The path to our downloaded JDBC driver
    jdbc_driver_library => "/usr/share/logstash/postgresql-42.0.0.jar"
    # Schedule for the query (every 30 minutes)
    schedule => "*/30 * * * *"
    # Do not persist the last-run state
    record_last_run => false
    # The name of the driver class for PostgreSQL
    jdbc_driver_class => "org.postgresql.Driver"
    # Our query
    statement => "SELECT row_number() OVER (ORDER BY product_line) AS id, <other columns> FROM <table>"
  }
}
output {
  elasticsearch {
    index => "test1"
    document_type => "test1"
    document_id => "%{id}"
    hosts => "http://<ElasticSearch URL>"
  }
}
We have multiple configs in the same folder, but in each case:
- the statement is different and points to a different table
- the output index name is unique
- the output document type is different for each index
(A sketch of what a second config might look like follows below.)
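For illustration only, a second config in the same folder would look roughly like this (the table, index, and type names here are made up):

input {
  jdbc {
    jdbc_connection_string => "jdbc:postgresql://<connection_url>"
    jdbc_user => "<user>"
    jdbc_driver_library => "/usr/share/logstash/postgresql-42.0.0.jar"
    jdbc_driver_class => "org.postgresql.Driver"
    schedule => "*/30 * * * *"
    record_last_run => false
    # different statement, pointing at a different table
    statement => "SELECT row_number() OVER (ORDER BY product_line) AS id, <other columns> FROM <another table>"
  }
}
output {
  elasticsearch {
    # different index and document type per config
    index => "test2"
    document_type => "test2"
    document_id => "%{id}"
    hosts => "http://<ElasticSearch URL>"
  }
}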
However, later on, when Logstash picks up the config due to its schedule, we see a mismatch between the row count in the jdbc input and the document count in ES.
For example, this particular statement returns 2856 rows in PostgreSQL, but in ES we see 2868 documents after Logstash runs:
https://<ElasticSearch URL>/test1/_count
{
  "count": 2868,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  }
}
If I check the last expected document in the index, that one looks fine ( https://<ElasticSearch URL>/test1/test1/2856 ), but after that I see documents coming from inputs defined in different config files.
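This is how we inspect one of those extra documents (the ID 2857 is just an example of an ID past the expected last row):

  # fetch a document beyond the expected last ID and look at its _source fields
  curl -XGET "https://<ElasticSearch URL>/test1/test1/2857?pretty"

The _source fields of such documents match columns from another config's statement, not this one's.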
Just to reconfirm: even though we have multiple configs, each has a different query statement on the input side and a different index/doc type on the output side, so essentially each index in ES should be a copy of its source table from PostgreSQL.
If we leave Logstash running, the document count in ES keeps growing over time, while we are sure the underlying PostgreSQL data did not change. To me it looks like inputs from different config files are being pushed to multiple indices. When I check the _cat/indices REST endpoint, I see many indices with the exact same document count, which reinforces my suspicion.
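This is the check we run (the ?v flag just adds column headers to the output):

  # list all indices with their document counts
  curl -XGET "https://<ElasticSearch URL>/_cat/indices?v"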
Is there anything we are doing wrong? Does anyone have any pointers on how to troubleshoot this issue?
Many thanks in advance,
Norbert