Missing documents after indexing with Logstash

Hey,

I'm using Elasticsearch 5.2.2, Logstash 5.2.2 and PostgreSQL 9.5.2,
and I'm trying to index over 5,000,000 documents split across two types in one index.
But after indexing, some documents are missing. :cry:

The first type, "A", has all the contents of one of the database tables (~4,000,000 rows), and
the second type, "B", is a subset of type "A" (~1,000,000 rows).
All I'm indexing is a single string.

  • The ids of A can't be duplicated. The query returning B can return duplicates, but far fewer than the number of missing documents.

  • I'm using the jdbc input plugin with the jdbc_paging_enabled option, and its documentation states: "Be aware that ordering is not guaranteed between queries."
    So I added ORDER BY updated to the query, but documents are still missing.

  • I kept monitoring the thread pool, but the bulk queue never showed more than 5 queued requests, and 0 rejected.
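
One thing that may be worth checking for the paging bullet above: with jdbc_paging_enabled the statement is split into several limit/offset queries, and ORDER BY updated alone does not fix the relative order of rows that share the same "updated" value. The toy model below is only a sketch of that effect, not a reproduction of this setup: there is no database involved, the six rows are made up, and the "ties flip direction on odd pages" behavior is an assumption standing in for whatever order the database happens to pick on each query.

```python
# Toy model of limit/offset paging over a non-unique sort key.
# ASSUMPTION: rows with the same `updated` value may come back in a
# different relative order on each page query. Here that is modelled
# deterministically: ties are id-ascending on even pages, id-descending
# on odd pages. The rows below are invented for illustration.

ROWS = [
    {"id": 1, "updated": 1},
    {"id": 2, "updated": 1},
    {"id": 3, "updated": 2},
    {"id": 4, "updated": 2},
    {"id": 5, "updated": 2},
    {"id": 6, "updated": 2},
]


def run_page_query(rows, sort_key, page, page_size):
    """Return one page, with an unstable order among rows that tie on sort_key."""
    direction = 1 if page % 2 == 0 else -1
    ordered = sorted(rows, key=lambda r: (sort_key(r), direction * r["id"]))
    start = page * page_size
    return ordered[start:start + page_size]


def fetch_all_ids(rows, sort_key, page_size):
    """Fetch page after page until an empty page comes back."""
    fetched = []
    page = 0
    while True:
        chunk = run_page_query(rows, sort_key, page, page_size)
        if not chunk:
            break
        fetched.extend(r["id"] for r in chunk)
        page += 1
    return fetched


all_ids = {r["id"] for r in ROWS}

# ORDER BY updated: ties are not resolved, so pages can overlap and skip rows.
by_updated = fetch_all_ids(ROWS, lambda r: r["updated"], page_size=3)
# ORDER BY updated, id: the unique id breaks every tie.
by_updated_and_id = fetch_all_ids(ROWS, lambda r: (r["updated"], r["id"]), page_size=3)

print("ORDER BY updated     ->", by_updated, "missing:", sorted(all_ids - set(by_updated)))
print("ORDER BY updated, id ->", by_updated_and_id, "missing:", sorted(all_ids - set(by_updated_and_id)))
```

In this model the first run fetches six rows but only five distinct ids: one row is fetched twice and one is never fetched. If something similar happens here, adding a unique column as a tiebreaker (for example ORDER BY updated ASC, id ASC) would make the page boundaries deterministic, though that is a guess that would need testing against the real table.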


Any idea what I should check?


When indexing ends I have:

GET /I/A/_count

{"count":3672436,...}

GET /I/B/_count

{"count":1154472,...}

GET /I/_stats/indexing?pretty&types=A,B

{
  ...
  "_all" : {
    "primaries" : {
      "indexing" : {
        "index_total" : 5494806,
        ...
        "types" : {
          "A" : {
            "index_total" : 4309319,
            ...
          },
          "B" : {
            "index_total" : 1185487,
            ...
          }
        }
      }
    },
   ...
  },
  ...
}
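
The two sets of numbers above can be compared directly. The sketch below only does the subtraction, using the values copied from the responses. The interpretation in the comments is an assumption: it supposes that index_total counts every index operation, including ones that replace an existing document with the same id, and that no documents were deleted.

```python
# Values copied from the _count and _stats responses above.
doc_counts = {"A": 3672436, "B": 1154472}
index_totals = {"A": 4309319, "B": 1185487}

# ASSUMPTION: index_total includes operations that overwrote an existing
# document, so (index_total - count) approximates how many index requests
# hit an id that was already present.
gaps = {doc_type: index_totals[doc_type] - doc_counts[doc_type] for doc_type in doc_counts}

for doc_type, gap in gaps.items():
    print(f"type {doc_type}: {index_totals[doc_type]} index operations, "
          f"{doc_counts[doc_type]} documents, gap {gap}")

print("sum of per-type index_total:", sum(index_totals.values()))
```

Under that assumption, roughly 637,000 index operations for type A landed on ids that already existed, even though the ids of A are unique in the table. That would be consistent with some rows being sent more than once, whether by overlapping pages or by a later scheduled run.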

The Logstash config is the same for both types, apart from the "type" and "statement" values:

input {
    jdbc {
        jdbc_driver_library => "/usr/local/lib/postgresql-9.4.1212.jar"
        jdbc_driver_class => "org.postgresql.Driver"
        jdbc_connection_string => "jdbc:postgresql://postgres:5432/postgres"
        jdbc_user => "postgres"
        jdbc_password => "postgres"
        jdbc_validate_connection => true
        jdbc_paging_enabled => true
        jdbc_page_size => 100000
        schedule => "* * * * *"
        last_run_metadata_path => "/usr/share/logstash/data/A.logstash_jdbc_last_run"
        type => "A"
        statement => "SELECT id,str FROM a WHERE updated >= :sql_last_value ORDER BY updated ASC"
    }
}
output {
    if [type] == "A" {
        elasticsearch {
            hosts => "127.0.0.1:9200"
            index => "I"
            document_type => "A"
            document_id => "%{id}"
        }
    }
}

Thanks.

Is any additional information about my problem needed? :thinking:

Thanks.
