Logstash - ES to BigQuery - doesn't load all rows into BigQuery

Hello,

I am using Logstash 7.1.1 to load data from an Elasticsearch cluster into a BigQuery dataset. The pipeline ran to completion, but I ran into some problems:

Logstash config file:
input {
  elasticsearch {
    hosts => "myhost"
    index => "index-v1"
    query => '{"query": { "type" : {"value" : "ObjectTerritory2AssignmentRuleItem"}}, "_source" : {}}'
  }
}

filter {
  mutate {
    remove_field => ["@version", "@DATE", "DATE", "path", "host", "type", "message", "version", "timestamp", "@timestamp"]
  }
  # Strip the trailing "000+00:00" from the timestamp strings.
  if [lastmodifieddate] != "null" {
    mutate { gsub => [ "lastmodifieddate", "000+00:00", "" ] }
  }
  if [systemmodstamp] != "null" {
    mutate { gsub => [ "systemmodstamp", "000+00:00", "" ] }
  }
}

output {
  google_bigquery {
    project_id => "project"
    dataset => "dataset"
    table_prefix => "ObjectTerritory2AssignmentRuleItem_2016_01"
    table_separator => ""
    flush_interval_secs => 50
    date_pattern => ""
    error_directory => "/logstash/7.1.1/tmp/bigquery-errors/tmp"
    json_key_file => "key.json"
    json_schema => {
      fields => [
        { name => "CreatedById" type => "STRING" mode => "NULLABLE" },
        { name => "CreatedDate" type => "TIMESTAMP" mode => "NULLABLE" },
        { name => "Field" type => "STRING" mode => "NULLABLE" },
        { name => "Id" type => "STRING" mode => "NULLABLE" },
        { name => "IsDeleted" type => "BOOLEAN" mode => "NULLABLE" },
        { name => "LastModifiedById" type => "STRING" mode => "NULLABLE" },
        { name => "LastModifiedDate" type => "TIMESTAMP" mode => "NULLABLE" },
        { name => "Operation" type => "STRING" mode => "NULLABLE" },
        { name => "RuleId" type => "STRING" mode => "NULLABLE" },
        { name => "SortOrder" type => "FLOAT" mode => "NULLABLE" },
        { name => "SystemModstamp" type => "TIMESTAMP" mode => "NULLABLE" },
        { name => "Value" type => "STRING" mode => "NULLABLE" },
        { name => "sfdc_type" type => "STRING" mode => "NULLABLE" }
      ]
    }
  }
}

Logfile Content:
[2019-07-11T17:05:57,080][INFO ][logstash.outputs.googlebigquery] Publishing 128 messages to ObjectTerritory2AssignmentRuleItem_2016_01
[2019-07-11T17:05:58,526][INFO ][logstash.outputs.googlebigquery] Publishing 128 messages to ObjectTerritory2AssignmentRuleItem_2016_01
[2019-07-11T17:05:58,757][ERROR][logstash.outputs.googlebigquery] Error creating table. {:exception=>java.lang.NullPointerException}
...
[2019-07-11T17:06:44,678][INFO ][logstash.outputs.googlebigquery] Publishing 128 messages to ObjectTerritory2AssignmentRuleItem_2016_01
[2019-07-11T17:06:45,169][INFO ][logstash.outputs.googlebigquery] Publishing 128 messages to ObjectTerritory2AssignmentRuleItem_2016_01
[2019-07-11T17:06:46,082][INFO ][logstash.runner ] Logstash shut down.

Issues:

  1. Although my destination table already exists, I get an error message - '[ERROR][logstash.outputs.googlebigquery] Error creating table. {:exception=>java.lang.NullPointerException}'. Why does Logstash try to create a table that already exists, and can this behavior be disabled? I didn't find any parameter for this. Please suggest. (The first sketch after this list is a quick way to confirm the table the plugin should be targeting.)

  2. The number of documents in Elasticsearch is 11437, whereas the total number of records loaded into BigQuery is 11392. I also tried deleting the table and reloading it, but saw the same behavior. (The second sketch after this list compares the two counts.)
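
Since I couldn't find a parameter that controls table creation, the following Python sketch only double-checks the target side. It assumes that with table_separator and date_pattern both empty the plugin writes to a table named exactly like table_prefix, confirms that this table exists, and prints its schema so it can be compared against the json_schema in the config. The project, dataset, table, and key file names are the placeholders from the config above.

from google.api_core.exceptions import NotFound
from google.cloud import bigquery

# Placeholders taken from the Logstash config above.
client = bigquery.Client.from_service_account_json("key.json")
table_id = "project.dataset.ObjectTerritory2AssignmentRuleItem_2016_01"

try:
    table = client.get_table(table_id)
    print("table exists:", table_id)
    for field in table.schema:
        print(f"  {field.name}: {field.field_type} ({field.mode})")
except NotFound:
    print("table not found:", table_id)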
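
And a rough count comparison, again just a sketch with the host, index, query, and key file copied from the config (it assumes the elasticsearch and google-cloud-bigquery Python packages are installed; note that rows arriving via streaming inserts show up in COUNT(*) results before they appear in the table metadata):

from elasticsearch import Elasticsearch
from google.cloud import bigquery

# Same query as the elasticsearch input above.
es = Elasticsearch(["http://myhost:9200"])
es_count = es.count(
    index="index-v1",
    body={"query": {"type": {"value": "ObjectTerritory2AssignmentRuleItem"}}},
)["count"]

bq = bigquery.Client.from_service_account_json("key.json")
sql = "SELECT COUNT(*) AS n FROM `project.dataset.ObjectTerritory2AssignmentRuleItem_2016_01`"
bq_count = list(bq.query(sql).result())[0].n

print(f"elasticsearch: {es_count}  bigquery: {bq_count}  missing: {es_count - bq_count}")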

Please help.

Update - I found that Logstash didn't commit the records still sitting in its buffer before shutting down. Detailed analysis is here.
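
That also matches the numbers: 11437 - 11392 = 45 missing rows, which is less than one 128-message batch, so a final partial batch left in the output buffer at shutdown would explain the gap exactly. Purely as an illustration of that failure mode (this is not the plugin's actual code), a buffered writer that only uploads full batches and forgets to flush the remainder on close behaves like this:

class BatchingWriter:
    """Toy illustration: buffers records and only uploads full batches."""

    def __init__(self, batch_size=128):
        self.batch_size = batch_size
        self.buffer = []
        self.sent = 0

    def write(self, record):
        self.buffer.append(record)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        # In the real plugin this would be the BigQuery upload.
        self.sent += len(self.buffer)
        self.buffer.clear()

    def close(self, flush_remainder):
        if flush_remainder:
            self.flush()  # correct shutdown: commit the partial batch
        # otherwise whatever is still in self.buffer is silently dropped


w = BatchingWriter()
for i in range(11437):
    w.write(i)
w.close(flush_remainder=False)
print(w.sent)  # 11392 -- the last 45 records never reach the sink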

Thanks @RobBavey for quickly resolving this issue and updating the plugin code!