How do I avoid duplicate documents in Elasticsearch?
The Elasticsearch index document count (20,010,253) doesn't match the line count of the log files (13,411,790).
Documentation (Logstash file input plugin):
"File rotation is detected and handled by this input, regardless of whether the file is rotated via a rename or a copy operation."
NiFi:
A real-time NiFi pipeline copies the logs from the NiFi server to the ELK server.
NiFi writes rolling log files.
Copying the files takes less than one minute.
Log line count on the ELK server:
wc -l /mnt/elk/logstash/data/from/nifi/dev/logs/nifi/*.log
13,411,790 total
Elasticsearch index document count:
curl -XGET 'ip:9200/_cat/indices?v&pretty'
docs.count = 20,010,253
Logstash input conf file:
cat /mnt/elk/logstash/input_conf_files/test_4.conf
input {
  file {
    path => "/mnt/elk/logstash/data/from/nifi/dev/logs/nifi/*.log"
    type => "test_4"
    sincedb_path => "/mnt/elk/logstash/scripts/sincedb/test_4"
  }
}
filter {
  if [type] == "test_4" {
    grok {
      match => {
        "message" => "%{DATE:date} %{TIME:time} %{WORD:EventType} %{GREEDYDATA:EventText}"
      }
    }
  }
}
output {
  if [type] == "test_4" {
    elasticsearch {
      hosts => "ip:9200"
      index => "test_4"
    }
  } else {
    stdout {
      codec => rubydebug
    }
  }
}
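One possible way to make the output above idempotent, offered as a minimal sketch rather than a confirmed fix: derive the Elasticsearch document ID from the log line itself, so that a line which is read more than once overwrites the existing document instead of creating a new one. The fingerprint filter and the document_id option are standard Logstash features; the key value and the [@metadata][fingerprint] field name are placeholders chosen for illustration and are not part of the original setup.

```
filter {
  if [type] == "test_4" {
    # Hash the raw log line; identical lines yield the same fingerprint.
    fingerprint {
      source => "message"
      target => "[@metadata][fingerprint]"
      method => "SHA256"
      # Placeholder key (assumption); with a key set, the plugin computes an HMAC.
      key    => "replace-with-any-fixed-string"
    }
  }
}
output {
  if [type] == "test_4" {
    elasticsearch {
      hosts       => "ip:9200"
      index       => "test_4"
      # Re-indexing the same line overwrites the document instead of adding a new one.
      document_id => "%{[@metadata][fingerprint]}"
    }
  }
}
```

The trade-off is that two genuinely distinct events with byte-identical lines (same timestamp and same text) would also collapse into a single document. Documents that were already indexed with auto-generated IDs are not affected, so the index would have to be rebuilt to get rid of the existing duplicates.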
Here is one example of a duplicate.
The log files contain exactly one matching entry:
grep -r "2018-02-02 11:31:36,978 ERROR" /mnt/elk/logstash/data/from/nifi/dev/logs/nifi/*.log
/mnt/elk/logstash/data/from/nifi/dev/logs/nifi/nifi-app_2018-02-02_11.0.log:2018-02-02 11:31:36,978 ERROR [Timer-Driven Process Thread-7] o.a.n.p.a.storage.PutAzureBlobStorage PutAzureBlobStorage[id=117f16f0-113c-1fcd-6a48-d9d99d3cd288] PutAzureBlobStorage[id=117f16f0-113c-1fcd-6a48-d9d99d3cd288] failed to process due to org.apache.nifi.processor.exception.ProcessException: IOException thrown from PutAzureBlobStorage[id=117f16f0-113c-1fcd-6a48-d9d99d3cd288]: java.io.IOException; rolling back session: {}
There are four entries in Elasticsearch. One entry has the path "nifi-app_2018-02-02_11.0.log"; the other three have the path "nifi-app.log". nifi-app.log is the active file that gets rolled over. I have omitted the fourth entry because of the site's length limit ("Body is limited to 7000 characters; you entered 7948").
curl -XGET 'ip:9200/test_4/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": {
    "query_string": {
      "query": " (date:\"18-02-02\") AND (time:\"11:31:36,978\") AND (EventType:\"ERROR\") "
    }
  }
}
'
{
  "_index" : "test_4",
  "_type" : "test_4",
  "_id" : "IMQcWGEBOC31Kjf9gyWS",
  "_score" : 18.249443,
  "_source" : {
    "date" : "18-02-02",
    "path" : "/mnt/elk/logstash/data/from/nifi/dev/logs/nifi/nifi-app_2018-02-02_11.0.log",
    "@timestamp" : "2018-02-02T20:01:59.159Z",
    "EventType" : "ERROR",
    "EventText" : "[Timer-Driven Process Thread-7] o.a.n.p.a.storage.PutAzureBlobStorage PutAzureBlobStorage[id=117f16f0-113c-1fcd-6a48-d9d99d3cd288] PutAzureBlobStorage[id=117f16f0-113c-1fcd-6a48-d9d99d3cd288] failed to process due to org.apache.nifi.processor.exception.ProcessException: IOException thrown from PutAzureBlobStorage[id=117f16f0-113c-1fcd-6a48-d9d99d3cd288]: java.io.IOException; rolling back session: {}",
    "@version" : "1",
    "host" : "hostname",
    "time" : "11:31:36,978",
    "message" : "2018-02-02 11:31:36,978 ERROR [Timer-Driven Process Thread-7] o.a.n.p.a.storage.PutAzureBlobStorage PutAzureBlobStorage[id=117f16f0-113c-1fcd-6a48-d9d99d3cd288] PutAzureBlobStorage[id=117f16f0-113c-1fcd-6a48-d9d99d3cd288] failed to process due to org.apache.nifi.processor.exception.ProcessException: IOException thrown from PutAzureBlobStorage[id=117f16f0-113c-1fcd-6a48-d9d99d3cd288]: java.io.IOException; rolling back session: {}",
    "type" : "test_4"
  }
},
{
  "_index" : "test_4",
  "_type" : "test_4",
  "_id" : "CMEFWGEBOC31Kjf9ZD-n",
  "_score" : 18.249443,
  "_source" : {
    "date" : "18-02-02",
    "path" : "/mnt/elk/logstash/data/from/nifi/dev/logs/nifi/nifi-app.log",
    "@timestamp" : "2018-02-02T19:36:43.919Z",
    "EventType" : "ERROR",
    "EventText" : "[Timer-Driven Process Thread-7] o.a.n.p.a.storage.PutAzureBlobStorage PutAzureBlobStorage[id=117f16f0-113c-1fcd-6a48-d9d99d3cd288] PutAzureBlobStorage[id=117f16f0-113c-1fcd-6a48-d9d99d3cd288] failed to process due to org.apache.nifi.processor.exception.ProcessException: IOException thrown from PutAzureBlobStorage[id=117f16f0-113c-1fcd-6a48-d9d99d3cd288]: java.io.IOException; rolling back session: {}",
    "@version" : "1",
    "host" : "hostname",
    "time" : "11:31:36,978",
    "message" : "2018-02-02 11:31:36,978 ERROR [Timer-Driven Process Thread-7] o.a.n.p.a.storage.PutAzureBlobStorage PutAzureBlobStorage[id=117f16f0-113c-1fcd-6a48-d9d99d3cd288] PutAzureBlobStorage[id=117f16f0-113c-1fcd-6a48-d9d99d3cd288] failed to process due to org.apache.nifi.processor.exception.ProcessException: IOException thrown from PutAzureBlobStorage[id=117f16f0-113c-1fcd-6a48-d9d99d3cd288]: java.io.IOException; rolling back session: {}",
    "type" : "test_4"
  }
},
{
  "_index" : "test_4",
  "_type" : "test_4",
  "_id" : "8cAAWGEBOC31Kjf90X7K",
  "_score" : 17.824947,
  "_source" : {
    "date" : "18-02-02",
    "path" : "/mnt/elk/logstash/data/from/nifi/dev/logs/nifi/nifi-app.log",
    "@timestamp" : "2018-02-02T19:31:44.177Z",
    "EventType" : "ERROR",
    "EventText" : "[Timer-Driven Process Thread-7] o.a.n.p.a.storage.PutAzureBlobStorage PutAzureBlobStorage[id=117f16f0-113c-1fcd-6a48-d9d99d3cd288] PutAzureBlobStorage[id=117f16f0-113c-1fcd-6a48-d9d99d3cd288] failed to process due to org.apache.nifi.processor.exception.ProcessException: IOException thrown from PutAzureBlobStorage[id=117f16f0-113c-1fcd-6a48-d9d99d3cd288]: java.io.IOException; rolling back session: {}",
    "@version" : "1",
    "host" : "hostname",
    "time" : "11:31:36,978",
    "message" : "2018-02-02 11:31:36,978 ERROR [Timer-Driven Process Thread-7] o.a.n.p.a.storage.PutAzureBlobStorage PutAzureBlobStorage[id=117f16f0-113c-1fcd-6a48-d9d99d3cd288] PutAzureBlobStorage[id=117f16f0-113c-1fcd-6a48-d9d99d3cd288] failed to process due to org.apache.nifi.processor.exception.ProcessException: IOException thrown from PutAzureBlobStorage[id=117f16f0-113c-1fcd-6a48-d9d99d3cd288]: java.io.IOException; rolling back session: {}",
    "type" : "test_4"
  }
},