There is data loss when data is pushed to Elasticsearch via Fluentd

Hi,

I am using Elasticsearch with Fluentd in a Kubernetes environment, with the configuration below.

Elasticsearch configuration:
    belk-elasticsearch:
      elasticsearch_master:
        replicas: 3
        no_of_masters: 2
        resources:
          limits:
            cpu: "1"
            memory: "2Gi"
          requests:
            cpu: "500m"
            memory: "1Gi"
        es_java_opts: "-Djava.net.preferIPv4Stack=true -Xms1g -Xmx1g"
        discovery_service: "elasticsearch-discovery"
      elasticsearch_client:
        replicas: 3
        resources:
          limits:
            cpu: "1"
            memory: "4Gi"
          requests:
            cpu: "500m"
            memory: "2Gi"
        es_java_opts: "-Djava.net.preferIPv4Stack=true -Xms2g -Xmx2g"
      esdata:
        replicas: 2
        resources:
          limits:
            cpu: "1"
            memory: "4Gi"
          requests:
            cpu: "500m"
            memory: "2Gi"
        es_java_opts: "-Xms2g -Xmx2g"

Fluentd configuration:

    <source>
      @type tail
      path /var/log/scale/*.json
      read_from_head true
      tag fluentd
      pos_file /var/log/access.log.pos
      format json
    </source>

    <match fluentd.**>
      @type elasticsearch
      @log_level debug
      host elasticsearch
      port 9200
      logstash_format true
      logstash_prefix journal
    </match>

I have 1 GB of data at /var/log/scale: 100 files of 10 MB each. I am reading these files with the tail plugin and sending them to Elasticsearch.
With this configuration, I notice a lot of data loss.
Expected number of records: 2,944,362
Records obtained in Elasticsearch: 1,427,105
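
For reference, the obtained count above can be cross-checked directly against the journal-* indices that logstash_format/logstash_prefix produce; the hostname below assumes the check is run from somewhere that can resolve the elasticsearch service:

    # Total documents across all journal-YYYY.MM.DD indices
    curl -s 'http://elasticsearch:9200/journal-*/_count?pretty'

    # Per-index breakdown
    curl -s 'http://elasticsearch:9200/_cat/indices/journal-*?v&h=index,docs.count'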

Highlights from the Fluentd logs:

2019-05-09 07:16:39 +0000 [info]: #0 Connection opened to Elasticsearch cluster => {:host=>"elasticsearch", :port=>9200, :scheme=>"http"}
2019-05-09 07:16:39 +0000 [info]: #0 Detected ES 6.x: ES 7.x will only accept _doc in type_name.
2019-05-09 07:16:39 +0000 [info]: adding source type="tail"
2019-05-09 07:16:39 +0000 [info]: #0 starting fluentd worker pid=31 ppid=1 worker=0
2019-05-09 07:16:39 +0000 [debug]: #0 buffer started instance=24145100 stage_size=0 queue_size=0
2019-05-09 07:16:39 +0000 [debug]: #0 flush_thread actually running
2019-05-09 07:16:39 +0000 [debug]: #0 enqueue_thread actually running
2019-05-09 07:16:39 +0000 [info]: #0 following tail of /var/log/scale/scale_pqr-100.json
2019-05-09 07:16:39 +0000 [info]: #0 following tail of /var/log/scale/scale_pqr-101.json
....
2019-05-09 07:17:02 +0000 [info]: Worker 0 finished unexpectedly with signal SIGKILL
2019-05-09 07:16:14 +0000 [info]: #0 Connection opened to Elasticsearch cluster => {:host=>"elasticsearch", :port=>9200, :scheme=>"http"}
2019-05-09 07:16:14 +0000 [info]: #0 Detected ES 6.x: ES 7.x will only accept _doc in type_name.
2019-05-09 07:16:14 +0000 [info]: adding source type="tail"
2019-05-09 07:16:14 +0000 [info]: #0 starting fluentd worker pid=26 ppid=1 worker=0
2019-05-09 07:16:14 +0000 [debug]: #0 buffer started instance=13228140 stage_size=0 queue_size=0
2019-05-09 07:16:14 +0000 [debug]: #0 flush_thread actually running
2019-05-09 07:16:14 +0000 [debug]: #0 enqueue_thread actually running

**There are no error logs in Elasticsearch.**

When I increase the memory limit of the Fluentd pod to 1Gi, I see all the records in Elasticsearch without any data loss. However, it does not make sense to raise the Fluentd pod's memory limit to 1Gi just to ship 1 GB of static data.
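
The SIGKILL in the worker log is consistent with the container being OOM-killed once the in-memory buffer grows past the pod's memory limit. A quick way to confirm this (pod name and namespace below are placeholders) is to look at the container's last termination state, which should show OOMKilled with exit code 137:

    # Placeholder pod name and namespace
    kubectl -n logging describe pod fluentd-xxxxx | grep -A 4 'Last State'

    # Or query the last termination reason directly
    kubectl -n logging get pod fluentd-xxxxx \
      -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'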

What could be the reason for this loss of records? Am I missing any configuration in Elasticsearch?

What configuration would be recommended for my scenario?

The absence of errors in the Elasticsearch logs strongly suggests that Elasticsearch is doing exactly what it is asked to do, and that Fluentd is dropping some of this data instead of sending it for indexing. It's probably best to ask the Fluentd community how to configure it not to do so.
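
Along those lines, here is a minimal sketch of the match section with a bounded, file-backed buffer; the buffer path, sizes, and intervals are illustrative values to tune rather than tested recommendations:

    <match fluentd.**>
      @type elasticsearch
      host elasticsearch
      port 9200
      logstash_format true
      logstash_prefix journal
      # File-backed buffer: chunks survive a worker restart, unlike the default memory buffer
      <buffer>
        @type file
        path /var/log/fluentd-buffers/es
        chunk_limit_size 8M
        total_limit_size 512M
        flush_interval 5s
        flush_thread_count 2
        retry_forever true
        retry_max_interval 30
        # Pause the tail input instead of raising an error once the buffer is full
        overflow_action block
      </buffer>
    </match>

With overflow_action block the tail input is paused while the buffer drains, and because chunks sit on disk they survive the SIGKILL/restart cycle visible in the logs. read_lines_limit on the in_tail source is another knob for slowing down how fast the 100 files are read from the head.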
