Hi,
I am using Elasticsearch with Fluentd in a Kubernetes environment, with the configuration below.
Elasticsearch Configuration:
belk-elasticsearch:
  elasticsearch_master:
    replicas: 3
    no_of_masters: 2
    resources:
      limits:
        cpu: "1"
        memory: "2Gi"
      requests:
        cpu: "500m"
        memory: "1Gi"
    es_java_opts: "-Djava.net.preferIPv4Stack=true -Xms1g -Xmx1g"
    discovery_service: "elasticsearch-discovery"
  elasticsearch_client:
    replicas: 3
    resources:
      limits:
        cpu: "1"
        memory: "4Gi"
      requests:
        cpu: "500m"
        memory: "2Gi"
    es_java_opts: "-Djava.net.preferIPv4Stack=true -Xms2g -Xmx2g"
  esdata:
    replicas: 2
    resources:
      limits:
        cpu: "1"
        memory: "4Gi"
      requests:
        cpu: "500m"
        memory: "2Gi"
    es_java_opts: "-Xms2g -Xmx2g"
Fluentd Configuration:
<source>
  @type tail
  path /var/log/scale/*.json
  read_from_head true
  tag fluentd
  pos_file /var/log/access.log.pos
  format json
</source>
<match fluentd.**>
  @type elasticsearch
  @log_level debug
  host elasticsearch
  port 9200
  logstash_format true
  logstash_prefix journal
</match>
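For reference, here is the kind of configuration I am considering to bound Fluentd's memory use: throttling the tail reads and staging the output buffer on disk. This is only a sketch; the read_lines_limit value, the buffer path, and all size settings are assumptions on my part, not tuned values.

<source>
  @type tail
  path /var/log/scale/*.json
  read_from_head true
  read_lines_limit 500            # throttle lines read per I/O iteration (assumed value)
  tag fluentd
  pos_file /var/log/access.log.pos
  format json
</source>
<match fluentd.**>
  @type elasticsearch
  host elasticsearch
  port 9200
  logstash_format true
  logstash_prefix journal
  <buffer>
    @type file                    # stage chunks on disk instead of in RAM
    path /var/log/fluentd-buffer  # assumed buffer path
    chunk_limit_size 8MB          # assumed
    total_limit_size 256MB        # cap on total buffered data (assumed)
    flush_interval 5s
    overflow_action block         # pause the input instead of dropping events when full
  </buffer>
</match>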
I have 1 GB of data at /var/log/scale (100 files of 10 MB each). I am reading these files with the tail plugin and sending them to Elasticsearch.
With this configuration, I see substantial data loss:
Expected number of records: 2,944,362
Records received in Elasticsearch: 1,427,105
Relevant excerpts from the Fluentd logs:
2019-05-09 07:16:39 +0000 [info]: #0 Connection opened to Elasticsearch cluster => {:host=>"elasticsearch", :port=>9200, :scheme=>"http"}
2019-05-09 07:16:39 +0000 [info]: #0 Detected ES 6.x: ES 7.x will only accept _doc in type_name.
2019-05-09 07:16:39 +0000 [info]: adding source type="tail"
2019-05-09 07:16:39 +0000 [info]: #0 starting fluentd worker pid=31 ppid=1 worker=0
2019-05-09 07:16:39 +0000 [debug]: #0 buffer started instance=24145100 stage_size=0 queue_size=0
2019-05-09 07:16:39 +0000 [debug]: #0 flush_thread actually running
2019-05-09 07:16:39 +0000 [debug]: #0 enqueue_thread actually running
2019-05-09 07:16:39 +0000 [info]: #0 following tail of /var/log/scale/scale_pqr-100.json
2019-05-09 07:16:39 +0000 [info]: #0 following tail of /var/log/scale/scale_pqr-101.json
....
2019-05-09 07:17:02 +0000 [info]: Worker 0 finished unexpectedly with signal SIGKILL
2019-05-09 07:16:14 +0000 [info]: #0 Connection opened to Elasticsearch cluster => {:host=>"elasticsearch", :port=>9200, :scheme=>"http"}
2019-05-09 07:16:14 +0000 [info]: #0 Detected ES 6.x: ES 7.x will only accept _doc in type_name.
2019-05-09 07:16:14 +0000 [info]: adding source type="tail"
2019-05-09 07:16:14 +0000 [info]: #0 starting fluentd worker pid=26 ppid=1 worker=0
2019-05-09 07:16:14 +0000 [debug]: #0 buffer started instance=13228140 stage_size=0 queue_size=0
2019-05-09 07:16:14 +0000 [debug]: #0 flush_thread actually running
2019-05-09 07:16:14 +0000 [debug]: #0 enqueue_thread actually running
**There are no error logs in Elasticsearch.**
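The SIGKILL made me suspect that Kubernetes is OOM-killing the Fluentd container (the second set of log lines above is the worker starting over after the kill). If I understand correctly, the container's last termination reason can be checked like this (the pod name is a placeholder):

kubectl get pod <fluentd-pod-name> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'

If this prints OOMKilled, it would line up with the SIGKILL in the Fluentd logs, since the kernel OOM killer terminates the process with SIGKILL.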
When I increase the memory limit of the Fluentd pod to 1Gi, all records show up in Elasticsearch with no loss. But it does not seem reasonable to need a 1Gi memory limit on the Fluentd pod just to ship 1 GB of static data.
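If it helps, this is the kind of resource stanza I would expect to pair with an explicit buffer cap, so that the buffer ceiling stays safely below the container limit. The structure and all values here are assumptions, since my Fluentd chart values may be laid out differently:

fluentd:
  resources:
    limits:
      memory: "512Mi"   # assumed; should stay above the buffer's total_limit_size plus Ruby overhead
    requests:
      memory: "256Mi"   # assumed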
What could be the reason for this loss of records? Am I missing any configuration in Elasticsearch?
What configuration would be recommended for my scenario?