I am using the ELK stack. Logstash parses my logs and forwards them to Elasticsearch, and I view them on a Kibana dashboard.
My log files for a 24-hour period total 3 GB on the backend. But when I check Elasticsearch, pri.store.size is around 7 GB, which is more than double the size of the raw files.
I am using 2 primary shards and 1 replica, so the total store size of around 15 GB looks correct: it is double the primary size because I have one replica.
Please refer to the output below:
health status index               uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   filebeat-2018.08.07 Ta44e3HNQ4uAs4_m5q0IgA   2   1   12714187            0     15.2gb          7.6gb
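To put numbers on the sizes above, here is a rough sketch of the arithmetic in Python. The 3 GB raw figure is the approximate value stated in the post, and `to_gb` is a hypothetical helper written for this example, not part of any Elasticsearch client.

```python
def to_gb(size):
    """Convert a _cat size string such as '7.6gb' or '512mb' to gigabytes."""
    units = {"kb": 1 / 1024**2, "mb": 1 / 1024, "gb": 1.0, "tb": 1024.0}
    number, unit = size[:-2], size[-2:].lower()
    return float(number) * units[unit]


raw_gb = 3.0                 # approximate raw log size for 24 hours (from the post)
store_gb = to_gb("15.2gb")   # store.size: primaries plus replicas
pri_gb = to_gb("7.6gb")      # pri.store.size: primaries only
replicas = 1
docs = 12714187              # docs.count

# How much larger the primary index is than the raw files.
expansion = pri_gb / raw_gb

# Each replica is a full copy of the primaries.
expected_store_gb = pri_gb * (1 + replicas)

# Average on-disk footprint of one document in the primary shards.
bytes_per_doc = pri_gb * 1024**3 / docs

print(f"expansion factor: {expansion:.2f}x")
print(f"expected store.size: {expected_store_gb:.1f} GB (reported {store_gb:.1f} GB)")
print(f"average bytes per document: {bytes_per_doc:.0f}")
```

The replica accounts for the step from 7.6 GB to 15.2 GB; the part worth investigating is the roughly 2.5x expansion from the raw files to the primary shards.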
Which version of Elasticsearch are you using? What does your data look like? What do your mappings look like for any custom fields that have been parsed?
This blog post discusses the impact of mappings and enrichment on storage size for Elasticsearch 5.x, but some of that also applies to Elasticsearch 6.x. How to tune your mappings to reduce the indexed size on disk is also covered in the documentation.
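As a hedged illustration of the kind of mapping tuning meant here, the sketch below builds an index template body in Python and prints it as JSON. The template pattern `filebeat-*`, the mapping type name `doc`, the `ignore_above` value, and the choice of which fields stay `text` are all assumptions for this example; check them against your own index and any template that already applies before using anything like it.

```python
import json

# Hypothetical template body for Elasticsearch 6.x; names and values are assumptions.
template = {
    "index_patterns": ["filebeat-*"],
    "settings": {
        # Trades some CPU for smaller stored fields on disk.
        "index.codec": "best_compression",
    },
    "mappings": {
        "doc": {  # assumed mapping type name
            "dynamic_templates": [
                {
                    # Map new string fields as keyword only, instead of the
                    # default text field plus a keyword sub-field.
                    "strings_as_keyword": {
                        "match_mapping_type": "string",
                        "mapping": {"type": "keyword", "ignore_above": 1024},
                    }
                }
            ],
            "properties": {
                # Keep full-text search on the raw line, but skip norms
                # since relevance scoring rarely matters for logs.
                "message": {"type": "text", "norms": False},
            },
        }
    },
}

print(json.dumps(template, indent=2))
```

The printed JSON could be used as the body of a `PUT _template/<name>` request. It only affects indices created after the template is in place, so the effect would show up in the next daily index.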
I am using Elasticsearch 6.2.4. Below are some sample logs:
I, [2018-08-07T06:26:02.966222 #9981] INFO -- : [4c21290c-bdc5-42c2-8327-6b691c62fe49] Completed 200 OK in 17ms (ActiveRecord: 4.3ms)
I, [2018-08-07T06:26:02.966327 #9974] INFO -- : [3b63b133-354f-4c33-a620-05815e7d61c8] notify_device :: previous_changes: {} :: device id: 100159
I, [2018-08-07T06:26:02.967995 #9981] INFO -- : [2056560c-935e-4efc-a0b6-1770d3fed736] Started POST "/api/v1/gps_audits.json" for 2600:1004:b11a:74bf:0:5a:5f78:3c01 at 2018-08-07 06:26:02 +0000
I, [2018-08-07T06:26:02.968362 #9974] INFO -- : [3b63b133-354f-4c33-a620-05815e7d61c8] [active_model_serializers] Rendered ActiveModel::Serializer::Null with Hash (0.16ms)
I, [2018-08-07T06:26:02.969514 #9974] INFO -- : [3b63b133-354f-4c33-a620-05815e7d61c8] Completed 200 OK in 20ms (Views: 0.8ms | ActiveRecord: 4.6ms)
I, [2018-08-07T06:26:02.970578 #10002] INFO -- : [830f249e-22f4-4124-9480-aefbbca40abb] notify_eva :: Dropping as chat not enabled for device owner: 97077
I, [2018-08-07T06:26:02.970714 #10002] INFO -- : [830f249e-22f4-4124-9480-aefbbca40abb] notify_device :: previous_changes: {} :: device id: 97077
I, [2018-08-07T06:26:02.971049 #9974] INFO -- : [116335ed-dca5-4798-b9f5-100b30845dc5] Started PUT "/api/v1/devices/ping.json" for 2001:44c8:4141:e86:1:1:6687:8b57 at 2018-08-07 06:26:02 +0000
I, [2018-08-07T06:26:02.972398 #10002] INFO -- : [830f249e-22f4-4124-9480-aefbbca40abb] [active_model_serializers] Rendered ActiveModel::Serializer::Null with Hash (0.08ms)
I, [2018-08-07T06:26:02.973289 #10002] INFO -- : [830f249e-22f4-4124-9480-aefbbca40abb] Completed 200 OK in 20ms (Views: 0.6ms | ActiveRecord: 7.2ms)
I, [2018-08-07T06:26:02.973708 #9974] INFO -- : [116335ed-dca5-4798-b9f5-100b30845dc5] Processing by Api::V1::DevicesController#ping as JSON
I have created some custom fields such as date-time, pid, verb, and request-id.
Please let me know how we can reduce the size of the logs in Elasticsearch.
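One likely contributor to the growth is that each event stores the original line plus every extracted field. The Python sketch below mimics that with a regular expression written for the sample lines above. It is not the poster's Logstash configuration, and the field names (`severity_letter`, `timestamp`, `pid`, `level`, `request_id`, `detail`, `verb`, `path`) are hypothetical.

```python
import json
import re

# Pattern for the Ruby logger lines shown in the samples (an assumption
# based on those samples only).
LINE_RE = re.compile(
    r"^(?P<severity_letter>[A-Z]), "
    r"\[(?P<timestamp>\S+) #(?P<pid>\d+)\]\s+"
    r"(?P<level>[A-Z]+)\s+--\s+:\s+"
    r"\[(?P<request_id>[0-9a-f-]{36})\]\s+"
    r"(?P<detail>.*)$"
)
# Extra fields that only exist on "Started <VERB> ..." lines.
STARTED_RE = re.compile(r'^Started (?P<verb>[A-Z]+) "(?P<path>[^"]+)"')


def parse_line(line):
    """Return a dict of extracted fields, or None if the line does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    started = STARTED_RE.match(fields["detail"])
    if started:
        fields.update(started.groupdict())
    return fields


sample = (
    "I, [2018-08-07T06:26:02.967995 #9981] INFO -- : "
    "[2056560c-935e-4efc-a0b6-1770d3fed736] "
    'Started POST "/api/v1/gps_audits.json" for '
    "2600:1004:b11a:74bf:0:5a:5f78:3c01 at 2018-08-07 06:26:02 +0000"
)

fields = parse_line(sample)
raw_bytes = len(sample.encode("utf-8"))
# The indexed document keeps the original line AND the extracted copies.
doc_bytes = len(json.dumps({"message": sample, **fields}).encode("utf-8"))

print(fields)
print(f"raw line: {raw_bytes} bytes, JSON document: {doc_bytes} bytes")
```

This only compares the JSON source with the raw line. The index structures Elasticsearch builds for each field, and any metadata fields added by Filebeat or Logstash, come on top of that. Dropping fields that are never queried, or removing the original `message` once it has been parsed, are options worth weighing.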