I harvested 100 CSV files using Filebeat -> Logstash (with the csv filter) -> Elasticsearch. The 100 CSV flat files take 609 MB on disk, which translated to 1 GB in Elasticsearch (pri.store.size).
I am aware of the advantages of the _source field. However, I am trying to see how much space we could save by disabling the _source field in Elasticsearch 6.2.4. I used the command below to disable the _source field.
I then harvested the logs into the perflogs-2018.19 index, and I see the error below in the Elasticsearch logs:
[2018-05-16T06:44:41,919][DEBUG][o.e.a.b.TransportShardBulkAction] [perflogs-2018.19][0] failed to execute bulk item (index) BulkShardRequest [[perflogs-2018.19][0]] containing [12] requests
java.lang.IllegalArgumentException: Rejecting mapping update to [perflogs-2018.19] as the final mapping would have more than 1 type: [_doc, log]
at org.elasticsearch.index.mapper.MapperService.internalMerge(MapperService.java:501) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.index.mapper.MapperService.internalMerge(MapperService.java:353) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.index.mapper.MapperService.merge(MapperService.java:285) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.cluster.metadata.MetaDataMappingService$PutMappingExecutor.applyRequest(MetaDataMappingService.java:313) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.cluster.metadata.MetaDataMappingService$PutMappingExecutor.execute(MetaDataMappingService.java:230) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:643) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:273) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:198) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:133) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:573) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:244) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:207) ~[elasticsearch-6.2.4.jar:6.2.4]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
Then run a _forcemerge call to rewrite the segments.
Also, rather than removing the _source field, which I do not recommend because you will lose a lot of features, have a look at your mapping and see what else you can optimize.
Yes, we use Logstash in the data pipeline. How do we fix the "multiple mapping" issue?
The data pipeline is Filebeat -> Logstash -> Elasticsearch.
With the default compression codec, 608 MB of flat files translated to 1 GB of index store size.
With the best_compression codec, the same 608 MB translated to 740 MB of index store size. Even with best_compression, disk utilization seems a little high. Hence, I would like to know the index store size when the _source field is disabled.
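To put the two figures above side by side, here is a small sketch that expresses each index store size as a multiple of the raw CSV size. It assumes "1 GB" means 1024 MB, which the thread does not state; the function and variable names are only for illustration.

```python
# Sizes reported in the thread, in MB.
# Assumption: "1 GB" is taken as 1024 MB.
RAW_MB = 608

def ratio(index_mb, raw_mb=RAW_MB):
    """Return the index store size as a multiple of the raw CSV size."""
    return index_mb / raw_mb

sizes = {"default codec": 1024, "best_compression": 740}
for label, mb in sizes.items():
    print(f"{label}: {mb} MB = {ratio(mb):.2f}x raw")
```

Under that assumption the default codec comes out at roughly 1.68x the raw size and best_compression at roughly 1.22x.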
I was able to disable the _source field after fixing the mapping name. It looks like Logstash uses "log" as the mapping type name, so I used the same name when disabling the _source field. 608 MB of logs translated to 301 MB with the best_compression codec.
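For anyone hitting the same "more than 1 type: [_doc, log]" rejection, a minimal sketch of the kind of template body this fix implies is shown below. It is not the exact command used in this thread. The index pattern `perflogs-*` is an assumption inferred from the index name, and the helper name `build_template` is hypothetical. The key point is that the mapping type in the template must match the type Logstash indexes with ("log" here), since a 6.x index accepts only one type.

```python
import json

# Mapping type that Logstash was writing with, per the thread.
MAPPING_TYPE = "log"

def build_template(mapping_type=MAPPING_TYPE):
    """Build an Elasticsearch 6.x index template body that disables _source.

    Assumption: the index pattern "perflogs-*" is illustrative only.
    """
    return {
        "index_patterns": ["perflogs-*"],
        "settings": {"index.codec": "best_compression"},
        # Exactly one mapping type, named the same as the incoming documents' type.
        "mappings": {mapping_type: {"_source": {"enabled": False}}},
    }

template = build_template()
# This JSON would be sent as the body of a PUT to _template/<template name>.
print(json.dumps(template, indent=2))
```

A template only affects indices created after it is stored, so an existing perflogs index would need to be deleted or reindexed for the setting to take effect.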