I have a JSON file containing 93k records; the file is 413 MB. After indexing it via Logstash, the index is only 88 MB. I can see that all the records were ingested.
Index settings:
50 fields are indexed and 200 fields are not indexed; this is configured in the index template.
_source is enabled. No replicas are configured. I removed the message fields in Logstash. I benchmarked with the above settings and see a huge difference between the index size and the raw document size.
Can someone help me understand why there is such a big difference in size?
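
For reference, here is a rough back-of-the-envelope calculation of the figures above. It is only a sketch: it assumes "93k" means exactly 93,000 records and 1 MB = 1000 KB, and the function name `size_summary` is made up for illustration.

```python
# Figures from the post; RECORDS assumes "93k" means exactly 93,000.
RAW_MB = 413
INDEX_MB = 88
RECORDS = 93_000


def size_summary(raw_mb, index_mb, records):
    """Compare raw file size with index size (assumes 1 MB = 1000 KB)."""
    ratio = index_mb / raw_mb
    return {
        "index_to_raw_ratio": ratio,
        "reduction_pct": (1 - ratio) * 100,
        "raw_kb_per_record": raw_mb * 1000 / records,
        "index_kb_per_record": index_mb * 1000 / records,
    }


summary = size_summary(RAW_MB, INDEX_MB, RECORDS)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

So the index is roughly a fifth of the raw file, or a little under 1 KB per record in the index versus about 4.4 KB per record in the JSON.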