We just want a clear understanding of the _source field. Is it indexed or stored? If it is stored, where is it stored?
As an experiment, we disabled _source using PUT index_name/_mapping {enable=false}. After that we could no longer see the source on the Discover page or retrieve it in Dev Tools with GET index_name/_search, but the storage size of the index did not change. So we want to know whether disabling _source deletes the stored _source or only stops it from being returned.
In another scenario, I applied the mapper-size plugin to one index and calculated the sum of the _source field size for all the indexed documents. The result is much bigger than the index file.
Is _source stored somewhere else or stored within the index?
This is explained in the documentation: the _source field is stored in the index, but it is not indexed.
What was the size of the index, and what sample document did you use? If you had just a couple of documents, or if your sample document is small, you will barely see any difference, but once the index grows (tens of GB) the difference becomes visible.
I do not use the mapper-size plugin, but if I'm not wrong it records the size of the original, uncompressed _source field. When the data is stored on disk it is compressed.
If you want to save space I would recommend that first you try to change the index.codec to best_compression instead of disabling the _source field.
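To make the codec suggestion concrete, here is a minimal sketch of a create-index body that sets index.codec. The helper name and the index name in the comment are made up for illustration. Keep in mind that index.codec is a static setting, so it has to be set when the index is created or while the index is closed, and existing segments only pick up the new compression once they are rewritten by a merge.

```python
import json


def create_index_body(codec: str = "best_compression") -> dict:
    """Build a create-index request body that sets index.codec (hypothetical helper)."""
    return {"settings": {"index": {"codec": codec}}}


body = create_index_body()

# Roughly equivalent console request (index name is an assumption):
#   PUT my-new-index
#   { ...body printed below... }
print(json.dumps(body, indent=2))
```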
How did you delete the _source? This is a mapping change. I'm not sure it can be applied to an already existing index, and even if it can, I don't think it will have any impact on documents that were already ingested.
Also, 2.17 GB is pretty small, so I'm not sure you will see a big difference in size.
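On the mapping point, a safer way to test this is to create a fresh index with _source disabled from the start and index the same data into it, then compare sizes. A minimal sketch of such a create-index body follows; the helper name and the index name in the comment are assumptions, not something from this thread.

```python
import json


def source_disabled_index_body() -> dict:
    """Create-index body with _source disabled in the mappings (hypothetical helper)."""
    return {"mappings": {"_source": {"enabled": False}}}


body = source_disabled_index_body()

# Roughly equivalent console request (index name is an assumption):
#   PUT index_name_nosource
#   { ...body printed below... }
print(json.dumps(body, indent=2))
```

Note that without _source you lose features that depend on it, such as reindexing from the index and the update API, so this is only worth it if you really do not need the original documents.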
If you have questions about the disk usage of individual fields in an index, it is simplest to use the analyze index disk usage API to investigate more deeply.
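For reference, a small sketch of how the request for that API could be built. The base URL and the helper name are assumptions; the request is only constructed here, not sent. The API requires run_expensive_tasks=true because the analysis can be resource-intensive on large indices.

```python
from urllib.request import Request


def disk_usage_request(base_url: str, index: str) -> Request:
    """Build (but do not send) an analyze-index-disk-usage request (hypothetical helper)."""
    url = f"{base_url.rstrip('/')}/{index}/_disk_usage?run_expensive_tasks=true"
    return Request(url, method="POST")


# Base URL is an assumption; point it at your own cluster.
req = disk_usage_request("http://localhost:9200", "index_name")
print(req.get_method(), req.full_url)
```

The response breaks disk usage down per field, which lets you see how much of the index _source takes up compared with the inverted index and doc values.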
Can I assume that the _source (stored) and the extracted fields (indexed) reside in the same index?
For example, an index is 2 GB in size and holds 60 million events. The size plugin shows an average document size of 2 KB, but when I calculate it from the index size using the formula (index size in MB / number of documents * 1024), I get an average of 0.33 KB. Does that mean the original _source is compressed from 2 KB to 0.33 KB?
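The arithmetic can be checked with a short sketch like the one below. The function name is made up, and the inputs are the rounded figures from the post. With exactly 2 GB and 60 million documents the formula gives about 0.035 KB per document rather than 0.33 KB, so it is worth plugging in the exact store size and document count. Also, the on-disk size covers the inverted index, doc values, and other structures as well as the compressed _source, so the ratio is not a pure _source compression ratio.

```python
def avg_doc_size_kb(index_size_gb: float, doc_count: int) -> float:
    """Average on-disk size per document in KB (hypothetical helper)."""
    index_size_kb = index_size_gb * 1024 * 1024  # GB -> MB -> KB
    return index_size_kb / doc_count


# Rounded figures quoted in the post; replace with exact values from your index.
on_disk_kb = avg_doc_size_kb(2, 60_000_000)
raw_source_kb = 2.0  # average _source size reported by the mapper-size plugin

print(f"on disk per document: {on_disk_kb:.3f} KB")
print(f"raw _source size / on-disk size: {raw_source_kb / on_disk_kb:.1f}")
```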