_source field storage (_source Field Overview)

Hello,

We just want to have a clear understanding of the _source field. Is it indexed or stored? If it is stored, where is it stored?

As an experiment, we disabled _source using PUT index_name/_mapping {enable=false}. After that we could no longer see the source on the Discover page or retrieve it in Dev Tools with GET index_name/_search, but the storage size of the index did not change. So we want to know: when we disable _source, does it delete the stored _source or just stop returning it?

In another scenario, I applied the mapper-size plugin to one index and calculated the sum of the _source field size for all the indexed documents. The result is much bigger than the index file.

Is _source stored somewhere else or stored within the index?

Thanks in advance!

This is explained in the documentation: the _source field is stored in the index, but it is not indexed.

What was the size of the index, and what sample document did you use? If you had just a couple of documents, or if your sample document is small, you will barely see any difference, but as soon as the index starts to grow (tens of GB) you will start to see the difference.

I do not use the mapper-size plugin, but if I'm not wrong it stores the size of the original _source field before compression. When the data is stored, it is compressed.

If you want to save space I would recommend that first you try to change the index.codec to best_compression instead of disabling the _source field.
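As a rough illustration of that suggestion, the sketch below only builds the request that would create an index with index.codec set to best_compression; it does not contact a cluster. The index name "my-index" and the helper name are hypothetical. Note that index.codec is a static setting, so it is normally set at index creation (or on a closed index), and existing segments only pick up a changed codec when they are rewritten by a merge.

```python
import json


def create_index_request(index_name, codec="best_compression"):
    """Return (method, path, json_body) for creating an index with a given codec.

    Hypothetical helper: it only assembles the request, it does not send it.
    """
    body = {"settings": {"index": {"codec": codec}}}
    return "PUT", f"/{index_name}", json.dumps(body, indent=2)


# "my-index" is a placeholder name, not one taken from this thread.
method, path, body = create_index_request("my-index")
print(method, path)
print(body)
```

The printed method, path, and body can be pasted into Dev Tools or sent with any HTTP client.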

Thanks, @leandrojmp, for the information; it was really helpful.

Just for my understanding, should I just assume that the 2.17 GB of my index is made up of the fields and the _source?

For your understanding, our index size was 2.17 GB when we deleted the _source, and we didn't see even a one percent change in the storage size.

How did you delete the _source? This is a mapping change. I'm not sure it can be changed on already existing indices, and even if it can, I don't think it will have any impact on already ingested documents.

Also, 2.17 GB is pretty small, so I'm not sure you will see a big difference in size.
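For reference, the documented form of this mapping nests an "enabled" flag under "_source", and it is usually supplied when the index is created. The sketch below only assembles such a create-index request; the index name and helper name are hypothetical, and whether an existing index accepts the change is an open question that depends on the Elasticsearch version.

```python
import json


def create_index_without_source(index_name):
    """Return (method, path, json_body) for a new index with _source disabled.

    Hypothetical helper: it builds the request only and sends nothing.
    """
    body = {"mappings": {"_source": {"enabled": False}}}
    return "PUT", f"/{index_name}", json.dumps(body)


# "my-new-index" is a placeholder; documents would need to be indexed again
# into the new index for the setting to affect them.
method, path, body = create_index_without_source("my-new-index")
print(method, path, body)
```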

We just disabled the _source. PUT index_name/_mapping {enable=false}

Thank you for clarifying. Your insights are appreciated.

If you have questions about the disk usage of individual fields in an index, it is simplest to use the analyze index disk usage API to investigate more deeply.
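A minimal sketch of how that API might be used: the first helper builds the request (POST /<index>/_disk_usage with run_expensive_tasks=true), and the second ranks fields by their reported total_in_bytes. The helper names, the index name, and the sample response are made up for illustration, and the response layout is assumed from the API documentation, so check it against your own cluster's output.

```python
from urllib.parse import quote


def disk_usage_request(index_name):
    """Return (method, path) for the analyze index disk usage API."""
    return "POST", f"/{quote(index_name)}/_disk_usage?run_expensive_tasks=true"


def largest_fields(response, index_name, top=5):
    """Rank fields by total_in_bytes, largest first (assumed response layout)."""
    fields = response[index_name]["fields"]
    sizes = {name: info["total_in_bytes"] for name, info in fields.items()}
    return sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)[:top]


# Invented response fragment; these numbers are not from a real index.
sample = {
    "my-index": {
        "store_size_in_bytes": 1000,
        "fields": {
            "_source": {"total_in_bytes": 600, "stored_fields_in_bytes": 600},
            "message": {"total_in_bytes": 300},
            "_id": {"total_in_bytes": 100},
        },
    }
}

print(disk_usage_request("my-index"))
for name, size in largest_fields(sample, "my-index"):
    print(name, size)
```

Looking at the _source entry in the real response shows how much of the index the stored source accounts for after compression.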

Thanks for the info, @DavidTurner. I will look into that.

Hello @DavidTurner and @leandrojmp,
Just asking for confirmation:

Can I assume that the _source (stored) and the extracted fields (indexed) reside in the same index file?

For example, an index is 2 GB and holds 60 million events. The size plugin shows an average document size of 2 KB, but when I calculate it from the index size using the formula (index size in MB / number of documents * 1024), I get an average of 0.33 KB. Does that mean the original _source is compressed from 2 KB to 0.33 KB?
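The formula in the question can be checked with a few lines of Python. This is only a sketch using the example figures quoted above, and it assumes 1 GB = 1024^3 bytes. With exactly 2 GB and 60 million documents the on-disk average works out to a few tens of bytes rather than 0.33 KB, so the quoted figures are presumably approximate. Also keep in mind that the on-disk size covers the whole index (inverted index, doc values, stored fields), so the ratio is not purely the compression of _source.

```python
def average_doc_size_bytes(index_size_bytes, doc_count):
    """Average on-disk bytes per document for the whole index."""
    return index_size_bytes / doc_count


index_size = 2 * 1024**3        # 2 GB, assuming binary units
docs = 60_000_000               # 60 million events
reported_source_bytes = 2 * 1024  # 2 KB average reported by the size plugin

avg = average_doc_size_bytes(index_size, docs)
ratio = reported_source_bytes / avg

print(f"on-disk average: {avg:.1f} bytes ({avg / 1024:.3f} KB)")
print(f"raw _source size / on-disk average: {ratio:.1f}")
```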

There's no need to assume anything, just use the API I linked above.
