Document size in Elasticsearch 7.0

I have done the steps below but am still not getting the document size in the result.

  1. Installed the mapper-size plugin.
  2. Enabled _size, and it is reflected in the index mapping:
    curl -X PUT "localhost:9200//_mapping" -H 'Content-Type: application/json' -d'
    {
      "_size": {
        "enabled": true
      }
    }'

When I execute search APIs, I still don't get the document size in the response:

GET //_search
{
"query": { "match_all": {} }
}

That's expected.
The documentation says:

The value of the _size field is accessible in queries, aggregations, scripts, and when sorting

It does not mean that it is returned in the response the way the _source field is.
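As a minimal sketch of what that documentation sentence means in practice, the request below uses _size in a range query, in a sort, and in a script field, so that each hit carries its size under "fields". It assumes the mapper-size plugin is installed and _size is enabled; ES_URL, INDEX ("my-index"), the 10-byte threshold, and the search_with_size helper are placeholders for illustration. Documents indexed before _size was enabled will likely have no value for it.

```shell
#!/usr/bin/env bash
# Placeholders (assumptions): point these at your own cluster and index.
ES_URL="${ES_URL:-http://localhost:9200}"
INDEX="${INDEX:-my-index}"

# Search body: filter on _size, sort by it, and expose it per hit as a script field.
# _size is not part of _source, so it only appears where it is explicitly requested.
SIZE_QUERY=$(cat <<'EOF'
{
  "query": { "range": { "_size": { "gt": 10 } } },
  "sort": [ { "_size": { "order": "desc" } } ],
  "script_fields": { "size": { "script": "doc['_size']" } }
}
EOF
)

# Hypothetical helper: send the search; each hit should then include fields.size.
search_with_size() {
  curl -s -X GET "$ES_URL/$INDEX/_search" \
    -H 'Content-Type: application/json' \
    -d "$SIZE_QUERY"
}
```

Nothing is sent until search_with_size is called against a live cluster.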

Sure, but is there any way to get the size of a document with a query? Something like passing a document ID and getting back its size.

What is the use case? I mean, it's just the size of the JSON content, which you can easily compute on the client side.
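A minimal sketch of that client-side computation: count the bytes of the exact JSON body you send. The mapper-size documentation describes _size as the size in bytes of the original _source, so a byte count (not a character count) is what should line up. The json_size_bytes helper and the sample document are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: byte length of a JSON body as a client would send it.
# wc -c counts bytes, so multi-byte UTF-8 characters count as more than one.
json_size_bytes() {
  # printf '%s' avoids counting a trailing newline;
  # tr strips the padding some wc implementations add.
  printf '%s' "$1" | wc -c | tr -d ' '
}

# Hypothetical sample document (not from the thread).
doc='{"detailed_description":"abc"}'
json_size_bytes "$doc"   # prints 30
```

Whitespace and key order matter: the count only matches _size if it is taken on the same serialized bytes that were indexed.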

OK, basically I have a detailed-description field in the document. This field can hold up to 50KB of data in the API request, so a document can be 50KB at most.
I am doing capacity planning for the data nodes. I know Elasticsearch uses LZ4 for compression, but I want to know the total size a document takes in Elasticsearch for 50KB of data on the client side. Is there a plugin or an API that returns the stored size of a document in Elasticsearch?

So I found a way to get the size, like this:
curl -X GET "http://localhost:9200/<index-name>/_doc/<_id>?stored_fields=_size"

Now the problem is that the size computed on the client side and the size returned by the query above are the same.
The detailed_description field is part of _source.

What is this size? Is it the size of the raw data or of the data as saved (compressed) in Elasticsearch?

Why is it not showing the compressed data size? Please advise.

If your goal is just capacity planning, what about sending real data to a real cluster and measuring the space it actually takes on disk?
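A minimal sketch of that measurement, assuming a test index already loaded with representative documents: _cat/indices can report the document count and the primary store size in bytes, and dividing one by the other gives an average on-disk cost per document. Using pri.store.size rather than store.size leaves replicas out of the figure. ES_URL, INDEX, and the two helper functions are placeholders for illustration.

```shell
#!/usr/bin/env bash
ES_URL="${ES_URL:-http://localhost:9200}"   # placeholder
INDEX="${INDEX:-my-index}"                  # placeholder

# Hypothetical helper: print "docs.count pri.store.size" as plain byte values,
# without a header row (primary shards only, so replicas are not counted).
index_store_stats() {
  curl -s "$ES_URL/_cat/indices/$INDEX?h=docs.count,pri.store.size&bytes=b"
}

# Hypothetical helper: average bytes on disk per document.
# Usage: bytes_per_doc <store_bytes> <doc_count>
bytes_per_doc() {
  awk -v bytes="$1" -v docs="$2" 'BEGIN {
    if (docs == 0) { print "0.00"; exit }
    printf "%.2f\n", bytes / docs
  }'
}

# Usage against a live cluster (not run here):
#   index_store_stats | { read -r docs bytes; bytes_per_doc "$bytes" "$docs"; }
```

Multiplying the per-document average by the expected document count (and by the number of copies) gives a rough first estimate for the data nodes.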

There are many things happening behind the scenes in addition to storing a JSON document. Think about field indices, doc_values, ...


In addition to what @dadoonet mentioned (getting the size reported on the file system), the "total size" is also influenced by the number of segments.
Compression mostly happens within a segment, so depending on your use case, if you are creating a lot of small segments, the reported size might be bigger than it will be after segment merging has happened.
So I would suggest also recording the number of segments when you poll the index size, to avoid wrong capacity planning.
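To record segment counts alongside the size, something like the sketch below could work: _cat/segments lists one row per segment, so counting the primary rows gives the segment count at sampling time, and an optional force merge down to one segment per shard shows the post-merge size for comparison. Force merging is generally only advisable on an index that is no longer being written to. ES_URL, INDEX, and the helper names are placeholders for illustration.

```shell
#!/usr/bin/env bash
ES_URL="${ES_URL:-http://localhost:9200}"   # placeholder
INDEX="${INDEX:-my-index}"                  # placeholder

# Hypothetical helper: one line per segment, "prirep size_in_bytes", no header.
list_segments() {
  curl -s "$ES_URL/_cat/segments/$INDEX?h=prirep,size&bytes=b"
}

# Hypothetical helper: read list_segments output on stdin and print
# "<primary segment count> <total primary segment bytes>".
summarize_primary_segments() {
  awk '$1 == "p" { n++; total += $2 } END { printf "%d %.0f\n", n, total }'
}

# Hypothetical helper: merge down to one segment per shard (write-once indices only).
force_merge() {
  curl -s -X POST "$ES_URL/$INDEX/_forcemerge?max_num_segments=1"
}

# Usage against a live cluster (not run here):
#   list_segments | summarize_primary_segments
```

Logging this pair together with each size sample makes it easier to tell whether a drop in size came from merging rather than from the data itself.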
