Actual Size of a document - Mapper Size Plugin

prateek_shekhar · February 27, 2023, 8:48pm

Hi All,

We recently enabled the mapper-size plugin on our ES cluster and enabled the _size field in the index mapping, as recommended in the docs: Using the _size field | Elasticsearch Plugins and Integrations [8.6] | Elastic

We are trying to keep track of the largest documents in our ES cluster, but I have observed some unexpected behavior.
We use the following query to find the 50 largest documents in the index:

{
    "from": 0,
    "size": 50,
    "query": {
        "match_all": {}
    },
    "sort": [
        {
            "_size": {
                "order": "desc"
            }
        }
    ],
    "script_fields": {
        "size": {
            "script": "doc['_size']"
        }
    },
    "_source": [
        "id"
    ]
}
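
For anyone who wants to run the same search from a script, here is a minimal Python sketch that builds the request body above. The function name build_largest_docs_query and its parameters are hypothetical, made up for illustration; the body it returns mirrors the query in this post.

```python
def build_largest_docs_query(top_n=50, source_fields=("id",)):
    """Build a search body that sorts documents by the _size field, largest first.

    Hypothetical helper: the name and parameters are not from any library.
    """
    return {
        "from": 0,
        "size": top_n,
        "query": {"match_all": {}},
        # Sort on the _size metadata field provided by the mapper-size plugin.
        "sort": [{"_size": {"order": "desc"}}],
        # Return the stored _size value alongside each hit.
        "script_fields": {"size": {"script": "doc['_size']"}},
        "_source": list(source_fields),
    }
```

The returned dict can be serialized with json.dumps and sent as the body of a _search request.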

Let's say this query returns the document with id 123 as the top result. The response reports the size of this document as 35084836 bytes (about 35 MB).

However, if I download this document with curl:
curl https://es-endpoint/index_name/_doc/123 -H "Accept: application/json" > 123.json

the downloaded file is only 1022118 bytes (about 1 MB).
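
One way to make the comparison more direct is to measure only the _source part of the downloaded response, since per the plugin documentation _size records the size in bytes of the original _source. Below is a minimal Python sketch, assuming the file saved by curl is the usual GET response with a top-level _source key. The function name source_size_bytes is hypothetical, and re-serializing the JSON will not reproduce the originally indexed bytes exactly (whitespace and escaping can differ), so the result is only an approximation.

```python
import json


def source_size_bytes(get_response):
    """Approximate size in bytes of the _source in a GET /index/_doc/id response.

    Hypothetical helper. The result approximates the indexed size because
    the original request bytes are not available after parsing.
    """
    source = get_response["_source"]
    # Compact separators and ensure_ascii=False avoid inflating the count
    # with extra whitespace or \uXXXX escapes.
    compact = json.dumps(source, separators=(",", ":"), ensure_ascii=False)
    return len(compact.encode("utf-8"))
```

Loading 123.json with json.load and passing the result to source_size_bytes gives a figure that can be set against the 35084836 bytes reported by _size.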

Can someone please explain this discrepancy to me? I might be missing something here.

