Actual Size of a document - Mapper Size Plugin

Hi All,

We recently enabled the mapper-size plugin on our ES cluster and added the _size field to the index mapping, following the recommendations in the documentation: Using the _size field | Elasticsearch Plugins and Integrations [8.6] | Elastic

We are trying to keep track of the largest documents in our ES cluster, but I observed some behavior I cannot explain.
We use the following query to find the 50 largest documents in the index:

{
    "from": 0,
    "size": 50,
    "query": {
        "match_all": {}
    },
    "sort": [
        {
            "_size": {
                "order": "desc"
            }
        }
    ],
    "script_fields": {
        "size": {
            "script": "doc['_size']"
        }
    },
    "_source": [
        "id"
    ]
}

Let's say this query returns the document with id 123 as the top result. The response reports the size of that document as 35084836 bytes (about 35 MB).
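
For reference, here is a minimal sketch of how the reported sizes can be read out of the search response. It assumes the usual response shape, where script field values come back as arrays under each hit's "fields" key; the helper name top_sizes and the sample response fragment are made up for illustration.

```python
import json


def top_sizes(search_response: dict) -> list:
    """Return (id, size_in_bytes) pairs in the order the hits were returned."""
    rows = []
    for hit in search_response.get("hits", {}).get("hits", []):
        # Prefer the "id" kept by the _source filter, fall back to the hit's _id.
        doc_id = hit.get("_source", {}).get("id", hit.get("_id"))
        # Script fields are returned as arrays under "fields".
        values = hit.get("fields", {}).get("size", [])
        size = values[0] if values else None
        rows.append((doc_id, size))
    return rows


# Hypothetical response fragment, trimmed to the parts used above.
sample = json.loads(
    '{"hits": {"hits": [{"_id": "123", "_source": {"id": 123},'
    ' "fields": {"size": [35084836]}, "sort": [35084836]}]}}'
)
print(top_sizes(sample))
```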

However, if I download this document using the curl command:
curl https://es-endpoint/index_name/_doc/123 -H "Accept: application/json" > 123.json

the downloaded file is only 1022118 bytes (about 1 MB).
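
To make the comparison a bit more like for like, the sketch below measures only the _source part of the downloaded response. As far as I understand the plugin documentation, _size records the size in bytes of the original _source as it was indexed, whereas the GET response wraps _source in metadata such as _index and _id. The function name source_size_bytes and the sample response are hypothetical, and a re-serialized _source will not necessarily match the originally indexed bytes (whitespace, key order and escaping can differ), so this is only an approximation.

```python
import json


def source_size_bytes(get_response_text: str) -> int:
    """Approximate the UTF-8 byte size of _source in a GET _doc response."""
    doc = json.loads(get_response_text)
    source = doc["_source"]
    # Compact serialization without extra whitespace; keep non-ASCII characters
    # unescaped so they are counted at their real UTF-8 width.
    compact = json.dumps(source, separators=(",", ":"), ensure_ascii=False)
    return len(compact.encode("utf-8"))


# Hypothetical GET response, much smaller than the real document.
sample = '{"_index":"index_name","_id":"123","found":true,"_source":{"id":123,"name":"é"}}'
print(len(sample.encode("utf-8")), source_size_bytes(sample))
```

The same function could be applied to the contents of 123.json, read as UTF-8 text, to get a figure to set against the value reported by _size.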

Can someone please explain this discrepancy to me? I might be missing something here.
