Hi All,
We recently enabled the mapper-size plugin on our ES cluster and added the `_size` index mapping as recommended here: Using the _size field | Elasticsearch Plugins and Integrations [8.6] | Elastic
We are trying to keep track of the largest documents in our ES cluster. However, I have observed the following behavior.
We use this query to find the 50 largest documents in the index:
```json
{
  "from": 0,
  "size": 50,
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "_size": {
        "order": "desc"
      }
    }
  ],
  "script_fields": {
    "size": {
      "script": "doc['_size']"
    }
  },
  "_source": [
    "id"
  ]
}
```
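To look at more than just the top hit, the results can be tabulated together with the index each hit came from. This is a minimal sketch: `summarize_size_hits` is a hypothetical helper, not part of any Elasticsearch client. It assumes the search response has already been parsed into a Python dict, that the script field comes back as a one-element list under `fields.size` (the usual shape for `script_fields`), and it falls back to the hit's `sort` value, which holds the `_size` used for ordering.

```python
from __future__ import annotations

from typing import Any


def summarize_size_hits(response: dict[str, Any]) -> list[tuple]:
    """Return (index, _id, source id, size) for each hit of the _size-sorted search.

    Hypothetical helper. Assumes the query above: the script field "size" is
    returned under hit["fields"]["size"] as a list, and _source is filtered
    down to the "id" field.
    """
    rows = []
    for hit in response.get("hits", {}).get("hits", []):
        size_values = hit.get("fields", {}).get("size", [])
        # Fall back to the sort value, which carries the _size used for ordering.
        sort_values = hit.get("sort") or [None]
        size = size_values[0] if size_values else sort_values[0]
        rows.append((
            hit.get("_index"),
            hit.get("_id"),
            hit.get("_source", {}).get("id"),
            size,
        ))
    return rows
```

Pass it the parsed body of the search response (for example `json.loads(body)`). Keeping `_index` in each row makes it easy to check that a later GET request targets the same index the hit came from.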
Suppose this query returns the document with ID 123 as the top result. The response reports its size as 35084836 bytes (about 35 MB).
However, if I download this document with the following curl command:

```shell
curl https://es-endpoint/index_name/_doc/123 -H "Accept: application/json" > 123.json
```

the downloaded file is only about 1022118 bytes (about 1 MB).
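Note that the file saved by curl is the full GET response, which wraps `_source` in metadata such as `_index`, `_id`, `_version` and `found`, so its size is not exactly the size of the source. As a rough cross-check, a hypothetical helper like the one below reports both numbers. It is a sketch under assumptions: `123.json` holds the raw response body, and the `_source` figure is only approximate, because re-serializing JSON does not necessarily reproduce the exact bytes that were originally sent at index time (whitespace, key order and escaping can differ).

```python
from __future__ import annotations

import json


def source_byte_sizes(get_response_body: bytes) -> dict[str, int]:
    """Compare byte counts for a saved GET <index>/_doc/<id> response.

    Hypothetical helper. Returns the size of the whole saved response and the
    size of its "_source" object re-serialized as compact UTF-8 JSON. The
    second number approximates, but need not equal, the originally indexed bytes.
    """
    doc = json.loads(get_response_body)
    source = doc.get("_source", {})
    # Compact separators and no ASCII escaping, to avoid inflating the count.
    compact = json.dumps(source, separators=(",", ":"), ensure_ascii=False)
    return {
        "response_bytes": len(get_response_body),
        "source_compact_bytes": len(compact.encode("utf-8")),
    }
```

For example, `source_byte_sizes(open("123.json", "rb").read())` returns both counts for the downloaded file.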
Can someone please explain this discrepancy? I might be missing something here.