Index size increases while deleting documents

I've just upgraded my cluster from v5.3 to v7.5 and I see strange behaviour.

I'm using the Bulk API to delete and index docs in a newly created index. I'm sure that, at the beginning, my bulk requests only contain deletions of non-existent docs, yet I see the number of deleted docs (docs.deleted) and the index size (store.size_in_bytes) keep going up.
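Roughly, my bulk requests look like this (the index name and document IDs here are placeholders, not the real ones):

POST /my-index/_bulk
{ "delete": { "_id": "1" } }
{ "delete": { "_id": "2" } }
{ "index": { "_id": "3" } }
{ "field": "value" }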

I don't see this behaviour in my ES 5.3 cluster.

Can someone shed some light on this please?

I'm running the cluster with ~20 nodes and 6 primary shards, each with 20 replicas. Below you can find my cluster & index settings.

{
  "settings": {
    "analysis": {
      "analyzer": {
        "standard_lowercase_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      },
      "normalizer": {
        "uppercase_normalizer": {
          "type": "custom",
          "filter": ["uppercase"]
        },
        "lowercase_normalizer": {
          "type": "custom",
          "filter": ["lowercase"]
        }
      }
    },
    "refresh_interval": -1,
    "number_of_shards": 6,
    "auto_expand_replicas": false,
    "search": {
      "slowlog": {
        "threshold": {
          "fetch": {
            "warn": "1s",
            "trace": "200ms",
            "debug": "500ms",
            "info": "800ms"
          },
          "query": {
            "warn": "5s",
            "trace": "200ms",
            "debug": "400ms",
            "info": "1s"
          }
        }
      }
    },
    "queries": {
      "cache": {
        "enabled": "true"
      }
    }
  }
}

cluster.name: classified-search
node.name: ${HOSTNAME}
plugin.mandatory: discovery-ec2,analysis-icu

network.bind_host: 0.0.0.0
network.publish_host: 0.0.0.0
network.host: ec2:privateIp

xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.keystore.path: elastic-certificates.p12
xpack.security.transport.ssl.truststore.path: elastic-certificates.p12
xpack.security.authc:
  anonymous:
    roles: monitoring_user
    authz_exception: true

discovery.seed_providers: ec2
discovery.ec2.endpoint: ec2.eu-west-1.amazonaws.com
discovery.ec2.host_type: private_ip
discovery.ec2.availability_zones: eu-west-1a,eu-west-1b,eu-west-1c
discovery.ec2.tag.Environment: ENVIRONMENT

path:
  data: /media/ephemeral0
  logs: /var/log/elasticsearch

http.cors.enabled: true
http.cors.allow-origin: /https?://localhost(:[0-9]+)?/

indices.queries.cache.size: 20%
indices.requests.cache.size: 20%

action.auto_create_index: .watches,.triggered_watches,.watcher-history-*,.monitoring-*,logstash*,performance*,-*

Hi @nhatnam

The reason that deleting documents at least temporarily increases the size of an index is that when Lucene, the storage engine underlying ES, deletes a document, it only adds a marker to the index saying that the document is deleted. A delete does not initially remove the document from disk. You can find detailed information on how this mechanism works under the hood in this older but still valid article.
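If you want to observe this, the deleted-but-not-yet-purged documents are visible per segment via the cat segments API. A minimal check (substitute your own index name):

GET /_cat/segments/my-index?v&h=shard,segment,docs.count,docs.deleted,size

The docs.deleted column counts documents that are marked deleted in that segment but still occupy disk space until the segment is merged away.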

As explained in that article, ES will eventually reclaim the disk space used by those deleted documents during segment merges. Also, if you are done writing to a certain index and know that you won't write to it again, you can force the reclaiming of disk space via the force merge API.
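A minimal sketch, assuming your index is called my-index (merging everything down to a single segment is the usual choice for an index that will no longer be written to):

POST /my-index/_forcemerge?max_num_segments=1

Keep in mind that a force merge is an expensive operation and is best run only against read-only indices.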

As for

I don't see this behaviour in my ES 5.3 cluster.

I think this is largely a coincidence. ES 5.3 has the same behaviour in general, but the specifics of how quickly disk space is reclaimed vary with the size of your indices, their settings, the Lucene/ES version, and so on, and are not easy to predict in a concrete case.
