Index size increases while deleting documents

I've just upgraded my cluster from v5.3 to v7.5 and I see strange behaviour.

I'm using the Bulk API to delete and index docs in a newly created index. I'm sure that, at the beginning, my bulk requests only contain deletions of non-existent docs, yet I see the number of deleted docs (docs.deleted) and the index size (store.size_in_bytes) keep going up.
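Roughly, my bulk requests look like this (the index name and document IDs here are placeholders, not the real ones):

POST /my-index/_bulk
{ "delete": { "_id": "1" } }
{ "delete": { "_id": "2" } }
{ "index": { "_id": "3" } }
{ "field": "value" }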

I don't see this behaviour in my ES 5.3 cluster.

Can someone shed some light on this please?

I'm running the cluster with ~20 nodes and 6 primary shards, each with 20 replicas. Below you can find my cluster & index settings.

{
  "settings": {
    "analysis": {
      "analyzer": {
        "standard_lowercase_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      },
      "normalizer": {
        "uppercase_normalizer": {
          "type": "custom",
          "filter": ["uppercase"]
        },
        "lowercase_normalizer": {
          "type": "custom",
          "filter": ["lowercase"]
        }
      }
    },
    "refresh_interval": -1,
    "number_of_shards": 6,
    "auto_expand_replicas": false,
    "search": {
      "slowlog": {
        "threshold": {
          "fetch": {
            "warn": "1s",
            "trace": "200ms",
            "debug": "500ms",
            "info": "800ms"
          },
          "query": {
            "warn": "5s",
            "trace": "200ms",
            "debug": "400ms",
            "info": "1s"
          }
        }
      }
    },
    "queries": {
      "cache": {
        "enabled": "true"
      }
    }
  }
}

cluster.name: classified-search
node.name: ${HOSTNAME}
plugin.mandatory: discovery-ec2,analysis-icu

network.bind_host: 0.0.0.0
network.publish_host: 0.0.0.0
network.host: ec2:privateIp

xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.keystore.path: elastic-certificates.p12
xpack.security.transport.ssl.truststore.path: elastic-certificates.p12
xpack.security.authc:
  anonymous:
    roles: monitoring_user
    authz_exception: true

discovery.seed_providers: ec2
discovery.ec2.endpoint: ec2.eu-west-1.amazonaws.com
discovery.ec2.host_type: private_ip
discovery.ec2.availability_zones: eu-west-1a,eu-west-1b,eu-west-1c
discovery.ec2.tag.Environment: ENVIRONMENT

path:
  data: /media/ephemeral0
  logs: /var/log/elasticsearch

http.cors.enabled: true
http.cors.allow-origin: /https?://localhost(:[0-9]+)?/

indices.queries.cache.size: 20%
indices.requests.cache.size: 20%

action.auto_create_index: .watches,.triggered_watches,.watcher-history-*,.monitoring-*,logstash*,performance*,-*

Hi @nhatnam

The reason that deleting documents at least temporarily increases the size of an index is that when Lucene, the storage engine underlying ES, deletes a document, it only adds a marker to the index saying that the document is deleted. A delete does not initially remove the document from disk. You can find detailed information on how this mechanism works under the hood in this older but still valid article.
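If you want to observe this, the deleted-but-not-yet-purged documents are visible per segment via the cat segments API. A minimal check (substitute your own index name):

GET /_cat/segments/my-index?v&h=shard,segment,docs.count,docs.deleted,size

The docs.deleted column counts documents that are marked deleted in that segment but still occupy disk space until the segment is merged away.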

As explained in that article, ES will eventually reclaim the disk space used by those deleted documents during segment merges. Also, if you are done writing to a certain index and know that you won't write to it again, you can force the reclaiming of disk space via the force merge API.
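A minimal sketch, assuming your index is called my-index (merging everything down to a single segment is the usual choice for an index that will no longer be written to):

POST /my-index/_forcemerge?max_num_segments=1

Keep in mind that a force merge is an expensive operation and is best run only against read-only indices.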

As for

I don't see this behaviour in my ES 5.3 cluster.

I think this is largely a coincidence. ES 5.3 has the same behaviour in general, but the specifics of how quickly disk space is reclaimed vary with the size of your indices, their settings, the Lucene/ES version, and so on, and are not easy to predict in a concrete case.
