Disk usage after indexing much higher than after service restart

I noticed that after indexing a large number of documents into my cluster (currently restricted to a single server), I end up with a disk usage of e.g. 90 GByte (du -sh /var/lib/elasticsearch/mycluster).
If I then restart the Elasticsearch service (sudo service elasticsearch restart), the size of that directory drops sharply to 68 GByte.
Why is that so?
What can I do to free the space without restarting the service?

The phenomenon resembles this issue:

but I am running ES 2.4

Depends, could be merges, translog or other things.
What do the APIs (eg _cat) show about disk use?
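For reference, a few cat endpoints that break down disk use at different levels (assuming the default port on localhost; the ?v flag just adds column headers):

```shell
# Per-index totals, including store.size
curl -XGET 'http://localhost:9200/_cat/indices?v'

# Disk used and available per node
curl -XGET 'http://localhost:9200/_cat/allocation?v'

# Per-segment sizes and commit state, useful for spotting
# segments that are awaiting a merge or commit
curl -XGET 'http://localhost:9200/_cat/segments?v'
```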


Hi Mark,

Which commands should I run in particular? E.g.:

curl -XGET 'http://localhost:9200/_cat/indices'
curl -XGET 'http://localhost:9200/_stats'

The _cat/indices output is pretty limited, and the _stats output is very large. What should I look for?

OK, I think I now know what you mean: I should use the _cat API instead of "du -sh", because it shows the size of the index itself rather than the size of the directory.
I will now re-index and check it out.

OK, so I reindexed and then collected all kinds of data before and after restarting the service.

_cat/indices output was simply

health status index         pri rep docs.count docs.deleted store.size pri.store.size
yellow open   myindex   5   1   96414689            0       97gb           97gb

before and:

health status index         pri rep docs.count docs.deleted store.size pri.store.size
yellow open   myindex   5   1   96414689            0     69.9gb         69.9gb

afterwards

As you can see, I have the default settings, so five primary shards in total.
For each of these, the directory

/var/lib/elasticsearch/mycluster/nodes/0/indices/myindex/0/index

shows a du -sh of 20 GB before, and 16 GB afterwards.

translog folders are very small in both cases (around 20 KB).

So I ran

du  /var/lib/elasticsearch/mycluster/nodes/0/indices/myindex/0/index/* | sort -n 

on each of the nodes before and afterwards to get an idea of which files change in size.

After the restart, the files in this directory either stayed constant in size, or they completely disappeared.
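A quick way to find exactly which files vanished is to diff sorted file listings taken before and after the restart. A minimal sketch using comm (the file names below are made-up stand-ins; in practice you would capture the two lists with ls on the shard's index directory):

```shell
#!/bin/sh
# Stand-in listings for illustration; capture real ones with e.g.
#   ls /var/lib/elasticsearch/.../index | sort > before.txt
printf '_t0.cfs\n_3v.nvd\n_rc.si\n' | sort > before.txt
printf '_rc.si\n_rc.cfs\n' | sort > after.txt

# comm -23 prints lines unique to the first sorted file,
# i.e. files that existed before the restart but not after.
comm -23 before.txt after.txt
```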

The largest of the disappeared files for a sample directory were (first column is size in KByte):

284632	/var/lib/elasticsearch/mycluster/nodes/0/indices/myindex/0/index/_t0.cfs
370488	/var/lib/elasticsearch/mycluster/nodes/0/indices/myindex/0/index/_3v.nvd
477448	/var/lib/elasticsearch/mycluster/nodes/0/indices/myindex/0/index/_wy.cfs
506380	/var/lib/elasticsearch/mycluster/nodes/0/indices/myindex/0/index/_hl.cfs
608492	/var/lib/elasticsearch/mycluster/nodes/0/indices/myindex/0/index/_pc.cfs
820228	/var/lib/elasticsearch/mycluster/nodes/0/indices/myindex/0/index/_jt.nvd
838104	/var/lib/elasticsearch/mycluster/nodes/0/indices/myindex/0/index/_rl.cfs
1269064	/var/lib/elasticsearch/mycluster/nodes/0/indices/myindex/0/index/_mm.nvd

Can you roughly tell me what happens there?

I can also email you a complete list of the files if you like.

Thanks!

By the way, this is an index with a lot of nested documents, maybe this has some special effects.

Does _cat/segments change between reboots?

Before service restart, the following exist in _cat/segments, but not after:

index         shard prirep ip        segment generation docs.count docs.deleted    size size.memory committed searchable version compound     
my_index      0     p      127.0.0.1 _h2            614    1768000            0   1.3gb           0 true      false      5.5.2   false    
my_index      0     p      127.0.0.1 _i7            655     410189            0 262.9mb           0 true      false      5.5.2   true     
my_index      0     p      127.0.0.1 _kc            732     820267            0   479mb           0 true      false      5.5.2   true     
my_index      0     p      127.0.0.1 _mg            808         75            0  36.2kb           0 true      false      5.5.2   true     
my_index      0     p      127.0.0.1 _n9            837    1489193            0     1gb           0 true      false      5.5.2   false    
my_index      0     p      127.0.0.1 _ou            894    2131694            0   1.5gb           0 true      false      5.5.2   false    
my_index      0     p      127.0.0.1 _py            934     166920            0   105mb           0 true      false      5.5.2   true     
my_index      0     p      127.0.0.1 _q9            945     490438            0 282.4mb           0 true      false      5.5.2   true     
my_index      0     p      127.0.0.1 _r1            973      16775            0   7.9mb           0 true      false      5.5.2   true     

Plus the following entry changes in the "committed" column (i.e. it is "true" after restart):

my_index      0     p      127.0.0.1 _rc            984    7293551            0   5.5gb     1542548 **false**     true       5.5.2   false    

Cheers,
Benjamin

Looks like things are being merged and then compressed.

OK, but will this eventually happen by itself, or do I have to restart the service? I am worried that if my index grows to a terabyte, I will have a few hundred gigabytes lying around that only get cleaned up after a service restart.

It's an automatic process, see https://www.elastic.co/guide/en/elasticsearch/reference/5.2/index-modules-merge.html
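Merging does run automatically in the background, but if you want to reclaim space sooner without restarting, you can trigger it by hand with the force-merge API (assuming your index is named myindex and the default port; use with caution, since it is I/O-intensive and is best run on indices that are no longer being written to):

```shell
# Merge each shard of the index down to a single segment,
# which drops the superseded segment files from disk.
curl -XPOST 'http://localhost:9200/myindex/_forcemerge?max_num_segments=1'
```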

OK, thanks.
I will watch the cluster and see whether the merging happens after some time (but I trust that ES does what the specs say). :slight_smile:

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.