Disk usage after indexing much higher than after service restart

I noticed that after indexing a large number of documents into my cluster (currently restricted to a single server), I end up with a disk usage of e.g. 90 GByte (du -sh /var/lib/elasticsearch/mycluster).
If I then restart the Elasticsearch service (sudo service elasticsearch restart), the size of that directory drops sharply to 68 GByte.
Why is that so?
What can I do to free the space without restarting the service?

The phenomenon resembles this issue:

but I am running ES 2.4

Depends, could be merges, translog or other things.
What do the APIs (eg _cat) show about disk use?
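For reference, a few cat endpoints that break down disk use at different levels (assuming the default port on localhost; the ?v flag just adds column headers):

```shell
# Per-index totals, including store.size
curl -XGET 'http://localhost:9200/_cat/indices?v'

# Disk used and available per node
curl -XGET 'http://localhost:9200/_cat/allocation?v'

# Per-segment sizes and commit state, useful for spotting
# segments that are awaiting a merge or commit
curl -XGET 'http://localhost:9200/_cat/segments?v'
```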


Hi Mark,

Which commands should I run in particular? E.g.:

curl -XGET 'http://localhost:9200/_cat/indices'
curl -XGET 'http://localhost:9200/_stats'

The _cat/indices output is pretty limited, and the _stats output is very large. What should I look for?

OK, I think I now know what you mean: I should use the _cat API instead of "du -sh", because it shows the size of the index itself rather than the size of the directory.
I will now re-index and check it out.

OK, so I reindexed and then collected all kinds of data before and after restarting the service.

_cat/indices output was simply

health status index         pri rep docs.count docs.deleted store.size pri.store.size
yellow open   myindex   5   1   96414689            0       97gb           97gb

before and:

health status index         pri rep docs.count docs.deleted store.size pri.store.size
yellow open   myindex   5   1   96414689            0     69.9gb         69.9gb

afterwards

As you can see, I have the default settings, so five primary shards in total.
For each of these, the directory

/var/lib/elasticsearch/mycluster/nodes/0/indices/myindex/0/index

shows a du -sh of 20 GB before, and 16 GB afterwards.

translog folders are very small in both cases (around 20 KB).

So I ran

du  /var/lib/elasticsearch/mycluster/nodes/0/indices/myindex/0/index/* | sort -n 

on each of the nodes before and afterwards to get an idea of which files change in size.

After the restart, the files in this directory either stayed constant in size, or they completely disappeared.
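A quick way to find exactly which files vanished is to diff sorted file listings taken before and after the restart. A minimal sketch using comm (the file names below are made-up stand-ins; in practice you would capture the two lists with ls on the shard's index directory):

```shell
#!/bin/sh
# Stand-in listings for illustration; capture real ones with e.g.
#   ls /var/lib/elasticsearch/.../index | sort > before.txt
printf '_t0.cfs\n_3v.nvd\n_rc.si\n' | sort > before.txt
printf '_rc.si\n_rc.cfs\n' | sort > after.txt

# comm -23 prints lines unique to the first sorted file,
# i.e. files that existed before the restart but not after.
comm -23 before.txt after.txt
```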

The largest of the disappeared files for a sample directory were (first column is size in KByte):

284632	/var/lib/elasticsearch/mycluster/nodes/0/indices/myindex/0/index/_t0.cfs
370488	/var/lib/elasticsearch/mycluster/nodes/0/indices/myindex/0/index/_3v.nvd
477448	/var/lib/elasticsearch/mycluster/nodes/0/indices/myindex/0/index/_wy.cfs
506380	/var/lib/elasticsearch/mycluster/nodes/0/indices/myindex/0/index/_hl.cfs
608492	/var/lib/elasticsearch/mycluster/nodes/0/indices/myindex/0/index/_pc.cfs
820228	/var/lib/elasticsearch/mycluster/nodes/0/indices/myindex/0/index/_jt.nvd
838104	/var/lib/elasticsearch/mycluster/nodes/0/indices/myindex/0/index/_rl.cfs
1269064	/var/lib/elasticsearch/mycluster/nodes/0/indices/myindex/0/index/_mm.nvd

Can you roughly tell me what happens there?

I can also email you a complete list of the files if you like.

Thanks!

By the way, this is an index with a lot of nested documents, maybe this has some special effects.

Does _cat/segments change between reboots?

Before service restart, the following exist in _cat/segments, but not after:

index         shard prirep ip        segment generation docs.count docs.deleted    size size.memory committed searchable version compound     
my_index      0     p      127.0.0.1 _h2            614    1768000            0   1.3gb           0 true      false      5.5.2   false    
my_index      0     p      127.0.0.1 _i7            655     410189            0 262.9mb           0 true      false      5.5.2   true     
my_index      0     p      127.0.0.1 _kc            732     820267            0   479mb           0 true      false      5.5.2   true     
my_index      0     p      127.0.0.1 _mg            808         75            0  36.2kb           0 true      false      5.5.2   true     
my_index      0     p      127.0.0.1 _n9            837    1489193            0     1gb           0 true      false      5.5.2   false    
my_index      0     p      127.0.0.1 _ou            894    2131694            0   1.5gb           0 true      false      5.5.2   false    
my_index      0     p      127.0.0.1 _py            934     166920            0   105mb           0 true      false      5.5.2   true     
my_index      0     p      127.0.0.1 _q9            945     490438            0 282.4mb           0 true      false      5.5.2   true     
my_index      0     p      127.0.0.1 _r1            973      16775            0   7.9mb           0 true      false      5.5.2   true     

Plus the following entry changes in the "committed" column (i.e. it is "true" after restart):

my_index      0     p      127.0.0.1 _rc            984    7293551            0   5.5gb     1542548 **false**     true       5.5.2   false    

Cheers,
Benjamin

Looks like things are being merged and then compressed.

OK, but will this eventually happen by itself, or do I have to restart the service? I am worried that if my index grows to a terabyte, I will have a few hundred gigabytes lying around that only get cleaned up after a service restart.

It's an automatic process, see https://www.elastic.co/guide/en/elasticsearch/reference/5.2/index-modules-merge.html
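Merging does run automatically in the background, but if you want to reclaim space sooner without restarting, you can trigger it by hand with the force-merge API (assuming your index is named myindex and the default port; use with caution, since it is I/O-intensive and is best run on indices that are no longer being written to):

```shell
# Merge each shard of the index down to a single segment,
# which drops the superseded segment files from disk.
curl -XPOST 'http://localhost:9200/myindex/_forcemerge?max_num_segments=1'
```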

OK, thanks.
I will watch the cluster and see whether the merging happens after some time (but I trust that ES does what the specs say). :slight_smile:

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.