Elasticsearch storage needs grow fast

Hello everyone,

I have encountered some strange behavior that I'm trying to understand.
We have a SQL Server database with some data, and we index all of it into Elasticsearch.
After this initial indexing, the space used on the drive is 11 GB.

Our workflow with Elasticsearch is as follows:
First, we continuously update, delete, and index new documents.
Second, we run search queries against Elasticsearch the whole time.

After a few days, the storage used by the \data\ folder had grown to 100 GB, which was a problem for us since that is all the storage we had allocated to Elasticsearch.

What can explain this much growth, and how can I control it, plan for it, or get back to the original state?

Thanks,
Matthias.

Are there some folders/files within the data directory that take up most of the space compared to the rest?
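
For example, the cat indices API shows the size on disk per index (a minimal sketch, assuming a single node reachable on localhost:9200; adjust the host as needed):

import requests

# List every index with its total and primary-shard size on disk.
resp = requests.get(
    "http://localhost:9200/_cat/indices",
    params={"v": "", "h": "index,docs.count,store.size,pri.store.size"},
)
print(resp.text)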

What is your deleted document count looking like?
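
You can check that with the cat indices API as well, e.g. (a sketch, again assuming localhost:9200):

import requests

# docs.deleted counts documents that are marked as deleted but still
# occupy disk space until their segments are merged away.
resp = requests.get(
    "http://localhost:9200/_cat/indices",
    params={"v": "", "h": "index,docs.count,docs.deleted"},
)
print(resp.text)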

Hello Javanna & Warkolm.

I can't really answer your questions, since we had to clean out the storage to keep the system running (it was in production). But last night I tried to reproduce the problem with the latest version of Elasticsearch (2.4). I could not reproduce it, but I made some observations that I would like to share.

First, I indexed 30,000 documents, which took 160 MB on disk. I then updated these documents continuously, replacing an analyzed text field with new text with the same number of characters and the same format.
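
The update loop looked roughly like this (a simplified sketch, not our real code; the index name, type name, and text generator are placeholders):

import random
import string
import requests

ES = "http://localhost:9200"

def random_text(n_words=50, word_len=8):
    # New text with the same number of characters and the same format.
    return " ".join(
        "".join(random.choice(string.ascii_lowercase) for _ in range(word_len))
        for _ in range(n_words)
    )

# Index 30,000 documents.
for i in range(30000):
    requests.put("%s/test/doc/%d" % (ES, i), json={"body": random_text()})

# Then overwrite them continuously (stop with Ctrl-C).
while True:
    doc_id = random.randrange(30000)
    # Re-indexing with the same id is an update: the old version is only
    # marked as deleted and stays on disk until a segment merge removes it.
    requests.put("%s/test/doc/%d" % (ES, doc_id), json={"body": random_text()})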

The storage usage changed a lot, as you can see here: http://i.imgur.com/4ZPTEwR.png

The maximum storage needed peaked at 1306 MB.
The minimum storage required varied over time between approximately 200 MB and 400 MB.

One last observation I would like to note: when I stopped the indexing process, the storage stayed at 1000 MB; then I stopped Elasticsearch, and the storage went as low as 120 MB.

Does someone with some inside knowledge know why this happens?

Thanks! If I'm not clear or you need more information, post a comment! :slight_smile:

Hello everyone,

I continued to look around to better understand how Elasticsearch stores data and why the index storage requirements vary that much.

I tried the same experiment with Elasticsearch 1.7.x, and the values followed a very similar path.
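
For anyone digging into the same thing: per-segment sizes and deleted-document counts can be listed with the cat segments API, and a force merge reclaims the space held by deleted documents. A sketch (assuming localhost:9200 and an index named test; on 1.7.x the equivalent of _forcemerge is the _optimize endpoint):

import requests

ES = "http://localhost:9200"

# Per-segment view: size on disk and how many deleted docs each segment holds.
print(requests.get(ES + "/_cat/segments", params={"v": ""}).text)

# Merge segments and expunge deleted documents (2.x endpoint; on 1.7.x
# use POST /test/_optimize?only_expunge_deletes=true instead).
requests.post(ES + "/test/_forcemerge", params={"only_expunge_deletes": "true"})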

Plus one more question: what's the point of the disk watermarks?
"cluster.routing.allocation.disk.watermark.low": "80%",
"cluster.routing.allocation.disk.watermark.high": "50gb",

Thanks,
Matthias.

The docs explain that - https://www.elastic.co/guide/en/elasticsearch/reference/2.4/disk-allocator.html

It explains how it works, but not why Elasticsearch has this mechanism.

Thanks for your answer.

The point is to prevent disks filling up and forcing ES to stop.
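
Once a node passes the low watermark, no new shards are allocated to it; past the high watermark, Elasticsearch actively tries to relocate shards away from it. The thresholds can be changed at runtime, for example (a sketch, assuming localhost:9200):

import requests

# Transient settings are reset on a full cluster restart; use "persistent"
# to keep them. Values can be a percentage of disk used or an absolute size.
requests.put(
    "http://localhost:9200/_cluster/settings",
    json={
        "transient": {
            "cluster.routing.allocation.disk.watermark.low": "80%",
            "cluster.routing.allocation.disk.watermark.high": "50gb",
        }
    },
)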