Data directory is still growing after initial index build


(4b3l) #1

Hi,

I have a 3-node ES cluster and have built an index in it; the build took a while to complete.

I've noticed that since the index was built, the data directory is still growing, and I'm not sure why or where to start troubleshooting. Our application is idle, so there aren't any requests going through.

Anyone have any ideas or point me in the right direction?


(David Pilato) #2

Might a segment merge be happening behind the scenes?
Monitoring (available with the free X-Pack Basic license) could help you see whether that is the case.
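For example, the merge section of the index stats API reports how many merges are currently running (a standard API, shown here only as a sketch):

GET _stats/merge

A non-zero merges.current in the response means a merge is in progress at that moment; GET _nodes/stats/indices/merge gives the same information broken down per node.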


(4b3l) #3

Thanks for your reply. I'm not sure which specific monitoring to use.

I used _cat/segments?v and _segments, but neither tells me whether any segment merging is currently going on.

I don't see anything in the X-Pack API documentation.


(David Pilato) #4

X-Pack monitoring collects information about segment counts, along with a lot of other metrics.
You should give it a try.


(4b3l) #5

Thanks. From reading the documentation, segment merging is something that happens constantly behind the scenes, but it is triggered when new data comes in.

Our system has been idle for over a week, i.e. no new data going in.

It seems strange for something like this to run for over a week, or is this just how Lucene behaves?


(David Pilato) #6

Our system has been idle for over a week, i.e. no new data going in.

Then it looks strange indeed. Could you run:

GET _cat/indices?v

(4b3l) #7

The list is too big to copy and paste. Is there anything specific to look for?

Health is green and status is open for all indices.
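If it helps, the _cat/indices output can be sorted by size to surface the likely culprits without pasting the whole list (s and h are standard _cat parameters; the column selection here is just an example):

GET _cat/indices?v&s=store.size:desc&h=index,docs.count,store.size

Running that twice a few minutes apart and comparing the top entries is usually enough to spot what is growing.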


(David Pilato) #8

You can share it on gist.github.com and just add the link here.


(4b3l) #9

Thanks. Here is the URL:


(David Pilato) #10

Can you explain what all those indices are?
Can you run the same command again and compare, to see which ones are still growing?

As you are using X-Pack, the number of documents in indices like .monitoring-es-* is certainly still increasing, but it should not be by that much.
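For instance, to watch just the monitoring indices, the same cat API can be restricted to that index pattern (a sketch; the pattern matches the standard monitoring index names):

GET _cat/indices/.monitoring-*?v&s=store.size:desc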


(4b3l) #11

I just ran it again and it seems the same; I will run it once more in an hour to confirm.

The indices hold our own data; we have created our own naming convention.


(David Pilato) #12

OK, but that's 600 indices on 3 nodes, which is starting to be a lot.
It is probably also a waste of resources, as you have at most a few hundred MB per shard.

Also notice that the biggest indices you have are the monitoring ones (several GB).
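To see where the space actually goes per shard, the shards cat API can be sorted by store size (a standard API; sketch only):

GET _cat/shards?v&s=store:desc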


(4b3l) #13

I just did a comparison, and the .monitoring-es-* indices seem to grow to a maximum of 5.7-5.8 GB before a new one is created each day. Reading some background, monitoring data is kept for up to 7 days on a Basic license. What is the purpose of these indices?


(David Pilato) #14

What is the purpose of these indices?

They are used by X-Pack Monitoring.
You can change xpack.monitoring.collection.interval (which defaults to 10s). See https://www.elastic.co/guide/en/elasticsearch/reference/6.1/monitoring-settings.html#monitoring-collection-settings

You can also change xpack.monitoring.history.duration to 1d, so that only one day of monitoring data is kept online.

In any case, that is why your disk usage is increasing.
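As a sketch, both settings can go into elasticsearch.yml on each node (the 30s interval below is only an illustrative value, not a recommendation):

# elasticsearch.yml
xpack.monitoring.collection.interval: 30s
xpack.monitoring.history.duration: 1d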


(4b3l) #15

Thank you for your help and patience :grinning:


(David Pilato) #16

You're very welcome.

Just a note. Let me highlight what I said previously, as you might run into trouble in the future:


(4b3l) #17

Is there a guideline for how many indices we should have, or an article on this subject?


(David Pilato) #18

May I suggest you look at the following resources about sizing:

https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing


(system) #19

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.