What does Elasticsearch do when restarting a cluster?


#1

I have an Elasticsearch cluster consisting of 3 nodes and a pretty big index (~1500 fields). When it restarts, either from a "clean" restart or a forced shutdown, it takes a while to come back up, it hogs the JVM heap, and indexing slows down, as described in threads such as:


My question is not how to handle these issues, but to understand: what does Elasticsearch try to do when the cluster is restarting? What is happening behind the scenes that causes this slowdown and heavy memory usage?
Which configuration settings does it relate to?
I need to be able to explain the process to my peers and then think about how to keep a cluster restart from stalling other processes on the machine that also use JVM heap memory.
Thanks!


(David Turner) #2

Elasticsearch is preparing everything it needs to serve searches and indexing requests, loading various data structures from disk and doing other preparatory work.

The threads you linked were nothing to do with startup, so I don't see how they're relevant here.

Elasticsearch runs in a JVM on its own. It does not share a heap with any other process.

However, let me turn the question around: why does the startup performance matter to you? A properly configured 3-node cluster only needs to restart from cold after a serious disaster affecting multiple nodes.


#3

Thanks for your answer.

Elasticsearch is preparing everything it needs to serve searches and indexing requests, loading 
various data structures from disk and doing other preparatory work

What preparations? Reindexing? Remapping?
Does it matter if the data currently stored in Elasticsearch is large in size or has a large mapping?

I have two Java processes in my VM, one of which is Elasticsearch. When I push a large number of files to Elasticsearch (~1,000,000 at a time for a while, each with a large number of fields), then stop Elasticsearch and restart it to see what would happen if the machine crashed, the second process slows to a crawl (or even crashes completely), and the Elasticsearch log tells me that:

  1. "now throttling indexing for shard", "segment writing can't keep up", and later "stop throttling indexing for shards"
  2. The JVM garbage collector reports overhead, as in the posts I linked to (I know that the posts themselves are not related to cluster restart, but I see the same log messages when I restart my cluster).
  3. etc

Again, I'm not here to understand the particular issues I face when I restart. I'm just trying to understand what Elasticsearch goes through when restarting, how many tasks it has, and how the number of docs or fields in the mapping affects the restart.


(David Turner) #4

No, it does not do any reindexing unless you ask it to. I do not know what you mean by remapping.

Yes, I would expect it to take longer to restart if it has more data and/or if the data are more complex.

You can find out what the Elasticsearch process is doing at any time using the hot threads API or by taking a thread dump with jstack. You can also increase the logging level to get more information in the logs.
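As a rough illustration of the two API-based options above, here is a minimal Python sketch. It assumes a node reachable at `http://localhost:9200` without authentication, and the helper names (`hot_threads_url`, `logging_settings_body`, `fetch_hot_threads`, `set_log_level`) are made up for this example. The logger `org.elasticsearch.indices.recovery` is only one possible choice; pick whichever package you want more detail from.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a local node with no TLS or authentication.
BASE_URL = "http://localhost:9200"


def hot_threads_url(base_url=BASE_URL, threads=3, interval="500ms"):
    """Build the URL for the nodes hot threads API."""
    query = urllib.parse.urlencode({"threads": threads, "interval": interval})
    return f"{base_url}/_nodes/hot_threads?{query}"


def logging_settings_body(logger="org.elasticsearch.indices.recovery",
                          level="DEBUG"):
    """Build a cluster settings body that changes one logger's level."""
    return {"persistent": {f"logger.{logger}": level}}


def fetch_hot_threads(base_url=BASE_URL):
    """Return the plain-text hot threads report for all nodes."""
    with urllib.request.urlopen(hot_threads_url(base_url), timeout=10) as resp:
        return resp.read().decode("utf-8")


def set_log_level(base_url=BASE_URL, **kwargs):
    """PUT the logging change to the cluster settings API."""
    body = json.dumps(logging_settings_body(**kwargs)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/_cluster/settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as resp:
        return json.load(resp)
```

Calling `print(fetch_hot_threads())` a few times while a node is starting up should show which threads are busiest at that moment. Passing `level=None` to `set_log_level` sends a `null` value, which resets the logger to its default level.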

I would expect a certain amount of GC activity when starting up a node - it's working as hard as it can trying to get the node back up as quickly as possible, so it will try and use all the resources that you've allowed it to use. This includes CPU, memory and I/O bandwidth. If it's a problem that it's affecting other processes then it sounds like you will need to isolate these processes better.


#5

Awesome.
Thank you for your time and patience!