Elasticsearch 5 stops during long indexing

Luca_Mastrostefano · May 12, 2017, 10:36am

Hi all,

I'm indexing 2B small documents on a single node with ES 5.4.

Documents are organized in ~3K indexes, for a total of 2TB. The indexes' occupancy varies from few KB to hundreds of GB and they are sharded in order to keep each shard under 5GB.

I'm indexing with 14 threads concurrently on different indexes with bulk requests of 2K documents each. The server has 16CPU and 32GB of RAM and no reads are performed during this indexing process.

Relevant ES configurations are:

Xms12g -Xmx12g in jvm.options
bootstrap.memory_lock:true
indices.memory.index_buffer_size: 95%
MAX_OPEN_FILES=1000000

After 50~100M indexed documents ES starts to slow down drastically.

In the logs I read:

[2017-05-11T16:44:24,751][WARN ][o.e.m.j.JvmGcMonitorService] [VT0Xr1c] [gc][2141] overhead, spent [13.5s] collecting in the last [14.3s]
[2017-05-11T16:44:40,323][WARN ][o.e.m.j.JvmGcMonitorService] [VT0Xr1c] [gc][old][2143][337] duration [14s], collections [1]/[14.5s], total [14s]/[54.2m], memory [11.8gb]->[11.4gb]/[11.8gb], all_pools {[young] [865.3mb]->[533.8mb]/[865.3mb]}{[survivor] [75mb]->[0b]/[108.1mb]}{[old] [10.9gb]->[10.9gb]/[10.9gb]}
[2017-05-11T16:44:40,323][WARN ][o.e.m.j.JvmGcMonitorService] [VT0Xr1c] [gc][2143] overhead, spent [14s] collecting in the last [14.5s]
[2017-05-11T16:44:53,004][WARN ][o.e.m.j.JvmGcMonitorService] [VT0Xr1c] [gc][old][2145][338] duration [10.8s], collections [1]/[11.6s], total [10.8s]/[54.4m], memory [11.8gb]->[11.4gb]/[11.8gb], all_pools {[young] [865.3mb]->[538.9mb]/[865.3mb]}{[survivor] [44.9mb]->[0b]/[108.1mb]}{[old] [10.9gb]->[10.9gb]/[10.9gb]}
[2017-05-11T16:44:53,004][WARN ][o.e.m.j.JvmGcMonitorService] [VT0Xr1c] [gc][2145] overhead, spent [10.8s] collecting in the last [11.6s]
[2017-05-11T16:45:05,141][WARN ][o.e.m.j.JvmGcMonitorService] [VT0Xr1c] [gc][old][2147][339] duration [11s], collections [1]/[11.1s], total [11s]/[54.6m], memory [11.8gb]->[11.4gb]/[11.8gb], all_pools {[young] [865.3mb]->[558.3mb]/[865.3mb]}{[survivor] [103.1mb]->[0b]/[108.1mb]}{[old] [10.9gb]->[10.9gb]/[10.9gb]}
[2017-05-11T16:45:05,142][WARN ][o.e.m.j.JvmGcMonitorService] [VT0Xr1c] [gc][2147] overhead, spent [11s] collecting in the last [11.1s]
[2017-05-11T16:45:19,928][WARN ][o.e.m.j.JvmGcMonitorService] [VT0Xr1c] [gc][old][2149][340] duration [13s], collections [1]/[13.7s], total [13s]/[54.8m], memory [11.8gb]->[11.5gb]/[11.8gb], all_pools {[young] [865.3mb]->[570.1mb]/[865.3mb]}{[survivor] [48.9mb]->[0b]/[108.1mb]}{[old] [10.9gb]->[10.9gb]/[10.9gb]}
[2017-05-11T16:45:19,928][WARN ][o.e.m.j.JvmGcMonitorService] [VT0Xr1c] [gc][2149] overhead, spent [13s] collecting in the last [13.7s]
[2017-05-11T16:45:35,926][WARN ][o.e.m.j.JvmGcMonitorService] [VT0Xr1c] [gc][old][2152][341] duration [13.7s], collections [1]/[13.8s], total [13.7s]/[55.1m], memory [11.8gb]->[11.5gb]/[11.8gb], all_pools {[young] [865.3mb]->[575mb]/[865.3mb]}{[survivor] [104.6mb]->[0b]/[108.1mb]}{[old] [10.9gb]->[10.9gb]/[10.9gb]}
[2017-05-11T16:45:35,931][WARN ][o.e.m.j.JvmGcMonitorService] [VT0Xr1c] [gc][2152] overhead, spent [13.7s] collecting in the last [13.8s]
[2017-05-11T16:45:49,919][WARN ][o.e.m.j.JvmGcMonitorService] [VT0Xr1c] [gc][old][2154][342] duration [12.8s], collections [1]/[12.9s], total [12.8s]/[55.3m], memory [11.8gb]->[11.5gb]/[11.8gb], all_pools {[young] [865.3mb]->[577.1mb]/[865.3mb]}{[survivor] [102.3mb]->[0b]/[108.1mb]}{[old] [10.9gb]->[10.9gb]/[10.9gb]}
[2017-05-11T16:45:49,919][WARN ][o.e.m.j.JvmGcMonitorService] [VT0Xr1c] [gc][2154] overhead, spent [12.8s] collecting in the last [12.9s]
[2017-05-11T16:46:03,976][WARN ][o.e.m.j.JvmGcMonitorService] [VT0Xr1c] [gc][old][2156][343] duration [12.1s], collections [1]/[13s], total [12.1s]/[55.5m], memory [11.8gb]->[11.5gb]/[11.8gb], all_pools {[young] [865.3mb]->[601mb]/[865.3mb]}{[survivor] [50.9mb]->[0b]/[108.1mb]}{[old] [10.9gb]->[10.9gb]/[10.9gb]}

Looking at the end dates of the GC, they all are quite the same of the start dates of the next GC. These GCs have a duration of more than 10seconds and it seems that they are blocking ES to prevent Java from going out of memory. As soon as ES halts for GC, the bulk requests go in timeout (30 seconds or more) and the indexing process fails.

With ES 2.4 I hadn't this problem. I'm using ES 5.4 because I read that now it splits better the heap among the indexes.

Am I doing anything wrong? What should I change in order to keep the performance high during the entire indexing process?

PS: the same question is posted on Stackoverflow: Elasticsearch 5 stops during long indexing

Thanks!

system · June 9, 2017, 10:37am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
JVM pause,Elasticsearch stop Elasticsearch	3	2507	July 5, 2017
ES hangs while indexing(very slow indexing) Elasticsearch	2	1235	July 5, 2017
Elasticsearch bulk slows down after a certain amount of documents Elasticsearch	4	1363	April 24, 2020
ES searching and indexing speed reduced after processing 600milion records Elasticsearch	10	956	July 6, 2017
Bulk indexing stalls (7.2.0) Elasticsearch	11	2136	July 25, 2019

Elasticsearch 5 stops during long indexing

Related topics