Too many Lucene merge threads while indexing

pokaleshrey · June 21, 2019, 7:44am

We are indexing into 2 indices, each with one shard and one replica.
The average document size is 15 KB. Total documents: 35 million.
ES version 7.1.1
2 data nodes, 3 master nodes and 2 coord nodes.
refresh_interval is disabled.
Indexing is done with bulk requests (BulkProcessor).

Query
Indexing performance seems too slow. On profiling, I saw that there are about 400 Lucene merge threads for each index doing nothing. Only one thread among them is active and trying to merge segments.
The write threads seem to steadily write the data.

Any pointers on what is happening and how I can achieve faster indexing?
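
One way to turn the observation above into data that can be posted is to count the merge threads by state from a thread dump. The following is a minimal sketch, assuming a jstack-style dump in which merge threads carry "Lucene Merge Thread" in their name; the sample dump, node name and index name are made up for illustration and are not output from this cluster.

```python
import re
from collections import Counter

# A jstack-style header line starts with the quoted thread name.
HEADER = re.compile(r'^"(?P<name>.*)" ')
# The thread state appears on the indented line after the header.
STATE = re.compile(r'^\s+java\.lang\.Thread\.State: (?P<state>[A-Z_]+)')


def merge_thread_states(dump: str) -> Counter:
    """Count Lucene merge threads in a thread dump, grouped by thread state."""
    counts = Counter()
    in_merge_thread = False
    for line in dump.splitlines():
        header = HEADER.match(line)
        if header:
            # Assumption: merge threads are identifiable by this name fragment.
            in_merge_thread = "Lucene Merge Thread" in header.group("name")
            continue
        state = STATE.match(line)
        if state and in_merge_thread:
            counts[state.group("state")] += 1
            in_merge_thread = False
    return counts


# Hypothetical sample, shaped like jstack output.
SAMPLE_DUMP = '''
"elasticsearch[data-1][[idx-a][0]: Lucene Merge Thread #7]" #101 daemon prio=5
   java.lang.Thread.State: WAITING (parking)
"elasticsearch[data-1][[idx-a][0]: Lucene Merge Thread #8]" #102 daemon prio=5
   java.lang.Thread.State: RUNNABLE
"elasticsearch[data-1][write][T#3]" #55 daemon prio=5
   java.lang.Thread.State: RUNNABLE
'''

if __name__ == "__main__":
    for state, count in merge_thread_states(SAMPLE_DUMP).items():
        print(state, count)
```

Posting the resulting counts together with the raw dump (for example in a gist) gives people answering something concrete to look at.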

Christian_Dahlqvist · June 21, 2019, 7:59am

What is the specification of your cluster? What type of storage are you using? Have you optimized your mappings? Have you followed these guidelines? What does CPU usage and disk I/O look like while you are indexing? What indexing throughput are you seeing?

pokaleshrey · June 21, 2019, 9:33am

Anything specific? 3 master, 2 coord, 2 data.

RAID-0

max_num_segments=1 is something that I have not tried. The rest is taken care of.

The number of replicas is 1 while indexing.
Disabling swapping has not been done.
The rest is taken care of.

Average CPU utilization of data nodes is 60%
Disk I/O write throughput is 300 Ops/sec and writing at 45,000 KB/sec

2000 docs/sec are being indexed
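
A quick back-of-the-envelope check of these figures, as a sketch only. It assumes that the 2000 docs/sec is the total rate across both indices, that 35 million is the total document count, and that the 45,000 KB/sec is measured on a single data node; none of that is stated explicitly above.

```python
# Figures quoted in the thread.
DOC_SIZE_KB = 15
DOCS_PER_SEC = 2000
TOTAL_DOCS = 35_000_000
DISK_WRITE_KB_PER_SEC = 45_000

# Raw source data arriving per second, before any indexing overhead.
raw_ingest_kb_per_sec = DOC_SIZE_KB * DOCS_PER_SEC

# Time to index everything at the observed rate.
hours_to_index_all = TOTAL_DOCS / DOCS_PER_SEC / 3600

# Ratio of observed disk writes to raw ingest (assumed to be per data node).
write_ratio = DISK_WRITE_KB_PER_SEC / raw_ingest_kb_per_sec

if __name__ == "__main__":
    print(f"raw ingest: {raw_ingest_kb_per_sec} KB/sec")
    print(f"full load:  {hours_to_index_all:.2f} hours")
    print(f"disk writes / raw ingest: {write_ratio:.1f}x")
```

Under those assumptions the raw ingest is 30,000 KB/sec, the full load takes just under 5 hours, and the disk is writing about 1.5 times the raw document volume.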

Query
How does all of the above relate to the number of Lucene merge threads being created?

Christian_Dahlqvist · June 21, 2019, 10:02am

I meant hardware specification. Sorry this was not clear.

What type of disks? Local SSDs?

Elasticsearch should always run with swap disabled. Do you have any other processes running on the Elasticsearch nodes that could interfere?

pokaleshrey · June 21, 2019, 10:19am

Yes. Masters are 4-core 16 GB machines, data nodes are 16-core 64 GB, and coord nodes are 8-core 32 GB.

AWS EBS

No other process running that can interfere. One ES on each node only.
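
For context on the core counts: the Elasticsearch documentation gives the default of `index.merge.scheduler.max_thread_count` (the number of threads per shard that may merge at once) as max(1, min(4, processors / 2)). The sketch below applies that formula to the machine sizes listed above; the function name is hypothetical, and whether the default was actually in effect on this cluster is not known from the thread.

```python
def default_max_merge_threads(processors: int) -> int:
    """Documented default for index.merge.scheduler.max_thread_count."""
    return max(1, min(4, processors // 2))


if __name__ == "__main__":
    # Core counts taken from the node sizes mentioned in the thread.
    for role, cores in [("master", 4), ("coord", 8), ("data", 16)]:
        print(role, cores, default_max_merge_threads(cores))
```

With 16 cores the formula yields 4, so by default only a handful of merges per shard would be expected to run concurrently on the data nodes.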

martinr_ubi · June 21, 2019, 10:26am

--EDIT Was writing while you guys continued, sorry for stuff now out of context--

Node specs means hardware specs: what's the hardware profile of your nodes? CPU/core count, RAM size, JVM heap setting, etc.
Are the disks SSDs?
How many fields do you have in each of your documents?
Do the field names vary per document? (Different fields in different documents, or do all documents have the same schema?)
How many total fields are in the indices? This is visible in your index mapping or by looking at the index pattern in Kibana.

I've had problems in the past with merge threads when I was suffering from field count explosion. I'm not guru enough to know if there is a true technical relationship there, but I remember that's what I was seeing: way too many fields in the same index led to apparent Lucene merge thread problems and what ES calls index throttling.

I don't remember if the threads were idle though, and if I remember correctly there was CPU contention in my case.

2000/sec is OK, but are you receiving back pressure while you're indexing? What's to say that your bottleneck is not in the client(s) and that they are simply not even trying to push faster?
If you are not finding any contention on your ES nodes (CPU, RAM, disk I/O), what makes you think that adding client(s) also trying to send bulk requests would not result in more docs/sec?
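
One place back pressure shows up is in the bulk responses themselves: items rejected by a full write queue come back with status 429. The following is a rough sketch of counting them, assuming the standard bulk response shape (an `items` list with one entry per operation). Python is used only for illustration since the original client is the Java BulkProcessor, and the sample response is made up.

```python
def count_rejections(bulk_response: dict) -> int:
    """Count items in a bulk response that were rejected with HTTP 429."""
    rejected = 0
    for item in bulk_response.get("items", []):
        # Each item is keyed by its operation type ("index", "create", ...).
        for result in item.values():
            if result.get("status") == 429:
                rejected += 1
    return rejected


# Hypothetical response: one document indexed, one rejected.
SAMPLE_RESPONSE = {
    "took": 30,
    "errors": True,
    "items": [
        {"index": {"_index": "idx-a", "_id": "1", "status": 201}},
        {
            "index": {
                "_index": "idx-a",
                "_id": "2",
                "status": 429,
                "error": {"type": "es_rejected_execution_exception"},
            }
        },
    ],
}

if __name__ == "__main__":
    print("rejected items:", count_rejections(SAMPLE_RESPONSE))
```

If this count stays at zero while the nodes show no contention, that points toward the clients not pushing hard enough rather than the cluster being saturated.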

You could also try to provide more technical details like dumps of the info you're referencing. I don't actually know what to think of your:

On profiling, I saw that there are about 400 Lucene merge threads for each index doing nothing. Only one thread among them is active and trying to merge segments.
The write threads seem to steadily write the data.

Because it's a narration rather than properly formatted command output in something like a gist. When looking for technical help, you have a better chance if you post technical questions with technical data, though that is still not a guarantee. Not sharing hardware specs is another example: a Pentium 166 with 2 spinning disks from 1994 in RAID-0 is what I picture you're running, since you didn't say. This goes deep: document example, index mapping, field count, what you tried, results obtained... "I added clients and couldn't get past 2000 docs/s, I added shards and ..., I added nodes and ... "

Hoping this helps,

pokaleshrey · June 21, 2019, 10:55am

Thanks for your suggestion regarding lucene threads.

I understand that the more technical details you provide, the more the person answering has to look into.
But there is another angle to it. If you give a problem statement that is not too big, some ES experts have a unique ability to ask questions targeting the correct technical area, because of their experience. That leads to faster answers, and time is not wasted.
Excuse me, but I would like to differ from your approach of giving a plethora of technical details in the initial problem statement, some of which might not be of any use.

Nevertheless, that is a separate topic, out of context as you have rightly mentioned. Period.

system · July 19, 2019, 10:55am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
