Hi:
I am running Rally in benchmark-only mode, indexing a 365 GB (1 billion docs) dataset,
Racing on track [geonames], challenge [append-only-no-conflicts-while-searching] and car [external]
I check the document count every 30 seconds. When it reached 286,309,645, the indexing speed
dropped from 9000 docs/s to 2000 docs/s, and CPU usage dropped from 1200% to 300% (12 cores).
Hard to tell; that could be anything from GC issues to the cluster simply merging segments. You should inspect the Elasticsearch logs for hints and also look at hot threads (there is no support for that in Rally).
If you use the defaults, Rally starts Elasticsearch on port 39200, so curl http://localhost:39200/_nodes/hot_threads should do the trick. If that does not reveal anything (but I'm pretty sure it will reveal the cause), you can check the GC logs, e.g. by starting Rally with the --telemetry=gc option.
What kind of hardware are you running on? Is there any change in disk IO patterns between the start of the run when speed is good and when it slows down? Is the slowdown gradual or sudden? What does the _cat/indices API report once it has slowed down?
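If you prefer to gather the same data with the Python client instead of curl, here is a minimal sketch; the host, port and parameter values are assumptions based on Rally's default local setup:

from elasticsearch import Elasticsearch

# Assumes the benchmark candidate is running locally on Rally's default port 39200
es = Elasticsearch([{"host": "localhost", "port": 39200}])

# Hot threads: which threads (bulk, merge, refresh, ...) are burning CPU right now
print(es.nodes.hot_threads(threads=10, ignore_idle_threads=False))

# Per-index view: health, status, doc count and store size
print(es.cat.indices(v=True))

# Cluster health: overall status and shard counts
print(es.cluster.health())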
I am sure there was no change in disk IO while Rally was running.
The indexing speed dropped gradually from 9000 docs/s to 5000 docs/s and then suddenly fell to 2000 docs/s at 280,000,000+ docs. I have tried it twice and the problem reappeared both times.
5 shards, 0 replicas, and the health should be green (I will check it again on Monday).
Hi Daniel:
The doc count has now reached 0.45 billion, and the indexing speed has dropped to 400 docs/s.
I listed the top threads with: es.nodes.hot_threads(ignore_idle_threads=False, threads=40, timeout="30s")
The hot_threads output right now:
11.0% (54.7ms out of 500ms) cpu usage by thread 'elasticsearch[M-Twins][bulk][T#8]'
9.5% (47.2ms out of 500ms) cpu usage by thread 'elasticsearch[M-Twins][[geonames][0]: Lucene Merge Thread #41502]'
8.4% (41.9ms out of 500ms) cpu usage by thread 'elasticsearch[M-Twins][bulk][T#9]'
7.9% (39.4ms out of 500ms) cpu usage by thread 'elasticsearch[M-Twins][[geonames][2]: Lucene Merge Thread #41431]'
7.2% (36.1ms out of 500ms) cpu usage by thread 'elasticsearch[M-Twins][bulk][T#3]'
7.1% (35.3ms out of 500ms) cpu usage by thread 'elasticsearch[M-Twins][[geonames][0]: Lucene Merge Thread #41390]'
6.6% (33ms out of 500ms) cpu usage by thread 'elasticsearch[M-Twins][bulk][T#1]'
6.6% (32.9ms out of 500ms) cpu usage by thread 'elasticsearch[M-Twins][bulk][T#5]'
6.6% (32.8ms out of 500ms) cpu usage by thread 'elasticsearch[M-Twins][[geonames][4]: Lucene Merge Thread #41313]'
6.6% (32.8ms out of 500ms) cpu usage by thread 'elasticsearch[M-Twins][bulk][T#12]'
6.2% (30.7ms out of 500ms) cpu usage by thread 'elasticsearch[M-Twins][bulk][T#17]'
5.9% (29.2ms out of 500ms) cpu usage by thread 'elasticsearch[M-Twins][[geonames][3]: Lucene Merge Thread #41549]'
5.8% (28.9ms out of 500ms) cpu usage by thread 'elasticsearch[M-Twins][bulk][T#18]'
5.7% (28.6ms out of 500ms) cpu usage by thread 'elasticsearch[M-Twins][[geonames][1]: Lucene Merge Thread #41493]'
5.7% (28.5ms out of 500ms) cpu usage by thread 'elasticsearch[M-Twins][refresh][T#1]'
5.6% (27.9ms out of 500ms) cpu usage by thread 'elasticsearch[M-Twins][[geonames][2]: Lucene Merge Thread #41430]'
5.5% (27.7ms out of 500ms) cpu usage by thread 'elasticsearch[M-Twins][bulk][T#4]'
5.5% (27.5ms out of 500ms) cpu usage by thread 'elasticsearch[M-Twins][search][T#27]'
5.5% (27.4ms out of 500ms) cpu usage by thread 'elasticsearch[M-Twins][[geonames][0]: Lucene Merge Thread #41512]'
5.3% (26.7ms out of 500ms) cpu usage by thread 'elasticsearch[M-Twins][[geonames][3]: Lucene Merge Thread #41550]'
5.3% (26.6ms out of 500ms) cpu usage by thread 'elasticsearch[M-Twins][bulk][T#6]'
5.3% (26.4ms out of 500ms) cpu usage by thread 'elasticsearch[M-Twins][bulk][T#7]'
5.0% (25.1ms out of 500ms) cpu usage by thread 'elasticsearch[M-Twins][search][T#18]'
5.0% (24.7ms out of 500ms) cpu usage by thread 'elasticsearch[M-Twins][bulk][T#10]'
4.9% (24.4ms out of 500ms) cpu usage by thread 'elasticsearch[M-Twins][bulk][T#2]'
4.8% (24.1ms out of 500ms) cpu usage by thread 'elasticsearch[M-Twins][bulk][T#19]'
4.8% (23.7ms out of 500ms) cpu usage by thread 'elasticsearch[M-Twins][[geonames][4]: Lucene Merge Thread #41327]'
The hot_threads output at the start of the run, for comparison:
51.4% (257ms out of 500ms) cpu usage by thread 'elasticsearch[Toad][bulk][T#5]'
50.3% (251.6ms out of 500ms) cpu usage by thread 'elasticsearch[Toad][bulk][T#3]'
49.3% (246.4ms out of 500ms) cpu usage by thread 'elasticsearch[Toad][[geonames][2]: Lucene Merge Thread #71]'
47.4% (237.1ms out of 500ms) cpu usage by thread 'elasticsearch[Toad][bulk][T#4]'
46.4% (232ms out of 500ms) cpu usage by thread 'elasticsearch[Toad][bulk][T#23]'
45.2% (226ms out of 500ms) cpu usage by thread 'elasticsearch[Toad][bulk][T#17]'
44.7% (223.7ms out of 500ms) cpu usage by thread 'elasticsearch[Toad][bulk][T#6]'
44.4% (221.8ms out of 500ms) cpu usage by thread 'elasticsearch[Toad][bulk][T#22]'
44.2% (220.8ms out of 500ms) cpu usage by thread 'elasticsearch[Toad][bulk][T#15]'
44.1% (220.6ms out of 500ms) cpu usage by thread 'elasticsearch[Toad][[geonames][1]: Lucene Merge Thread #62]'
43.9% (219.7ms out of 500ms) cpu usage by thread 'elasticsearch[Toad][bulk][T#14]'
40.5% (202.5ms out of 500ms) cpu usage by thread 'elasticsearch[Toad][bulk][T#1]'
38.3% (191.3ms out of 500ms) cpu usage by thread 'elasticsearch[Toad][[geonames][1]: Lucene Merge Thread #70]'
34.5% (172.5ms out of 500ms) cpu usage by thread 'elasticsearch[Toad][bulk][T#16]'
34.1% (170.7ms out of 500ms) cpu usage by thread 'elasticsearch[Toad][bulk][T#20]'
33.3% (166.4ms out of 500ms) cpu usage by thread 'elasticsearch[Toad][bulk][T#18]'
33.0% (164.9ms out of 500ms) cpu usage by thread 'elasticsearch[Toad][bulk][T#13]'
32.4% (161.9ms out of 500ms) cpu usage by thread 'elasticsearch[Toad][[geonames][3]: Lucene Merge Thread #70]'
31.2% (156.2ms out of 500ms) cpu usage by thread 'elasticsearch[Toad][bulk][T#9]'
31.2% (155.7ms out of 500ms) cpu usage by thread 'elasticsearch[Toad][bulk][T#10]'
29.9% (149.4ms out of 500ms) cpu usage by thread 'elasticsearch[Toad][bulk][T#8]'
29.7% (148.6ms out of 500ms) cpu usage by thread 'elasticsearch[Toad][bulk][T#2]'
24.9% (124.2ms out of 500ms) cpu usage by thread 'elasticsearch[Toad][bulk][T#7]'
24.3% (121.5ms out of 500ms) cpu usage by thread 'elasticsearch[Toad][bulk][T#19]'
20.3% (101.5ms out of 500ms) cpu usage by thread 'elasticsearch[Toad][bulk][T#24]'
20.2% (100.8ms out of 500ms) cpu usage by thread 'elasticsearch[Toad][bulk][T#12]'
19.6% (97.9ms out of 500ms) cpu usage by thread 'elasticsearch[Toad][bulk][T#11]'
17.4% (86.8ms out of 500ms) cpu usage by thread 'elasticsearch[Toad][bulk][T#21]'
15.2% (75.9ms out of 500ms) cpu usage by thread 'elasticsearch[Toad][[geonames][0]: Lucene Merge Thread #70]'
Is this based on monitoring data? Based on your hot threads it looks like Elasticsearch is spending a fair amount of time merging Lucene segments. What type of storage do you have?
For maximum indexing performance you should set refresh_interval to a larger value in order to reduce the merging activity. Set it to 10s or 30s to see if it makes any difference. This will, however, increase the time it takes for records to become searchable.
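With the Python client, that change can be applied to the live index roughly like this (a sketch; "geonames" is the track's index, 30s is just one of the values suggested above, and the host/port assume Rally's default local setup):

from elasticsearch import Elasticsearch

# Assumes the same local benchmark node on Rally's default port 39200
es = Elasticsearch([{"host": "localhost", "port": 39200}])

# Refresh less frequently while bulk indexing
es.indices.put_settings(index="geonames", body={"index": {"refresh_interval": "30s"}})

# Optionally restore the default (1s) once indexing is done
# es.indices.put_settings(index="geonames", body={"index": {"refresh_interval": "1s"}})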