Hi:
I am running Rally in benchmark-only mode, indexing a 365 GB (1 billion docs) dataset,
Racing on track [geonames], challenge [append-only-no-conflicts-while-searching] and car [external]
I check the document count every 30 seconds. When it reached 286,309,645, the indexing speed
dropped from 9000 docs/s to 2000 docs/s, and CPU usage dropped from 1200% to 300% (12 cores).
Hard to tell; that could be anything from GC issues to the cluster simply merging segments. You should inspect the Elasticsearch logs for hints and also look at hot threads (there is no support for that in Rally).
If you use the defaults, Rally starts Elasticsearch on port 39200, so curl http://localhost:39200/_nodes/hot_threads should do the trick. If that does not reveal anything (but I'm pretty sure it will reveal the cause), you can check the GC logs, e.g. by starting Rally with the --telemetry=gc option.
What kind of hardware are you running on? Is there any change in disk IO patterns between the start of the run when speed is good and when it slows down? Is the slowdown gradual or sudden? What does the _cat/indices API report once it has slowed down?
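If you prefer to gather the same data with the Python client instead of curl, here is a minimal sketch; the host, port and parameter values are assumptions based on Rally's default local setup:

from elasticsearch import Elasticsearch

# Assumes the benchmark candidate is running locally on Rally's default port 39200
es = Elasticsearch([{"host": "localhost", "port": 39200}])

# Hot threads: which threads (bulk, merge, refresh, ...) are burning CPU right now
print(es.nodes.hot_threads(threads=10, ignore_idle_threads=False))

# Per-index view: health, status, doc count and store size
print(es.cat.indices(v=True))

# Cluster health: overall status and shard counts
print(es.cluster.health())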
I am sure there was no change in disk IO while Rally was running.
The indexing speed dropped gradually from 9000 docs/s to 5000 docs/s and then suddenly fell to 2000 docs/s at 280,000,000+ docs. I have tried it twice and the problem reappeared both times.
5 shards, 0 replicas, and the health should be green (I will check it again on Monday).
Hi Daniel:
The doc count has now reached 0.45 billion, and the indexing speed has dropped to 400 docs/s.
I listed the top threads with: es.nodes.hot_threads(ignore_idle_threads=False, threads=40, timeout="30s")
The hot_threads output right now:
11.0% (54.7ms out of 500ms) cpu usage by thread 'elasticsearch[M-Twins][bulk][T#8]'
9.5% (47.2ms out of 500ms) cpu usage by thread 'elasticsearch[M-Twins][[geonames][0]: Lucene Merge Thread #41502]'
8.4% (41.9ms out of 500ms) cpu usage by thread 'elasticsearch[M-Twins][bulk][T#9]'
7.9% (39.4ms out of 500ms) cpu usage by thread 'elasticsearch[M-Twins][[geonames][2]: Lucene Merge Thread #41431]'
7.2% (36.1ms out of 500ms) cpu usage by thread 'elasticsearch[M-Twins][bulk][T#3]'
7.1% (35.3ms out of 500ms) cpu usage by thread 'elasticsearch[M-Twins][[geonames][0]: Lucene Merge Thread #41390]'
6.6% (33ms out of 500ms) cpu usage by thread 'elasticsearch[M-Twins][bulk][T#1]'
6.6% (32.9ms out of 500ms) cpu usage by thread 'elasticsearch[M-Twins][bulk][T#5]'
6.6% (32.8ms out of 500ms) cpu usage by thread 'elasticsearch[M-Twins][[geonames][4]: Lucene Merge Thread #41313]'
6.6% (32.8ms out of 500ms) cpu usage by thread 'elasticsearch[M-Twins][bulk][T#12]'
6.2% (30.7ms out of 500ms) cpu usage by thread 'elasticsearch[M-Twins][bulk][T#17]'
5.9% (29.2ms out of 500ms) cpu usage by thread 'elasticsearch[M-Twins][[geonames][3]: Lucene Merge Thread #41549]'
5.8% (28.9ms out of 500ms) cpu usage by thread 'elasticsearch[M-Twins][bulk][T#18]'
5.7% (28.6ms out of 500ms) cpu usage by thread 'elasticsearch[M-Twins][[geonames][1]: Lucene Merge Thread #41493]'
5.7% (28.5ms out of 500ms) cpu usage by thread 'elasticsearch[M-Twins][refresh][T#1]'
5.6% (27.9ms out of 500ms) cpu usage by thread 'elasticsearch[M-Twins][[geonames][2]: Lucene Merge Thread #41430]'
5.5% (27.7ms out of 500ms) cpu usage by thread 'elasticsearch[M-Twins][bulk][T#4]'
5.5% (27.5ms out of 500ms) cpu usage by thread 'elasticsearch[M-Twins][search][T#27]'
5.5% (27.4ms out of 500ms) cpu usage by thread 'elasticsearch[M-Twins][[geonames][0]: Lucene Merge Thread #41512]'
5.3% (26.7ms out of 500ms) cpu usage by thread 'elasticsearch[M-Twins][[geonames][3]: Lucene Merge Thread #41550]'
5.3% (26.6ms out of 500ms) cpu usage by thread 'elasticsearch[M-Twins][bulk][T#6]'
5.3% (26.4ms out of 500ms) cpu usage by thread 'elasticsearch[M-Twins][bulk][T#7]'
5.0% (25.1ms out of 500ms) cpu usage by thread 'elasticsearch[M-Twins][search][T#18]'
5.0% (24.7ms out of 500ms) cpu usage by thread 'elasticsearch[M-Twins][bulk][T#10]'
4.9% (24.4ms out of 500ms) cpu usage by thread 'elasticsearch[M-Twins][bulk][T#2]'
4.8% (24.1ms out of 500ms) cpu usage by thread 'elasticsearch[M-Twins][bulk][T#19]'
4.8% (23.7ms out of 500ms) cpu usage by thread 'elasticsearch[M-Twins][[geonames][4]: Lucene Merge Thread #41327]'
The hot_threads output at the start of the run, for comparison:
51.4% (257ms out of 500ms) cpu usage by thread 'elasticsearch[Toad][bulk][T#5]'
50.3% (251.6ms out of 500ms) cpu usage by thread 'elasticsearch[Toad][bulk][T#3]'
49.3% (246.4ms out of 500ms) cpu usage by thread 'elasticsearch[Toad][[geonames][2]: Lucene Merge Thread #71]'
47.4% (237.1ms out of 500ms) cpu usage by thread 'elasticsearch[Toad][bulk][T#4]'
46.4% (232ms out of 500ms) cpu usage by thread 'elasticsearch[Toad][bulk][T#23]'
45.2% (226ms out of 500ms) cpu usage by thread 'elasticsearch[Toad][bulk][T#17]'
44.7% (223.7ms out of 500ms) cpu usage by thread 'elasticsearch[Toad][bulk][T#6]'
44.4% (221.8ms out of 500ms) cpu usage by thread 'elasticsearch[Toad][bulk][T#22]'
44.2% (220.8ms out of 500ms) cpu usage by thread 'elasticsearch[Toad][bulk][T#15]'
44.1% (220.6ms out of 500ms) cpu usage by thread 'elasticsearch[Toad][[geonames][1]: Lucene Merge Thread #62]'
43.9% (219.7ms out of 500ms) cpu usage by thread 'elasticsearch[Toad][bulk][T#14]'
40.5% (202.5ms out of 500ms) cpu usage by thread 'elasticsearch[Toad][bulk][T#1]'
38.3% (191.3ms out of 500ms) cpu usage by thread 'elasticsearch[Toad][[geonames][1]: Lucene Merge Thread #70]'
34.5% (172.5ms out of 500ms) cpu usage by thread 'elasticsearch[Toad][bulk][T#16]'
34.1% (170.7ms out of 500ms) cpu usage by thread 'elasticsearch[Toad][bulk][T#20]'
33.3% (166.4ms out of 500ms) cpu usage by thread 'elasticsearch[Toad][bulk][T#18]'
33.0% (164.9ms out of 500ms) cpu usage by thread 'elasticsearch[Toad][bulk][T#13]'
32.4% (161.9ms out of 500ms) cpu usage by thread 'elasticsearch[Toad][[geonames][3]: Lucene Merge Thread #70]'
31.2% (156.2ms out of 500ms) cpu usage by thread 'elasticsearch[Toad][bulk][T#9]'
31.2% (155.7ms out of 500ms) cpu usage by thread 'elasticsearch[Toad][bulk][T#10]'
29.9% (149.4ms out of 500ms) cpu usage by thread 'elasticsearch[Toad][bulk][T#8]'
29.7% (148.6ms out of 500ms) cpu usage by thread 'elasticsearch[Toad][bulk][T#2]'
24.9% (124.2ms out of 500ms) cpu usage by thread 'elasticsearch[Toad][bulk][T#7]'
24.3% (121.5ms out of 500ms) cpu usage by thread 'elasticsearch[Toad][bulk][T#19]'
20.3% (101.5ms out of 500ms) cpu usage by thread 'elasticsearch[Toad][bulk][T#24]'
20.2% (100.8ms out of 500ms) cpu usage by thread 'elasticsearch[Toad][bulk][T#12]'
19.6% (97.9ms out of 500ms) cpu usage by thread 'elasticsearch[Toad][bulk][T#11]'
17.4% (86.8ms out of 500ms) cpu usage by thread 'elasticsearch[Toad][bulk][T#21]'
15.2% (75.9ms out of 500ms) cpu usage by thread 'elasticsearch[Toad][[geonames][0]: Lucene Merge Thread #70]'
Is this based on monitoring data? Based on your hot threads it looks like Elasticsearch is spending a fair amount of time merging Lucene segments. What type of storage do you have?
For maximum indexing performance you should set refresh_interval to a larger value in order to reduce the merging activity. Set it to 10s or 30s to see if it makes any difference. This will, however, increase the time it takes for records to become searchable.
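With the Python client, that change can be applied to the live index roughly like this (a sketch; "geonames" is the track's index, 30s is just one of the values suggested above, and the host/port assume Rally's default local setup):

from elasticsearch import Elasticsearch

# Assumes the same local benchmark node on Rally's default port 39200
es = Elasticsearch([{"host": "localhost", "port": 39200}])

# Refresh less frequently while bulk indexing
es.indices.put_settings(index="geonames", body={"index": {"refresh_interval": "30s"}})

# Optionally restore the default (1s) once indexing is done
# es.indices.put_settings(index="geonames", body={"index": {"refresh_interval": "1s"}})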