I'm using the es-hadoop library to do some data processing work. The Hadoop application seems to get stuck when the mapper progress reaches 80%. I added some debug messages to the mapper and found that the program is not blocked and throws no errors, but it is running very slowly (too slowly to bear).
I found the message below in the Elasticsearch GC log; I guess it has some kind of correlation with this problem.
{Heap before GC invocations=0 (full 1):
par new generation total 18874368K, used 16777216K [0x0000000180000000, 0x0000000680000000, 0x0000000680000000)
eden space 16777216K, 100% used [0x0000000180000000, 0x0000000580000000, 0x0000000580000000)
from space 2097152K, 0% used [0x0000000580000000, 0x0000000580000000, 0x0000000600000000)
to space 2097152K, 0% used [0x0000000600000000, 0x0000000600000000, 0x0000000680000000)
concurrent mark-sweep generation total 5242880K, used 0K [0x0000000680000000, 0x00000007c0000000, 0x00000007c0000000)
Metaspace used 59486K, capacity 64396K, committed 64456K, reserved 1103872K
class space used 8006K, capacity 9715K, committed 9768K, reserved 1048576K
2018-05-15T15:18:43.846+0800: 14.899: [GC (Allocation Failure) 2018-05-15T15:18:43.846+0800: 14.899: [ParNew
Desired survivor size 1073741824 bytes, new threshold 15 (max 15)
age 1: 261086432 bytes, 261086432 total
: 16777216K->255372K(18874368K), 0.1813307 secs] 16777216K->255372K(24117248K), 0.1814428 secs] [Times: user=1.30 sys=0.26, real=0.18 secs]
The heap dump is as below; the occupancy rate of the eden space is not very high, but by the time the progress reaches 80% the mapper is running far too slowly.
Adding a piece of the jstack dump for reference:
"elasticsearch[node-1][search][T#3]" #144 daemon prio=5 os_prio=0 tid=0x00007f0e6c015ef0 nid=0x33ea4 runnable [0x00007f0e5811f000]
java.lang.Thread.State: RUNNABLE
at org.apache.lucene.codecs.blocktree.SegmentTermsEnum.postings(SegmentTermsEnum.java:998)
at org.elasticsearch.search.slice.TermsSliceQuery.build(TermsSliceQuery.java:87)
at org.elasticsearch.search.slice.TermsSliceQuery.access$000(TermsSliceQuery.java:49)
at org.elasticsearch.search.slice.TermsSliceQuery$1.scorer(TermsSliceQuery.java:62)
at org.apache.lucene.search.Weight.scorerSupplier(Weight.java:113)
at org.apache.lucene.search.BooleanWeight.scorerSupplier(BooleanWeight.java:329)
at org.apache.lucene.search.BooleanWeight.scorerSupplier(BooleanWeight.java:329)
at org.apache.lucene.search.BooleanWeight.scorer(BooleanWeight.java:295)
at org.apache.lucene.search.Weight.bulkScorer(Weight.java:147)
at org.apache.lucene.search.BooleanWeight.bulkScorer(BooleanWeight.java:289)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:657)
at org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:191)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:462)
at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:265)
at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:107)
at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:466)
at org.elasticsearch.action.search.SearchTransportService$9.messageReceived(SearchTransportService.java:424)
at org.elasticsearch.action.search.SearchTransportService$9.messageReceived(SearchTransportService.java:421)
at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$1.doRun(SecurityServerTransportInterceptor.java:258)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.messageReceived(SecurityServerTransportInterceptor.java:316)
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66)
at org.elasticsearch.transport.TransportService$7.doRun(TransportService.java:656)
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:635)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:41)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
James,
I have a new discovery: the number of splits may have an impact on mapper performance.
When I increased es.input.max.docs.per.partition to several times its original value to decrease the number of splits,
my application showed an obvious increase in speed. However, the run still takes too long for me. Is there
any other way to make it faster?
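For reference, this is roughly how I'm applying the setting. It is only a minimal sketch, not my actual job: the node address, index name, class name, and the 500000 value are placeholders I picked for illustration.

// Minimal sketch of an ES-Hadoop read job that raises
// es.input.max.docs.per.partition to cut down the number of input splits.
// Node address, index name, and the chosen value are placeholders.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.elasticsearch.hadoop.mr.EsInputFormat;

public class EsReadJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("es.nodes", "localhost:9200");   // placeholder ES node
        conf.set("es.resource", "my-index/doc");  // placeholder index/type
        // A larger cap means fewer sliced-scroll partitions per shard,
        // and therefore fewer mapper splits. 500000 is an arbitrary example.
        conf.set("es.input.max.docs.per.partition", "500000");

        Job job = Job.getInstance(conf, "es-read-job");
        job.setJarByClass(EsReadJob.class);
        job.setInputFormatClass(EsInputFormat.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(MapWritable.class);
        // Mapper, reducer, and output configuration omitted for brevity.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}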
We've seen in some cases that the slice feature can cause some slowdowns. If you set es.input.max.docs.per.partition to a really high number, like max int, how does it perform? Which version of the connector and ES are you running?
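Something along these lines (just a sketch of the idea, the helper name is made up): with the cap at Integer.MAX_VALUE, ES-Hadoop should not slice shards into multiple input splits at all.

// Hypothetical helper illustrating the suggestion above: cap docs-per-partition
// at max int so shards are not sliced into multiple input splits.
import org.apache.hadoop.conf.Configuration;

public class DisableSlicing {
    public static Configuration withoutSlicing(Configuration conf) {
        conf.set("es.input.max.docs.per.partition", String.valueOf(Integer.MAX_VALUE));
        return conf;
    }
}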