I'm using the es-hadoop library to do some data processing work. The Hadoop application seems to get stuck when the mapper progress reaches 80%. I added some debug messages to the mapper and found that the program is not blocked and throws no errors, but it is running very slowly (too slowly to bear).
I found the message below in the Elasticsearch GC log; I guess it has some kind of correlation with this problem.
{Heap before GC invocations=0 (full 1):
par new generation total 18874368K, used 16777216K [0x0000000180000000, 0x0000000680000000, 0x0000000680000000)
eden space 16777216K, 100% used [0x0000000180000000, 0x0000000580000000, 0x0000000580000000)
from space 2097152K, 0% used [0x0000000580000000, 0x0000000580000000, 0x0000000600000000)
to space 2097152K, 0% used [0x0000000600000000, 0x0000000600000000, 0x0000000680000000)
concurrent mark-sweep generation total 5242880K, used 0K [0x0000000680000000, 0x00000007c0000000, 0x00000007c0000000)
Metaspace used 59486K, capacity 64396K, committed 64456K, reserved 1103872K
class space used 8006K, capacity 9715K, committed 9768K, reserved 1048576K
2018-05-15T15:18:43.846+0800: 14.899: [GC (Allocation Failure) 2018-05-15T15:18:43.846+0800: 14.899: [ParNew
Desired survivor size 1073741824 bytes, new threshold 15 (max 15)
age 1: 261086432 bytes, 261086432 total
: 16777216K->255372K(18874368K), 0.1813307 secs] 16777216K->255372K(24117248K), 0.1814428 secs] [Times: user=1.30 sys=0.26, real=0.18 secs]
The heap dump is as below; the occupancy rate of the eden space is not very high, but by the time the progress reaches 80% the mapper is running far too slowly.
Adding a piece of the jstack dump for reference:
"elasticsearch[node-1][search][T#3]" #144 daemon prio=5 os_prio=0 tid=0x00007f0e6c015ef0 nid=0x33ea4 runnable [0x00007f0e5811f000]
java.lang.Thread.State: RUNNABLE
at org.apache.lucene.codecs.blocktree.SegmentTermsEnum.postings(SegmentTermsEnum.java:998)
at org.elasticsearch.search.slice.TermsSliceQuery.build(TermsSliceQuery.java:87)
at org.elasticsearch.search.slice.TermsSliceQuery.access$000(TermsSliceQuery.java:49)
at org.elasticsearch.search.slice.TermsSliceQuery$1.scorer(TermsSliceQuery.java:62)
at org.apache.lucene.search.Weight.scorerSupplier(Weight.java:113)
at org.apache.lucene.search.BooleanWeight.scorerSupplier(BooleanWeight.java:329)
at org.apache.lucene.search.BooleanWeight.scorerSupplier(BooleanWeight.java:329)
at org.apache.lucene.search.BooleanWeight.scorer(BooleanWeight.java:295)
at org.apache.lucene.search.Weight.bulkScorer(Weight.java:147)
at org.apache.lucene.search.BooleanWeight.bulkScorer(BooleanWeight.java:289)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:657)
at org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:191)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:462)
at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:265)
at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:107)
at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:466)
at org.elasticsearch.action.search.SearchTransportService$9.messageReceived(SearchTransportService.java:424)
at org.elasticsearch.action.search.SearchTransportService$9.messageReceived(SearchTransportService.java:421)
at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$1.doRun(SecurityServerTransportInterceptor.java:258)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.messageReceived(SecurityServerTransportInterceptor.java:316)
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66)
at org.elasticsearch.transport.TransportService$7.doRun(TransportService.java:656)
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:635)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:41)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
James,
I have a new discovery: the number of splits may have an impact on mapper performance.
When I increased es.input.max.docs.per.partition to several times its original value to decrease the number of splits,
my application showed an obvious increase in speed. However, the run still takes too long for me. Is there
any other way to make it faster?
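For reference, this is roughly how I'm applying the setting. It is only a minimal sketch, not my actual job: the node address, index name, class name, and the 500000 value are placeholders I picked for illustration.

// Minimal sketch of an ES-Hadoop read job that raises
// es.input.max.docs.per.partition to cut down the number of input splits.
// Node address, index name, and the chosen value are placeholders.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.elasticsearch.hadoop.mr.EsInputFormat;

public class EsReadJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("es.nodes", "localhost:9200");   // placeholder ES node
        conf.set("es.resource", "my-index/doc");  // placeholder index/type
        // A larger cap means fewer sliced-scroll partitions per shard,
        // and therefore fewer mapper splits. 500000 is an arbitrary example.
        conf.set("es.input.max.docs.per.partition", "500000");

        Job job = Job.getInstance(conf, "es-read-job");
        job.setJarByClass(EsReadJob.class);
        job.setInputFormatClass(EsInputFormat.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(MapWritable.class);
        // Mapper, reducer, and output configuration omitted for brevity.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}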
We've seen in some cases that the slice feature can cause some slowdowns. If you set es.input.max.docs.per.partition to a really high number, like max int, how does it perform? Which version of the connector and ES are you running?
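Something along these lines (just a sketch of the idea, the helper name is made up): with the cap at Integer.MAX_VALUE, ES-Hadoop should not slice shards into multiple input splits at all.

// Hypothetical helper illustrating the suggestion above: cap docs-per-partition
// at max int so shards are not sliced into multiple input splits.
import org.apache.hadoop.conf.Configuration;

public class DisableSlicing {
    public static Configuration withoutSlicing(Configuration conf) {
        conf.set("es.input.max.docs.per.partition", String.valueOf(Integer.MAX_VALUE));
        return conf;
    }
}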