How to run Elasticsearch on YARN?


(Julian Zhang) #1

My Hadoop version is 2.5.0.

I followed the guide at http://www.elastic.co/guide/en/elasticsearch/hadoop/2.1.Beta/ey-usage.html

I downloaded elasticsearch-hadoop-2.1.0.Beta4.zip.

I ran the elasticsearch-yarn jar as follows:

hadoop jar elasticsearch-yarn-2.1.0.Beta4.jar -download-es
hadoop jar elasticsearch-yarn-2.1.0.Beta4.jar -install-es
hadoop jar elasticsearch-yarn-2.1.0.Beta4.jar -install

Elasticsearch 1.5.2 was downloaded, and elasticsearch-1.5.2.zip and elasticsearch-yarn-2.1.0.Beta4.jar were uploaded to hdfs://host:port/apps/elasticsearch.

When I start ES on YARN, the output looks fine:

hadoop jar /tmp/elasticsearch-hadoop-2.1.0.Beta4/dist/elasticsearch-yarn-2.1.0.Beta4.jar -start
15/05/13 16:50:09 INFO client.RMProxy: Connecting to ResourceManager at hdp1/10.80.3.231:8032
15/05/13 16:50:11 INFO impl.YarnClientImpl: Submitted application application_1431506015778_0003
Launched a 1 node Elasticsearch-YARN cluster [application_1431506015778_0003@http://xxxx:8088/proxy/application_1431506015778_0003/] at ......

But when I check the status later, the application has failed:

hadoop jar /tmp/logsystem/elasticsearch-hadoop-2.1.0.Beta4/dist/elasticsearch-yarn-2.1.0.Beta4.jar -status
INFO client.RMProxy: Connecting to ResourceManager at hdp1/10.80.3.231:8032
Id                              State       Status     Start Time         Finish Time        Tracking URL
application_1431506015778_0001  FINISHED    FAILED     15-5-13 PM 4:36    15-5-13 PM 4:37    http://xxx:8088/proxy/application_1431506015778_0001/A
application_1431506015778_0002  FINISHED    FAILED     15-5-13 PM 4:49    15-5-13 PM 4:49    http://xxxx:8088/proxy/application_1431506015778_0002/A

I checked the logs in YARN; the ApplicationMaster's log is below:

15/05/13 16:50:18 INFO am.ApplicationMaster: Starting ApplicationMaster...
15/05/13 16:50:19 INFO client.RMProxy: Connecting to ResourceManager at hdp1/10.80.3.231:8030
15/05/13 16:50:19 INFO impl.ContainerManagementProtocolProxy: yarn.client.max-cached-nodemanagers-proxies : 0
15/05/13 16:50:19 INFO am.EsCluster: Allocating Elasticsearch cluster with 1 nodes
15/05/13 16:50:24 INFO impl.AMRMClientImpl: Received new token for : hdp4:8041
15/05/13 16:50:26 INFO am.EsCluster: About to launch container for command: [{{SHELL}} elasticsearch-1.5.2.zip/elasticsearch-1.5.2/bin/elasticsearch 1><LOG_DIR>/stdout 2><LOG_DIR>/stderr]
15/05/13 16:50:26 INFO impl.ContainerManagementProtocolProxy: Opening proxy : hdp4:8041
15/05/13 16:50:26 INFO am.EsCluster: Started container Container: [ContainerId: container_1431506015778_0003_01_000002, NodeId: xxxx:8041, NodeHttpAddress: hdp4:8042, Resource: <memory:2048, vCores:1>, Priority: -1, Token: Token { kind: ContainerToken, service: XXXX:8041 }, ]
15/05/13 16:50:26 INFO am.EsCluster: Fully allocated 1 containers
**15/05/13 16:50:31 WARN am.EsCluster: Container container_1431506015778_0003_01_000002 exited with an invalid/unknown exit code...**
15/05/13 16:50:31 WARN am.EsCluster: Cluster has not completed succesfully...
15/05/13 16:50:31 INFO am.EsCluster: Cluster has completed running...
15/05/13 16:50:46 INFO impl.ContainerManagementProtocolProxy: Opening proxy : hdp4:8041
15/05/13 16:50:46 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.

In the NodeManager's log, the only error information is exit code 3:

2015-05-13 16:50:26,083 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1431506015778_0003_01_000002 is : 3
2015-05-13 16:50:26,084 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1431506015778_0003_01_000002 and exit code: 3
ExitCodeException exitCode=3:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
    at org.apache.hadoop.util.Shell.run(Shell.java:455)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:197)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
2015-05-13 16:50:26,085 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from container-launch.

My questions are:

1. Can es-hadoop 2.1.0.Beta4 and ES 1.5.2 run with Hadoop 2.5.0?

2. Do I need to rebuild the jar against Hadoop 2.5.0?

3. What does exit code 3 mean?

4. How can I get more information about the error in the container?


(Costin Leau) #2

First off, thanks for the detailed (and formatted) post. Nothing really stands out - it looks like there's a bug and potentially an issue with the environment.
Exit code 3 is typically associated with a path not being found (at least on Windows) and, as you pointed out, the logs are quite cryptic about what has failed.
The YARN integration has been tested on Hadoop 2.4 and 2.6 - not on 2.5 as it is not marked as stable and sometimes the APIs tend to change. To answer your questions:

  1. It should work.
  2. No, there's no need. Of course, you can try, but again the binary should work on Hadoop 2.2 and up; typically 2.4 is recommended.
  3. It depends on the environment - if I recall correctly, 3 indicates that a path was not found, but I can't find any material confirming this right now.
  4. That's a good question - outside the logs from the web UI, I'm not aware of any way to tell Hadoop not to delete the container in case of a failure. Even if the shell execution command succeeds, if the command itself has an issue (anything from not enough memory to Java not being present or the disk being full), the logs of the app themselves are deleted.
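
For question 4, two stock YARN facilities may be worth a try. If log aggregation is enabled (`yarn.log-aggregation-enable=true` in yarn-site.xml), `yarn logs` can print a finished application's container stdout/stderr. Separately, setting `yarn.nodemanager.delete.debug-delay-sec` to a few hundred seconds should keep the container's working directory on the NodeManager for a while after it exits. A rough sketch follows; the helper names are invented for illustration and nothing here has been run against the cluster in this thread:

```shell
# Extract the first YARN application id found in a line of client output.
app_id_from_line() {
  printf '%s\n' "$1" | grep -o 'application_[0-9]*_[0-9]*' | head -n 1
}

# Print the aggregated container logs of a finished application.
# Assumption: log aggregation is enabled on the cluster.
fetch_container_logs() {
  yarn logs -applicationId "$1"
}

# Usage:
#   app=$(app_id_from_line "INFO impl.YarnClientImpl: Submitted application application_1431506015778_0003")
#   fetch_container_logs "$app"
```

The id is parsed from the `-start` output rather than typed by hand so the two steps can be chained in a script.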

As such, I would currently recommend trying to start Elasticsearch 1.5.2 on one of the target nodes in a similar fashion, outside YARN, and seeing whether it starts. If the JVM version is too old, Elasticsearch might print an error message and bail out, which might be the case here.
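
The check suggested above could look roughly like this on one of the NodeManager hosts. It is only a sketch: the helper names are invented, the minimum of Java 7 is an assumption for Elasticsearch 1.5.x (the release notes are the authority), and the relative path assumes the zip was unpacked in the current directory.

```shell
# Print the major Java version for a version string such as "1.7.0_79" or "11.0.2".
java_major() {
  case "$1" in
    1.*) printf '%s\n' "$1" | cut -d. -f2 ;;
    *)   printf '%s\n' "$1" | cut -d. -f1 ;;
  esac
}

# Read the version string reported by the java binary on PATH.
# `java -version` writes to stderr, for example: java version "1.6.0_45"
installed_java_version() {
  java -version 2>&1 | head -n 1 | cut -d'"' -f2
}

# Succeed if version string $1 is at least major version $2.
java_ok() {
  [ "$(java_major "$1")" -ge "$2" ]
}

# Usage (run as the same user YARN uses to launch containers):
#   v=$(installed_java_version)
#   java_ok "$v" 7 || echo "Java $v looks too old"
#   ./elasticsearch-1.5.2/bin/elasticsearch
```

Running Elasticsearch in the foreground this way shows its startup error directly on the console instead of in a container log that YARN may already have cleaned up.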


(Julian Zhang) #3

Dear Costin:

You are right, the problem was the JVM version. After I installed a newer JRE, ES runs on YARN.

But I have some other questions:

How can I change the ES configuration, such as the cluster name?
Do I need to change the config file, rebuild the zip file and upload it to HDFS again?

Does ES-YARN support dynamic scaling now, for example automatically starting more nodes when more queries come in?

Thanks


(Costin Leau) #4

Glad to hear you sorted it out.
Unfortunately, I'm not aware of any way to help with provisioning on YARN,
so the solution is really to update the zip (with whatever configuration or
plugins you want) and use that instead.

As for the dynamic part, there's nothing built in currently. You should be
able to start new nodes and these will automatically join any existing
cluster.
Note that when it comes to stateful services (whether it is elasticsearch
or not) care should be taken since the data/state needs to be redistributed
to take advantage of the new nodes. Without it, the extra resources
wouldn't make a real difference.
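
The "update the zip" route could be sketched like this. Assumptions, none of them confirmed in this thread: the archive keeps its settings in `elasticsearch-1.5.2/config/elasticsearch.yml` (the stock layout), `-install-es` uploads whatever zip sits in the local download directory, and `my-yarn-cluster` is a placeholder name. `set_yaml_key` is a hypothetical helper, not part of es-yarn.

```shell
# Set a top-level "key: value" line in a YAML file.
# Any existing uncommented line for the key is dropped first; commented lines are kept.
set_yaml_key() (
  file=$1
  key=$2
  value=$3
  tmp=$(mktemp) || exit 1
  # Keep every line that does not start with "key:" (literal match, no regex).
  awk -v k="$key:" 'index($0, k) != 1' "$file" > "$tmp"
  printf '%s: %s\n' "$key" "$value" >> "$tmp"
  # Write back in place so the file keeps its original permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
)

# Usage, from the directory holding the downloaded zip:
#   unzip -q elasticsearch-1.5.2.zip
#   set_yaml_key elasticsearch-1.5.2/config/elasticsearch.yml cluster.name my-yarn-cluster
#   rm elasticsearch-1.5.2.zip
#   zip -qr elasticsearch-1.5.2.zip elasticsearch-1.5.2
#   hadoop jar elasticsearch-yarn-2.1.0.Beta4.jar -install-es
```

The same helper would work for other top-level settings; plugins would simply be unpacked into the extracted directory before re-zipping.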


(Julian Zhang) #5

Dear Costin

I have a new problem: ES can't write its log when running in YARN mode.
I can get the stderr from YARN's container log (shown below).

In my opinion, the best way to fix the problem is to let ES write its logs to HDFS.
Is there any documentation that explains how to set this up?

root # more stderr
log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: /yarn/nm/usercache/root/appcache/application_1431583395001_0004/container_1431583395001_0004_01_000002/elasticsearch-1.5.2.zip/elasticsearch-1.5.2/logs/elasticsearch.log (Permission denied)
at java.io.FileOutputStream.open0(Native Method)
at java.io.FileOutputStream.open(Unknown Source)
at java.io.FileOutputStream.<init>(Unknown Source)
at java.io.FileOutputStream.<init>(Unknown Source)
at org.apache.log4j.FileAppender.setFile(FileAppender.java:294)
at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:165)
at org.apache.log4j.DailyRollingFileAppender.activateOptions(DailyRollingFileAppender.java:223)
at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:307)
at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:172)
at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:104)
at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:842)
at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:768)
at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:648)
at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:514)
at org.apache.log4j.PropertyConfigurator.configure(PropertyConfigurator.java:440)
at org.elasticsearch.common.logging.log4j.LogConfigurator.configure(LogConfigurator.java:109)
at org.elasticsearch.bootstrap.Bootstrap.setupLogging(Bootstrap.java:100)
at org.elasticsearch.bootstrap.Bootstrap.main(Bootstrap.java:184)
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:32)
log4j:ERROR Either File or DatePattern options are not set for appender [file].
log4j:ERROR setFile(null,true) call failed.


(Costin Leau) #6

If you want ES or any other service to write to HDFS, you can expose HDFS through its official NFS bridge as a local partition/filesystem and point Elasticsearch to that. The write exception typically occurs due to incorrect permissions: YARN starts Elasticsearch as a different user/group (typically under the Hadoop umbrella), and the log location you are referring to likely has a different owner (probably root).
This can be sorted out by making sure the same permissions/groups are used across the board.
Note that HDFS wouldn't really solve the problem: all nodes would log to the same file, which would likely be damaged or overwritten constantly, so in the end it would not really help.
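
A quick way to test the permission theory, sketched under assumptions: run it as the user that actually launches the containers (the NodeManager log earlier in the thread shows `DefaultContainerExecutor`, which as far as I know runs containers as the NodeManager's own user), and treat the candidate directories as placeholders. `first_writable_dir` is a made-up helper; `path.logs` is the standard Elasticsearch setting for the log directory.

```shell
# Print the first argument that is an existing directory writable by the current user.
# Returns non-zero if none of the candidates qualifies.
first_writable_dir() {
  for dir in "$@"; do
    if [ -d "$dir" ] && [ -w "$dir" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
  done
  return 1
}

# Usage (candidate paths are examples only):
#   dir=$(first_writable_dir /var/log/elasticsearch /tmp) && echo "path.logs: $dir"
```

The printed line can then go into `config/elasticsearch.yml` inside the zip, so each node logs to a local directory it can write to rather than to the unpacked archive.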

I'm thinking of adding an option for YARN to disable the redirection of console logging, so that the immediate error message, and potentially the log permission problem itself, can be sidestepped while things are diagnosed.

