Unable to start Elasticsearch-YARN (Container is running beyond virtual memory limits)


#1

Hello,

I am having trouble starting Elasticsearch-YARN on a Hadoop cluster.
I am running on a 2 node (m3.large EC2 instances) cluster with hdp-2.1.2.0.

I am following the steps described on https://www.elastic.co/guide/en/elasticsearch/hadoop/current/ey-usage.html.
After I run the -start command, I end up with the following error :

Application application_1437578079095_0008 failed 2 times due to AM 
Container for appattempt_1437578079095_0008_000002 exited with  
exitCode: 143 due to: Container 
[pid=7620,containerID=container_1437578079095_0008_02_000001] is running
 beyond virtual memory limits. Current usage: 110.4 MB of 1 GB physical 
memory used; 2.3 GB of 2.1 GB virtual memory used. Killing container.

I am able to run this when I set up a pseudo distributed (single node) cluster on my local machine.

Any clues would be greatly appreciated.

Thanks,
Vivek


(Costin Leau) #2

There's no easy answer here. Basically the distro that you are using limits the containers to only 2GB of memory while in this case, the container starts with more.
By increasing the default maximum memory for the container, you should be able to solve this issue by editing some properties, typically in yarn-site.xml.

There are several options here; the HDP distro in fact has a doc page on this; it's basically revolves around the number of container needed and how much memory we want per container (note this is affected by the maximum amount).

Do note that HDP (and other distros) provide configuration tools to change these settings - in fact, I recommend doing so instead of manual editing since there might be multiple places where this occurs and the tools tends to work globally.

Last note, if possible, try to upgrade to a more recent distro - HDP 2.2 has been released for a while 2.3 is fairly fresh. Both provide improvements over the previous releases especially on the configuration front.

Hope this helps,


#3

Thanks for your response.

We've tried setting the container.mem option on the command line (during -install, -install-es, -start) and it doesn't seem to be changing the behavior in any way. We get the same output message as above.

Note that if we change the containers option (e.g. containers=2) it does get used or picked up (we see it in the log message).

Just to confirm, the Elasticsearch application manager doesn't use any of the mapred/mapreduce configuration settings (e.g. mapred.child.java.opts).

We are going to try changing the cluster configuration, but it will be important to understand how we can control the Elasticsearch application manager memory settings.

Thanks again for your continued help.

Vivek


(Costin Leau) #4

YARN doesn't allow all of its parameters to be configured by a client (which is what es-yarn is). In fact, in many cases these are global and affect all YARN instances and thus need to be configured separately through the dedicated configuration files or, as I mentioned, the WEB UI of your respective distro.
The behavior is expected - ES-YARN options are recognized, everything else is discarded.

Also between Hadoop versions, some params were introduced or their behaviour slightly changed hence why using the latest version is a good idea.

This is also the reason why the client doesn't currently allow parameters to be passed on - we could potentially do this but again it might confuse more than anything. However, I agree that it would be overall helpful hence why I raised an issue here.

It does not and it should not; that's a Map/Reduce specific setting, why should a long-lived container like ES rely on that feature? Moreover, this is used by the YARN container itself and not by a YARN consumer.


#5

Hi Costin,

Thanks for your detailed response.
For now, I can work around the issue by setting up properties in yarn-site.xml when setting up the cluster. I guess I just expected es-yarn to work without any tweaks given that HDP 2.1.x is a supported Elasticsearch platform.

Somewhat related to the above - I am running a 3 instance EC2 cluster (with HDP 2.1.2.0).

I installed Elasticsearch on YARN using es-yarn with containers=3.

Here's what I am noticing :

  • one EC2 instance with one Elasticsearch node running.
  • another EC2 instance with two Elasticsearch nodes running.
  • these two instances don't seem to be talking to each other (this is evidenced by my indexing ~ 9M documents which all end up on the first instance in one index with 5 shards on the same instance as well).
  1. Is this situation a result of not using the cloud-aws plugin ?
  2. If this were a normal cluster, then would the nodes discover each other automatically ?
  3. Lastly, just as we can pass in the number of containers to the Elasticsearch YARN cli, is it possible to also pass in the aws keys for the cloud-aws plugin (basically can all elasticsearch/plugin settings be overridden on the command line) ?

Thanks,
Vivek


(Costin Leau) #6

What options did you have to set? Potentially we can add those automatically when the container starts (do note there are some restrictions so this might not work).

  1. you need cloud-aws; this is just a matter of networking. The two instances don't see each other (private vs public IP). This is true of all clouds (hosted services) and affects all services (not just Elasticsearch)

  2. assuming your firewall allows it - then yes. For example, if you start multiple nodes on the same machine, they'll join the cluster since on localhost traffic is typically allowed.

  3. No, not at this point but we could extend the CLI to pass any given properties further to elasticsearch itself. can you please comment on the issue mention before in this thread?

Thanks,


#7

Hi Costin,

To get going I just disabled the YARN node manager virtual memory check in yarn-site.xml

<property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property>

But I've read that one could also bump up the value for the yarn.nodemanager.vmem-pmem-ratio property.

We are using the CLI to deploy to multiple platforms. In some cases it could be EC2 based so it's necessary for us to include the (necessary) plugins in the zip file - but we can't hard-code the AWS keys in the config (they aren't always the same). Hence the request to allow such properties to be passed in on the command line. Similarly having a way to pass in the cluster name, the number of shards and replicas would be useful.
Basically, as you mentioned above, extending the CLI to pass any given property would be extremely useful.

Thanks,
Vivek


(system) #8