Hey all, I thought I'd share an interesting experience I had experimenting with YARN, Mesos, and MapR. The es-yarn instructions talk about container-level storage vs. mounting HDFS over NFS (which takes a huge performance hit). Well, on MapRFS that performance hit isn't NEARLY as bad; it's actually quite performant (side note: if anyone would like to talk about how to gauge that better, I'd be all ears).
Basically, on MapRFS (I am using a trial version of M5, which lets me mount MapRFS at the same mount point on every node), I set the storage location to /mapr/mycluster/elasticsearch/esdata/${CONTAINER_ID}/data (I also put logs and tmp under the ${CONTAINER_ID} folder). This gives each container that spins up a unique location. At this point each new container gets new data, so "persist" isn't quite the right word, but that's something I'll come back to below: the data is persisted, but since the location is keyed on the container ID, each new container that spins up creates a new directory. (I do this by altering the elasticsearch start script in ./bin to read the ID and create the directories as needed.)
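For reference, here's a minimal sketch of the kind of wrapper I mean. It assumes the CONTAINER_ID environment variable that YARN sets inside each container and the ES 1.x-style -Des.* system-property flags; the install path is just a placeholder.

```bash
#!/bin/bash
# Sketch: derive per-container storage paths from the CONTAINER_ID
# environment variable that YARN exports inside each container.
ES_HOME=/opt/elasticsearch                                 # placeholder install path
BASE=/mapr/mycluster/elasticsearch/esdata/${CONTAINER_ID}

# Create the per-container data, log, and tmp directories before starting the node.
mkdir -p "${BASE}/data" "${BASE}/logs" "${BASE}/tmp"

exec "${ES_HOME}/bin/elasticsearch" \
  -Des.path.data="${BASE}/data" \
  -Des.path.logs="${BASE}/logs"
```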
OK cool, so I loaded some data and I am getting decent performance, which made me think: OK, this is actually scalable, except for a couple of things.
The first is the location being keyed on the container_id. If a container dies and is resurrected, it can't use the old data, so recovery of containers isn't really possible. What I'm thinking about here is whether we could have a setup where a variable like the node name, once generated, is registered with the application master in such a way that, for the duration of the application, it ensures an instance of that node name is running (unless it is specifically removed).

For example, say I have node.name Hulkling. When the YARN application starts, it should know the names of the 6 nodes it wants to run (they don't have to come from the Marvel corpus; it could be as simple as esnode0, esnode1, esnode2). I can then create data directories for them, and the application will start containers based on keeping those named nodes running. Each node accesses its data directory by name, so if esnode2 fails or is killed for some reason, the es-yarn application will try to restart esnode2 in a new container, and the data that was in the directory is preserved. We would probably also want the ability to name the cluster, so that even if the application itself needed to restart, we could still access our data at ${clustername}/${nodename}; that data would persist and the setup would still be scalable. This would allow multi-tenancy and be resilient to failures. Where the nodes are started wouldn't matter, because we aren't using the hostname; we are using the node name and the cluster name.
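To make that concrete, here's a rough sketch of how the start script could look if the application master handed each container a cluster name and node name instead of relying on the container ID. ES_CLUSTER_NAME and ES_NODE_NAME are hypothetical environment variables the application master would need to export; es-yarn doesn't do this today.

```bash
#!/bin/bash
# Hypothetical: the es-yarn application master exports ES_CLUSTER_NAME and
# ES_NODE_NAME when it launches a container for a given logical node.
CLUSTER=${ES_CLUSTER_NAME:?application master must set ES_CLUSTER_NAME}
NODE=${ES_NODE_NAME:?application master must set ES_NODE_NAME}
BASE=/mapr/mycluster/elasticsearch/${CLUSTER}/${NODE}

# The same logical node always maps to the same directory, no matter which
# host or container it lands in, so its data survives restarts.
mkdir -p "${BASE}/data" "${BASE}/logs" "${BASE}/tmp"

exec /opt/elasticsearch/bin/elasticsearch \
  -Des.cluster.name="${CLUSTER}" \
  -Des.node.name="${NODE}" \
  -Des.path.data="${BASE}/data" \
  -Des.path.logs="${BASE}/logs"
```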
So that's the first thing. The next thing we need is elasticity: once an application is started, we need the ability to say "add more nodes", i.e. add more nodes in containers to the already running es-yarn application (or scale down, for that matter). Perhaps an API listener specific to the es-yarn system itself could accomplish this. I am less certain about this and how it would be approached, but basically it would give us flexibility, especially as we want to grow and shrink clusters. Since I am using Myriad with Mesos, YARN is actually pretty awesome here, because it can scale down and give the resources back to Mesos if needed. This is a really neat setup for ultimate flexibility.
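To illustrate the shape of what I'm imagining (the endpoint and payload below are entirely made up; nothing like this exists in es-yarn today), something like:

```bash
# Hypothetical scale-up call to an API listener on the es-yarn application master:
curl -X POST http://<appmaster-host>:<port>/cluster/nodes \
     -H 'Content-Type: application/json' \
     -d '{"action": "add", "count": 2}'

# ...and the matching scale-down, handing resources back to YARN (and Mesos via Myriad):
curl -X POST http://<appmaster-host>:<port>/cluster/nodes \
     -H 'Content-Type: application/json' \
     -d '{"action": "remove", "node": "esnode2"}'
```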
So on the first point, I know that running on the NFS mount may not seem optimal; it's something that standard HDFS could face challenges with. That said, there may be ways to mount other filesystems via NFS and still get close to the performance we are looking for, while adding the multi-tenancy, elasticity, and so on that come from running this system on YARN and Mesos via Myriad. MapR has some advantages, but I want to frame this conversation in the context of a generic solution. My testing so far seems extremely promising. I'd highly value any questions or discussion on this topic, and I'm happy to share my approaches thus far!
John