We are currently trying to decide which architecture to use for a new ITOA application that analyzes system and application logs.
At first we decided to use a pure ELK stack, but as we researched further, we decided to bring the Hadoop ecosystem into our design to increase our analytical capabilities and opportunities.
Here is what we think:
Logstash will forward the raw log data (possibly with some pre-filtering) to HDFS (on Isilon). The logs will be queryable from Spark, where we will also apply MLlib and other extra features. We will then forward the indexed data to ES and visualize it in Kibana.
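For the first hop of this pipeline (Logstash → HDFS), a minimal sketch using the `logstash-output-webhdfs` plugin might look like the following; the host, paths, and user are hypothetical placeholders, not a definitive configuration:

```
input {
  file {
    path => "/var/log/app/*.log"      # hypothetical source of raw logs
    start_position => "beginning"
  }
}
output {
  webhdfs {
    host => "isilon.example.com"      # hypothetical Isilon/HDFS namenode
    port => 50070
    path => "/logs/raw/%{+YYYY-MM-dd}/app.log"   # partition raw logs by day
    user => "logstash"
  }
}
```

Writing the raw (or lightly filtered) events to date-partitioned paths like this keeps them easy to scan from Spark later.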
Here are our questions:
Do you recommend keeping the ES indexes on HDFS too, or should we write them to NFS? I heard a Spark contributor state that putting ES indexes on HDFS is not the best thing to do. Is this still valid?
Is the Logstash scenario (grok filtering, indexing, sending to ES) similar to the scenario of putting raw data on HDFS, indexing it with Spark, and sending it to ES? Can and should we apply all of those Logstash filter and index capabilities in Spark and send the result to ES, the way Logstash can? Is this approach better and faster for ES indexing, as described in this es-hadoop question, or is there another Spark integration scenario that we are not aware of? How does ES-Hadoop reduce the amount of indexing that has to be performed on ES by indexing directly from Spark data structures into ES? How does that differ from Logstash grok filtering and indexing directly into ES?
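On the grok question: grok is essentially named-group regular-expression matching, so the same parsing can in principle be done in a Spark `map` or UDF before writing to ES. A minimal plain-Python sketch (the pattern below is a simplified, hypothetical equivalent of a common access-log grok pattern, not Logstash's actual `%{COMBINEDAPACHELOG}` definition):

```python
import re

# Simplified named-group regex mirroring what a grok pattern would extract
# from an access-log line; in Spark this function could be applied per line
# via rdd.map(parse_line) or a DataFrame UDF before indexing into ES.
LOG_RE = re.compile(
    r'(?P<client>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) \S+" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def parse_line(line):
    """Parse one access-log line into a dict of fields, or None on no match."""
    m = LOG_RE.match(line)
    if not m:
        return None
    doc = m.groupdict()
    doc["status"] = int(doc["status"])   # cast numeric fields before indexing
    return doc

sample = '10.0.0.1 - - [12/Mar/2016:19:20:30 +0100] "GET /index.html HTTP/1.1" 200 2326'
print(parse_line(sample)["status"])  # 200
```

The resulting dicts are the kind of structured documents that es-hadoop can write to ES directly from Spark, which is why that path can skip a separate Logstash grok stage.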
What is the advantage of backing up ES on HDFS?
Should we separate the Spark and ES servers, or can they reside on the same server?
I hope you can provide some insight into this architecture and guide us. As you may have noticed, we are new to these concepts and a bit confused, and any guidance will be very much appreciated.