I am building a distributed real-time cluster system to monitor and analyze a network. After some research online, I came up with the following technologies:
- for real-time processing: Logstash, Storm and Spark Streaming
- for storage: Elasticsearch
- for analysis: Apache Spark over Hadoop (I will use ES-Hadoop to connect to Elasticsearch)
- for data visualization: Kibana, D3.js, C3.js
However, Logstash is mentioned far less often than Spark Streaming and Storm. I found the following architecture online, shown in the picture below:
I have two questions:
1) I don't understand why Logstash is rarely mentioned as a real-time processing system alongside Spark Streaming and Storm. What are the main reasons? I have been using it and it is very powerful.
2) Regarding the analysis part, can I use the machine learning libraries in that configuration?
Thank you in advance.