I have searched Google and this forum, but I am still undecided.
I would like to develop a system that collects logs from several servers and then analyzes them.
I thought about using Beats to collect the logs (and maybe Logstash to parse them) and Elasticsearch as a centralized store. Then I would use Spark to read from ES and process the data to build some machine learning models.
But during my search I found that some people use Spark to preprocess the data before writing it into ES. Why don't they use Logstash for that? Is Spark better than Logstash?
Is there any way to identify possible bottlenecks in this kind of pipeline?
Very good. Elasticsearch also has built-in ingest pipelines, which I'd recommend instead of adding Logstash.
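To make the ingest pipeline idea concrete, here is a minimal sketch of a pipeline body built in Python. It assumes each log line arrives in a `message` field and looks like `2024-01-01T12:00:00Z ERROR something failed`; the field names `timestamp`, `level` and `text` and the helper `build_log_pipeline` are made up for illustration, so adapt the grok pattern to your real log format.

```python
import json


def build_log_pipeline(source_field="message"):
    """Return an ingest pipeline body that parses one raw log line per document.

    The line layout and the target field names are assumptions for this sketch.
    """
    return {
        "description": "Parse timestamp, level and text from a raw log line",
        "processors": [
            {
                # grok extracts named fields from the raw line
                "grok": {
                    "field": source_field,
                    "patterns": [
                        "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:text}"
                    ],
                }
            },
            {
                # date parses the extracted string; by default it writes @timestamp
                "date": {"field": "timestamp", "formats": ["ISO8601"]}
            },
        ],
    }


if __name__ == "__main__":
    # Print the JSON you would send as the body of PUT _ingest/pipeline/<name>
    print(json.dumps(build_log_pipeline(), indent=2))
```

The printed JSON can be sent as the body of a `PUT _ingest/pipeline/<name>` request, and the shipper can then reference that pipeline name when indexing.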
Just note that we do have a built-in machine learning feature (available on Cloud or with a commercial license) that does this out of the box, for things like anomaly detection. So maybe you don't need to reinvent the wheel.
I guess that Spark is more for computing things, whereas Logstash is built to parse, enrich and load single events one by one. Logstash is an ETL tool. Spark is an analytics engine. Not the same purpose...
> Just note that we do have a built-in machine learning feature (available on Cloud or with a commercial license) that does this out of the box, for things like anomaly detection. So maybe you don't need to reinvent the wheel.
Yes, I used it during the 30-day trial license, but I would like to use Spark (or something else) for more flexibility. I mean, I would like to be able to get the data out of ES.
> I guess that Spark is more for computing things, whereas Logstash is built to parse, enrich and load single events one by one. Logstash is an ETL tool. Spark is an analytics engine. Not the same purpose...
Ummm, I am overwhelmed... It is not clear to me what I should use. I know there is a Spark-ES connector, and I tried it yesterday. Isn't it the best solution?
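For anyone following along, reading an index through the connector could look roughly like the sketch below. It assumes PySpark with the elasticsearch-hadoop (elasticsearch-spark) jar on the classpath; the helper names `es_read_options` and `load_logs`, the host, and the index pattern `logs-*` are placeholders, not something from this thread.

```python
def es_read_options(nodes="localhost", port=9200, query=None):
    """Build connector settings as a plain dict of string options.

    es.nodes, es.port and es.query are elasticsearch-hadoop settings;
    the default values here are assumptions for a local test cluster.
    """
    opts = {"es.nodes": nodes, "es.port": str(port)}
    if query is not None:
        # Optional query DSL string so that only matching documents are read
        opts["es.query"] = query
    return opts


def load_logs(spark, index, options):
    """Read an Elasticsearch index into a Spark DataFrame.

    `spark` is an existing SparkSession; nothing is contacted until this is called.
    """
    return (
        spark.read.format("org.elasticsearch.spark.sql")
        .options(**options)
        .load(index)
    )


if __name__ == "__main__":
    # Only builds and prints the options; no cluster is needed for this part
    print(es_read_options(query='{"query": {"match": {"level": "ERROR"}}}'))
```

With a running cluster you would call something like `load_logs(spark, "logs-*", es_read_options())` and then build the ML features on the resulting DataFrame.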