Logstash or Spark -- Elasticsearch

alvgoro · April 6, 2021, 9:33am

Hi there!

I have searched Google and this forum, but I am still undecided.

I would like to build a system that collects logs from several servers and then analyzes them.

I thought about using Beats to collect the logs (and maybe Logstash to parse them, or something similar) and Elasticsearch as a centralized store. Then I would use Spark to read from ES and process the data to build some machine learning models.

However, during my search I noticed that some people use Spark to preprocess the data before writing it into ES. Why don't they use Logstash for that? Is Spark better than Logstash?

Is there any way to identify possible bottlenecks in this kind of pipeline?

dadoonet · April 6, 2021, 9:52am

Welcome!

Very good. Elasticsearch has built-in ingest pipelines, which I'd recommend instead of adding Logstash.
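As an illustration, here is a minimal sketch of such an ingest pipeline, written as a Python script that only builds and prints the JSON body. It assumes Filebeat-style documents with syslog lines in the `message` field; the pipeline id `server-logs`, the grok pattern, and the field names `log_timestamp`, `log_host`, `program` and `log_message` are hypothetical choices, not something from this thread.

```python
import json

# Hypothetical pipeline id; any name works.
PIPELINE_ID = "server-logs"


def build_pipeline() -> dict:
    """Build the body for PUT _ingest/pipeline/<id>.

    Assumes syslog-style lines such as
    "Apr  6 09:33:01 web01 sshd[123]: Accepted password for alice"
    arrive in the `message` field (where Filebeat puts the raw line).
    """
    return {
        "description": "Parse syslog-style server logs (sketch)",
        "processors": [
            # Split the raw line into structured fields.
            {
                "grok": {
                    "field": "message",
                    "patterns": [
                        "%{SYSLOGTIMESTAMP:log_timestamp} %{SYSLOGHOST:log_host} "
                        "%{DATA:program}: %{GREEDYDATA:log_message}"
                    ],
                }
            },
            # Parse the syslog timestamp into @timestamp (the default target field).
            {
                "date": {
                    "field": "log_timestamp",
                    "formats": ["MMM  d HH:mm:ss", "MMM dd HH:mm:ss"],
                }
            },
            # Drop the intermediate field once it has been parsed.
            {"remove": {"field": "log_timestamp"}},
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_pipeline(), indent=2))
```

You would then send the printed body with `PUT _ingest/pipeline/server-logs` (for example from Kibana Dev Tools) and reference the pipeline from Beats through the `pipeline` setting of its Elasticsearch output. The `_ingest/pipeline/_simulate` API is handy for testing it against a few sample documents first.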

Just note that we do have a built-in machine learning feature (available on Elastic Cloud or with a commercial license) that does this out of the box, for example anomaly detection. So maybe you don't need to reinvent the wheel.
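For a rough idea of what that looks like, here is a hedged sketch of an anomaly detection job definition, again built as a plain Python dict. The job id `server-log-rate`, the 15-minute bucket span and the choice of counting log lines per `host.name` are assumptions for illustration only.

```python
import json

# Hypothetical job id.
JOB_ID = "server-log-rate"


def build_anomaly_job() -> dict:
    """Body for PUT _ml/anomaly_detectors/<job_id> (requires ML to be available)."""
    return {
        "description": "Unusual log volume per host (sketch)",
        "analysis_config": {
            "bucket_span": "15m",
            # Count log lines in each 15-minute bucket, modelled separately per host.
            "detectors": [{"function": "count", "partition_field_name": "host.name"}],
        },
        "data_description": {"time_field": "@timestamp"},
    }


if __name__ == "__main__":
    print(json.dumps(build_anomaly_job(), indent=2))
```

The job on its own does not read any data; it also needs a datafeed (`PUT _ml/datafeeds/<feed_id>`) pointing at the log indices. In practice the Kibana ML UI can create both for you.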

I guess Spark is more for computing things, whereas Logstash is built to parse, enrich, and load single events one by one. Logstash is an ETL tool; Spark is an analytics engine. They don't serve the same purpose...
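To make the distinction concrete, here is a plain-Python sketch with no Logstash or Spark involved. `parse_event` mimics what a Logstash filter or an ingest pipeline does: it transforms one event at a time without knowing about the others. `errors_per_host` mimics the kind of computation you would hand to Spark, which has to look at many events together. The `<host> <LEVEL> <message>` log format and both function names are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Hypothetical log format: "<host> <LEVEL> <message>"
LINE_RE = re.compile(r"^(?P<host>\S+) (?P<level>[A-Z]+) (?P<msg>.*)$")


def parse_event(line: str) -> Optional[dict]:
    """ETL-style step: parse and enrich a single event in isolation."""
    m = LINE_RE.match(line)
    if m is None:
        return None  # a real pipeline would tag the event as a parse failure
    event = m.groupdict()
    event["is_error"] = event["level"] == "ERROR"  # enrichment
    return event


def errors_per_host(events: Iterable[dict]) -> Counter:
    """Analytics-style step: needs the whole dataset, not a single event."""
    return Counter(e["host"] for e in events if e["is_error"])


if __name__ == "__main__":
    raw = [
        "web01 INFO started",
        "web01 ERROR disk full",
        "web02 ERROR timeout",
        "web01 ERROR disk full",
        "garbage",
    ]
    events = [e for e in map(parse_event, raw) if e is not None]
    print(errors_per_host(events))  # Counter({'web01': 2, 'web02': 1})
```

The first function can run inside the ingestion path because each call is independent; the second only makes sense once the data has been collected, which is why it fits on the Spark side reading from Elasticsearch.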

alvgoro · April 6, 2021, 10:24am

Thanks for your quick reply.

> Just note that we do have a built-in machine learning feature (available on cloud or with a commercial license) which does that out of the box to perform things like anomaly detection for example. So may be you don't need to reinvent the wheel.

Yes, I used it during the 30-day trial license, but I would like to use Spark (or something similar) for more flexibility. I mean, I would like to be able to get the data out of ES.

> I guess that Spark is more for computing things where Logstash is built to parse, enrich and load single events one by one. Logstash is an ETL. Spark is an analytics engine. Not the same purpose...

Umm, I am overwhelmed... It is not clear to me what I should use. I know there is a Spark-ES connector and I tried it yesterday. Isn't that the best solution?

dadoonet · April 6, 2021, 4:05pm

I don't know it. But if you want to run a "Spark job", that's probably the best option.
I'd probably start from the "Apache Spark support" page of the Elasticsearch for Apache Hadoop documentation.
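For reference, a minimal PySpark sketch of reading an index through the connector might look like the following. It assumes the elasticsearch-hadoop (elasticsearch-spark) jar is already on the Spark classpath, for example via `--packages` or `--jars`; the host, the index pattern `server-logs-*` and the query are hypothetical. `es.nodes`, `es.port` and `es.query` are connector settings from the ES-Hadoop configuration docs. The options are built in a separate function so they can be checked without a running cluster.

```python
import json
from typing import Any, Dict

# Data source name registered by elasticsearch-hadoop for Spark SQL.
ES_FORMAT = "org.elasticsearch.spark.sql"


def es_read_options(host: str, port: int, query: Dict[str, Any]) -> Dict[str, str]:
    """Connector settings; Spark options are passed as strings."""
    return {
        "es.nodes": host,
        "es.port": str(port),
        # Filter on the Elasticsearch side so only matching documents reach Spark.
        "es.query": json.dumps(query),
    }


def load_logs(spark: Any, index: str, options: Dict[str, str]) -> Any:
    """Return a DataFrame backed by an Elasticsearch index (needs a live cluster)."""
    return spark.read.format(ES_FORMAT).options(**options).load(index)


if __name__ == "__main__":
    # Imported here so the helpers above can be used without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("log-analysis").getOrCreate()
    last_day = {"query": {"range": {"@timestamp": {"gte": "now-1d"}}}}
    df = load_logs(spark, "server-logs-*", es_read_options("localhost", 9200, last_day))
    df.printSchema()
    df.show(5)
```

From there the DataFrame can feed Spark MLlib or any other library, which matches the goal of keeping Elasticsearch as the central store while still being able to get the data out.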

system · May 4, 2021, 4:05pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Related topics

Topic	Category	Replies	Views	Activity
Logstash vs spark streaming and storm	Logstash	3	9345	July 6, 2017
When should I use filebeat instead of logstash	Beats / filebeat	3	419	July 29, 2020
Architecture decision confusion	Elasticsearch / es-hadoop	3	947	December 27, 2016
About Logstash cloud service	Logstash	9	269	February 11, 2021
Elasticsearch with log data and elastic stack	Elasticsearch	8	437	June 1, 2018