Logstash or Spark -- Elasticsearch

Hi there!

I have searched Google and this forum, but I am still undecided.

I would like to build a system that collects logs from several servers and then analyzes them.

I thought about using Beats to collect the logs (and maybe Logstash to parse them) and Elasticsearch as a centralized store. Then I would use Spark to read from ES and process the data to build some machine learning models.

But during my earlier search, I saw that some people use Spark to preprocess the data before writing it into ES. Why don't they use Logstash for that? Is Spark better than Logstash?

Is there any way to identify possible bottlenecks in these steps?

Welcome!

Very good. Elasticsearch has built-in ingest pipelines, which I'd recommend instead of adding Logstash.
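
To give an idea of what an ingest pipeline looks like, here is a minimal sketch of a pipeline definition built in Python. The pipeline name `server-logs`, the field names, and the log line format are assumptions made up for illustration; your real logs will need their own grok pattern.

```python
import json

# Hypothetical pipeline for log lines shaped like:
#   2021-04-01T12:00:00Z ERROR disk full
# Field names and the line format are assumptions, not taken from a real setup.
log_pipeline = {
    "description": "Parse simple server log lines (illustrative)",
    "processors": [
        # Split the raw line into a timestamp, a level, and the remaining text.
        {
            "grok": {
                "field": "message",
                "patterns": [
                    "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:detail}"
                ],
            }
        },
        # Parse the extracted timestamp; the date processor writes to @timestamp by default.
        {"date": {"field": "timestamp", "formats": ["ISO8601"]}},
        # Drop the temporary string field once it has been parsed.
        {"remove": {"field": "timestamp"}},
    ],
}

# This JSON is the request body for: PUT _ingest/pipeline/server-logs
print(json.dumps(log_pipeline, indent=2))
```

Once a pipeline like this is registered, Beats can be pointed at it through the `pipeline` setting of its Elasticsearch output, so the parsing happens inside Elasticsearch at index time.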

Just note that we have a built-in machine learning feature (available on Cloud or with a commercial license) that does this out of the box, for example for anomaly detection. So maybe you don't need to reinvent the wheel.

I guess that Spark is more for computing things, whereas Logstash is built to parse, enrich, and load single events one by one. Logstash is an ETL tool. Spark is an analytics engine. Not the same purpose...
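
To make the difference concrete, here is a small plain-Python sketch (no Logstash or Spark involved). The log line format and the function names are invented for illustration: the first function handles one event at a time, which is the ETL style, while the second needs the whole dataset to answer a question, which is the analytics style.

```python
from collections import Counter


def parse_event(line: str) -> dict:
    """Per-event step (ETL style): one raw line in, one structured event out."""
    host, level, message = line.split(" ", 2)
    return {"host": host, "level": level, "message": message}


def errors_per_host(events: list) -> Counter:
    """Dataset-wide step (analytics style): looks across all events at once."""
    return Counter(e["host"] for e in events if e["level"] == "ERROR")


# Made-up sample lines in a hypothetical "<host> <level> <message>" format.
raw_lines = [
    "web-1 ERROR disk full",
    "web-2 INFO request served",
    "web-1 ERROR timeout talking to db",
]

events = [parse_event(line) for line in raw_lines]  # each line handled independently
summary = errors_per_host(events)                   # requires the full collection
print(summary)
```

The per-event step can run as data streams in, which is what Logstash or an ingest pipeline is designed for. The aggregation step is the kind of work that benefits from an engine like Spark once the data volume grows.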

Thanks for your quick reply,

Just note that we have a built-in machine learning feature (available on Cloud or with a commercial license) that does this out of the box, for example for anomaly detection. So maybe you don't need to reinvent the wheel.

Yes, I used it during the 30-day trial license, but I would like to use Spark (or something else) for more flexibility. I mean, I would like to be able to get the data out of ES.

I guess that Spark is more for computing things, whereas Logstash is built to parse, enrich, and load single events one by one. Logstash is an ETL tool. Spark is an analytics engine. Not the same purpose...

Hmm, I am overwhelmed... It is not clear to me what I should use. I know there is a Spark-ES connector, and I tried it yesterday. Isn't it the best solution? :frowning:

I don't know it. If you want to run a "Spark job", that's probably the best option.
I'd probably start from the "Apache Spark support" page in the Elasticsearch for Apache Hadoop [7.12] documentation.
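
For reference, reading an index into a Spark DataFrame through the connector might look roughly like the sketch below. It assumes PySpark is installed and the Elasticsearch for Apache Hadoop connector jar is on the Spark classpath. The host, port, index name `server-logs`, and the `level` field are placeholders, and the helper function names are made up for this example.

```python
def es_read_options(nodes: str = "localhost", port: int = 9200) -> dict:
    """Connector settings as string key/value pairs (host and port are placeholders)."""
    return {"es.nodes": nodes, "es.port": str(port)}


def read_index(spark, index: str, options: dict):
    """Load an Elasticsearch index as a DataFrame via the connector's Spark SQL source."""
    reader = spark.read.format("org.elasticsearch.spark.sql")
    for key, value in options.items():
        reader = reader.option(key, value)
    return reader.load(index)


def main():
    # Imported here so the helpers above can be used without PySpark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("es-log-analysis").getOrCreate()
    logs = read_index(spark, "server-logs", es_read_options())  # hypothetical index name
    # Example aggregation, assuming the documents have a "level" field.
    logs.groupBy("level").count().show()
    spark.stop()
```

Calling `main()` from a script submitted with `spark-submit` runs the job. From there the DataFrame can feed whatever processing or model training you have in mind, which gives you the flexibility of getting the data out of ES.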