Spark querying on ELK Stack

Tomas_Law · January 19, 2018, 4:22pm

Hello,

I currently have an ELK Stack cluster with the following nodes:

Logstash
Elasticsearch 1 (ES1)
Elasticsearch 2 (ES2)
Elasticsearch (orchestrator) + Kibana

I'm importing logs into Logstash, sending them to Elasticsearch nodes and creating visualizations in Kibana.

The problem is that I can't build some specific visualizations that require more complex data processing and querying. Since ES is not especially suited to complex data processing, I'm going to use Apache Spark to process that data.

I'm not familiar with Apache Spark, so I'm unsure how this might work, but I'm planning on:

installing Apache Spark on a separate node
retrieving data from ES1 and ES2 from the Spark node
processing that data with complex queries
sending the newly processed data back to ES1 and ES2
retrieving the data from ES1 and ES2 with Kibana

Is this approach possible?

Will there be any problem if related data for a query is not on the same ES node? For example:

Shop: Walmart, Product: Milk, Price: 1€ <- Stored in ES1
Shop: Walmart, Product: Beer, Price: 2€ <- Stored in ES2

Will Spark be able to run a query that finds the total price of the items in shop Walmart?
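
To make the Walmart question concrete, here is a minimal plain-Python sketch (not Spark or Elasticsearch code) of the pattern a distributed engine generally follows for this kind of total: each partition produces a partial sum, and the partials are then merged by key. The `docs_by_node` data and the `partial_totals` / `merge_totals` helpers are hypothetical names that only mirror the example above.

```python
from collections import defaultdict

# Hypothetical documents, grouped by the node that happens to store them.
docs_by_node = {
    "ES1": [{"shop": "Walmart", "product": "Milk", "price": 1.0}],
    "ES2": [{"shop": "Walmart", "product": "Beer", "price": 2.0}],
}


def partial_totals(docs):
    """Sum prices per shop within a single partition."""
    totals = defaultdict(float)
    for doc in docs:
        totals[doc["shop"]] += doc["price"]
    return dict(totals)


def merge_totals(partials):
    """Combine the per-partition sums into one total per shop."""
    merged = defaultdict(float)
    for partial in partials:
        for shop, subtotal in partial.items():
            merged[shop] += subtotal
    return dict(merged)


totals = merge_totals(partial_totals(docs) for docs in docs_by_node.values())
print(totals)  # {'Walmart': 3.0}
```

Under this pattern the final total does not depend on which node holds which document, as long as every partition is read.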

james.baiera · January 29, 2018, 4:44am

I'm not sure I fully understand your question here. When you say ES1 and ES2, do you mean "two Elasticsearch clusters" or "two Elasticsearch clusters, one running v1 and one running v2"? I don't see much of an issue with reading data from a cluster, enriching it, and writing it back to a cluster. I will say that while you should be able to read the data from both clusters (you may need to union the resulting RDDs together), you will only be able to write to a single cluster at the end of your RDD processing, due to limitations of the connector.
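
As a rough, untested sketch of that read, union, and single-write flow, the PySpark snippet below uses the DataFrame API rather than raw RDDs and assumes the elasticsearch-hadoop (elasticsearch-spark) jar is on Spark's classpath. The index names (`sales`, `sales_totals`), the field names (`shop`, `price`), and both helper functions are hypothetical; only the `org.elasticsearch.spark.sql` format name and the `es.nodes` / `es.port` settings come from the connector itself.

```python
from functools import reduce

# Data source name registered by the elasticsearch-spark connector.
ES_FORMAT = "org.elasticsearch.spark.sql"


def es_options(nodes, port="9200"):
    """Connector settings pointing at one cluster (hypothetical helper)."""
    return {"es.nodes": nodes, "es.port": port}


def total_price_per_shop(spark, source_clusters, target_cluster,
                         source_index="sales", target_index="sales_totals"):
    """Read one index from each source cluster, union, aggregate, write once.

    Assumption: every source index exposes `shop` and `price` fields.
    Older connector versions may expect the resource as "index/type".
    """
    # Imported here so es_options() stays usable without Spark installed.
    from pyspark.sql import functions as F

    frames = [
        spark.read.format(ES_FORMAT).options(**es_options(nodes)).load(source_index)
        for nodes in source_clusters
    ]
    # unionByName needs Spark 2.3+; it matches columns by name, not position.
    combined = reduce(lambda left, right: left.unionByName(right), frames)
    totals = combined.groupBy("shop").agg(F.sum("price").alias("total_price"))

    # The result is written to a single target cluster only.
    (totals.write.format(ES_FORMAT)
        .options(**es_options(target_cluster))
        .mode("append")
        .save(target_index))
    return totals
```

It would be called with an existing `SparkSession`, a list of source host strings, and one target host string. If ES1 and ES2 are nodes of the same cluster rather than separate clusters, a single read should be enough and the union step can be dropped.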

Perhaps if you elaborated more on your use case I could provide some more help?

system · February 26, 2018, 4:44am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
