Hello,
I currently have an ELK Stack cluster with the following nodes:
- Logstash
- Elasticsearch 1 (ES1)
- Elasticsearch 2 (ES2)
- Elasticsearch (orchestrator) + Kibana
I'm ingesting logs with Logstash, sending them to the Elasticsearch nodes, and building visualizations in Kibana.
The problem is that I can't build some specific visualizations because they require more complex data processing/querying. Since Elasticsearch is not particularly well suited for that kind of processing, I'm going to use Apache Spark to process the data instead.
I'm not familiar with Apache Spark, so I'm unsure how this would work, but my plan is to (see the sketch after this list):
- install Apache Spark on a separate node
- read the data from ES1 and ES2 from the Spark node
- process that data with the complex queries
- write the newly processed data back to ES1 and ES2
- read the processed data from ES1 and ES2 with Kibana
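From what I've read, the elasticsearch-hadoop connector is the usual way to connect Spark to ES, so I imagine the job would look roughly like this (just a sketch of my understanding; the hostnames, index names, and connector version are placeholders for my setup):

```python
from pyspark.sql import SparkSession

# Submitted with the elasticsearch-hadoop connector on the classpath, e.g.:
#   spark-submit --packages org.elasticsearch:elasticsearch-spark-30_2.12:8.4.3 job.py
# (the version is a placeholder; it has to match the Spark/ES versions)
spark = SparkSession.builder.appName("es-complex-processing").getOrCreate()

# Read an index from the cluster into a DataFrame; from what I've read,
# the connector creates one Spark partition per Elasticsearch shard.
df = (
    spark.read
    .format("org.elasticsearch.spark.sql")
    .option("es.nodes", "es1,es2")   # placeholder hostnames for ES1 and ES2
    .option("es.port", "9200")
    .load("logs-*")                  # placeholder index pattern
)

# ... the complex processing/queries would go here ...
result = df                          # placeholder for the processed DataFrame

# Write the processed data back to a new index that Kibana can read.
(
    result.write
    .format("org.elasticsearch.spark.sql")
    .option("es.nodes", "es1,es2")
    .option("es.port", "9200")
    .mode("append")
    .save("processed-logs")          # placeholder output index
)
```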
Is this approach possible?
Will there be any problem if the related data for a query is not stored on the same ES node? For example:
- Shop: Walmart, Product: Milk, Price: 1€ <- Stored in ES1
- Shop: Walmart, Product: Beer, Price: 2€ <- Stored in ES2
Will Spark be able to run a query that finds the total price of the items in the Walmart shop?
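Concretely, would an aggregation like this one see both documents, even though they are stored on different nodes? (Continuing from the DataFrame in the sketch above; "shop" and "price" are the field names from my example documents.)

```python
from pyspark.sql import functions as F

# Total price of all items in the Walmart shop, across both ES nodes
totals = (
    df.filter(F.col("shop") == "Walmart")
      .agg(F.sum("price").alias("total_price"))
)
totals.show()  # for the two example documents I'd expect total_price = 3.0
```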