Spark querying on ELK Stack

Hello,

I currently have an ELK Stack cluster with the following nodes:

  • Logstash
  • Elasticsearch 1 (ES1)
  • Elasticsearch 2 (ES2)
  • Elasticsearch (orchestrator) + Kibana

I'm importing logs into Logstash, sending them to the Elasticsearch nodes, and creating visualizations in Kibana.

The problem is that I can't build some specific visualizations that require more complex data processing/querying. Since Elasticsearch is not especially suited for complex data processing, I'm going to use Apache Spark to process that data.

I'm not familiar with Apache Spark, so I'm unsure how this might work, but I'm planning on the following (see the sketch after this list):

  • installing Apache Spark on a separate node
  • retrieving data from ES1 and ES2 on the Spark node
  • processing that data with complex queries
  • sending the newly processed data back to ES1 and ES2
  • retrieving the data from ES1 and ES2 with Kibana
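
For illustration, here's roughly what I imagine the Spark side looking like, using the elasticsearch-hadoop (elasticsearch-spark) connector. The hostnames and index names are made up, and the processing step is just a placeholder:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._ // adds esRDD/saveToEs via implicits

object EsRoundTrip {
  def main(args: Array[String]): Unit = {
    // es.nodes only needs one reachable node; the connector discovers
    // the rest of the cluster. Host and index names here are made up.
    val conf = new SparkConf()
      .setAppName("es-spark-processing")
      .set("es.nodes", "es1.example.com:9200")
    val sc = new SparkContext(conf)

    // Each document arrives as an (id, fieldMap) pair
    val logs = sc.esRDD("logstash-*/logs")

    // ...complex processing goes here; as a placeholder, tag each doc...
    val processed = logs.mapValues(fields => fields + ("processed" -> "true"))

    // Write the results to a new index that Kibana can read
    processed.values.saveToEs("spark-processed/logs")

    sc.stop()
  }
}
```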

Is this approach possible?

Will there be a problem if the related data for a query is not on the same ES node?
E.g.:

  • Shop: Walmart, Product: Milk, Price: 1€ <- Stored in ES1
  • Shop: Walmart, Product: Beer, Price: 2€ <- Stored in ES2

Will Spark be able to run a query that finds the total price of the items in the shop Walmart?
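
Something like this is what I have in mind (a rough sketch; I've made up the shops/products index and assumed Price is stored as a plain number, without the € sign):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._

object TotalPricePerShop {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("total-price-per-shop")
      .set("es.nodes", "es1.example.com:9200") // any reachable node

    val sc = new SparkContext(conf)

    // Spark pulls the matching documents from every shard, so it should
    // not matter which node each document happens to live on.
    val products = sc.esRDD("shops/products") // hypothetical index/type

    val totals = products
      .map { case (_, fields) =>
        (fields("Shop").toString, fields("Price").toString.toDouble)
      }
      .reduceByKey(_ + _) // sum prices per shop

    totals.collect().foreach { case (shop, total) =>
      println(s"$shop: $total") // e.g. "Walmart: 3.0"
    }

    sc.stop()
  }
}
```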

I'm not sure I fully understand your question here: when you say ES1 and ES2, do you mean two nodes in one Elasticsearch cluster, or two separate clusters (one running v1 and one running v2)? I don't see much of an issue with reading data from a cluster, doing the enrichment, and re-writing it back to a cluster. I will say that while you should be able to read the data from both clusters (you may need to union the resulting RDDs together), you will only be able to write to a single cluster at the end of your RDD processing, due to limitations of the connector.
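
As a rough sketch of what that could look like (hostnames and index names are made up): esRDD accepts a per-call settings map, so each read can target a different cluster, while the final save goes to a single cluster.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._

object TwoClusterUnion {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("two-cluster-union"))

    // Each read points at a different cluster via its own settings map.
    val fromCluster1 = sc.esRDD("logstash-*/logs",
      Map("es.nodes" -> "es1.example.com:9200"))
    val fromCluster2 = sc.esRDD("logstash-*/logs",
      Map("es.nodes" -> "es2.example.com:9200"))

    // Union the two sources and process them as one dataset
    val combined = fromCluster1.union(fromCluster2)
    val processed = combined.values // ...your enrichment goes here...

    // A given save can only target one cluster
    processed.saveToEs("enriched/logs",
      Map("es.nodes" -> "es1.example.com:9200"))

    sc.stop()
  }
}
```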

Perhaps if you elaborated more on your use case, I could provide some more help?
