Hi,
I am trying to use Spark (PySpark), and my goal is to query a huge Elasticsearch index for the subset of data that matches a particular condition on a column.
For example, if my columns are name, age, and timestamp,
I want to get only those records where timestamp matches the current timestamp.
Could someone please help with how to achieve this?
Hi @Khushboo_Kaul. The best way to get started is probably with Spark SQL. One way is to create a temporary table from your index and then run ordinary SQL queries on it:
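from pyspark.sql import SQLContext
# `sc` is the SparkContext that the pyspark shell creates for you
sqlContext = SQLContext(sc)
# Register the Elasticsearch index as a temporary table via the elasticsearch-hadoop connector
sqlContext.sql("CREATE TEMPORARY TABLE myTable USING org.elasticsearch.spark.sql OPTIONS (resource 'my_index')")
# Query it like any other table
sqlContext.sql("select * from myTable").show()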
Once you have that working, you can add a WHERE clause on the timestamp field, just as you would in any ordinary SQL query.
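As a rough sketch of what that WHERE clause could look like: an exact match on "now" will almost never hit a row, so this example filters on the current minute instead. The helper names `current_minute_window` and `build_window_query` are made up for illustration, and it assumes the `timestamp` field is mapped as a date in Elasticsearch (so Spark sees it as a timestamp) and that the values are stored in UTC. Adjust the window and time zone to your data.

```python
from datetime import datetime, timedelta, timezone


def current_minute_window(now=None):
    """Return (start, end) datetimes covering the minute that contains `now`."""
    # Assumption: timestamps in the index are UTC.
    now = now or datetime.now(timezone.utc)
    start = now.replace(second=0, microsecond=0)
    return start, start + timedelta(minutes=1)


def build_window_query(table, column, start, end):
    """Build a SQL string selecting rows with start <= column < end."""
    fmt = "%Y-%m-%d %H:%M:%S"
    # Backticks guard the column name in case it clashes with a SQL keyword.
    return (
        "SELECT * FROM {table} "
        "WHERE `{col}` >= CAST('{start}' AS TIMESTAMP) "
        "AND `{col}` < CAST('{end}' AS TIMESTAMP)"
    ).format(
        table=table,
        col=column,
        start=start.strftime(fmt),
        end=end.strftime(fmt),
    )


# Usage against the temporary table created above (needs a live cluster):
# start, end = current_minute_window()
# sqlContext.sql(build_window_query("myTable", "timestamp", start, end)).show()
```

Building the query string separately from running it makes it easy to print and check the SQL before sending it to the cluster.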