Get documents based on some filter conditions

Hi,
I am trying to use Spark (PySpark), and my goal is to query a subset of data from a huge Elasticsearch index that matches a particular condition on a column.
For example, if my columns are name, age, and timestamp,
I want to get only those records where timestamp matches the current timestamp.

Could someone please advise on how to achieve this?

Hi @Khushboo_Kaul. The best way to get started is probably with Spark SQL. One way is to create a temporary table from your index and then run ordinary SQL queries on it:

from pyspark.sql import SQLContext

# `sc` is the SparkContext that the pyspark shell provides automatically
sqlContext = SQLContext(sc)

# Register the Elasticsearch index 'my_index' as a temporary table
sqlContext.sql("CREATE TEMPORARY TABLE myTable USING org.elasticsearch.spark.sql OPTIONS (resource 'my_index')")
sqlContext.sql("SELECT * FROM myTable").show()

Once you have that working, you can add a WHERE clause on the timestamp field just as you would in any ordinary SQL query.
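For example, here is a minimal sketch using the column names from your post (name, age, timestamp) and the myTable registration above; it assumes timestamp is mapped as a date field in Elasticsearch:

# Filter on the timestamp column using Spark SQL's built-in
# current_timestamp() function (backticks because `timestamp`
# is also a SQL keyword)
df = sqlContext.sql(
    "SELECT name, age, `timestamp` FROM myTable "
    "WHERE `timestamp` = current_timestamp()"
)
df.show()

Note that an exact equality match on the current instant will rarely return anything in practice; a small range, e.g. WHERE `timestamp` >= current_timestamp() - INTERVAL 1 MINUTE, is usually closer to what you want.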
