Hi,
I am trying to use Spark (PySpark), and my goal is to query a subset of data from a huge Elasticsearch index that matches a particular condition on a column.
For example, if my columns are name, age, and timestamp,
I want to get only those records where the timestamp matches the current timestamp.
Could someone please advise on how to achieve this?
Hi @Khushboo_Kaul. The best way to get started is probably with spark-sql. One way is to create a temporary table from your index, and then run ordinary SQL queries on it:
from pyspark.sql import SQLContext

# `sc` is the SparkContext provided by the pyspark shell
sqlContext = SQLContext(sc)
# Register the index as a temporary table via the elasticsearch-hadoop connector
# (the connector jar must be on Spark's classpath)
sqlContext.sql("CREATE TEMPORARY TABLE myTable USING org.elasticsearch.spark.sql OPTIONS (resource 'my_index')")
sqlContext.sql("select * from myTable").show()
Once you have that working, you can add a WHERE clause on the timestamp field just as you would in any ordinary SQL query, as shown below.
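A minimal sketch, assuming the name, age, and timestamp columns from your example and Spark SQL's built-in current_timestamp() function:

# Hypothetical filter on the temporary table registered above;
# swap in whatever predicate fits your data
sqlContext.sql("SELECT name, age, timestamp FROM myTable WHERE timestamp = current_timestamp()").show()

Note that an exact equality match against the current timestamp will rarely return anything in practice, so you will usually want to compare against a range (e.g. everything newer than some cutoff) instead.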