I am using PySpark to query Elasticsearch.
I have the following code:
d1 = data_frame1.collect()  # returns 1 row
d2 = data_frame2.collect()  # returns no rows
But when I call
d3 = data_frame1.union(data_frame2).collect()
the code gets stuck.
The collection I am querying has 20 rows.
The schema of data_frame1 and data_frame2 is the same.
When I call collect() on each one separately, it returns the correct result,
but when I union them and call collect(), it gets stuck.
I am using elasticsearch-spark-30_2.12-8.9.0
with Elasticsearch 8.9.2.
It seems odd that it would get stuck on such a small amount of data. Have you checked the logs (the logs for the driver as well as the executors running tasks)?
Thanks for the reply.
I checked the Spark logs and I don't see any errors.
It just gets stuck on a single task.
Is there anything I should be looking for?
Where do I look for errors?
Did you check the task logs on each Spark executor? That is where I would expect to see something, though it is hard to know. You could also check the Elasticsearch logs, but I would not expect an Elasticsearch error to cause Spark to hang.
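If it helps narrow things down, here is a minimal sketch of one way to keep the driver from blocking forever while you look at the logs. It is not a fix, and nothing in it is specific to the Elasticsearch connector. The helper names collect_with_timeout and diagnose_union are hypothetical (made up for this example); the only Spark calls used are DataFrame.union, DataFrame.explain and DataFrame.collect. It assumes that calling collect() from a second Python thread is acceptable in your setup.

```python
import threading


def collect_with_timeout(df, timeout_s=60.0):
    """Run df.collect() in a daemon thread (hypothetical helper).

    Raises TimeoutError if collect() has not finished after timeout_s
    seconds. The Spark job itself is NOT cancelled; it keeps running in
    the background so it can still be inspected in the Spark UI.
    """
    outcome = {}

    def run():
        try:
            outcome["rows"] = df.collect()
        except Exception as exc:  # hand any failure back to the caller
            outcome["error"] = exc

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    worker.join(timeout_s)

    if worker.is_alive():
        raise TimeoutError(f"collect() still running after {timeout_s}s")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["rows"]


def diagnose_union(df1, df2, timeout_s=120.0):
    """Print the plan of the union, then collect it with a time limit."""
    merged = df1.union(df2)
    merged.explain(True)  # prints the logical and physical plans
    return collect_with_timeout(merged, timeout_s)
```

With the frames from the question, that would be called as d3 = diagnose_union(data_frame1, data_frame2). If it raises TimeoutError, the stuck task should still be visible on the Stages page of the Spark UI, and the stderr/stdout links for that task's executor are the task logs mentioned above.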