(Spark) Write data with new timestamp and then delete all with old timestamp


(Denis Shvedov) #1

Hello, everybody!

I have a test RDD of case class Product(tokens: String, count: Long, timestamp: Long, id: Int) in Spark (Scala), where timestamp is set with System.currentTimeMillis().
I wrote it to Elasticsearch, but later I'll need to update this data by writing a new RDD.
My problem is that I don't know how to delete the old data by the timestamp field (using the Spark Elasticsearch connector). This is part of an online application, so I can't just delete the old data first and then write the new data.
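To make the goal concrete, here is a minimal in-memory sketch in plain Scala (no Spark or Elasticsearch involved). It reuses the Product fields from the question; `StaleRecords` and `staleIds` are hypothetical names that pick out the documents left over from earlier writes, i.e. those whose timestamp is older than the newest batch. These are the documents that would have to be removed after the new RDD is written.

```scala
// Same fields as the case class described in the question.
case class Product(tokens: String, count: Long, timestamp: Long, id: Int)

object StaleRecords {
  // Hypothetical helper: ids of documents written before the newest batch.
  def staleIds(docs: Seq[Product], newestBatchTs: Long): Seq[Int] =
    docs.filter(_.timestamp < newestBatchTs).map(_.id)

  def main(args: Array[String]): Unit = {
    val oldTs = 1000L // stands in for an earlier System.currentTimeMillis()
    val newTs = 2000L // timestamp of the batch that was just written
    val stored = Seq(
      Product("a", 1L, oldTs, 1),
      Product("b", 2L, oldTs, 2),
      Product("a", 3L, newTs, 3)
    )
    println(staleIds(stored, newTs)) // List(1, 2)
  }
}
```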

I couldn't find any information in the documentation.
Could someone help me?


(James Baiera) #2

@c4c8f8b3048c927d1e54 The ES-Hadoop connector does not currently support delete operations. Out of the box, Elasticsearch only supports deleting documents by id, not by other fields. There are plugins that allow delete-by-query, but they are not packaged with the core Elasticsearch distribution due to safety issues (deleting data by query can be inaccurate depending on the query used). Perhaps it makes sense to write your data into date-based indices, or to change your data model to better mesh with Elasticsearch's write and update patterns.
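One way to apply the date-based index idea is sketched below, under assumptions: each batch goes into its own index named after the batch timestamp (the `products-<timestamp>` scheme and the `IndexRotation` helpers are hypothetical), the application always reads through an alias, and the alias is moved with a single call to Elasticsearch's `POST /_aliases` endpoint, whose actions are applied atomically. The old index can then be dropped with `DELETE /<index>`, so no delete-by-query is needed. The Spark write and the HTTP calls are only indicated in comments; the code just builds the index names and the request body.

```scala
object IndexRotation {
  // Hypothetical naming scheme: one index per batch, suffixed with the batch timestamp.
  def indexFor(base: String, batchTs: Long): String = s"$base-$batchTs"

  // Body for POST /_aliases. Both actions are applied atomically, so searches
  // against the alias see either the old batch or the new one, never neither.
  def aliasSwapBody(alias: String, oldIndex: String, newIndex: String): String =
    s"""{"actions":[""" +
      s"""{"remove":{"index":"$oldIndex","alias":"$alias"}},""" +
      s"""{"add":{"index":"$newIndex","alias":"$alias"}}""" +
      "]}"

  def main(args: Array[String]): Unit = {
    val oldIndex = indexFor("products", 1000L)
    val newIndex = indexFor("products", 2000L)
    // 1. Write the new RDD into newIndex, e.g. rdd.saveToEs(s"$newIndex/product")
    //    with elasticsearch-spark (not executed here).
    // 2. Send this body to POST /_aliases:
    println(aliasSwapBody("products", oldIndex, newIndex))
    // 3. Once the alias points at newIndex, send DELETE /<oldIndex>.
  }
}
```

Because deleting a whole index is cheap and does not depend on a query, this replaces "delete all documents with the old timestamp" with "drop the old index" while the application keeps reading from the alias.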
