I have a test RDD of the case class Product(tokens: String, count: Long, timestamp: Long, id: Int) in Spark (Scala).
The timestamp field is set from System.currentTimeMillis().
I wrote it to Elasticsearch, but later I'll need to update this data and write a new RDD.
My problem is that I don't know how to delete the old data by the timestamp field (using Spark with Elasticsearch). This is part of an online application, so I can't just delete all the old data and then write the new data.
I couldn't find any information about this in the documentation.
Could someone help me?
@c4c8f8b3048c927d1e54 The ES-Hadoop connector does not currently support delete operations. Out of the box, Elasticsearch doesn't support deleting documents by any field other than the id. There are plugins that allow delete-by-query, but they are not packaged with the core Elasticsearch distribution due to safety issues (deleting data by query can be inaccurate depending on the query used). Perhaps it makes sense to write your data into date-based indices, or to change your data model to better mesh with Elasticsearch's write and update patterns.
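To make the date-based index suggestion a bit more concrete, here is a minimal sketch of how records could be routed to one index per day based on the timestamp field. It uses only plain Scala and java.time. The object name DateBasedIndices, the "products" prefix and the yyyy.MM.dd naming scheme are assumptions made for illustration; they come from neither the question nor the connector.

```scala
import java.time.{Instant, ZoneOffset}
import java.time.format.DateTimeFormatter

// Mirrors the case class from the question.
case class Product(tokens: String, count: Long, timestamp: Long, id: Int)

// Hypothetical helper: the name and the naming scheme are illustrative only.
object DateBasedIndices {
  // Format epoch milliseconds as a UTC calendar day, e.g. 1970.01.01.
  private val dayFormat =
    DateTimeFormatter.ofPattern("yyyy.MM.dd").withZone(ZoneOffset.UTC)

  // Build an index name of the form "<prefix>-yyyy.MM.dd" for a timestamp.
  def indexFor(prefix: String, epochMillis: Long): String =
    s"$prefix-${dayFormat.format(Instant.ofEpochMilli(epochMillis))}"

  // Group records by the index they would be written to.
  def groupByIndex(prefix: String, products: Seq[Product]): Map[String, Seq[Product]] =
    products.groupBy(p => indexFor(prefix, p.timestamp))
}
```

With a layout like this, each group would be written to its own index through the connector, and outdated data could then be removed by dropping a whole index rather than deleting individual documents by timestamp. New data can be written to a fresh index before the old one is dropped, which may suit an online application better than delete-then-write.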