How to sync delete operations from a relational DB to ES


(Gary Wu) #1

Hello all.
I am researching how to sync a relational database to Elasticsearch using EsRDD. From a previous conversation in this forum I learned that EsRDD doesn't support the delete operation. So my problem is: if a row is deleted in the relational DB, how can I propagate that with EsRDD? I have come up with some possible solutions:

  1. Use the REST API of Elasticsearch for deletes; the other operations (add, update) are done by EsRDD
    OR
  2. Add an additional key-value pair in every index, for example, "really_exist_flag": "true/false"
    When the user queries data, he should first filter out the documents whose "really_exist_flag" is "false".
    OR
  3. Update all key-value pairs of the deleted row's document to empty values, so these values will not be found

I don't know how efficient Elasticsearch is at deleting from the index. Let us assume that the relational DB has many delete operations. Could you give me some advice? Or is there any other way to sync a relational DB to Elasticsearch? :slightly_smiling:
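
As a rough illustration of option 1 above: when many rows are deleted, the deletes can be batched into a single call to Elasticsearch's `_bulk` REST endpoint instead of one request per row. The sketch below only builds the newline-delimited request body. The class name `BulkDeleteBody`, the index/type names, and the ids are made up for the example, and it assumes the ids of the deleted rows have already been collected from the relational DB.

```java
import java.util.Arrays;
import java.util.List;

// Sketch only: builds the body for a POST to /_bulk with one delete action per id.
// BulkDeleteBody is a hypothetical helper, not part of es-hadoop or Elasticsearch.
final class BulkDeleteBody {

    // Minimal JSON string quoting: handles backslash and double quote only.
    // Ids containing control characters would need a real JSON library.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // One action line per id; the bulk format requires every line,
    // including the last one, to end with a newline.
    static String build(String index, String type, List<String> ids) {
        StringBuilder body = new StringBuilder();
        for (String id : ids) {
            body.append("{\"delete\":{\"_index\":").append(quote(index))
                .append(",\"_type\":").append(quote(type))
                .append(",\"_id\":").append(quote(id))
                .append("}}\n");
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // Example ids of rows that were removed from the relational DB.
        System.out.print(build("website", "blog", Arrays.asList("123", "124")));
    }
}
```

The resulting string would be sent as the body of a `POST /_bulk` request; the adds and updates could still go through EsRDD as described in option 1.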

thanks


(Costin Leau) #2

esRDD is a bridge to ES; it doesn't hold any data per se. You can simply recreate the RDD and you'll get the latest state. Furthermore, RDDs tend to be short-lived, as in: get the data, process it, throw it away.
In your post you refer to it as a long-lived structure that needs to be kept in sync, which is wrong.


(Gary Wu) #3

Hi @Costin,
Thank you for your reply.
I apologize for the unclear explanation in my last question. I am new to ES and esRDD :joy:. Actually, I plan to use the esRDD as a short-lived object: it is created several times over a period of time, and each time it creates or updates some data from the RDBMS in ES, then goes away.
I see that the ES REST API provides a delete interface that addresses a document by type and id.
As we know, (_index, _type, _id) identifies a document.


DELETE /website/blog/123
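
For reference, a minimal sketch of issuing that same request from plain Java (11 or later) with the JDK's `java.net.http` classes. The host `http://localhost:9200` and the class name `DeleteDocumentRequest` are assumptions for the example, and the index, type, and id are assumed to be URL-safe strings.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Sketch only: builds the HTTP request equivalent to "DELETE /website/blog/123".
// DeleteDocumentRequest is a hypothetical helper name.
final class DeleteDocumentRequest {

    // baseUrl is the address of an Elasticsearch node (assumed, not from the thread).
    static HttpRequest build(String baseUrl, String index, String type, String id) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/" + index + "/" + type + "/" + id))
                .DELETE()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build("http://localhost:9200", "website", "blog", "123");
        System.out.println(request.method() + " " + request.uri());
        // Sending it needs a reachable cluster, for example:
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```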


Does esRDD provide the same delete operation by _type and _id now?
I searched the code, but ES_OPERATION_DELETE is not used when building the bulk message.

String ES_WRITE_OPERATION = "es.write.operation";
String ES_OPERATION_INDEX = "index";
String ES_OPERATION_CREATE = "create";
String ES_OPERATION_UPDATE = "update";
String ES_OPERATION_UPSERT = "upsert";
String ES_OPERATION_DELETE = "delete";
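
To show where these constants come into play, here is a small sketch of a settings map that selects one of the write operations for the create/update part of the sync. It deliberately uses "upsert" rather than "delete". Besides `es.write.operation`, it relies on the es-hadoop settings `es.resource` and `es.mapping.id`, which are not mentioned in this thread; the class name `EsWriteSettings` and the field name `id` are made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: assembles connector settings as plain strings.
// EsWriteSettings is a hypothetical helper, not part of es-hadoop.
final class EsWriteSettings {

    // resource is "index/type"; idField names the document field that carries
    // the primary key of the relational row (assumed to exist in each document).
    static Map<String, String> upsertById(String resource, String idField) {
        Map<String, String> cfg = new LinkedHashMap<>();
        cfg.put("es.resource", resource);
        cfg.put("es.write.operation", "upsert"); // value of ES_OPERATION_UPSERT
        cfg.put("es.mapping.id", idField);       // use this field as the document _id
        return cfg;
    }

    public static void main(String[] args) {
        upsertById("website/blog", "id")
                .forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

A map like this would typically be handed to the connector's save call on each short-lived run, so that re-sent rows overwrite their earlier documents instead of creating duplicates.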

Thanks


(Gary Wu) #4

Hi @Costin,
I found this link



Nevertheless the point stands that this operation should be supported. However it will not be a part of 2.0.x but rather 2.1
If you are using Spark, I highly recommend using the native Java/Scala API in 2.1.


Does the latest version of the es-hadoop library support document deletes now?

If it doesn't support delete, could you give me some information about the native Java/Scala API for it?
I'm a bit stuck now. :sweat:


(Costin Leau) #5

What do you mean by "supply information about"...?


(Gary Wu) #6

Oh, I found that the Java API (client) can do it. :joy: Thanks @costin

