Enforce Primary Key in ElasticSearch


(Yoav) #1

Hi,

Please consider the following scenario:

I have an RDBMS database that loads 100K rows from a CSV file into a database table every 5 minutes.
Once the file data has been inserted into the database, end users can only run SELECT statements (no INSERT/UPDATE/DELETE is allowed).

Currently I have 5 TB of data.

I would like to move from that RDBMS to Elasticsearch, but I have one problem.
I must enforce a primary key constraint, which is currently built on 3 fields.
This means I need to prevent old records/documents from being uploaded into Elasticsearch again.

  1. Please advise whether it is possible to enforce a primary key as described above.
  2. I am wondering if the PK limitation makes Elasticsearch inappropriate for my case, and maybe I need to stick with the RDBMS, or maybe try MongoDB...

Regards
Yoav


(Christoph) #2

Hi,

while there is no direct notion of a primary key in Elasticsearch like in an RDBMS, you can probably use the document id for this. You need to find a way to create a unique "id" out of your three current PK fields. You can then use either op_type=create or the _create endpoint of the index API to prevent overwrites of existing documents, like so:

PUT index/type/id?op_type=create
{
    "foo" : "bar"
}

or

PUT index/type/id/_create
{
    "foo" : "bar"
}

All future attempts to index a document with the same "id" will result in an error. Hope this solves your use case.
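One way to derive a single unique "id" from three primary-key fields is to concatenate them with a separator and hash the result so the id has a fixed length. A minimal sketch (the function name, field values, and SHA-1 choice are my own assumptions, not from this thread):

```python
import hashlib

def composite_id(field1, field2, field3, sep="|"):
    """Build a deterministic document id from three primary-key fields.

    Joining with a separator keeps distinct field combinations distinct;
    hashing keeps the id a fixed length even for long field values.
    """
    raw = sep.join(str(f) for f in (field1, field2, field3))
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()

# The same row always maps to the same id, so a second attempt to index
# it with op_type=create would be rejected with a conflict error.
doc_id = composite_id("2017-06-01", "store_42", "sku_1001")
```

The resulting value would then be used as the id in `PUT index/type/{doc_id}?op_type=create`.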


(Amar - Persistent Systems) #3

Hi, you may try two approaches here:

  • Create a separate field, say "id", and before inserting use delete-by-query to ensure any existing document with that id gets deleted.

  • The second approach is to use PUT and index the record with your actual record id.

That way the _id field in ES will take your DB record id and prevent duplicates.
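Since the original scenario loads 100K rows every 5 minutes, the same create semantics are also available per item through the bulk API. A rough sketch (the index, type, and id values here are placeholders):

POST _bulk
{ "create" : { "_index" : "myindex", "_type" : "type", "_id" : "id1" } }
{ "foo" : "bar" }

With the "create" action, items whose _id already exists fail individually with a conflict error in the bulk response, while the remaining new items are still indexed.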


(system) #4

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.