Enforce Primary Key in Elasticsearch

Hi,

Please consider the following scenario:

I have an RDBMS database that loads 100K rows from a CSV file into a database table every 5 minutes.
Once the file data has been inserted into the database, end users can only run SELECT statements (no INSERT/UPDATE/DELETE allowed).

Currently I have 5 TB of data.

I would like to move from that RDBMS to Elasticsearch, but I have one problem.
I must enforce a primary key constraint, which is currently built on 3 fields.
This means I need to prevent old records/documents from being uploaded into Elasticsearch again.

  1. Please advise whether it is possible to enforce a primary key as described above.
  2. I am wondering whether the primary-key limitation makes Elasticsearch inappropriate for my case, and whether I should stick with the RDBMS or perhaps try MongoDB...

Regards
Yoav

Hi,

while there is no direct notion of a primary key in Elasticsearch like in an RDBMS, you can probably use the document id for this. You need to find a way to create a unique "id" out of your three current PK fields. You can then use either op_type=create or the _create endpoint of the index API to prevent overwrites of existing documents, like so:

PUT index/type/id?op_type=create
{
    "foo" : "bar"
}

or

PUT index/type/id/_create
{
    "foo" : "bar"
}

All future attempts to index a document with the same "id" will result in an error. Hope this solves your use case.
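
For illustration, here is a minimal sketch of that idea with the Python client (elasticsearch-py); the index name, type name, field names, and separator are assumptions, not something defined in this thread:

from elasticsearch import Elasticsearch
from elasticsearch.exceptions import ConflictError

es = Elasticsearch(["http://localhost:9200"])  # assumed local cluster

def composite_id(row):
    # Build one unique document id out of the three PK fields.
    # "field_a"/"field_b"/"field_c" stand in for your real columns;
    # pick a separator that cannot occur inside the field values.
    return "|".join(str(row[f]) for f in ("field_a", "field_b", "field_c"))

row = {"field_a": 1, "field_b": "2017-01-01", "field_c": "x", "foo": "bar"}
try:
    # op_type="create" makes the request fail instead of overwriting.
    es.index(index="myindex", doc_type="type", id=composite_id(row),
             body=row, op_type="create")
except ConflictError:
    # A document with this id already exists, i.e. a duplicate row; skip it.
    pass

For 100K rows every 5 minutes you would presumably push this through the bulk API with the same "create" action instead of one request per row; duplicate-id failures then show up as per-item errors in the bulk response.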

Hi, you may try two approaches here:

  • Create a separate field, say "id", and before each insert run a delete-by-query on that field so any existing copy of the record is removed first (see the sketch below).

  • The second approach is to use PUT and index each record with its actual record id as the document id.

That way the _id field in ES will take your DB record id and prevent duplicates.
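
A rough sketch of the first approach with the Python client, again with assumed index/field names (note that _delete_by_query is built into Elasticsearch 5.x and later; earlier versions needed the delete-by-query plugin):

from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])  # assumed local cluster

doc = {"id": "123", "foo": "bar"}  # "id" is the dedicated key field

# Remove any earlier copy of this record before indexing the new one.
# The term query assumes "id" is mapped as not_analyzed / keyword.
es.delete_by_query(index="myindex", body={
    "query": {"term": {"id": doc["id"]}}
})
es.index(index="myindex", doc_type="type", body=doc)

This costs an extra request per record, so for bulk loads the op_type=create route from the previous reply is usually the cheaper option.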
