Need own unique ID with Bulk insert

elasticitm · June 3, 2020, 4:05pm

Hello @all,

I've switched from an older Elasticsearch version to the current 7.7 and am now having trouble using unique IDs.

In the newer Elasticsearch version, the "_id" is automatically set to a short generated value, and this breaks the whole logic of my implementation.

My product id (uuid) is for example: 709_dis__29618840141927_252041531

If I import products in bulk and the ID already exists, the product should be overwritten, for example:

PUT {{base_url}}/_bulk
{"index":{"_index":"{{domain}}_product"}, "_id": "709_dis__29618840141927_252041531",}
{"date": "2019-01-01", "price": 200, "promoted": true, "rating": 1, "type": "hat"}

I know:

"better performance with shorter id" - it is not an option!
"using hash to shorten" - it is not an option!

Is there a workaround for setting my own unique ID with the length I need? (If ES is slower in this case, that's fine for me.)

Thank you for your help!

willemdh · June 3, 2020, 4:17pm

@elasticitm Hey, maybe the fingerprint filter / processor could be an option for you? Check this article to get an idea => https://www.elastic.co/blog/logstash-lessons-handling-duplicates

Grtz

elasticitm · June 3, 2020, 5:08pm

Hi Willem,

Thanks for your reply! It could probably solve the problem, but it makes the system more complex, so I would like to avoid using Logstash.

I found out that it works fine when I add a single product instead of using bulk:

PUT {{base_url}}/{{domain}}_product/_doc/709_dis__29618840141927_252041531

The question now is whether there is a working syntax for bulk inserts and updates that lets me set my own "_id".

Best Regards

Christian_Dahlqvist · June 3, 2020, 5:14pm

Can you try setting _type to _doc in the bulk request and see if that changes anything?

elasticitm · June 3, 2020, 5:37pm

Hi Christian,

Thanks for this hint.

Setting _doc like this

PUT {{base_url}}/_doc/_bulk

has no effect.

I'm not sure how I can set the type here. Could you help me with the code?

Best Regards

Christian_Dahlqvist · June 3, 2020, 5:41pm

I meant putting it in each bulk header next to the _id.

elasticitm · June 4, 2020, 11:47am

Hi, Christian,

Unfortunately, that doesn't fix the problem.

Here is my request:

PUT {{base_url}}/_bulk
{"index":{"_index":"{{domain}}_product"}, "_id": "709_dis__29618840141927_252041531", "_type":"_doc"}
{"id": "709_dis__29618840141927_252041531", "date": "2019-01-01", "price": 200, "promoted": true, "rating": 1, "type": "hat"}

But the _id was still generated automatically. The result is:

{
    "took": 13,
    "errors": false,
    "items": [
        {
            "index": {
                "_index": "kr_product",
                "_type": "_doc",
                "_id": "Olcmf3IBGdIxFphTkg1Y",
                "_version": 1,
                "result": "created",
                "_shards": {
                    "total": 2,
                    "successful": 1,
                    "failed": 0
                },
                "_seq_no": 5,
                "_primary_term": 1,
                "status": 201
            }
        }
    ]
}

Do you have any other ideas for me? Only the bulk import causes problems, but bulk import is very important for me.

psramkumar · June 7, 2020, 3:27pm

Types are deprecated in 7.x and being removed: https://www.elastic.co/guide/en/elasticsearch/reference/current/removal-of-types.html

elasticitm · June 7, 2020, 10:07pm

Oh yes I see.

Does anyone have an idea how to fix the issue of the bulk import changing the "_id"?

Bernt_Rostad · June 8, 2020, 7:02am

I maintain a system where I generate the document IDs before performing bulk index, update and delete operations using these IDs. The main difference from your example is that I use "id" and "index", without a leading underscore, rather than "_id" and "_index".

Here's how I build up my bulk operation (in the Perl programming language):

   for my $doc (@doc_array) {
        my $payload = { # build a new payload
            id    => $doc->{id},
            type  => $fixed_type, # TODO: remove in ES7
            index => $indexname,
        };

        if ($doc->{action} ne 'delete' ) { # add document as 'source' or 'doc' depending on the action
            if ($doc->{action} eq 'update' ) { # just add the partial 'doc'
                $payload->{doc}           = $doc->{partial};
                $payload->{doc_as_upsert} = 'false';
                $payload->{detect_noop}   = 'true';
            } else { # for new documents add the full document in 'source'
                $payload->{source}        = $doc->{full};
                $payload->{pipeline}      = $pipeline if $pipeline; # only supported for indexing new docs
            }
        }
        $bulk->add_action( $doc->{action} => $payload );
   }

I hope this solves your problem. Good luck!
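For anyone sending the raw bulk body instead of going through a client helper, here is a minimal Python sketch of the same idea. It assumes the plain REST `_bulk` format, where the metadata keys are `_index` and `_id` (with underscores); the names without underscores in the Perl snippet appear to be parameters of the client's bulk helper. The function name `bulk_lines` and the keys `action`, `id`, `full` and `partial` are hypothetical and only mirror the Perl example.

```python
import json

def bulk_lines(ops, index):
    """Build a newline-delimited _bulk body from a list of operations.

    Each op is a dict with the hypothetical keys 'action' and 'id', plus
    'full' (for index) or 'partial' (for update).
    """
    lines = []
    for op in ops:
        action = op["action"]
        if action not in ("index", "update", "delete"):
            raise ValueError(f"unsupported action: {action}")
        # Metadata line: _index and _id go INSIDE the action object.
        lines.append(json.dumps({action: {"_index": index, "_id": op["id"]}}))
        if action == "index":
            # Full document follows on its own line.
            lines.append(json.dumps(op["full"]))
        elif action == "update":
            # Partial document wrapped in 'doc'; delete has no body line.
            lines.append(json.dumps({"doc": op["partial"], "detect_noop": True}))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"

ops = [
    {"action": "index", "id": "709_dis__29618840141927_252041531",
     "full": {"price": 200, "type": "hat"}},
    {"action": "update", "id": "709_dis__29618840141927_252041531",
     "partial": {"price": 180}},
    {"action": "delete", "id": "some_old_id"},
]
print(bulk_lines(ops, "kr_product"), end="")
```

The resulting string would be sent as the request body to `/_bulk` with the `Content-Type: application/x-ndjson` header.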

jprante · June 8, 2020, 6:26pm

Maybe you should check the curly brace positions.

How about

{"index":{"_index":"{{domain}}_product", "_id": "709_dis__29618840141927_252041531"}}
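To make the difference visible, here is a small Python sketch that checks an action line for keys sitting next to the action object instead of inside it. The helper name `misplaced_keys` is made up for illustration, and the index name `kr_product` is taken from the response posted earlier in place of the `{{domain}}` template.

```python
import json

# Action names accepted by the _bulk endpoint.
BULK_ACTIONS = {"index", "create", "update", "delete"}

def misplaced_keys(action_line):
    """Return top-level keys that sit next to the action instead of inside it."""
    meta = json.loads(action_line)
    return sorted(k for k in meta if k not in BULK_ACTIONS)

# The line from the failing request: _id and _type are siblings of "index".
broken = ('{"index":{"_index":"kr_product"}, '
          '"_id": "709_dis__29618840141927_252041531", "_type":"_doc"}')
# The corrected line: _id is nested inside the "index" object.
fixed = ('{"index":{"_index":"kr_product", '
         '"_id": "709_dis__29618840141927_252041531"}}')

print(misplaced_keys(broken))  # ['_id', '_type']
print(misplaced_keys(fixed))   # []
```

In the failing request the `_id` was outside the `index` object, so it was not picked up as action metadata and an ID was generated automatically.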

elasticitm · June 12, 2020, 9:35pm

Hi Jörg,

That was the problem - works perfectly now!

Thanks so much for your help!

Best Regards

system · July 10, 2020, 9:36pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
