Is this the correct implementation for customizing the `_id` metadata field?

Elasticsearch Docker image version:

image: docker.elastic.co/elasticsearch/elasticsearch:7.16.2

To accept a client-side customized UUID on the _id field, I had to change the mapping's dynamic property from strict to true.

Until now, we have been using dynamic: strict and auto-generated UUIDs for the _id field.

When inspecting the newly inserted document, I noticed that a new field named id had been added, whose contents exactly match the _id metadata field.

Is this the expected behaviour? Is this the right way of achieving what I set out to do, which is to generate UUIDs client side and then push them to Elastic via the bulk insert method?

I am using C# and the NEST NuGet package on .NET 7.

Thanks for reading.

I changed my client-side implementation to have a property named "_Id".

In the error logs, I can see it is indeed mapped correctly.

But I get 400 Bad Request errors on bulk insert.

When I try a single insert using a one-document JSON payload, I get a 201 Created response.

Elastic ignores the client-side generated UUID.

When going back to dynamic: "strict", there is no winning with the Elasticsearch API. I get confusing and conflicting error messages:

"Field [_id] is a metadata field and cannot be added inside a document. Use the index API request parameters."
Or
"type" : "strict_dynamic_mapping_exception", "reason" : "mapping set to strict, dynamic introduction of [id] within [_doc] is not allowed",

I do not want to add a new field/property to the document. I just want to customize the metadata field called _id.

Why is this so hard?

I am confused ... this is not an Elasticsearch REST API Endpoint.

So, it is unclear how you are accessing Elasticsearch. Do you have a proxy or app in between?

Hi Stephen, I am using the NEST C# library like so:

single doc:

many docs:

The NEST SDK hands the C# logic above to a low-level client, which in the end talks to the ES API. Does this clarify my usage?

The _id is the "resource" id and goes on the end of the URL

POST my_index/_doc/12345678-999
{
  "foo" : "bar"
}

GET my_index/_search

# POST my_index/_doc/12345678-999 201 Created
{
  "_index": "my_index",
  "_id": "12345678-999",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 1
}
# GET my_index/_search 200 OK
{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "my_index",
        "_id": "12345678-999",
        "_score": 1,
        "_source": {
          "foo": "bar"
        }
      }
    ]
  }
}
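
For reference, a minimal Python sketch of how the request path in the example above could be assembled from a client-generated id. The helper name `doc_path` is made up for illustration; the only library call is `urllib.parse.quote` from the standard library, used here so that an id containing characters such as `/` stays inside a single path segment.

```python
from urllib.parse import quote


def doc_path(index: str, doc_id: str) -> str:
    """Return the relative path for indexing one document under a chosen _id.

    doc_path is a hypothetical helper, not part of any Elasticsearch client.
    """
    # safe="" percent-encodes "/" too, so the id cannot split the URL path.
    return f"{index}/_doc/{quote(doc_id, safe='')}"


if __name__ == "__main__":
    print(doc_path("my_index", "12345678-999"))
```

The document body sent to that path would then contain only the mapped fields (for example `{"foo": "bar"}`), with no `_id` key inside it.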

Shouldn't the NEST C# library do this for me? What purpose does this extension method serve, then?

.Id(document._id)

Or do I have to manually construct the HTTP API call myself?

Please do not paste screenshots of text... they are really hard to work with.

I am not the C# expert... I will have to take a look, but I think it is important that you understand the underlying calls. You do not set _id as a normal property; that is what the error message you are getting is telling you.

With the _bulk endpoint, you can set the _id as part of the payload.
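
To make that concrete, here is a rough Python sketch of a _bulk request body (newline-delimited JSON) in which the `_id` travels in the action line rather than inside the document. The function name `build_bulk_body` and the convention of carrying the id in a client-side `"_id"` key are assumptions for illustration only; this is not what NEST does internally.

```python
import json


def build_bulk_body(index: str, docs: list) -> str:
    """Build an NDJSON body for the _bulk endpoint.

    build_bulk_body is a hypothetical helper. Each input dict may carry a
    client-generated id under "_id" (an assumed convention for this sketch).
    """
    lines = []
    for doc in docs:
        source = dict(doc)                # copy so the caller's dict is untouched
        doc_id = source.pop("_id", None)  # the id must NOT stay in the source
        action = {"_index": index}
        if doc_id is not None:
            action["_id"] = doc_id        # metadata goes on the action line
        lines.append(json.dumps({"index": action}))
        lines.append(json.dumps(source))
    # The bulk body has to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_bulk_body("my_index", [{"_id": "12345678-999", "foo": "bar"}]))
```

Because the id is removed from the source before serialization, the document itself only contains fields that already exist in the mapping.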

It would be cool if you could refer my question to a C# expert at ES. That would be greatly appreciated.

I have tried the _bulk endpoint, and I can see the mapped _id properties when the error messages are logged and handled. But every document gets rejected. Let me try again.

@stephenb - will this work if the mapping is set to dynamic: strict?

I pinged internally...

I do not think dynamic: strict is a factor

OK, thanks for clarifying. My tech lead advises that we don't want to change to dynamic: true.

I tried the .BulkAll() method with a list of documents passed to it. Each document has a pre-generated UUID assigned to its _id property.

The DroppedDocumentCallback() callback function is triggered, but it reveals no error explaining why the document was dropped. Looking closely at the logs, I see:

2023-12-12 16:32:26 2023-12-12 16:32:26.332 +00:00 [My Worker] [my_scan_1] [] [ERR] [1deab1ef21cc] [Development] [1.0.0.0] Dropped doc - on index: cds_scan_1-doc with _id: hWLfXowBADcS-dXjXrm2
2023-12-12 16:32:26 Api Response: Status: 400 Result: 
2023-12-12 16:32:26 Error Response: {"headers":{},"root_cause":null}
2023-12-12 16:32:26 Failed doc: {"_id":"Ba/FyFrAslLo9b82OurV","file_type":"csv","file_name":"20121001 Setembro.csv","ingest":"2023-12-12T16:32:26.0732641+00:00" .... }.

You can see that two different ids have been generated: one by me on the client side (Ba/FyFrAslLo9b82OurV) and another (hWLfXowBADcS-dXjXrm2) by the ES API. This is revealed by inspecting the BulkResponseItemBase.Id property.

The .NET lead is out today; hopefully he can take a look tomorrow.

Sorry, I'm not really following all the "snippets."

Documents can be dropped for several reasons...

For tomorrow I would think you would want to show a sample of the CSV, your template, and your C# code...

I got this working now. I will post an update shortly.
