Set _id during pipeline in bulk ingestion

How can I set the _id field to the value of the id property defined in my mappings?

In my mappings I have an id field defined as follows:

			"id": {
				"type": "text",
				"fields": {
					"type": "keyword",
					"ignore_above": 512
				},
				"store": "true"
			},

and I want _id to be set to its value. I've added the following set processor at the end of the processors array:

    {
      "set": {
        "field": "_id",
        "copy_from": "id"
      }
    }
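To make the intended behavior concrete, here is a toy Java model (not Elasticsearch code) of what a set processor configured with copy_from does. It reads the named field from the incoming document's source and writes that value into the _id metadata field. By default it fails if the source field is missing. The class and method names (SetCopyFromModel, apply) are hypothetical, and the real processor's error messages differ from the one used here.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of an ingest "set" processor configured with copy_from.
// Hypothetical illustration only; this is not the Elasticsearch implementation.
class SetCopyFromModel {

    // Copies source[copyFrom] into metadata[targetField].
    // Throws if the source field is absent, mirroring the default behavior
    // of the real processor when copy_from points at a missing field.
    static void apply(Map<String, Object> source,
                      Map<String, Object> metadata,
                      String targetField,
                      String copyFrom) {
        if (!source.containsKey(copyFrom)) {
            throw new IllegalArgumentException(
                "field [" + copyFrom + "] not found in source");
        }
        metadata.put(targetField, source.get(copyFrom));
    }

    public static void main(String[] args) {
        Map<String, Object> source = new HashMap<>();
        source.put("id", "abc");
        Map<String, Object> metadata = new HashMap<>();

        apply(source, metadata, "_id", "id");
        System.out.println(metadata.get("_id")); // prints abc
    }
}
```

The source and metadata are kept as separate maps because _id is document metadata, not a field inside _source.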

Note: I've also tried using value instead of copy_from.

However, I am getting the following error:

    Input field [text_field] does not exist in the source document

I've tried several ways of referencing the value that _id should receive: {{id}}, {{{id}}}, and "id".

My mapping for the _source simply has:

		"_source": {
			"enabled": "true",
			"excludes": [
				"passage_embedding.predicted_value",
				"path_embedding.predicted_value"
			]
		},

as the id field itself is defined under properties (see above).

Is the id field located at the root level of the document? Can you show us a sample document?

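        // Build the document as a flat JSON object: every field,
        // including "id", is written at the root level.
        ObjectNode content = s_objectMapper
            .createObjectNode();
        content.put("passage", text);
        content.put("path", path);
        content.put("url", url);
        content.put("title", title);
        content.put("linenum", linenum);
        content.put("filestem", filestem);
        content.put("id", id);
        content.put("language", langFamily);
        // Queue an index operation on the bulk ingester. No explicit _id is
        // passed, so _id is left for the ingest pipeline to set.
        ingester.add(op -> op.index(
            idx -> idx.index(indexName).document(content)));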
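In this document, id is put directly on the root object, so a copy_from of "id" should find it. The point behind the root-level question is that ingest processors address fields by path: "id" names a top-level key, whereas a field nested inside an object would need a dotted path such as "meta.id". The resolver below is a hypothetical sketch of that lookup rule (FieldPathModel and the "meta" object are invented for illustration). It is not Elasticsearch's implementation and ignores details such as field names that literally contain dots.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical dotted-path resolver illustrating how a field reference
// like "id" or "meta.id" walks a nested JSON-like document.
class FieldPathModel {

    // Returns the value at the dotted path, or null if a segment is missing
    // or an intermediate value is not an object (a Map here).
    static Object resolve(Map<String, Object> doc, String path) {
        Object current = doc;
        for (String segment : path.split("\\.")) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(segment);
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, Object> meta = new HashMap<>();
        meta.put("id", "nested-abc");
        Map<String, Object> doc = new HashMap<>();
        doc.put("id", "abc");
        doc.put("meta", meta);

        System.out.println(resolve(doc, "id"));      // abc
        System.out.println(resolve(doc, "meta.id")); // nested-abc
        System.out.println(resolve(doc, "title"));   // null
    }
}
```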

What you're doing should work (and works for me).

Can you post a complete, reproducible example so that we can diagnose where the issue is?

This is what I tested (on 8.11.3):

    PUT /index-1
    {
      "mappings": { "properties": { "id": { "type": "keyword" }} }
    }
    PUT /_ingest/pipeline/test
    {
      "processors": [
          { "set": { "field": "_id", "copy_from": "id" } }
      ]
    }
    PUT /index-1/_doc/1?pipeline=test
    {
      "id" : "abc"
    }
    GET /index-1/_search

    ===
    {
      "took": 181,
      "timed_out": false,
      "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": {
          "value": 1,
          "relation": "eq"
        },
        "max_score": 1.0,
        "hits": [
          {
            "_index": "index-1",
            "_id": "abc",
            "_score": 1.0,
            "_source": {
              "id": "abc"
            }
          }
        ]
      }
    }

Indeed, changing the type to keyword worked. What I had before was:

    "id": {
        "type": "text",
        "fields": {
            "type": "keyword",
            "ignore_above": 512
        },
        "store": "true"
    },

and now it is just:

    "id": {
        "type": "keyword"
    },