Set custom document ids on bulk insert

Sorry, let me try to elaborate further:

I created a custom track from data in an existing cluster:
esrally create-track --track=test-track --target-hosts=127.0.0.1:9200 --indices="articles" --output-path=~/tracks

The track generator produced a JSON file containing the documents from my cluster:

// documents.json file
{"contentType":"page", "contentId": 1, "title":"hello world"}
{"contentType":"post", "contentId": 2, "title":"hello world"}
{"contentType":"page", "contentId": 3, "title":"hello world"}
{"contentType":"page", "contentId": 4, "title":"hello world"}
{"contentType":"post", "contentId": 5, "title":"hello world"}
{"contentType":"post", "contentId": 6, "title":"hello world"}
  • Note that the generator only outputs the document source and doesn't preserve any self-defined IDs the documents might have.

In my track.json, I reference my source file within the corpora and the operations I want to run:

  "corpora": [
    {
      "name": "test-documents",
      "documents": [
        {
          "source-file": "documents.json",
          "document-count": 6,
          "uncompressed-bytes": 123
        }
      ]
    }
  ],
  "schedule": [
    {
      "operation": {
        "operation-type": "delete-index"
      }
    },
    {
      "operation": {
        "operation-type": "create-index"
      }
    },
    {
      "operation": {
        "operation-type": "cluster-health",
        "request-params": {
          "wait_for_status": "green"
        }
      }
    },
    {
      "operation": {
        "operation-type": "bulk",
        "bulk-size": 5
      }
    }
  ]
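
The "document-count" and "uncompressed-bytes" values in the corpus entry describe the source file, so they need to match it. A minimal Python sketch for computing them, assuming one document per non-empty line and no action-and-metadata lines in the file (the helper name corpus_stats and the throwaway demo file are hypothetical, not part of Rally):

```python
import os
import tempfile


def corpus_stats(path: str) -> dict:
    """Count non-empty lines (one document per line) and the file size in bytes."""
    with open(path, "rb") as f:
        document_count = sum(1 for line in f if line.strip())
    return {
        "document-count": document_count,
        "uncompressed-bytes": os.path.getsize(path),
    }


# Demo on a throwaway file holding two small documents.
with tempfile.NamedTemporaryFile("wb", suffix=".json", delete=False) as tmp:
    tmp.write(b'{"contentId": 1}\n{"contentId": 2}\n')
stats = corpus_stats(tmp.name)
os.remove(tmp.name)
print(stats)
```

Pointing corpus_stats at the real documents.json would give the numbers to paste into the corpus entry above.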

As expected, the bulk operation indexes the documents with auto-generated IDs, but I want to use self-defined IDs. So I modified documents.json to add an action-and-metadata line carrying the _id before each document, following the structure of the bulk API:

// modified documents.json file
{ "index" : { "_index" : "articles", "_id" : "1" } }
{"contentType":"page", "contentId": 1, "title":"hello world"}
{ "index" : { "_index" : "articles", "_id" : "2" } }
{"contentType":"post", "contentId": 2, "title":"hello world"}
{ "index" : { "_index" : "articles", "_id" : "3" } }
{"contentType":"page", "contentId": 3, "title":"hello world"}
{ "index" : { "_index" : "articles", "_id" : "4" } }
{"contentType":"page", "contentId": 4, "title":"hello world"}
{ "index" : { "_index" : "articles", "_id" : "5" } }
{"contentType":"post", "contentId": 5, "title":"hello world"}
{ "index" : { "_index" : "articles", "_id" : "6" } }
{"contentType":"post", "contentId": 6, "title":"hello world"}
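
For larger files, the same transformation can be scripted rather than done by hand. A rough Python sketch, assuming the ID should come from each document's contentId field and the target index is "articles" (the function name to_bulk_lines and its parameters are hypothetical, chosen only for illustration):

```python
import json
from typing import Iterable, Iterator


def to_bulk_lines(lines: Iterable[str], index: str, id_field: str) -> Iterator[str]:
    """Yield an action-and-metadata line followed by the untouched source line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        doc = json.loads(line)
        # Assumption: the value of id_field is used (as a string) for _id.
        action = {"index": {"_index": index, "_id": str(doc[id_field])}}
        yield json.dumps(action)
        yield line


sample = [
    '{"contentType":"page", "contentId": 1, "title":"hello world"}',
    '{"contentType":"post", "contentId": 2, "title":"hello world"}',
]
bulk_lines = list(to_bulk_lines(sample, index="articles", id_field="contentId"))
print("\n".join(bulk_lines))
```

Each input document produces two output lines, so the resulting file has twice as many lines as the original documents.json.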

The documents are inserted successfully when I re-run the track, but the IDs are still auto-generated; the self-defined IDs seem to be ignored.