Sorry, let me try to elaborate further:
I created a custom track from data in an existing cluster:
esrally create-track --track=test-track --target-hosts=127.0.0.1:9200 --indices="articles" --output-path=~/tracks
The track generator produced a JSON file containing the documents from my cluster:
// documents.json file
{"contentType":"page", "contentId": 1, "title":"hello world"}
{"contentType":"post", "contentId": 2, "title":"hello world"}
{"contentType":"page", "contentId": 3, "title":"hello world"}
{"contentType":"page", "contentId": 4, "title":"hello world"}
{"contentType":"post", "contentId": 5, "title":"hello world"}
{"contentType":"post", "contentId": 6, "title":"hello world"}
- Note that the generator writes only the document source and does not preserve any self-defined IDs the documents might have.
In my track.json, I reference the source file in the corpora and define the operations I want to run:
"corpora": [
  {
    "name": "test-documents",
    "documents": [
      {
        "source-file": "documents.json",
        "document-count": 6,
        "uncompressed-bytes": 123
      }
    ]
  }
],
"schedule": [
  {
    "operation": {
      "operation-type": "delete-index"
    }
  },
  {
    "operation": {
      "operation-type": "create-index"
    }
  },
  {
    "operation": {
      "operation-type": "cluster-health",
      "request-params": {
        "wait_for_status": "green"
      }
    }
  },
  {
    "operation": {
      "operation-type": "bulk",
      "bulk-size": 5
    }
  }
]
The bulk operation indexes the documents with auto-generated IDs, as expected, but I want to use self-defined IDs. I therefore modified the documents.json file to include an _id for each document, following the structure of the bulk API:
// modified documents.json file
{ "index" : { "_index" : "articles", "_id" : "1" } }
{"contentType":"page", "contentId": 1, "title":"hello world"}
{ "index" : { "_index" : "articles", "_id" : "2" } }
{"contentType":"post", "contentId": 2, "title":"hello world"}
{ "index" : { "_index" : "articles", "_id" : "3" } }
{"contentType":"page", "contentId": 3, "title":"hello world"}
{ "index" : { "_index" : "articles", "_id" : "4" } }
{"contentType":"page", "contentId": 4, "title":"hello world"}
{ "index" : { "_index" : "articles", "_id" : "5" } }
{"contentType":"post", "contentId": 5, "title":"hello world"}
{ "index" : { "_index" : "articles", "_id" : "6" } }
{"contentType":"post", "contentId": 6, "title":"hello world"}
When I re-run the track, the documents are inserted successfully, but their IDs are still auto-generated. The self-defined IDs seem to be ignored.
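For reference, the modified file above can be produced with a small script instead of by hand. This is only a sketch: it assumes contentId is the value I want as _id and that the target index is articles. The function name to_bulk_lines and its parameters are made up for illustration and are not part of Rally.

```python
import json


def to_bulk_lines(source_lines, index_name="articles", id_field="contentId"):
    """Yield a bulk action line followed by the original source line for each document.

    index_name and id_field are assumptions taken from the example data above.
    """
    for line in source_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        doc = json.loads(line)
        # Bulk API action and meta-data line carrying the self-defined ID
        action = {"index": {"_index": index_name, "_id": str(doc[id_field])}}
        yield json.dumps(action)
        yield line  # document source is passed through unchanged


def convert_file(src_path, dst_path):
    """Write the interleaved action/source lines from src_path to dst_path."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for out_line in to_bulk_lines(src):
            dst.write(out_line + "\n")
```

Calling convert_file("documents.json", "documents-with-ids.json") writes two lines per document, in the same action/source order as the modified file shown above.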