Does Rally 2.7.0 support index-append (bulk) into a data-stream?
I am able to create a data-stream using component templates and a composable index template without issue, but the index-append operation is not writing data to the hidden .ds backing indices.
My index-append operation in the challenge is set up with a basic configuration of:
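Roughly along these lines (a minimal sketch; the bulk-size value is a placeholder rather than my exact setting):

```json
{
  "name": "index-append",
  "operation-type": "bulk",
  "bulk-size": 5000
}
```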
I get the following at the end of the rally race:
[WARNING] No throughput metrics available for [index-append]. Likely cause: The benchmark ended already during warmup.
But the underlying auto-generated hidden .ds backing index is not increasing in size at all and shows no documents written.
The named operation index-append of type bulk is then referenced in the append-no-conflicts-index-only challenge in the track's challenges/default.json file.
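That reference looks roughly like this (a simplified sketch; the warmup period and client count are placeholders):

```json
{
  "name": "append-no-conflicts-index-only",
  "default": true,
  "schedule": [
    {
      "operation": "index-append",
      "warmup-time-period": 120,
      "clients": 8
    }
  ]
}
```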
track.json references the locations of both the track operations and the challenges.
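Roughly like this (a simplified sketch of the usual Rally layout with Jinja includes; the description is a placeholder and the data-stream/corpora sections are omitted):

```json
{
  "version": 2,
  "description": "data-stream benchmark",
  "operations": [
    {% include "operations/default.json" %}
  ],
  "challenges": [
    {% include "challenges/default.json" %}
  ]
}
```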
Thank you for the response and the clarification on the difference between the operation name and operation type.
What I am trying to work out is why the operation of type "bulk" that runs during my race is not appending data into the data-stream.
The data-stream is properly created during the race, and the shards are active across all nodes as expected based on my configuration.
When the bulk append operation runs, it does not add records to the underlying hidden .ds indices of the data-stream.
Is a bulk append operation possible with Rally 2.7.0 when using a data-stream? If so, what checks or settings can be applied to ensure that the bulk append is able to write to the hidden data-stream indices?
Thank you for the example config. I have narrowed the issue I am seeing down to the following:
"Cannot run task [index]: Request returned an error. Error type: bulk, Description: HTTP status: 400, message: failed to parse"
The issue is that I cannot figure out what it "failed to parse" in order to correct it.
Do you have any information on where to look for "failed to parse" issues? I set the following:
"loggers": {
"elasticsearch": {
"handlers": ["rally_log_handler"],
"level": "DEBUG",
"propagate": false
},
But it is spamming the rally.log to the extent that it is unreadable.
Do you have any advice on how to determine why it thinks it is unable to parse the data set I am using? I figure it is a mapping issue, but I am unable to determine at what stage it fails to parse. Is it at the beginning, in the middle, or is it a path issue?
Use the --on-error=abort flag when you run Rally (see the example command below).
Set the logging level for elasticsearch to DEBUG, as mentioned above.
Run tail -f on rally.log from another ssh session to watch the log while the race runs.
When Rally fails you will have far fewer log lines to go through, and the relevant ones will likely be very near the end of the tail.
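For example, something along these lines (the track path and target hosts are placeholders):

```sh
# Run the race against an existing cluster and abort on the first error
esrally race --track-path=~/my-datastream-track \
  --target-hosts=127.0.0.1:9200 \
  --pipeline=benchmark-only \
  --on-error=abort

# In a second ssh session, follow the log while the race runs
tail -f ~/.rally/logs/rally.log
```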
This was the key for me in determining why the "failed to parse" was happening. In my case it was because the @timestamp field of the data-stream was not able to parse the incoming data. I had to set up a copy of one of the nyc_taxis fields into the @timestamp configuration of the component template to get the ingest to work.
My example component configuration (this is what I added):
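Roughly, the idea is to copy a source timestamp field from the corpus (for nyc_taxis, e.g. pickup_datetime) into @timestamp. A sketch of one way to do that, using a default ingest pipeline referenced from the component template (the pipeline name, field choice, and date format here are illustrative, not necessarily the exact config):

```json
PUT _ingest/pipeline/copy-pickup-datetime
{
  "processors": [
    {
      "date": {
        "field": "pickup_datetime",
        "target_field": "@timestamp",
        "formats": ["yyyy-MM-dd HH:mm:ss"]
      }
    }
  ]
}

PUT _component_template/nyc-taxis-settings
{
  "template": {
    "settings": {
      "index.default_pipeline": "copy-pickup-datetime"
    },
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date" },
        "pickup_datetime": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss" }
      }
    }
  }
}
```

With the default pipeline in place, each bulk document gets an @timestamp derived from pickup_datetime, which is what the data-stream needs in order to accept the document.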