Index-append into data-stream

Hello,

Does Rally 2.7.0 support index-append (bulk) into a data-stream?

I am able to create the data-stream using component templates and a composable index template without issue, but index-append is not writing data to the hidden .ds backing indices.
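
For context, the composable index template is along these lines (a simplified sketch of the body of a PUT _index_template request; the names are placeholders rather than my actual configuration):

    {
      "index_patterns": ["my-data-stream*"],
      "data_stream": {},
      "composed_of": ["my-component-template"],
      "priority": 500
    }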

The index-append task in my challenge is a basic setup:

    {
      "operation": "index-append",
      "warmup-time-period": 500,
      "clients": {{bulk_indexing_clients | default(8)}},
      "ignore-response-error-level": "{{error_level | default('non-fatal')}}"
    },

I get the following at the end of the rally race:
[WARNING] No throughput metrics available for [index-append]. Likely cause: The benchmark ended already during warmup.

But the underlying auto-generated hidden .ds backing index is not increasing in size at all and shows no documents written.

Thank you.

Hi @Safty,

index-append is just an arbitrary name assigned to a bulk-type operation in some tracks. Take a look at the HTTP Logs track, for example:

  1. index-append is defined as a bulk operation in the track operations default.json file. I.e.,

        {
          "name": "index-append",
          "operation-type": "bulk",
          "bulk-size": {{bulk_size | default(5000)}},
          "ingest-percentage": {{ingest_percentage | default(100)}},
          "corpora": "http_logs"
        }
    
  2. The named operation index-append of type bulk is then referenced in the append-no-conflicts-index-only challenge in the track challenges default.json file.

  3. track.json references the locations of both the track operations and challenges files, roughly as sketched below.
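
For illustration, track.json pulls those files in with Rally's Jinja template helpers, roughly like this (a trimmed sketch from memory, not copied verbatim from the track):

    {% import "rally.helpers" as rally with context %}
    {
      "version": 2,
      "description": "HTTP server log data",
      "operations": [
        {{ rally.collect(parts="operations/*.json") }}
      ],
      "challenges": [
        {{ rally.collect(parts="challenges/*.json") }}
      ]
    }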

I hope this helps.

Thank you,
Jason

Hello,

Thank you for the response and the clarification on the difference between the operation name and operation type.

What I am trying to work out is why the "bulk" operation type that runs during my race is not appending data into the data-stream.

The data-stream is created correctly during the race, and the expected number of shards is active across all nodes based on my configuration.

When the bulk append operation runs, it does not add records to the underlying hidden .ds indices of the data-stream.

Is a bulk append operation possible with Rally 2.7.0 when using a data-stream? If so, what checks or settings can be applied to ensure that the bulk append is able to write to the hidden data-stream indices?

Thank you.

Hi,

Rally 2.7.0 supports bulk append operations to data streams. There is an example of how to use data streams in a track I am developing; see rally-tracks/track.json at github-archive · inqueue/rally-tracks · GitHub.
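
The relevant part is that the corpus documents target the data stream rather than an index, via target-data-stream in the corpora definition. Something along these lines (a simplified sketch; the names are placeholders, not taken from that track):

    "corpora": [
      {
        "name": "my-corpus",
        "documents": [
          {
            "source-file": "documents.json.bz2",
            "document-count": 1000000,
            "target-data-stream": "my-data-stream"
          }
        ]
      }
    ]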

Check rally.log for errors. If needed, you can turn up the logging level for the Elasticsearch client as described in the docs.
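
For example, in ~/.rally/logging.json you can raise the elasticsearch logger to DEBUG, something like the snippet below (a sketch; keep whatever handlers are already defined in your file):

    "loggers": {
      "elasticsearch": {
        "handlers": ["rally_log_handler"],
        "level": "DEBUG",
        "propagate": false
      }
    }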

Thank you,
Jason

Hello,

Thank you for the example config. I have narrowed the issue I am seeing down to the following:
" Cannot run task [index]: Request returned an error. Error type: bulk, Description: HTTP status: 400, message: failed to parse"
The problem is that I cannot figure out what it "failed to parse", so I cannot correct it.

Do you have any information on where to look for "failed to parse" issues? I set the following:
"loggers": {
"elasticsearch": {
"handlers": ["rally_log_handler"],
"level": "DEBUG",
"propagate": false
},
But it is spamming the rally.log to the extent that it is unreadable.

Do you have any advice on how to determine why it thinks it is unable to parse the data set I am using? I figure it is a mapping issue, but I am unable to determine at what stage it fails to parse: at the beginning, in the middle, or is it a path issue?

Thank you.

For those following along at home:

  1. use the --on-error-abort=true flag when you run rally
  2. set the logging for elasticsearch to DEBUG, as mentioned above.
  3. tail -f rally.log from another ssh session to watch the log as you run the rally race.
  4. when rally fails you will have far fewer log lines to go through, and the relevant error will likely be very near the end of the tail.

This was the key for me in determining why the "failed to parse" was happening. In my case it was because the @timestamp field of the data-stream could not parse the incoming data. I had to add a copy_to from one of the nyc_taxis fields to @timestamp in the component template to get the ingest to work.

My example, from the component template configuration:

I added this:

    "properties": {
      "@timestamp": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss"
      },

and added this:

    "pickup_datetime": {
      "type": "date",
      "format": "yyyy-MM-dd HH:mm:ss",
      "copy_to": [
        "@timestamp"
      ]
    },
