Index-append into data-stream

Hello,

Does Rally 2.7.0 support index-append (bulk) into a data-stream?

I am able to create the data-stream using component templates and a composable index template without issue, but index-append is not writing data to the hidden .ds backing indices.
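
For context, the composable index template is along these lines (a simplified sketch of the body of a PUT _index_template request; the names are placeholders rather than my actual configuration):

    {
      "index_patterns": ["my-data-stream*"],
      "data_stream": {},
      "composed_of": ["my-component-template"],
      "priority": 500
    }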

The index-append task in my challenge is a basic setup:

    {
      "operation": "index-append",
      "warmup-time-period": 500,
      "clients": {{bulk_indexing_clients | default(8)}},
      "ignore-response-error-level": "{{error_level | default('non-fatal')}}"
    },

I get the following at the end of the rally race:
[WARNING] No throughput metrics available for [index-append]. Likely cause: The benchmark ended already during warmup.

But the underlying auto-generated hidden .ds backing index is not increasing in size at all and shows no documents written.

Thank you.

Hi @Safty,

index-append is just an arbitrary name assigned to a bulk-type operation in some tracks. Take a look at the HTTP Logs track, for example:

  1. index-append is defined as a bulk operation in the track operations default.json file. I.e.,

        {
          "name": "index-append",
          "operation-type": "bulk",
          "bulk-size": {{bulk_size | default(5000)}},
          "ingest-percentage": {{ingest_percentage | default(100)}},
          "corpora": "http_logs"
        }
    
  2. The named operation index-append of type bulk is then referenced in the append-no-conflicts-index-only challenge in the track challenges default.json file.

  3. track.json references the locations of both the track operations and challenges files, roughly as sketched below.
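
For illustration, track.json pulls those files in with Rally's Jinja template helpers, roughly like this (a trimmed sketch from memory, not copied verbatim from the track):

    {% import "rally.helpers" as rally with context %}
    {
      "version": 2,
      "description": "HTTP server log data",
      "operations": [
        {{ rally.collect(parts="operations/*.json") }}
      ],
      "challenges": [
        {{ rally.collect(parts="challenges/*.json") }}
      ]
    }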

I hope this helps.

Thank you,
Jason

Hello,

Thank you for the response and the clarification on the difference between the operation name and operation type.

What I am trying to work out is why the "bulk" operation type that runs during my race is not appending data into the data-stream.

The data-stream is created correctly during the race, and the expected number of shards is active across all nodes based on my configuration.

When the bulk append operation runs, it does not add records to the underlying hidden .ds indices of the data-stream.

Is a bulk append operation possible with Rally 2.7.0 when using a data-stream? If so, what checks or settings can be applied to ensure that the bulk append is able to write to the hidden data-stream indices?

Thank you.

Hi,

Rally 2.7.0 supports bulk append operations to data streams. There is an example of how to use data streams in a track I am developing; see rally-tracks/track.json at github-archive · inqueue/rally-tracks · GitHub.
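
The relevant part is that the corpus documents target the data stream rather than an index, via target-data-stream in the corpora definition. Something along these lines (a simplified sketch; the names are placeholders, not taken from that track):

    "corpora": [
      {
        "name": "my-corpus",
        "documents": [
          {
            "source-file": "documents.json.bz2",
            "document-count": 1000000,
            "target-data-stream": "my-data-stream"
          }
        ]
      }
    ]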

Check rally.log for errors. If needed, you can turn up the logging level for the Elasticsearch client as described in the docs.
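
For example, in ~/.rally/logging.json you can raise the elasticsearch logger to DEBUG, something like the snippet below (a sketch; keep whatever handlers are already defined in your file):

    "loggers": {
      "elasticsearch": {
        "handlers": ["rally_log_handler"],
        "level": "DEBUG",
        "propagate": false
      }
    }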

Thank you,
Jason

Hello,

Thank you for the example config. I have narrowed the issue I am seeing down to the following:
" Cannot run task [index]: Request returned an error. Error type: bulk, Description: HTTP status: 400, message: failed to parse"
The problem is that I cannot figure out what it "failed to parse", so I cannot correct it.

Do you have any information on where to look for "failed to parse" issues? I set the following:
"loggers": {
"elasticsearch": {
"handlers": ["rally_log_handler"],
"level": "DEBUG",
"propagate": false
},
But it is spamming the rally.log to the extent that it is unreadable.

Do you have any advice on how to determine why it thinks it is unable to parse the data set I am using? I figure it is a mapping issue, but I am unable to determine at what stage it fails to parse: at the beginning, in the middle, or is it a path issue?

Thank you.

For those following along at home:

  1. use the --on-error-abort=true flag when you run rally
  2. set the logging for elasticsearch to DEBUG, as mentioned above.
  3. tail -f rally.log from another ssh session to watch the log as you run the rally race.
  4. when rally fails you will have far fewer log lines to go through, and the relevant error will likely be very near the end of the tail.

This was the key for me in determining why the "failed to parse" was happening. In my case it was because the @timestamp field of the data-stream could not parse the incoming data. I had to add a copy_to from one of the nyc_taxis fields to @timestamp in the component template to get the ingest to work.

My example, from the component template configuration:

I added this:

    "properties": {
      "@timestamp": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss"
      },

and added this:

    "pickup_datetime": {
      "type": "date",
      "format": "yyyy-MM-dd HH:mm:ss",
      "copy_to": [
        "@timestamp"
      ]
    },
