Increase data size in Rally with existing tracks

We are using Rally as a benchmarking tool for our experiments.

In Rally, one of the larger tracks is "nyc_taxis", which gives around 27 GB of indexing data. However, I need a larger dataset, so I followed the steps from the link below.

We are using Elasticsearch 7.3.0, and when we use the custom track from the link above, we get the following error:

org.elasticsearch.index.mapper.MapperParsingException: failed to parse
at org.elasticsearch.index.mapper.DocumentParser.wrapInMapperParsingException( ~[elasticsearch-7.3.0.jar:7.3.0]
at org.elasticsearch.index.mapper.DocumentParser.parseDocument( ~[elasticsearch-7.3.0.jar:7.3.0]
at org.elasticsearch.index.mapper.DocumentMapper.parse( ~[elasticsearch-7.3.0.jar:7.3.0]
at org.elasticsearch.index.shard.IndexShard.prepareIndex( ~[elasticsearch-7.3.0.jar:7.3.0]
at org.elasticsearch.index.shard.IndexShard.applyIndexOperation( ~[elasticsearch-7.3.0.jar:7.3.0]
at org.elasticsearch.index.shard.IndexShard.applyIndexOperationOnPrimary( ~[elasticsearch-7.3.0.jar:7.3.0]
at org.elasticsearch.action.bulk.TransportShardBulkAction.executeBulkItemRequest( [elasticsearch-7.3.0.jar:7.3.0]
at org.elasticsearch.action.bulk.TransportShardBulkAction$2.doRun( [elasticsearch-7.3.0.jar:7.3.0]
at [elasticsearch-7.3.0.jar:7.3.0]
at org.elasticsearch.action.bulk.TransportShardBulkAction.performOnPrimary( [elasticsearch-7.3.0.jar:7.3.0]
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary( [elasticsearch-7.3.0.jar:7.3.0]
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary( [elasticsearch-7.3.0.jar:7.3.0]

Could you please share your input on this?

You could look into using the rally-eventdata-track, which generates data on the fly instead of working with a fixed-size corpus. I have used it to continuously generate many terabytes of data over long periods.
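To illustrate what "generating data on the fly" means, here is a toy Python sketch: an endless generator of synthetic documents that is sliced into bulk-sized batches. This is not how rally-eventdata-track is implemented; the function names and document fields are hypothetical and only show why the amount of data is no longer bounded by a corpus file.

```python
import itertools
import json
import random
from typing import Iterator, List


def synthetic_docs(seed: int = 42) -> Iterator[str]:
    """Yield an endless stream of JSON documents, one serialized document per item."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    for i in itertools.count():
        # Hypothetical fields, loosely modeled on web access logs.
        doc = {
            "seq": i,
            "status": rng.choice([200, 301, 404, 500]),
            "bytes": rng.randint(0, 100_000),
        }
        yield json.dumps(doc)


def next_bulk(stream: Iterator[str], size: int) -> List[str]:
    """Take the next `size` documents from the stream; the stream never runs dry."""
    return list(itertools.islice(stream, size))


if __name__ == "__main__":
    docs = synthetic_docs()
    for _ in range(3):
        bulk = next_bulk(docs, 1000)
        print(len(bulk), json.loads(bulk[0])["seq"])
```

Because documents are produced only when a batch is requested, the run length (and therefore the data volume) is decided by how long you keep pulling batches, not by the size of a file on disk.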

Thanks for your input.

We are already using the eventdata track to index around 5 billion JSON documents.

In addition to the eventdata track, we need the nyc_taxis track so that we can index at least 300 GB of data from nyc_taxis (to have a different dataset).
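One way to approach the 300 GB target is sketched below, under assumptions that should be checked against your setup: the corpus is the uncompressed, one-document-per-line documents.json that Rally downloads for nyc_taxis; the documents carry no explicit IDs, so repeated copies are indexed as new documents; and index size grows roughly linearly with the number of copies. The helper names are hypothetical. The script rounds the number of copies up, writes the copies back to back while making sure every copy ends in a newline, and returns the document count and uncompressed size, which would go into the `document-count` and `uncompressed-bytes` fields of the corpus definition in track.json.

```python
import math
import os
from typing import Tuple


def copies_needed(target_gb: float, per_copy_gb: float) -> int:
    """Round up so the enlarged corpus reaches at least the target size."""
    return math.ceil(target_gb / per_copy_gb)


def repeat_corpus(src_path: str, dst_path: str, copies: int) -> Tuple[int, int]:
    """Write `copies` back-to-back copies of a JSON-lines corpus to dst_path.

    Returns (document_count, uncompressed_bytes) of the new file.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    doc_count = 0
    with open(dst_path, "wb") as dst:
        for _ in range(copies):
            with open(src_path, "rb") as src:
                for line in src:
                    if not line.strip():
                        continue  # drop blank lines instead of sending empty documents
                    if not line.endswith(b"\n"):
                        # Without this, the last document of one copy would be glued
                        # to the first document of the next copy on a single line.
                        line += b"\n"
                    dst.write(line)
                    doc_count += 1
    return doc_count, os.path.getsize(dst_path)


if __name__ == "__main__":
    import sys

    # Usage: python repeat_corpus.py documents.json documents-big.json 12
    count, size = repeat_corpus(sys.argv[1], sys.argv[2], int(sys.argv[3]))
    print(f"document-count: {count}")
    print(f"uncompressed-bytes: {size}")
```

With the numbers from this thread, `copies_needed(300, 27)` is 12, i.e. roughly 324 GB if the linear-growth assumption holds. A copy that does not end in a newline is one plausible way to end up with two documents on one line and a "failed to parse" error, which is why the sketch normalizes line endings.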

Could you please help me resolve the issue with the custom nyc_taxis track?


It appears that there is a problem creating the index because the mapping is incorrect (also, the error message shown here seems incomplete). As a first step, you can run Rally with --on-error=abort, which should give you an idea of what's wrong with your track. If that does not help you find the problem, please share the complete track, including your changes. Thanks.
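Before re-running Rally, it may also be worth checking the corpus file itself. The hypothetical helper below scans a JSON-lines file and reports lines that are empty, are not valid JSON (for example, two documents glued together on one line), or are not a JSON object. It only catches structural problems in the file; a field value that does not fit the index mapping would still need --on-error=abort and the Elasticsearch log to track down.

```python
import json
from typing import List, Tuple


def find_bad_lines(path: str, limit: int = 10) -> List[Tuple[int, str]]:
    """Return up to `limit` (line_number, reason) pairs for suspicious lines."""
    bad: List[Tuple[int, str]] = []
    with open(path, "r", encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            text = line.strip()
            if not text:
                bad.append((lineno, "empty line"))
            else:
                try:
                    doc = json.loads(text)
                except json.JSONDecodeError as exc:
                    # e.g. "Extra data" when two documents share one line
                    bad.append((lineno, f"invalid JSON: {exc.msg}"))
                else:
                    if not isinstance(doc, dict):
                        bad.append((lineno, "not a JSON object"))
            if len(bad) >= limit:
                break  # stop early; a broken file can have millions of bad lines
    return bad


if __name__ == "__main__":
    import sys

    for lineno, reason in find_bad_lines(sys.argv[1]):
        print(f"line {lineno}: {reason}")
```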


