Increase data size in Rally with existing tracks

We are using Rally as the benchmark tool for our experiments.

In Rally, the largest track we use is "nyc_taxis", which produces around 27 GB of indexed data. Since I require a larger dataset, I followed the steps from the link below.
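(For context: guides on growing an existing track's corpus typically boil down to duplicating the source documents some number of times. A minimal, hypothetical sketch of that idea in Python — not the exact script from the linked guide:)

```python
import json


def expand_corpus(src_path, dst_path, factor):
    """Duplicate every document in a newline-delimited JSON corpus
    `factor` times, making the resulting corpus roughly `factor` x larger.
    Hypothetical illustration only -- not part of Rally itself."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            doc = json.loads(line)  # also validates each document while copying
            for _ in range(factor):
                dst.write(json.dumps(doc) + "\n")
```

After expanding a corpus this way, the track's corpora definition (document count and uncompressed byte size in track.json) would also need updating so Rally's progress reporting stays accurate.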

We are using ES 7.3.0, and when we use the custom track from the link above we get the following error:

org.elasticsearch.index.mapper.MapperParsingException: failed to parse
at org.elasticsearch.index.mapper.DocumentParser.wrapInMapperParsingException(DocumentParser.java:191) ~[elasticsearch-7.3.0.jar:7.3.0]
at org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:74) ~[elasticsearch-7.3.0.jar:7.3.0]
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:267) ~[elasticsearch-7.3.0.jar:7.3.0]
at org.elasticsearch.index.shard.IndexShard.prepareIndex(IndexShard.java:772) ~[elasticsearch-7.3.0.jar:7.3.0]
at org.elasticsearch.index.shard.IndexShard.applyIndexOperation(IndexShard.java:749) ~[elasticsearch-7.3.0.jar:7.3.0]
at org.elasticsearch.index.shard.IndexShard.applyIndexOperationOnPrimary(IndexShard.java:721) ~[elasticsearch-7.3.0.jar:7.3.0]
at org.elasticsearch.action.bulk.TransportShardBulkAction.executeBulkItemRequest(TransportShardBulkAction.java:256) [elasticsearch-7.3.0.jar:7.3.0]
at org.elasticsearch.action.bulk.TransportShardBulkAction$2.doRun(TransportShardBulkAction.java:159) [elasticsearch-7.3.0.jar:7.3.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.3.0.jar:7.3.0]
at org.elasticsearch.action.bulk.TransportShardBulkAction.performOnPrimary(TransportShardBulkAction.java:191) [elasticsearch-7.3.0.jar:7.3.0]
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:116) [elasticsearch-7.3.0.jar:7.3.0]
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:77) [elasticsearch-7.3.0.jar:7.3.0]

Can you please provide your input on this?

You could look into using the rally-eventdata-track, which generates data on the fly instead of working with a fixed-size corpus. I have used this to continuously generate many terabytes of data over long periods.
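(The "generate on the fly" idea can be sketched in a few lines of Python. This is only an illustration — the real rally-eventdata-track uses its own, much richer generators — but it shows how an unbounded stream of documents avoids a fixed-size corpus entirely:)

```python
import json
import random
import time


def generate_event():
    """Produce one synthetic log-style document.
    Field names here are illustrative, not the eventdata track's schema."""
    return {
        "@timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "status": random.choice([200, 200, 200, 404, 500]),
        "bytes": random.randint(100, 100_000),
        "clientip": ".".join(str(random.randint(1, 254)) for _ in range(4)),
    }


def event_stream(n):
    """Yield n newline-delimited JSON events without ever
    materializing a corpus file on disk."""
    for _ in range(n):
        yield json.dumps(generate_event())
```

Because documents are produced lazily, the total volume indexed is bounded only by how long the generator runs, not by disk space for a pre-built corpus.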

Thanks for your input.

We are already using the eventdata track to index around 5 billion JSON documents.

Along with the eventdata track, we need the nyc_taxis track so that we can index at least 300 GB of data from it (to have a different dataset).

Can you please help me resolve the issue with the custom nyc_taxis track?

Hi,

it appears that there is a problem indexing documents because the mapping is incorrect (also, the error message shown here seems incomplete?). As a first step, you can run Rally with --on-error=abort, which should give you an idea of what's wrong with your track. If that does not help you find the problem, please share the complete track, including your changes. Thanks.
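(If the track's document corpus was modified by hand, a quick sanity check can also narrow things down. A hypothetical sketch, not part of Rally, that flags corpus lines that are not valid JSON:)

```python
import json


def find_bad_lines(corpus_path, max_reports=10):
    """Scan a newline-delimited JSON corpus and report lines that fail to
    parse -- a common cause of mapper_parsing_exception after a corpus
    has been edited or duplicated by hand."""
    bad = []
    with open(corpus_path) as corpus:
        for lineno, line in enumerate(corpus, start=1):
            try:
                json.loads(line)
            except json.JSONDecodeError as exc:
                bad.append((lineno, str(exc)))
                if len(bad) >= max_reports:
                    break  # stop after enough examples to diagnose
    return bad
```

Note that a line can be valid JSON and still fail to index if a field's value does not match the index mapping (e.g. a string where the mapping expects a number), so this only rules out structural corruption; the --on-error=abort output is what pinpoints mapping mismatches.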

Daniel

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.