Increase data size in Rally with existing tracks

We are using Rally as the benchmark tool for our experiments.

In Rally, the largest track we use is "nyc_taxis", which produces around 27 GB of indexed data. Since I require a larger dataset, I followed the steps from the link below.
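(For context: guides on growing an existing track's corpus typically boil down to duplicating the source documents some number of times. A minimal, hypothetical sketch of that idea in Python — not the exact script from the linked guide:)

```python
import json


def expand_corpus(src_path, dst_path, factor):
    """Duplicate every document in a newline-delimited JSON corpus
    `factor` times, making the resulting corpus roughly `factor` x larger.
    Hypothetical illustration only -- not part of Rally itself."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            doc = json.loads(line)  # also validates each document while copying
            for _ in range(factor):
                dst.write(json.dumps(doc) + "\n")
```

After expanding a corpus this way, the track's corpora definition (document count and uncompressed byte size in track.json) would also need updating so Rally's progress reporting stays accurate.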

We are using ES 7.3.0, and when we use the custom track from the link above we get the following error:

org.elasticsearch.index.mapper.MapperParsingException: failed to parse
at org.elasticsearch.index.mapper.DocumentParser.wrapInMapperParsingException(DocumentParser.java:191) ~[elasticsearch-7.3.0.jar:7.3.0]
at org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:74) ~[elasticsearch-7.3.0.jar:7.3.0]
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:267) ~[elasticsearch-7.3.0.jar:7.3.0]
at org.elasticsearch.index.shard.IndexShard.prepareIndex(IndexShard.java:772) ~[elasticsearch-7.3.0.jar:7.3.0]
at org.elasticsearch.index.shard.IndexShard.applyIndexOperation(IndexShard.java:749) ~[elasticsearch-7.3.0.jar:7.3.0]
at org.elasticsearch.index.shard.IndexShard.applyIndexOperationOnPrimary(IndexShard.java:721) ~[elasticsearch-7.3.0.jar:7.3.0]
at org.elasticsearch.action.bulk.TransportShardBulkAction.executeBulkItemRequest(TransportShardBulkAction.java:256) [elasticsearch-7.3.0.jar:7.3.0]
at org.elasticsearch.action.bulk.TransportShardBulkAction$2.doRun(TransportShardBulkAction.java:159) [elasticsearch-7.3.0.jar:7.3.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.3.0.jar:7.3.0]
at org.elasticsearch.action.bulk.TransportShardBulkAction.performOnPrimary(TransportShardBulkAction.java:191) [elasticsearch-7.3.0.jar:7.3.0]
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:116) [elasticsearch-7.3.0.jar:7.3.0]
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:77) [elasticsearch-7.3.0.jar:7.3.0]

Can you please provide your input on this?

You could look into using the rally-eventdata-track, which generates data on the fly instead of working with a fixed-size corpus. I have used this to continuously generate many terabytes of data over long periods.
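(The "generate on the fly" idea can be sketched in a few lines of Python. This is only an illustration — the real rally-eventdata-track uses its own, much richer generators — but it shows how an unbounded stream of documents avoids a fixed-size corpus entirely:)

```python
import json
import random
import time


def generate_event():
    """Produce one synthetic log-style document.
    Field names here are illustrative, not the eventdata track's schema."""
    return {
        "@timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "status": random.choice([200, 200, 200, 404, 500]),
        "bytes": random.randint(100, 100_000),
        "clientip": ".".join(str(random.randint(1, 254)) for _ in range(4)),
    }


def event_stream(n):
    """Yield n newline-delimited JSON events without ever
    materializing a corpus file on disk."""
    for _ in range(n):
        yield json.dumps(generate_event())
```

Because documents are produced lazily, the total volume indexed is bounded only by how long the generator runs, not by disk space for a pre-built corpus.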

Thanks for your input.

We are already using the eventdata track to index around 5 billion JSON documents.

Along with the eventdata track, we need the nyc_taxis track so that we can index at least 300 GB of data from it (to have a different dataset).

Can you please help me resolve the issue with the custom nyc_taxis track?

Hi,

it appears that there is a problem indexing documents because the mapping is incorrect (also, the error message shown here seems incomplete?). As a first step, you can run Rally with --on-error=abort, which should give you an idea of what's wrong with your track. If that does not help you find the problem, please share the complete track, including your changes. Thanks.
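(If the track's document corpus was modified by hand, a quick sanity check can also narrow things down. A hypothetical sketch, not part of Rally, that flags corpus lines that are not valid JSON:)

```python
import json


def find_bad_lines(corpus_path, max_reports=10):
    """Scan a newline-delimited JSON corpus and report lines that fail to
    parse -- a common cause of mapper_parsing_exception after a corpus
    has been edited or duplicated by hand."""
    bad = []
    with open(corpus_path) as corpus:
        for lineno, line in enumerate(corpus, start=1):
            try:
                json.loads(line)
            except json.JSONDecodeError as exc:
                bad.append((lineno, str(exc)))
                if len(bad) >= max_reports:
                    break  # stop after enough examples to diagnose
    return bad
```

Note that a line can be valid JSON and still fail to index if a field's value does not match the index mapping (e.g. a string where the mapping expects a number), so this only rules out structural corruption; the --on-error=abort output is what pinpoints mapping mismatches.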

Daniel

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.