Increasing track data with a for-range statement, but getting java.lang.ClassCastException

When I increase the data of an existing track by adding for-range statements, I get an exception when running the Rally test.

The ClassCastException is as follows:

warn 2019-03-26T14:32:20.001Z  path: /geonames, params: {index=geonames}
java.lang.ClassCastException: java.lang.String cannot be cast to java.util.Map
	at org.elasticsearch.action.admin.indices.create.CreateIndexRequest.source(CreateIndexRequest.java:394) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.action.admin.indices.create.CreateIndexRequest.source(CreateIndexRequest.java:375) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.rest.action.admin.indices.RestCreateIndexAction.prepareRequest(RestCreateIndexAction.java:53) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.rest.BaseRestHandler.handleRequest(BaseRestHandler.java:80) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.xpack.security.rest.SecurityRestFilter.lambda$handleRequest$0(SecurityRestFilter.java:61) ~[?:?]
	at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:60) ~[elasticsearch-6.3.2.jar:6.3.2]

track.json, modified as follows:

{% import "rally.helpers" as rally with context %}
{% set index_count = 10 %}
{
  "version": 2,
  "description": "POIs from Geonames",
  "data-url": "http://benchmarks.elasticsearch.org.s3.amazonaws.com/corpora/geonames",
  "indices": [
    {
      "name": "geonames",
      "body": "index.json"
    }
  ],
  "corpora": [
    {
      "name": "geonames",
      "base-url": "http://benchmarks.elasticsearch.org.s3.amazonaws.com/corpora/geonames",
      "documents": [
        {% set comma = joiner() %}
        {% for item in range(index_count) %}
        {{comma()}}
        {
          "source-file": "documents-2.json-{{item}}.bz2",
          "document-count": 11396505,
          "compressed-bytes": 264698741,
          "uncompressed-bytes": 3547614383
        }
        {% endfor %}
      ]
    }
  ],
  "operations": [
    {{ rally.collect(parts="operations/*.json") }}
  ],
  "challenges": [
    {{ rally.collect(parts="challenges/*.json") }}
  ]
}

Hello,

The Jinja2 loop you've defined will try to download source files such as documents-2.json-0.bz2, documents-2.json-1.bz2, etc.

Since you are using the same base-url as the upstream geonames, you end up referencing files that don't exist, which you can easily check yourself with:

curl http://benchmarks.elasticsearch.org.s3.amazonaws.com/corpora/geonames/documents-2.json-0.bz2
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><Key>corpora/geonames/documents-2.json-0.bz2</Key><RequestId>037E3BB5A46782DE</RequestId><HostId>7oagYu1Ol6QaaxOINX+0VFpGzQ0o9enTiz/uDRIHsEDwTaNVQH3tqV+MWkuvi8/gvSHT4Bo6MXc=</HostId></Error>

I am surprised you didn't get a 404 while running this, as such files don't exist.
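To see exactly which files the loop references, here is a small Python sketch that expands it the same way the Jinja2 template does. It assumes Rally builds each download URL as base-url + "/" + source-file, which matches the key in the NoSuchKey error above; the function names are hypothetical.

```python
"""Expand the Jinja2 loop from track.json to list the corpus files it references.

Assumption: the download URL is base-url + "/" + source-file, which matches
the key shown in the S3 NoSuchKey error.
"""
from __future__ import annotations

BASE_URL = "http://benchmarks.elasticsearch.org.s3.amazonaws.com/corpora/geonames"
INDEX_COUNT = 10  # same value as {% set index_count = 10 %}


def expected_source_files(index_count: int) -> list[str]:
    # {% for item in range(index_count) %} -> "documents-2.json-{{item}}.bz2"
    return [f"documents-2.json-{item}.bz2" for item in range(index_count)]


def expected_urls(base_url: str, index_count: int) -> list[str]:
    # Strip a trailing slash so the joined URL never contains "//".
    base = base_url.rstrip("/")
    return [f"{base}/{name}" for name in expected_source_files(index_count)]


if __name__ == "__main__":
    for url in expected_urls(BASE_URL, INDEX_COUNT):
        print(url)
```

Each printed URL can then be checked with the same curl command; none of them exist upstream.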

If you want to use a larger corpus by repeating geonames 10 times, you can simply download it locally, concatenate it into a larger file, and then either provide the full path to that file in source-file or upload it to a location you control and change base-url accordingly. Details are in https://esrally.readthedocs.io/en/latest/track.html#corpora.
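A minimal Python sketch of that approach, using only the standard library. The file names (documents-2.json.bz2 as input, documents-large.json.bz2 as output) are placeholders, and it assumes the corpus holds one JSON document per line, so document-count is the number of lines. It streams the data instead of loading the roughly 3.5 GB uncompressed corpus into memory, and returns the values for the corpus entry in track.json.

```python
"""Build a larger corpus by repeating a bz2-compressed, one-document-per-line file.

Placeholder file names; adjust paths to your environment.
"""
from __future__ import annotations

import bz2
import os


def repeat_corpus(src: str, dst: str, times: int, chunk_size: int = 1 << 20) -> dict:
    """Stream `src` into the bz2 file `dst` `times` times.

    Returns the fields needed for a corpus "documents" entry in track.json.
    """
    doc_count = 0
    uncompressed = 0
    with bz2.open(dst, "wb") as out:
        for _ in range(times):
            last_byte = b"\n"
            with bz2.open(src, "rb") as inp:
                while True:
                    chunk = inp.read(chunk_size)
                    if not chunk:
                        break
                    out.write(chunk)
                    doc_count += chunk.count(b"\n")
                    uncompressed += len(chunk)
                    last_byte = chunk[-1:]
            if last_byte != b"\n":
                # The source lacks a final newline: add one so the last document
                # of this copy and the first document of the next stay on separate lines.
                out.write(b"\n")
                doc_count += 1
                uncompressed += 1
    return {
        "source-file": os.path.basename(dst),
        "document-count": doc_count,
        "compressed-bytes": os.path.getsize(dst),
        "uncompressed-bytes": uncompressed,
    }
```

For example, repeat_corpus("documents-2.json.bz2", "documents-large.json.bz2", 10) returns a single entry that can replace the looped documents list. Keep in mind the repeated documents are identical copies of the original ones.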

Rgs,
Dimitris

Thanks for your feedback.
According to the esrally download policy, documents-2.json-xxx.bz2 would be downloaded from the AWS bucket.
But I copied the documents-2.json-xxx.bz2 files from the existing documents-2.json.bz2, and esrally uncompressed them successfully while preparing the test.

I see. This is a very original but completely unsupported way of increasing the corpus size; for future compatibility you'd be better off following the approach I mentioned earlier.

Nevertheless I tried your geonames modification against 6.6.0 and 6.3.2 and didn't have any issues.

I used something like: esrally --distribution-version=6.3.2 --runtime-jdk=8 --track-path=~/.rally/geonames --challenge=append-no-conflicts-index-only.

Could you please paste your Rally command? Among other things, I am interested in which pipeline you are using.

Thank you for your reply.
I run the commands as usual, for example:

esrally race --pipeline=benchmark-only --target-hosts=localhost:9200 --track-path=geonames_large --client-options="xxxxxxxxx" --challenge=append-no-conflicts --track-params="bulk_size:10000,clients:200" --report-file=./markdown.md

I didn't change any pipeline configuration or other parameters.

What's your version of Rally? esrally --version

It seems you are using the benchmark-only pipeline, so Rally is benchmarking against a cluster it hasn't set up itself. Is your Elasticsearch version 6.3.2? Is it the default distribution with security enabled?
As I said earlier, I have successfully run a challenge from the modified geonames with >1 corpus against 6.3.2, so without additional information it's not clear what's going on.
Can you try running your track against Elasticsearch launched by Rally, using something like the following (just make sure JAVA_HOME points to Java 8):

esrally --distribution-version=6.3.2 --runtime-jdk=8 --track-path=<your_custom_geonames_track> --challenge=append-no-conflicts-index-only

Additionally, the cluster seems to be running on the same host (--target-hosts=localhost:9200), which is not good practice for meaningful benchmarking results; the load driver should be kept separate from the ES nodes to avoid contention between them.

1. My Rally version is 1.0.4 (the latest).
2. JAVA_HOME points to 1.8.121.
3. My target host was set up locally by myself.
4. The target host's Elasticsearch version is 6.3.2.

Can you try running your track against Elasticsearch launched by Rally, using something like the following (just make sure JAVA_HOME points to Java 8):

esrally --distribution-version=6.3.2 --runtime-jdk=8 --track-path=<your_custom_geonames_track> --challenge=append-no-conflicts-index-only

and report back?

Aha, I get it: if I use the config copied from the master branch and add the for-loop statements, the errors occur when testing against version 6.3.x.

I compared the same track on the master and 6 branches: based on branch 6 (plus my for loop) the test works, but based on master it does not.
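The master/6 difference lines up with the stack trace at the top. A likely cause, assuming the master branch's index.json uses 7.x-style typeless mappings: on 6.x, each key directly under "mappings" is read as a type name whose value must be an object, so a scalar setting such as "dynamic": "strict" placed there cannot be cast to a Map. The hypothetical helper below flags such entries. It expects the index body already rendered to a dict, since the raw index.json may contain Jinja2 placeholders that json.load cannot parse, and the example bodies are illustrative, not the actual geonames mappings.

```python
"""Flag mapping entries that a 6.x create-index request could not cast to a Map.

Assumption: `body` is the already-rendered index body as a Python dict.
"""
from __future__ import annotations

# Hypothetical bodies for illustration (not the actual geonames index.json).
TYPELESS_7X = {"mappings": {"dynamic": "strict", "properties": {"name": {"type": "text"}}}}
TYPED_6X = {"mappings": {"docs": {"dynamic": "strict", "properties": {"name": {"type": "text"}}}}}


def mapping_entries_that_fail_cast(body: dict) -> list[str]:
    """Return keys directly under "mappings" whose values are not JSON objects.

    On 6.x each of these keys is treated as a type name; a scalar value
    (e.g. "dynamic": "strict") is what would trigger the ClassCastException.
    """
    mappings = body.get("mappings") or {}
    return [key for key, value in mappings.items() if not isinstance(value, dict)]
```

An empty result does not prove a body is 6.x-compatible (a typeless "properties" object would still be misread as a type name); it only catches the scalar case seen in this trace.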

Right, when you don't specify a track-path explicitly, Rally will check out the rally-tracks branch that corresponds to the detected Elasticsearch version. So you'll need to base your custom track on the right branch.
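A simplified Python sketch of the idea behind that branch selection: try the most specific versioned branch first (e.g. 6.3, then 6) and fall back to master only when none matches. This is not Rally's actual implementation; the function name is hypothetical and the fallback rule in particular is an assumption.

```python
"""Simplified sketch: pick the rally-tracks branch that best matches an ES version.

Not Rally's actual implementation; the real matching rules may differ.
"""
from __future__ import annotations


def pick_track_branch(available: set[str], es_version: str) -> str | None:
    # Keep only the numeric part, e.g. "6.3.2-SNAPSHOT" -> ["6", "3", "2"].
    parts = es_version.split("-")[0].split(".")
    major = parts[0]
    minor = parts[1] if len(parts) > 1 else None

    # Most specific candidate first: "6.3", then "6".
    candidates = [f"{major}.{minor}"] if minor is not None else []
    candidates.append(major)
    for branch in candidates:
        if branch in available:
            return branch

    # Assumption: fall back to master when no versioned branch matches.
    return "master" if "master" in available else None
```

With branches master and 6 available, a 6.3.2 cluster maps to branch 6, which is consistent with the custom track working when based on branch 6 and failing when based on master.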
