esrally run fails with "Cannot find documents-2.json.bz2"

It reports the following errors:


****** Use this pipeline only if you are aware of the tradeoffs. ******
*************************** Watch your step! ***************************


[INFO] Racing on track [geonames], challenge [append-no-conflicts-index-only] and car ['external'] with version [2.3.5].

[WARNING] refresh_total_time is 2 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
[WARNING] flush_total_time is 4 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.

[ERROR] Cannot race. Error in track preparator
Cannot find /home/.rally/benchmarks/data/geonames/documents-2.json.bz2. Please disable offline mode and retry again.

Getting further help:


However, there is a documents.json.bz2 file:
[root@hdh154 /home/.rally/benchmarks/data/geonames]# ll
total 193224
-rwxrwxrwx 1 root root 197857614 Jun 23 15:37 documents.json.bz2

Could you help me resolve this, please? Thank you so much.

esrally command:
esrally --track=geonames --target-hosts=10.3.68.168:9200,10.3.68.172:9200,10.3.68.174:9200 --challenge=append-no-conflicts-index-only --pipeline=benchmark-only --offline

If you are using it in offline mode, you need to download all the data the track uses before you run it.

I have downloaded them and put them under another path:
[root@hdh154 ~/.rally/benchmarks/tracks/default/geonames]# ll
total 660
drwxrwxrwx 2 root root 4096 Jun 20 11:27 challenges
-rwxrwxrwx 1 root root 44 Jun 20 10:39 files.txt
-rw-r--r-- 1 root root 2685 Jun 20 10:39 index.json
drwxrwxrwx 2 root root 4096 Jun 20 10:39 operations
-rw-r--r-- 1 root root 2061 Jun 20 10:39 README.md
-rw-r--r-- 1 root root 642669 Jun 20 10:39 terms.txt
-rwxrwxrwx 1 root root 1196 Jun 20 14:13 track.json
-rw-r--r-- 1 root root 4192 Jun 20 10:39 track.py
Is there some problem?

Hi,

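you have:

-rwxrwxrwx 1 root root 197857614 Jun 23 15:37 documents.json.bz2

However, the error message says:

Cannot find /home/.rally/benchmarks/data/geonames/documents-2.json.bz2. Please disable offline mode and retry again.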

So there is a file called documents.json.bz2 but Rally expects documents-2.json.bz2. Renaming the file won't help because it seems you downloaded an older version (the expected size is 264698741 bytes but yours has 197857614 bytes). How did you download the file? Did you use the download.sh script in rally-tracks as suggested in the docs?

Daniel
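
A quick way to run the name and size comparison described above before starting a race is sketched below. This is only an illustration, not part of Rally: the function name check_corpus_file is hypothetical, and the expected size is the one quoted in this thread. A matching size is a first sanity check only; it does not prove that the file content is intact.

```python
import os

# Expected size of documents-2.json.bz2, as quoted in this thread.
EXPECTED_SIZE = 264698741


def check_corpus_file(path, expected_size=EXPECTED_SIZE):
    """Return (ok, message) saying whether the file exists with the expected size.

    Hypothetical helper for illustration; Rally performs its own checks.
    """
    if not os.path.isfile(path):
        return False, "missing: %s" % path
    actual = os.path.getsize(path)
    if actual != expected_size:
        return False, "size mismatch: expected %d bytes, found %d" % (expected_size, actual)
    return True, "ok"
```

For example, check_corpus_file("/home/.rally/benchmarks/data/geonames/documents-2.json.bz2") would report the file as missing in the situation above, because only documents.json.bz2 is present.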

thank you for your reply.
Because our server can't connect to the network, where can I get documents-2.json.bz2 manually, please?
Maybe I can rename documents.json.bz2 to documents-2.json.bz2 and try again.

I have tried to download the bz2 file with download.sh, but it failed:

[root@hdh154 /home]# ./download.sh geonames
: No such file or directory

Hi,

That is what the download.sh script is for. It will download all track-related files that you need to run in offline mode.

As I have explained in my first answer, renaming documents.json.bz2 will not help you. This is the wrong file to begin with. The file length does not match and also it will have the wrong file structure.

Is there a download.sh file in that directory? If not, please follow the instructions in the docs.

Daniel

I have got the file now; however, there is still an error:
[INFO] Racing on track [geonames], challenge [append-no-conflicts-index-only] and car ['external'] with version [2.3.5].

[INFO] Decompressing track data from [/home/.rally/benchmarks/data/geonames/documents-2.json.bz2] to [/home/.rally/benchmarks/data/geonames/documents-2.json] (resulting size: 3.30 GB) ...
[ERROR] Cannot race. Error in track preparator
Invalid data stream

Hi,

this sounds like the downloaded file is corrupted. Can you show the output of ls -la /home/.rally/benchmarks/data/geonames? Thanks.

Daniel

It seems all right.

[root@hdh154 ~/.rally/benchmarks/tracks/default/geonames]# ls -la /home/.rally/benchmarks/data/geonames
total 456028
drwxrwxrwx 2 root root 4096 Jun 25 19:01 .
drwxr-xr-x 5 root root 4096 Jun 20 14:03 ..
-rw-r--r-- 1 root root 4403200 Jun 25 19:10 documents-2.json
-rw-r--r-- 1 root root 264698741 Jun 25 18:49 documents-2.json.bz2
-rwxrwxrwx 1 root root 197857614 Jun 23 15:37 documents.json.bz2

Hi,

indeed, the file size of documents-2.json.bz2 seems fine. Can you please delete the partially uncompressed file documents-2.json (i.e. rm -f documents-2.json) and retry?

Daniel

It also failed:

[INFO] Racing on track [geonames], challenge [append-no-conflicts-index-only] and car ['external'] with version [2.3.5].

[INFO] Decompressing track data from [/home/.rally/benchmarks/data/geonames/documents-2.json.bz2] to [/home/.rally/benchmarks/data/geonames/documents-2.json] (resulting size: 3.30 GB) ...
[WARNING] refresh_total_time is 36 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
[WARNING] flush_total_time is 33 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
[ERROR] Cannot race. Error in track preparator
Invalid data stream

After running the command, it generated the json file again:

[root@hdh154 ~/.rally/benchmarks/tracks/default/geonames]# ls /home/.rally/benchmarks/data/geonames
documents-2.json documents-2.json.bz2

Can you please do the following?

  1. Run md5sum documents-2.json.bz2 and paste the result. The fingerprint should be c6fbf5e7b20c3c46f4cd6ab8385a9cb7.
  2. Remove documents-2.json once more with rm -f documents-2.json
  3. Attempt to decompress it yourself with bzip2 -dk documents-2.json.bz2. Note that this requires the bzip2 tool which you might need to install separately.
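
If the bzip2 tool is not available, step 3 can be approximated with Python's standard bz2 module. The sketch below is an assumption-laden illustration (the function name bz2_is_intact is made up here): it reads the whole archive in chunks and throws the output away, so it neither needs the 3.30 GB of disk space nor leaves a partial documents-2.json behind.

```python
import bz2


def bz2_is_intact(path, chunk_size=1024 * 1024):
    """Decompress the whole file chunk by chunk, discarding the output.

    Returns True if the stream decompresses to its end without error.
    Returns False if the data is corrupt or truncated, or if the file
    cannot be opened at all (for example because it does not exist).
    """
    try:
        with bz2.open(path, "rb") as stream:
            # read() returns b"" once the end of the stream is reached.
            while stream.read(chunk_size):
                pass
    except (OSError, EOFError):
        return False
    return True
```

Calling bz2_is_intact("documents-2.json.bz2") in the data directory gives a yes/no answer similar in spirit to a bzip2 integrity test; it does not say where the corruption is.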

The results are as follows:

[root@hdh154 /home/.rally/benchmarks/data/geonames]# md5sum documents-2.json.bz2
67f10be0f09ae6820d314a9c795fc5a7 documents-2.json.bz2

[root@hdh154 /home/.rally/benchmarks/data/geonames]# bzip2 -dk documents-2.json.bz2

bzip2: Data integrity error when decompressing.
Input file = documents-2.json.bz2, output file = documents-2.json

It is possible that the compressed file(s) have become corrupted.
You can use the -tvv option to test integrity of such files.

You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.

bzip2: Deleting output file documents-2.json, if it exists.

It seems that, for whatever reason, documents-2.json.bz2 is broken. You can redownload it with the following command:

curl -o documents-2.json.bz2 http://benchmarks.elasticsearch.org.s3.amazonaws.com/corpora/geonames/documents-2.json.bz2

(this is actually what the download.sh script does as well...)

Please verify integrity by checking the checksum right after downloading:

md5sum documents-2.json.bz2

which should print c6fbf5e7b20c3c46f4cd6ab8385a9cb7.
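
On a machine without md5sum, the same check can be sketched in Python with the standard hashlib module. The function names below are hypothetical; the expected fingerprint is the one quoted above. The file is hashed in chunks so that it never has to fit into memory.

```python
import hashlib

# Fingerprint quoted in this thread for documents-2.json.bz2.
EXPECTED_MD5 = "c6fbf5e7b20c3c46f4cd6ab8385a9cb7"


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of the file, reading it chunk by chunk."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals the expected fingerprint."""
    return md5_of_file(path) == expected.lower()
```

If matches_expected("documents-2.json.bz2") returns False right after a download, the transfer itself altered or truncated the file, and decompression is likely to fail as well.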

My server can't resolve the link, so I downloaded the bz2 file manually with a browser. Is that OK?
[root@hdh154 /home/.rally/benchmarks/data/geonames]# curl -o documents-2.json.bz2 http://benchmarks.elasticsearch.org.s3.amazonaws.com/corpora/geonames/documents-2.json.bz2

curl: (6) Couldn't resolve host 'benchmarks.elasticsearch.org.s3.amazonaws.com'

I've downloaded it through a web browser, and here's the output:
[root@hdh154 /home/.rally/benchmarks/data/geonames]# md5sum documents-2.json.bz2
3682050a7eb9dc53ba923379bc4a4ea3 documents-2.json.bz2

It still doesn't seem right.

Downloading via a browser is fine as well. I just don't understand why the artefact is corrupted on your machine. I tried that exact command now on several machines with the same (successful) result. Are you behind a proxy? Do you have an opportunity to download the file from a machine that is not behind that proxy?