mej101 (Molly), March 22, 2022, 8:44pm
I've defined one corpus in my track, and when I run the track, I get the message RallyAssertionError: The provided corpus ['tag-data'] does not match any of the corpora ['tag-data']. This happens whether or not I explicitly specify "corpora" in the bulk operation. Here's my track:
{% set index_name = (index_name | default("an_index_name")) %}
{
"version": 0,
"description": "Reference track",
"indices": [
{
"name": "{{ index_name }}"
}
],
"corpora": [
{
"name": "tag-data",
"documents": [
{
"base-url": "s3://an_s3_url",
"source-file": "a_text_file.txt.bz2",
"includes-action-and-meta-data": true,
"document-count": 368687
}
]
}
],
"schedule": [
{
"operation": {
"operation-type": "cluster-health",
"request-params": {
"wait_for_status": "green"
},
"retry-until-success": true
}
},
{
"operation": {
"operation-type": "bulk",
"indices": ["{{ index_name }}"],
"bulk-size": 5
},
"warmup-iterations": 0,
"clients": 1
},
{
"operation": {
"operation-type": "force-merge",
"request-timeout": 7200
}
},
{
"name": "wait-until-merges-finish",
"operation": {
"operation-type": "index-stats",
"index": "_all",
"condition": {
"path": "_all.total.merges.current",
"expected-value": 0
},
"retry-until-success": true,
"include-in-reporting": false
}
}
]
}
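This corpus sets "includes-action-and-meta-data": true. In that case every document in the decompressed file is preceded by its own action-and-meta-data line, so the file should contain twice the declared document-count in lines. The Python sketch below performs that sanity check. It assumes the file has already been downloaded locally and is newline-delimited. The helpers count_lines and expected_lines are hypothetical and not part of Rally.

```python
import bz2


def count_lines(path: str) -> int:
    """Count lines in a plain or bz2-compressed, newline-delimited corpus file."""
    # Hypothetical helper: choose the opener based on the file extension.
    opener = bz2.open if path.endswith(".bz2") else open
    with opener(path, "rt", encoding="utf-8") as f:
        return sum(1 for _ in f)


def expected_lines(document_count: int, includes_action_and_meta_data: bool) -> int:
    """Number of lines a corpus file should contain for a given document-count."""
    # With action-and-meta-data, each document is preceded by one action line.
    return document_count * (2 if includes_action_and_meta_data else 1)
```

For this track, count_lines("a_text_file.txt.bz2") should equal expected_lines(368687, True), which is 737374. A mismatch would suggest that the data file and the declared document-count disagree.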
Hi @mej101
This looks like a bug. I had a quick look at reproducing this on our current master branch, but I noticed you've specified version: 0, so I get:
esrally.exceptions.RallyError: Track rally is on version 0 but needs to be updated at least to version 2 to work with the current version of Rally.
As we document here, you should either omit it (it defaults to version: 2) or set it explicitly.
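The version rule quoted above fits in a few lines of Python. This is an illustration, not Rally's implementation. It assumes the track has already been rendered to plain JSON, meaning Jinja lines such as {% set ... %} are gone. The names check_track_version and TrackVersionError are hypothetical. A missing version is treated as 2, and anything lower is rejected.

```python
import json


class TrackVersionError(Exception):
    """Hypothetical error type standing in for Rally's own exception."""


def check_track_version(track_json: str, minimum: int = 2) -> int:
    """Return the effective track version, or raise if it is too old."""
    track = json.loads(track_json)
    # Omitting "version" means the track uses the default, version 2.
    version = track.get("version", 2)
    if version < minimum:
        raise TrackVersionError(
            f"Track is on version {version} but needs to be updated "
            f"at least to version {minimum}."
        )
    return version
```

With the original header ("version": 0) this raises an error. Deleting the line or changing it to "version": 2 passes.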
Can you please try again without version: 0, and can you also please paste the output of:
$ esrally --version
esrally 2.4.0.dev0 (git revision: 37f4a5f)
mej101 (Molly), March 23, 2022, 4:51pm
Thanks! I get the same error with version: 2, and esrally --version reports esrally 2.3.1. Digging through the source code, it seems Rally finds 0 documents in the corpus, and that's why it raises that error. I also tried manually downloading the data to a local folder, and it still finds 0 documents in the corpus.
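If that reading is right, it explains why the message looks self-contradictory. The names match, but a corpus that ends up with zero documents is dropped before the check runs. The following is a simplified, hypothetical model of that selection step, written for illustration only. Corpus, select_corpora and CorpusAssertionError are made-up names, not Rally's actual code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Corpus:
    """Hypothetical stand-in for a track corpus: a name plus per-file document counts."""
    name: str
    document_counts: List[int] = field(default_factory=list)

    def number_of_documents(self) -> int:
        return sum(self.document_counts)


class CorpusAssertionError(Exception):
    """Hypothetical error type mirroring the message seen in this thread."""


def select_corpora(track_corpora: List[Corpus],
                   requested: Optional[List[str]] = None) -> List[Corpus]:
    """Keep the requested corpora that contain at least one document."""
    names = [c.name for c in track_corpora]
    # Without an explicit "corpora" parameter, every corpus in the track is requested.
    requested = names if requested is None else requested
    selected = [
        c for c in track_corpora
        if c.name in requested and c.number_of_documents() > 0
    ]
    # The names match, but an empty corpus has already been filtered out here.
    if track_corpora and not selected:
        raise CorpusAssertionError(
            f"The provided corpus {requested} does not match any of the corpora {names}"
        )
    return selected
```

In this model, the same error appears whether or not "corpora" is passed explicitly, which matches the behavior observed above. The open question is why the real corpus is counted as empty.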
Using the same stable release as you (2.3.1) and your track as a template, I'm not able to reproduce this.
This is the track I'm using, where sample-docs-1k.json is quite literally 1001 lines of {"message": "in a bottle"}:
{% set index_name = (index_name | default("an_index_name")) %}
{
"description": "Reference track",
"indices": [
{
"name": "{{ index_name }}"
}
],
"corpora": [
{
"name": "tag-data",
"documents": [
{
"source-file": "sample-docs-1k.json",
"document-count": 1001,
"includes-action-and-meta-data": false
}
]
}
],
"schedule": [
{
"operation": {
"operation-type": "cluster-health",
"request-params": {
"wait_for_status": "green"
},
"retry-until-success": true
}
},
{
"operation": {
"operation-type": "bulk",
"indices": ["{{ index_name }}"],
"bulk-size": 5
},
"warmup-iterations": 0,
"clients": 1
},
{
"operation": {
"operation-type": "force-merge",
"request-timeout": 7200
}
},
{
"name": "wait-until-merges-finish",
"operation": {
"operation-type": "index-stats",
"index": "_all",
"condition": {
"path": "_all.total.merges.current",
"expected-value": 0
},
"retry-until-success": true,
"include-in-reporting": false
}
}
]
}
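The Python sketch below recreates the test data described above by writing 1001 lines of {"message": "in a bottle"} to sample-docs-1k.json. The file name comes from the track; the function name write_sample_docs is hypothetical. This corpus sets "includes-action-and-meta-data": false, so each line is exactly one document, and the line count equals the declared document-count of 1001.

```python
import json


def write_sample_docs(path: str = "sample-docs-1k.json", count: int = 1001) -> None:
    """Write `count` identical one-line JSON documents, one per line."""
    line = json.dumps({"message": "in a bottle"})
    with open(path, "w", encoding="utf-8") as f:
        for _ in range(count):
            f.write(line + "\n")
```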
This is my invocation:
esrally race --distribution-version=7.16.1 --track-path /path/to/my/track --kill-running-processes
Can you try it with my example above, and also attach the output from ~/.rally/logs/rally.log?
system (system), closed April 21, 2022, 6:48am
This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.