The provided corpus does not match any of the corpora

I've defined one corpus in my track, but when I run the track I get the message RallyAssertionError: The provided corpus ['tag-data'] does not match any of the corpora ['tag-data']. This happens whether or not I explicitly specify "corpora" in the bulk operation. Here's my track:

{% set index_name = (index_name | default("an_index_name")) %}

{
  "version": 0,
  "description": "Reference track",
  "indices": [
    {
      "name": "{{ index_name }}"
    }
  ],
  "corpora": [
    {
      "name": "tag-data",
      "documents": [
        {
          "base-url": "s3://an_s3_url",
          "source-file": "a_text_file.txt.bz2",
          "includes-action-and-meta-data": true,
          "document-count": 368687
        }
      ]
    }
  ],
  "schedule": [
    {
      "operation": {
        "operation-type": "cluster-health",
        "request-params": {
          "wait_for_status": "green"
        },
        "retry-until-success": true
      }
    },
    {
      "operation": {
        "operation-type": "bulk",
        "indices": ["{{ index_name }}"],
        "bulk-size": 5
      },
      "warmup-iterations": 0,
      "clients": 1
    },
    {
      "operation": {
        "operation-type": "force-merge",
        "request-timeout": 7200
      }
    },
    {
      "name": "wait-until-merges-finish",
      "operation": {
        "operation-type": "index-stats",
        "index": "_all",
        "condition": {
          "path": "_all.total.merges.current",
          "expected-value": 0
        },
        "retry-until-success": true,
        "include-in-reporting": false
      }
    }
  ]
}

Hi @mej101,

This looks like a bug. I had a quick look at reproducing it on our current master branch, but I noticed you've specified "version": 0, so I get this error instead:

esrally.exceptions.RallyError: Track rally is on version 0 but needs to be updated at least to version 2 to work with the current version of Rally.

As we document in the track reference, you should either omit "version" (it defaults to 2) or set it explicitly to 2.

Can you please try again without "version": 0, and also paste the output of esrally --version? For comparison, this is mine:

$ esrally --version
esrally 2.4.0.dev0 (git revision: 37f4a5f)

Thanks! I get the same error with "version": 2, and esrally --version reports esrally 2.3.1. Digging through the source code, it seems Rally is finding 0 documents in the corpus, and that's why it raises that error. I also tried manually downloading the data to a local folder, and it still finds 0 documents in the corpus.
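
To rule out a problem with the data file itself, one option is to count the documents in the downloaded archive independently of Rally. The following is only a sketch: count_documents is a hypothetical helper, not part of Rally, and it assumes that "document-count" refers to documents rather than lines, so a file with action-and-meta-data lines holds two lines per document.

```python
import bz2
import json


def count_documents(path, includes_action_and_meta_data):
    """Count documents in a bz2-compressed, newline-delimited JSON file.

    Hypothetical helper (not a Rally API). Assumes one JSON object per line
    and, when includes_action_and_meta_data is True, that every document
    line is preceded by one action-and-meta-data line.
    """
    non_empty_lines = 0
    with bz2.open(path, "rt", encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            if not line.strip():
                continue  # skip blank lines, e.g. a trailing newline
            try:
                json.loads(line)
            except ValueError as e:
                raise ValueError(f"line {line_number} is not valid JSON") from e
            non_empty_lines += 1

    if includes_action_and_meta_data:
        if non_empty_lines % 2 != 0:
            raise ValueError("expected an even number of lines (action + document pairs)")
        return non_empty_lines // 2
    return non_empty_lines
```

Calling count_documents("a_text_file.txt.bz2", True) on the local copy and comparing the result with the "document-count" in the track (368687 here) would show whether the file content matches what the track declares.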

Using the same stable release as you and your track as a template, I'm not able to reproduce this.

This is the track I'm using, where sample-docs-1k.json is quite literally 1001 lines of {"message": "in a bottle"}:

{% set index_name = (index_name | default("an_index_name")) %}

{
  "description": "Reference track",
  "indices": [
    {
      "name": "{{ index_name }}"
    }
  ],
  "corpora": [
    {
      "name": "tag-data",
      "documents": [
        {
          "source-file": "sample-docs-1k.json",
          "document-count": 1001,
          "includes-action-and-meta-data": false
        }
      ]
    }
  ],
  "schedule": [
    {
      "operation": {
        "operation-type": "cluster-health",
        "request-params": {
          "wait_for_status": "green"
        },
        "retry-until-success": true
      }
    },
    {
      "operation": {
        "operation-type": "bulk",
        "indices": ["{{ index_name }}"],
        "bulk-size": 5
      },
      "warmup-iterations": 0,
      "clients": 1
    },
    {
      "operation": {
        "operation-type": "force-merge",
        "request-timeout": 7200
      }
    },
    {
      "name": "wait-until-merges-finish",
      "operation": {
        "operation-type": "index-stats",
        "index": "_all",
        "condition": {
          "path": "_all.total.merges.current",
          "expected-value": 0
        },
        "retry-until-success": true,
        "include-in-reporting": false
      }
    }
  ]
}
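
For anyone who wants to recreate the data file used by the track above, a short script along these lines should produce an equivalent one. This is a sketch, not the exact way the file was made; write_sample_corpus is a hypothetical helper name, and the file is assumed to live next to the track file.

```python
import json


def write_sample_corpus(path, count=1001):
    """Write `count` identical one-line JSON documents to `path`.

    Hypothetical helper for building a tiny test corpus without
    action-and-meta-data lines (one document per line).
    """
    doc = json.dumps({"message": "in a bottle"})
    with open(path, "w", encoding="utf-8") as f:
        for _ in range(count):
            f.write(doc + "\n")
    return count
```

Running write_sample_corpus("sample-docs-1k.json") in the track directory gives a file whose line count matches the "document-count" of 1001 declared in the corpus.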

This is my invocation:

esrally race --distribution-version=7.16.1 --track-path /path/to/my/track --kill-running-processes

Can you try it with my example above, and also attach the output from ~/.rally/logs/rally.log?
