Esrally: ingesting into two indices in parallel

I want to add documents to two indices simultaneously. I tried the parallel option, but it still runs sequentially: it adds data to the first index mentioned and only afterwards to the second one. Is this supported in esrally? If so, could you please guide me? I have already managed to run searches in parallel.

Below is my track.json

{
  "version": 2,
  "description": "Tutorial benchmark for Rally",
  "indices": [
    {
      "name": "customrecords",
      "body": "",
      "types": [
        "docs"
      ]
    }
  ],
  "corpora": [
    {
      "name": "rally-tutorial",
      "documents": [
        {
          "source-file": "documents.json",
          "document-count": 12109130,
          "uncompressed-bytes": 12258880607,
          "target-index": "geocustom"
        },
        {
          "source-file": "samples.json",
          "document-count": 12218269,
          "uncompressed-bytes": 7440925751,
          "target-index": "customrecords"
        }
      ]
    }
  ],
  "schedule": [
    {
      "operation": {
        "operation-type": "create-index"
      }
    },
    {
      "operation": {
        "operation-type": "cluster-health",
        "request-params": {
          "wait_for_status": "green"
        }
      }
    },
    {
      "parallel": {
        "tasks": [
          {
            "operation": {
              "operation-type": "bulk",
              "bulk-size": 100
            },
            "warmup-time-period": 120,
            "clients": 8,
            "target-throughput": 800
          }
        ]
      }
    }
  ]
}

Please check the docs for the bulk operation and for corpora, and see the example in the http_logs standard track: its indices and corpora definitions and its index-append operation definition.

Thank you. After revisiting the suggestion, I modified the track.json as below. Sadly, it still ingests data sequentially. Kindly help.

{
  "version": 2,
  "description": "Tutorial benchmark for Rally",
  "indices": [
    {
      "name": "customrecords",
      "body": "",
      "types": [ "docs" ]
    },
    {
      "name": "geocustom",
      "body": "",
      "types": [ "docs" ]
    }
  ],
  "corpora": [
    {
      "name": "rally-tutorial",
      "documents": [
        {
          "source-file": "samples.json",
          "document-count": 100000,
          "uncompressed-bytes": 3295600000,
          "target-index": "geocustom"
        },
        {
          "source-file": "samples.json",
          "document-count": 100000,
          "uncompressed-bytes": 3295600000,
          "target-index": "customrecords"
        }
      ]
    }
  ],
  "schedule": [
    {
      "operation": {
        "operation-type": "create-index"
      }
    },
    {
      "operation": {
        "operation-type": "cluster-health",
        "request-params": {
          "wait_for_status": "green"
        }
      }
    },
    {
      "operation": {
        "operation-type": "bulk",
        "bulk-size": 100
      },
      "warmup-time-period": 120,
      "clients": 8
    }
  ]
}

Sorry, I misunderstood your original question.

The strategy I mentioned earlier works for corpora that already include the action-and-metadata line, which specifies the target index of each document. So if you can have one corpus file that contains all docs that should be ingested, and you specify "includes-action-and-meta-data": true in the corpora section, you can achieve what you want with just one corpus file.
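If you want to try the single-file route, a combined corpus could be built roughly as follows. This is only a sketch, not something Rally ships: the function name merge_corpora and the output file name combined.json are made up for illustration, the input file and index names are taken from the example in this thread, and the action line uses the plain {"index": {"_index": ...}} form of the Elasticsearch bulk API (older clusters that still use mapping types may also need a _type entry).

```python
import itertools
import json


def merge_corpora(sources, out_path):
    """Interleave documents from several line-delimited JSON files into one
    bulk-style corpus, writing an action-and-metadata line before each document.

    sources is a list of (path, target_index) pairs.
    Returns the number of documents written.
    """
    handles = [(open(path, encoding="utf-8"), index) for path, index in sources]
    written = 0
    try:
        with open(out_path, "w", encoding="utf-8") as out:
            # zip_longest walks all files in lockstep; files that are
            # already exhausted contribute None for the remaining rounds.
            for row in itertools.zip_longest(*(h for h, _ in handles)):
                for line, (_, index) in zip(row, handles):
                    if line is None or not line.strip():
                        continue
                    # Action line naming the target index, then the document.
                    out.write(json.dumps({"index": {"_index": index}}) + "\n")
                    out.write(line.rstrip("\n") + "\n")
                    written += 1
    finally:
        for handle, _ in handles:
            handle.close()
    return written


# Hypothetical usage with the file names from this thread:
# merge_corpora([("geodocs.json", "customgeo"),
#                ("logsdocs.json", "customlogs")], "combined.json")
```

The resulting file would then be referenced by a single corpus entry with "includes-action-and-meta-data": true. Interleaving the two sources, rather than concatenating them, is what makes documents for both indices show up in the same bulk requests.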

However, if you must work with separate corpus files, you can use the parallel approach you mentioned earlier. Your first example had only a single bulk task inside the parallel element; it needs two parallel bulk tasks, one per index. I created two corpus files from the respective corpora of the geonames and http_logs tracks using head -100000 ~/.rally/benchmarks/data/geonames/documents-2.json >geodocs.json and head -100000 ~/.rally/benchmarks/data/http_logs/documents-181998.json >logsdocs.json. With those, the following quick example does what you described:

{
  "version": 2,
  "description": "Tutorial benchmark for Rally",
  "indices": [
    {
      "name": "customlogs",
      "body": ""
    },
    {
      "name": "customgeo",
      "body": ""
    }
  ],
  "corpora": [
    {
      "name": "rally-tutorial",
      "documents": [
        {
          "source-file": "./geodocs.json",
          "document-count": 100000,
          "target-index": "customgeo"
        },
        {
          "source-file": "./logsdocs.json",
          "document-count": 100000,
          "target-index": "customlogs"
        }
      ]
    }
  ],
  "schedule": [
    {
      "operation": {
        "operation-type": "create-index"
      }
    },
    {
      "operation": {
        "operation-type": "cluster-health",
        "request-params": {
          "wait_for_status": "yellow"
        }
      }
    },
    {
      "parallel": {
        "tasks": [
          {
            "name": "bulk1",
            "operation": {
              "operation-type": "bulk",
              "indices": "customgeo",
              "bulk-size": 100
            },
            "warmup-time-period": 0,
            "clients": 1,
            "target-throughput": 800
          },
          {
            "name": "bulk2",
            "operation": {
              "operation-type": "bulk",
              "indices": "customlogs",
              "bulk-size": 100
            },
            "warmup-time-period": 0,
            "clients": 1,
            "target-throughput": 800
          }
        ]
      }
    }
  ]
}
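As a quick sanity check before a race, it can help to list which indices each bulk task inside a parallel element actually targets. The helper below is a rough sketch and not part of Rally; the names bulk_targets and check_track are hypothetical. It assumes, based on the behaviour described in this thread, that a bulk operation without an "indices" filter ingests every corpus entry.

```python
import json


def bulk_targets(track):
    """Map each bulk task inside a "parallel" element to the indices it targets.

    A task without an "indices" filter is reported as targeting every
    "target-index" listed in the corpora (an assumption, see the text above).
    """
    all_targets = [
        doc["target-index"]
        for corpus in track.get("corpora", [])
        for doc in corpus.get("documents", [])
        if "target-index" in doc
    ]
    result = {}
    for step in track.get("schedule", []):
        tasks = step.get("parallel", {}).get("tasks", [])
        for n, task in enumerate(tasks):
            op = task.get("operation", {})
            # Skip operations referenced by name and non-bulk operations.
            if not isinstance(op, dict) or op.get("operation-type") != "bulk":
                continue
            indices = op.get("indices", all_targets)
            if isinstance(indices, str):
                indices = [indices]
            # Unnamed tasks get a positional placeholder name.
            result[task.get("name", f"task-{n}")] = list(indices)
    return result


def check_track(path):
    """Load a track file from disk and report its parallel bulk targets."""
    with open(path, encoding="utf-8") as f:
        return bulk_targets(json.load(f))
```

For the track above this reports two tasks, bulk1 and bulk2, each bound to one index. A parallel element with a single unfiltered bulk task shows up as one task covering both indices, which is the situation in which the data was ingested one index after the other.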

Thank you very much. It worked like a charm; you saved me a lot of trouble.

You are welcome @Sameera_De_Silva, that's great to hear.

