Esrally: ingesting into two indices in parallel

I want to add documents to two indices simultaneously. I tried the parallel option, but ingestion still runs sequentially: Rally adds data to the first-mentioned index and only afterwards to the second one. Is this supported in Esrally? If so, could you please guide me? I have already managed to run searches in parallel.

Below is my track.json:

{
  "version": 2,
  "description": "Tutorial benchmark for Rally",
  "indices": [
    {
      "name": "customrecords",
      "body": "",
      "types": [
        "docs"
      ]
    }
  ],
  "corpora": [
    {
      "name": "rally-tutorial",
      "documents": [
        {
          "source-file": "documents.json",
          "document-count": 12109130,
          "uncompressed-bytes": 12258880607,
          "target-index": "geocustom"
        },
        {
          "source-file": "samples.json",
          "document-count": 12218269,
          "uncompressed-bytes": 7440925751,
          "target-index": "customrecords"
        }
      ]
    }
  ],
  "schedule": [
    {
      "operation": {
        "operation-type": "create-index"
      }
    },
    {
      "operation": {
        "operation-type": "cluster-health",
        "request-params": {
          "wait_for_status": "green"
        }
      }
    },
    {
      "parallel": {
        "tasks": [
          {
            "operation": {
              "operation-type": "bulk",
              "bulk-size": 100
            },
            "warmup-time-period": 120,
            "clients": 8,
            "target-throughput": 800
          }
        ]
      }
    }
  ]
}

Please check the docs: the bulk operation, corpora, and an example in the http_logs standard track (its indices + corpora sections and its index-append operation definition).

Thank you. After revisiting your suggestion, I modified the track.json as below. Sadly, it still ingests the data sequentially. Kindly help.

{
  "version": 2,
  "description": "Tutorial benchmark for Rally",
  "indices": [
    {
      "name": "customrecords",
      "body": "",
      "types": [ "docs" ]
    },
    {
      "name": "geocustom",
      "body": "",
      "types": [ "docs" ]
    }
  ],
  "corpora": [
    {
      "name": "rally-tutorial",
      "documents": [
        {
          "source-file": "samples.json",
          "document-count": 100000,
          "uncompressed-bytes": 3295600000,
          "target-index": "geocustom"
        },
        {
          "source-file": "samples.json",
          "document-count": 100000,
          "uncompressed-bytes": 3295600000,
          "target-index": "customrecords"
        }
      ]
    }
  ],
  "schedule": [
    {
      "operation": {
        "operation-type": "create-index"
      }
    },
    {
      "operation": {
        "operation-type": "cluster-health",
        "request-params": {
          "wait_for_status": "green"
        }
      }
    },
    {
      "operation": {
        "operation-type": "bulk",
        "bulk-size": 100
      },
      "warmup-time-period": 120,
      "clients": 8
    }
  ]
}

Sorry, I misunderstood your original question.

The strategy I mentioned earlier works for corpora that include action and meta-data lines specifying the target index. So if you can have one corpus file that contains ALL the docs that should be ingested, and specify "includes-action-and-meta-data": true in the corpora section, you can achieve what you want with just one corpus file.
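To sketch that single-corpus approach (the file name and document count below are placeholders, not a tested track): the corpus file itself interleaves action and meta-data lines with document lines, e.g. {"index": {"_index": "geocustom"}} on one line followed by the document source on the next, and the corpora section then declares the flag instead of a target-index:

```json
"corpora": [
  {
    "name": "rally-tutorial",
    "documents": [
      {
        "source-file": "alldocs.json",
        "document-count": 200000,
        "includes-action-and-meta-data": true
      }
    ]
  }
]
```

With this flag set, Rally takes the target index from each meta-data line, so a single bulk task ingests into both indices as the lines dictate.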

However, if you must work with separate corpus files, you can use the parallel approach you mentioned earlier; your example was just missing the two parallel bulk tasks, one per index. I came up with the following quick example, which does what you described. (By the way, I created the two corpus files from the respective corpora of the geonames and http_logs tracks using head -100000 ~/.rally/benchmarks/data/geonames/documents-2.json >geodocs.json and head -100000 ~/.rally/benchmarks/data/http_logs/documents-181998.json >logsdocs.json.)

{
  "version": 2,
  "description": "Tutorial benchmark for Rally",
  "indices": [
    {
      "name": "customlogs",
      "body": ""
    },
    {
      "name": "customgeo",
      "body": ""
    }
  ],
  "corpora": [
    {
      "name": "rally-tutorial",
      "documents": [
        {
          "source-file": "./geodocs.json",
          "document-count": 100000,
          "target-index": "customgeo"
        },
        {
          "source-file": "./logsdocs.json",
          "document-count": 100000,
          "target-index": "customlogs"
        }
      ]
    }
  ],
  "schedule": [
    {
      "operation": {
        "operation-type": "create-index"
      }
    },
    {
      "operation": {
        "operation-type": "cluster-health",
        "request-params": {
          "wait_for_status": "yellow"
        }
      }
    },
    {
      "parallel": {
        "tasks": [
          {
            "name": "bulk1",
            "operation": {
              "operation-type": "bulk",
              "indices": "customgeo",
              "bulk-size": 100
            },
            "warmup-time-period": 0,
            "clients": 1,
            "target-throughput": 800
          },
          {
            "name": "bulk2",
            "operation": {
              "operation-type": "bulk",
              "indices": "customlogs",
              "bulk-size": 100
            },
            "warmup-time-period": 0,
            "clients": 1,
            "target-throughput": 800
          }
        ]
      }
    }
  ]
}
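As a quick, Rally-independent sanity check, you can assert that the schedule's parallel element really contains one bulk task per index before kicking off a race. This is plain Python over a pared-down copy of the track above, not part of Rally itself:

```python
# Pared-down copy of the schedule from the track above: the "parallel"
# element must hold one bulk task per target index for Rally to run
# both ingests concurrently.
track = {
    "schedule": [
        {"operation": {"operation-type": "create-index"}},
        {
            "parallel": {
                "tasks": [
                    {"name": "bulk1",
                     "operation": {"operation-type": "bulk", "indices": "customgeo"}},
                    {"name": "bulk2",
                     "operation": {"operation-type": "bulk", "indices": "customlogs"}},
                ]
            }
        },
    ]
}

# Find the parallel element and list which index each bulk task targets.
parallel = next(t["parallel"] for t in track["schedule"] if "parallel" in t)
indices = [task["operation"]["indices"] for task in parallel["tasks"]]
print(indices)  # -> ['customgeo', 'customlogs']
```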

Thank you very much. It worked like a charm; you saved me from a lot of trouble.

You are welcome @Sameera_De_Silva, that's great to hear.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.