Index different source files to different indices?

Hi,

Currently, my first attempt at a track looks like this:

{
  "description": "my first benchmark",
  "indices": [
    {
      "name": "activemq_queue_count",
      "auto-managed": false,
      "body": "mappings/activemq_queue_count.json",
      "types": [
        {
          "name": "docs"
        }
      ]
    }
  ],
  "corpora": [
    {
      "name": "mytest",
      "documents": [
        {
          "source-file": "activemq_queue_count.json",
          "document-count": 63495,
          "uncompressed-bytes": 34230751
        }
      ]
    }
  ],
  "challenge": {
    "name": "index-only",
    "schedule": [
      {
        "operation": "delete-index"
      },
      {
        "operation": {
          "operation-type": "create-index",
          "settings": {
            "index.number_of_replicas": 0
          }
        }
      },
      {
        "operation": {
          "operation-type": "cluster-health",
          "request-params": {
            "wait_for_status": "yellow"
          }
        }
      },
      {
        "operation": {
          "operation-type": "bulk",
          "bulk-size": 5000
        },
        "warmup-time-period": 5,
        "clients": 8
      }
    ]
  }
}

We have many different logs. Coming from ES 5.x, we had multiple types in a single index. Now, with ES 6.x, I would like to test the differences (especially index / query performance, storage needs, etc.) between two layouts: one type per index, versus all log types in one index, stored as a single type (docs) and filterable by a custom field, logType.

In the example above I only call bulk. Since there is only one index and only one document source, this works without problems. But how can I define which file should be bulk-indexed into which index if I define more than one index and more than one file?

Hello,

My understanding is that you'd like to compare the performance of two scenarios: one with multiple indices, each holding a single type, and one with a single index that uses a custom type field, as explained in the Elasticsearch docs here.

I have prepared a complete gist with an example track.json, mappings and document files, loosely based on your example above, and commands to test the two scenarios as separate challenges.

Below I have broken down my suggestions for the track file per scenario, with references in the gist code:

1. Many indices, one type, separate document files per index

For this case, define your indices separately in the indices array and list your document files in the documents array of a corpus. Each element in the documents array can specify the index it targets. In my gist this is accomplished here.

You can then have a dedicated challenge for this scenario. The bulk operation that indexes the documents into the corresponding indices references the corpus defined earlier, as shown in this section of my gist.
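
A minimal sketch of how this could look in the track file, assuming Rally's target-index and target-type document properties and the corpora property of the bulk operation. The second index (hypothetical_other_log), its mapping and source file, its document count, and the corpus and challenge names are placeholders for illustration and are not taken from the gist.

```json
{
  "description": "sketch: one index per log type; hypothetical_other_log and its document count are placeholders",
  "indices": [
    {
      "name": "activemq_queue_count",
      "auto-managed": false,
      "body": "mappings/activemq_queue_count.json",
      "types": [{ "name": "docs" }]
    },
    {
      "name": "hypothetical_other_log",
      "auto-managed": false,
      "body": "mappings/hypothetical_other_log.json",
      "types": [{ "name": "docs" }]
    }
  ],
  "corpora": [
    {
      "name": "per-index-logs",
      "documents": [
        {
          "source-file": "activemq_queue_count.json",
          "document-count": 63495,
          "uncompressed-bytes": 34230751,
          "target-index": "activemq_queue_count",
          "target-type": "docs"
        },
        {
          "source-file": "hypothetical_other_log.json",
          "document-count": 1000,
          "target-index": "hypothetical_other_log",
          "target-type": "docs"
        }
      ]
    }
  ],
  "challenges": [
    {
      "name": "index-per-log-type",
      "default": true,
      "schedule": [
        {
          "operation": {
            "operation-type": "bulk",
            "bulk-size": 5000,
            "corpora": ["per-index-logs"]
          },
          "warmup-time-period": 5,
          "clients": 8
        }
      ]
    }
  ]
}
```

The delete-index, create-index and cluster-health steps from the original schedule are left out here for brevity; they would precede the bulk operation as before.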

2. One index with custom type field

In this case it's sufficient to include the custom type field in each document of the combined file, pretty much mirroring the custom type field example in the Elasticsearch docs.
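
As an illustration, two lines of such a combined document file might look as follows (one JSON document per line). Only the logType field name comes from the question above; the other fields and all values are hypothetical.

```json
{"logType": "activemq_queue_count", "queue": "hypothetical.queue", "count": 42}
{"logType": "hypothetical_other_log", "message": "hypothetical log message"}
```

Filtering by log type at query time would then presumably be a term query on logType, assuming the field is mapped as keyword.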

In the same track file this can be implemented with an additional corpus that targets the single index, plus a dedicated challenge; the bulk operation again references that corpus, for which we have already defined the target index.
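
A rough sketch of the entries for this scenario, to be merged into the indices, corpora and challenges arrays of the same track file. The index name all_logs, its mapping and source file, the document count, and the corpus and challenge names are hypothetical; target-index, target-type and the bulk operation's corpora property are assumed to behave as described in the Rally track reference.

```json
{
  "indices": [
    {
      "name": "all_logs",
      "auto-managed": false,
      "body": "mappings/all_logs.json",
      "types": [{ "name": "docs" }]
    }
  ],
  "corpora": [
    {
      "name": "combined-logs",
      "documents": [
        {
          "source-file": "all_logs.json",
          "document-count": 1000,
          "target-index": "all_logs",
          "target-type": "docs"
        }
      ]
    }
  ],
  "challenges": [
    {
      "name": "single-index-custom-type",
      "schedule": [
        {
          "operation": {
            "operation-type": "bulk",
            "bulk-size": 5000,
            "corpora": ["combined-logs"]
          },
          "warmup-time-period": 5,
          "clients": 8
        }
      ]
    }
  ]
}
```

With one challenge per scenario in the same file, each can be run on its own by selecting it with Rally's --challenge option.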

Dimitris
