Index different source files to different indices?

asp · March 15, 2018, 9:23am

Hi,

My first attempt at a track currently looks like this:

{
  "description": "my first benchmark",
  "indices": [
    {
      "name": "activemq_queue_count",
      "auto-managed": false,
      "body": "mappings/activemq_queue_count.json",
      "types": [
        {
          "name": "docs"
        }
      ]
    }
  ],
  "corpora": [
    {
      "name": "mytest",
      "documents": [
        {
          "source-file": "activemq_queue_count.json",
          "document-count": 63495,
          "uncompressed-bytes": 34230751
        }
      ]
    }
  ],
  "challenge": {
    "name": "index-only",
    "schedule": [
      {
        "operation": "delete-index"
      },
      {
        "operation": {
          "operation-type": "create-index",
          "settings": {
            "index.number_of_replicas": 0
          }
        }
      },
      {
        "operation": {
          "operation-type": "cluster-health",
          "request-params": {
            "wait_for_status": "yellow"
          }
        }
      },
      {
        "operation": {
          "operation-type": "bulk",
          "bulk-size": 5000
        },
        "warmup-time-period": 5,
        "clients": 8
      }
    ]
  }
}

We have many different logs. In ES 5.x we stored them as multiple types in a single index. With ES 6.x I would like to compare two layouts, especially with regard to index and query performance and storage needs: one index per log type, versus all logs in a single index, stored as one type (docs) and filterable by a custom field logType.

In the example above I only call bulk. Since there is only one index and only one document source, there is no problem. But how can I define which file should be bulk indexed into which index if I define more than one index and more than one file?

dliappis · March 16, 2018, 8:34am

Hello,

My understanding is that you'd like to compare the performance of two scenarios: one with multiple indices, each holding a single type, and one with a single index that uses a custom type field, as explained in the Elasticsearch docs here.

I have prepared a complete gist with an example track.json, mappings and document files, loosely based on your example above, and commands to test the two scenarios as separate challenges.

Below I have broken down my suggestions for the track file per scenario, with references in the gist code:

1. Many indices, one type, separate document files per index

For this case you'll define your indices separately in the indices array and specify your document files in the documents array of a corpus. Each element in the documents array can specify an index. In my gist this is accomplished here.

You can then have a dedicated challenge for this scenario. The bulk operation to index the documents to the corresponding indices references the corpus defined earlier, as shown in my gist section.
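
As a rough illustration of this layout (a sketch only, not the contents of the gist), the relevant parts of a track could look like the fragment below. It assumes that both indices are already declared in the indices array in the same way as activemq_queue_count above. The second index and file (example_second_log), the corpus and challenge names, and the second file's document-count and uncompressed-bytes are placeholders made up for the example. Each entry in documents names its index via target-index and its type via target-type, and the bulk operation selects the corpus through its corpora property.

```json
{
  "corpora": [
    {
      "name": "per-index-logs",
      "documents": [
        {
          "source-file": "activemq_queue_count.json",
          "document-count": 63495,
          "uncompressed-bytes": 34230751,
          "target-index": "activemq_queue_count",
          "target-type": "docs"
        },
        {
          "source-file": "example_second_log.json",
          "document-count": 1000,
          "uncompressed-bytes": 500000,
          "target-index": "example_second_log",
          "target-type": "docs"
        }
      ]
    }
  ],
  "challenges": [
    {
      "name": "many-indices",
      "schedule": [
        {
          "operation": {
            "operation-type": "bulk",
            "bulk-size": 5000,
            "corpora": ["per-index-logs"]
          },
          "warmup-time-period": 5,
          "clients": 8
        }
      ]
    }
  ]
}
```

The delete-index, create-index and cluster-health steps from the original schedule are left out here for brevity and would precede the bulk step as before.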

2. One index with custom type field

In this case it's sufficient to specify the custom type field in the combined document, pretty much mirroring the custom type field example in the Elasticsearch docs.

In the same track file this can be implemented with an additional corpus that targets the single index, plus a dedicated challenge. The bulk operation again references the corpus, which already defines the index to target.
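
A minimal sketch of the single-index variant follows, again as an illustration rather than the gist itself. The index name (example_all_logs), the combined file, the corpus and challenge names, and the document-count and uncompressed-bytes values are all hypothetical placeholders. The index is assumed to be declared in the indices array with a single type docs.

```json
{
  "corpora": [
    {
      "name": "combined-logs",
      "documents": [
        {
          "source-file": "example_all_logs.json",
          "document-count": 2000,
          "uncompressed-bytes": 1000000,
          "target-index": "example_all_logs",
          "target-type": "docs"
        }
      ]
    }
  ],
  "challenges": [
    {
      "name": "single-index",
      "schedule": [
        {
          "operation": {
            "operation-type": "bulk",
            "bulk-size": 5000,
            "corpora": ["combined-logs"]
          },
          "warmup-time-period": 5,
          "clients": 8
        }
      ]
    }
  ]
}
```

In this variant the distinction between log kinds lives in the data rather than in the track: every document line in the combined file would carry the custom field, for example "logType": "activemq_queue_count", and the mapping for the index would need to include that field.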

Dimitris

system · April 13, 2018, 8:34am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
