How to index documents with a routing ID in Rally

Hi

I want to do a routing-ID-based bulk index in Rally. I know how to do it with a REST call, but I don't know how to pass the routing ID with Rally. Can you please guide me?

This is how I use routing-based indexing in a REST call.

Mapping:

{
  "_routing": {
    "required": true,
    "path": "customerID"
  }
}

REST call:

test/type1/_bulk?routing=2
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{"firstname":"first_name1","lastname":"last_name1","client":100,"customerID":2}
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "2" } }
{"firstname":"first_name2","lastname":"last_name2","client":100,"customerID":2}
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "3" } }
{"firstname":"first_name3","lastname":"last_name3","client":200,"customerID":2}

How can I do the same in Rally?

Thanks
Thennarasu

Hi @thennamca,

Unfortunately, Rally does not support request parameters for the bulk API. However, you can generate the action and meta-data line yourself and tell Rally about it. If you generate this line yourself, you can add the _routing parameter there, e.g.:

{ "index" : { "_index": "test", "_type": "type1", "_id": "1", "_routing": 2 } }

So instead of:

{"firstname":"first_name1","lastname":"last_name1","client":100,"customerID":2}
{"firstname":"first_name2","lastname":"last_name2","client":100,"customerID":2}
{"firstname":"first_name3","lastname":"last_name3","client":200,"customerID":2}

your file with the source documents would look like this:

{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1", "_routing" : 2 } }
{"firstname":"first_name1","lastname":"last_name1","client":100,"customerID":2}
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "2", "_routing" : 2 } }
{"firstname":"first_name2","lastname":"last_name2","client":100,"customerID":2}
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "3", "_routing" : 2 } }
{"firstname":"first_name3","lastname":"last_name3","client":200,"customerID":2}
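For more than a handful of documents you would normally generate such a file with a script. Below is a minimal Python sketch, assuming that the routing value comes from each document's customerID field (as in the mapping in the question) and that IDs are simply numbered from 1. The function name bulk_source_lines is hypothetical and not part of Rally.

```python
import json

def bulk_source_lines(docs, index="test", doc_type="type1", with_ids=True):
    """Return alternating action-and-meta-data and source lines for a bulk file.

    Hypothetical helper: the routing value is taken from each document's
    customerID field, and IDs are numbered from 1 when with_ids is True.
    """
    lines = []
    for number, doc in enumerate(docs, start=1):
        meta = {"_index": index, "_type": doc_type}
        if with_ids:
            meta["_id"] = str(number)
        # _routing goes into the action-and-meta-data line, not into the URL.
        meta["_routing"] = doc["customerID"]
        lines.append(json.dumps({"index": meta}))
        lines.append(json.dumps(doc))
    return lines

if __name__ == "__main__":
    docs = [
        {"firstname": "first_name1", "lastname": "last_name1", "client": 100, "customerID": 2},
        {"firstname": "first_name2", "lastname": "last_name2", "client": 100, "customerID": 2},
        {"firstname": "first_name3", "lastname": "last_name3", "client": 200, "customerID": 2},
    ]
    # Redirect stdout to create the file, e.g. python make_bulk.py > documents.json
    print("\n".join(bulk_source_lines(docs)))
```

Because json.dumps always writes plain ASCII quotation marks, the generated lines cannot contain typographic quotes. Passing with_ids=False leaves out _id so that Elasticsearch generates the IDs itself.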

As I've mentioned, you also need to tell Rally that your source data contains the action and meta-data line already. E.g.:

{
  "name": "index-append",
  "operation-type": "index",
  "action-and-meta-data": "sourcefile",
  "bulk-size": 5000
}

Setting the property action-and-meta-data to sourcefile tells Rally that your data already contains this line, so Rally will not generate it on the fly (see also the track reference in the Rally docs).

Btw, Rally does not count the action and meta-data lines in the bulk size or in any statistics. So a bulk size of 5,000 really means 5,000 documents (which is 10,000 lines in the source file: 5,000 documents plus 5,000 action and meta-data lines).
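To illustrate the counting rule, here is a small Python sketch (not Rally's actual code; bulk_bodies is a hypothetical name) that cuts a source file containing action and meta-data lines into bulk request bodies. Because bulk-size counts documents and every document is preceded by exactly one action line, each body holds up to twice bulk-size lines.

```python
from itertools import islice

def bulk_bodies(lines, bulk_size):
    """Group action/source line pairs into bulk request bodies.

    bulk_size counts documents, so each body holds up to 2 * bulk_size lines.
    Lines are expected without their trailing newline characters.
    """
    it = iter(lines)
    while True:
        chunk = list(islice(it, 2 * bulk_size))
        if not chunk:
            return
        if len(chunk) % 2 != 0:
            raise ValueError("source must contain complete action/source line pairs")
        # The bulk API expects newline-delimited JSON ending with a newline.
        yield "\n".join(chunk) + "\n"
```

With a bulk size of 5,000, a source file of 15,000 lines (7,500 documents) yields two bodies: one with 10,000 lines and one with 5,000 lines. When reading lines from a file, strip the trailing newline first, e.g. with line.rstrip("\n").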

I hope that helps.

Daniel

Hi @danielmitterdorfer

I did it the way described above, but nothing gets indexed on my server. Please point out where I made a mistake.

These are the steps I followed.

track.json:

{% import "rally.helpers" as rally with context %}

{
  "short-description": "PI benchmark for Rally",
  "description": "This test indexes",
  "indices": [
    {
      "auto-managed": false,
      "name": "test",
      "types": [
        {
          "name": "type1",
          "mapping": "mappings.json",
          "documents": "documents.json.bz2",
          "document-count": 8,
          "compressed-bytes": 210,
          "uncompressed-bytes": 661
        }
      ]
    }
  ],
  "operations": [
    {{ rally.collect(parts="operations/*.json") }}
  ],
  "challenges": [
    {{ rally.collect(parts="challenges/*.json") }}
  ]
}

Challenges/Default.json:

{
  "name": "query stats",
  "description": "are append only. After that a couple of queries are run.",
  "default": true,
  "index-settings": {
    "index.number_of_replicas": 0
  },
  "schedule": [
    {
      "operation": "index-append",
      "warmup-time-period": 120,
      "clients": 1
    }
  ]
}

Operation/default.json:

{
  "name": "index-append",
  "operation-type": "index",
  "action-and-meta-data": "sourcefile",
  "bulk-size": 5000
}

And my documents.json:

{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" , "_routing" : 2} }
{"firstname":"first_name1","lastname":"last_name1","client":100,"customerID":2}
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "2" , "_routing" : 2} }
{"firstname":"first_name2","lastname":"last_name1","client":100,"customerID":2}
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "3" , "_routing" : 2 } }
{"firstname":"first_name3","lastname":"last_name1","client":100,"customerID":2}

This is the result I got from Rally:

| Lap | Metric | Operation | Value | Unit |
|----:|-------:|----------:|------:|-----:|
| All | Total Young Gen GC | | 0 | s |
| All | Total Old Gen GC | | 0 | s |
| All | Heap used for segments | | 203.815 | MB |
| All | Heap used for doc values | | 0.362514 | MB |
| All | Heap used for terms | | 197.345 | MB |
| All | Heap used for norms | | 0.019165 | MB |
| All | Heap used for stored fields | | 6.08783 | MB |
| All | Segment count | | 157 | |
| All | Min Throughput | index-append | | ops/s |
| All | Median Throughput | index-append | | ops/s |
| All | Max Throughput | index-append | | ops/s |
| All | error rate | index-append | 0 | % |

[WARNING] No throughput metrics available for [index-append]. Likely cause: The benchmark ended already during warmup.

Nothing was indexed. If I set "auto-managed" to true, only the index is created, but the documents are still not indexed.

Hi @thennamca,

In general, this looks fine to me. In your case, you should let Rally manage the index ("auto-managed": true). This means that Rally explicitly creates the index based on the mapping that you provide.

The problem is that you specified a warmup time period of 120 seconds (2 minutes), but your benchmark finishes almost instantly (it only needs to index 8 documents). Rally does not show warmup results in the report. Hence, for test purposes you should set "warmup-time-period": 0 so that Rally does not do any warmup.
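The effect can be seen with a deliberately simplified Python model (not Rally's implementation; measured_samples and the sample format are hypothetical): samples taken before the warmup period has elapsed are discarded, so a run that finishes in well under a second leaves nothing to report when warmup-time-period is 120.

```python
def measured_samples(samples, warmup_time_period):
    """Keep only the values sampled after the warmup period has elapsed.

    samples: list of (seconds_since_start, value) tuples (hypothetical format).
    warmup_time_period: warmup length in seconds, as in the challenge schedule.
    """
    return [value for elapsed, value in samples if elapsed >= warmup_time_period]

# One throughput sample taken 0.1 s into the run (illustrative numbers only).
samples = [(0.1, 25.0)]
print(measured_samples(samples, 120))  # prints []
print(measured_samples(samples, 0))    # prints [25.0]
```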

Daniel

@danielmitterdorfer

I tried with "warmup-time-period": 0 but had no luck: the index and its mapping are created on my server, but the documents themselves are not indexed.

Hi @thennamca,

I have just run your example track successfully on my machine. I noticed that your posts use left and right double quotation marks (Unicode characters U+201C and U+201D) instead of plain quotation marks (Unicode character U+0022). After I corrected that, I was able to run the track just fine.
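Such quotes are easy to miss when text has been copied through a forum or a word processor. Here is a minimal Python sketch (the function names are hypothetical) that replaces U+201C and U+201D with U+0022 and then reports every non-empty line that is still not valid JSON, which also catches problems such as an unterminated key.

```python
import json

# Typographic quotes that editors or forum software may substitute for the
# plain ASCII quotation mark (U+0022).
SMART_QUOTES = {"\u201c": '"', "\u201d": '"'}

def normalize_quotes(text):
    """Replace left/right double quotation marks with U+0022."""
    return text.translate(str.maketrans(SMART_QUOTES))

def invalid_json_lines(text):
    """Return the 1-based numbers of non-empty lines that are not valid JSON."""
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        try:
            json.loads(line)
        except json.JSONDecodeError:
            bad.append(number)
    return bad
```

For example, read documents.json, pass its text through normalize_quotes, check that invalid_json_lines returns an empty list for the result, and write the file back before compressing it.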

For completeness' sake, here are the files:

track.json:

{
  "short-description": "PI benchmark for Rally",
  "description": "This test indexes",
  "indices": [
    {
      "auto-managed": true,
      "name": "test",
      "types": [
        {
          "name": "type1",
          "mapping": "mappings.json",
          "documents": "documents.json.bz2",
          "document-count": 3,
          "compressed-bytes": 191,
          "uncompressed-bytes": 492
        }
      ]
    }
  ],
  "operations": [
    {
      "name": "index-append",
      "operation-type": "index",
      "action-and-meta-data": "sourcefile",
      "bulk-size": 5000
    }
  ],
  "challenges": [
    {
      "name": "query stats",
      "description": "are append only. After that a couple of queries are run.",
      "default": true,
      "index-settings": {
        "index.number_of_replicas": 0
      },
      "schedule": [
        {
          "operation": "index-append",
          "warmup-time-period": 0,
          "clients": 1
        }
      ]
    }
  ]
}

mappings.json:

{
  "my_type": {
    "_routing": {
      "required": true
    }
  }
}

documents.json:

{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" , "_routing" : 2} }
{"firstname":"first_name1","lastname":"last_name1","client":100,"customerID":2}
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "2" , "_routing" : 2} }
{"firstname":"first_name2","lastname":"last_name1","client":100,"customerID":2}
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "3" , "_routing" : 2 } }
{"firstname":"first_name3","lastname":"last_name1","client":100,"customerID":2}
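The document-count, compressed-bytes and uncompressed-bytes values in track.json should describe this file and its compressed form. Here is a minimal Python sketch for computing them, assuming (as in this thread) that every document is preceded by exactly one action and meta-data line, which does not count as a document. track_metadata and compress_corpus are hypothetical helper names.

```python
import bz2
import json

def track_metadata(raw, has_action_lines=True):
    """Compute the corpus numbers for track.json from the raw documents file.

    raw: contents of documents.json as bytes.
    Returns (compressed_bytes, metadata_dict).
    """
    lines = [line for line in raw.splitlines() if line.strip()]
    # With "action-and-meta-data": "sourcefile" every document is preceded by
    # one action line, so only half of the lines are documents.
    doc_count = len(lines) // 2 if has_action_lines else len(lines)
    compressed = bz2.compress(raw)
    return compressed, {
        "document-count": doc_count,
        "compressed-bytes": len(compressed),
        "uncompressed-bytes": len(raw),
    }

def compress_corpus(src="documents.json", dst="documents.json.bz2"):
    """Write the .bz2 file referenced in track.json and print its numbers."""
    with open(src, "rb") as f:
        raw = f.read()
    compressed, meta = track_metadata(raw)
    with open(dst, "wb") as f:
        f.write(compressed)
    print(json.dumps(meta, indent=2))
```

Taking compressed-bytes from the bytes that are actually written keeps the number consistent with the file on disk, whichever bzip2 implementation you would otherwise use.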

When I run it, e.g. with

esrally --distribution-version=5.5.0 --track=tmp --preserve-install=true

(I called your track tmp on my machine; --preserve-install=true is just there so that you can launch the cluster again after the benchmark and inspect its state), I get:

|   Lap |                        Metric |    Operation |       Value |   Unit |
|------:|------------------------------:|-------------:|------------:|-------:|
|   All |                 Indexing time |              | 0.000633333 |    min |
|   All |                  Refresh time |              |  0.00103333 |    min |
|   All |              Median CPU usage |              |        1.95 |      % |
|   All |            Total Young Gen GC |              |           0 |      s |
|   All |              Total Old Gen GC |              |        0.02 |      s |
|   All |                    Index size |              |  7.1805e-06 |     GB |
|   All |               Totally written |              | 0.000415802 |     GB |
|   All |        Heap used for segments |              |  0.00269985 |     MB |
|   All |      Heap used for doc values |              |  8.7738e-05 |     MB |
|   All |           Heap used for terms |              |  0.00212955 |     MB |
|   All |           Heap used for norms |              | 0.000183105 |     MB |
|   All |          Heap used for points |              | 1.90735e-06 |     MB |
|   All |   Heap used for stored fields |              | 0.000297546 |     MB |
|   All |                 Segment count |              |           1 |        |
|   All |                Min Throughput | index-append |     24.7482 | docs/s |
|   All |             Median Throughput | index-append |     24.7482 | docs/s |
|   All |                Max Throughput | index-append |     24.7482 | docs/s |
|   All |      100th percentile latency | index-append |     120.738 |     ms |
|   All | 100th percentile service time | index-append |     120.738 |     ms |
|   All |                    error rate | index-append |           0 |      % |

If you still do not get any results, the full output of the race log file would be helpful.

Daniel

@danielmitterdorfer

Thank you very much for your help, it is working for me now.

My mistake was at the mapping-file level.

If I use the mapping above it works, but my original mapping file does not. I have also attached my document.json.

My mapping file is:

{
  "top_off_contract_opportunity": {
    "_routing": {
      "required": true
    },
    "date_detection": false,
    "properties": {
      "payload": {
        "properties": {
          "pohSpend": {
            "include_in_all": false,
            "store": true,
            "type": "double"
          },
          "contractPriceRange": {
            "search_analyzer": "default_search",
            "norms": {
              "enabled": false
            },
            "analyzer": "default_index",
            "store": true,
            "type": "string",
            "fields": {
              "raw": {
                "index": "not_analyzed",
                "type": "string"
              }
            }
          },
          "facilityId": {
            "search_analyzer": "default_search",
            "norms": {
              "enabled": false
            },
            "analyzer": "default_index",
            "store": true,
            "type": "string",
            "fields": {
              "raw": {
                "index": "not_analyzed",
                "type": "string"
              }
            }
          },
          "quantity": {
            "include_in_all": false,
            "store": true,
            "type": "double"
          },
          "organizationName": {
            "search_analyzer": "default_search",
            "norms": {
              "enabled": false
            },
            "analyzer": "default_index",
            "store": true,
            "type": "string",
            "fields": {
              "raw": {
                "index": "not_analyzed",
                "type": "string"
              }
            }
          },
          "contractSource": {
            "search_analyzer": "default_search",
            "norms": {
              "enabled": false
            },
            "analyzer": "default_index",
            "store": true,
            "type": "string",
            "fields": {
              "raw": {
                "index": "not_analyzed",
                "type": "string"
              }
            }
          },
          "csdFlag": {
            "type": "boolean"
          },
          "vendorName": {
            "search_analyzer": "default_search",
            "norms": {
              "enabled": false
            },
            "analyzer": "default_index",
            "store": true,
            "type": "string",
            "fields": {
              "raw": {
                "index": "not_analyzed",
                "type": "string"
              }
            }
          },
          "xchgFlag": {
            "type": "boolean"
          },
          "purchaseUom": {
            "search_analyzer": "default_search",
            "analyzer": "default_index",
            "type": "string"
          },
          "savingsOpportunity": {
            "include_in_all": false,
            "store": true,
            "type": "double"
          },
          "contractExpiration": {
            "search_analyzer": "default_search",
            "norms": {
              "enabled": false
            },
            "analyzer": "default_index",
            "store": true,
            "type": "string",
            "fields": {
              "raw": {
                "index": "not_analyzed",
                "type": "string"
              }
            }
          },
          "contractOrganizations": {
            "search_analyzer": "default_search",
            "norms": {
              "enabled": false
            },
            "analyzer": "default_index",
            "store": true,
            "type": "string",
            "fields": {
              "raw": {
                "index": "not_analyzed",
                "type": "string"
              }
            }
          },
          "eboId": {
            "include_in_all": false,
            "store": true,
            "type": "integer"
          },
          "unspscClassDescription": {
            "search_analyzer": "default_search",
            "norms": {
              "enabled": false
            },
            "analyzer": "default_index",
            "store": true,
            "type": "string",
            "fields": {
              "raw": {
                "index": "not_analyzed",
                "type": "string"
              }
            }
          },
          "poMonth": {
            "format": "dateOptionalTime||yyy-MM-ddHH:mm:ss||yyyy-MM-dd||MM-dd-yyyy",
            "store": true,
            "type": "date"
          },
          "priceFrom": {
            "include_in_all": false,
            "store": true,
            "type": "double"
          },
          "partDescription": {
            "search_analyzer": "default_search",
            "norms": {
              "enabled": false
            },
            "analyzer": "default_index",
            "store": true,
            "type": "string",
            "fields": {
              "raw": {
                "index": "not_analyzed",
                "type": "string"
              }
            }
          },
          "unspscCode": {
            "norms": {
              "enabled": false
            },
            "analyzer": "keyword",
            "store": true,
            "type": "string"
          },
          "vendorPartNumber": {
            "search_analyzer": "default_search",
            "norms": {
              "enabled": false
            },
            "analyzer": "default_index",
            "store": true,
            "type": "string",
            "fields": {
              "raw": {
                "index": "not_analyzed",
                "type": "string"
              }
            }
          },
          "buyerPrice": {
            "include_in_all": false,
            "store": true,
            "type": "double"
          },
          "facilityName": {
            "search_analyzer": "default_search",
            "norms": {
              "enabled": false
            },
            "analyzer": "default_index",
            "store": true,
            "type": "string",
            "fields": {
              "raw": {
                "index": "not_analyzed",
                "type": "string"
              }
            }
          },
          "priceTo": {
            "include_in_all": false,
            "store": true,
            "type": "double"
          },
          "providerKey": {
            "norms": {
              "enabled": false
            },
            "analyzer": "keyword",
            "store": true,
            "type": "string"
          }
        }
      }
    }
  }
}

With the mapping file above it does not work, but if I use

{
  "top_off_contract_opportunity": {
    "_routing": {
      "required": true
    }
  }
}

it works fine.

Also, one small question: is _id mandatory, or will Elasticsearch generate it automatically?

Hi @thennamca,

Well, that is nothing Rally-specific. Rally does not do any magic here:

  1. It creates the index with the mapping you've defined
  2. It takes the file that you provide it and calls the Elasticsearch bulk API

I suggest that you create an index with your mapping and then use the bulk API via curl (or whatever tool you usually use). If that works, I see no reason why Rally should fail.
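If you would rather script this check than use curl, here is a minimal sketch using only Python's standard library. It assumes a local cluster at http://localhost:9200, the index name test and a mapping file keyed by type name (as in the Elasticsearch 5.x files in this thread); the helper names are hypothetical. It only builds the two requests: send each one with urllib.request.urlopen. An HTTP error on the first request means the index or its mapping was rejected; for the bulk request, check the "errors" flag in the JSON response to see whether any document was rejected.

```python
import json
import urllib.request

def create_index_request(base_url, index, mappings, settings=None):
    """Build PUT /<index> with type mappings keyed by type name (Elasticsearch 5.x)."""
    body = {"mappings": mappings}
    if settings:
        body["settings"] = settings
    return urllib.request.Request(
        f"{base_url}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

def bulk_request(base_url, ndjson):
    """Build POST /_bulk carrying the same lines Rally reads from the source file."""
    if not ndjson.endswith("\n"):
        # The bulk API requires the body to end with a newline.
        ndjson += "\n"
    return urllib.request.Request(
        f"{base_url}/_bulk",
        data=ndjson.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
```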

In that respect, the bulk API works just like the index API: if you populate _id, Elasticsearch will use it; if you don't specify it, Elasticsearch will automatically generate one. Actually, we prefer that you let Elasticsearch generate the ID because we can apply more optimizations then (specifically, see the pull request "Optimize indexing for the autogenerated ID append-only case" by s1monw, #20211 in elastic/elasticsearch on GitHub).

So the two options are:

You provide an id in the _id property:

{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }

You let Elasticsearch auto-generate an id by not specifying _id at all (preferred option):

{ "index" : { "_index" : "test", "_type" : "type1" } }

Daniel

@danielmitterdorfer

Thanks a lot, it's working perfectly now.

Hi,

Glad to hear that! :)

Daniel
