How to index routing id based document in rally

thennamca · August 1, 2017, 10:54am

Hi

I want to do a routing-ID-based bulk index in Rally. I know how to do it with a REST call, but I don't know how to pass the routing ID with Rally. Can you please guide me on how to do it?

This is how I did routing-based indexing with a REST call:

Mapping

{
"_routing": {
"required": true,
"path": "customerID"
}
}

REST call:

test/type1/_bulk?routing=2
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{"firstname":"first_name1","lastname":"last_name1","client":100,"customerID":2}
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "2" } }
{"firstname":"first_name2","lastname":"last_name2","client":100,"customerID":2}
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "3" } }
{"firstname":"first_name3","lastname":"last_name3","client":200,"customerID":2}
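For reference, here is a minimal Python sketch of that REST call using only the standard library. The function name `build_bulk_request` and the cluster address `http://localhost:9200` are hypothetical; the sketch only builds the request (routing passed as the `routing` URL parameter) and does not send it.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_bulk_request(base_url, index, doc_type, docs, routing):
    """Build (but do not send) a bulk request that passes routing as a URL parameter."""
    lines = []
    for doc_id, source in docs:
        # Action-and-meta-data line; the routing value comes from the query string.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type, "_id": doc_id}}))
        lines.append(json.dumps(source))
    # Bulk bodies are newline-delimited JSON and must end with a newline.
    body = ("\n".join(lines) + "\n").encode("utf-8")
    url = f"{base_url}/{index}/{doc_type}/_bulk?" + urlencode({"routing": routing})
    return Request(url, data=body, method="POST",
                   headers={"Content-Type": "application/x-ndjson"})


docs = [
    ("1", {"firstname": "first_name1", "lastname": "last_name1", "client": 100, "customerID": 2}),
    ("2", {"firstname": "first_name2", "lastname": "last_name2", "client": 100, "customerID": 2}),
    ("3", {"firstname": "first_name3", "lastname": "last_name3", "client": 200, "customerID": 2}),
]
# Hypothetical local cluster; pass req to urllib.request.urlopen() to actually send it.
req = build_bulk_request("http://localhost:9200", "test", "type1", docs, routing=2)
```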

How do I do the same in Rally?

Thanks
Thennarasu

danielmitterdorfer · August 1, 2017, 3:32pm

Hi @thennamca,

unfortunately, Rally does not support request parameters for the bulk API. However, you can generate the action and meta-data line yourself and tell Rally about it. If you generate this line yourself, you can add the _routing parameter there, e.g.:

{ "index" : { "_index": "test", "_type": "type1", "_id": "1", "_routing": 2 } }

So instead of:

{"firstname":"first_name1","lastname":"last_name1","client":100,"customerID":2}
{"firstname":"first_name2","lastname":"last_name2","client":100,"customerID":2}
{"firstname":"first_name3","lastname":"last_name3","client":200,"customerID":2}

your file with the source documents would look like this:

{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1", "_routing" : 2 } }
{"firstname":"first_name1","lastname":"last_name1","client":100,"customerID":2}
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "2", "_routing" : 2 } }
{"firstname":"first_name2","lastname":"last_name2","client":100,"customerID":2}
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "3", "_routing" : 2 } }
{"firstname":"first_name3","lastname":"last_name3","client":200,"customerID":2}
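If your documents currently sit in a plain file with one JSON document per line, a small script can generate that format for you. The following is a minimal Python sketch, not part of Rally; the function names are hypothetical, and it assumes the routing value should be read from a field of each document (here `customerID`, mirroring the `path` in the original mapping).

```python
import json


def add_action_lines(doc_lines, index, doc_type, routing_field):
    """Prefix every source document with an action-and-meta-data line that carries _routing."""
    out = []
    doc_id = 0
    for line in doc_lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        doc_id += 1
        # json.loads fails loudly on invalid JSON (e.g. typographic quotes).
        source = json.loads(line)
        # Assumption: the routing value is taken from a field of the document itself.
        meta = {"index": {"_index": index, "_type": doc_type,
                          "_id": str(doc_id), "_routing": source[routing_field]}}
        out.append(json.dumps(meta))
        out.append(json.dumps(source, separators=(",", ":")))
    return out


def convert_file(src_path, dst_path, **kwargs):
    """Hypothetical helper: read a plain documents file, write the 'sourcefile' variant."""
    with open(src_path, encoding="utf-8") as src:
        lines = add_action_lines(src, **kwargs)
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(lines) + "\n")
```

Usage would be something like `convert_file("plain.json", "documents.json", index="test", doc_type="type1", routing_field="customerID")`.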

As I've mentioned, you also need to tell Rally that your source data contains the action and meta-data line already. E.g.:

{
  "name": "index-append",
  "operation-type": "index",
  "action-and-meta-data": "sourcefile",
  "bulk-size": 5000
}

The property action-and-meta-data is set to sourcefile which tells Rally that your data already contain this line. Hence Rally will not generate it on the fly (also see the track reference in the docs).

Btw, Rally will not count the action-and-meta-data lines toward the bulk size or in any statistics. So a bulk size of 5,000 really means 5,000 documents (which is 10,000 lines in the source file: 5,000 documents plus 5,000 action-and-meta-data lines).
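The counting rule from the previous paragraph, written out as a tiny Python helper. This only restates the arithmetic; it is not Rally code, and the number of bulk requests assumes a single client reading the file sequentially.

```python
import math


def bulk_stats(num_docs, bulk_size):
    """Return (number of bulk requests, lines in the source file) for a 'sourcefile' corpus."""
    # bulk-size counts documents only, so a partial last bulk still needs one request.
    requests = math.ceil(num_docs / bulk_size)
    # Each document contributes one action-and-meta-data line plus one document line.
    lines = 2 * num_docs
    return requests, lines
```

For example, 5,000 documents with a bulk size of 5,000 means one bulk request and 10,000 lines in the file.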

I hope that helps.

Daniel

thennamca · August 2, 2017, 6:25am

Hi @danielmitterdorfer

I followed the steps above, but nothing gets indexed on my server. Could you please point out where I made a mistake?

These are the steps I followed:
track.json

{% import "rally.helpers" as rally with context %}

{
"short-description": "PI benchmark for Rally",
"description": "This test indexes",
"indices": [
{
"auto-managed": false,
"name": "test",
"types": [
{
"name": "type1",
"mapping": "mappings.json",
"documents": "documents.json.bz2",
"document-count": 8,
"compressed-bytes": 210,
"uncompressed-bytes": 661
}
]
}
],
"operations": [
{{ rally.collect(parts="operations/*.json") }}
],
"challenges": [
{{ rally.collect(parts="challenges/*.json") }}
]
}

Challenges/Default.json

{
"name": "query stats",
"description": "are append only. After that a couple of queries are run.",
"default": true,
"index-settings": {
"index.number_of_replicas": 0
},
"schedule": [
{
"operation": "index-append",
"warmup-time-period": 120,
"clients": 1
}
]
}

Operation/default.json

{
"name": "index-append",
"operation-type": "index",
"action-and-meta-data": "sourcefile",
"bulk-size": 5000
}

And my documents.json:
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" , "_routing" : 2} }
{"firstname":"first_name1","lastname":"last_name1","client":100,"customerID":2}
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "2" , "_routing" : 2} }
{"firstname":"first_name2","lastname":"last_name1","client":100,"customerID":2}
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "3" , "_routing" : 2 } }
{"firstname":"first_name3","lastname":"last_name1","client":100,"customerID":2}

This is the result I got from Rally:

| Lap | Metric | Operation | Value | Unit |
|----:|-------:|----------:|------:|-----:|
| All | Total Young Gen GC | | 0 | s |
| All | Total Old Gen GC | | 0 | s |
| All | Heap used for segments | | 203.815 | MB |
| All | Heap used for doc values | | 0.362514 | MB |
| All | Heap used for terms | | 197.345 | MB |
| All | Heap used for norms | | 0.019165 | MB |
| All | Heap used for stored fields | | 6.08783 | MB |
| All | Segment count | | 157 | |
| All | Min Throughput | index-append | | ops/s |
| All | Median Throughput | index-append | | ops/s |
| All | Max Throughput | index-append | | ops/s |
| All | error rate | index-append | 0 | % |

[WARNING] No throughput metrics available for [index-append]. Likely cause: The benchmark ended already during warmup.

Nothing gets indexed. If I set "auto-managed" to true, only the index is created, but the documents are not indexed.

danielmitterdorfer · August 2, 2017, 7:32am

Hi @thennamca,

in general this looks fine to me. In your case you should let Rally manage the index ("auto-managed": true). This means that the index gets created by Rally explicitly based on the mapping that you provide.

The problem is that you specified a warmup time-period of 2 minutes but your benchmark finishes instantly (it just needs to index 8 documents). Rally does not show warmup results in the report. Hence, for test purposes you should set "warmup-time-period": 0 so Rally does not do any warmup.

Daniel

thennamca · August 2, 2017, 1:13pm

@danielmitterdorfer

I tried "warmup-time-period": 0 but had no luck: the index and the mapping are created on my server, but the documents themselves are not indexed.

danielmitterdorfer · August 2, 2017, 1:34pm

Hi @thennamca,

I have just run your example track successfully locally. I noticed that your posts use left and right double quotation marks (U+201C and U+201D) instead of plain quotation marks (Unicode character U+0022). After I corrected that, I was able to run the track just fine.
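Typographic quotes are easy to miss by eye, so a quick check before running the track can save time. This is a minimal Python sketch with hypothetical function names: it reports every non-empty line that contains U+201C/U+201D or fails to parse as JSON, and offers a naive fix that swaps the typographic quotes for plain ones (it cannot repair other damage such as a missing quote).

```python
import json

# Typographic quotes that forum software and word processors like to substitute.
SMART_QUOTES = {"\u201c": '"', "\u201d": '"'}


def find_problems(lines):
    """Return (line_number, message) for every line that is not valid JSON."""
    problems = []
    for n, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        for ch in SMART_QUOTES:
            if ch in line:
                problems.append((n, f"contains U+{ord(ch):04X} instead of U+0022"))
        try:
            json.loads(line)
        except json.JSONDecodeError as e:
            problems.append((n, f"invalid JSON: {e.msg}"))
    return problems


def fix_quotes(line):
    """Replace left/right double quotation marks with plain ASCII quotes."""
    return line.translate(str.maketrans(SMART_QUOTES))
```

Running `find_problems(open("documents.json", encoding="utf-8"))` should return an empty list for a well-formed file.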

For completeness' sake, here are the files:

track.json:

{
  "short-description": "PI benchmark for Rally",
  "description": "This test indexes",
  "indices": [
    {
      "auto-managed": true,
      "name": "test",
      "types": [
        {
          "name": "type1",
          "mapping": "mappings.json",
          "documents": "documents.json.bz2",
          "document-count": 3,
          "compressed-bytes": 191,
          "uncompressed-bytes": 492
        }
      ]
    }
  ],
  "operations": [
    {
      "name": "index-append",
      "operation-type": "index",
      "action-and-meta-data": "sourcefile",
      "bulk-size": 5000
    }
  ],
  "challenges": [
    {
      "name": "query stats",
      "description": "are append only. After that a couple of queries are run.",
      "default": true,
      "index-settings": {
        "index.number_of_replicas": 0
      },
      "schedule": [
        {
          "operation": "index-append",
          "warmup-time-period": 0,
          "clients": 1
        }
      ]
    }
  ]
}

mappings.json:

{
  "my_type": {
    "_routing": {
      "required": true
    }
  }
}

documents.json:

{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" , "_routing" : 2} }
{"firstname":"first_name1","lastname":"last_name1","client":100,"customerID":2}
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "2" , "_routing" : 2} }
{"firstname":"first_name2","lastname":"last_name1","client":100,"customerID":2}
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "3" , "_routing" : 2 } }
{"firstname":"first_name3","lastname":"last_name1","client":100,"customerID":2}
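The track refers to documents.json.bz2 together with "document-count", "compressed-bytes" and "uncompressed-bytes", so the plain file has to be compressed and measured first. A minimal Python sketch of that step (the function name is hypothetical); it assumes the file alternates action-and-meta-data lines and document lines, so the document count is half the number of non-empty lines.

```python
import bz2
import os


def prepare_corpus(src_path):
    """Compress a 'sourcefile'-style documents file and return the numbers a track needs."""
    with open(src_path, "rb") as f:
        raw = f.read()
    dst_path = src_path + ".bz2"
    with bz2.open(dst_path, "wb") as f:
        f.write(raw)
    # Assumption: every document is preceded by exactly one action-and-meta-data line.
    non_empty = [line for line in raw.splitlines() if line.strip()]
    return {
        "documents": os.path.basename(dst_path),
        "document-count": len(non_empty) // 2,
        "compressed-bytes": os.path.getsize(dst_path),
        "uncompressed-bytes": len(raw),
    }
```

The returned values can then be copied into the type definition in track.json.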

When I run it, e.g. with esrally --distribution-version=5.5.0 --track=tmp --preserve-install=true (I called your track tmp on my machine; --preserve-install=true is just there to allow you to launch the cluster again after the benchmark and inspect its state), then I get:

|   Lap |                        Metric |    Operation |       Value |   Unit |
|------:|------------------------------:|-------------:|------------:|-------:|
|   All |                 Indexing time |              | 0.000633333 |    min |
|   All |                  Refresh time |              |  0.00103333 |    min |
|   All |              Median CPU usage |              |        1.95 |      % |
|   All |            Total Young Gen GC |              |           0 |      s |
|   All |              Total Old Gen GC |              |        0.02 |      s |
|   All |                    Index size |              |  7.1805e-06 |     GB |
|   All |               Totally written |              | 0.000415802 |     GB |
|   All |        Heap used for segments |              |  0.00269985 |     MB |
|   All |      Heap used for doc values |              |  8.7738e-05 |     MB |
|   All |           Heap used for terms |              |  0.00212955 |     MB |
|   All |           Heap used for norms |              | 0.000183105 |     MB |
|   All |          Heap used for points |              | 1.90735e-06 |     MB |
|   All |   Heap used for stored fields |              | 0.000297546 |     MB |
|   All |                 Segment count |              |           1 |        |
|   All |                Min Throughput | index-append |     24.7482 | docs/s |
|   All |             Median Throughput | index-append |     24.7482 | docs/s |
|   All |                Max Throughput | index-append |     24.7482 | docs/s |
|   All |      100th percentile latency | index-append |     120.738 |     ms |
|   All | 100th percentile service time | index-append |     120.738 |     ms |
|   All |                    error rate | index-append |           0 |      % |

If you still do not get any results, the full output of the race log file would be helpful.

Daniel

thennamca · August 2, 2017, 2:50pm

@danielmitterdorfer

Thank you very much for your help, it is working for me now.

My mistake was at the mapping file level.

If I use the mapping above, it works, but my original mapping file does not.
I have attached my document.json as well.

My mapping file is:

{
"top_off_contract_opportunity": {
"_routing": {
"required": true
},
"date_detection": false,
"properties": {
"payload": {
"properties": {
"pohSpend": {
"include_in_all": false,
"store": true,
"type": "double"
},
"contractPriceRange": {
"search_analyzer": "default_search",
"norms": {
"enabled": false
},
"analyzer": "default_index",
"store": true,
"type": "string",
"fields": {
"raw": {
"index": "not_analyzed",
"type": "string"
}
}
},
"facilityId": {
"search_analyzer": "default_search",
"norms": {
"enabled": false
},
"analyzer": "default_index",
"store": true,
"type": "string",
"fields": {
"raw": {
"index": "not_analyzed",
"type": "string"
}
}
},
"quantity": {
"include_in_all": false,
"store": true,
"type": "double"
},
"organizationName": {
"search_analyzer": "default_search",
"norms": {
"enabled": false
},
"analyzer": "default_index",
"store": true,
"type": "string",
"fields": {
"raw": {
"index": "not_analyzed",
"type": "string"
}
}
},
"contractSource": {
"search_analyzer": "default_search",
"norms": {
"enabled": false
},
"analyzer": "default_index",
"store": true,
"type": "string",
"fields": {
"raw": {
"index": "not_analyzed",
"type": "string"
}
}
},
"csdFlag": {
"type": "boolean"
},
"vendorName": {
"search_analyzer": "default_search",
"norms": {
"enabled": false
},
"analyzer": "default_index",
"store": true,
"type": "string",
"fields": {
"raw": {
"index": "not_analyzed",
"type": "string"
}
}
},
"xchgFlag": {
"type": "boolean"
},
"purchaseUom": {
"search_analyzer": "default_search",
"analyzer": "default_index",
"type": "string"
},
"savingsOpportunity": {
"include_in_all": false,
"store": true,
"type": "double"
},
"contractExpiration": {
"search_analyzer": "default_search",
"norms": {
"enabled": false
},
"analyzer": "default_index",
"store": true,
"type": "string",
"fields": {
"raw": {
"index": "not_analyzed",
"type": "string"
}
}
},
"contractOrganizations": {
"search_analyzer": "default_search",
"norms": {
"enabled": false
},
"analyzer": "default_index",
"store": true,
"type": "string",
"fields": {
"raw": {
"index": "not_analyzed",
"type": "string"
}
}
},
"eboId": {
"include_in_all": false,
"store": true,
"type": "integer"
},
"unspscClassDescription": {
"search_analyzer": "default_search",
"norms": {
"enabled": false
},
"analyzer": "default_index",
"store": true,
"type": "string",
"fields": {
"raw": {
"index": "not_analyzed",
"type": "string"
}
}
},
"poMonth": {
"format": "dateOptionalTime||yyy-MM-ddHH:mm:ss||yyyy-MM-dd||MM-dd-yyyy",
"store": true,
"type": "date"
},
"priceFrom": {
"include_in_all": false,
"store": true,
"type": "double"
},
"partDescription": {
"search_analyzer": "default_search",
"norms": {
"enabled": false
},
"analyzer": "default_index",
"store": true,
"type": "string",
"fields": {
"raw": {
"index": "not_analyzed",
"type": "string"
}
}
},
"unspscCode": {
"norms": {
"enabled": false
},
"analyzer": "keyword",
"store": true,
"type": "string"
},
"vendorPartNumber": {
"search_analyzer": "default_search",
"norms": {
"enabled": false
},
"analyzer": "default_index",
"store": true,
"type": "string",
"fields": {
"raw": {
"index": "not_analyzed",
"type": "string"
}
}
},
"buyerPrice": {
"include_in_all": false,
"store": true,
"type": "double"
},
"facilityName": {
"search_analyzer": "default_search",
"norms": {
"enabled": false
},
"analyzer": "default_index",
"store": true,
"type": "string",
"fields": {
"raw": {
"index": "not_analyzed",
"type": "string"
}
}
},
"priceTo": {
"include_in_all": false,
"store": true,
"type": "double"
},
"providerKey": {
"norms": {
"enabled": false
},
"analyzer": "keyword",
"store": true,
"type": "string"
}
}
}
}
}
}

Using the mapping file above does not work, but if I use

{
"top_off_contract_opportunity": {
"_routing": {
"required": true
}
}
}

it works fine.

Also, one small question: is _id mandatory, or will Elasticsearch generate it automatically?

danielmitterdorfer · August 3, 2017, 7:11am

Hi @thennamca,

well, that is nothing Rally-specific. Rally does not do any magic here:

1. It creates the index with the mapping you've defined.
2. It takes the file that you provide and calls the Elasticsearch bulk API.

I suggest that you create an index with your mapping and then use the bulk API via curl (or whatever you usually use). If that works, I see no reason why Rally should fail.
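When testing outside Rally, for example with `curl -H 'Content-Type: application/x-ndjson' -XPOST 'localhost:9200/_bulk' --data-binary @documents.json` against a local node (host and port are assumptions), keep in mind that the bulk API can answer with HTTP 200 even when individual documents fail. The top-level "errors" flag and each item's "error" entry tell you what went wrong. A minimal Python sketch (hypothetical function name) that extracts the failed items from a saved response:

```python
import json


def bulk_failures(response_body):
    """Return (id, status, error) for every failed item in an Elasticsearch bulk response."""
    resp = json.loads(response_body)
    # Shortcut: the top-level "errors" flag is false when every item succeeded.
    if not resp.get("errors"):
        return []
    failures = []
    for item in resp.get("items", []):
        # Each item is keyed by its action type, e.g. {"index": {...}}.
        (result,) = item.values()
        if "error" in result:
            failures.append((result.get("_id"), result.get("status"), result["error"]))
    return failures
```

The "error" object of a failed item usually contains a "type" and a "reason", which should point at the mapping problem.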

In that respect, the bulk API works just like the index API: if you populate _id, Elasticsearch will use it; if you don't specify it, Elasticsearch will generate one automatically. Actually, we prefer that you let Elasticsearch generate the id because we can apply more optimizations then (specifically: "Optimize indexing for the autogenerated ID append-only case" by s1monw, Pull Request #20211 in elastic/elasticsearch on GitHub).

So the two options are:

You provide an id in the _id property:

{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }

You let Elasticsearch auto-generate an id by not specifying _id at all (preferred option):

{ "index" : { "_index" : "test", "_type" : "type1" } }
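Both options combine with routing in the same way. A minimal Python sketch (hypothetical function name) that builds the action-and-meta-data line with `_routing` and only adds `_id` when one is given:

```python
import json


def action_line(index, doc_type, routing, doc_id=None):
    """Build a bulk action-and-meta-data line; omit _id to let Elasticsearch generate one."""
    meta = {"_index": index, "_type": doc_type, "_routing": routing}
    if doc_id is not None:
        meta["_id"] = doc_id
    return json.dumps({"index": meta})
```

`action_line("test", "type1", 2)` gives the preferred auto-generated-id form, and `action_line("test", "type1", 2, "1")` the explicit-id form.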

Daniel

thennamca · August 4, 2017, 5:47am

@danielmitterdorfer

Thanks a lot, it's working perfectly now.

danielmitterdorfer · August 4, 2017, 6:09am

Hi,

glad to hear that!

Daniel

system · September 1, 2017, 6:10am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Related topics

| Topic | Category / tags | Replies | Views | Activity |
|---|---|---:|---:|---|
| Defining IDs and routing keys for documents in custom workloads in Rally | Elasticsearch, rally | 4 | 725 | July 3, 2019 |
| How to Index duplicate documents with the different routing id | Elasticsearch | 2 | 329 | May 11, 2022 |
| Routing in search operation | Elasticsearch, rally | 4 | 632 | May 1, 2018 |
| _bulk request with routing parameter | Elasticsearch, reindex | 3 | 312 | December 7, 2021 |
| Is possible to extract _id doc with Rally and custom track? | Elasticsearch, rally | 6 | 421 | December 15, 2022 |