Rally data load fails with a 20% error rate every time

I'm running a data load against Elasticsearch with Rally, and every run ends the same way: 80% of the requests succeed and 20% fail.

```
esrally --track=contactdata --target-hosts=esdb01-cqs01.db.us-west-1a.stg1.ebs.ebcolo.com:9200 --track-params="bulk_size:2000" --pipeline=benchmark-only --client-options="use_ssl:true,verify_certs:false,basic_auth_user:'elastic',basic_auth_password:''"

    ____        ____
   / __ \____ _/ / /_  __
  / /_/ / __ `/ / / / / /
 / _, _/ /_/ / / / /_/ /
/_/ |_|\__,_/_/_/\__, /
                /____/

[INFO] You did not provide an explicit timeout in the client options. Assuming default of 10 seconds.
[INFO] Writing logs to /root/.rally/logs/rally_out_20180418T190653Z.log

************************************************************************
************** WARNING: A dark dungeon lies ahead of you  **************
************************************************************************

Rally does not have control over the configuration of the benchmarked
Elasticsearch cluster.

Be aware that results may be misleading due to problems with the setup.
Rally is also not able to gather lots of metrics at all (like CPU usage
of the benchmarked cluster) or may even produce misleading metrics (like
the index size).

************************************************************************
****** Use this pipeline only if you are aware of the tradeoffs.  ******
*************************** Watch your step! ***************************
************************************************************************

[INFO] Racing on track [contactdata], challenge [bulk-index-throughput-test] and car ['external'] with version [6.2.2].

[WARNING] flush_total_time is 9256 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
[WARNING] indexing_total_time is 3048348 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
[WARNING] merges_total_throttled_time is 212097 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
[WARNING] refresh_total_time is 3105643 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
[WARNING] merges_total_time is 6453799 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
Running delete-index                                                           [100% done]
Running create-index                                                           [100% done]
Running cluster-health                                                         [100% done]
Running index-append                                                           [100% done]

------------------------------------------------------
    _______             __   _____
   / ____(_)___  ____ _/ /  / ___/_________  ________
  / /_  / / __ \/ __ `/ /   \__ \/ ___/ __ \/ ___/ _ \
 / __/ / / / / / /_/ / /   ___/ / /__/ /_/ / /  /  __/
/_/   /_/_/ /_/\__,_/_/   /____/\___/\____/_/   \___/
------------------------------------------------------

|   Lap |                        Metric |         Task |    Value |   Unit |
|------:|------------------------------:|-------------:|---------:|-------:|
|   All |                 Indexing time |              |  50.8068 |    min |
|   All |        Indexing throttle time |              |        0 |    min |
|   All |                    Merge time |              |  107.565 |    min |
|   All |                  Refresh time |              |   51.764 |    min |
|   All |                    Flush time |              | 0.154267 |    min |
|   All |           Merge throttle time |              |  3.53495 |    min |
|   All |            Total Young Gen GC |              |    0.217 |      s |
|   All |              Total Old Gen GC |              |        0 |      s |
|   All |        Heap used for segments |              |  25.8756 |     MB |
|   All |      Heap used for doc values |              |  1.88224 |     MB |
|   All |           Heap used for terms |              |   21.436 |     MB |
|   All |           Heap used for norms |              | 0.234131 |     MB |
|   All |          Heap used for points |              | 0.908391 |     MB |
|   All |   Heap used for stored fields |              |  1.41486 |     MB |
|   All |                 Segment count |              |      395 |        |
|   All |                Min Throughput | index-append |    27.55 | docs/s |
|   All |             Median Throughput | index-append |    27.55 | docs/s |
|   All |                Max Throughput | index-append |    27.55 | docs/s |
|   All |       50th percentile latency | index-append |  247.041 |     ms |
|   All |      100th percentile latency | index-append |  251.092 |     ms |
|   All |  50th percentile service time | index-append |  247.041 |     ms |
|   All | 100th percentile service time | index-append |  251.092 |     ms |
|   All |                    error rate | index-append |       20 |      % |


--------------------------------
[INFO] SUCCESS (took 26 seconds)
```


Error in the Elasticsearch log:

```
[2018-04-18T17:23:51,885][DEBUG][o.e.a.b.TransportShardBulkAction] [contact_current][1] failed to execute bulk item (index) BulkShardRequest [[contact_current][1]] containing [13] requests
org.elasticsearch.index.mapper.MapperParsingException: failed to parse [joinField]
```

```
curl -u elastic: -XGET 'https://esdb01-cqs01.db.us-west-1a.stg1.ebs.ebcolo.com:9200/contact_current/_search/?pretty&pretty' -k
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 6,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "contact_current",
        "_type" : "contact_location",
        "_id" : "7",
        "_score" : 1.0,
        "_routing" : "1",
        "_source" : {
          "joinField" : {
            "name" : "location",
            "parent" : "1"
          },
          "city" : "Duryea",
          "gisPoint" : {
            "lat" : 83.5028147032,
            "lon" : -45.9428289424
          },
          "polygons" : [
            313,
            42,
            319
          ],
          "state" : "Maine",
          "organizationId" : 1,
          "gisLocation" : {
            "type" : "point",
            "coordinates" : [
              -45.9428289424,
              83.5028147032
            ]
          },
          "streetAddress" : "775 Howard Avenue",
          "roomNo" : 688,
          "locationType" : "Expected",
          "floorNo" : 79,
          "assetId" : 246242,
          "contactId" : 1,
          "expireDate" : [
            "2018-08-11T12:54:29Z"
          ],
          "locationSourceId" : 80908,
          "country" : "MEX",
          "postalCode" : 68390,
          "suite" : "79th Floor",
          "arriveDate" : [
            "2018-01-19T02:08:08Z"
          ],
          "type" : "location",
          "locationName" : "Duryea Office 79"
        }
      },
      {
        "_index" : "contact_current",
        "_type" : "contact_location",
        "_id" : "5",
        "_score" : 1.0,
        "_routing" : "1",
        "_source" : {
          "joinField" : {
            "name" : "location",
            "parent" : "1"
          },
          "city" : "Farmers",
          "gisPoint" : {
            "lat" : 3.7824604754,
            "lon" : 13.4260769842
          },
          "polygons" : [
            742,
            183,
            212
          ],
          "state" : "Minnesota",
          "organizationId" : 1,
          "gisLocation" : {
            "type" : "point",
            "coordinates" : [
              13.4260769842,
              3.7824604754
            ]
          },
          "streetAddress" : "689 Linden Boulevard",
          "roomNo" : 506,
          "locationType" : "LastKnown",
          "floorNo" : 91,
          "assetId" : 163109,
          "contactId" : 1,
          "expireDate" : [
            "2018-06-18T03:16:53Z"
          ],
          "locationSourceId" : 206661,
          "country" : "GBR",
          "postalCode" : 59204,
          "suite" : "91th Floor",
          "arriveDate" : [
            "2018-01-07T14:23:11Z"
          ],
          "type" : "location",
          "locationName" : "Farmers Office 91"
        }
      },
```

Hello @sarba,

The error `org.elasticsearch.index.mapper.MapperParsingException: failed to parse [joinField]` indicates either a problem in the mapping or a mismatch between some of the documents in your corpus/corpora and the defined index mapping.

I'd suggest first double-checking, with the bulk API, that the documents in your corpus can be indexed successfully against the defined mapping.

One small request: when you paste code or JSON, please use the preformatted text button, or wrap it in single backticks (inline code) or triple backticks (code blocks), to make it easier to read.
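Before sending anything to the cluster, part of that check can be done offline. The sketch below is a minimal example, not a replacement for a real bulk request against a test index. It assumes `new.json` consists of bulk action lines each followed by a source line (as `"includes-action-and-meta-data": true` declares) and that `mappings.json` uses the 6.x single-type layout posted later in the thread. It reads the join relations from the mapping and flags documents whose `joinField` names a relation that isn't in the mapping, or that are children missing a `parent` or a routing value. As far as I know, each of these makes Elasticsearch 6.x reject the join field. The function names `join_relations` and `check_corpus` are hypothetical.

```python
import json


def join_relations(mapping_body):
    """Map each join field to (parent names, child names).

    Assumes the 6.x layout of mappings.json: {"mappings": {<type>: {"properties": ...}}}.
    """
    result = {}
    for type_mapping in mapping_body.get("mappings", {}).values():
        for field, spec in type_mapping.get("properties", {}).items():
            if spec.get("type") != "join":
                continue
            parents, children = set(), set()
            for parent, kids in spec.get("relations", {}).items():
                parents.add(parent)
                # A relation value is either one child name or a list of names.
                children.update([kids] if isinstance(kids, str) else kids)
            result[field] = (parents, children)
    return result


def check_corpus(lines, relations):
    """Return (doc_number, problem) pairs for a bulk file of action/source line pairs."""
    lines = [line for line in lines if line.strip()]
    problems = []
    for doc_number, i in enumerate(range(0, len(lines) - 1, 2), start=1):
        action = json.loads(lines[i])
        source = json.loads(lines[i + 1])
        meta = next(iter(action.values()))  # {"index": {...}} -> {...}
        # Accept both spellings of the routing key in the bulk metadata.
        routing = meta.get("routing", meta.get("_routing"))
        for field, (parents, children) in relations.items():
            if field not in source:
                continue
            value = source[field]
            # A join value is a bare relation name or {"name": ..., "parent": ...}.
            if isinstance(value, str):
                name, parent = value, None
            elif isinstance(value, dict):
                name, parent = value.get("name"), value.get("parent")
            else:
                problems.append((doc_number, f"{field}: unexpected value {value!r}"))
                continue
            if name not in parents | children:
                problems.append((doc_number, f"{field}: '{name}' is not a relation in the mapping"))
            elif name in children:
                if parent is None:
                    problems.append((doc_number, f"{field}: child '{name}' has no parent"))
                if routing is None:
                    problems.append((doc_number, f"{field}: child '{name}' has no routing"))
    return problems
```

To run it, pass `json.load` of `mappings.json` to `join_relations`, then pass the lines of `new.json` together with the result to `check_corpus`. An empty list means no join-field problems were found by these particular checks.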

Dimitris

```
cat track.json
{# Define variables to use throughout the template #}
{# Maximum number of indexing threads to use #}
{#% set clients = 5 %#}
{# Number of primary shards and replicas to index into #}
{#% set shard_count = 1 %#}
{#% set replica_count = 0 %#}
{# Name of index #}
{#% set index_name = "contact_current" %#}
{#% set bulk_size = 500 %#}
{% import "rally.helpers" as rally with context %}
{
  "version": 1,
  "description": "Testing bulk-indexing",
  "indices": [
    {
      "name": "{{ index_name | default('contact_current') }}",
      "body": "mappings.json",
      "auto-managed": false,
      "types": ["contact_location"]
    }
  ],
  "corpora": [
    {
      "name": "contactdata",
      "includes-action-and-meta-data": true,
      "documents": [
        {
          "source-file": "new.json",
          "document-count": 4
        }
      ]
    }
  ],
  "challenges": [
    {{ rally.collect(parts="challenges/*.json") }}
  ]
}
```
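With `"includes-action-and-meta-data": true`, Rally reads `new.json` as bulk action lines, each followed by a document line. My understanding is that `document-count` should then be the number of documents (line pairs), not the number of lines. Check the Rally track reference for your version to confirm. The sketch below counts documents that way so the result can be compared with the declared `4`. It assumes every action line is an `index` or `create` entry, with no `delete` or `update` lines, and `count_documents` is a hypothetical helper.

```python
import json

# Assumption: the corpus only contains index/create actions, each followed by a source line.
BULK_ACTIONS = {"index", "create"}


def count_documents(lines, includes_action_and_meta_data=True):
    """Count the documents in a Rally corpus file.

    With includes-action-and-meta-data, each document is an action line followed
    by a source line, so two non-blank lines make one document.
    """
    lines = [line for line in lines if line.strip()]
    if not includes_action_and_meta_data:
        return len(lines)
    if len(lines) % 2:
        raise ValueError(f"{len(lines)} lines: expected action/source pairs")
    for number, line in enumerate(lines[0::2], start=1):
        action = json.loads(line)
        # Each action line should hold exactly one "index" or "create" entry.
        if len(action) != 1 or next(iter(action)) not in BULK_ACTIONS:
            raise ValueError(f"document {number}: not an index/create action line: {line.strip()}")
    return len(lines) // 2
```

Called on the lines of `new.json`, the result should equal `document-count`. A `ValueError` points at a missing line or at action and source lines that are out of order.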

```
cat mappings.json
{
  "settings": {
    "analysis": {
      "normalizer": {
        "lowercase_normalizer": {
          "type": "custom",
          "char_filter": [],
          "filter": ["lowercase"]
        }
      }
    },
    "index": {
      "number_of_shards": 5,
      "number_of_replicas": 2
    },
    "index.requests.cache.enable": true
  },
  "mappings": {
    "contact_location": {
      "dynamic": "false",
      "_routing": {
        "required": true
      },
      "properties": {
        "type": { "type": "keyword", "normalizer": "lowercase_normalizer" },
        "joinField": {
          "type": "join",
          "relations": {
            "contact": "location"
          }
        },
        "organizationId": { "type": "long" },
        "status": { "type": "keyword", "normalizer": "lowercase_normalizer" },
        "defaultSortField": { "type": "keyword", "normalizer": "lowercase_normalizer" },
        "firstName": { "type": "keyword", "normalizer": "lowercase_normalizer" },
        "lastName": { "type": "keyword", "normalizer": "lowercase_normalizer" },
        "middleInitial": { "type": "keyword", "normalizer": "lowercase_normalizer" },
        "suffix": { "type": "keyword", "normalizer": "lowercase_normalizer" },
        "externalId": { "type": "keyword", "normalizer": "lowercase_normalizer" },
        "recordTypeId": { "type": "long" },
        "language": { "type": "keyword", "normalizer": "lowercase_normalizer" },
        "country": { "type": "keyword", "normalizer": "lowercase_normalizer" },
        "groups": { "type": "long" },
        "contactAttributes": {
          "type": "nested",
          "properties": {
            "orgAttrId": { "type": "long" },
            "textValues": { "type": "keyword", "normalizer": "lowercase_normalizer" },
            "dateValues": { "type": "date" },
            "doubleValues": { "type": "double" }
          }
        },
        "paths": {
          "type": "nested",
          "properties": {
            "pathId": { "type": "long" },
            "value": { "type": "keyword", "normalizer": "lowercase_normalizer" },
            "countryCode": { "type": "keyword", "normalizer": "lowercase_normalizer" },
            "waitTime": { "type": "long" },
            "systemRequirement": { "type": "keyword", "normalizer": "lowercase_normalizer" },
            "phoneExt": { "type": "keyword", "normalizer": "lowercase_normalizer" }
          }
        },
        "topics": { "type": "long" },
        "userId": { "type": "long" },
        "individualAccountId": { "type": "long" },
        "registerEmail": { "type": "keyword", "normalizer": "lowercase_normalizer" },
        "registerDate": { "type": "date" },
        "registerType": { "type": "keyword", "normalizer": "lowercase_normalizer" },
        "inviteDate": { "type": "date" },
        "notifyPeriods": {
          "type": "nested",
          "properties": {
            "periodStart": { "type": "integer" },
            "periodEnd": { "type": "integer" }
          }
        },
        "timezoneId": { "type": "keyword", "normalizer": "lowercase_normalizer" },
        "ssoIdentity": { "type": "keyword", "normalizer": "lowercase_normalizer" },
        "travelArrangers": { "type": "long" },
        "smUserId": { "type": "keyword", "normalizer": "lowercase_normalizer" },
        "createdId": { "type": "long" },
        "createdDate": { "type": "date" },
        "createdName": { "type": "keyword", "normalizer": "lowercase_normalizer" },
        "createdProxyName": { "type": "keyword", "normalizer": "lowercase_normalizer" },
        "lastModifiedId": { "type": "long" },
        "lastModifiedDate": { "type": "date" },
        "lastModifiedName": { "type": "keyword", "normalizer": "lowercase_normalizer" },
        "lastModifiedProxyName": { "type": "keyword", "normalizer": "lowercase_normalizer" },
        "locationType": { "type": "keyword", "normalizer": "lowercase_normalizer" },
        "locationName": { "type": "keyword", "normalizer": "lowercase_normalizer" },
        "streetAddress": { "type": "keyword", "normalizer": "lowercase_normalizer" },
        "suite": { "type": "keyword", "normalizer": "lowercase_normalizer" },
        "city": { "type": "keyword", "normalizer": "lowercase_normalizer" },
        "state": { "type": "keyword", "normalizer": "lowercase_normalizer" },
        "postalCode": { "type": "keyword", "normalizer": "lowercase_normalizer" },
        "addressGeoCodingSource": { "type": "keyword", "normalizer": "lowercase_normalizer" },
        "assetId": { "type": "long" },
        "gisLocation": { "type": "geo_shape", "tree": "quadtree", "precision": "1m" },
        "polygons": { "type": "integer" },
        "gisPoint": { "type": "geo_point" },
        "arriveDate": { "type": "date" },
        "expireDate": { "type": "date" },
        "iata": { "type": "keyword", "normalizer": "lowercase_normalizer" },
        "locationSourceId": { "type": "long" },
        "contactId": { "type": "long" }
      }
    }
  }
}
```

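Under `"relations": { "contact": "location" }`, `contact` is the parent and `location` the child. A child document needs its `parent` set and must be routed to the parent's shard. This mapping also sets `_routing.required: true`, so every document needs a routing value. The sketch below produces bulk lines in that shape. It rests on hypothetical assumptions: the helper `bulk_lines` is not from the thread, a contact's `_id` is taken to be its `contactId`, and the routing goes under the `_routing` metadata key, which 6.x bulk requests accept. Newer Elasticsearch versions expect `routing` instead, so check the bulk API docs for your version.

```python
import json

INDEX = "contact_current"
DOC_TYPE = "contact_location"
# Assumption: 6.x bulk metadata accepts "_routing"; newer versions expect "routing".
ROUTING_KEY = "_routing"


def bulk_lines(contact_id, contact_source, locations):
    """Return NDJSON bulk lines for one contact (parent) and its locations (children).

    Every document is routed by the contact id so parent and children land on the
    same shard. `locations` is a list of (location_id, source_dict) tuples.
    """
    routing = str(contact_id)
    parent_meta = {"index": {"_index": INDEX, "_type": DOC_TYPE, "_id": routing, ROUTING_KEY: routing}}
    lines = [
        json.dumps(parent_meta),
        # Parent side of the join: only the relation name is needed.
        json.dumps(dict(contact_source, joinField={"name": "contact"})),
    ]
    for location_id, location_source in locations:
        meta = {"index": {"_index": INDEX, "_type": DOC_TYPE, "_id": str(location_id), ROUTING_KEY: routing}}
        lines.append(json.dumps(meta))
        # Child side of the join: relation name plus the parent's _id.
        lines.append(json.dumps(dict(location_source, joinField={"name": "location", "parent": routing})))
    return lines
```

Joining the returned lines with `"\n"` and adding a trailing newline gives a body for the `_bulk` endpoint. Comparing it line by line with `new.json` is one way to spot documents that deviate from this shape.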