Degrading Performance due to Nested Fields

Hello,

We operate an Elasticsearch 6.3.0 cluster where, unless explicitly required, we have no control over the incoming data or the queries executed against it. We combine JSON documents from multiple queues and let the business query them using a custom DSL. (This DSL abstraction is required for a couple of reasons.) Since incoming documents contain arrays of objects that the business queries, we apply a dynamic template that marks each object as nested. This fantastic combination lets us operate oblivious to the incoming data and queries, but it comes at the cost of Elasticsearch GC hiccups and performance that degrades over time due to excessive segmentation. Check out the following statistics I collected from a production cluster:

GET /_cat/indices?v
# health status ... pri rep docs.count docs.deleted store.size pri.store.size
# green  open   ... 8   2 4603047133   1897804178      1.7tb        567.3gb

GET /product/product/_count
# {
#   "count": 59692799,
#   "_shards": {
#     "total": 8,
#     "successful": 8,
#     "skipped": 0,
#     "failed": 0
#   }
# }

GET /product-20180822-113000-100/_settings
# {
#   "product-20180822-113000-100": {
#     "settings": {
#       "index": {
#         "mapping": {
#           "nested_fields": {
#             "limit": "10000"
#           },
#           "total_fields": {
#             "limit": "50000"
#           }
#         },
#         "refresh_interval": "30s",
#         "number_of_shards": "8",
#         "translog": {
#           "flush_threshold_size": "1gb",
#           "durability": "async"
#         },
#         "provided_name": "product-20180822-113000-100",
#         "creation_date": "1534930215357",
#         "number_of_replicas": "2",
#         "uuid": "x7_xDwLqQjeHt-hd15cGCw",
#         "version": {
#           "created": "6030099"
#         }
#       }
#     }
#   }
# }

GET /_cat/segments/product-20180822-113000-100?v
# See below for link to output.

GET /product/product/_mapping
# 1) "nested" keyword appears 61 times, including the one mentioned
#    in dynamic template.
# 2) Nesting level from root at max. goes up to 7, e.g.,
#    "rpg.object.productActions.actions.awardActions.awardAction.actionSubType".

(Click for GET /_cat/segments/product-20180822-113000-100?v output.)
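
To put the numbers above side by side, here is a small back-of-the-envelope calculation. It assumes that the `_cat/indices` row belongs to the index behind the `product` alias and that `docs.count` there counts Lucene documents including the hidden nested ones, while `_count` returns only root documents. The function names are just for this sketch.

```python
# Figures copied from the _cat/indices and _count outputs above.
DOCS_COUNT = 4_603_047_133    # docs.count (assumed to include nested docs)
DOCS_DELETED = 1_897_804_178  # docs.deleted
ROOT_COUNT = 59_692_799       # "count" returned by _count


def lucene_docs_per_root(docs_count, root_count):
    """Average number of live Lucene documents per root document."""
    return docs_count / root_count


def deleted_fraction(docs_count, docs_deleted):
    """Share of Lucene documents that are deleted but not yet merged away."""
    return docs_deleted / (docs_count + docs_deleted)


if __name__ == "__main__":
    ratio = lucene_docs_per_root(DOCS_COUNT, ROOT_COUNT)
    deleted = deleted_fraction(DOCS_COUNT, DOCS_DELETED)
    print(f"Lucene docs per root document: {ratio:.1f}")
    print(f"Deleted share of all Lucene docs: {deleted:.1%}")
```

Under those assumptions, each root document expands to roughly 77 Lucene documents, and a bit under 30% of all Lucene documents in the index are deleted ones still waiting to be merged away.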

Is it possible to configure Elasticsearch to perform more aggressive merging to alleviate the degrading performance? Are there any other tips, other than periodic reindexing? (I am open to sacrificing latency and throughput in return for a more stable service.)

Thanks in advance,
Cheers.
