Partial update into large document

Bongsakorn · October 10, 2018, 10:51am

Hello,

I'm facing a performance problem. My application is a chat application.

I designed the index mapping with nested objects, as shown below.

{
  "conversation_id-v1": {
    "mappings": {
      "stream": {
        "properties": {
          "id": {
            "type": "keyword"
          },
          "message": {
            "type": "text",
            "fields": {
              "analyzerName": {
                "type": "text",
                "term_vector": "with_positions_offsets",
                "analyzer": "analyzerName"
              },
              "language": {
                "type": "langdetect",
                "analyzer": "_keyword",
                "languages": ["en", "ko", "ja"]
              }
            }
          },
          "comments": {
            "type": "nested",
            "properties": {
              "id": {
                "type": "keyword"
              },
              "message": {
                "type": "text",
                "fields": {
                  "analyzerName": {
                    "type": "text",
                    "term_vector": "with_positions_offsets",
                    "analyzer": "analyzerName"
                  },
                  "language": {
                    "type": "langdetect",
                    "analyzer": "_keyword",
                    "languages": ["en", "ko", "ja"]
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

(The actual mapping has many more fields.)

A document has around 4,000 nested objects. When I upsert data into a document, CPU usage peaks at 100%, and disk I/O spikes during writes as well. The input rate is around 1,000 per second.
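For illustration, here is a sketch of the kind of upsert described above. The post does not show the actual request, so this assumes that each new comment is appended to the nested `comments` array with a Painless script through the `_update` API; the helper name is hypothetical, and the script text is placed under `inline` as on 5.3 (later versions renamed it `source`). The request itself is small, but Elasticsearch still rewrites the whole conversation document, including every existing nested comment.

```python
import json


def build_append_comment_update(conversation_id, comment_id, message):
    """Hypothetical helper: build an _update body that appends one comment.

    Assumes ES 5.3 script syntax ("inline"); later versions use "source".
    """
    comment = {"id": comment_id, "message": message}
    return {
        "script": {
            "lang": "painless",
            "inline": (
                "if (ctx._source.comments == null) { ctx._source.comments = []; } "
                "ctx._source.comments.add(params.comment)"
            ),
            "params": {"comment": comment},
        },
        # Used as the new document only if the conversation does not exist yet.
        "upsert": {"id": conversation_id, "comments": [comment]},
    }


# Would be sent as: POST /conversation_id-v1/stream/<conversation_id>/_update
print(json.dumps(build_append_comment_update("conv-1", "c-1", "hello"), indent=2))
```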

How can I tune this to improve performance?

Hardware:
3 nodes on GCP, each with 2 vCPUs and 13 GB of RAM

dakrone · October 11, 2018, 8:28am

Hi Pongsakorn,

The issue with nested objects is that updating any of the objects (the top-level document or one of the nested documents) requires every nested document to be reindexed, because they must be indexed adjacent to each other. In your case, with 4,000 nested objects, every update is really 4,000 index operations.
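The arithmetic behind that estimate, as a rough sketch. It treats the stated input rate of about 1,000 per second as updates against documents with about 4,000 nested objects each, and it only counts Lucene documents written. It ignores other work, such as re-running language detection and storing term vectors for every message on each rewrite, which would add to the cost.

```python
NESTED_OBJECTS_PER_DOC = 4_000  # from the question
UPDATES_PER_SECOND = 1_000      # from the question


def lucene_docs_per_update(nested_objects):
    # Each nested object is stored as its own hidden Lucene document in the
    # same block as the root document, and any update rewrites the whole block.
    return nested_objects + 1


def lucene_docs_per_second(nested_objects, updates_per_second):
    return lucene_docs_per_update(nested_objects) * updates_per_second


print(lucene_docs_per_second(NESTED_OBJECTS_PER_DOC, UPDATES_PER_SECOND))  # 4001000
# If each comment were a separate document, adding one would write only that document:
print(lucene_docs_per_second(0, UPDATES_PER_SECOND))  # 1000
```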

One option to investigate is parent/child (on versions before 6.0) or the join field (on 6.0 and later): https://www.elastic.co/guide/en/elasticsearch/reference/6.4/parent-join.html. Either way, you decouple the top-level document from its children, so they can be updated independently.
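On 6.x, a join-field version of this index could look roughly like the sketch below. It is not a drop-in replacement: the index name and the names `relation`, `conversation`, and `comment` are hypothetical, the langdetect and analyzer subfields are left out for brevity, and the request bodies are built as Python dicts so they can be inspected before being sent. Each comment becomes its own document, so adding one no longer rewrites the others; children are routed by the parent id so they land on the parent's shard.

```python
import json

# Hypothetical ES 6.x layout: a single mapping type ("_doc") with a join field
# named "relation" linking "conversation" parents to "comment" children.
JOIN_MAPPING = {
    "mappings": {
        "_doc": {
            "properties": {
                "id": {"type": "keyword"},
                "message": {"type": "text"},
                "relation": {
                    "type": "join",
                    "relations": {"conversation": "comment"},
                },
            }
        }
    }
}


def parent_index_request(index, conversation_id, message):
    body = {"id": conversation_id, "message": message, "relation": {"name": "conversation"}}
    return f"/{index}/_doc/{conversation_id}", body


def child_index_request(index, comment_id, conversation_id, message):
    # Children must live on the parent's shard, so routing is mandatory.
    body = {
        "id": comment_id,
        "message": message,
        "relation": {"name": "comment", "parent": conversation_id},
    }
    return f"/{index}/_doc/{comment_id}?routing={conversation_id}", body


path, body = child_index_request("conversations", "c-1", "conv-1", "hello")
print("PUT", path)
print(json.dumps(body, indent=2))
```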

Bongsakorn · October 11, 2018, 10:04am

Thanks for your reply. I'm using ES 5.3.2. So there is no way to improve it without changing the design, right?

dakrone · October 11, 2018, 10:26am

Yes, it sounds like you'd need to change your mappings to improve it. You could always scale the cluster, but that treats the symptoms rather than the underlying cause.
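Since the cluster is on 5.3.2, where the join field does not exist yet, the equivalent change there is the `_parent` mapping. A rough sketch follows; the new index name `conversation_id-v2` and the child type name `comment` are hypothetical, and most fields are omitted. As far as I know, `_parent` can only point at a type that does not exist yet, so in practice this means creating a new index with both types and reindexing into it, rather than updating the current mapping in place.

```python
# Hypothetical ES 5.x index body: "stream" stays the parent type and each
# comment becomes a document of a separate "comment" type that points at it.
PARENT_CHILD_MAPPING_5X = {
    "mappings": {
        "stream": {
            "properties": {
                "id": {"type": "keyword"},
                "message": {"type": "text"},
            }
        },
        "comment": {
            "_parent": {"type": "stream"},
            "properties": {
                "id": {"type": "keyword"},
                "message": {"type": "text"},
            },
        },
    }
}


def child_index_path_5x(index, comment_id, stream_id):
    # On 5.x the parent id is passed as ?parent=, which also routes the
    # child to the parent's shard.
    return f"/{index}/comment/{comment_id}?parent={stream_id}"


print("PUT /conversation_id-v2")  # create the new index with the mapping above
print("PUT", child_index_path_5x("conversation_id-v2", "c-1", "s-1"))
```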

Bongsakorn · October 12, 2018, 8:05am

If I switch to the join datatype instead, will it help performance?

dakrone · October 14, 2018, 2:52am

If you switch to the parent/child system (or a join field in later versions of ES), it will help your reindexing performance for updates. There is a performance tradeoff at query time, however, so I recommend testing it with your data to see whether it works for you.
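The query-time side of that tradeoff shows up as soon as a search has to cross the relationship. Finding conversations by the text of their comments, for example, needs a `has_child` query, which Elasticsearch resolves as a join at search time and which is generally slower than the equivalent `nested` query. A sketch of both request bodies for comparison, assuming the `comments` nested path from the current mapping and a hypothetical `comment` child type in a parent/child layout:

```python
def nested_comment_search(text):
    # Current layout: comments live inside the conversation document.
    return {
        "query": {
            "nested": {
                "path": "comments",
                "query": {"match": {"comments.message": text}},
                "score_mode": "max",
            }
        }
    }


def has_child_comment_search(text):
    # Parent/child layout: comments are separate "comment" documents,
    # and the parent is matched through a query-time join.
    return {
        "query": {
            "has_child": {
                "type": "comment",
                "query": {"match": {"message": text}},
                "score_mode": "max",
            }
        }
    }
```

Running both against a copy of real data, with realistic query and update rates, is the most reliable way to see whether the faster updates outweigh the slower searches.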

system · November 11, 2018, 2:52am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Related topics

Topic	Category	Replies	Views	Activity
Elasticsearch Performance Issue	Elasticsearch	7	568	September 4, 2020
Partial updates of nested documents	Elasticsearch	3	1883	July 21, 2017
Improving nested document update performance	Elasticsearch	1	676	January 2, 2017
Very slow insertions in nested (unable reindexing)	Elasticsearch	2	450	November 26, 2018
Relational Data Modelling in Elasticsearch	Elasticsearch	2	452	July 6, 2017