Partial update into large document

Hello,

I'm facing a performance problem. My application is a chat application.

I designed the index mapping with nested objects, as shown below.

{
  "conversation_id-v1": {
    "mappings": {
      "stream": {
        "properties": {
          "id": {
            "type": "keyword"
          },
          "message": {
            "type": "text",
            "fields": {
              "analyzerName": {
                "type": "text",
                "term_vector": "with_positions_offsets",
                "analyzer": "analyzerName"
              },
              "language": {
                "type": "langdetect",
                "analyzer": "_keyword",
                "languages": ["en", "ko", "ja"]
              }
            }
          },
          "comments": {
            "type": "nested",
            "properties": {
              "id": {
                "type": "keyword"
              },
              "message": {
                "type": "text",
                "fields": {
                  "analyzerName": {
                    "type": "text",
                    "term_vector": "with_positions_offsets",
                    "analyzer": "analyzerName"
                  },
                  "language": {
                    "type": "langdetect",
                    "analyzer": "_keyword",
                    "languages": ["en", "ko", "ja"]
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
  • The actual mapping has many more fields.

Each document has around 4,000 nested objects. When I upsert data into a document, CPU peaks at 100% and disk write I/O peaks as well. The input rate is around 1,000/s.

How can I tune this to improve performance?

Hardware
3 nodes, each with 2 vCPUs and 13 GB, on GCP

Hi Pongsakorn,

The issue with nested objects is that an update to any of the objects (the top-level document or one of the nested documents) requires every nested document to be re-indexed, because they need to be indexed adjacent to each other. In your case, with 4,000 nested objects, every update is really 4,000 index operations.
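To put a rough number on that, here is a back-of-the-envelope sketch using the figures from the question. It assumes every incoming request updates a document that already holds about 4,000 nested objects, and that the root document counts as one extra Lucene document in the block; the function name is made up for illustration.

```python
def lucene_docs_per_second(updates_per_second: int, nested_per_document: int) -> int:
    """Estimate how many Lucene documents are rewritten per second."""
    # Each nested object is stored as its own hidden Lucene document, and the
    # root document adds one more. The whole block is rewritten on any update.
    docs_per_update = nested_per_document + 1
    return updates_per_second * docs_per_update


# Figures from the question: ~1,000 updates/s, ~4,000 nested objects per document.
estimate = lucene_docs_per_second(1000, 4000)
print(estimate)  # 4001000
```

Roughly four million document writes per second is a lot to ask of three 2-vCPU nodes, which would be consistent with the CPU and disk saturation you describe.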

One thing you could investigate is using parent/child (if on an earlier version of ES) or a join field (if on a later version of ES): https://www.elastic.co/guide/en/elasticsearch/reference/6.4/parent-join.html. With this approach you decouple the top-level document from its children, so they can be updated independently.
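As a minimal sketch of the parent/child variant for a pre-6.x cluster, the snippet below builds a mapping body where comments become a separate child type with a `_parent` pointing at `stream`. The index name `conversation_id-v2`, the child type name `comment`, and the helper functions are assumptions for illustration; the `message` sub-fields from the original mapping are left out for brevity.

```python
import json
from urllib.parse import quote

# Hypothetical index name for the redesigned mapping.
INDEX = "conversation_id-v2"


def parent_child_mapping() -> dict:
    """Mapping body with comments as a child type instead of a nested field."""
    base_properties = {
        "id": {"type": "keyword"},
        "message": {"type": "text"},  # analyzer/langdetect sub-fields omitted
    }
    return {
        "mappings": {
            "stream": {"properties": dict(base_properties)},
            "comment": {
                # Links each comment to a parent document of type "stream".
                "_parent": {"type": "stream"},
                "properties": dict(base_properties),
            },
        }
    }


def child_index_path(index: str, comment_id: str, stream_id: str) -> str:
    """Request path for indexing one comment under its parent stream."""
    # The parent ID is also used for routing, so the child lands on the
    # same shard as its parent.
    return f"/{index}/comment/{quote(comment_id, safe='')}?parent={quote(stream_id, safe='')}"


print(json.dumps(parent_child_mapping(), indent=2))
print(child_index_path(INDEX, "c1", "s1"))
```

With this layout, adding or editing one comment indexes a single small document rather than rewriting the whole conversation.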

Thanks for your reply. I'm using ES 5.3.2 :( So there is no way to improve it without changing the design, right?

Yes, it sounds like you'd need to change your mappings in order to improve it. You could always scale the cluster, but that targets the symptoms rather than the underlying cause.

If I change to the join datatype instead, will it help with performance?

If you change to the parent/child system (or a join field in later versions of ES), it will help with the re-indexing performance of your updates. There is a performance tradeoff at query time, however, so I recommend you try it with your data and see whether it works for you.
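To make the query-time difference concrete, here is a rough sketch of how the same search would be written against each design. The child type name `comment` and the helper names are assumptions for illustration; only the query bodies are built, nothing is sent to a cluster.

```python
import json


def nested_query(text: str) -> dict:
    """Search comments stored as nested objects inside the stream document."""
    return {
        "query": {
            "nested": {
                "path": "comments",
                "query": {"match": {"comments.message": text}},
            }
        }
    }


def has_child_query(text: str) -> dict:
    """Search streams that have at least one matching child comment."""
    return {
        "query": {
            "has_child": {
                "type": "comment",  # hypothetical child type name
                "query": {"match": {"message": text}},
            }
        }
    }


print(json.dumps(nested_query("hello")))
print(json.dumps(has_child_query("hello")))
```

The `has_child` form has to join parents and children at search time, which is where the extra query cost comes from, so it is worth benchmarking both against a realistic copy of your data.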
