Very slow insertions with nested fields (can reindexing be disabled?)

Hello, I have the following mapping:

{
    "my_index": {
        "aliases": {},
        "mappings": {
            "a_c": {
                "properties": {
                    "id": {
                        "type": "string"
                    },
                    "s": {
                        "type": "nested",
                        "properties": {
                            "c_r": {
                                "type": "nested",
                                "properties": {
                                    "c": {
                                        "type": "string"
                                    },
                                    "end": {
                                        "type": "long"
                                    },
                                    "id": {
                                        "type": "string"
                                    },
                                    "start": {
                                        "type": "long"
                                    }
                                }
                            },
                            "g": {
                                "type": "string"
                            },
                            "id": {
                                "type": "string"
                            }
                        }
                    }
                }
            }
        },
        "settings": {
            "index": {
                "creation_date": "1505476515647",
                "number_of_shards": "5",
                "number_of_replicas": "1",
                "uuid": "_0IiQCPrQ1i-kDP1481y8w",
                "version": {
                    "created": "2030099"
                }
            }
        },
        "warmers": {}
    }
}

I am trying to insert new s objects, but each s carries 600,000 c_r objects, and when I have a large number of s objects the system becomes very slow and dies. I know that when you index or update a document with nested structures, Elasticsearch creates multiple documents behind the scenes. On every update (even if it only adds one nested component to the document), all of these are reindexed, which means updating documents with a lot of nested components can require a lot of work.

Is there any way to disable this reindexing, or something similar? I need the nested fields, because I run queries that require the nested option...

Thanks!!!!

I'm afraid there's really no way around the performance hit. That's just how nested fields work; deeply nested fields will begin to show problems with indexing performance.
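To put rough numbers on it, here is a small sketch in Python. It relies on the fact that each nested object is stored as its own hidden Lucene document next to the root document. The function name and the sample values for the number of s objects are only illustrative; the 600,000 figure is taken from the question.

```python
def lucene_doc_count(num_s: int, c_r_per_s: int) -> int:
    """Lucene documents behind ONE a_c document:
    1 root + one per s object + one per c_r object."""
    return 1 + num_s + num_s * c_r_per_s


# Every update of the a_c document rewrites all of these at once.
for num_s in (1, 10, 100):
    print(num_s, "s objects ->", lucene_doc_count(num_s, 600_000), "Lucene docs")
```

So appending a single s to a document that already holds ten of them means writing several million Lucene documents again, not just the new ones.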

You could investigate using the Parent-Child model, which tends to be a bit more scalable. I would encourage you to try to denormalize as much as possible and only use the nested features where absolutely required. Remember, Elasticsearch is a search engine, not a relational database. The relational features are there to help with some data modeling problems, but it isn't a replacement for a true RDBMS.
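As a rough sketch of what a Parent-Child layout could look like for this data, assuming a 2.x index like the one above: the child type name "c_r", the extra "s_id" field, and the helper split_document are my own choices for illustration, not something your mapping already has. Each c_r becomes a child document of its a_c, so adding one only indexes that one small document.

```python
# Mappings for a hypothetical parent-child layout (2.x style).
# "a_c" stays the parent type; "c_r" is a new, assumed child type.
PARENT_CHILD_MAPPINGS = {
    "a_c": {"properties": {"id": {"type": "string"}}},
    "c_r": {
        "_parent": {"type": "a_c"},  # links each c_r document to an a_c
        "properties": {
            "id": {"type": "string"},
            "c": {"type": "string"},
            "start": {"type": "long"},
            "end": {"type": "long"},
            "s_id": {"type": "string"},  # assumed: copied from the enclosing s
            "g": {"type": "string"},     # assumed: copied from the enclosing s
        },
    },
}


def split_document(a_c_doc):
    """Split one nested a_c document into a parent body and a list of
    (parent_id, child_body) pairs, one pair per c_r."""
    parent = {"id": a_c_doc["id"]}
    children = []
    for s in a_c_doc.get("s", []):
        for c_r in s.get("c_r", []):
            child = dict(c_r)          # keeps id, c, start, end
            child["s_id"] = s["id"]
            child["g"] = s.get("g")
            children.append((a_c_doc["id"], child))
    return parent, children


sample = {
    "id": "a1",
    "s": [{"id": "s1", "g": "g1",
           "c_r": [{"id": "r1", "c": "x", "start": 1, "end": 5},
                   {"id": "r2", "c": "y", "start": 6, "end": 9}]}],
}
parent, children = split_document(sample)
print(parent, len(children))
```

Each child then has to be indexed with its parent id so that it lands on the same shard as the parent. Queries would use has_child / has_parent instead of nested, which has its own query-time cost, so it is worth measuring on your data.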

Often you can denormalize many of the fields to achieve the same thing as a deeply nested structure. It feels wasteful, but ES compresses these denormalized fields very well because they are "dense" and the terms are repeated across many documents.
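For example, a fully denormalized version of your data might be one flat document per c_r, with the a_c and s fields repeated on each one. This is only a sketch: the flat field names (a_c_id, s_id, s_g, c_r_id) and the function flatten are made up for illustration.

```python
def flatten(a_c_doc):
    """Yield one flat document per c_r, repeating the enclosing
    a_c and s fields on every one (field names are assumptions)."""
    for s in a_c_doc.get("s", []):
        for c_r in s.get("c_r", []):
            yield {
                "a_c_id": a_c_doc["id"],
                "s_id": s["id"],
                "s_g": s.get("g"),
                "c_r_id": c_r["id"],
                "c": c_r["c"],
                "start": c_r["start"],
                "end": c_r["end"],
            }


doc = {
    "id": "a1",
    "s": [{"id": "s1", "g": "g1",
           "c_r": [{"id": "r1", "c": "x", "start": 1, "end": 5},
                   {"id": "r2", "c": "y", "start": 6, "end": 9}]}],
}
for flat in flatten(doc):
    print(flat)
```

With this layout, inserting a new s just means indexing its own flat documents; nothing that is already in the index has to be rewritten. Conditions on c, start and end of the same c_r still match correctly because they all live on one document.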
