ElasticSearch - nested query in batches?

Hello! I have a problem when I run a query. The mapping is very simple:

{
    "index": {
        "aliases": {},
        "mappings": {
            "level1": {
                "properties": {
                    "id": {
                        "type": "string"
                    },
                    "level2": {
                        "type": "nested",
                        "properties": {
                            "level3": {
                                "type": "nested",
                                "properties": {
                                    "value1": {
                                        "type": "string"
                                    },
                                    "value2": {
                                        "type": "long"
                                    },
                                    "id": {
                                        "type": "string"
                                    },
                                    "value3": {
                                        "type": "long"
                                    }
                                }
                            },
                            "id": {
                                "type": "string"
                            }
                        }
                    }
                }
            }
        },
        "settings": {
            "index": {
                "creation_date": "1505476515647",
                "number_of_shards": "5",
                "number_of_replicas": "1",
                "uuid": "_0IiQCPrQ1i-kDP1481y8w",
                "version": {
                    "created": "2030099"
                }
            }
        },
        "warmers": {}
    }
}

And the query is:

{"query": {"terms": {"_id": [ "value51" ] }}}

When I run the query in Python, I receive data with this structure:

_source (dict)
  level1 (list)
     level2 (list)
        data1 (dict)
              id
              value1
              value2
              value3
        data2 (dict)
        data3 (dict)
        ...
        data65000 (dict)

The problem is that 65,000 entries are too many and I run out of memory. I would like to know whether _search, or Elasticsearch in general, has some way of returning that information (data1, data2, data3, ...) in batches, or whether there is some way to write the query so that the machine does not run out of memory. Any ideas?

Thank you. :slight_smile:

So you have one document in Elasticsearch with 65,000 nested sub-documents, and when this one document is returned to Python you run out of memory.

Even if Elasticsearch could be configured to return the document in chunks, Python would still run out of memory once it had loaded this one document.

Solution one: model your data differently and flatten out this nested document.
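To illustrate, here is a minimal re-indexing sketch in Python (assuming the elasticsearch-py client with its bulk and scan helpers, run once on a machine that can hold the document; the target index "index_flat", the type "entry", and the parent_id/level2_id fields are hypothetical names, while the nested field paths follow the mapping above):

from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

es = Elasticsearch()

# Read the nested document once, then emit one flat document per level3 entry.
doc = es.get(index="index", doc_type="level1", id="value51")["_source"]

def flat_docs():
    for lvl2 in doc["level2"]:
        for lvl3 in lvl2["level3"]:
            source = dict(lvl3)               # value1, value2, id, value3
            source["parent_id"] = doc["id"]   # repeat the upper-level ids
            source["level2_id"] = lvl2["id"]
            yield {"_index": "index_flat", "_type": "entry", "_source": source}

bulk(es, flat_docs())

Once the data is flat, the scan helper streams the entries back in batches instead of as one huge document:

from elasticsearch.helpers import scan

for hit in scan(es, index="index_flat",
                query={"query": {"term": {"parent_id": "value51"}}}):
    handle(hit["_source"])  # hypothetical per-entry handler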

Solution two: get more memory :grinning:.

Thanks! Solution two is impossible for now. Would solution one be easy to do?

Are you loading the data using Logstash? If so, you could just use the split filter, and you would get one document for each entry in the level2 list, with all the upper-level fields repeated.
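Roughly like this, for example (a sketch of a Logstash filter block, assuming each incoming event carries the level2 array as a field):

filter {
  split {
    field => "[level2]"
  }
}

Each resulting event then holds a single level2 element alongside the repeated top-level fields.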

No, I don't use Logstash. I am doing:

# This fetches the whole document in a single call.
get_query = self.get(id=id, index=self.default_index, doc_type=self.default_doc_type)

That is the get from the Elasticsearch Python library, and I receive all the data at once.

I don't understand how to do that. Can you give me an example? Thanks!
