Elasticsearch - nested query in batches?


(magorbe) #1

Hello! I have a problem when I run a query. The mapping is very simple:

{
    "index": {
        "aliases": {},
        "mappings": {
            "level1": {
                "properties": {
                    "id": {
                        "type": "string"
                    },
                    "level2": {
                        "type": "nested",
                        "properties": {
                            "level3": {
                                "type": "nested",
                                "properties": {
                                    "value1": {
                                        "type": "string"
                                    },
                                    "value2": {
                                        "type": "long"
                                    },
                                    "id": {
                                        "type": "string"
                                    },
                                    "value3": {
                                        "type": "long"
                                    }
                                }
                            },
                            "id": {
                                "type": "string"
                            }
                        }
                    }
                }
            }
        },
        "settings": {
            "index": {
                "creation_date": "1505476515647",
                "number_of_shards": "5",
                "number_of_replicas": "1",
                "uuid": "_0IiQCPrQ1i-kDP1481y8w",
                "version": {
                    "created": "2030099"
                }
            }
        },
        "warmers": {}
    }
}

And the query is:

{"query": {"terms": {"_id": [ "value51" ] }}}

When I do the query in Python I receive data with this structure:

_source (dict)
  level1 (list)
     level2 (list)
        data1 (dict)
              id
              value1
              value2
              value3
        data2 (dict)
        data3 (dict)
        ...
        data65000 (dict)

The problem is that 65,000 entries are too many and I run out of memory. I would like to know whether _search, or Elasticsearch in general, has some way of returning that information (data1, data2, data3, ...) in batches, or whether there is some way to write the query so that I do not run out of memory on the machine. Any ideas?

Thank you. :slight_smile:


(swarmee.net) #2

So you have one document in Elasticsearch with 65,000 nested sub-documents, and when this one document is returned to Python you run out of memory.

Even if Elasticsearch could be configured to return part of the document in chunks, Python would still run out of memory loading this one document.

Solution one: model your data differently and flatten out this nested document.

Solution two: get more memory :grinning:.


(magorbe) #3

Thanks! Solution two is impossible for now. Would solution one be easy to do?


(swarmee.net) #4

Are you loading the data using Logstash? If so, you could just use the split filter and you would get one document for each entry in the level2 list, with all the upper-level fields repeated.
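
If you are not loading through Logstash, the same split can be done client-side in Python. A minimal sketch, assuming a hypothetical target index "index_flat" and the mapping above (the field names come from your mapping; the index and doc-type names are assumptions):

from elasticsearch import Elasticsearch, helpers

es = Elasticsearch(["localhost:9200"])

# Fetch the big parent document once (this still needs enough memory
# for one full document; the point is that queries afterwards won't).
doc = es.get(index="index", doc_type="level1", id="value51")
source = doc["_source"]

def flat_docs():
    # Emit one flat document per level2 entry, repeating the parent id.
    for entry in source["level2"]:
        yield {
            "_index": "index_flat",      # assumed target index
            "_type": "level2_flat",      # assumed doc type
            "parent_id": source.get("id"),
            "level2_id": entry.get("id"),
            "level3": entry.get("level3", []),
        }

# helpers.bulk consumes the generator lazily, so the flat documents are
# indexed in batches instead of being built up in one big list in memory.
helpers.bulk(es, flat_docs())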


(magorbe) #5

No, I don't use Logstash. I am doing:

get_query = self.get(id=id, index=self.default_index, doc_type=self.default_doc_type)

With the get method of the Elasticsearch library, and I receive all the data at once.


(magorbe) #6

I don't understand how to do that, can you give me an example? Thanks!!!!


(system) #7

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.