Elasticsearch scroll


#1

Hi All,

What is the best approach for extracting thousands of records from Elasticsearch? Also, is it possible to run 1000 concurrent requests using the scroll API to extract data from an Elasticsearch index?

Thank you,


(Christian Dahlqvist) #2

How large is your cluster? How much data are you looking to extract? How many indices and shards is this spread across?


#3

Hi Christian,

We are using AWS ES on m4.large instances with 8GB of memory and 300GB of EBS storage. We are planning to extract 200 thousand records. We also have more than 1000 indices, with 5 shards per index.

Thank you,


(Christian Dahlqvist) #4

How many nodes in the cluster? Just 1?


#5

We have 7 nodes in total: 3 master nodes and 4 data nodes.

Thank you,


(Christian Dahlqvist) #6

The first thing I would like to point out is that you have far too many indices and shards for a cluster of that size. This can be very inefficient. I recommend reading this blog post about shards, as it provides some practical guidelines.
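
To put rough numbers on this, here is a back-of-the-envelope calculation using the figures from this thread. Two inputs are assumptions rather than facts about this cluster: that half of each node's RAM is given to the JVM heap, and a commonly cited rule of thumb of at most about 20 shards per GB of heap. Replicas are not counted because the replica setting was not mentioned.

```python
# Figures taken from the thread
indices = 1000            # "1000 plus indices", so this is a lower bound
shards_per_index = 5
data_nodes = 4
ram_gb_per_node = 8       # m4.large

# Assumptions (not stated in the thread)
heap_gb_per_node = ram_gb_per_node // 2   # assumed: half of RAM goes to the JVM heap
shards_per_gb_heap = 20                   # assumed: rule-of-thumb ceiling

primary_shards = indices * shards_per_index
shards_per_node = primary_shards // data_nodes
suggested_max_per_node = heap_gb_per_node * shards_per_gb_heap

print("primary shards in cluster:", primary_shards)
print("primary shards per data node:", shards_per_node)
print("rule-of-thumb budget per node:", suggested_max_per_node)
```

Under those assumptions each data node holds well over ten times the suggested number of shards, before replicas are even considered.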

Given the relatively low amount of heap available, I would recommend running only a few scroll queries at a time so you can determine how much the cluster can handle. I would not be surprised if you are already suffering from heap pressure given the number of shards in the cluster. Performing 1000 requests in parallel would likely make it fall over.
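
As an illustration of "a few scroll queries at a time", here is a minimal sketch that scrolls through each index with a small, fixed number of workers. It assumes a 7.x-style official Python client object exposing `search`, `scroll` and `clear_scroll`; the helper names `scroll_index` and `extract`, the page size, the keep-alive value and the `match_all` query are all illustrative choices, not something from this thread.

```python
from concurrent.futures import ThreadPoolExecutor


def scroll_index(client, index, page_size=1000, keep_alive="2m"):
    """Collect every hit from one index by paging with the scroll API."""
    # Sorting by _doc is the cheapest order when only a full dump is needed.
    resp = client.search(
        index=index,
        scroll=keep_alive,
        size=page_size,
        body={"query": {"match_all": {}}, "sort": ["_doc"]},
    )
    scroll_id = resp.get("_scroll_id")
    hits = []
    try:
        # An empty page means the scroll is exhausted.
        while resp["hits"]["hits"]:
            hits.extend(resp["hits"]["hits"])
            resp = client.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = resp.get("_scroll_id", scroll_id)
    finally:
        # Release the search context so it does not hold heap until it expires.
        if scroll_id:
            client.clear_scroll(scroll_id=scroll_id)
    return hits


def extract(client, indices, max_parallel=2):
    """Scroll through the given indices, at most max_parallel at a time."""
    indices = list(indices)
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        results = pool.map(lambda name: scroll_index(client, name), indices)
        return dict(zip(indices, results))
```

Start with `max_parallel` at 1 or 2, watch heap usage and search rejections on the data nodes, and only raise it if the cluster stays healthy. For large exports you would probably want to write each page out as it arrives instead of collecting everything in memory.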