I am interested in optimizing read and write performance in a system that runs Spark 1.6.2 with Elasticsearch 2.4. I have read the existing posts and docs, but there are a few minor points I would like to clarify.
I am looking into the connector configuration, in particular the "es.scroll.size", "es.batch.size.bytes", and "es.batch.size.entries" parameters. I have three questions.
1. The meaning of the "es.scroll.size" parameter.
The documentation says: "the total number of documents returned is LIMIT * NUMBER_OF_SCROLLS (OR TASKS)".
Am I right that if my index has 4 shards, then there will be 4 scrolls (tasks), each issuing multiple requests, where each request returns up to "es.scroll.size" documents?
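If that reading is correct, the request count per scroll is just a ceiling division. A small sketch with made-up numbers (the shard count, documents per shard, and scroll size below are hypothetical, not taken from my cluster):

```python
import math

def scroll_request_count(docs_per_shard, scroll_size):
    """Number of scroll requests one task issues to drain a single shard,
    assuming each request returns up to `scroll_size` documents."""
    return math.ceil(docs_per_shard / scroll_size)

# Hypothetical numbers: 4 shards -> 4 parallel tasks, one scroll each.
shards = 4
docs_per_shard = 25_000
scroll_size = 1_000  # es.scroll.size

requests_per_scroll = scroll_request_count(docs_per_shard, scroll_size)
total_requests = shards * requests_per_scroll
print(requests_per_scroll, total_requests)  # 25 100
```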
2. Impact of the "es.batch.size.bytes" and "es.batch.size.entries" parameters.
I have tried the following pairs: (5 MB, 5,000), (10 MB, 10,000), (15 MB, 15,000), and (25 MB, 25,000). The first two give a noticeable performance improvement, but performance is roughly the same across the last three pairs.
Is there some limit around 10,000 for the "es.batch.size.entries" attribute?
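My understanding from the docs is that both limits apply per task writer and the bulk request is flushed when either one is hit, whichever comes first; if so, the plateau could simply mean the byte limit fires before the entry limit for my document size. A sketch of that reading (the 2 KB average document size is an assumption for illustration):

```python
def docs_per_flush(avg_doc_bytes, size_limit_bytes, entry_limit):
    """Documents accumulated before a bulk flush, assuming the connector
    flushes when either es.batch.size.bytes or es.batch.size.entries is
    reached, whichever comes first."""
    by_bytes = size_limit_bytes // avg_doc_bytes
    return min(by_bytes, entry_limit)

MB = 1024 * 1024
avg_doc = 2 * 1024  # hypothetical 2 KB average document

# (5 MB, 5,000): 5 MB / 2 KB = 2,560 docs -> the byte limit fires first.
print(docs_per_flush(avg_doc, 5 * MB, 5_000))    # 2560
# (15 MB, 15,000): 15 MB / 2 KB = 7,680 docs -> still the byte limit;
# raising the entry limit further changes nothing at this doc size.
print(docs_per_flush(avg_doc, 15 * MB, 15_000))  # 7680
```

If that model holds, raising "es.batch.size.entries" beyond the point where the byte limit dominates would explain why the last three pairs perform the same.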
3. Are there any other attributes worth tuning to improve read/write performance?
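For reference, here is how I picture the settings discussed above collected in one place, plus "es.batch.write.refresh", which I have seen mentioned for write tuning. The values are illustrative starting points, not recommendations:

```python
# Sketch of an elasticsearch-hadoop configuration map; values are
# illustrative, not tuned recommendations.
es_conf = {
    "es.nodes": "localhost:9200",      # cluster entry point (assumed)
    "es.scroll.size": "1000",          # docs per scroll request, per task
    "es.batch.size.bytes": "10mb",     # bulk flush threshold (bytes)
    "es.batch.size.entries": "10000",  # bulk flush threshold (entries)
    "es.batch.write.refresh": "false", # skip index refresh after bulk writes
}

# With the Scala/Java connector these would be passed via SparkConf or
# directly to saveToEs/esRDD; shown here as a plain map for clarity.
```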