I want to read a big index (around 10 GB) from Elasticsearch with Spark. Is there any advice?
This is a pretty open question, but a few things will help. First, make sure your index has enough shards: the elasticsearch-hadoop connector creates one Spark partition per shard, so shard count sets an upper bound on read parallelism. Second, filter out any data you don't need before it reaches Spark by using the pushdown query properties, so only matching documents leave the cluster. If you are using Spark SQL, the connector can push SQL filters and projections down to Elasticsearch automatically via the optimizer.
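A minimal sketch of the Spark SQL approach, assuming a local cluster and a hypothetical index name and field names (`my-big-index`, `status`, `id`, `timestamp` are placeholders you'd replace with your own):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("es-read")
  .config("es.nodes", "localhost")  // hypothetical cluster address
  .config("es.port", "9200")
  .getOrCreate()

import spark.implicits._

// Load the index as a DataFrame through the elasticsearch-hadoop connector.
// "pushdown" (on by default) lets the Catalyst optimizer translate DataFrame
// filters and projections into the Elasticsearch query.
val df = spark.read
  .format("org.elasticsearch.spark.sql")
  .option("pushdown", "true")
  .load("my-big-index")

// Both the filter and the column selection are pushed to Elasticsearch,
// so only matching documents and fields are shipped to Spark.
val filtered = df
  .filter($"status" === "active")   // hypothetical field
  .select("id", "timestamp")
```

This is just a sketch to show where the pushdown happens; it requires a running Elasticsearch cluster and the elasticsearch-hadoop (elasticsearch-spark) dependency on the classpath.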