I have been trying to index a large dataset (about 13,000 fields and thousands of documents, each around 450 KB) from Spark through ES-Hadoop. The indexing goes well for a small number of documents (hundreds) but fails with thousands. The Elasticsearch logs show GC warnings like 'jvm spent 700ms in last 1s', then run into 'jvm heap out of memory'; the cluster goes down and takes the Spark job down with it.
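For context, the write is done roughly like this (a minimal Scala sketch, not the exact job: `docs`, the index name, and the node address are placeholders, and I'm using the elasticsearch-spark SQL connector's `saveToEs`):

```scala
import org.elasticsearch.spark.sql._

// Hypothetical DataFrame "docs" holding the wide (~13,000 field) documents.
// All bulk-related settings (es.batch.size.*) are left at their defaults;
// "myindex/doc" and the node address are placeholders for illustration.
docs.saveToEs("myindex/doc", Map(
  "es.nodes" -> "es-node-1:9200"
))
```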
I have an ES cluster with 8 nodes, 32 GB of memory each (heap set to less than 50% of available RAM), 256 GB overall, and enough disk space.
I used the default settings for bulk document size, bulk entries, threads, and queue.
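For reference, these are what I believe the relevant elasticsearch-hadoop bulk defaults to be (taken from the connector configuration docs; worth double-checking against your connector version):

```
# Assumed elasticsearch-hadoop defaults (verify for your version)
es.batch.size.bytes = 1mb        # max size of a bulk request per task
es.batch.size.entries = 1000     # max documents per bulk request
es.batch.write.retry.count = 3   # retries when bulk requests are rejected
```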
What am I doing wrong here?
Any help is appreciated.