Spark writes Parquet records to Elasticsearch too slowly


(lizhen) #1

I want to write Parquet records to ES using Spark, with 10 executors and 7 billion records. When I first submit the application, the write speed is 20,000+ records per second, but it keeps dropping over time. After 12 hours, it is down to 2,000+ records per second. Is some configuration missing? Or could someone give me suggestions for exporting such a large dataset to Elasticsearch?
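To put the slowdown in perspective, here is a quick back-of-the-envelope calculation. It assumes a constant rate for the whole job, which is a simplification: at the initial rate, 7 billion records take roughly 97 hours (about 4 days), while at the degraded rate they take roughly 972 hours (about 40 days).

```python
def hours_to_index(total_docs: int, docs_per_second: float) -> float:
    """Wall-clock hours needed to index total_docs at a constant rate."""
    return total_docs / docs_per_second / 3600


TOTAL_DOCS = 7_000_000_000  # record count from the question

# Compare the initial rate with the rate observed after 12 hours.
for rate in (20_000, 2_000):
    hours = hours_to_index(TOTAL_DOCS, rate)
    print(f"{rate:>6} docs/s -> {hours:,.1f} h ({hours / 24:.1f} days)")
```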


(Costin Leau) #2

There might be a variety of factors at hand. Typically it comes down to how big the ES cluster is and how many shards the target index has.
What version of ES are you using? Likely after 12h or so, segment merging kicks in (see the ES docs for more information). There are various guides on how to go about this, but two pieces of advice for maintaining the speed are: make sure the OS doesn't interfere, and disable refresh on the target index, re-enabling it only after the import has finished.
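A minimal sketch of the "disable refresh, then re-enable it" step, using the index settings REST endpoint (`PUT /<index>/_settings`) with only the Python standard library. The cluster URL and index name below are hypothetical placeholders, and the Spark export itself is not shown. A `refresh_interval` of `"-1"` turns periodic refresh off; `"1s"` is the Elasticsearch default.

```python
import json
import urllib.request

# Hypothetical placeholders: replace with your own cluster address and index.
ES_URL = "http://localhost:9200"
INDEX = "my-index"


def refresh_settings_body(interval: str) -> bytes:
    """JSON body for PUT /<index>/_settings; "-1" disables periodic refresh."""
    return json.dumps({"index": {"refresh_interval": interval}}).encode("utf-8")


def set_refresh_interval(es_url: str, index: str, interval: str) -> dict:
    """Send the settings update to Elasticsearch and return its JSON response."""
    req = urllib.request.Request(
        url=f"{es_url}/{index}/_settings",
        data=refresh_settings_body(interval),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Usage: call `set_refresh_interval(ES_URL, INDEX, "-1")` before submitting the Spark job, and `set_refresh_interval(ES_URL, INDEX, "1s")` once the import has finished (ideally in a `finally` block so refresh is restored even if the job fails).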


(lizhen) #3

Thanks, Costin. I read the docs on ‘Indexing Performance Tips’, and now it runs smoothly.


(Costin Leau) #4

Glad to hear it.

