Elasticsearch Bulk Write is slow using Scan and Scroll


(Amit Pandita) #1

Hi Group,

I am currently running into an issue on which I am really stuck.
I am working on a problem where I have to export Elasticsearch documents and write them to CSV. The result sets range from 50,000 to 5 million docs.
I am seeing serious performance issues, and I get the feeling that I am missing something here.

Right now I have a dataset of 400,000 documents that I am trying to scan and scroll over, which would ultimately be formatted and written to CSV. But just iterating over the output takes 20 minutes! That is insane.

Here is my script:

import elasticsearch
import elasticsearch.exceptions
import elasticsearch.helpers as helpers
import time

es = elasticsearch.Elasticsearch(['http://XX.XXX.XX.XXX:9200'],retry_on_timeout=True)

scanResp = helpers.scan(client=es,scroll="50m",index='MyDoc',doc_type='MyDoc',timeout="50m",size=1000)

resp={}
start_time = time.time()
for resp in scanResp:
    data = resp
    print data.values()[3]

print("--- %s seconds ---" % (time.time() - start_time))
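For context, the eventual CSV step would look roughly like this (a Python 3 sketch; the field names and the `docs_to_csv` helper are hypothetical, not part of my current script, and the fake hits just stand in for what `helpers.scan` yields):

```python
import csv
import io

def docs_to_csv(docs, fieldnames, out):
    """Stream an iterable of hit dicts (shaped like helpers.scan output)
    to CSV, writing rows incrementally instead of printing each one."""
    writer = csv.DictWriter(out, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    for hit in docs:
        # Each scan hit carries the document body under "_source".
        writer.writerow(hit.get("_source", {}))

# Fake hits standing in for scan output:
fake_hits = [
    {"_source": {"id": 1, "name": "a"}},
    {"_source": {"id": 2, "name": "b"}},
]
buf = io.StringIO()
docs_to_csv(fake_hits, ["id", "name"], buf)
print(buf.getvalue())
```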

I am running Elasticsearch on a hosted AWS m3.medium instance.

Can anyone please tell me what I might be doing wrong here?


(Mark Walkom) #2

So the size parameter is what it gets from each shard, so if you have (e.g.) 5 shards, that's 5,000 docs per scroll request!
I'd start by reducing that to something considerably smaller and see if it helps.
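Concretely, the arithmetic (the shard count here is just an example; check your index settings):

```python
# helpers.scan applies `size` per shard, so each scroll round
# fetches size * shards documents in one go.
size = 1000       # the value from the original scan call
shards = 5        # example shard count
docs_per_round = size * shards
print(docs_per_round)  # 5000
```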


(Amit Pandita) #3

@warkolm Yes, I did that already. In fact, I started with a size of 10, then 50, 100, 150, 200, 300, 500, 1000 ...... The best result was at 200, where I got the output in 18 seconds, and that was for just 4,000 documents. That is a really bad figure. What else, apart from the size, do you think I might be missing?


(Mark Walkom) #4

Are you monitoring statistics on the cluster?
What do they tell you?

