Hi, I'm using ES 1.7.3 and am attempting to reindex using scan/scroll. I'm noticing that scan is very fast at the beginning, but performance slowly degrades the further into the results I get. For example, the first batch of 100k docs takes 7s to query/iterate over, but by the 15 millionth doc, it's taking 10 minutes to query/iterate over 100k docs. Is this expected? From everything I've read, using scan should solve this issue, but it doesn't appear to be having any effect.
I am using elasticsearch-py's reindex() helper, so I initially filed a bug there, but I'm posting here because it's looking more and more like a core ES issue with scan rather than something related to the Python client. I have many more details (graphs, benchmarks, hot_threads) posted in that bug: https://github.com/elastic/elasticsearch-py/issues/397
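For anyone who wants to reproduce the per-batch timings without the indexing side involved, here is a minimal sketch of how the slowdown could be measured. It is not the exact benchmark from the linked bug; time_batches is a hypothetical helper name, and it only consumes an iterator of documents and reports how long each batch of 100k took to pull.

```python
import time
from itertools import islice


def time_batches(docs, batch_size=100000, clock=time.perf_counter):
    """Yield (batch_number, doc_count, seconds) for each batch pulled from docs.

    docs can be any iterable of documents. The hits are discarded, so the
    timing covers only querying and iterating, not indexing.
    """
    it = iter(docs)
    batch_number = 0
    while True:
        start = clock()
        # Pull up to batch_size documents and count them without storing them.
        count = sum(1 for _ in islice(it, batch_size))
        elapsed = clock() - start
        if count == 0:
            break  # iterator exhausted
        batch_number += 1
        yield batch_number, count, elapsed


if __name__ == "__main__":
    # Stand-in data so the sketch runs without a cluster.
    for number, count, seconds in time_batches(range(250000)):
        print("batch %d: %d docs in %.3fs" % (number, count, seconds))
```

With elasticsearch-py, the docs argument could be the generator returned by elasticsearch.helpers.scan(client, index="my_index", scroll="5m"), where my_index is a placeholder index name. If the per-batch time still climbs when nothing is being indexed, that would point at the scan side rather than the bulk side.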
Getting over this hurdle is essential for our upgrade to ES 2.3. We're basically blocked from reindexing a relatively modest-sized index due to this, so any help would be appreciated. Thx!