Hello,
I am trying to reindex one domain that has almost 10 million documents to another. The reason I am doing that is that I made a change in the analyzer, to split the dots, ignore leading zeroes and so forth.
I'm getting beaten HARD by the _reindex API. As I write that, I am executing the fourth tentative so far. The problem is that I keep getting this content_too_long_exception and keep decreasing the batch size.
The command I am running is:
/_reindex?pretty=true&scroll=10h&wait_for_completion=false
{
"source": {
"remote": {
"host": "xxxxxxxx",
"socket_timeout": "60m"
},
"index": "fulltext",
"size": 10
},
"dest": {
"index": "fulltext"
}
}
- I started running with the batch size of 1000, then 500, then 100. Now I got angry and tried 10, but I think every time I do that the total time increases because of the overhead.
- When the task fails isn't there a way to continue from where it stopped? I don't have a date field in my index.
- I don't know what the scroll parameter does
- I keep controlling the status with the GET /_tasks API. But right now is apparently stuck. This is so slow, I am getting desperate.
Any help would be immensely appreciated