Reindex using scroll api

cirot87 · January 21, 2016, 10:09am

Hi, my goal is to reindex my documents.

I have the latest version of Elasticsearch. In the documentation I see that I need to use the scroll API and the bulk API (I know how to use them).

The first question is: can I use "search_type=scan" on the first search, or is it deprecated?
I saw this: https://www.elastic.co/guide/en/elasticsearch/reference/current/breaking_21_search_changes.html

The second question is: when I use a scroll search with a scroll id and I have fewer than 10 documents, I get no results (e.g. an empty hits[]), but if I have more than 10 docs I do get results. How is that possible? Is there a setting I need to change?

I'm doing a normal query:

POST /region/city/_search?scroll=1m
{
  "size": 1000,
  "sort": ["_doc"],
  "query": {
    "match_all": {}
  }
}

jpountz · January 21, 2016, 10:12pm

It is indeed deprecated. What you are doing, a scroll request that sorts on _doc, is the right thing to do.

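Maybe you are confused because a simple scroll already returns documents on the first request (unlike scans, which only counted results on the first request and returned hits only on further calls to the scroll API). With fewer documents than the requested size, all of them come back in that first response, so the follow-up scroll call has nothing left to return.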
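A minimal Python sketch of the behaviour described above: the hits in the first response are consumed before any follow-up scroll call is made. The names iter_scroll_hits and fake_fetch_next are hypothetical, and the responses are simulated dictionaries shaped like scroll responses rather than real calls to a cluster.

```python
from typing import Callable, Dict, Iterator


def iter_scroll_hits(first_page: Dict, fetch_next: Callable[[str], Dict]) -> Iterator[Dict]:
    """Yield every hit, starting with the hits of the initial search response.

    first_page: the response of the initial _search?scroll=... request.
    fetch_next: hypothetical callable that takes a scroll id and returns
                the next page (in real use, a call to _search/scroll).
    """
    page = first_page
    # An empty hits array signals that the scroll is exhausted.
    while page["hits"]["hits"]:
        yield from page["hits"]["hits"]
        page = fetch_next(page["_scroll_id"])


# Simulated case: 3 documents with size=1000, so everything fits in the first page.
first = {
    "_scroll_id": "s1",
    "hits": {"hits": [{"_id": "1"}, {"_id": "2"}, {"_id": "3"}]},
}


def fake_fetch_next(scroll_id: str) -> Dict:
    # Nothing is left after the first page, so the follow-up call is empty.
    return {"_scroll_id": scroll_id, "hits": {"hits": []}}


ids = [hit["_id"] for hit in iter_scroll_hits(first, fake_fetch_next)]
```

In this simulated case the loop makes exactly one follow-up call, which comes back empty. That matches the situation in the question: the documents were already delivered by the first request.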

cirot87 · January 22, 2016, 8:36am

Thanks for the answer.

Now I have a new question: if I use the scroll API or a normal search on my alias for a "match_all" query with the size set to the total size of the index, I get the same result.

So in this case the scroll API is useless, because I can run a normal query on the alias, save the data, and then use the bulk API to insert it into the new index.

If I'm on the right track, what is the point of using the scroll API for reindexing (as recommended in the ES documentation)?

jpountz · January 22, 2016, 8:50am

The point is that a regular search operation is very bad at fetching lots of records at once. It might work in your case if you don't have many documents, but otherwise the fact that it needs to fetch all matching documents and put them in a single JSON document will likely make your system run out of memory.
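As a rough illustration of why paging helps, the sketch below turns one page of scroll hits into one bulk request body, so only a single page is held in memory at a time. The function name bulk_body and the target index "region_v2" are assumptions made for the example. The hit is simulated, and the action/source line layout follows the bulk API's newline-delimited format.

```python
import json
from typing import Dict, Iterable


def bulk_body(hits: Iterable[Dict], target_index: str) -> str:
    """Build a newline-delimited bulk body that indexes each hit into target_index."""
    lines = []
    for hit in hits:
        # Action line: where the document goes (type and id are kept from the hit).
        action = {"index": {"_index": target_index, "_type": hit["_type"], "_id": hit["_id"]}}
        lines.append(json.dumps(action))
        # Source line: the document itself.
        lines.append(json.dumps(hit["_source"]))
    # The bulk API expects every line, including the last, to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""


# One simulated page of hits; "region_v2" is a hypothetical new index name.
page_hits = [
    {"_index": "region", "_type": "city", "_id": "1", "_source": {"name": "Rome"}},
]
body = bulk_body(page_hits, "region_v2")
```

Each scroll page would be sent as its own bulk request and then discarded, so memory use is bounded by the page size (1000 in the query above) rather than by the size of the index.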

cirot87 · January 22, 2016, 9:06am

Right, but to use the scroll API I need to run a large number of queries to get all the results, which can be slow. And before I can call _search/scroll, I first need to run another query, _search?scroll=1m, to obtain the scroll_id (this seems inconsistent to me).

So I have either a large number of scroll queries or one big JSON document, and in both cases my system can run out of memory.

Am I missing something, or is that right?
