Hi there. I'd like an efficient and fault-tolerant (connection problems, failed operations, etc.) way to do the following:
I need to "merge" two indexes. Let's call them A and B, being merged into C;
For every entry in A, there are 1 or more entries in B which are associated with it. We could think of A as flights and B as their passengers, for instance. Let's describe an example of such a group by (A', [B'1, B'2,...B'n]);
Each entry in C will consist of:
-Full data of an entry in A
-Full data of an entry in B
-A few more computed fields - let's call them Cf;
Each entry in A will appear multiple times in C, once for each entry in B associated with it;
For the group (A', [B'1, B'2,...B'n]), the entries in C would be:
(A', B'1, Cf1), (A', B'2, Cf2), ..., (A', B'n, Cfn)
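
To make the target shape concrete, here is a rough Python sketch of how one group would be expanded into entries for C. It is only an illustration: the `id` field on each document, the `a`/`b`/`cf` layout and the `compute_fields` callback are assumptions of mine, not something that exists in the real indexes. The one design idea in it is the deterministic `_id` built from both source ids, so that re-sending the same group after a failure would overwrite the same entries in C instead of duplicating them.

```python
from typing import Any, Callable, Dict, Iterable, Iterator

Doc = Dict[str, Any]


def merge_group(
    a_doc: Doc,
    b_docs: Iterable[Doc],
    compute_fields: Callable[[Doc, Doc], Doc],
) -> Iterator[Doc]:
    """Yield one C entry per B entry associated with a_doc.

    Assumes (hypothetically) that every document carries a unique "id" field.
    """
    for b_doc in b_docs:
        yield {
            # Deterministic id: indexing the same pair twice targets the same entry.
            "_id": f"{a_doc['id']}:{b_doc['id']}",
            "a": a_doc,                          # full data of the entry in A
            "b": b_doc,                          # full data of the entry in B
            "cf": compute_fields(a_doc, b_doc),  # the computed fields Cf
        }


if __name__ == "__main__":
    flight = {"id": "F1", "seats": 2}
    passengers = [{"id": "P1", "bags": 1}, {"id": "P2", "bags": 3}]
    for entry in merge_group(
        flight, passengers, lambda a, b: {"bags_per_seat": b["bags"] / a["seats"]}
    ):
        print(entry["_id"], entry["cf"])
```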
Things I'm considering:
-The best way to scroll and search through A and B to produce the entries for C.
-Whether it would be useful to create auxiliary fields in A and/or B to register which items were already processed, which ones are being processed, etc., and so speed up the scrolling.
-Whether sorting by _doc while scrolling is worth it for the speed, or whether the fact that I won't be able to use "search_after" with '_doc' makes it not worth it.
-How to organize the process so that the work can be divided among threads and so that it automatically deals with failed create/index operations made within bulks, etc.
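
For the failed bulk operations, this is roughly the behaviour I have in mind, sketched in plain Python without any Elasticsearch client. Everything here is hypothetical: `send_bulk` stands in for whatever actually sends a batch and reports one `(ok, status)` pair per action, and the set of status codes I treat as retryable is my own guess. Only the failed items are re-sent, with exponential backoff, and whatever still fails is returned so it can be logged or re-queued.

```python
import time
from typing import Any, Callable, Dict, List, Tuple

Action = Dict[str, Any]
# Hypothetical transport: returns one (ok, http_status) pair per action, in order.
SendBulk = Callable[[List[Action]], List[Tuple[bool, int]]]

# Assumption: these statuses are transient and worth retrying.
RETRYABLE = {429, 502, 503, 504}


def bulk_with_retry(
    actions: List[Action],
    send_bulk: SendBulk,
    max_attempts: int = 3,
    backoff_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Action]:
    """Send actions, retrying only the failed items. Returns actions that never succeeded."""
    pending = list(actions)
    dead: List[Action] = []  # non-retryable failures (e.g. mapping errors)
    for attempt in range(max_attempts):
        if not pending:
            break
        try:
            results = send_bulk(pending)
        except ConnectionError:
            # Whole request failed: treat every item as retryable.
            results = [(False, 503)] * len(pending)
        retry: List[Action] = []
        for action, (ok, status) in zip(pending, results):
            if ok:
                continue
            (retry if status in RETRYABLE else dead).append(action)
        pending = retry
        if pending and attempt < max_attempts - 1:
            sleep(backoff_s * 2 ** attempt)  # exponential backoff between attempts
    return dead + pending
```

Each worker thread could run this on its own batches. Combined with deterministic ids in C, re-sending a batch after a dropped connection should be harmless.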
The indexes A and B have around 20 million and 30 million entries respectively. Each entry is approximately 10 KB in size.