Reindex, creating multiple destination documents from each source document

I want to build a new index from an existing one, but split each source document into a set of destination documents. Is it possible to do this with reindex, or is there some other way to do it server-side?

On a whim, I tried doing this in the script: ctx._source = [[:],[:]]; ... and then setting properties, but unsurprisingly it errors out with "java.util.ArrayList cannot be cast to java.util.Map".


Take this as an example, where a script uses different logic on each run to create an _id in the destination index.

DELETE test,output

PUT test/_doc/my_doc
{
  "first" : "first",
  "second" : "second",
  "third" : "third"
}

POST _reindex
{
  "source" : { "index":  "test"},
  "dest" : { "index" : "output" },
  "script" : {
    "lang": "painless",
    "source": """
// Change this value (1, 2 or 3) for each reindex run
def run = 3;
if (run == 1) {
  ctx._source.key = ctx._source.first;
  ctx._id = ctx._id + "_1";
} else if (run == 2) {
  ctx._source.key = ctx._source.second;
  ctx._id = ctx._id + "_2";
} else {
  ctx._source.key = ctx._source.third;
  ctx._id = ctx._id + "_3";
}
"""
  }
}

GET output/_search

This also means you have to run the reindex API n times (the number of different changes) and not once.
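
To make the "run it n times" point concrete, here is a minimal Python sketch that only builds the request bodies, one per run, and passes the run number and field name as script params rather than editing the script by hand. The helper name build_reindex_bodies and the use of params are illustrative assumptions, not part of the original answer; the index and field names are the ones from the example above.

```python
import json
from typing import Any, Dict, List

# Painless script shared by every run; the field to copy and the run number
# arrive through "params" instead of being hard-coded (an assumption of this sketch).
SCRIPT = """
ctx._source.key = ctx._source[params.field];
ctx._id = ctx._id + "_" + params.run;
"""


def build_reindex_bodies(source_index: str, dest_index: str,
                         fields: List[str]) -> List[Dict[str, Any]]:
    """Return one _reindex request body per field (hypothetical helper)."""
    bodies = []
    for run, field in enumerate(fields, start=1):
        bodies.append({
            "source": {"index": source_index},
            "dest": {"index": dest_index},
            "script": {
                "lang": "painless",
                "source": SCRIPT,
                "params": {"field": field, "run": run},
            },
        })
    return bodies


if __name__ == "__main__":
    # Print each body so it can be pasted after "POST _reindex" in the console.
    for body in build_reindex_bodies("test", "output", ["first", "second", "third"]):
        print("POST _reindex")
        print(json.dumps(body, indent=2))
```

Each body still has to be sent as its own reindex request, so this automates the repetition but does not remove it.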


Sorry, I guess I wasn't clear. The number of documents depends on the data in the source document. We have some data in an array of arbitrary length. I want to transform it into a new index, with each element of the array becoming a new row (and containing some other data from the document). I guess I could find the max n, run the reindex once for each position up to n, and on each run filter out the documents whose array is shorter than that.

But is there no way to directly transform a document into 0 or more docs with a script in a single pass?

Indeed, there is no way; reindex is basically a one-to-one action. From my perspective, a small Python script that does a scroll search on one side and a bulk index on the other sounds like the way to go here.
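
A minimal sketch of such a script, assuming the official elasticsearch Python client and its helpers.scan / helpers.bulk functions. The names explode_hit and reindex_split, the array field name passed in, and the "key" output field are hypothetical; adapt them to the real mapping.

```python
from typing import Any, Dict, Iterator, Tuple


def explode_hit(hit: Dict[str, Any], array_field: str,
                dest_index: str) -> Iterator[Dict[str, Any]]:
    """Yield one bulk action per element of hit["_source"][array_field].

    A missing or empty array yields nothing, so a source document can
    produce zero or more destination documents.
    """
    source = hit["_source"]
    # Fields copied onto every destination document (everything but the array).
    shared = {k: v for k, v in source.items() if k != array_field}
    for position, element in enumerate(source.get(array_field) or []):
        yield {
            "_index": dest_index,
            "_id": f"{hit['_id']}_{position}",
            "_source": {**shared, "key": element},
        }


def reindex_split(client: Any, source_index: str, dest_index: str,
                  array_field: str) -> Tuple[int, Any]:
    """Scroll over source_index and bulk-index the exploded documents."""
    # Imported here so explode_hit can be tried without the client installed.
    from elasticsearch import helpers

    hits = helpers.scan(client, index=source_index,
                        query={"query": {"match_all": {}}})
    actions = (action
               for hit in hits
               for action in explode_hit(hit, array_field, dest_index))
    return helpers.bulk(client, actions)
```

With a connected client this would be called as, for example, reindex_split(client, "test", "output", "items"), where "items" stands in for whatever the array field is actually called. Both the scan and the bulk call stream their data, so the whole source index is not held in memory at once.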

Re: scroll search: Yes, that's the road I was starting to go down. However, it now goes from an in-server process which (as I understand it) gets a snapshot of the data, to a client process, with all that additional communication overhead, and presumably not a snapshot. I think it would be useful if Elasticsearch could provide reindex-like semantics but allow 0 or more documents to be created (ctx._docs instead of ctx._source and ctx._id, or maybe ctxs?)

A scroll search is a point-in-time snapshot, taken at the moment you start it. Changes made after that will not be taken into account.

