I want to build a new index from an existing one, but split each source document into a set of destination documents. Is it possible to do this with reindex, or is there some other way it can be done server-side?
On a whim, I tried doing this in a script: ctx._source = [[:],[:]]; ... and then setting the properties on each entry, but unsurprisingly it errors out with "java.util.ArrayList cannot be cast to java.util.Map", since ctx._source has to be a single Map.
Sorry, I guess I wasn't clear. The number of destination documents depends on the data in the source document. We have some data in an array of arbitrary length. I want to transform it into a new index, with each element of the array becoming a new document (which also contains some other data from the original document). I guess I could find the maximum array length n, run one reindex per array position, and in each run filter out the source documents whose array is too short to have an element at that position.
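For clarity, here is the fan-out being asked for, written as a plain Python function: one source document in, zero or more destination documents out. This is only a sketch; the field names (`items`, `customer`) and the id scheme `<source id>_<array position>` are hypothetical choices for illustration, not anything Elasticsearch prescribes.

```python
from typing import Any, Dict, Iterator, List, Tuple


def split_document(
    doc_id: str,
    source: Dict[str, Any],
    array_field: str,
    copy_fields: List[str],
) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield one (new_id, new_source) pair per element of source[array_field]."""
    # Fields copied unchanged into every destination document.
    shared = {f: source[f] for f in copy_fields if f in source}
    # A missing or empty array yields no destination documents at all.
    for i, element in enumerate(source.get(array_field) or []):
        new_source = dict(shared)
        # Each destination document holds a single array element.
        new_source[array_field] = element
        # Hypothetical deterministic id, so re-running overwrites instead of duplicating.
        yield f"{doc_id}_{i}", new_source
```

For example, `split_document("42", {"customer": "acme", "items": [{"sku": "x"}, {"sku": "y"}]}, "items", ["customer"])` yields two documents, `42_0` and `42_1`, each carrying `customer` plus one element of `items`. This 1-to-N shape is exactly what reindex cannot express, because its script can only replace `ctx._source` with one Map.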
But is there no way to directly transform a document into 0 or more docs with a script in a single pass?
Indeed, there is no way; reindex is basically a one-to-one operation. From my perspective, a small Python script that does a scroll search on one side and a bulk index on the other sounds like the way to go here.
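A minimal sketch of that script, assuming the official `elasticsearch` Python client: `helpers.scan` wraps the scroll API (scroll results reflect the index as it was when the initial search was made), and `helpers.bulk` sends the generated actions in batches and returns a `(successes, errors)` tuple. The URL, index names, field names and the `<source id>_<position>` id scheme are hypothetical placeholders.

```python
from typing import Any, Dict, Iterable, Iterator, List


def to_bulk_actions(
    hits: Iterable[Dict[str, Any]],
    dest_index: str,
    array_field: str,
    copy_fields: List[str],
) -> Iterator[Dict[str, Any]]:
    """Turn scroll hits into bulk index actions: 0..n actions per source hit."""
    for hit in hits:
        src = hit["_source"]
        # Fields copied into every destination document.
        shared = {f: src[f] for f in copy_fields if f in src}
        # One index action per array element; an empty/missing array emits nothing.
        for i, element in enumerate(src.get(array_field) or []):
            yield {
                "_op_type": "index",
                "_index": dest_index,
                "_id": f"{hit['_id']}_{i}",  # hypothetical deterministic id
                "_source": {**shared, array_field: element},
            }


def reindex_split(es_url: str, source_index: str, dest_index: str,
                  array_field: str, copy_fields: List[str]) -> int:
    # Imported here so to_bulk_actions can be used without the client installed.
    from elasticsearch import Elasticsearch, helpers

    client = Elasticsearch(es_url)
    # Scroll through every document in the source index.
    hits = helpers.scan(client, index=source_index,
                        query={"query": {"match_all": {}}})
    # Stream the fanned-out actions to the bulk API; raises on failed items by default.
    success, _errors = helpers.bulk(
        client, to_bulk_actions(hits, dest_index, array_field, copy_fields))
    return success
```

Usage would look like `reindex_split("http://localhost:9200", "orders", "order-items", "items", ["customer"])`. Because both `scan` and the action generator are lazy, documents are streamed rather than loaded into memory; the cost, as noted in the reply below, is that every document makes a round trip through the client.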
Re: scroll search: yes, that's the road I was starting to go down. However, it turns an in-server process, which (as I understand it) works from a snapshot of the data, into a client process, with all the additional communication overhead, and presumably without a snapshot. I think it would be useful if Elasticsearch could provide reindex-like semantics but allow zero or more documents to be created per source document (ctx._docs instead of ctx._source and ctx._id, or maybe ctxs?).