I have series of documents that currently have a field that is a list of integers (date timestamps, but this doesn't really matter). Some have a single value, some have many (possibly hundreds). For example:
"date" : [1538508710]
or
"date" : [1538405989, 1550215297, 1550236107]
I would like to create a script that would update all these documents and change it from being a list to being a single integer, such that the integer is the smallest value in the list. So the fields above would instead look like:
"date" : 1538508710
or
"date" : 1538405989
Is it possible to do this with a scripted update? Can I do something like:
"inline": "ctx._source.date = min([d for d in ctx._source.date]),",
It looks like your suggested script is trying to use python generators. These are not possible in painless, but instead java streams are supported. It would look something like this:
Note however that it is best practice to make this operation idempotent, so that if some issues causes your query to fail you will be able to restart without handling any documents already processed.
A separate (temporary) boolean field would probably be easiest for this. You can then make a second update-by-query that removes the field (where your query searches for the field existing on a document). You could also instead make your script resilient to being re-run for the same document by checking the type of ctx._source.date (using instanceof List), but whether that is easier than the first option depends on your total data size and how costly it is to rerun your update-by-query on the entire dataset.
Do you mean tracking which documents have already had the script run against? Is there a way to do this easily? Or are you suggesting something like if you add a new field, use the match part of an update query to find the documents who are missing the new field?
Is there a good way to estimate this cost? We've got a 40-50 million rows to update.
Apache, Apache Lucene, Apache Hadoop, Hadoop, HDFS and the yellow elephant
logo are trademarks of the
Apache Software Foundation
in the United States and/or other countries.