Is there any way to reindex the data and create a new field from data already in the documents? For example, reindex an entire index and create a new numeric field, Hour, from a date field (YYYY-mm-dd, HH-mm-ss). I know that Lucene has an expression for this (doc['@timestamp'].date.getHourOfDay()), but can it be used to create a new field when you reindex?
I need this new field for a search query, and scripted fields can't be used for that. The alternative is a script in the query, but that could hurt search time as the number of documents searched continues to grow.
There is an example of using reindex to modify the document in the docs here. It changes the name of a field, but you can use the same script construct to do what you want. I suspect the script looks something like "script": "ctx._source['timestamp_hour'] = ctx._source['@timestamp'].getHourOfDay()". Note that values in ctx._source are the raw JSON values, so if @timestamp comes back as a string you will need to parse it into a date before asking for the hour. You are right that querying this new field will be much better at search time. Watch the time zone, btw. I believe the hour that you get in this case is UTC.
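To make that concrete, here is a rough sketch of the request body you might POST to _reindex, built in Python so it is easy to inspect. It is a sketch under assumptions, not something tested against your cluster: the index names logs-old and logs-new are made up, the script assumes a recent Elasticsearch where Painless is the script language, and it assumes @timestamp is stored in _source as an ISO-8601 string with an offset (for example "2016-05-01T13:45:00Z"). Older versions use "inline" instead of "source" for the script text.

```python
import json

# Painless script text. Assumption: '@timestamp' in _source is an ISO-8601
# string with an offset, so ZonedDateTime.parse can read it directly.
HOUR_SCRIPT = (
    "ctx._source['timestamp_hour'] = "
    "ZonedDateTime.parse(ctx._source['@timestamp']).getHour()"
)


def build_reindex_body(source_index, dest_index):
    """Build a _reindex request body that adds timestamp_hour to every document."""
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
        "script": {"lang": "painless", "source": HOUR_SCRIPT},
    }


# Hypothetical index names, for illustration only.
reindex_body = build_reindex_body("logs-old", "logs-new")
print(json.dumps(reindex_body, indent=2))
```

The printed JSON is what you would send as the body of POST _reindex. The hour is computed in whatever offset the stored string carries, which for a trailing Z is UTC.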
You could also do this with _update_by_query, because adding a new field is a change you can make on an existing index. You'll end up with deleted documents in your index, but merges should remove them over time. Not all of them, but a bunch. That might be easier to deal with, depending on what you are doing.
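The in-place variant could look roughly like this. Again a sketch with assumptions: the script is Painless on a recent Elasticsearch, @timestamp is an ISO-8601 string with an offset in _source, and the must_not/exists query is my own addition so that a rerun skips documents that already have timestamp_hour.

```python
import json


def build_update_by_query_body(new_field="timestamp_hour", date_field="@timestamp"):
    """Build an _update_by_query body that fills new_field where it is missing."""
    # Assumption: date_field is an ISO-8601 string with an offset in _source.
    script_source = (
        f"ctx._source['{new_field}'] = "
        f"ZonedDateTime.parse(ctx._source['{date_field}']).getHour()"
    )
    return {
        "script": {"lang": "painless", "source": script_source},
        # Only touch documents that do not have the new field yet.
        "query": {"bool": {"must_not": {"exists": {"field": new_field}}}},
    }


update_body = build_update_by_query_body()
print(json.dumps(update_body, indent=2))
```

You would send this as the body of POST <your-index>/_update_by_query. Because each updated document is rewritten, the old copies are the deleted documents mentioned above.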