Hello guys,
We need to remove some of the fields from our documents to shrink the index size.
After some googling, my conclusions are:
There is no choice and we have to reindex the documents.
It is worth doing it with the scan and bulk update APIs.
My question is: what is the best way to edit the documents?
One way I see is to get the _source field and edit it.
Another way I'm considering is to use the partial_fields option, which actually looks much more convenient to work with.
The problem is that I'm not sure all the information needed to reindex the document properly will be present in the returned results. In other words, can I use the returned fields to reindex without losing information, or do I have to use _source?
The question is: do you really have to remove the fields from your index, or
is it sufficient to mark them as not_analyzed in your mapping? I would test
that out before removing data you need, which might then have to be queried
from another data store, resulting in expensive query round trips again.
Regarding changing your documents: getting the source, changing the JSON, and
reindexing might be the best way to go here. I would always go with the
source, but this depends on whether you excluded something from it when indexing
(I usually try not to do that, to prevent exactly this situation).
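
To make the "get the source, change the JSON, reindex" approach concrete, here is a rough sketch in Python. It only shows the document-editing part: it takes hits as they come back from a scan/scroll request and builds the newline-delimited body for a bulk request. The field names, the target index name, and the helper names are made up for illustration, and it assumes the unwanted fields are top-level keys in _source.

```python
import json

# Hypothetical names of the fields we want to drop from every document.
FIELDS_TO_DROP = {"raw_html", "debug_info"}


def strip_fields(source, fields_to_drop):
    """Return a copy of a document's _source without the unwanted top-level fields."""
    return {key: value for key, value in source.items() if key not in fields_to_drop}


def to_bulk_body(hits, target_index, fields_to_drop):
    """Build a bulk request body from search hits.

    Each document becomes two lines: an "index" action line carrying the
    target index, type and id, followed by the trimmed source. The body
    ends with a newline, as the bulk format requires.
    """
    lines = []
    for hit in hits:
        action = {
            "index": {
                "_index": target_index,  # assumed name of the new, smaller index
                "_type": hit["_type"],
                "_id": hit["_id"],
            }
        }
        lines.append(json.dumps(action))
        lines.append(json.dumps(strip_fields(hit["_source"], fields_to_drop)))
    if not lines:
        return ""
    return "\n".join(lines) + "\n"
```

You would call to_bulk_body once per page of scan results and send the returned string to the bulk endpoint. Because it works from the full _source, nothing is lost apart from the fields you drop on purpose, provided nothing was excluded from _source at index time.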