Reindexing while editing documents using the scan API

sashao · October 31, 2013, 5:04pm

Hello guys,
We need to remove some of the fields from our documents to shrink the index size.
After some googling, my conclusions are:

there is no choice and we have to reindex the documents.
it is worth doing it using the scan and bulk APIs.

My question is: what is the best way to edit a document?
One way I see is to get the _source field and edit it.
Another way I'm considering is to use the partial_fields option, which looks much more convenient to work with.
The problem is that I'm not sure all the information needed to reindex the document properly will be present in the returned results. In other words, can I use the returned fields to reindex without losing information, or do I have to use _source?

Thanks!!!

spinscale · November 14, 2013, 8:49am

Hey,

The question is: do you really have to remove the fields from your index, or
is it sufficient to mark them as not_analyzed in your mapping? I would test
that first, before removing data you still need, which might then have to be
queried from another data store, resulting in expensive query round trips.
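For anyone who wants to try the mapping route first, here is a rough sketch of what such a mapping body could look like for string fields. The helper name, the type name "doc", and the field names are made up for illustration, and it assumes the string field type of the Elasticsearch versions current at the time of this thread. Whether it actually saves enough space would have to be measured on a test index.

```python
import json
from typing import Any, Dict, Iterable


def not_analyzed_mapping(doc_type: str, field_names: Iterable[str]) -> Dict[str, Any]:
    """Build an index-creation body that maps the given string fields as not_analyzed.

    doc_type and field_names are placeholders; use your own type and field names.
    """
    properties = {
        name: {"type": "string", "index": "not_analyzed"}
        for name in field_names
    }
    return {"mappings": {doc_type: {"properties": properties}}}


if __name__ == "__main__":
    # Hypothetical type and fields, printed as the JSON body you would send
    # when creating a test index.
    print(json.dumps(not_analyzed_mapping("doc", ["tags", "raw_html"]), indent=2))
```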

Regarding changing your documents: getting the source, changing the JSON, and
reindexing might be the best way to go here. I would always go with the
source, but this depends on whether you excluded anything from it when
indexing (I usually try not to do that, to prevent exactly this situation).
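The "get the source, change the JSON, reindex" step could be sketched roughly like this. It is only an illustration under a few assumptions: the function names, the target index name, and the field names are invented, only top-level fields are removed, and each scan hit is assumed to carry _type, _id, and a complete _source. If you rely on custom routing or parent/child, that metadata would need to be carried over as well. The code only builds the newline-delimited body for the bulk API; fetching the scan pages and sending the request is left to whatever client you use.

```python
import json
from typing import Any, Dict, Iterable, List


def strip_fields(source: Dict[str, Any], fields_to_remove: Iterable[str]) -> Dict[str, Any]:
    """Return a copy of a document's _source without the named top-level fields."""
    unwanted = set(fields_to_remove)
    return {key: value for key, value in source.items() if key not in unwanted}


def build_bulk_body(hits: List[Dict[str, Any]], target_index: str,
                    fields_to_remove: Iterable[str]) -> str:
    """Turn one page of scan hits into a newline-delimited bulk request body.

    Each document becomes two lines: an "index" action line with the target
    index, type, and id, followed by the edited source.
    """
    unwanted = list(fields_to_remove)
    lines = []
    for hit in hits:
        action = {"index": {"_index": target_index,
                            "_type": hit["_type"],
                            "_id": hit["_id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(strip_fields(hit["_source"], unwanted)))
    if not lines:
        return ""
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # One hypothetical scan hit; "raw_html" stands in for a field to drop.
    page = [{"_index": "old_index", "_type": "doc", "_id": "1",
             "_source": {"title": "hello", "raw_html": "<p>hello</p>"}}]
    print(build_bulk_body(page, "new_index", ["raw_html"]), end="")
```

Working from _source this way keeps every field you did not explicitly drop, which is the reason for preferring it over rebuilding documents from a subset of returned fields.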

--Alex

