Reindex while editing documents using the scan API


(sashao) #1

Hello guys,
We need to remove some of the fields from our documents to shrink the index size.
After some googling, my conclusions are:

  1. There is no choice; we have to reindex the documents.
  2. It's worth doing it using the scan and bulk update APIs.
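For step 2, each batch of hits returned by a scan/scroll request can be turned into a bulk request body. Below is a minimal sketch in Python. It assumes hits shaped like ordinary search hits, carrying `_type`, `_id` and `_source` as in the mapping-type era of this thread. The names `bulk_index_body` and `target_index` are hypothetical, and the scrolling itself and the HTTP POST to `_bulk` are left out:

```python
import json
from typing import Iterable, Mapping


def bulk_index_body(hits: Iterable[Mapping], target_index: str) -> str:
    """Build a newline-delimited bulk body that reindexes scan/scroll hits.

    The bulk API expects one action line followed by one source line per
    document, and the body must end with a newline.
    """
    lines = []
    for hit in hits:
        # Action line: index the document under the same type and id,
        # but into the (hypothetical) target index.
        action = {"index": {"_index": target_index,
                            "_type": hit["_type"],
                            "_id": hit["_id"]}}
        lines.append(json.dumps(action))
        # Source line: the (possibly edited) document itself.
        lines.append(json.dumps(hit["_source"]))
    # Terminate every line, including the last one; an empty batch yields "".
    return "".join(line + "\n" for line in lines)
```

Writing into a separate target index keeps the original documents untouched until the new index has been checked.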

My question is: what is the best way to edit a document?
One way I see is to get the _source field and edit it.
Another option I'm considering is the partial_fields option, which actually looks much more convenient to deal with.
The problem is that I'm not sure all the information needed to reindex the document properly will be in the returned results. In other words, can I use the returned fields to reindex without losing information, or do I have to use _source?

Thanks!!!


(Alexander Reelsen) #2

Hey,

the question is: do you really have to remove the fields from your index, or
is it sufficient to mark them as not_analyzed in your mapping? I would test
that first, before removing data you need, which might then have to be
queried from another data store, resulting in expensive query round trips again.

Regarding changing your documents: getting the source, changing the JSON and
reindexing is probably the best way to go here. I would always go with the
source, but that depends on whether you excluded anything from it when
indexing (I usually try not to do that, to avoid exactly this situation).
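Editing the `_source` as suggested above can be as simple as deleting keys from the parsed JSON before reindexing. Below is a minimal sketch in Python. It assumes the fields to drop are given as dot-separated paths into nested objects; `strip_fields` and the example field names are hypothetical. It only helps if `_source` was stored in full, which is exactly the caveat above:

```python
import copy
from typing import Any, Dict, Iterable


def strip_fields(source: Dict[str, Any], paths: Iterable[str]) -> Dict[str, Any]:
    """Return a copy of a document's _source without the given fields.

    Paths use dots for nested objects (e.g. "meta.raw"). Missing paths are
    ignored, so the same list can be applied to every document. Arrays of
    objects are not traversed in this sketch.
    """
    doc = copy.deepcopy(source)  # never mutate the hit we were given
    for path in paths:
        *parents, leaf = path.split(".")
        node = doc
        # Walk down to the object that should contain the leaf key.
        for key in parents:
            node = node.get(key) if isinstance(node, dict) else None
            if node is None:
                break
        # Drop the leaf only if we actually reached an object.
        if isinstance(node, dict):
            node.pop(leaf, None)
    return doc
```

For example, `strip_fields(src, ["body", "meta.raw"])` on `{"title": "t", "body": "long text", "meta": {"raw": "x", "lang": "en"}}` returns `{"title": "t", "meta": {"lang": "en"}}` and leaves `src` unchanged.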

--Alex

On Thu, Oct 31, 2013 at 6:04 PM, sashao <alexander.ostrikov@gmail.com> wrote:

--
View this message in context:
http://elasticsearch-users.115913.n3.nabble.com/reindex-while-edit-documents-using-scan-api-tp4043548.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


