Frequently updating index entries

Michael_Korbakov · July 8, 2010, 1:42pm

Hi everyone.

In our project we're going to have several million documents
indexed by Elastic Search. Every day about 10% of all documents are
updated. The updated fields are numeric values, such as view counts, that
we're going to use for scoring.

I have some questions regarding this situation:

Is there any simpler way to update an index entry other than fetching
the whole document by _source and then reindexing the modified version with
the same id?
How badly could index fragmentation hit me in this scenario?
Are there any recommendations on index options for frequently updated fields?

Thanks everyone!

-- Michael Korbakov

kimchy · July 8, 2010, 2:11pm

On Thu, Jul 8, 2010 at 4:42 PM, Mykhailo Korbakov rmihael@gmail.com wrote:

Is there any simpler way to update an index entry other than fetching
the whole document by _source and then reindexing the modified version with
the same id?

There is no way to do a partial update, so you need to fetch the document,
update it, and index it back.
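A minimal sketch of that fetch, update and index-back cycle. The `/index/type/id` path layout follows the REST API of the time, but `reindex_with_update`, `fake_get`, `fake_put`, the `docs`/`page` names and the `views` field are hypothetical. A real client would send GET and PUT requests to a node; here both are stubbed with an in-memory dict so the example runs without a cluster.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical transport hooks: get_json(path) returns the parsed JSON response,
# put_json(path, body) sends a JSON body and returns the parsed response.
GetFn = Callable[[str], Dict[str, Any]]
PutFn = Callable[[str, str], Dict[str, Any]]


def reindex_with_update(get_json: GetFn, put_json: PutFn, index: str,
                        doc_type: str, doc_id: str,
                        changes: Dict[str, Any]) -> Dict[str, Any]:
    """Fetch a document, apply `changes` to its _source, and index it back under the same id."""
    path = f"/{index}/{doc_type}/{doc_id}"
    # 1. Fetch: the stored JSON document is returned under "_source".
    source = dict(get_json(path)["_source"])
    # 2. Update: overwrite only the changed fields (e.g. a new view count) on a copy.
    source.update(changes)
    # 3. Index back: the whole document is sent again and replaces the old version.
    put_json(path, json.dumps(source))
    return source


# In-memory stand-in for the cluster so the sketch runs offline (hypothetical data).
store: Dict[str, Dict[str, Any]] = {"/docs/page/1": {"title": "Home", "views": 41}}


def fake_get(path: str) -> Dict[str, Any]:
    return {"_id": path.rsplit("/", 1)[1], "_source": store[path]}


def fake_put(path: str, body: str) -> Dict[str, Any]:
    store[path] = json.loads(body)
    return {"ok": True, "_id": path.rsplit("/", 1)[1]}


updated = reindex_with_update(fake_get, fake_put, "docs", "page", "1", {"views": 42})
```

The sketch does no concurrency control: if two processes update the same document at the same time, the later index call silently overwrites the other's change.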

How badly could index fragmentation hit me in this scenario?

There will be fragmentation, but it will slowly be merged out.
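To put a rough number on that fragmentation: in Lucene an update is a delete of the old version plus an add of the new one, so each update leaves one deleted entry on disk until a merge drops it. Assuming N documents and 0.1N updates per day, as described in the question, and the worst case where no merge has reclaimed anything yet, the fraction of deleted entries after d days is:

```latex
f(d) = \frac{0.1\,d\,N}{N + 0.1\,d\,N} = \frac{d}{10 + d},
\qquad f(1) = \frac{1}{11} \approx 9.1\%,
\qquad f(7) = \frac{7}{17} \approx 41\%
```

Merges drop the deleted entries of the segments they combine, so the real figure stays below this bound; how far below depends on the merge settings.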

Are there any recommendations on index options for frequently updated fields?

Nothing special is needed; it's a perfectly valid use case.

Michael_Korbakov · July 8, 2010, 2:26pm

On Thu, Jul 8, 2010 at 5:11 PM, Shay Banon shay.banon@elasticsearch.com wrote:

How badly could index fragmentation hit me in this scenario?

There will be fragmentation, but it will slowly be merged out.

Thank you for answering, Shay.

Just to put my mind completely at ease: is there any way to monitor
fragmentation? Maybe I'll have to tune the merger somehow to reduce it,
etc.

kimchy · July 8, 2010, 2:30pm

There isn't currently an API that returns it, but you can go to each
shard's storage directory and check the number of files; the file count
reflects the number of segments. There are parameters to control merging
(such as the merge_factor), and there is an API to force an "optimization".
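Following that suggestion, a small script can count the files in a shard's index directory and, a bit more precisely, the distinct Lucene segments they belong to. This is a sketch assuming Lucene's file naming of that era (per-segment files prefixed with `_<name>`); the on-disk path of a shard is not given in the thread, and `count_files_and_segments` is a hypothetical helper. The demo runs on made-up file names in a temporary directory.

```python
import os
import re
import tempfile
from typing import Tuple

# Per-segment Lucene files start with the segment name: "_0.cfs", "_1a.fdt",
# or "_0_1.del" (deletions for segment _0). Files such as "segments_N",
# "segments.gen" and "write.lock" describe the index as a whole, not a segment.
_SEGMENT_NAME = re.compile(r"^(_[0-9a-z]+)")


def count_files_and_segments(index_dir: str) -> Tuple[int, int]:
    """Return (total files, distinct segment names) in one shard's Lucene index directory."""
    files = [f for f in os.listdir(index_dir)
             if os.path.isfile(os.path.join(index_dir, f))]
    segments = set()
    for name in files:
        match = _SEGMENT_NAME.match(name)
        if match:
            segments.add(match.group(1))
    return len(files), len(segments)


# Demo on a throwaway directory with hypothetical file names.
demo_names = ["_0.cfs", "_0_1.del", "_1.cfs", "_2.fdt", "_2.fdx",
              "segments_3", "segments.gen", "write.lock"]
with tempfile.TemporaryDirectory() as demo_dir:
    for name in demo_names:
        open(os.path.join(demo_dir, name), "w").close()
    demo_files, demo_segments = count_files_and_segments(demo_dir)
```

Counting distinct segment names rather than raw files avoids overcounting when a segment is stored as several files (non-compound format) or has a separate deletions file. Sampling this number over time gives a rough picture of whether merging keeps up with the update rate.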

After 0.9, I am going to provide a full set of APIs for index-level "info" and
"stats", similar to what the current version provides for nodes. This
information will be exposed there.

-shay.banon
