Frequently updating index entries


(Michael Korbakov) #1

Hi everyone.

In our project we're going to have several million documents
indexed by Elastic Search. Every day about 10% of all documents are
updated. The updated fields are numerical values, like view counts, that
we're going to use for scoring.

I have some questions regarding this situation:

  1. Is there any simpler way to update an index entry than fetching
    the whole document by _source and then reindexing the modified version with
    the same id?
  2. How badly could index fragmentation hit me in this scenario? :)
  3. Any recommendations on index options for frequently updated fields?

Thanks everyone!

-- Michael Korbakov


(Shay Banon) #2

On Thu, Jul 8, 2010 at 4:42 PM, Mykhailo Korbakov rmihael@gmail.com wrote:

  1. Is there any simpler way to update an index entry than fetching
    the whole document by _source and then reindexing the modified version with
    the same id?

There is no way to do a partial update, so you need to fetch the document,
update it, and index it back.
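
For illustration, the fetch, update, and index-back cycle might look roughly like the sketch below. The `InMemoryClient` class and its `get_source` / `index` methods are hypothetical stand-ins, not the actual Elastic Search client API; a real implementation would issue the corresponding get and index requests instead.

```python
import copy


class InMemoryClient:
    """Hypothetical stand-in for a search client; keeps documents in a dict."""

    def __init__(self):
        self.docs = {}

    def get_source(self, index, doc_id):
        # Return a copy so callers cannot mutate the stored document directly.
        return copy.deepcopy(self.docs[(index, doc_id)])

    def index(self, index, doc_id, source):
        # Indexing with an existing id replaces the whole document.
        self.docs[(index, doc_id)] = copy.deepcopy(source)


def increment_field(client, index, doc_id, field, delta=1):
    """Fetch the full document, change one numeric field, reindex under the same id."""
    source = client.get_source(index, doc_id)
    source[field] = source.get(field, 0) + delta
    client.index(index, doc_id, source)
    return source


client = InMemoryClient()
client.index("articles", "42", {"title": "Hello", "view_count": 10})
updated = increment_field(client, "articles", "42", "view_count", delta=5)
print(updated)
```

Note that the whole document is written back, so every other field has to be carried along unchanged. This sketch also ignores concurrent writers that might modify the same document between the fetch and the reindex.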

  2. How badly could index fragmentation hit me in this scenario? :)

There will be fragmentation, but it will slowly be merged out.
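
Some background on why this happens: Lucene treats an update as a delete followed by an add, so the old copy of a document lingers in its segment until a merge rewrites that segment. The toy model below sketches how a merge factor keeps the segment count bounded over time. It is a deliberate simplification (it assumes equally sized flushes and ignores deleted documents), not the real merge policy, and the function names are made up for this example.

```python
def flush_segment(levels, merge_factor):
    """Record one newly flushed segment and cascade merges.

    levels[i] is the number of live segments at merge level i. Once a level
    holds merge_factor segments, they are merged into one segment at the
    next level up.
    """
    if merge_factor < 2:
        raise ValueError("merge_factor must be at least 2")
    level = 0
    while True:
        if level == len(levels):
            levels.append(0)
        levels[level] += 1
        if levels[level] < merge_factor:
            return
        levels[level] = 0  # these segments were merged away
        level += 1         # the merged result lands one level up


def segments_after(flushes, merge_factor):
    """Return how many segments remain after the given number of flushes."""
    levels = []
    for _ in range(flushes):
        flush_segment(levels, merge_factor)
    return sum(levels)


for flushes in (9, 10, 25, 100):
    print(flushes, segments_after(flushes, merge_factor=10))
```

In this model the segment count grows only logarithmically with the number of flushes, which is the sense in which fragmentation gets "merged out".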

  3. Any recommendations on index options for frequently updated fields?

Nothing special for this case; it's a very valid case.


(Michael Korbakov) #3

On Thu, Jul 8, 2010 at 5:11 PM, Shay Banon shay.banon@elasticsearch.com wrote:

  1. Is there any simpler way to update an index entry than fetching
    the whole document by _source and then reindexing the modified version with
    the same id?

There is no way to do a partial update, so you need to fetch the document,
update it, and index it back.

  2. How badly could index fragmentation hit me in this scenario? :)

There will be fragmentation, but it will slowly be merged out.

  3. Any recommendations on index options for frequently updated fields?

Nothing special for this case; it's a very valid case.

Thank you for answering, Shay.

Just to put my mind completely at ease: is there any way to monitor
fragmentation? Maybe I'll have to tune the merger somehow to reduce it,
etc.


(Shay Banon) #4

There isn't currently an API that returns this value, but you can go to each
shard's storage and check the number of files, which reflects the number of
segments. There are parameters to control it (such as the merge_factor), and
there is an API to force "optimization".
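
As a rough sketch of the file-counting approach, the snippet below groups the files in one shard directory by segment name. It assumes the usual Lucene naming convention, where per-segment files start with an underscore followed by the segment name (for example `_0.cfs` or `_1_1.del`), while `segments_N`, `segments.gen`, and `write.lock` are not segments themselves. The directory here is a temporary one created for the demo; where the shard storage actually lives depends on your installation.

```python
import os
import re
import tempfile

# Per-segment files look like "_<name>.<ext>" or "_<name>_<gen>.<ext>".
SEGMENT_FILE = re.compile(r"^_([0-9a-z]+)[._]")


def count_segments(shard_dir):
    """Count distinct segment names among the files in a shard directory."""
    names = set()
    for filename in os.listdir(shard_dir):
        match = SEGMENT_FILE.match(filename)
        if match:
            names.add(match.group(1))
    return len(names)


# Demo on a throwaway directory that mimics a small shard with two segments.
with tempfile.TemporaryDirectory() as shard_dir:
    for name in ("_0.cfs", "_1.fdt", "_1.fdx", "_1_1.del",
                 "segments_3", "segments.gen", "write.lock"):
        open(os.path.join(shard_dir, name), "w").close()
    demo_count = count_segments(shard_dir)

print(demo_count)
```

Running this periodically against each shard directory gives a crude trend line: a count that keeps climbing suggests merging is not keeping up with the update rate.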

After 0.9, I am going to provide a full set of APIs for index-level "info" and
"stats", similar to what the current version provides for nodes. This
information will be exposed there.

-shay.banon

On Thu, Jul 8, 2010 at 5:26 PM, Mykhailo Korbakov rmihael@gmail.com wrote:

Thank you for answering, Shay.

Just to put my mind completely at ease: is there any way to monitor
fragmentation? Maybe I'll have to tune the merger somehow to reduce it,
etc.

