Shard selection

Hi,

I have a question pertaining to shard selection when initially indexing a
doc and later updating it.

Let's assume I have two shards named A and B.

When a doc is first indexed, a shard is selected based on a routing value
derived from a field; let's say shard A is selected.

If a doc with the same Id is later re-indexed but with field values which
would lead to shard B being selected, how does Elasticsearch handle the
deletion of the doc in shard A prior to adding the new version of the doc
in shard B?

My guess is that prior to adding a doc, docs with the same Id are deleted
from all shards. Is that correct?

Mathias.

--

Hi,

Hmm... maybe if you could provide a simple curl test script (a recreation),
it would help us understand exactly how you do sharding/routing.

From what you say, I am afraid you probably have to delete the document
first. AFAIK ES does not delete the document from other shards when it is
updated. But don't take my word for it (that is why I said a small
recreation script would help).

Regards,
Lukas

On Fri, Dec 14, 2012 at 9:55 AM, Mathias Herberts <
mathias.herberts@gmail.com> wrote:

--

The idea would be to shard by geographical region. Occasionally, though, a
moving point might move from one region to another (think transatlantic
flight: a phone is shut down in London and turned back on in New York). If
ES does not delete from all shards, then once the location has shifted
region, the doc would be indexed in a new shard, and a doc with the same Id
would now be indexed in two shards.
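
To make the scenario concrete, here is a minimal toy model in Python. It does not talk to Elasticsearch: the names `shard_for`, `index_doc` and `shards_holding`, the routing values "eu" and "na", and the byte-sum hash are all assumptions made up for illustration, not what ES does internally.

```python
NUM_SHARDS = 2  # stands in for shards A and B


def shard_for(routing: str) -> int:
    """Toy routing hash (byte sum modulo shard count); not the hash ES uses."""
    return sum(routing.encode("utf-8")) % NUM_SHARDS


def index_doc(shards, doc_id, routing, source):
    """Write the doc to the one shard selected by its routing value."""
    shards[shard_for(routing)][doc_id] = source


def shards_holding(shards, doc_id):
    """Return the numbers of all shards that contain doc_id."""
    return [n for n, docs in enumerate(shards) if doc_id in docs]


shards = [{} for _ in range(NUM_SHARDS)]
index_doc(shards, "phone-1", "eu", {"city": "London"})    # first indexing
index_doc(shards, "phone-1", "na", {"city": "New York"})  # region changed
print(shards_holding(shards, "phone-1"))  # the Id now lives in both shards
```

In this model, the second call only touches the shard chosen by the new routing value, so the stale London copy stays where it was.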

On Fri, Dec 14, 2012 at 10:00 AM, Lukáš Vlček <lukas.vlcek@gmail.com> wrote:

--

Hi,

I think you might want to look at parent/child instead for such a use case.
If you store the geographical info in the child, then you can update just
the child document when this information changes.

Regards,
Lukas

On Fri, Dec 14, 2012 at 10:32 AM, Mathias Herberts <
mathias.herberts@gmail.com> wrote:

--

Yes, in this scenario, Elasticsearch might create two records with the same
id in different shards, since it assumes that a record's routing doesn't
change. A workaround for this issue would be to explicitly delete the old
version of the record before creating a new version. If you don't know the
old routing when you create the new record, you can add
"_routing": { "required": true } to your mapping, and then execute a delete
command without routing specified, which will broadcast the delete to all
shards.
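
The delete-then-index order can be sketched with a toy in-memory model in Python. Nothing here calls Elasticsearch: `shard_for`, `broadcast_delete` and `reindex` are hypothetical names, the routing values "eu" and "na" are made up, and the byte-sum hash only stands in for the real routing hash.

```python
NUM_SHARDS = 2  # stands in for shards A and B


def shard_for(routing: str) -> int:
    """Toy routing hash (byte sum modulo shard count); not the hash ES uses."""
    return sum(routing.encode("utf-8")) % NUM_SHARDS


def broadcast_delete(shards, doc_id):
    """Remove doc_id from every shard and return how many copies were removed."""
    return sum(1 for docs in shards if docs.pop(doc_id, None) is not None)


def reindex(shards, doc_id, new_routing, source):
    """Delete the old copy wherever it lives, then index under the new routing."""
    broadcast_delete(shards, doc_id)
    shards[shard_for(new_routing)][doc_id] = source


shards = [{} for _ in range(NUM_SHARDS)]
reindex(shards, "phone-1", "eu", {"city": "London"})
reindex(shards, "phone-1", "na", {"city": "New York"})
# Only one shard holds the Id after the move.
print([n for n, docs in enumerate(shards) if "phone-1" in docs])
```

Because the delete visits every shard, the caller does not need to remember the old routing value. The cost is one extra operation per write.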
