0.90.2 _update or _bulk update causing NullPointerException in logs and I start losing shards

Eric_Sites · August 4, 2013, 11:34pm

I am getting a java.lang.NullPointerException in my Elasticsearch
cluster logs when I do a _bulk update or just a plain _update.
I am sending a lot of data to my clusters. After I get this error I lose a
shard and it has to be recreated.

version 0.90.2

gist: https://gist.github.com/EricSites/6152468

I get this using the _bulk API or just the normal _update API.

My update script is a little complicated.
I am adding a tracking object to my document if it does not exist. There
should only be one of these, and it should not be an array.
If the object does exist, I am trying to add a new field to the tracking
object to keep track of counts.
So if the field does not exist I create it; otherwise I just += to it.

if (ctx._source['tracking'] != null) {
    if (ctx._source.tracking['some_action'] != null) {
        ctx._source.tracking.some_action += param1;
    } else {
        ctx._source.tracking['some_action'] = 1;
    }
} else {
    ctx._source.tracking = new_tracking;
}
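
For anyone who wants to check the three branches of this script without a cluster, here is a minimal local sketch in plain JavaScript. It is not Elasticsearch code: `applyTrackingUpdate` is a made-up helper, the params are passed as an object instead of being exposed as script variables, and the value of `new_tracking` is an assumption, since the original post does not show it.

```javascript
// Local simulation of the update script's three branches (not Elasticsearch code).
// applyTrackingUpdate is a hypothetical helper used only for illustration.
function applyTrackingUpdate(ctx, params) {
  if (ctx._source['tracking'] != null) {
    if (ctx._source.tracking['some_action'] != null) {
      // Counter already exists: increment it by param1.
      ctx._source.tracking.some_action += params.param1;
    } else {
      // Tracking object exists but this counter does not: start it at 1.
      ctx._source.tracking['some_action'] = 1;
    }
  } else {
    // No tracking object yet: attach the one supplied in params.
    ctx._source.tracking = params.new_tracking;
  }
  return ctx._source;
}

// Assumed params; the shape of new_tracking is a guess.
const params = { param1: 1, new_tracking: { some_action: 1 } };

const noTracking = applyTrackingUpdate({ _source: { title: 'a' } }, params);
const noCounter = applyTrackingUpdate(
  { _source: { tracking: { other_action: 4 } } }, params);
const hasCounter = applyTrackingUpdate(
  { _source: { tracking: { some_action: 2 } } }, params);

console.log(JSON.stringify(noTracking));
console.log(JSON.stringify(noCounter));
console.log(JSON.stringify(hasCounter));
```

Each call exercises one branch: no tracking object, a tracking object without the counter, and an existing counter.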

Here is my mapping for this:
{
    "sample" : {
        "index_options" : "docs",
        "properties" : {
            "tracking" : {
                "type" : "object",
                "dynamic" : true
            }
        }
    }
}

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Boaz_Leskes · August 5, 2013, 2:35pm

Hi Eric,

This is interesting. The log stack trace from the gist comes from the bulk
calls. Can you also post one from a failed _update? Cross-checking them
might help pinpoint the issue.

Cheers,
Boaz

Eric_Sites · August 5, 2013, 4:40pm

Boaz,

I found and fixed the problem.

I added "lang" : "js" to the update JSON, which was not needed before in
ES 0.90.0.
I also changed the name of new_tracking to match the name of the action in
the params section.
So, for example, the script now looks like this:

if (ctx._source['tracking'] != null) {
    if (ctx._source.tracking['some_action'] != null) {
        ctx._source.tracking.some_action += param1;
    } else {
        ctx._source.tracking['some_action'] = 1;
    }
} else {
    ctx._source.tracking = new_some_action;
}

"params" : { "param1" : 1, "new_some_action" : { "some_action" : 1 } }

Cheers,
Eric Sites
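
As a rough illustration of the change described above, the request body could be assembled so that the "new_..." param name always follows the action name. This is only a sketch: `buildUpdateBody` is a hypothetical helper, and it assumes the 0.90-era update body with top-level "script", "lang" and "params" fields.

```javascript
// Sketch: build an _update body whose param name matches the action.
// buildUpdateBody is a hypothetical helper, not part of any client library.
function buildUpdateBody(action, increment) {
  // The action name is spliced into the script text, so only allow
  // identifier-like names.
  if (!/^[A-Za-z_]\w*$/.test(action)) {
    throw new Error('unsupported action name: ' + action);
  }
  const newParam = 'new_' + action; // e.g. "new_some_action"
  const script =
    "if (ctx._source['tracking'] != null) {" +
    " if (ctx._source.tracking['" + action + "'] != null) {" +
    " ctx._source.tracking['" + action + "'] += param1;" +
    " } else {" +
    " ctx._source.tracking['" + action + "'] = 1;" +
    " }" +
    " } else {" +
    " ctx._source.tracking = " + newParam + ";" +
    " }";
  return {
    script: script,
    lang: 'js', // assumed top-level field, as described in the post
    params: { param1: increment, [newParam]: { [action]: 1 } },
  };
}

const body = buildUpdateBody('some_action', 1);
console.log(JSON.stringify(body, null, 2));
```

Deriving both the script text and the param name from the same action string keeps them from drifting apart when there are several similar scripts.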

Boaz_Leskes · August 5, 2013, 9:38pm

Hi Eric,

Glad to hear you solved it. It would be great if you could share the failure
logs from the _update (non-bulk) call. A failed script shouldn't cause
shards to drop, so I would like to look into it some more.

Cheers,
Boaz

Eric_Sites · August 6, 2013, 3:09am

Boaz,

Sorry, but I no longer have those logs. I upgraded from 0.90.0 to 0.90.2 and
wiped the logs when I did.
I did the upgrade so I could use the _bulk API for my updates.

Basically, the "lang" : "js" setting was not the issue.

I was using different scripts with the same set of params and an upsert. The
fix was to use a different param name for each script, about 10 unique
scripts in total.

I was losing replica shards about every 10,000 to 30,000 updates, never
the primary shard.

I have 185 million+ large JSON documents, with 100 shards in 1 index and 1
replica, so 200 shards total over 6 servers. Each shard is about 10.4 GB
in size.
That is about 2 TB of data: 1 TB primary, 1 TB replica.

Cheers,
Eric Sites

Boaz_Leskes · August 7, 2013, 4:39pm

Hi Eric,

OK. Based on the gist you sent, I tracked down a problem and fixed
it: "Null pointer exceptions when bulk updates max out their retry on
conflict" (issue #3448 in elastic/elasticsearch on GitHub). Thanks!!
The fix is part of 0.90.3, so I'd recommend upgrading. This is a secondary
problem which occurs when two requests try to update the same document at
exactly the same time. One of them succeeds and the other fails with a
version conflict (that error was masked by the error you were seeing). You
can use (or increase) the retry_on_conflict parameter to make the failing
request try again.
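
A minimal sketch of what setting the retry count on bulk updates might look like. It assumes the 0.90-era bulk format, where the update action metadata carries "_retry_on_conflict" (for a single _update call the equivalent would be the retry_on_conflict URL parameter). The index name "my_index" and the helper `buildBulkUpdate` are made up for illustration; "sample" is the type from the mapping in the thread.

```javascript
// Sketch: build a newline-delimited _bulk body of scripted updates,
// each with a retry count for version conflicts.
// buildBulkUpdate and "my_index" are hypothetical.
function buildBulkUpdate(docs, retries) {
  const lines = [];
  for (const doc of docs) {
    // Action/metadata line (assumed 0.90-style "_retry_on_conflict" key).
    lines.push(JSON.stringify({
      update: {
        _index: 'my_index',
        _type: 'sample',
        _id: doc.id,
        _retry_on_conflict: retries,
      },
    }));
    // Payload line: the scripted update itself.
    lines.push(JSON.stringify({
      script: doc.script,
      lang: 'js',
      params: doc.params,
    }));
  }
  // The bulk API expects a trailing newline after the last line.
  return lines.join('\n') + '\n';
}

const bulkBody = buildBulkUpdate(
  [{
    id: '1',
    script: 'ctx._source.tracking.some_action += param1;',
    params: { param1: 1 },
  }],
  3
);
console.log(bulkBody);
```

The retry count is a trade-off: higher values make concurrent updates to the same document more likely to succeed, at the cost of extra work on conflicts.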

I'm still curious about your report of losing replicas. Can you
elaborate on what happens? Do you see anything in the logs?

Cheers,
Boaz

rohit_jaiswal · June 16, 2014, 8:26pm

Hi Boaz,
We are using 0.90.2 and have run into this issue. As I understand it, one
option is to upgrade to 0.90.3. If we continue using 0.90.2 and set (or
increase) retry_on_conflict, will we no longer see the problem? Please
clarify.

Thanks,
Rohit

Boaz_Leskes · June 22, 2014, 7:50pm

Hi Rohit,

With this bug the update would fail anyway (because of the version
conflict), but the bug also breaks the entire request. You should indeed
set the retry_on_conflict option to make the update request succeed.
PS: you should really upgrade. A lot has happened and been fixed
since 0.90.2.

Cheers,
Boaz

rohit_jaiswal · June 22, 2014, 8:01pm

Hi Boaz,
Thanks for replying. After we get this error, the cluster
health changes to Yellow with a replica shard in Unassigned state. Is there
a specific way to recover that shard? We don't want to lose other data on
that shard.

Thanks,
Rohit

On Sun, Jun 22, 2014 at 12:50 PM, Boaz Leskes b.leskes@gmail.com wrote:

Hi Rohit,

With this issue the update fails anyway, but it also breaks the entire request.
You should indeed set the retry_on_conflict option to make the update
request succeed. PS - you should really upgrade - a lot has changed and
been fixed since 0.90.2 ...
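For anyone wondering where the option goes in a bulk call, here is a rough sketch that builds the two lines for one scripted update. The field names (_retry_on_conflict in the action line; script, lang and params in the body) are how I remember the bulk API documentation of that era, so treat them as assumptions and check the docs for your version. The index name "my_index", the id "42" and the helper name bulkUpdateLines are placeholders; "sample" is the type from the mapping in the first post.

```javascript
// Builds the action line and the body line for one scripted update
// inside a _bulk request (newline-delimited JSON).
// Field names are assumptions based on the bulk API docs; verify for your version.
function bulkUpdateLines(index, type, id, script, params, retries) {
  const action = {
    update: { _index: index, _type: type, _id: id, _retry_on_conflict: retries }
  };
  const body = { script: script, lang: "js", params: params };
  return JSON.stringify(action) + "\n" + JSON.stringify(body) + "\n";
}

const script = "ctx._source.tracking.some_action += param1;";
// "my_index" and "42" are placeholders for illustration.
const payload = bulkUpdateLines("my_index", "sample", "42", script, { param1: 1 }, 3);
console.log(payload);
```

The string ends with a newline because the bulk endpoint expects every line, including the last one, to be newline-terminated.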

Cheers,
Boaz

On Monday, June 16, 2014 10:26:06 PM UTC+2, Rohit Jaiswal wrote:

Hi Boaz,
We are using 0.90.2 and ran into this issue. As I
understand it, one option is to upgrade to 0.90.3. If we stay on 0.90.2
and use (or increase) retry_on_conflict, will we no longer see the problem? Please
clarify.

Thanks,
Rohit

On Wednesday, August 7, 2013 9:39:56 AM UTC-7, Boaz Leskes wrote:

Hi Eric,

OK. Based on the gist you sent, I tracked down a problem and fixed it:
Null pointer exceptions when bulk updates max out their retry on conflict · Issue #3448 · elastic/elasticsearch · GitHub. Thanks!!
The fix is part of 0.90.3, so I'd recommend upgrading. This is a secondary
problem which occurs when two requests try to update the same document at
exactly the same time. One of them succeeds and the other fails with a
version conflict (that error was masked by the error you were seeing). You
can use (or increase) the retry_on_conflict parameter to make the failing
request try again.
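The conflict-and-retry behaviour described here can be illustrated with a small in-memory model. This is a sketch of the general optimistic-concurrency idea, not Elasticsearch code: VersionedDoc, updateWithRetry and the interfere hook are all hypothetical names, and the hook exists only to force a competing write between our read and our write.

```javascript
// In-memory stand-in for a versioned document (not Elasticsearch code).
class VersionConflictError extends Error {}

class VersionedDoc {
  constructor(source) {
    this.source = source;
    this.version = 1;
  }
  read() {
    return { version: this.version, source: JSON.parse(JSON.stringify(this.source)) };
  }
  // The write succeeds only if nobody else wrote since expectedVersion was read.
  write(expectedVersion, newSource) {
    if (expectedVersion !== this.version) {
      throw new VersionConflictError("expected v" + expectedVersion + ", found v" + this.version);
    }
    this.source = newSource;
    this.version += 1;
  }
}

// Read-modify-write with up to retryOnConflict extra attempts.
// interfere is a hypothetical hook that lets a competing writer act
// between our read and our write, so the conflict is deterministic.
function updateWithRetry(doc, mutate, retryOnConflict, interfere) {
  for (let attempt = 0; ; attempt++) {
    const snapshot = doc.read();
    if (interfere) interfere(attempt);
    try {
      doc.write(snapshot.version, mutate(snapshot.source));
      return attempt; // number of retries that were needed
    } catch (err) {
      if (!(err instanceof VersionConflictError) || attempt >= retryOnConflict) throw err;
    }
  }
}

const increment = (src) => ({ count: src.count + 1 });
const doc = new VersionedDoc({ count: 0 });
// A competing writer updates the document once, during our first attempt only.
const rival = (attempt) => { if (attempt === 0) updateWithRetry(doc, increment, 0); };

const retriesUsed = updateWithRetry(doc, increment, 1, rival);
console.log(retriesUsed, doc.source.count, doc.version); // 1 2 3
```

With a retry budget of 0 the same call would surface the version conflict to the caller, which corresponds to the "other fails" case in the explanation above.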

I'm still curious about your report of losing replicas. Can you
elaborate more about what happens? Do you see anything in the logs?

Cheers,
Boaz

On Tuesday, August 6, 2013 5:09:26 AM UTC+2, Eric Sites wrote:

Boaz,

Sorry, but I no longer have those logs. I upgraded to 0.90.2 from 0.90.0
and wiped the logs when I did.
I did the upgrade to use the _bulk API for my updates.

Basically, the "lang", "js" was not the issue.

I was using different scripts with the same set of params and an
upsert. The fix was to use a different param name for each script,
about 10 unique scripts in total.

I was losing replicated shards about every 10,000 to 30,000 updates,
never the primary shard.

I have 185 million+ large JSON documents, with 100 shards in 1 index
with 1 replica, so 200 shards total over 6 servers. Each shard is about
10.4 GB in size.
About 2 TB of data: 1 TB primary, 1 TB replicated.
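As a quick check of those figures, the arithmetic works out as below. The only assumption added here is that shards are spread evenly across the servers, which the post does not state.

```javascript
// Figures quoted in the post above.
const primaryShards = 100;
const replicasPerPrimary = 1;
const shardSizeGb = 10.4;
const servers = 6;

const totalShards = primaryShards * (1 + replicasPerPrimary);
const primaryGb = primaryShards * shardSizeGb;
const totalGb = totalShards * shardSizeGb;
const shardsPerServer = totalShards / servers; // assumes an even spread

console.log(totalShards);                 // 200
console.log(Math.round(primaryGb));       // 1040
console.log(Math.round(totalGb));         // 2080
console.log(shardsPerServer.toFixed(1));  // 33.3
```

So roughly 1 TB of primary data and 2 TB including replicas, matching the totals quoted above.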

Cheers,
Eric Sites

From: Boaz Leskes b.le...@gmail.com
Reply-To: elasti...@googlegroups.com
Date: Monday, August 5, 2013 5:38 PM
To: elasti...@googlegroups.com
Subject: Re: 0.90.2 _update or _bulk update causing
NullPointerException in logs and I start losing shards

Hi Eric,

Glad to hear you solved it. It would be great if you can share the
failed logs from the _update (non bulk call). A failed script shouldn't
cause shards to drop so I would like to research it some more.

Cheers,
Boaz


Boaz_Leskes · June 22, 2014, 8:10pm

If you restart the node it's on, it doesn't come back?


rohit_jaiswal · June 22, 2014, 8:18pm

Yes, it did when we restarted the node while trying to reproduce this
problem. We also were able to access the data using the Scan search api
after restarting the node.

However we have seen quite a few of the bulk update errors in our 20-node
production cluster and have suffered data loss on other aliases (The alias
filter being the user-id) as well. We think the data loss is because of
this bulk update error.

Is there a chance of losing data on shards when enough of these bulk
updates happen concurrently on multiple aliases (users)?

Thanks


Boaz_Leskes · June 22, 2014, 8:30pm

Not that I know of. But there is a known but very rare bug (fixed in
0.90.8) which can cause data loss upon a node restart:

github.com/elastic/elasticsearch: Don't delete local shard data when its allocated on a node that doesn't exists
(opened 10:35AM, closed 10:37AM - 18 Dec 13 UTC · kimchy · >bug v1.0.0.RC1 v0.90.8)

This is an extreme case, exposed by a bug we had in our allocation in local gateway, causing a cluster state that doesn't include a node in the nodes list, but still has the shard in the routing table pointing at the non existent node. Then, when a node on the same box comes back, it will cause the local shard data to be deleted because it thinks its fully allocated on other nodes.

Maybe you ran into that?

On Sun, Jun 22, 2014 at 10:18 PM, Rohit Jaiswal rohit.jaiswal@gmail.com
wrote:

Yes, it did when we restarted the node while trying to reproduce this
problem. We also were able to access the data using the Scan search api
after restarting the node.

However we have seen quite a few of the bulk update errors in our 20-node
production cluster and have suffered data loss on other aliases (The alias
filter being the user-id) as well. We think the data loss is because of
this bulk update error.

Is there a chance of losing data on shards when enough of these bulk
updates happen concurrently on multiple aliases (users)?

Thanks

On Sun, Jun 22, 2014 at 1:10 PM, Boaz Leskes b.leskes@gmail.com wrote:

If you restart the node it's on, it doesn't come back?

On Sun, Jun 22, 2014 at 10:01 PM, Rohit Jaiswal rohit.jaiswal@gmail.com
wrote:

Hi Boaz,
Thanks for replying. After we get this error, the cluster
health changes to Yellow with a replica shard in Unassigned state. Is there
a specific way to recover that shard? We dont want to lose other data on
that shard.

Thanks,
Rohit

On Sun, Jun 22, 2014 at 12:50 PM, Boaz Leskes b.leskes@gmail.com
wrote:

Hi Rohit,

This issue means update fails anyway, but it breaks the entire request.
You should indeed set the retry_on_conflict option to make the update
request succeed. PS - you should really upgrade - a lot has happened and
was fixed since 0.90.2 ...

Cheers,
Boaz

On Monday, June 16, 2014 10:26:06 PM UTC+2, Rohit Jaiswal wrote:

Hi Boaz,
We are using 0.90.2 and run into this issue. As i
understand, one option is to upgrade to 0.90.3. If we continue using 0.90.2
and use (increase) retry_on_conflict, we will not see the problem? Please
clarify.

Thanks,
Rohit
On Wednesday, August 7, 2013 9:39:56 AM UTC-7, Boaz Leskes wrote:

HI Eric,

OK. Based on the gist you sent, i tracked down a problem at fixed it:
Null pointer exceptions when bulk updates max out their retry on conflict · Issue #3448 · elastic/elasticsearch · GitHub .
Thanks!! The fix is part of 0.90.3, so I'd recommend upgrading. This is a
secondary problem which occurs when two requests try to update the same
document at exactly the same time. One of them succeeds and the other fails
with a version conflict (that error was masked by the error you were
seeing). You can use (or increase) the retry_on_conflict parameter to make
the failing request try again.

I'm still curious about your reporting of loosing replicas. Can you
elaborate more about what happens? Do you see anything in the logs?

Cheers,
Boaz

On Tuesday, August 6, 2013 5:09:26 AM UTC+2, Eric Sites wrote:

Boaz,

Sorry but I no longer have those logs, I upgraded to 0.90.2 from
0.90.0 and wiped the logs when I did.
I did the upgrade to use the _bulk api for my update.

Basically the "lang", "js" was not the issue.

I was using different scripts with the same set of params and an
upcert. The fix was to use a different param name for different scripts,
about 10 unique scripts in total.

I was losing replicated shards about every 10,000 to 30,000 updates,
never the primary shard.

I have 185 million + large json documents, with 100 shards in 1
index with 1 replication, so 200 shards total over 6 servers. Each shard is
about 10.4 GB in size.
About 2 TB of data, 1 TB primary, 1 TB replicated.

Cheers,
Eric Sites

From: Boaz Leskes b.le...@gmail.com
Reply-To: elasti...@googlegroups.com
Date: Monday, August 5, 2013 5:38 PM
To: elasti...@googlegroups.com
Subject: Re: 0.90.2 _update or _bulk update causing
NullPointerException in logs and I start losing shards

Hi Eric,

Glad to hear you solved it. It would be great if you can share the
failed logs from the _update (non bulk call). A failed script shouldn't
cause shards to drop so I would like to research it some more.

Cheers,
Boaz

On Mon, Aug 5, 2013 at 6:40 PM, Eric Sites eric_...@mac.com wrote:

Boaz,

I found and fixed the problem.

I added "lang": "js" to the update JSON, which was not needed
before in ES 0.90.0.
I also renamed new_tracking so that it matches the name of the
action in the params section.
For example, the script now looks like this:

if (ctx._source['tracking'] != null) {
    if (ctx._source.tracking['some_action'] != null) {
        ctx._source.tracking.some_action += param1;
    } else {
        ctx._source.tracking['some_action'] = 1;
    }
} else {
    ctx._source.tracking = new_some_action;
}

"params" : { "param1" : 1, "new_some_action" : { "some_action" : 1 } }
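
To see what the script does in each branch, it can be run locally against a mock `ctx`. This is only a sketch: the wrapper `applyTrackingScript` and the helper `makeParams` are hypothetical, and the params are passed in explicitly here, whereas in Elasticsearch the script sees them as variables.

```javascript
// The script body from the message above, wrapped in a function for local testing.
function applyTrackingScript(ctx, params) {
  const { param1, new_some_action } = params;
  if (ctx._source['tracking'] != null) {
    if (ctx._source.tracking['some_action'] != null) {
      ctx._source.tracking.some_action += param1;
    } else {
      ctx._source.tracking['some_action'] = 1;
    }
  } else {
    ctx._source.tracking = new_some_action;
  }
  return ctx;
}

// Fresh params for every call, so the mock documents never share an object.
function makeParams() {
  return { param1: 1, new_some_action: { some_action: 1 } };
}

// No tracking object yet: the whole object is taken from new_some_action.
const created = applyTrackingScript({ _source: {} }, makeParams());
console.log(JSON.stringify(created._source));
// {"tracking":{"some_action":1}}

// Tracking exists but this counter does not: the field is added.
const added = applyTrackingScript({ _source: { tracking: { other_action: 4 } } }, makeParams());
console.log(JSON.stringify(added._source));
// {"tracking":{"other_action":4,"some_action":1}}

// Counter exists: it is incremented by param1.
const bumped = applyTrackingScript({ _source: { tracking: { some_action: 2 } } }, makeParams());
console.log(JSON.stringify(bumped._source));
// {"tracking":{"some_action":3}}
```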

Cheers,
Eric Sites

From: Boaz Leskes b.le...@gmail.com
Reply-To: elasti...@googlegroups.com
Date: Monday, August 5, 2013 10:35 AM
To: elasti...@googlegroups.com
Subject: Re: 0.90.2 _update or _bulk update causing
NullPointerException in logs and I start losing shards

Hi Eric,

This is interesting. The log stack trace from the gist comes from
the bulk calls. Can you also post one from a failed _update? Cross-checking
them might help pinpoint the issue.

Cheers,
Boaz

On Monday, August 5, 2013 1:34:16 AM UTC+2, eric_...@mac.com wrote:

I am getting a java.lang.NullPointerException in my
Elasticsearch cluster logs when I do a _bulk update or just an
_update.
I am sending a lot of data to my clusters. After I get this error
I lose a shard and it has to be recreated.

version 0.90.2

gist: https://gist.github.com/EricSites/6152468

I get this using the _bulk API or just the normal _update API.

My update script is a little complicated.
I am adding a tracking object to my document if it does not
exist. There should only be one of these, and it should not be an array of
them.
If the object does exist, I am trying to add a new field to the
tracking object to keep track of counts.
So if the field does not exist I create it; otherwise I just += to it.

if (ctx._source['tracking'] != null) {
    if (ctx._source.tracking['some_action'] != null) {
        ctx._source.tracking.some_action += param1;
    } else {
        ctx._source.tracking['some_action'] = 1;
    }
} else {
    ctx._source.tracking = new_tracking;
}

Here is my mapping for this:
{
    "sample" : {
        "index_options" : "docs",
        "properties" : {
            "tracking" : {
                "type" : "object",
                "dynamic" : true
            }
        }
    }
}


rohit_jaiswal · June 22, 2014, 9:27pm

Is there a stack trace, or just INFO/TRACE messages, that would indicate data
loss due to the bug you described?

github.com/elastic/elasticsearch: Don't delete local shard data when its allocated on a node that doesn't exists
(opened 10:35AM - 18 Dec 13 UTC, closed 10:37AM - 18 Dec 13 UTC; kimchy; >bug v1.0.0.RC1 v0.90.8)

This is an extreme case, exposed by a bug we had in our allocation in local gateway, causing a cluster state that doesn't include a node in the nodes list, but still has the shard in the routing table pointing at the non existent node. Then, when a node on the same box comes back, it will cause the local shard data to be deleted because it thinks its fully allocated on other nodes.

What other edge cases, ideally all of them, can cause data loss
on node restart in ES 0.90.2?

Also, after the node restarts in production, we saw another exception in
the ES logs. I found that ES fixed this in 0.18.0
(Delete By Query wrongly persisted to translog · Issue #1198 · elastic/elasticsearch · GitHub); however, we
still encountered it in 0.90.2. This error appeared after the bulk update
error and also after the node restart.

[22:09:37,783][WARN ][indices.cluster ] [Storm] [b7a76aa06cfd4048987d1117f3e0433a][0] failed to start shard
org.elasticsearch.indices.recovery.RecoveryFailedException: [b7a76aa06cfd4048987d1117f3e0433a][0]: Recovery failed from [Jeffrey Mace][_jjr5BYJQjO6QzzheyDmhw][inet[/10.4.35.200:9300]] into [Storm][FiW6mbR5ThqqSii5Wc28lQ][inet[/10.4.40.95:9300]]
    at org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:293)
    at org.elasticsearch.indices.recovery.RecoveryTarget.access$300(RecoveryTarget.java:62)
    at org.elasticsearch.indices.recovery.RecoveryTarget$2.run(RecoveryTarget.java:163)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: org.elasticsearch.transport.RemoteTransportException: [Jeffrey Mace][inet[/10.4.35.200:9300]][index/shard/recovery/startRecovery]
Caused by: org.elasticsearch.index.engine.RecoveryEngineException: [b7a76aa06cfd4048987d1117f3e0433a][0] Phase[2] Execution failed
    at org.elasticsearch.index.engine.robin.RobinEngine.recover(RobinEngine.java:1147)
    at org.elasticsearch.index.shard.service.InternalIndexShard.recover(InternalIndexShard.java:526)
    at org.elasticsearch.indices.recovery.RecoverySource.recover(RecoverySource.java:116)
    at org.elasticsearch.indices.recovery.RecoverySource.access$1600(RecoverySource.java:60)
    at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:328)
    at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:314)
    at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:265)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: org.elasticsearch.transport.RemoteTransportException: [Storm][inet[/10.4.40.95:9300]][index/shard/recovery/translogOps]
Caused by: org.elasticsearch.indices.InvalidAliasNameException: [b7a76aa06cfd4048987d1117f3e0433a] Invalid alias name [1a4077872e41c0634cee780c1e5fc263bdd5f14b15ac9239480547ab2d3601eb], Unknown alias name was passed to alias Filter
    at org.elasticsearch.index.aliases.IndexAliasesService.aliasFilter(IndexAliasesService.java:99)
    at org.elasticsearch.index.shard.service.InternalIndexShard.prepareDeleteByQuery(InternalIndexShard.java:382)
    at org.elasticsearch.index.shard.service.InternalIndexShard.performRecoveryOperation(InternalIndexShard.java:628)
    at org.elasticsearch.indices.recovery.RecoveryTarget$TranslogOperationsRequestHandler.messageReceived(RecoveryTarget.java:447)
    at org.elasticsearch.indices.recovery.RecoveryTarget$TranslogOperationsRequestHandler.messageReceived(RecoveryTarget.java:416)
    at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:265)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

Thanks,
Rohit

On Sun, Jun 22, 2014 at 1:30 PM, Boaz Leskes b.leskes@gmail.com wrote:

Not that I know of. But there is a known, very rare bug (fixed in
0.90.8) which can cause data loss upon a node restart:
Don't delete local shard data when its allocated on a node that doesn't exists · Issue #4502 · elastic/elasticsearch · GitHub

Maybe you ran into that?

On Sun, Jun 22, 2014 at 10:18 PM, Rohit Jaiswal rohit.jaiswal@gmail.com
wrote:

Yes, it did when we restarted the node while trying to reproduce this
problem. We were also able to access the data using the scan search API
after restarting the node.

However, we have seen quite a few of the bulk update errors in our
20-node production cluster and have also suffered data loss on other aliases
(the alias filter being the user ID). We think the data loss is
caused by this bulk update error.

Is there a chance of losing data on shards when enough of these bulk
updates happen concurrently on multiple aliases (users)?

Thanks

On Sun, Jun 22, 2014 at 1:10 PM, Boaz Leskes b.leskes@gmail.com wrote:

If you restart the node it's on, it doesn't come back?

On Sun, Jun 22, 2014 at 10:01 PM, Rohit Jaiswal rohit.jaiswal@gmail.com
wrote:

Hi Boaz,
Thanks for replying. After we get this error, the
cluster health changes to yellow with a replica shard in the Unassigned state.
Is there a specific way to recover that shard? We don't want to lose other
data on that shard.

Thanks,
Rohit

On Sun, Jun 22, 2014 at 12:50 PM, Boaz Leskes b.leskes@gmail.com
wrote:

Hi Rohit,

With this issue the update fails anyway, but the bug also breaks the entire
request. You should indeed set the retry_on_conflict option to make the
update request succeed. PS: you should really upgrade. A lot has happened
and been fixed since 0.90.2 ...

Cheers,
Boaz



rohit_jaiswal · June 23, 2014, 8:02pm

Hi Boaz,
How can we fix this issue?
(Don't delete local shard data when its allocated on a node that doesn't exists · Issue #4502 · elastic/elasticsearch · GitHub)

Will this work?
1. Take a backup of the data and local gateway directory of each ES node prior to node restart.
2. Disable routing allocation on each node.
3. Restart the node.
4. Copy data and gateway from backup to the node's data and gateway directory.
5. Enable routing allocation.
6. Based on recovery settings, after gateway.recover_after_time seconds, index recovery will start from gateway.
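
The proposed sequence can be written out as data, which makes it easier to review or script. This sketch only builds request descriptors and sends nothing, and nothing in this thread confirms that the procedure avoids issue #4502. The helper names are hypothetical, and the setting `cluster.routing.allocation.disable_allocation` (a cluster-wide setting, applied through the cluster settings API rather than per node) is my assumption for 0.90.x, so verify it against the documentation for your version.

```javascript
// Build (but do not send) the cluster settings request that toggles allocation.
// The setting name is an assumption for 0.90.x; later versions use a different one.
function allocationToggleRequest(disable) {
  return {
    method: "PUT",
    path: "/_cluster/settings",
    body: { transient: { "cluster.routing.allocation.disable_allocation": disable } }
  };
}

// The six steps from the message above for one node (hypothetical helper).
function restartPlan(nodeName) {
  return [
    { step: 1, action: "backup", note: "copy data and local gateway directories of " + nodeName },
    { step: 2, action: "request", request: allocationToggleRequest(true) },
    { step: 3, action: "restart", note: "restart " + nodeName },
    { step: 4, action: "restore", note: "copy data and gateway back from the backup" },
    { step: 5, action: "request", request: allocationToggleRequest(false) },
    { step: 6, action: "wait", note: "recovery starts per gateway.recover_after_time" }
  ];
}

// Print the plan for a node; "node-1" is a placeholder name.
for (const item of restartPlan("node-1")) {
  const detail = item.request
    ? item.request.method + " " + item.request.path + " " + JSON.stringify(item.request.body)
    : item.note;
  console.log(item.step + ". " + item.action + ": " + detail);
}
```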

Thanks,
Rohit


rohit_jaiswal · June 23, 2014, 8:23pm

To further assist with this, here are our gateway settings:

gateway:
    recover_after_nodes: 20
    recover_after_time: 5m
    expected_nodes: 20

Thanks,
Rohit

On Mon, Jun 23, 2014 at 1:02 PM, Rohit Jaiswal rohit.jaiswal@gmail.com
wrote:

Hi Boaz,

How can we fix this issue? (
Don't delete local shard data when its allocated on a node that doesn't exists · Issue #4502 · elastic/elasticsearch · GitHub)

Will this work?

1. Take a backup of the data and local gateway directory of each ES node prior to node restart.
2. Disable routing allocation on each node.
3. Restart the node.
4. Copy data and gateway from the backup to the node's data and gateway directory.
5. Enable routing allocation.
6. Based on the recovery settings, once gateway.recover_after_time has elapsed, index recovery will start from the gateway.

Thanks,
Rohit

On Sun, Jun 22, 2014 at 1:30 PM, Boaz Leskes b.leskes@gmail.com wrote:

Not that I know of. But there is a known but very rare bug (fixed in
0.90.8) which can cause data loss upon a node restart:
Don't delete local shard data when its allocated on a node that doesn't exists · Issue #4502 · elastic/elasticsearch · GitHub

Maybe you ran into that?

On Sun, Jun 22, 2014 at 10:18 PM, Rohit Jaiswal rohit.jaiswal@gmail.com
wrote:

Yes, it did when we restarted the node while trying to reproduce this
problem. We also were able to access the data using the Scan search api
after restarting the node.

However we have seen quite a few of the bulk update errors in our
20-node production cluster and have suffered data loss on other aliases
(The alias filter being the user-id) as well. We think the data loss is
because of this bulk update error.

Is there a chance of losing data on shards when enough of these bulk
updates happen concurrently on multiple aliases (users)?

Thanks

On Sun, Jun 22, 2014 at 1:10 PM, Boaz Leskes b.leskes@gmail.com wrote:

If you restart the node it's on, it doesn't come back?

On Sun, Jun 22, 2014 at 10:01 PM, Rohit Jaiswal <
rohit.jaiswal@gmail.com> wrote:

Hi Boaz,

Thanks for replying. After we get this error, the cluster health changes
to Yellow with a replica shard in the Unassigned state. Is there a specific
way to recover that shard? We don't want to lose other data on that shard.

Thanks,
Rohit

On Sun, Jun 22, 2014 at 12:50 PM, Boaz Leskes b.leskes@gmail.com
wrote:

Hi Rohit,

With this issue the update would fail anyway (with a version conflict), but
the bug makes it break the entire request. You should indeed set the
retry_on_conflict option to make the update request succeed. PS: you should
really upgrade. A lot has happened and been fixed since 0.90.2.

Cheers,
Boaz
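
As a rough illustration of where retry_on_conflict goes: in the bulk API it is given per update action as _retry_on_conflict in the action metadata line, while the single-document _update API accepts retry_on_conflict as a request parameter. The sketch below only builds the newline-delimited bulk body and sends nothing. The index name "myindex", the document ids, and the helper buildBulkUpdateBody are made up for illustration, and the script line is a simplified placeholder rather than the full tracking script from this thread.

```javascript
// Build a newline-delimited _bulk body of scripted updates.
// Hypothetical helper: the index name and ids used below are placeholders.
function buildBulkUpdateBody(index, type, ids, retryOnConflict) {
  var lines = [];
  ids.forEach(function (id) {
    // Action metadata line: _retry_on_conflict is set per update action.
    lines.push(JSON.stringify({
      update: { _index: index, _type: type, _id: id, _retry_on_conflict: retryOnConflict }
    }));
    // Payload line: script, script language, params, and an upsert document
    // used when the target document does not exist yet.
    lines.push(JSON.stringify({
      script: "ctx._source.tracking.some_action += param1", // simplified placeholder
      lang: "js",
      params: { param1: 1 },
      upsert: { tracking: { some_action: 1 } }
    }));
  });
  // The bulk API expects every line, including the last one, to end with a newline.
  return lines.join("\n") + "\n";
}

var body = buildBulkUpdateBody("myindex", "sample", ["1", "2"], 3);
```

The resulting string would be sent as the body of a POST to /_bulk. A higher retry count makes it less likely that concurrent updates to the same document exhaust their retries, at the cost of extra work on the shard.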

On Monday, June 16, 2014 10:26:06 PM UTC+2, Rohit Jaiswal wrote:

Hi Boaz,

We are using 0.90.2 and have run into this issue. As I understand it, one
option is to upgrade to 0.90.3. If we continue using 0.90.2 and use (or
increase) retry_on_conflict, will we no longer see the problem? Please
clarify.

Thanks,
Rohit
On Wednesday, August 7, 2013 9:39:56 AM UTC-7, Boaz Leskes wrote:

Hi Eric,

OK. Based on the gist you sent, I tracked down a problem and fixed
it: Null pointer exceptions when bulk updates max out their retry on conflict · Issue #3448 · elastic/elasticsearch · GitHub .
Thanks!! The fix is part of 0.90.3, so I'd recommend upgrading. This is a
secondary problem which occurs when two requests try to update the same
document at exactly the same time. One of them succeeds and the other fails
with a version conflict (that error was masked by the error you were
seeing). You can use (or increase) the retry_on_conflict parameter to make
the failing request try again.

I'm still curious about your report of losing replicas. Can you
elaborate more about what happens? Do you see anything in the logs?

Cheers,
Boaz
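
The version conflict described above can be illustrated with a small in-memory sketch. It is not Elasticsearch code: VersionedStore, updateWithRetry, and competingWriterFor are hypothetical names, and the store only imitates the idea that a write is rejected when the document version has changed since it was read. The retryOnConflict argument plays the role of retry_on_conflict: on a conflict, the update re-reads the document and applies the change again.

```javascript
// In-memory stand-in for a versioned document store (hypothetical, not the ES API).
function VersionedStore() { this.docs = {}; }

// Return a copy of the document together with the version it was read at.
VersionedStore.prototype.get = function (id) {
  var d = this.docs[id] || { version: 0, source: {} };
  return { version: d.version, source: JSON.parse(JSON.stringify(d.source)) };
};

// Write only if nobody else has written since expectedVersion was read.
VersionedStore.prototype.put = function (id, source, expectedVersion) {
  var current = this.docs[id] ? this.docs[id].version : 0;
  if (current !== expectedVersion) {
    var err = new Error("version conflict: expected " + expectedVersion + ", found " + current);
    err.conflict = true;
    throw err;
  }
  this.docs[id] = { version: current + 1, source: source };
};

// Read, modify, write; on a version conflict, start over up to retryOnConflict times.
// beforeWrite is a test hook used here to simulate a competing request.
function updateWithRetry(store, id, mutate, retryOnConflict, beforeWrite) {
  for (var attempt = 0; attempt <= retryOnConflict; attempt++) {
    var doc = store.get(id);
    mutate(doc.source);
    if (beforeWrite) beforeWrite(attempt);
    try {
      store.put(id, doc.source, doc.version);
      return attempt; // number of retries that were needed
    } catch (e) {
      if (!e.conflict || attempt === retryOnConflict) throw e;
    }
  }
}

// A competing writer that updates the same document once, just before our first write.
function competingWriterFor(store, id) {
  return function (attempt) {
    if (attempt === 0) {
      var d = store.get(id);
      d.source.count += 1;
      store.put(id, d.source, d.version);
    }
  };
}

function increment(source) { source.count += 1; }

var store = new VersionedStore();
store.put("1", { count: 0 }, 0);
var retriesUsed = updateWithRetry(store, "1", increment, 3, competingWriterFor(store, "1"));
```

With three retries allowed, the first write is rejected, the second succeeds, and both increments are kept. With zero retries the same scenario ends in a version conflict error for the losing request, which is the failure that the NullPointerException was masking.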

On Tuesday, August 6, 2013 5:09:26 AM UTC+2, Eric Sites wrote:

Boaz,

Sorry but I no longer have those logs, I upgraded to 0.90.2 from
0.90.0 and wiped the logs when I did.
I did the upgrade to use the _bulk api for my update.

Basically the "lang", "js" was not the issue.

I was using different scripts with the same set of params and an
upsert. The fix was to use a different param name for each script,
about 10 unique scripts in total.

I was losing replicated shards about every 10,000 to 30,000
updates, never the primary shard.

I have 185 million+ large JSON documents, with 100 shards in 1
index with 1 replica, so 200 shards total over 6 servers. Each shard is
about 10.4 GB in size.
About 2 TB of data, 1 TB primary, 1 TB replicated.

Cheers,
Eric Sites

From: Boaz Leskes b.le...@gmail.com
Reply-To: elasti...@googlegroups.com
Date: Monday, August 5, 2013 5:38 PM
To: elasti...@googlegroups.com
Subject: Re: 0.90.2 _update or _bulk update causing
NullPointerException in logs and I start losing shards

Hi Eric,

Glad to hear you solved it. It would be great if you can share the
failed logs from the _update (non bulk call). A failed script shouldn't
cause shards to drop so I would like to research it some more.

Cheers,
Boaz

On Mon, Aug 5, 2013 at 6:40 PM, Eric Sites eric_...@mac.com
wrote:

Boaz,

I found and fixed the problem.

I added "lang": "js" to the update JSON, which was not needed
before in ES 0.90.0.
I also changed the name of new_tracking to match the name of the
action in the params section.
So for example the script now looks like this:

if (ctx._source['tracking'] != null) {
    if (ctx._source.tracking['some_action'] != null) {
        ctx._source.tracking.some_action += param1;
    } else {
        ctx._source.tracking['some_action'] = 1;
    }
} else {
    ctx._source.tracking = new_some_action;
}

"params" : { "param1" : 1, "new_some_action" : { "some_action" : 1 } }

Cheers,
Eric Sites
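
To see what the script above does in each of its three branches, it can be run locally against plain objects. This is only a sketch: applyTrackingScript and makeParams are hypothetical helpers, the ctx objects are hand-made stand-ins for what the update API passes to a script, and nothing here contacts a cluster.

```javascript
// Local stand-in for the update script. The body mirrors the script in the
// message above; ctx and params are plain objects supplied by the caller.
function applyTrackingScript(ctx, params) {
  var param1 = params.param1;
  var new_some_action = params.new_some_action;
  if (ctx._source['tracking'] != null) {
    if (ctx._source.tracking['some_action'] != null) {
      // Counter already exists: add to it.
      ctx._source.tracking.some_action += param1;
    } else {
      // Tracking object exists but this counter does not: start it at 1.
      ctx._source.tracking['some_action'] = 1;
    }
  } else {
    // No tracking object yet: attach the one passed in through params.
    ctx._source.tracking = new_some_action;
  }
  return ctx;
}

// Fresh params per call, so documents never share the same tracking object.
function makeParams() {
  return { param1: 1, new_some_action: { some_action: 1 } };
}

// Document without a tracking object: created on the first call, incremented on the second.
var doc = { _source: { title: "example" } };
applyTrackingScript(doc, makeParams());
applyTrackingScript(doc, makeParams());

// Document that already tracks a different action: the new counter is added alongside it.
var other = { _source: { tracking: { other_action: 4 } } };
applyTrackingScript(other, makeParams());
```

Note that the middle branch starts the counter at the literal 1 rather than at param1, so the two only agree while param1 is 1, as in the params shown above.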

From: Boaz Leskes b.le...@gmail.com
Reply-To: elasti...@googlegroups.com
Date: Monday, August 5, 2013 10:35 AM
To: elasti...@googlegroups.com
Subject: Re: 0.90.2 _update or _bulk update causing
NullPointerException in logs and I start losing shards

Hi Eric,

This is interesting. The log stack trace from the gist comes from
the bulk calls. Can you also post one from a failed _update? Cross-checking
them might help pinpoint the issue.

Cheers,
Boaz

On Monday, August 5, 2013 1:34:16 AM UTC+2, eric_...@mac.com
wrote:

I am getting a java.lang.NullPointerException in my
Elasticsearch cluster logs when I am doing a _bulk update or just an
_update.
I am sending a lot of data to my clusters. After I get this
error I lose a shard and it has to be recreated.

version 0.90.2

gist: https://gist.github.com/EricSites/6152468

I get this using the _bulk api or just the normal _update api.

My update script is a little complicated.
I am adding a tracking object to my document if it does not
exist. There should only be one of these and it should not be an array of
these.
If the object does exist, I am trying to add a new field to the
tracking object to keep track of counts.
So if the field does not exist I create it, otherwise I just += to it.

if (ctx._source['tracking'] != null) {
    if (ctx._source.tracking['some_action'] != null) {
        ctx._source.tracking.some_action += param1;
    } else {
        ctx._source.tracking['some_action'] = 1;
    }
} else {
    ctx._source.tracking = new_tracking;
}

Here is my mapping for this:
{
    "sample" : {
        "index_options" : "docs",
        "properties" : {
            "tracking" : {
                "type" : "object",
                "dynamic" : true
            }
        }
    }
}
