0.90.2 _update or _bulk update causing NullPointerException in logs and I start losing shards


(Eric Sites) #1

I am getting a java.lang.NullPointerException in my Elasticsearch
cluster logs when I do a _bulk update or just a plain _update.
I am sending a lot of data to my clusters. After I get this error I lose a
shard and it has to be recreated.

version 0.90.2

gist: https://gist.github.com/EricSites/6152468

I get this with both the _bulk API and the normal _update API.

My update script is a little complicated.
I am adding a tracking object to my document if it does not exist. There
should only be one of these, and it should not be an array.
If the object does exist, I add a new field to the tracking
object to keep track of counts:
if the field does not exist I create it, otherwise I just += to it.

    if (ctx._source['tracking'] != null) {
        if (ctx._source.tracking['some_action'] != null) {
            ctx._source.tracking.some_action += param1;
        } else {
            ctx._source.tracking['some_action'] = 1;
        }
    } else {
        ctx._source.tracking = new_tracking;
    }

Here is my mapping for this:
    {
      "sample" : {
        "index_options" : "docs",
        "properties" : {
          "tracking" : {
            "type" : "object",
            "dynamic" : true
          }
        }
      }
    }
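The three branches of the update script can be made explicit with a minimal Python sketch that applies the same logic to a plain dict standing in for `ctx._source`. The function name `apply_tracking_update` is hypothetical; it models only the intended behavior of the script, not how Elasticsearch executes scripts.

```python
from typing import Any, Dict


def apply_tracking_update(source: Dict[str, Any], param1: int,
                          new_tracking: Dict[str, Any]) -> Dict[str, Any]:
    """Mirror the update script: create 'tracking' if it is missing,
    otherwise create or increment the 'some_action' counter inside it."""
    tracking = source.get("tracking")
    if tracking is not None:
        if tracking.get("some_action") is not None:
            # Counter already exists: add param1 to it.
            tracking["some_action"] += param1
        else:
            # Tracking object exists but has no counter yet: start at 1.
            tracking["some_action"] = 1
    else:
        # No tracking object at all: store the one passed in as a param
        # (a single object, not an array).
        source["tracking"] = dict(new_tracking)
    return source


doc = {"title": "example"}
apply_tracking_update(doc, 1, {"some_action": 1})  # creates tracking
apply_tracking_update(doc, 1, {"some_action": 1})  # increments to 2
print(doc["tracking"])  # {'some_action': 2}
```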

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Boaz Leskes) #2

Hi Eric,

This is interesting. The log stack trace in the gist comes from the bulk
calls. Can you also post one from a failed _update? Cross-checking them
might help pinpoint the issue.

Cheers,
Boaz



(Eric Sites) #3

Boaz,

I found and fixed the problem.

I added "lang": "js" to the update JSON; that was not needed in
ES 0.90.0.
I also renamed new_tracking to match the name of the action in
the params section.
For example, the script now looks like this:

    if (ctx._source['tracking'] != null) {
        if (ctx._source.tracking['some_action'] != null) {
            ctx._source.tracking.some_action += param1;
        } else {
            ctx._source.tracking['some_action'] = 1;
        }
    } else {
        ctx._source.tracking = new_some_action;
    }

    "params" : { "param1" : 1, "new_some_action" : { "some_action" : 1 } }

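For reference, the corrected request body can be assembled as below. This is a sketch in Python, assuming the 0.90-era `_update` body layout with top-level `script`, `lang` and `params` keys as described in the message; the helper name `build_update_body` is hypothetical.

```python
import json

# The fixed script from the message, kept as a single string.
SCRIPT = (
    "if (ctx._source['tracking'] != null) {"
    " if (ctx._source.tracking['some_action'] != null) {"
    " ctx._source.tracking.some_action += param1;"
    " } else {"
    " ctx._source.tracking['some_action'] = 1;"
    " }"
    " } else {"
    " ctx._source.tracking = new_some_action;"
    " }"
)


def build_update_body(param1: int = 1) -> dict:
    """Return the _update body: the script, an explicit lang, and params
    whose names match the variables the script reads."""
    return {
        "script": SCRIPT,
        "lang": "js",
        "params": {
            "param1": param1,
            "new_some_action": {"some_action": 1},
        },
    }


print(json.dumps(build_update_body(), indent=2))
```

The point of the sketch is that every variable the script reads (`param1`, `new_some_action`) has a matching key under `params`.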
Cheers,
Eric Sites



(Boaz Leskes) #4

Hi Eric,

Glad to hear you solved it. It would be great if you could share the logs
from the failed _update (non-bulk) call. A failed script shouldn't cause
shards to drop, so I would like to look into it some more.

Cheers,
Boaz



(Eric Sites) #5

Boaz,

Sorry, but I no longer have those logs. I wiped them when I upgraded from
0.90.0 to 0.90.2, which I did in order to use the _bulk API for my updates.

It turns out the "lang": "js" was not the issue.

I was using different scripts with the same set of params and an upsert. The
fix was to use a different param name for each script; there are about 10
unique scripts in total.
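A minimal sketch of that fix, assuming one script per action plus an upsert document for records that do not exist yet: each action gets its own object parameter (`new_<action>`), so no two scripts share that param name. Everything here (`make_update`, the action names, the upsert contents) is a hypothetical illustration, not Eric's actual code.

```python
from typing import Dict


def make_update(action: str, increment: int = 1) -> Dict:
    """Build an _update body for one action; the script and its params use
    action-specific names so different scripts never share that param."""
    new_param = f"new_{action}"
    script = (
        f"if (ctx._source['tracking'] != null) {{"
        f" if (ctx._source.tracking['{action}'] != null) {{"
        f" ctx._source.tracking.{action} += param1;"
        f" }} else {{ ctx._source.tracking['{action}'] = 1; }}"
        f" }} else {{ ctx._source.tracking = {new_param}; }}"
    )
    return {
        "script": script,
        "lang": "js",
        "params": {"param1": increment, new_param: {action: 1}},
        # Hypothetical upsert: indexed as-is when the document does not exist.
        "upsert": {"tracking": {action: 1}},
    }


actions = ["view", "click", "share"]
bodies = [make_update(a) for a in actions]
param_names = [n for b in bodies for n in b["params"] if n != "param1"]
print(param_names)  # ['new_view', 'new_click', 'new_share']
```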

I was losing a replica shard roughly every 10,000 to 30,000 updates, never
a primary shard.

I have 185+ million large JSON documents in 1 index with 100 shards and 1
replica, so 200 shards in total over 6 servers. Each shard is about 10.4 GB,
for about 2 TB of data: 1 TB primary, 1 TB replica.
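The sizing figures above are consistent with each other; a quick check in Python (the even spread of shards across the 6 servers is an assumption, since actual allocation can differ):

```python
primaries = 100
replicas_per_primary = 1
shard_size_gb = 10.4
servers = 6

total_shards = primaries * (1 + replicas_per_primary)  # 200
primary_gb = primaries * shard_size_gb                  # ~1040 GB, about 1 TB
total_gb = total_shards * shard_size_gb                 # ~2080 GB, about 2 TB
shards_per_server = total_shards / servers              # ~33.3 if evenly spread

print(total_shards, round(primary_gb), round(total_gb), round(shards_per_server, 1))
```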

Cheers,
Eric Sites



(Boaz Leskes) #6

Hi Eric,

OK. Based on the gist you sent, I tracked down a problem and fixed
it: https://github.com/elasticsearch/elasticsearch/issues/3448 . Thanks!
The fix is part of 0.90.3, so I'd recommend upgrading. This is a secondary
problem that occurs when two requests try to update the same document at
exactly the same time. One of them succeeds and the other fails with a
version conflict (that error was masked by the error you were seeing). You
can use (or increase) the retry_on_conflict parameter to make the failing
request try again.
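To illustrate the conflict described above, here is a simplified Python model of versioned (optimistic) read-modify-write updates with a retry budget similar in spirit to `retry_on_conflict`. It is a toy: `VersionedStore`, `VersionConflict` and `update_with_retry` are hypothetical names, and Elasticsearch's internals are not modeled. A second writer completes its update between the first writer's read and write, so the first write conflicts and only succeeds on retry.

```python
import copy
from typing import Any, Callable, Dict, Optional, Tuple


class VersionConflict(Exception):
    """Raised when a write is based on a stale version of the document."""


class VersionedStore:
    """Toy single-document store with a version number (optimistic locking)."""

    def __init__(self, doc: Dict[str, Any]):
        self.doc = doc
        self.version = 1

    def get(self) -> Tuple[Dict[str, Any], int]:
        return copy.deepcopy(self.doc), self.version

    def put(self, doc: Dict[str, Any], expected_version: int) -> None:
        if expected_version != self.version:
            raise VersionConflict(f"expected {expected_version}, found {self.version}")
        self.doc = doc
        self.version += 1


def update_with_retry(store: VersionedStore,
                      change: Callable[[Dict[str, Any]], None],
                      retry_on_conflict: int = 0,
                      before_write: Optional[Callable[[], None]] = None) -> int:
    """Read, modify, write; on a version conflict re-read and try again, up to
    retry_on_conflict extra times. Returns the number of attempts used."""
    for attempt in range(1, retry_on_conflict + 2):
        doc, version = store.get()
        change(doc)
        if before_write is not None:
            before_write()  # hook for simulating a concurrent writer
        try:
            store.put(doc, version)
            return attempt
        except VersionConflict:
            if attempt == retry_on_conflict + 1:
                raise  # retries exhausted: surface the conflict
    raise AssertionError("unreachable: the loop always returns or raises")


def increment(doc: Dict[str, Any]) -> None:
    doc["count"] = doc.get("count", 0) + 1


store = VersionedStore({"count": 0})
fired = []


def concurrent_writer() -> None:
    # The first time request A is about to write, request B finishes first.
    if not fired:
        fired.append(True)
        update_with_retry(store, increment)


attempts = update_with_retry(store, increment, retry_on_conflict=3,
                             before_write=concurrent_writer)
print(store.doc["count"], attempts)  # 2 2
```

With `retry_on_conflict=0` the same interleaving raises `VersionConflict` instead, which is the failure mode a retry is meant to absorb.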

I'm still curious about your report of losing replicas. Can you
elaborate on what happens? Do you see anything in the logs?

Cheers,
Boaz



(rohit.jaiswal@gmail.com) #7

Hi Boaz,
We are using 0.90.2 and have run into this issue. As I
understand it, one option is to upgrade to 0.90.3. If we keep using 0.90.2
and use (or increase) retry_on_conflict, will we stop seeing the problem? Please
clarify.

Thanks,
Rohit


(Boaz Leskes) #8

Hi Rohit,

With this issue the update would fail anyway (with a version conflict), but the
bug breaks the entire request. You should indeed set the retry_on_conflict
option so that the update request succeeds. PS: you should really upgrade; a
lot has happened and been fixed since 0.90.2.
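As a sketch of where the option goes, assuming the 0.90-era request formats (verify against the documentation for your version): for a single `_update` call it is a URL query parameter, and in a `_bulk` body it goes in the update action's metadata line, assumed here to be the underscore-prefixed `_retry_on_conflict` used by bulk metadata of that era. The helper names, the index name `myindex`, the script and the retry count of 3 are hypothetical placeholders.

```python
import json
from typing import Any, Dict, List
from urllib.parse import urlencode


def update_url(index: str, doc_type: str, doc_id: str, retries: int) -> str:
    """Single-document update: retry_on_conflict as a query parameter."""
    query = urlencode({"retry_on_conflict": retries})
    return f"/{index}/{doc_type}/{doc_id}/_update?{query}"


def bulk_update_payload(index: str, doc_type: str,
                        updates: Dict[str, Dict[str, Any]], retries: int) -> str:
    """Bulk body: an action line plus a source line per update, each line
    newline-terminated (the bulk format expects a final newline)."""
    lines: List[str] = []
    for doc_id, body in updates.items():
        action = {"update": {"_index": index, "_type": doc_type, "_id": doc_id,
                             "_retry_on_conflict": retries}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(body))
    return "\n".join(lines) + "\n"


body = {"script": "ctx._source.counter += param1", "params": {"param1": 1}}
print(update_url("myindex", "sample", "1", 3))
# /myindex/sample/1/_update?retry_on_conflict=3
print(bulk_update_payload("myindex", "sample", {"1": body, "2": body}, 3), end="")
```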

Cheers,
Boaz



(rohit.jaiswal@gmail.com) #9

Hi Boaz,

Thanks for replying. After we get this error, the cluster health changes to yellow, with a replica shard in the unassigned state. Is there a specific way to recover that shard? We don't want to lose the other data on that shard.

Thanks,
Rohit
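The yellow state with an unassigned replica can be confirmed from the cluster health API. Below is a minimal Python sketch. It assumes three things: the 0.90-era `GET /_cluster/health?level=shards` response reports a per-shard `unassigned_shards` count, the node listens on the hypothetical address `http://localhost:9200`, and `fetch_shard_health` and `unassigned_shards` are illustrative helper names. Check the response shape against your own version before relying on it.

```python
import json
from urllib.request import urlopen


def fetch_shard_health(base_url="http://localhost:9200"):
    """Fetch shard-level cluster health (base_url is a hypothetical default)."""
    with urlopen(base_url + "/_cluster/health?level=shards") as resp:
        return json.load(resp)


def unassigned_shards(health):
    """Return (index, shard_number, unassigned_count) for every shard that
    still has at least one unassigned copy (e.g. a dropped replica)."""
    result = []
    for index_name, index_info in sorted(health.get("indices", {}).items()):
        shards = index_info.get("shards", {})
        # Shard numbers arrive as JSON object keys ("0", "1", ...), so sort numerically.
        for shard_id, shard in sorted(shards.items(), key=lambda kv: int(kv[0])):
            count = shard.get("unassigned_shards", 0)
            if count > 0:
                result.append((index_name, int(shard_id), count))
    return result
```

For example, `unassigned_shards(fetch_shard_health())` against a cluster in the state described above would be expected to list the affected index and shard number. Running the same check again after restarting the node shows whether the replica was reassigned.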



(Boaz Leskes) #10

If you restart the node the shard was on, doesn't it come back?



(rohit.jaiswal@gmail.com) #11

Yes, it came back when we restarted the node while trying to reproduce this problem. We were also able to access the data with the scan search API after restarting the node.

However, we have seen quite a few of these bulk update errors in our 20-node production cluster, and we have also lost data on other aliases (each alias filters on a user id). We think this bulk update error caused the data loss.

Can data on shards be lost when enough of these bulk updates run concurrently on multiple aliases (users)?

Thanks
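This question depends on how concurrent updates to the same document behave. Earlier in the thread Boaz explained what happens when two requests update the same document at exactly the same time: one succeeds, the other fails with a version conflict, and retry_on_conflict makes the failing request try again. The toy Python model below sketches that read-modify-write cycle. It is not Elasticsearch code. `VersionedStore`, `apply_tracking`, `update` and the `interleave` hook are all hypothetical names, and `apply_tracking` is a simplified rendering of Eric's tracking script that always adds 1.

```python
import threading


class VersionConflict(Exception):
    """Raised when a write's expected version no longer matches the stored one."""


class VersionedStore:
    """Toy stand-in for a shard: every document carries a version number."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source dict)
        self._lock = threading.Lock()  # makes the compare-and-set below atomic

    def get(self, doc_id):
        with self._lock:
            return self._docs.get(doc_id, (0, None))

    def put_if_version(self, doc_id, expected_version, source):
        with self._lock:
            current, _ = self._docs.get(doc_id, (0, None))
            if current != expected_version:
                raise VersionConflict(doc_id)
            self._docs[doc_id] = (current + 1, source)


def apply_tracking(source, action):
    """Simplified tracking script: create the tracking object or bump a counter.
    Copies the dicts so the stored source is never mutated in place."""
    source = dict(source or {})
    tracking = dict(source.get("tracking") or {})
    tracking[action] = tracking.get(action, 0) + 1
    source["tracking"] = tracking
    return source


def update(store, doc_id, action, retry_on_conflict=0, interleave=None):
    """Read, apply the script, write back with a version check; re-read and
    retry up to retry_on_conflict times. `interleave` (hypothetical test hook)
    runs once between the first read and the first write, standing in for a
    concurrent request on the same document."""
    for attempt in range(retry_on_conflict + 1):
        version, source = store.get(doc_id)
        new_source = apply_tracking(source, action)
        if attempt == 0 and interleave is not None:
            interleave()
        try:
            store.put_if_version(doc_id, version, new_source)
            return True
        except VersionConflict:
            continue  # another request wrote first; loop re-reads the new version
    return False
```

With `retry_on_conflict=0`, a request that loses the race returns `False`. The counter ends at 1 even though two increments were sent, so the losing increment is simply not applied. With `retry_on_conflict=1`, the loser re-reads version 1 and the counter reaches 2. The version check is per document, so in this model updates to different documents never conflict with each other.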



(Boaz Leskes) #12

Not that I know of. But there is a known, very rare bug (fixed in 0.90.8) that can cause data loss upon a node restart:

Maybe you ran into that?
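For clusters that stay on 0.90.2, Boaz's earlier advice in this thread was to set retry_on_conflict on the scripted updates. Below is a minimal Python sketch that builds a newline-delimited `_bulk` body doing that. It reuses Eric's fixed script and params from earlier in the thread. Three assumptions apply: the 0.90-era bulk action metadata spells the option `_retry_on_conflict` (later releases drop the underscore), the index name `myindex` is hypothetical, and `bulk_update_body` is an illustrative helper name. Verify the field names against the documentation for your version.

```python
import json

# Eric's tracking script with the fix from this thread: the fallback object is
# passed as its own, uniquely named param (new_some_action).
TRACKING_SCRIPT = (
    "if (ctx._source['tracking'] != null) {"
    " if (ctx._source.tracking['some_action'] != null) {"
    " ctx._source.tracking.some_action += param1;"
    " } else { ctx._source.tracking['some_action'] = 1; }"
    " } else { ctx._source.tracking = new_some_action; }"
)


def bulk_update_body(index, doc_type, doc_ids, retries=3, lang="js"):
    """Build an NDJSON _bulk body with one scripted update per document id."""
    lines = []
    for doc_id in doc_ids:
        # Action/metadata line; _retry_on_conflict is the assumed 0.90-era name.
        action = {"update": {"_index": index, "_type": doc_type, "_id": doc_id,
                             "_retry_on_conflict": retries}}
        # Payload line: the script, its language, and its params.
        payload = {
            "script": TRACKING_SCRIPT,
            "lang": lang,
            "params": {"param1": 1, "new_some_action": {"some_action": 1}},
        }
        lines.append(json.dumps(action))
        lines.append(json.dumps(payload))
    # The bulk API expects every line to be newline-terminated, including the last.
    return "\n".join(lines) + "\n"
```

The resulting string would be POSTed to `/_bulk`. Each document contributes two lines, one for the action metadata and one for the update payload.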

On Sun, Jun 22, 2014 at 10:18 PM, Rohit Jaiswal rohit.jaiswal@gmail.com
wrote:

Yes, it did when we restarted the node while trying to reproduce this
problem. We also were able to access the data using the Scan search api
after restarting the node.

However we have seen quite a few of the bulk update errors in our 20-node
production cluster and have suffered data loss on other aliases (The alias
filter being the user-id) as well. We think the data loss is because of
this bulk update error.

Is there a chance of losing data on shards when enough of these bulk
updates happen concurrently on multiple aliases (users)?

Thanks

On Sun, Jun 22, 2014 at 1:10 PM, Boaz Leskes b.leskes@gmail.com wrote:

If you restart the node it's on, it doesn't come back?

On Sun, Jun 22, 2014 at 10:01 PM, Rohit Jaiswal rohit.jaiswal@gmail.com
wrote:

Hi Boaz,
Thanks for replying. After we get this error, the cluster
health changes to Yellow with a replica shard in Unassigned state. Is there
a specific way to recover that shard? We dont want to lose other data on
that shard.

Thanks,
Rohit

On Sun, Jun 22, 2014 at 12:50 PM, Boaz Leskes b.leskes@gmail.com
wrote:

Hi Rohit,

This issue means update fails anyway, but it breaks the entire request.
You should indeed set the retry_on_conflict option to make the update
request succeed. PS - you should really upgrade - a lot has happened and
was fixed since 0.90.2 ...

Cheers,
Boaz

On Monday, June 16, 2014 10:26:06 PM UTC+2, Rohit Jaiswal wrote:

Hi Boaz,
We are using 0.90.2 and run into this issue. As i
understand, one option is to upgrade to 0.90.3. If we continue using 0.90.2
and use (increase) retry_on_conflict, we will not see the problem? Please
clarify.

Thanks,
Rohit
On Wednesday, August 7, 2013 9:39:56 AM UTC-7, Boaz Leskes wrote:

HI Eric,

OK. Based on the gist you sent, i tracked down a problem at fixed it:
https://github.com/elasticsearch/elasticsearch/issues/3448 .
Thanks!! The fix is part of 0.90.3, so I'd recommend upgrading. This is a
secondary problem which occurs when two requests try to update the same
document at exactly the same time. One of them succeeds and the other fails
with a version conflict (that error was masked by the error you were
seeing). You can use (or increase) the retry_on_conflict parameter to make
the failing request try again.

I'm still curious about your reporting of loosing replicas. Can you
elaborate more about what happens? Do you see anything in the logs?

Cheers,
Boaz

On Tuesday, August 6, 2013 5:09:26 AM UTC+2, Eric Sites wrote:

Boaz,

Sorry but I no longer have those logs, I upgraded to 0.90.2 from
0.90.0 and wiped the logs when I did.
I did the upgrade to use the _bulk api for my update.

Basically the "lang", "js" was not the issue.

I was using different scripts with the same set of params and an
upcert. The fix was to use a different param name for different scripts,
about 10 unique scripts in total.

I was losing replicated shards about every 10,000 to 30,000 updates,
never the primary shard.

I have 185 million + large json documents, with 100 shards in 1
index with 1 replication, so 200 shards total over 6 servers. Each shard is
about 10.4 GB in size.
About 2 TB of data, 1 TB primary, 1 TB replicated.

Cheers,
Eric Sites

From: Boaz Leskes b.le...@gmail.com
Reply-To: elasti...@googlegroups.com
Date: Monday, August 5, 2013 5:38 PM
To: elasti...@googlegroups.com
Subject: Re: 0.90.2 _update or _bulk update causing
NullPointerException in logs and I start losing shards

Hi Eric,

Glad to hear you solved it. It would be great if you can share the
failed logs from the _update (non bulk call). A failed script shouldn't
cause shards to drop so I would like to research it some more.

Cheers,
Boaz

On Mon, Aug 5, 2013 at 6:40 PM, Eric Sites eric_...@mac.com wrote:

Boaz,

I found and fixed the problem.

I added the "lang", "js" to the update json, that was not needed
before in es 0.90.0.
I also changed the name of new_tracking to match the name of the
action in the params section.
So for example the script now looks like this:

if (ctx._source['tracking'] != null) {
if (ctx._source.tracking['some_action'] != null) {
ctx._source.tracking.some_action += param1;
} else {
ctx._source.tracking['some_action'] = 1;
}
} else {
ctx._source.tracking = new_some_action;
}

"params" : { "param1" : 1, "new_some_action" : { "some_action" : 1
} }

Cheers,
Eric Sites



(rohit.jaiswal@gmail.com) #13

Is there a stack trace, or just INFO/TRACE messages, that would indicate data
loss due to the error you described?

What are the other edge cases (in fact, all of them) in which data loss
might occur due to node restarts in ES 0.90.2?

Also, after the node restarts in production, we saw another exception in
the ES logs. I found that ES fixed this in 0.18.0
(https://github.com/elasticsearch/elasticsearch/issues/1198), but we
still encountered it in 0.90.2. This error appeared after the bulk update
error and also after the node restart.

[22:09:37,783][WARN ][indices.cluster ] [Storm] [b7a76aa06cfd4048987d1117f3e0433a][0] failed to start shard
org.elasticsearch.indices.recovery.RecoveryFailedException: [b7a76aa06cfd4048987d1117f3e0433a][0]: Recovery failed from [Jeffrey Mace][_jjr5BYJQjO6QzzheyDmhw][inet[/10.4.35.200:9300]] into [Storm][FiW6mbR5ThqqSii5Wc28lQ][inet[/10.4.40.95:9300]]
    at org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:293)
    at org.elasticsearch.indices.recovery.RecoveryTarget.access$300(RecoveryTarget.java:62)
    at org.elasticsearch.indices.recovery.RecoveryTarget$2.run(RecoveryTarget.java:163)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: org.elasticsearch.transport.RemoteTransportException: [Jeffrey Mace][inet[/10.4.35.200:9300]][index/shard/recovery/startRecovery]
Caused by: org.elasticsearch.index.engine.RecoveryEngineException: [b7a76aa06cfd4048987d1117f3e0433a][0] Phase[2] Execution failed
    at org.elasticsearch.index.engine.robin.RobinEngine.recover(RobinEngine.java:1147)
    at org.elasticsearch.index.shard.service.InternalIndexShard.recover(InternalIndexShard.java:526)
    at org.elasticsearch.indices.recovery.RecoverySource.recover(RecoverySource.java:116)
    at org.elasticsearch.indices.recovery.RecoverySource.access$1600(RecoverySource.java:60)
    at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:328)
    at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:314)
    at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:265)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: org.elasticsearch.transport.RemoteTransportException: [Storm][inet[/10.4.40.95:9300]][index/shard/recovery/translogOps]
Caused by: org.elasticsearch.indices.InvalidAliasNameException: [b7a76aa06cfd4048987d1117f3e0433a] Invalid alias name [1a4077872e41c0634cee780c1e5fc263bdd5f14b15ac9239480547ab2d3601eb], Unknown alias name was passed to alias Filter
    at org.elasticsearch.index.aliases.IndexAliasesService.aliasFilter(IndexAliasesService.java:99)
    at org.elasticsearch.index.shard.service.InternalIndexShard.prepareDeleteByQuery(InternalIndexShard.java:382)
    at org.elasticsearch.index.shard.service.InternalIndexShard.performRecoveryOperation(InternalIndexShard.java:628)
    at org.elasticsearch.indices.recovery.RecoveryTarget$TranslogOperationsRequestHandler.messageReceived(RecoveryTarget.java:447)
    at org.elasticsearch.indices.recovery.RecoveryTarget$TranslogOperationsRequestHandler.messageReceived(RecoveryTarget.java:416)
    at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:265)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

Thanks,
Rohit

On Sun, Jun 22, 2014 at 1:30 PM, Boaz Leskes b.leskes@gmail.com wrote:

Not that I know of. But there is a known but very rare bug (fixed in
0.90.8) which can cause data loss upon a node restart:
https://github.com/elasticsearch/elasticsearch/issues/4502

Maybe you run into that?

On Sun, Jun 22, 2014 at 10:18 PM, Rohit Jaiswal rohit.jaiswal@gmail.com wrote:

Yes, it did when we restarted the node while trying to reproduce this
problem. We were also able to access the data using the scan search API
after restarting the node.

However, we have seen quite a few of these bulk update errors in our
20-node production cluster and have suffered data loss on other aliases
(the alias filter being the user ID) as well. We think the data loss is
caused by this bulk update error.

Is there a chance of losing data on shards when enough of these bulk
updates happen concurrently on multiple aliases (users)?

Thanks

On Sun, Jun 22, 2014 at 1:10 PM, Boaz Leskes b.leskes@gmail.com wrote:

If you restart the node it's on, it doesn't come back?

On Sun, Jun 22, 2014 at 10:01 PM, Rohit Jaiswal rohit.jaiswal@gmail.com wrote:

Hi Boaz,
Thanks for replying. After we get this error, the cluster health changes
to yellow with a replica shard in the Unassigned state. Is there a specific
way to recover that shard? We don't want to lose the other data on that
shard.

Thanks,
Rohit

On Sun, Jun 22, 2014 at 12:50 PM, Boaz Leskes b.leskes@gmail.com wrote:

Hi Rohit,

This issue means the update fails anyway (on a version conflict), but the
bug makes it break the entire request. You should indeed set the
retry_on_conflict option so that the update request succeeds. PS: you
should really upgrade; a lot has happened and been fixed since 0.90.2.
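A hedged sketch of what setting retry_on_conflict looks like inside a _bulk request, assuming the newline-delimited bulk format with the retry count carried in each update action's metadata as `_retry_on_conflict` (the key shown in the bulk API docs of that era; verify it for your version). The helper name, the index name `myindex` and the document id are hypothetical. For a single, non-bulk _update call, retry_on_conflict is passed as a URL query parameter instead.

```javascript
// Hypothetical helper: builds a newline-delimited _bulk body in which every
// scripted update carries a retry count for version conflicts.
// The "_retry_on_conflict" metadata key is an assumption based on the bulk
// API docs of the 0.90/1.x era; check it against your version's docs.
function buildBulkUpdateBody(updates, retries) {
  const lines = [];
  for (const u of updates) {
    // Action/metadata line for this update
    lines.push(JSON.stringify({
      update: {
        _index: u.index,
        _type: u.type,
        _id: u.id,
        _retry_on_conflict: retries,
      },
    }));
    // Request body line: script, language and params, as used in this thread
    lines.push(JSON.stringify({
      script: u.script,
      lang: "js",
      params: u.params,
    }));
  }
  // The bulk API expects the body to end with a newline
  return lines.join("\n") + "\n";
}

const body = buildBulkUpdateBody(
  [{
    index: "myindex", // hypothetical index name
    type: "sample",
    id: "1",
    script: "ctx._source.tracking.some_action += param1",
    params: { param1: 1 },
  }],
  3
);
console.log(body);
```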

Cheers,
Boaz

On Monday, June 16, 2014 10:26:06 PM UTC+2, Rohit Jaiswal wrote:

Hi Boaz,
We are using 0.90.2 and ran into this issue. As I understand it, one
option is to upgrade to 0.90.3. If we continue using 0.90.2 and set (or
increase) retry_on_conflict, will we stop seeing the problem? Please
clarify.

Thanks,
Rohit

On Wednesday, August 7, 2013 9:39:56 AM UTC-7, Boaz Leskes wrote:

Hi Eric,

OK. Based on the gist you sent, I tracked down a problem and fixed
it: https://github.com/elasticsearch/elasticsearch/issues/3448 .
Thanks!! The fix is part of 0.90.3, so I'd recommend upgrading. This is a
secondary problem, which occurs when two requests try to update the same
document at exactly the same time. One of them succeeds and the other fails
with a version conflict (that error was masked by the error you were
seeing). You can use (or increase) the retry_on_conflict parameter to make
the failing request try again.
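To make the version-conflict explanation above concrete, here is a simplified, hypothetical in-memory model of optimistic concurrency control: each update reads the document and its version, runs the script, and writes back only if the version is unchanged; a conflicting write is retried from a fresh read up to a limit, which is the role retry_on_conflict plays. The class and function names are illustrative only and do not reflect Elasticsearch internals.

```javascript
// Simplified, hypothetical model of optimistic concurrency control.
// NOT Elasticsearch's implementation; it only shows why two simultaneous
// updates to one document conflict and why retrying helps.
class VersionConflictError extends Error {}

class VersionedStore {
  constructor() {
    this.docs = new Map(); // id -> { version, source }
  }

  // Return a copy so a script cannot modify the stored document in place.
  get(id) {
    const doc = this.docs.get(id);
    if (!doc) return null;
    return { version: doc.version, source: JSON.parse(JSON.stringify(doc.source)) };
  }

  // Write only if nobody else has written since `expectedVersion` was read.
  put(id, source, expectedVersion) {
    const current = this.docs.get(id);
    const currentVersion = current ? current.version : 0;
    if (currentVersion !== expectedVersion) {
      throw new VersionConflictError(
        `version conflict: expected ${expectedVersion}, found ${currentVersion}`);
    }
    this.docs.set(id, { version: currentVersion + 1, source });
  }
}

// Read, run the script, then write; on a conflict, start again from a fresh
// read, allowing up to `retryOnConflict` extra attempts.
// `beforeWrite` is a test hook used to simulate a concurrent request.
function update(store, id, script, retryOnConflict, beforeWrite = () => {}) {
  for (let attempt = 0; ; attempt++) {
    const doc = store.get(id) || { version: 0, source: {} };
    script(doc.source);
    beforeWrite(attempt);
    try {
      store.put(id, doc.source, doc.version);
      return attempt; // number of retries that were needed
    } catch (e) {
      if (!(e instanceof VersionConflictError) || attempt >= retryOnConflict) throw e;
    }
  }
}

// Usage: another request wins the race during our first attempt.
const store = new VersionedStore();
store.put("1", { count: 0 }, 0);
const increment = (src) => { src.count += 1; };
const retries = update(store, "1", increment, 3, (attempt) => {
  if (attempt === 0) update(store, "1", increment, 0);
});
console.log(retries, store.get("1").source.count); // 1 2
```

With a retry limit of 0 the same race surfaces as a VersionConflictError instead, which corresponds to the failure Boaz describes.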

I'm still curious about your report of losing replicas. Can you
elaborate on what happens? Do you see anything in the logs?

Cheers,
Boaz

On Tuesday, August 6, 2013 5:09:26 AM UTC+2, Eric Sites wrote:

Boaz,

Sorry, but I no longer have those logs; I upgraded from 0.90.0 to 0.90.2
and wiped the logs when I did.
I did the upgrade in order to use the _bulk API for my updates.

It turns out the "lang": "js" setting was not the issue.

I was using different scripts with the same set of params and an
upsert. The fix was to use different param names for different scripts
(about 10 unique scripts in total).
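A minimal sketch of the naming scheme described above, assuming each tracked action gets its own script and its own initial-object parameter named `new_<action>` (mirroring `new_some_action` in the script posted in this thread). `buildTrackingUpdate` is a hypothetical helper, not Eric's actual code; it rejects action names that are not plain identifiers because they are spliced into the script text.

```javascript
// Hypothetical builder: one script per action, with a params object whose
// key for the initial tracking object is derived from the action name.
// The script text mirrors the one posted in this thread; lang "js" as used there.
function buildTrackingUpdate(action, increment) {
  // The action name is inserted into the script, so allow identifiers only.
  if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(action)) {
    throw new Error(`unsafe action name: ${action}`);
  }
  const newParam = `new_${action}`;
  const script =
    `if (ctx._source['tracking'] != null) {` +
    ` if (ctx._source.tracking['${action}'] != null) {` +
    ` ctx._source.tracking.${action} += param1;` +
    ` } else { ctx._source.tracking['${action}'] = 1; }` +
    ` } else { ctx._source.tracking = ${newParam}; }`;
  return {
    script,
    lang: "js",
    params: { param1: increment, [newParam]: { [action]: 1 } },
  };
}

const req = buildTrackingUpdate("some_action", 1);
console.log(JSON.stringify(req.params)); // {"param1":1,"new_some_action":{"some_action":1}}
```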

I was losing replica shards about every 10,000 to 30,000 updates, never
a primary shard.

I have 185+ million large JSON documents in 1 index with 100 shards and
1 replica, so 200 shards in total over 6 servers. Each shard is about
10.4 GB in size.
About 2 TB of data: 1 TB primary, 1 TB replica.
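The sizing figures above can be cross-checked with a few lines of arithmetic. The per-server numbers assume shards are spread evenly across the 6 servers, which is an assumption rather than something Elasticsearch guarantees.

```javascript
// Back-of-the-envelope check of the cluster sizing described above.
// Inputs are the figures from the message; the per-server values assume an
// even spread of shards, which is not guaranteed.
const primaries = 100;
const replicasPerPrimary = 1;
const servers = 6;
const gbPerShard = 10.4;

const totalShards = primaries * (1 + replicasPerPrimary); // 200
const primaryGb = primaries * gbPerShard;                 // about 1040 GB, roughly 1 TB
const totalGb = totalShards * gbPerShard;                 // about 2080 GB, roughly 2 TB
const shardsPerServer = totalShards / servers;            // about 33.3
const gbPerServer = totalGb / servers;                    // about 347 GB

console.log({
  totalShards,
  primaryGb,
  totalGb,
  shardsPerServer: shardsPerServer.toFixed(1),
  gbPerServer: gbPerServer.toFixed(0),
});
```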

Cheers,
Eric Sites

From: Boaz Leskes b.le...@gmail.com
Reply-To: elasti...@googlegroups.com
Date: Monday, August 5, 2013 5:38 PM
To: elasti...@googlegroups.com
Subject: Re: 0.90.2 _update or _bulk update causing NullPointerException in logs and I start losing shards

Hi Eric,

Glad to hear you solved it. It would be great if you could share the
logs from the failed _update (non-bulk) call. A failed script shouldn't
cause shards to drop, so I would like to research it some more.

Cheers,
Boaz

On Mon, Aug 5, 2013 at 6:40 PM, Eric Sites eric_...@mac.com wrote:

Boaz,

I found and fixed the problem.

I added "lang": "js" to the update JSON; that was not needed in
ES 0.90.0.
I also changed the name of new_tracking to match the name of the
action in the params section.
So, for example, the script now looks like this:

if (ctx._source['tracking'] != null) {
  if (ctx._source.tracking['some_action'] != null) {
    ctx._source.tracking.some_action += param1;
  } else {
    ctx._source.tracking['some_action'] = 1;
  }
} else {
  ctx._source.tracking = new_some_action;
}

"params" : { "param1" : 1, "new_some_action" : { "some_action" : 1 } }
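For reference, here is a hedged plain-JavaScript emulation of what the script above does, run against an ordinary object standing in for `ctx._source`. `applyTrackingScript` is a hypothetical name; the sketch only shows the three branches and is not how Elasticsearch evaluates scripts.

```javascript
// Plain-JavaScript emulation of the update script above, applied to an
// ordinary object standing in for ctx._source. Illustrates the branching
// only; it is not how Elasticsearch evaluates scripts.
function applyTrackingScript(source, params) {
  if (source["tracking"] != null) {
    if (source.tracking["some_action"] != null) {
      // Counter already present: add the increment.
      source.tracking.some_action += params.param1;
    } else {
      // Tracking object present, counter missing: start it at 1.
      source.tracking["some_action"] = 1;
    }
  } else {
    // No tracking object yet: use the one passed in params.
    // Copy it so later increments do not modify the params object itself.
    source.tracking = { ...params.new_some_action };
  }
  return source;
}

const params = { param1: 1, new_some_action: { some_action: 1 } };
const doc = {};
applyTrackingScript(doc, params); // creates tracking: { some_action: 1 }
applyTrackingScript(doc, params); // increments it to 2
console.log(doc.tracking.some_action); // 2
console.log(applyTrackingScript({ tracking: {} }, params)); // { tracking: { some_action: 1 } }
```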

Cheers,
Eric Sites



(rohit.jaiswal@gmail.com) #14

Hi Boaz,
How can we fix this issue
(https://github.com/elasticsearch/elasticsearch/issues/4502)?

Will this work?
1. Take a backup of the data and local gateway directories of each ES node before restarting it.
2. Disable routing allocation on each node.
3. Restart the node.
4. Copy the data and gateway directories from the backup into the node's data and gateway directories.
5. Re-enable routing allocation.
6. Based on the recovery settings, once gateway.recover_after_time has elapsed, index recovery will start from the gateway.
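Steps 2 and 5 would typically go through the cluster update settings API (`PUT /_cluster/settings`), which applies cluster-wide rather than per node. The sketch below only builds the JSON bodies; the setting names are assumptions to verify for your version: 0.90.x documented `cluster.routing.allocation.disable_allocation`, while 1.0 and later use `cluster.routing.allocation.enable`. The helper itself is hypothetical.

```javascript
// Hypothetical helper returning the JSON body that steps 2 and 5 might send
// to PUT /_cluster/settings. Setting names are assumptions to verify:
// 0.90.x used "cluster.routing.allocation.disable_allocation",
// and 1.0+ replaced it with "cluster.routing.allocation.enable".
function allocationSettingsBody(enabled, version) {
  const major = Number(version.split(".")[0]);
  const setting = major >= 1
    ? { "cluster.routing.allocation.enable": enabled ? "all" : "none" }
    : { "cluster.routing.allocation.disable_allocation": !enabled };
  // Transient settings are cleared by a full cluster restart.
  return { transient: setting };
}

console.log(JSON.stringify(allocationSettingsBody(false, "0.90.2")));
// {"transient":{"cluster.routing.allocation.disable_allocation":true}}
console.log(JSON.stringify(allocationSettingsBody(true, "1.2.1")));
// {"transient":{"cluster.routing.allocation.enable":"all"}}
```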

Thanks,
Rohit



(rohit.jaiswal@gmail.com) #15

To help with this, here are our gateway settings:

gateway:
  recover_after_nodes: 20
  recover_after_time: 5m
  expected_nodes: 20
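A simplified model of how these three settings gate recovery after a full cluster restart, based on their documented meaning: recovery needs at least `recover_after_nodes` nodes; if `expected_nodes` have joined it starts immediately, otherwise it waits out `recover_after_time`. The function and field names are hypothetical, and the model ignores master/data node distinctions.

```javascript
// Simplified model of gateway recovery gating after a full cluster restart.
// `minutesWaited` is the time since recover_after_nodes was first satisfied.
// Names are hypothetical; master/data node distinctions are ignored.
function canStartRecovery(nodesJoined, minutesWaited, cfg) {
  if (nodesJoined < cfg.recoverAfterNodes) return false; // hard minimum
  if (nodesJoined >= cfg.expectedNodes) return true;     // everyone is here: start now
  return minutesWaited >= cfg.recoverAfterTimeMinutes;   // otherwise wait out the timer
}

const cfg = { recoverAfterNodes: 20, recoverAfterTimeMinutes: 5, expectedNodes: 20 };
console.log(canStartRecovery(19, 60, cfg)); // false: below recover_after_nodes
console.log(canStartRecovery(20, 0, cfg));  // true: expected_nodes reached
```

Under this model, with recover_after_nodes equal to expected_nodes (both 20), the 5m timer never decides anything: recovery starts as soon as all 20 nodes have joined and never with fewer.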

Thanks,
Rohit

On Mon, Jun 23, 2014 at 1:02 PM, Rohit Jaiswal rohit.jaiswal@gmail.com
wrote:

Hi Boaz,
How can we fix this issue? (
https://github.com/elasticsearch/elasticsearch/issues/4502)

               Will this work -
                1. Take a backup of the data and local gateway

directory of each ES node prior to node restart.
2. Disable routing allocation on each node.
3. Restart the node
4. Copy data and gateway from backup to node's data
and gateway directory.
5. Enable routing allocation
6. Based on recovery settings, after
gateway.recover_after_time seconds, index recovery will start from gateway.

Thanks,
Rohit

On Sun, Jun 22, 2014 at 1:30 PM, Boaz Leskes b.leskes@gmail.com wrote:

Not that I know of. But there is a known but very rare bug (fixed in
0.90.8) which can cause data loss upon a node restart:
https://github.com/elasticsearch/elasticsearch/issues/4502

Maybe you run into that?

On Sun, Jun 22, 2014 at 10:18 PM, Rohit Jaiswal rohit.jaiswal@gmail.com
wrote:

Yes, it did when we restarted the node while trying to reproduce this
problem. We also were able to access the data using the Scan search api
after restarting the node.

However, we have seen quite a few of these bulk update errors in our
20-node production cluster and have suffered data loss on other aliases
(the alias filter being the user ID) as well. We think the data loss is
caused by this bulk update error.

Is there a chance of losing data on shards when enough of these bulk
updates happen concurrently on multiple aliases (users)?

Thanks

On Sun, Jun 22, 2014 at 1:10 PM, Boaz Leskes b.leskes@gmail.com wrote:

If you restart the node it's on, it doesn't come back?

On Sun, Jun 22, 2014 at 10:01 PM, Rohit Jaiswal <rohit.jaiswal@gmail.com> wrote:

Hi Boaz,
Thanks for replying. After we get this error, the
cluster health changes to Yellow with a replica shard in the Unassigned state.
Is there a specific way to recover that shard? We don't want to lose other
data on that shard.
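Yellow status means every primary shard is allocated but at least one replica is not, so the data is still served from the primaries. A small helper interpreting a _cluster/health response, as a sketch: status and unassigned_shards are fields returned by the health API, while explain_health and the sample numbers are hypothetical.

```python
def explain_health(health: dict) -> str:
    """Summarise a _cluster/health response."""
    status = health["status"]
    unassigned = health.get("unassigned_shards", 0)
    if status == "green":
        return "all primary and replica shards are allocated"
    if status == "yellow":
        # Every primary is active, so no data is unreachable yet,
        # but at least one replica copy has no node to live on.
        return f"all primaries active; {unassigned} replica shard(s) unassigned"
    # Red: at least one primary (and therefore some data) is not allocated.
    return f"at least one primary is unassigned ({unassigned} unassigned shards total)"


sample = {"status": "yellow", "active_primary_shards": 100,
          "active_shards": 199, "unassigned_shards": 1}  # hypothetical numbers
print(explain_health(sample))
# all primaries active; 1 replica shard(s) unassigned
```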

Thanks,
Rohit

On Sun, Jun 22, 2014 at 12:50 PM, Boaz Leskes b.leskes@gmail.com wrote:

Hi Rohit,

This issue only occurs when the update fails anyway, but it breaks the entire
request. You should indeed set the retry_on_conflict option to make the
update request succeed. PS: you should really upgrade; a lot has happened
and been fixed since 0.90.2...
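For reference, retry_on_conflict is a URL parameter on the single _update endpoint and, in a _bulk body, a _retry_on_conflict field in each update action's metadata line. The sketch below only builds the request text; no HTTP client is assumed, the index name and id are placeholders, and bulk_update_lines and single_update_url are hypothetical helpers.

```python
import json


def bulk_update_lines(index, doc_type, doc_id, script, params, retries=3, lang="js"):
    """Two newline-terminated JSON lines for one update action in a _bulk body."""
    action = {"update": {"_index": index, "_type": doc_type, "_id": doc_id,
                         # per-action equivalent of ?retry_on_conflict=N on _update
                         "_retry_on_conflict": retries}}
    body = {"script": script, "lang": lang, "params": params}
    # The bulk format requires every line, including the last, to end in "\n".
    return json.dumps(action) + "\n" + json.dumps(body) + "\n"


def single_update_url(index, doc_type, doc_id, retries=3):
    """Path for the non-bulk _update endpoint with retries enabled."""
    return f"/{index}/{doc_type}/{doc_id}/_update?retry_on_conflict={retries}"


payload = bulk_update_lines("myindex", "sample", "42",
                            "ctx._source.tracking.some_action += param1", {"param1": 1})
print(payload, end="")
print(single_update_url("myindex", "sample", "42", retries=5))
```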

Cheers,
Boaz

On Monday, June 16, 2014 10:26:06 PM UTC+2, Rohit Jaiswal wrote:

Hi Boaz,
We are using 0.90.2 and have run into this issue. As I
understand it, one option is to upgrade to 0.90.3. If we continue using 0.90.2
and set (or increase) retry_on_conflict, will we stop seeing the problem? Please
clarify.

Thanks,
Rohit
On Wednesday, August 7, 2013 9:39:56 AM UTC-7, Boaz Leskes wrote:

Hi Eric,

OK. Based on the gist you sent, I tracked down a problem and fixed
it: https://github.com/elasticsearch/elasticsearch/issues/3448.
Thanks! The fix is part of 0.90.3, so I'd recommend upgrading. This is a
secondary problem which occurs when two requests try to update the same
document at exactly the same time. One of them succeeds and the other fails
with a version conflict (that error was masked by the error you were
seeing). You can set (or increase) the retry_on_conflict parameter to make
the failing request try again.
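The race described above can be illustrated with a toy model of versioned writes: an update reads the document and its version, runs the script, and writes back only if the version is unchanged; on a conflict, the whole read-script-write cycle is re-run, up to the retry limit. This is a simplified Python sketch, not Elasticsearch's implementation; Store, update, bump, and the before_write hook are hypothetical.

```python
class VersionConflict(Exception):
    pass


class Store:
    """Toy single-document store with version-checked writes."""

    def __init__(self, source):
        self.source, self.version = dict(source), 1

    def get(self):
        return dict(self.source), self.version

    def put_if_version(self, source, expected_version):
        if self.version != expected_version:
            raise VersionConflict(f"current {self.version}, expected {expected_version}")
        self.source, self.version = source, self.version + 1


def update(store, script, retry_on_conflict=0, before_write=None):
    """Get, apply the script, write back; re-run from scratch on a conflict."""
    for attempt in range(retry_on_conflict + 1):
        source, version = store.get()
        script(source)
        if before_write:
            before_write(attempt)  # hook used below to simulate a racing request
        try:
            store.put_if_version(source, version)
            return attempt  # number of retries that were needed
        except VersionConflict:
            if attempt == retry_on_conflict:
                raise  # retries exhausted: the caller sees the conflict


def bump(src):
    src["count"] = src.get("count", 0) + 1


store = Store({})


def racing_writer(attempt):
    if attempt == 0:  # another request lands between our read and our write
        store.put_if_version({"count": 100}, store.version)


print(update(store, bump, retry_on_conflict=1, before_write=racing_writer))  # 1
print(store.source)  # {'count': 101}
```

With retry_on_conflict=0 the same interleaving raises VersionConflict, which corresponds to the failing request in the race.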

I'm still curious about your report of losing replicas. Can you
elaborate on what happens? Do you see anything in the logs?

Cheers,
Boaz

On Tuesday, August 6, 2013 5:09:26 AM UTC+2, Eric Sites wrote:

Boaz,

Sorry, but I no longer have those logs; I upgraded to 0.90.2 from
0.90.0 and wiped the logs when I did.
I did the upgrade so I could use the _bulk API for my updates.

Basically, the "lang": "js" setting was not the issue.

I was using different scripts with the same set of params and an
upsert. The fix was to use a different param name for each script;
there are about 10 unique scripts in total.
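Eric's fix of giving each script its own param name can be sketched as a builder that produces one update body per action, naming the fallback object new_<action> as in his later example. The action names in the usage line are hypothetical, tracking_update is a hypothetical helper, and the upsert document he mentions is omitted because its contents are not shown in the thread.

```python
def tracking_update(action, increment=1):
    """Build the update body for one tracking counter with a per-action param name."""
    new_param = f"new_{action}"  # e.g. new_some_action, following Eric's naming
    script = (
        f"if (ctx._source['tracking'] != null) {{ "
        f"if (ctx._source.tracking['{action}'] != null) {{ "
        f"ctx._source.tracking.{action} += param1; "
        f"}} else {{ ctx._source.tracking['{action}'] = 1; }} "
        f"}} else {{ ctx._source.tracking = {new_param}; }}"
    )
    return {
        "script": script,
        "lang": "js",  # as in Eric's request; he later reported it was not the actual fix
        "params": {"param1": increment, new_param: {action: 1}},
    }


for action in ("some_action", "other_action"):
    print(tracking_update(action)["params"])
```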

I was losing a replica shard about every 10,000 to 30,000
updates, never a primary shard.

I have 185+ million large JSON documents in 1 index with 100 shards
and 1 replica, so 200 shards total over 6 servers. Each shard is
about 10.4 GB in size, for about 2 TB of data: 1 TB primary, 1 TB replica.
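The sizing figures are consistent with each other; a quick check, assuming shards are spread evenly across the 6 servers:

```python
primaries, replicas_per_primary, servers = 100, 1, 6
shard_size_gb = 10.4

total_shards = primaries * (1 + replicas_per_primary)  # 200 shard copies
total_gb = total_shards * shard_size_gb                 # about 2,080 GB, i.e. ~2 TB
primary_gb = primaries * shard_size_gb                  # about 1,040 GB, i.e. ~1 TB
per_server_shards = total_shards / servers              # ~33.3 shards per server
per_server_gb = total_gb / servers                      # ~347 GB per server

print(total_shards, round(total_gb), round(per_server_shards, 1), round(per_server_gb))
# 200 2080 33.3 347
```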

Cheers,
Eric Sites

From: Boaz Leskes b.le...@gmail.com
Reply-To: elasti...@googlegroups.com
Date: Monday, August 5, 2013 5:38 PM
To: elasti...@googlegroups.com
Subject: Re: 0.90.2 _update or _bulk update causing NullPointerException in logs and I start losing shards

Hi Eric,

Glad to hear you solved it. It would be great if you could share the
logs from the failed _update (non-bulk) call. A failed script shouldn't
cause shards to drop, so I would like to research it some more.

Cheers,
Boaz

On Mon, Aug 5, 2013 at 6:40 PM, Eric Sites eric_...@mac.com wrote:

Boaz,

I found and fixed the problem.

I added "lang": "js" to the update JSON; that was not needed
before in ES 0.90.0.
I also renamed new_tracking to match the name of the
action in the params section.
For example, the script now looks like this:

if (ctx._source['tracking'] != null) {
    if (ctx._source.tracking['some_action'] != null) {
        ctx._source.tracking.some_action += param1;
    } else {
        ctx._source.tracking['some_action'] = 1;
    }
} else {
    ctx._source.tracking = new_some_action;
}

"params" : { "param1" : 1, "new_some_action" : { "some_action" : 1 } }
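To make the script's three branches concrete, here is a plain-Python mirror of it operating on a dict instead of ctx._source. It is a sketch: apply_tracking is a hypothetical name, and it reproduces the script as written, including setting a missing counter to 1 rather than to param1.

```python
def apply_tracking(source, action, param1, new_value):
    """Mirror of the update script's three branches on a plain dict."""
    tracking = source.get("tracking")
    if tracking is not None:
        if tracking.get(action) is not None:
            tracking[action] += param1        # counter exists: increment it
        else:
            tracking[action] = 1              # tracking exists, counter missing
    else:
        source["tracking"] = dict(new_value)  # no tracking object yet: copy the param
    return source


doc = {}
params = {"param1": 1, "new_some_action": {"some_action": 1}}
apply_tracking(doc, "some_action", params["param1"], params["new_some_action"])
apply_tracking(doc, "some_action", params["param1"], params["new_some_action"])
print(doc)  # {'tracking': {'some_action': 2}}
```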

Cheers,
Eric Sites

From: Boaz Leskes b.le...@gmail.com
Reply-To: elasti...@googlegroups.com
Date: Monday, August 5, 2013 10:35 AM
To: elasti...@googlegroups.com
Subject: Re: 0.90.2 _update or _bulk update causing NullPointerException in logs and I start losing shards

Hi Eric,

This is interesting. The log stack trace from the gist comes from
the bulk calls. Can you also post one from a failed _update? Cross-checking
them might help pinpoint the issue.

Cheers,
Boaz

On Monday, August 5, 2013 1:34:16 AM UTC+2, eric_...@mac.com wrote:

I am getting a java.lang.NullPointerException in my
ElasticSearch cluster logs when I am doing a _bulk update or just an
_update.
I am sending a lot of data to my clusters. After I get this
error, I lose a shard and it has to be recreated.

version 0.90.2

gist: https://gist.github.com/EricSites/6152468

I get this using either the _bulk API or the normal _update API.

My update script is a little complicated.
I am adding a tracking object to my document if it does not
exist. There should only be one of these, and it should not be an array of
these.
If the object does exist, I am trying to add a new field to the
tracking object to keep track of counts.
So if the field does not exist I create it; otherwise I just += to it.

if (ctx._source['tracking'] != null) {
    if (ctx._source.tracking['some_action'] != null) {
        ctx._source.tracking.some_action += param1;
    } else {
        ctx._source.tracking['some_action'] = 1;
    }
} else {
    ctx._source.tracking = new_tracking;
}

Here is my mapping for this:
{
    "sample" : {
        "index_options" : "docs",
        "properties" : {
            "tracking" : {
                "type" : "object",
                "dynamic" : true
            }
        }
    }
}



(system) #16