Primary vs. replica shard inconsistencies?

xavier1 · January 30, 2014, 10:28pm

We're currently running ElasticSearch 0.90.5. When doing the same search
across different query heads, I'm seeing an inconsistent number of results.
However, if I add preference=_primary (or _primary_first) I get the same
results across the board. I have checked that all query heads report the
same nodes in the cluster. This makes me think that the replica shards are
not consistent with the primary shards. This index is not actively being
written to, so I executed a flush and refresh manually. That didn't seem to
change anything. Anyone have any ideas on what this could be? Is this
possibly a bug?

Thanks in advance.

-Xavier

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/82ae0c25-9603-401d-ad04-f26287d630f8%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Binh_Ly · January 30, 2014, 10:44pm

Xavier, can you post an example of one full query and also show how the
results of that query are inconsistent? Just trying to understand what
is inconsistent. Thanks.


Paul_Smith · January 31, 2014, 12:00am

It would help if you could narrow this down to a few specific IDs of results
that appear/disappear depending on the primary/replica shard, and confirm it
through an explicit GET of each ID with preference=_local on the hosts
holding the primary shard and the replica for that result. To work out which
shard # a specific ID belongs to, you can run this query:

curl -XGET 'http://127.0.0.1:9200/_all/_search?pretty=1' -d '
{
  "fields" : [],
  "query" : {
    "ids" : {
      "values" : [
        "123456789"
      ]
    }
  },
  "explain" : true
}
'

In the "values" attribute, place the ID of the item you're after.
In the response you'll see the shard ID; use that to identify
which host holds the primary and which holds the replica. You can then run
the GET query with preference=_local on each of those hosts and see whether
the primary or the replica shows the result. You will need to work out whether
the item that is 'flappy' (appearing/disappearing depending on the shard
being searched) is supposed to be in there or not, perhaps by checking the
data store that is the source of the index (is it a DB?).
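
As a rough illustration of reading the shard out of that response, here is a minimal Python sketch. It assumes each hit in an explain-enabled search response carries "_shard" and "_node" fields; the exact shape of those values can differ between Elasticsearch versions, and the sample response below is hypothetical and trimmed down, not output from a real cluster.

```python
def shard_locations(search_response):
    """Map each hit's _id to the (shard, node) that served it.

    Assumes the response came from a search with "explain" enabled,
    so that each hit carries "_shard" and "_node" fields.
    """
    locations = {}
    for hit in search_response.get("hits", {}).get("hits", []):
        locations[hit["_id"]] = (hit.get("_shard"), hit.get("_node"))
    return locations


# Hypothetical response, for illustration only.
sample_response = {
    "hits": {
        "total": 1,
        "hits": [
            {
                "_shard": 3,
                "_node": "node-a",
                "_index": "events",
                "_id": "123456789",
            }
        ],
    }
}

print(shard_locations(sample_response))
```

The node identifier can then be matched against the cluster state to find which host holds the primary and which holds the replica of that shard.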

We have a very infrequent case where the replica shard does not properly
receive a delete, at least with 0.19.10. The delete successfully applies
to the primary, but the replica still holds the value and returns it in
search results. We have loads of insert/update/delete activity and the
number of flappy items is very small, but it is definitely a thing.

see this previous thread:
http://elasticsearch-users.115913.n3.nabble.com/Deleted-items-appears-when-searching-a-replica-shard-td4029075.html

If it is the replica shard that's incorrect (my bet), the way to fix it is
to relocate the replica shard to another host. The relocation will take
the copy of the primary (correct copy) and recreate a new replica shard,
effectively neutralizing the inconsistency.
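
One way to trigger such a relocation is the cluster reroute API with a "move" command. The Python sketch below only builds the request body; the index name, shard number, and node names are hypothetical placeholders, and whether this API behaves as described on your particular version should be checked before relying on it.

```python
import json


def build_move_command(index, shard, from_node, to_node):
    """Build a cluster reroute body that moves one shard copy between nodes."""
    return {
        "commands": [
            {
                "move": {
                    "index": index,
                    "shard": shard,
                    "from_node": from_node,
                    "to_node": to_node,
                }
            }
        ]
    }


# Hypothetical values: move the copy of shard 3 of "events" off node-b.
body = build_move_command("events", 3, "node-b", "node-c")
print(json.dumps(body, indent=2))
```

The resulting JSON would be sent as the body of a POST to /_cluster/reroute, with from_node set to the node currently holding the suspect replica.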

We have written a tool, Scrutineer (https://github.com/aconex/scrutineer),
which can help detect inconsistencies in your cluster. I also have a tool,
not yet published to GitHub, that can check for these primary/replica
inconsistencies if that would help (you pass it a list of IDs and it
checks whether they're flappy between the primary and replica). It can
also help automate rebuilding just the replica shards by shunting
them around (rather than a full rolling restart of ALL the shards, just the
shard replicas you want).

cheers,

Paul Smith


xavier1 · January 31, 2014, 12:45am

We have 4 query heads total (esq1.r6, esq2.r6, esq3.r7, esq4.r7).
Interestingly query heads in the same rack give the same results. We don't
do deletes at all on these indices so that shouldn't be an issue.
Unfortunately at the moment I can't do preference=_local while getting the
_id(s) directly because we don't allow access on 9200 on our worker nodes.
I might be able to write some code to figure this out though. Either way
here's my id results from the different heads.

esq2.r6 gets 28 total results
esq3.r7 gets 9 total results
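
To see exactly which IDs differ between two heads, a set difference over the returned _id values is enough. The following is a minimal Python sketch; the function name is made up for illustration, and the two lists are only a small subset of the IDs reported in this post.

```python
def missing_ids(reference_ids, candidate_ids):
    """Return the IDs present in reference_ids but absent from candidate_ids."""
    return sorted(set(reference_ids) - set(candidate_ids))


# Subset of the IDs returned by esq2.r6 (the larger result set).
ids_esq2 = [
    "0LcI_px4SZy5ZQkI_V7Qyw",
    "1sAGREtMSfK8OIxZErm8RQ",
    "6IV2v4TFTr-Gl1eC6hrj0Q",
    "7hFYs6y-QG6wGYEkoBKmdg",
]

# Subset of the IDs returned by esq3.r7 (the smaller result set).
ids_esq3 = [
    "1sAGREtMSfK8OIxZErm8RQ",
    "7hFYs6y-QG6wGYEkoBKmdg",
]

# Candidate flappy IDs: seen through one head but not the other.
print(missing_ids(ids_esq2, ids_esq3))
```

The IDs this prints are the ones worth checking individually against the primary and replica copies of their shard.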

$ curl -XGET "http://esq2.r6:9200/events/_search?q=sessionId:1390953880&size=100" | jq '.hits.hits[]._id' | sort
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 19337  100 19337    0     0  1039k      0 --:--:-- --:--:-- --:--:-- 1049k
"0LcI_px4SZy5ZQkI_V7Qyw"
"1sAGREtMSfK8OIxZErm8RQ"
"6IV2v4TFTr-Gl1eC6hrj0Q"
"6nwMexTHQBmFxfykOgKqWA"
"7hFYs6y-QG6wGYEkoBKmdg"
"9MTM10SeQ2yqWIb08oPnFA"
"aELtGN6DQpmdRlQbr8i0uA"
"AUHUg6k0QZOf_oGjsjSsGA"
"Bo_u1eYGSF2LeU78kbcFZg"
"EWs1K8YsR9-IBSAWK6ld7A"
"Fx4l6_axSGCxpyFm7C7BSQ"
"gpCrAZrNTNezWPfensER3g"
"HAFmGcWuQAylxGjmnZZkSQ"
"HB4Kwz3RSWWH5NHvyH4JMg"
"H-eP-33FREOtq7v0uBPWbQ"
"_IH6W4DoTRmdms0FJNlg4g"
"iK_3TbzcSj2-MbMXip_XFg"
"J4bjPFIcQ1ewrQqjN2qz6Q"
"kfonMDBuR--UIhkyM2cWrg"
"Kr6-9-3uR9Wp2923n-O2NA"
"Nw_9rjwvQ62u-HsuWIm53A"
"QRmY8R2MQemuePb0EkYxWA"
"usloSJzQRzCpOQ8bxKi2vA"
"w9NGEWg-QiivMpjyurYKrA"
"wKy-YzB-TK2lnK86Sx2RBA"
"y2ZmJ-_GRAmi3eHy1y8jzw"
"ZmFj7w4hR5Cvy-owCLmZ1Q"
"ZmlndPBLT-ivuOxm_A7yDA"

$ curl -XGET "http://esq3.r7:9200/events/_search?q=sessionId:1390953880&size=100" | jq '.hits.hits[]._id' | sort
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  6808  100  6808    0     0  70082      0 --:--:-- --:--:-- --:--:-- 70185
"1sAGREtMSfK8OIxZErm8RQ"
"7hFYs6y-QG6wGYEkoBKmdg"
"aELtGN6DQpmdRlQbr8i0uA"
"Fx4l6_axSGCxpyFm7C7BSQ"
"HAFmGcWuQAylxGjmnZZkSQ"
"H-eP-33FREOtq7v0uBPWbQ"
"QRmY8R2MQemuePb0EkYxWA"
"wKy-YzB-TK2lnK86Sx2RBA"
"y2ZmJ-_GRAmi3eHy1y8jzw"

And here is esq3.r7 with preference=_primary_first:

$ curl -XGET "http://esq3.r7/events/_search?q=sessionId:1390953880&size=100&preference=_primary_first" | jq '.hits.hits[]._id' | sort
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 19335  100 19335    0     0   871k      0 --:--:-- --:--:-- --:--:--  899k
"0LcI_px4SZy5ZQkI_V7Qyw"
"1sAGREtMSfK8OIxZErm8RQ"
"6IV2v4TFTr-Gl1eC6hrj0Q"
"6nwMexTHQBmFxfykOgKqWA"
"7hFYs6y-QG6wGYEkoBKmdg"
"9MTM10SeQ2yqWIb08oPnFA"
"aELtGN6DQpmdRlQbr8i0uA"
"AUHUg6k0QZOf_oGjsjSsGA"
"Bo_u1eYGSF2LeU78kbcFZg"
"EWs1K8YsR9-IBSAWK6ld7A"
"Fx4l6_axSGCxpyFm7C7BSQ"
"gpCrAZrNTNezWPfensER3g"
"HAFmGcWuQAylxGjmnZZkSQ"
"HB4Kwz3RSWWH5NHvyH4JMg"
"H-eP-33FREOtq7v0uBPWbQ"
"_IH6W4DoTRmdms0FJNlg4g"
"iK_3TbzcSj2-MbMXip_XFg"
"J4bjPFIcQ1ewrQqjN2qz6Q"
"kfonMDBuR--UIhkyM2cWrg"
"Kr6-9-3uR9Wp2923n-O2NA"
"Nw_9rjwvQ62u-HsuWIm53A"
"QRmY8R2MQemuePb0EkYxWA"
"usloSJzQRzCpOQ8bxKi2vA"
"w9NGEWg-QiivMpjyurYKrA"
"wKy-YzB-TK2lnK86Sx2RBA"
"y2ZmJ-_GRAmi3eHy1y8jzw"
"ZmFj7w4hR5Cvy-owCLmZ1Q"
"ZmlndPBLT-ivuOxm_A7yDA"


Paul_Smith · January 31, 2014, 12:59am

the flappy detection tool I have connects to the cluster using the standard
java autodiscovery mechanism, and, works out which shards are involved, and
then creates explicit TransportClient connection to each host, so would
need access to 9300 (the SMILE based protocol port). Would that help? (is
9300 accessible from a host that can run java ?


xavier1 · January 31, 2014, 1:28am

Yeah, that should work. I'll take a look at that and see if it can help
pinpoint problematic shards.


Paul_Smith · January 31, 2014, 1:29am

If it helps at all, I've pushed the flappy item detector tool (cough)
here:

https://github.com/Aconex/es-flappyitem-detector

We have a simple 3-node cluster with 5 shards and 1 replica, so I'm sure
there's code in there that is built around those assumptions, but it should
be easy to modify to suit your purpose.

cheers,

Paul

On 31 January 2014 11:59, Paul Smith tallpsmith@gmail.com wrote:

the flappy detection tool I have connects to the cluster using the
standard java autodiscovery mechanism, and, works out which shards are
involved, and then creates explicit TransportClient connection to each
host, so would need access to 9300 (the SMILE based protocol port). Would
that help? (is 9300 accessible from a host that can run java ?

On 31 January 2014 11:45, xavier@gaikai.com wrote:

We have 4 query heads total (esq1.r6, esq2.r6, esq3.r7, esq4.r7).
Interestingly query heads in the same rack give the same results. We don't
do deletes at all on these indices so that shouldn't be an issue.
Unfortunately at the moment I can't do preference=_local while getting the
_id(s) directly because we don't allow access on 9200 on our worker nodes.
I might be able to right some code to figure this out though. Either way
here's my id results from the different heads.

esq2.r6 gets 28 total results
esq3.r7 gets 9 total results

$curl -XGET "
http://esq2.r6:9200/events/_search?q=sessionId:1390953880&size=100" | jq
'.hits.hits._id' | sort
% Total % Received % Xferd Average Speed Time Time Time
Current
Dload Upload Total Spent Left
Speed
100 19337 100 19337 0 0 1039k 0 --:--:-- --:--:-- --:--:--
1049k
"0LcI_px4SZy5ZQkI_V7Qyw"
"1sAGREtMSfK8OIxZErm8RQ"
"6IV2v4TFTr-Gl1eC6hrj0Q"
"6nwMexTHQBmFxfykOgKqWA"
"7hFYs6y-QG6wGYEkoBKmdg"
"9MTM10SeQ2yqWIb08oPnFA"
"aELtGN6DQpmdRlQbr8i0uA"
"AUHUg6k0QZOf_oGjsjSsGA"
"Bo_u1eYGSF2LeU78kbcFZg"
"EWs1K8YsR9-IBSAWK6ld7A"
"Fx4l6_axSGCxpyFm7C7BSQ"
"gpCrAZrNTNezWPfensER3g"
"HAFmGcWuQAylxGjmnZZkSQ"
"HB4Kwz3RSWWH5NHvyH4JMg"
"H-eP-33FREOtq7v0uBPWbQ"
"_IH6W4DoTRmdms0FJNlg4g"
"iK_3TbzcSj2-MbMXip_XFg"
"J4bjPFIcQ1ewrQqjN2qz6Q"
"kfonMDBuR--UIhkyM2cWrg"
"Kr6-9-3uR9Wp2923n-O2NA"
"Nw_9rjwvQ62u-HsuWIm53A"
"QRmY8R2MQemuePb0EkYxWA"
"usloSJzQRzCpOQ8bxKi2vA"
"w9NGEWg-QiivMpjyurYKrA"
"wKy-YzB-TK2lnK86Sx2RBA"
"y2ZmJ-_GRAmi3eHy1y8jzw"
"ZmFj7w4hR5Cvy-owCLmZ1Q"
"ZmlndPBLT-ivuOxm_A7yDA"

$curl -XGET "
http://esq3.r7:9200/events/_search?q=sessionId:1390953880&size=100" | jq
'.hits.hits._id' | sort
% Total % Received % Xferd Average Speed Time Time Time
Current
Dload Upload Total Spent Left
Speed
100 6808 100 6808 0 0 70082 0 --:--:-- --:--:-- --:--:--
70185
"1sAGREtMSfK8OIxZErm8RQ"
"7hFYs6y-QG6wGYEkoBKmdg"
"aELtGN6DQpmdRlQbr8i0uA"
"Fx4l6_axSGCxpyFm7C7BSQ"
"HAFmGcWuQAylxGjmnZZkSQ"
"H-eP-33FREOtq7v0uBPWbQ"
"QRmY8R2MQemuePb0EkYxWA"
"wKy-YzB-TK2lnK86Sx2RBA"
"y2ZmJ-_GRAmi3eHy1y8jzw"

And here is es3.r7 with preference=_primary_first:

$curl -XGET "
http://esq3.r7/events/_search?q=sessionId:1390953880&size=100&preference=_primary_first"
| jq '.hits.hits._id' | sort
% Total % Received % Xferd Average Speed Time Time Time
Current
Dload Upload Total Spent Left
Speed
100 19335 100 19335 0 0 871k 0 --:--:-- --:--:-- --:--:--
899k
"0LcI_px4SZy5ZQkI_V7Qyw"
"1sAGREtMSfK8OIxZErm8RQ"
"6IV2v4TFTr-Gl1eC6hrj0Q"
"6nwMexTHQBmFxfykOgKqWA"
"7hFYs6y-QG6wGYEkoBKmdg"
"9MTM10SeQ2yqWIb08oPnFA"
"aELtGN6DQpmdRlQbr8i0uA"
"AUHUg6k0QZOf_oGjsjSsGA"
"Bo_u1eYGSF2LeU78kbcFZg"
"EWs1K8YsR9-IBSAWK6ld7A"
"Fx4l6_axSGCxpyFm7C7BSQ"
"gpCrAZrNTNezWPfensER3g"
"HAFmGcWuQAylxGjmnZZkSQ"
"HB4Kwz3RSWWH5NHvyH4JMg"
"H-eP-33FREOtq7v0uBPWbQ"
"_IH6W4DoTRmdms0FJNlg4g"
"iK_3TbzcSj2-MbMXip_XFg"
"J4bjPFIcQ1ewrQqjN2qz6Q"
"kfonMDBuR--UIhkyM2cWrg"
"Kr6-9-3uR9Wp2923n-O2NA"
"Nw_9rjwvQ62u-HsuWIm53A"
"QRmY8R2MQemuePb0EkYxWA"
"usloSJzQRzCpOQ8bxKi2vA"
"w9NGEWg-QiivMpjyurYKrA"
"wKy-YzB-TK2lnK86Sx2RBA"
"y2ZmJ-_GRAmi3eHy1y8jzw"
"ZmFj7w4hR5Cvy-owCLmZ1Q"
"ZmlndPBLT-ivuOxm_A7yDA"

On Thursday, January 30, 2014 4:00:49 PM UTC-8, tallpsmith wrote:

If you can narrow down a specific few IDs of results that
appear/disappear based on the primary/replica shard, and confirm through an
explicit GET of that ID with the preference=_local on the primary shard &
replica for that result. To work out which shard # a specific ID belongs
to, you can run this query:

curl -XGET 'http://127.0.0.1:9200/_all/_search?pretty=1
http://127.0.0.1:9200/_all/_search?pretty=1' -d '
{
"fields" : ,
"query" : {
"ids" : {
"values" : [
"123456789"
]
}
},
"explain" : 1
}
'

where, in the "values" attribute, you place the ID of the item you're after.
Within the response you'll see the shard ID; use that to identify
which host is the primary and which is the replica. You can then run the
GET query with preference=_local on each of those hosts and see whether the
primary or the replica shows the result. You will need to understand whether
the item that is 'flappy' (appearing/disappearing depending on the shard
being searched) is supposed to be in there or not, perhaps by checking the
data store that is the source of the index (is it a DB?).
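
A minimal sketch of the lookup described above, in Python. It builds the ids query and pulls the shard number out of each hit. The helper names are hypothetical, and it assumes that hits returned with explain enabled carry "_shard" and "_node" fields; check a real response from your ES version before relying on that.

```python
import json


def build_ids_query(doc_ids):
    """Request body for the ids query shown above (no fields, explain on)."""
    return {
        "fields": [],
        "query": {"ids": {"values": list(doc_ids)}},
        "explain": True,
    }


def shard_locations(response):
    """Map each hit's _id to (shard, node) as reported in the response.

    Assumption: with explain enabled, every hit carries "_shard" and "_node".
    Hits without those fields map to (None, None).
    """
    return {
        hit["_id"]: (hit.get("_shard"), hit.get("_node"))
        for hit in response.get("hits", {}).get("hits", [])
    }


if __name__ == "__main__":
    print(json.dumps(build_ids_query(["123456789"]), indent=2))
    # Hand-written sample response; the node name is made up.
    sample = {"hits": {"hits": [{"_id": "123456789", "_shard": 3, "_node": "node-a"}]}}
    print(shard_locations(sample))
```

Once the shard number is known, the cluster state tells you which hosts hold the primary and the replica of that shard.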

We have a very infrequent case where the replica shard does not properly
receive a delete, at least with 0.19.10. The delete successfully applies
to the primary, but the replica still holds the value and returns it within
search results. We have loads of insert/update/delete activity and the
number of flappy items is very small, but it is definitely a thing.

See this previous thread:
http://elasticsearch-users.115913.n3.nabble.com/Deleted-items-appears-when-searching-a-replica-shard-td4029075.html

If it is the replica shard that's incorrect (my bet), the way to fix it
is to relocate the replica shard to another host. The relocation will take
the copy of the primary (correct copy) and recreate a new replica shard,
effectively neutralizing the inconsistency.
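
For reference, one way to trigger such a relocation by hand is the cluster reroute API's "move" command. The sketch below only builds the request body in Python. The function name, index, shard number, and node names are placeholders, and whether a move rebuilds the replica from the primary on your version is the behaviour described above, not something this snippet verifies.

```python
import json


def build_move_command(index, shard, from_node, to_node):
    """Body for POST /_cluster/reroute that moves one shard copy between nodes."""
    return {
        "commands": [
            {
                "move": {
                    "index": index,
                    "shard": shard,
                    "from_node": from_node,
                    "to_node": to_node,
                }
            }
        ]
    }


if __name__ == "__main__":
    # Placeholder values: shard 3 of "events", moved off the node holding the
    # suspect replica and onto another data node.
    print(json.dumps(build_move_command("events", 3, "node-a", "node-b"), indent=2))
```

The printed JSON can be sent as the body of a POST to /_cluster/reroute on any node that exposes the HTTP port.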

We have written a tool, Scrutineer (https://github.com/aconex/scrutineer),
which can help detect inconsistencies in your cluster. I also have a tool,
not yet published to GitHub, that can help check these primary/replica
inconsistencies if that would help (you pass a list of IDs to it and it'll
check whether they're flappy between the primary & replica or not). It can
also help automate the rebuilding of just the replica shards by shunting
them around (rather than a full rolling restart of ALL the shards, just the
shard replicas you want).

cheers,

Paul Smith

On 31 January 2014 09:44, Binh Ly bi...@hibalo.com wrote:

Xavier, can you post an example of 1 full query and then also show how
the results of this one query are inconsistent? Just trying to understand
what is inconsistent. Thanks.


Paul_Smith · January 31, 2014, 1:38am

If you do use it, don't forget that we build against ES 0.19, so change the
ES version in pom.xml to match yours; otherwise it won't connect...

On 31 January 2014 12:29, Paul Smith tallpsmith@gmail.com wrote:

If it helps at all, I've pushed the flappy item detector tool (cough)
here:

https://github.com/Aconex/es-flappyitem-detector

We have a simple 3-node cluster, 5 shards, 1 replica, so I'm sure there's
code in there that is built around those assumptions, but it should be easy
to modify to suit your purpose.

cheers,

Paul

On 31 January 2014 11:59, Paul Smith tallpsmith@gmail.com wrote:

The flappy detection tool I have connects to the cluster using the
standard Java autodiscovery mechanism, works out which shards are
involved, and then creates an explicit TransportClient connection to each
host, so it would need access to 9300 (the SMILE-based protocol port). Would
that help? (Is 9300 accessible from a host that can run Java?)

On 31 January 2014 11:45, xavier@gaikai.com wrote:

We have 4 query heads total (esq1.r6, esq2.r6, esq3.r7, esq4.r7).
Interestingly, query heads in the same rack give the same results. We don't
do deletes at all on these indices, so that shouldn't be an issue.
Unfortunately, at the moment I can't do preference=_local while getting the
_id(s) directly because we don't allow access on 9200 on our worker nodes.
I might be able to write some code to figure this out, though. Either way,
here are my ID results from the different heads.

esq2.r6 gets 28 total results
esq3.r7 gets 9 total results

$ curl -XGET "http://esq2.r6:9200/events/_search?q=sessionId:1390953880&size=100" | jq '.hits.hits[]._id' | sort
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 19337  100 19337    0     0  1039k      0 --:--:-- --:--:-- --:--:-- 1049k
"0LcI_px4SZy5ZQkI_V7Qyw"
"1sAGREtMSfK8OIxZErm8RQ"
"6IV2v4TFTr-Gl1eC6hrj0Q"
"6nwMexTHQBmFxfykOgKqWA"
"7hFYs6y-QG6wGYEkoBKmdg"
"9MTM10SeQ2yqWIb08oPnFA"
"aELtGN6DQpmdRlQbr8i0uA"
"AUHUg6k0QZOf_oGjsjSsGA"
"Bo_u1eYGSF2LeU78kbcFZg"
"EWs1K8YsR9-IBSAWK6ld7A"
"Fx4l6_axSGCxpyFm7C7BSQ"
"gpCrAZrNTNezWPfensER3g"
"HAFmGcWuQAylxGjmnZZkSQ"
"HB4Kwz3RSWWH5NHvyH4JMg"
"H-eP-33FREOtq7v0uBPWbQ"
"_IH6W4DoTRmdms0FJNlg4g"
"iK_3TbzcSj2-MbMXip_XFg"
"J4bjPFIcQ1ewrQqjN2qz6Q"
"kfonMDBuR--UIhkyM2cWrg"
"Kr6-9-3uR9Wp2923n-O2NA"
"Nw_9rjwvQ62u-HsuWIm53A"
"QRmY8R2MQemuePb0EkYxWA"
"usloSJzQRzCpOQ8bxKi2vA"
"w9NGEWg-QiivMpjyurYKrA"
"wKy-YzB-TK2lnK86Sx2RBA"
"y2ZmJ-_GRAmi3eHy1y8jzw"
"ZmFj7w4hR5Cvy-owCLmZ1Q"
"ZmlndPBLT-ivuOxm_A7yDA"

$ curl -XGET "http://esq3.r7:9200/events/_search?q=sessionId:1390953880&size=100" | jq '.hits.hits[]._id' | sort
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  6808  100  6808    0     0  70082      0 --:--:-- --:--:-- --:--:-- 70185
"1sAGREtMSfK8OIxZErm8RQ"
"7hFYs6y-QG6wGYEkoBKmdg"
"aELtGN6DQpmdRlQbr8i0uA"
"Fx4l6_axSGCxpyFm7C7BSQ"
"HAFmGcWuQAylxGjmnZZkSQ"
"H-eP-33FREOtq7v0uBPWbQ"
"QRmY8R2MQemuePb0EkYxWA"
"wKy-YzB-TK2lnK86Sx2RBA"
"y2ZmJ-_GRAmi3eHy1y8jzw"

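To narrow the two listings above down to candidate flappy IDs, the outputs can be diffed. A small Python sketch, assuming each jq output has been loaded as a list of ID strings (the function name is hypothetical, and only a few of the IDs from the listings are used here):

```python
def flappy_ids(ids_a, ids_b):
    """Return the IDs present in exactly one of the two result lists, sorted."""
    return sorted(set(ids_a) ^ set(ids_b))


if __name__ == "__main__":
    # A few IDs taken from the esq2.r6 and esq3.r7 listings above.
    esq2 = [
        "0LcI_px4SZy5ZQkI_V7Qyw",
        "1sAGREtMSfK8OIxZErm8RQ",
        "6IV2v4TFTr-Gl1eC6hrj0Q",
        "7hFYs6y-QG6wGYEkoBKmdg",
    ]
    esq3 = [
        "1sAGREtMSfK8OIxZErm8RQ",
        "7hFYs6y-QG6wGYEkoBKmdg",
    ]
    for doc_id in flappy_ids(esq2, esq3):
        print(doc_id)
```

The resulting IDs are the ones worth checking individually against the primary and the replica.
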
