How to fix primary-replica inconsistency?


(arta) #1

Hi,
I have the same problem as described here:
http://elasticsearch-users.115913.n3.nabble.com/BUG-Alternating-result-set-across-every-query-tt4021027.html

The same search query returns one document, then next time it returns none, and alternates.
If I add preference=_primary_first then I get one document every time.
So I think the primary shard has the document but the replica does not. (I have 1 replica)

My question here is how to fix this problem.
The discussion in the link above does not have a solution.

In addition, is there any config parameter that controls how often, or under what conditions, the contents of a primary shard are propagated to its replicas?

Thanks for your help.


(qjh) #2

I have been seeing this same issue on a few indices. The primary and
replica are divergent and nothing seems to resolve it (they have been
refreshed and optimized). I've since worked around this by recreating
the index.

Does anyone have a good cluster state dump that could be used to open an
issue? There doesn't appear to be one for this yet:
https://github.com/elasticsearch/elasticsearch/issues

On Thursday, September 13, 2012 5:29:48 PM UTC-4, arta wrote:

Hi,
I have a problem as same as described in here:

http://elasticsearch-users.115913.n3.nabble.com/BUG-Alternating-result-set-across-every-query-tt4021027.html

The same search query returns one document, then next time it returns none, and alternates.
If I add preference=_primary_first then I get one document every time.
So I think the primary shard has the document but the replica does not. (I have 1 replica)

My question here is how to fix this problem.
The discussion in the link above does not have a solution.

In addition, is there any config parameter that specifies how often or in
what situation primary shards contents are reflected to their replicas?

Thanks for your help.

--
View this message in context:
http://elasticsearch-users.115913.n3.nabble.com/How-to-fix-primary-replica-inconsistency-tp4022692.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.

--


(Kurt Harriger) #3

Same issue here and same resolution at the moment. I just haven't had the
time yet to dig deep into this issue, but it's something we will need to dig
into soon. At the moment we're limping along with
?preference=_primary_first to prevent users from seeing the
inconsistencies, so our replica shards basically only act as a hot standby
in case the primary fails. We already had code that writes to
multiple indices simultaneously, which we used to migrate from Solr and/or
make schema changes, so we just use this code to keep two indices up to
date. This way, when we discover an issue with one index, we switch to the
other index, drop the broken one, and rebuild it without affecting our users.

We currently use the _status endpoint to identify whether the replicas are
out of sync. I tried to write a script to do this and to force an index
failover and reindex automatically, but I wasn't sure how to determine
whether the numDocs differed because the replica shard hadn't yet applied
all the change sets or because it was actually out of sync.
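
A minimal sketch of one possible heuristic for that distinction, not something confirmed in this thread: poll the primary and replica num_docs several times and treat a gap that never closes or shrinks as suspect, while a gap that is shrinking is treated as ordinary lag. The function name `classify_gap`, the labels it returns, and the `min_samples` threshold are all assumptions made for illustration.

```python
from typing import List, Tuple


def classify_gap(samples: List[Tuple[int, int]], min_samples: int = 3) -> str:
    """Classify a primary/replica num_docs gap from repeated polls.

    samples holds (primary_num_docs, replica_num_docs) pairs, oldest first.
    Returns "unknown", "in_sync", "lagging", or "suspect".
    This is a heuristic assumption, not documented Elasticsearch behavior.
    """
    if len(samples) < min_samples:
        return "unknown"  # too few polls to say anything
    gaps = [primary - replica for primary, replica in samples]
    if gaps[-1] == 0:
        return "in_sync"  # the most recent poll shows no difference
    # A gap present in every poll that has not shrunk looks like divergence.
    if all(gap != 0 for gap in gaps) and abs(gaps[-1]) >= abs(gaps[0]):
        return "suspect"
    return "lagging"  # the gap exists but is closing


if __name__ == "__main__":
    print(classify_gap([(100, 90), (110, 100), (120, 110)]))
    print(classify_gap([(100, 90), (100, 95), (100, 98)]))
```

The result is more trustworthy if indexing is paused, or at least quiet, while the samples are taken; under heavy writes a persistent gap can still be plain lag.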

On Friday, September 14, 2012 8:59:29 AM UTC-6, qjh wrote:

I have been seeing this same issue on a few indices. The primary and
replica are divergent and nothing seems to resolve it (they have been
refreshed and optimized). I've since worked around this by having to
recreate the index.

Does anyone have a good cluster state dump that could be used to open an
issue? There doesn't appear to be one for this yet:
https://github.com/elasticsearch/elasticsearch/issues

--


(arta) #4

Thanks for the replies, qjh and Kurt.
It seems there is no way to fix the problem other than reindexing.

Kurt mentioned numDocs. I suppose that means we can use curl with _count to determine whether there is an inconsistency between shards.

I have millions of documents.
If there were a way to find out which documents are only in the primary or only in the replica, that information would make the reindexing a lot more efficient.
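
To illustrate the idea, here is a rough sketch that assumes the document ids held by each copy can somehow be collected separately. How to do that against a specific shard copy is not established in this thread and is left out. The function name `diff_ids` and the example ids are hypothetical.

```python
from typing import Iterable, Set, Tuple


def diff_ids(primary_ids: Iterable[str],
             replica_ids: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (ids only in the primary, ids only in the replica)."""
    primary, replica = set(primary_ids), set(replica_ids)
    return primary - replica, replica - primary


if __name__ == "__main__":
    # Hypothetical ids standing in for the results of two id scans.
    only_primary, only_replica = diff_ids(["a", "b", "c"], ["b", "c", "d"])
    print(sorted(only_primary), sorted(only_replica))
```

Only the ids in either set would then need to be reindexed, instead of the whole index.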


(arta) #5

Kurt,
Can you please elaborate a little more on "We currently use the _status endpoint to identify if the replicas are out of sync"?
So you compare each shard's num_docs?

As you mentioned, while indexing is running it will be difficult to distinguish the cause of the difference in numbers, i.e. inconsistency or replication delay.
What is your strategy for discovering the inconsistency automatically?

Thanks again for your help!


(Kurt Harriger) #6

Yep, I basically just look to see how much the num_docs values differ. There
is also a translog/id, and I would assume that if the primary and replica
shards have the same translog/id then they should have the same num_docs. I
was thinking of perhaps writing a script that looks for any shards that have
the same translog/id but different num_docs, but for now I just check it
manually. In any event, here is a typical response I get back that appears
out of sync to me.

{

  • ok: true,
  • _shards:
    {
    • total: 20,
    • successful: 20,
    • failed: 0
      },
  • indices:
    {
    • default2:
      {
      • index:
        {
        • primary_size: "3.5gb",
        • primary_size_in_bytes: 3836722124,
        • size: "7gb",
        • size_in_bytes: 7606933703
          },
      • translog:
        {
        • operations: 1621
          },
      • docs:
        {
        • num_docs: 3065297,
        • max_doc: 3462692,
        • deleted_docs: 397395
          },
      • merges:
        {
        • current: 0,
        • current_docs: 0,
        • current_size: "0b",
        • current_size_in_bytes: 0,
        • total: 43004,
        • total_time: "3h",
        • total_time_in_millis: 10957007,
        • total_docs: 71491119,
        • total_size: "79.9gb",
        • total_size_in_bytes: 85824354912
          },
      • refresh:
        {
        • total: 226327,
        • total_time: "1.1h",
        • total_time_in_millis: 4198337
          },
      • flush:
        {
        • total: 3828,
        • total_time: "59.9m",
        • total_time_in_millis: 3596409
          },
      • shards:
        {
        • 0:
          [

          {
          - routing:
          {
          - state: "STARTED",
          - primary: true,
          - node: "95gp_xKVRra472UxiDygiA",
          - relocating_node: null,
          - shard: 0,
          - index: "default2"
          },
          - state: "STARTED",
          - index:
          {
          - size: "373.5mb",
          - size_in_bytes: 391702523
          },
          - translog:
          {
          - id: 1347901831292,
          - operations: 74
          },
          - docs:
          {
          - num_docs: 307291,
          - max_doc: 353087,
          - deleted_docs: 45796
          },
          - merges:
          {
          - current: 0,
          - current_docs: 0,
          - current_size: "0b",
          - current_size_in_bytes: 0,
          - total: 2143,
          - total_time: "12.8m",
          - total_time_in_millis: 770853,
          - total_docs: 3179382,
          - total_size: "3.5gb",
          - total_size_in_bytes: 3843614285
          },
          - refresh:
          {
          - total: 11204,
          - total_time: "5.2m",
          - total_time_in_millis: 317926
          },
          - flush:
          {
          - total: 192,
          - total_time: "4.7m",
          - total_time_in_millis: 283969
          }
          },

          {
          - routing:
          {
          - state: "STARTED",
          - primary: false,
          - node: "tdhHCSEBSaKLmsaE_E4Gzw",
          - relocating_node: null,
          - shard: 0,
          - index: "default2"
          },
          - state: "STARTED",
          - index:
          {
          - size: "362.3mb",
          - size_in_bytes: 379899395
          },
          - translog:
          {
          - id: 1347901831292,
          - operations: 74
          },
          - docs:
          {
          - num_docs: 303379,
          - max_doc: 343611,
          - deleted_docs: 40232
          },
          - merges:
          {
          - current: 0,
          - current_docs: 0,
          - current_size: "0b",
          - current_size_in_bytes: 0,
          - total: 2173,
          - total_time: "5.4m",
          - total_time_in_millis: 329683,
          - total_docs: 3825448,
          - total_size: "4.2gb",
          - total_size_in_bytes: 4573205157
          },
          - refresh:
          {
          - total: 11583,
          - total_time: "1.7m",
          - total_time_in_millis: 107880
          },
          - flush:
          {
          - total: 192,
          - total_time: "1.7m",
          - total_time_in_millis: 107054
          }
          }
          ],
        • 1:
          [

          {
          - routing:
          {
          - state: "STARTED",
          - primary: true,
          - node: "95gp_xKVRra472UxiDygiA",
          - relocating_node: null,
          - shard: 1,
          - index: "default2"
          },
          - state: "STARTED",
          - index:
          {
          - size: "363.6mb",
          - size_in_bytes: 381298886
          },
          - translog:
          {
          - id: 1347901831415,
          - operations: 96
          },
          - docs:
          {
          - num_docs: 306295,
          - max_doc: 344589,
          - deleted_docs: 38294
          },
          - merges:
          {
          - current: 0,
          - current_docs: 0,
          - current_size: "0b",
          - current_size_in_bytes: 0,
          - total: 2148,
          - total_time: "13.5m",
          - total_time_in_millis: 815476,
          - total_docs: 3661059,
          - total_size: "4.1gb",
          - total_size_in_bytes: 4447223751
          },
          - refresh:
          {
          - total: 11265,
          - total_time: "5.2m",
          - total_time_in_millis: 312026
          },
          - flush:
          {
          - total: 191,
          - total_time: "4.3m",
          - total_time_in_millis: 258892
          }
          },

          {
          - routing:
          {
          - state: "STARTED",
          - primary: false,
          - node: "tdhHCSEBSaKLmsaE_E4Gzw",
          - relocating_node: null,
          - shard: 1,
          - index: "default2"
          },
          - state: "STARTED",
          - index:
          {
          - size: "378.3mb",
          - size_in_bytes: 396779935
          },
          - translog:
          {
          - id: 1347901831415,
          - operations: 96
          },
          - docs:
          {
          - num_docs: 302098,
          - max_doc: 356028,
          - deleted_docs: 53930
          },
          - merges:
          {
          - current: 0,
          - current_docs: 0,
          - current_size: "0b",
          - current_size_in_bytes: 0,
          - total: 2168,
          - total_time: "4.9m",
          - total_time_in_millis: 297319,
          - total_docs: 3164854,
          - total_size: "3.5gb",
          - total_size_in_bytes: 3803854038
          },
          - refresh:
          {
          - total: 11656,
          - total_time: "1.7m",
          - total_time_in_millis: 107995
          },
          - flush:
          {
          - total: 191,
          - total_time: "1.8m",
          - total_time_in_millis: 109000
          }
          }
          ],
        • 2:
          [

          {
          - routing:
          {
          - state: "STARTED",
          - primary: false,
          - node: "95gp_xKVRra472UxiDygiA",
          - relocating_node: null,
          - shard: 2,
          - index: "default2"
          },
          - state: "STARTED",
          - index:
          {
          - size: "341.9mb",
          - size_in_bytes: 358525806
          },
          - translog:
          {
          - id: 1347901772812,
          - operations: 90
          },
          - docs:
          {
          - num_docs: 305291,
          - max_doc: 328203,
          - deleted_docs: 22912
          },
          - merges:
          {
          - current: 0,
          - current_docs: 0,
          - current_size: "0b",
          - current_size_in_bytes: 0,
          - total: 2138,
          - total_time: "12.3m",
          - total_time_in_millis: 741442,
          - total_docs: 3817398,
          - total_size: "4.2gb",
          - total_size_in_bytes: 4560351643
          },
          - refresh:
          {
          - total: 11297,
          - total_time: "5.1m",
          - total_time_in_millis: 311758
          },
          - flush:
          {
          - total: 191,
          - total_time: "3.4m",
          - total_time_in_millis: 208984
          }
          },

          {
          - routing:
          {
          - state: "STARTED",
          - primary: true,
          - node: "tdhHCSEBSaKLmsaE_E4Gzw",
          - relocating_node: null,
          - shard: 2,
          - index: "default2"
          },
          - state: "STARTED",
          - index:
          {
          - size: "355.3mb",
          - size_in_bytes: 372645152
          },
          - translog:
          {
          - id: 1347901772813,
          - operations: 90
          },
          - docs:
          {
          - num_docs: 306395,
          - max_doc: 339658,
          - deleted_docs: 33263
          },
          - merges:
          {
          - current: 0,
          - current_docs: 0,
          - current_size: "0b",
          - current_size_in_bytes: 0,
          - total: 2186,
          - total_time: "5.1m",
          - total_time_in_millis: 310026,
          - total_docs: 3408416,
          - total_size: "3.8gb",
          - total_size_in_bytes: 4086555557
          },
          - refresh:
          {
          - total: 11587,
          - total_time: "1.7m",
          - total_time_in_millis: 102844
          },
          - flush:
          {
          - total: 192,
          - total_time: "1.6m",
          - total_time_in_millis: 96582
          }
          }
          ],
        • 3:
          [

          {
          - routing:
          {
          - state: "STARTED",
          - primary: false,
          - node: "95gp_xKVRra472UxiDygiA",
          - relocating_node: null,
          - shard: 3,
          - index: "default2"
          },
          - state: "STARTED",
          - index:
          {
          - size: "401.3mb",
          - size_in_bytes: 420864740
          },
          - translog:
          {
          - id: 1347901772798,
          - operations: 67
          },
          - docs:
          {
          - num_docs: 305251,
          - max_doc: 375754,
          - deleted_docs: 70503
          },
          - merges:
          {
          - current: 0,
          - current_docs: 0,
          - current_size: "0b",
          - current_size_in_bytes: 0,
          - total: 2124,
          - total_time: "12.4m",
          - total_time_in_millis: 747821,
          - total_docs: 3515161,
          - total_size: "3.9gb",
          - total_size_in_bytes: 4238118231
          },
          - refresh:
          {
          - total: 11146,
          - total_time: "5.2m",
          - total_time_in_millis: 317066
          },
          - flush:
          {
          - total: 191,
          - total_time: "3.9m",
          - total_time_in_millis: 235077
          }
          },

          {
          - routing:
          {
          - state: "STARTED",
          - primary: true,
          - node: "tdhHCSEBSaKLmsaE_E4Gzw",
          - relocating_node: null,
          - shard: 3,
          - index: "default2"
          },
          - state: "STARTED",
          - index:
          {
          - size: "404.1mb",
          - size_in_bytes: 423736704
          },
          - translog:
          {
          - id: 1347901772799,
          - operations: 69
          },
          - docs:
          {
          - num_docs: 306550,
          - max_doc: 377493,
          - deleted_docs: 70943
          },
          - merges:
          {
          - current: 0,
          - current_docs: 0,
          - current_size: "0b",
          - current_size_in_bytes: 0,
          - total: 2175,
          - total_time: "5.2m",
          - total_time_in_millis: 316727,
          - total_docs: 3449317,
          - total_size: "3.8gb",
          - total_size_in_bytes: 4152370061
          },
          - refresh:
          {
          - total: 11438,
          - total_time: "1.8m",
          - total_time_in_millis: 109234
          },
          - flush:
          {
          - total: 192,
          - total_time: "2.1m",
          - total_time_in_millis: 126846
          }
          }
          ],
        • 4:
          [

          {
          - routing:
          {
          - state: "STARTED",
          - primary: true,
          - node: "95gp_xKVRra472UxiDygiA",
          - relocating_node: null,
          - shard: 4,
          - index: "default2"
          },
          - state: "STARTED",
          - index:
          {
          - size: "381.6mb",
          - size_in_bytes: 400202939
          },
          - translog:
          {
          - id: 1347901831289,
          - operations: 98
          },
          - docs:
          {
          - num_docs: 305897,
          - max_doc: 359662,
          - deleted_docs: 53765
          },
          - merges:
          {
          - current: 0,
          - current_docs: 0,
          - current_size: "0b",
          - current_size_in_bytes: 0,
          - total: 2138,
          - total_time: "12.4m",
          - total_time_in_millis: 748422,
          - total_docs: 3747681,
          - total_size: "4.1gb",
          - total_size_in_bytes: 4482284498
          },
          - refresh:
          {
          - total: 11228,
          - total_time: "5.3m",
          - total_time_in_millis: 319209
          },
          - flush:
          {
          - total: 190,
          - total_time: "4m",
          - total_time_in_millis: 243675
          }
          },

          {
          - routing:
          {
          - state: "STARTED",
          - primary: false,
          - node: "tdhHCSEBSaKLmsaE_E4Gzw",
          - relocating_node: null,
          - shard: 4,
          - index: "default2"
          },
          - state: "STARTED",
          - index:
          {
          - size: "350.3mb",
          - size_in_bytes: 367367654
          },
          - translog:
          {
          - id: 1347901831290,
          - operations: 97
          },
          - docs:
          {
          - num_docs: 302085,
          - max_doc: 331367,
          - deleted_docs: 29282
          },
          - merges:
          {
          - current: 0,
          - current_docs: 0,
          - current_size: "0b",
          - current_size_in_bytes: 0,
          - total: 2160,
          - total_time: "5.4m",
          - total_time_in_millis: 325400,
          - total_docs: 3663609,
          - total_size: "4gb",
          - total_size_in_bytes: 4395019498
          },
          - refresh:
          {
          - total: 11551,
          - total_time: "1.8m",
          - total_time_in_millis: 108807
          },
          - flush:
          {
          - total: 191,
          - total_time: "2.1m",
          - total_time_in_millis: 128884
          }
          }
          ],
        • 5:
          [

          {
          - routing:
          {
          - state: "STARTED",
          - primary: true,
          - node: "95gp_xKVRra472UxiDygiA",
          - relocating_node: null,
          - shard: 5,
          - index: "default2"
          },
          - state: "STARTED",
          - index:
          {
          - size: "362.2mb",
          - size_in_bytes: 379848618
          },
          - translog:
          {
          - id: 1347901831423,
          - operations: 82
          },
          - docs:
          {
          - num_docs: 306784,
          - max_doc: 340679,
          - deleted_docs: 33895
          },
          - merges:
          {
          - current: 0,
          - current_docs: 0,
          - current_size: "0b",
          - current_size_in_bytes: 0,
          - total: 2125,
          - total_time: "13.2m",
          - total_time_in_millis: 795503,
          - total_docs: 3530900,
          - total_size: "3.9gb",
          - total_size_in_bytes: 4257253724
          },
          - refresh:
          {
          - total: 11088,
          - total_time: "5m",
          - total_time_in_millis: 302650
          },
          - flush:
          {
          - total: 192,
          - total_time: "4.6m",
          - total_time_in_millis: 277611
          }
          },

          {
          - routing:
          {
          - state: "STARTED",
          - primary: false,
          - node: "tdhHCSEBSaKLmsaE_E4Gzw",
          - relocating_node: null,
          - shard: 5,
          - index: "default2"
          },
          - state: "STARTED",
          - index:
          {
          - size: "344.1mb",
          - size_in_bytes: 360850766
          },
          - translog:
          {
          - id: 1347901831423,
          - operations: 80
          },
          - docs:
          {
          - num_docs: 303711,
          - max_doc: 325543,
          - deleted_docs: 21832
          },
          - merges:
          {
          - current: 0,
          - current_docs: 0,
          - current_size: "0b",
          - current_size_in_bytes: 0,
          - total: 2154,
          - total_time: "5.6m",
          - total_time_in_millis: 337964,
          - total_docs: 3825832,
          - total_size: "4.2gb",
          - total_size_in_bytes: 4591861598
          },
          - refresh:
          {
          - total: 11433,
          - total_time: "1.7m",
          - total_time_in_millis: 104688
          },
          - flush:
          {
          - total: 192,
          - total_time: "1.5m",
          - total_time_in_millis: 91405
          }
          }
          ],
        • 6:
          [

          {
          - routing:
          {
          - state: "STARTED",
          - primary: true,
          - node: "95gp_xKVRra472UxiDygiA",
          - relocating_node: null,
          - shard: 6,
          - index: "default2"
          },
          - state: "STARTED",
          - index:
          {
          - size: "361.9mb",
          - size_in_bytes: 379558406
          },
          - translog:
          {
          - id: 1347901831378,
          - operations: 54
          },
          - docs:
          {
          - num_docs: 306499,
          - max_doc: 343743,
          - deleted_docs: 37244
          },
          - merges:
          {
          - current: 0,
          - current_docs: 0,
          - current_size: "0b",
          - current_size_in_bytes: 0,
          - total: 2128,
          - total_time: "13.2m",
          - total_time_in_millis: 796938,
          - total_docs: 3601219,
          - total_size: "4gb",
          - total_size_in_bytes: 4305769584
          },
          - refresh:
          {
          - total: 11016,
          - total_time: "5m",
          - total_time_in_millis: 303480
          },
          - flush:
          {
          - total: 192,
          - total_time: "4.7m",
          - total_time_in_millis: 286890
          }
          },

          {
          - routing:
          {
          - state: "STARTED",
          - primary: false,
          - node: "tdhHCSEBSaKLmsaE_E4Gzw",
          - relocating_node: null,
          - shard: 6,
          - index: "default2"
          },
          - state: "STARTED",
          - index:
          {
          - size: "391.3mb",
          - size_in_bytes: 410343610
          },
          - translog:
          {
          - id: 1347901831378,
          - operations: 54
          },
          - docs:
          {
          - num_docs: 302809,
          - max_doc: 366089,
          - deleted_docs: 63280
          },
          - merges:
          {
          - current: 0,
          - current_docs: 0,
          - current_size: "0b",
          - current_size_in_bytes: 0,
          - total: 2144,
          - total_time: "5.4m",
          - total_time_in_millis: 324305,
          - total_docs: 3584721,
          - total_size: "4gb",
          - total_size_in_bytes: 4300272618
          },
          - refresh:
          {
          - total: 11354,
          - total_time: "1.8m",
          - total_time_in_millis: 112237
          },
          - flush:
          {
          - total: 192,
          - total_time: "1.9m",
          - total_time_in_millis: 115561
          }
          }
          ],
        • 7:
          [

          {
          - routing:
          {
          - state: "STARTED",
          - primary: true,
          - node: "95gp_xKVRra472UxiDygiA",
          - relocating_node: null,
          - shard: 7,
          - index: "default2"
          },
          - state: "STARTED",
          - index:
          {
          - size: "358.4mb",
          - size_in_bytes: 375896103
          },
          - translog:
          {
          - id: 1347901831301,
          - operations: 74
          },
          - docs:
          {
          - num_docs: 306144,
          - max_doc: 337143,
          - deleted_docs: 30999
          },
          - merges:
          {
          - current: 0,
          - current_docs: 0,
          - current_size: "0b",
          - current_size_in_bytes: 0,
          - total: 2125,
          - total_time: "13.2m",
          - total_time_in_millis: 796752,
          - total_docs: 3564015,
          - total_size: "3.9gb",
          - total_size_in_bytes: 4286994130
          },
          - refresh:
          {
          - total: 11048,
          - total_time: "5.2m",
          - total_time_in_millis: 316874
          },
          - flush:
          {
          - total: 191,
          - total_time: "3.9m",
          - total_time_in_millis: 236884
          }
          },

          {
          - routing:
          {
          - state: "STARTED",
          - primary: false,
          - node: "tdhHCSEBSaKLmsaE_E4Gzw",
          - relocating_node: null,
          - shard: 7,
          - index: "default2"
          },
          - state: "STARTED",
          - index:
          {
          - size: "344.2mb",
          - size_in_bytes: 360935272
          },
          - translog:
          {
          - id: 1347901831301,
          - operations: 73
          },
          - docs:
          {
          - num_docs: 302416,
          - max_doc: 327545,
          - deleted_docs: 25129
          },
          - merges:
          {
          - current: 0,
          - current_docs: 0,
          - current_size: "0b",
          - current_size_in_bytes: 0,
          - total: 2163,
          - total_time: "5.6m",
          - total_time_in_millis: 336263,
          - total_docs: 3810258,
          - total_size: "4.2gb",
          - total_size_in_bytes: 4565132999
          },
          - refresh:
          {
          - total: 11416,
          - total_time: "1.7m",
          - total_time_in_millis: 105342
          },
          - flush:
          {
          - total: 191,
          - total_time: "1.3m",
          - total_time_in_millis: 80964
          }
          }
          ],
        • 8:
          [

          {
          - routing:
          {
          - state: "STARTED",
          - primary: false,
          - node: "95gp_xKVRra472UxiDygiA",
          - relocating_node: null,
          - shard: 8,
          - index: "default2"
          },
          - state: "STARTED",
          - index:
          {
          - size: "330.7mb",
          - size_in_bytes: 346790325
          },
          - translog:
          {
          - id: 1347901772804,
          - operations: 66
          },
          - docs:
          {
          - num_docs: 305785,
          - max_doc: 315564,
          - deleted_docs: 9779
          },
          - merges:
          {
          - current: 0,
          - current_docs: 0,
          - current_size: "0b",
          - current_size_in_bytes: 0,
          - total: 2116,
          - total_time: "13.2m",
          - total_time_in_millis: 796557,
          - total_docs: 3818340,
          - total_size: "4.2gb",
          - total_size_in_bytes: 4556889217
          },
          - refresh:
          {
          - total: 11042,
          - total_time: "5m",
          - total_time_in_millis: 305252
          },
          - flush:
          {
          - total: 191,
          - total_time: "3.5m",
          - total_time_in_millis: 215837
          }
          },

          {
          - routing:
          {
          - state: "STARTED",
          - primary: true,
          - node: "tdhHCSEBSaKLmsaE_E4Gzw",
          - relocating_node: null,
          - shard: 8,
          - index: "default2"
          },
          - state: "STARTED",
          - index:
          {
          - size: "349.5mb",
          - size_in_bytes: 366535393
          },
          - translog:
          {
          - id: 1347901772805,
          - operations: 64
          },
          - docs:
          {
          - num_docs: 306854,
          - max_doc: 334557,
          - deleted_docs: 27703
          },
          - merges:
          {
          - current: 0,
          - current_docs: 0,
          - current_size: "0b",
          - current_size_in_bytes: 0,
          - total: 2175,
          - total_time: "4.9m",
          - total_time_in_millis: 299631,
          - total_docs: 3358427,
          - total_size: "3.7gb",
          - total_size_in_bytes: 4019011835
          },
          - refresh:
          {
          - total: 11362,
          - total_time: "1.7m",
          - total_time_in_millis: 102257
          },
          - flush:
          {
          - total: 192,
          - total_time: "1.8m",
          - total_time_in_millis: 109720
          }
          }
          ],
        • 9:
          [

          {
          - routing:
          {
          - state: "STARTED",
          - primary: false,
          - node: "95gp_xKVRra472UxiDygiA",
          - relocating_node: null,
          - shard: 9,
          - index: "default2"
          },
          - state: "STARTED",
          - index:
          {
          - size: "350.8mb",
          - size_in_bytes: 367854076
          },
          - translog:
          {
          - id: 1347901772820,
          - operations: 111
          },
          - docs:
          {
          - num_docs: 305293,
          - max_doc: 333894,
          - deleted_docs: 28601
          },
          - merges:
          {
          - current: 0,
          - current_docs: 0,
          - current_size: "0b",
          - current_size_in_bytes: 0,
          - total: 2130,
          - total_time: "12.3m",
          - total_time_in_millis: 740824,
          - total_docs: 3148931,
          - total_size: "3.5gb",
          - total_size_in_bytes: 3773382780
          },
          - refresh:
          {
          - total: 11139,
          - total_time: "5.5m",
          - total_time_in_millis: 330083
          },
          - flush:
          {
          - total: 191,
          - total_time: "4.1m",
          - total_time_in_millis: 247113
          }
          },

          {
          - routing:
          {
          - state: "STARTED",
          - primary: true,
          - node: "tdhHCSEBSaKLmsaE_E4Gzw",
          - relocating_node: null,
          - shard: 9,
          - index: "default2"
          },
          - state: "STARTED",
          - index:
          {
          - size: "348.3mb",
          - size_in_bytes: 365297400
          },
          - translog:
          {
          - id: 1347901772820,
          - operations: 112
          },
          - docs:
          {
          - num_docs: 306588,
          - max_doc: 332081,
          - deleted_docs: 25493
          },
          - merges:
          {
          - current: 0,
          - current_docs: 0,
          - current_size: "0b",
          - current_size_in_bytes: 0,
          - total: 2191,
          - total_time: "5.4m",
          - total_time_in_millis: 329101,
          - total_docs: 3816151,
          - total_size: "4.2gb",
          - total_size_in_bytes: 4585189708
          },
          - refresh:
          {
          - total: 11474,
          - total_time: "1.6m",
          - total_time_in_millis: 100729
          },
          - flush:
          {
          - total: 191,
          - total_time: "2.2m",
          - total_time_in_millis: 135461
          }
          }
          ]
          }
          }
          }

}

--


(arta) #7

Thanks again, Kurt.
I looked at the _status output, and what I found is that the inconsistency disappears over time.
I wrote a simple Ruby script to process the _status?pretty=true output (see below).
I took a couple of samples and saw that many shards have different translog ids, but the num_docs seem OK.
What does the translog id actually mean?
What is your concern if replicas have different translog ids?

One more question for anybody who knows ES well:
Some documents seemed to take hours to appear in the replica (or maybe vice versa).
What causes this time lag?

----------------------------------------- here's my script --------------
# Reads a saved `_status?pretty=true` response (file name in ARGV[0]) and
# prints the shards whose copies differ in num_docs or translog id.
if (ARGV.length < 1)
  puts "usage: #{$PROGRAM_NAME} <status_file>"
  exit
end

@mode = :waiting_index
@info = {}

File.open(ARGV[0], 'r') {|f|
  f.each_line {|line|
    case @mode
    when :waiting_index
      # Index names are expected to look like "i123".
      if (line =~ /"(i[0-9]+)" : /)
        @index = $1
        @info[@index] = {}
        @mode = :waiting_shard_no
      end
    when :waiting_shard_no
      # A shard entry is its number followed by an array of copies.
      if (line =~ /"([0-9]+)" : \[/)
        @shard = $1
        @info[@index][@shard] = {}
        @mode = :handling_shard
        @subshard = 0
        @info[@index][@shard][@subshard] = {}
      end
    when :handling_shard
      if (line =~ /"primary" : (\w+),/)
        @info[@index][@shard][@subshard][:primary] = $1
      elsif (line =~ /"node" : "(\S+)",/)
        @info[@index][@shard][@subshard][:node] = $1
      elsif (line =~ /"id" : ([0-9]+)/)
        @info[@index][@shard][@subshard][:translog_id] = $1
      elsif (line =~ /"num_docs" : ([0-9]+)/)
        @info[@index][@shard][@subshard][:num_docs] = $1
      elsif (line =~ /^\s+\}, \{\s?$/)
        # Next copy of the same shard.
        @subshard += 1
        @info[@index][@shard][@subshard] = {}
      elsif (line =~ /^\s+\} \],\s?$/)
        # End of this shard; more shards of the index follow.
        @mode = :waiting_shard_no
      elsif (line =~ /^\s+\} \]\s?$/)
        # End of the last shard of this index.
        @mode = :waiting_index
      end
    end
  }
}

def dump_info(info)
  "translog=#{info[:translog_id]} num_docs=#{info[:num_docs]} node=#{info[:node]}" +
    (info[:primary] == "true" ? " primary" : " ")
end

def dump_diff(idx, shard, i, info, ref_i, ref_info)
  idxShard = "#{idx}-#{shard}"
  "#{idxShard} [#{ref_i}]: #{dump_info(ref_info)}\n" +
    " " * idxShard.length + " [#{i}]: #{dump_info(info)}"
end

# Sort string keys by length first so that "10" comes after "9".
def sort_keys(col)
  col.keys.sort {|a,b|
    if (a.class != String || a.length == b.length)
      a <=> b
    else
      a.length <=> b.length
    end
  }
end

sort_keys(@info).each {|idx| idx_info = @info[idx]
  sort_keys(idx_info).each {|shard| shard_info = idx_info[shard]
    translogs = []
    sort_keys(shard_info).each {|i| info = shard_info[i]
      if (translogs.empty?)
        # The first copy is the reference the others are compared against.
        translogs << { i => info }
      else
        if (info[:num_docs] != translogs[0].values[0][:num_docs])
          puts dump_diff(idx, shard, i, info, translogs[0].keys[0], translogs[0].values[0]) + " DIFFERENT COUNT"
        elsif (info[:translog_id] != translogs[0].values[0][:translog_id])
          puts dump_diff(idx, shard, i, info, translogs[0].keys[0], translogs[0].values[0])
        end
      end
    }
  }
}
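
For comparison, the same check can be sketched by parsing the JSON instead of matching lines. This is only an illustration, not part of the original script: the function name divergent_shards is made up, and it assumes you have already extracted the per-index "shards" object, shaped like the _status fragment quoted earlier in this thread (shard number mapped to an array of copies with routing and docs fields). A differing num_docs is a hint, not proof, of divergence while indexing is still running.

```ruby
require 'json'

# Return one entry per shard whose copies report different num_docs.
# `shards` maps a shard number to the array of its copies
# (routing.primary, routing.node, docs.num_docs).
def divergent_shards(shards)
  shards.each_with_object({}) do |(shard_no, copies), result|
    counts = copies.map do |copy|
      { node: copy['routing']['node'],
        primary: copy['routing']['primary'],
        num_docs: copy['docs']['num_docs'] }
    end
    # Keep the shard only if its copies disagree on num_docs.
    result[shard_no] = counts if counts.map { |c| c[:num_docs] }.uniq.length > 1
  end
end

# Sample input: the two copies of shard 9 from the dump in this thread.
sample = JSON.parse(<<~JSON)
  {
    "9": [
      { "routing": { "primary": false, "node": "95gp_xKVRra472UxiDygiA" },
        "docs": { "num_docs": 305293 } },
      { "routing": { "primary": true, "node": "tdhHCSEBSaKLmsaE_E4Gzw" },
        "docs": { "num_docs": 306588 } }
    ]
  }
JSON

divergent_shards(sample).each do |shard_no, copies|
  copies.each do |c|
    role = c[:primary] ? 'primary' : 'replica'
    puts "shard #{shard_no} #{role} node=#{c[:node]} num_docs=#{c[:num_docs]}"
  end
end
```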


(Kurt Harriger) #8

I assume that the translog id represents the position in the transaction log. If the replica does not have the same id, then I would assume it is still catching up to the primary, and one would expect that num_docs may differ as the replica is still replaying changes. However, if they are at the same position in the transaction log and num_docs is different, then the question is why, since I would expect the replica to have the same num_docs as the primary. I still haven't had any time to dig into the Elasticsearch source code, so these are just my assumptions; perhaps a committer could clarify.

--
Kurt Harriger
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)

--


(es_learner) #9

May I know which version of ES you are using? I'm on 0.19.2 and have been hitting the primary only. But lately, because increased traffic is causing high CPU spikes, I am planning to load-balance reads across my cluster of 3 servers with 5 replicas. Reading this thread gives me pause. Any new info would help me greatly.

Thanks.

curl localhost:9200/?version


(Shay Banon) #10

The translog id does not indicate "consistency" between shards; it's
internal to each shard. Which version are you using? Also, is there anything
interesting in the logs (failures)?

--


(arta) #11

In my case, it is 0.19.3.
I found that the discrepancy disappeared after a while.
In some cases it took days, but in most cases it was gone within an hour.
I don't see any related entries in the logs.

--


(Kurt Harriger) #12

Using version 0.19.8 here. I haven't looked much deeper into the issue since adding preference=_primary_first. Without _primary_first the hit count would change on nearly every query. Given that the same translog id does not indicate that the replica is caught up with the primary, it's possible that the issue might resolve itself given enough time. However, when we first encountered the hit counts alternating between values, the issue persisted for two days on test accounts, so I don't think time alone would have resolved it.

How does one determine if the shards are inconsistent or just behind? I don't know, but it would be nice if there was a way to get a more definitive answer to this question.

--
Kurt Harriger
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)

--


(Clinton Gormley) #13

> How does one determine if the shards are inconsistent or just behind?
> I don't know, but it would be nice if there was a way to get a more
> definitive answer to this question.

The shards should be neither inconsistent nor behind. The one exception
might be where you use 'async' indexing (replication=async).

You can use the cluster health API (e.g. with level=shards) to get a view
of your cluster.
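
As a rough sketch of that call in Ruby: the endpoint is _cluster/health with a level parameter, while the host, port and function names below are assumptions for illustration. Nothing is sent until cluster_health is actually called.

```ruby
require 'net/http'
require 'uri'
require 'json'

# Build the cluster health URL. Host and port are assumptions; adjust them
# for your own cluster.
def cluster_health_uri(base = 'http://localhost:9200', level = 'shards')
  uri = URI.join(base, '/_cluster/health')
  uri.query = URI.encode_www_form(level: level)
  uri
end

# Fetch and parse the health response (needs a reachable cluster).
def cluster_health(base = 'http://localhost:9200', level = 'shards')
  JSON.parse(Net::HTTP.get(cluster_health_uri(base, level)))
end

puts cluster_health_uri
```

Note that this reports the allocation state of each shard copy; it does not compare the documents the copies contain.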

clint

--


(Shay Banon) #14

One more thing: make sure not to confuse primary-replica inconsistency with "counts" being different. Let me explain, but first, note that 0.19.5 and later 0.19.7 fixed bugs that might result in inconsistencies (though only under quite extreme cases, i.e. we only managed to recreate it in a test that ran for 5 days with nodes constantly being killed).

Back to the "inconsistency" part: if you keep on indexing into an index, you will obviously see some "inconsistencies" between calls as data keeps being added to the index. Also, those changes only become visible when a shard is refreshed (every 1s by default), and each shard copy refreshes on its own, so two copies can briefly show different data.

Also, when executing a search sorted on _score (the default), for example, some docs will have the same _score, and their relative order is not consistent (unless you add an additional sort field). So when you execute the search once and then another time, it might hit other shard copies, and the order of docs with the same _score can be different there.
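
A minimal sketch of such an additional sort field, written in Ruby to build the request body. The field name doc_id is hypothetical; any field that holds a unique value per document would serve as the tie-breaker.

```ruby
require 'json'

# Build a search body sorted by _score first, then by a second field that
# breaks ties between docs with an equal _score.
# `tie_breaker_field` is a hypothetical field name.
def search_body_with_tie_breaker(query, tie_breaker_field)
  {
    query: query,
    sort: ['_score', { tie_breaker_field => 'asc' }]
  }
end

body = search_body_with_tie_breaker({ match_all: {} }, 'doc_id')
puts JSON.generate(body)
```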

Note that with the above problems, even though both shard copies have the same data, it might seem like they are giving different results.

How do you solve it? The simplest way, which I personally like, is to use the preference option with a dynamic value. If you do preference=[user_id] (for example), then for the same user id, the same shard copies will be hit.
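
A small Ruby sketch of building such a request URL. The host, the index name default2 (taken from the dump earlier in the thread) and the user id value are only placeholders, and the function name is made up. It is probably wise to avoid values that start with an underscore, since the built-in preferences such as _primary_first use that prefix.

```ruby
require 'uri'

# Build a search URL whose `preference` is a per-user string, so repeated
# searches by the same user go to the same shard copies.
def search_uri_for_user(base, index, user_id)
  uri = URI.join(base, "/#{index}/_search")
  uri.query = URI.encode_www_form(preference: user_id)
  uri
end

puts search_uri_for_user('http://localhost:9200', 'default2', 'user_42')
```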

--


(Filirom1) #15

I confirm the issue on ElasticSearch 0.19.11.

With the bulk API, I reindexed all my data. Now the indexing has finished,
but the total number of hits is different on the primary and the replica:

?preference=_primary_first
hits.total: 12209124

?preference=_replica_first
hits.total: 12209202
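
A sketch of how this comparison could be scripted in Ruby, assuming hits.total is a plain integer as in the output above. The function names, host and index are placeholders, and hits_total only contacts the cluster when it is called.

```ruby
require 'net/http'
require 'uri'
require 'json'

# Run the same search with a given preference and return hits.total.
# size=0 asks for the count only, without returning documents.
def hits_total(base, index, preference, query_body)
  uri = URI.join(base, "/#{index}/_search")
  uri.query = URI.encode_www_form(preference: preference, size: 0)
  response = Net::HTTP.post(uri, JSON.generate(query_body),
                            'Content-Type' => 'application/json')
  JSON.parse(response.body)['hits']['total']
end

# Describe the difference between two totals.
def describe_totals(primary_total, replica_total)
  diff = replica_total - primary_total
  diff.zero? ? 'totals match' : "totals differ by #{diff.abs}"
end

# Usage against a live cluster (not run here):
#   query   = { query: { match_all: {} } }
#   primary = hits_total('http://localhost:9200', 'myindex', '_primary_first', query)
#   replica = hits_total('http://localhost:9200', 'myindex', '_replica_first', query)
#   puts describe_totals(primary, replica)
puts describe_totals(12209124, 12209202)
```

Run it only after indexing has stopped and the index has been refreshed; otherwise a difference may just reflect data that is still arriving.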

I think this problem is the same as this one :

When I try to index the same bulk of documents (on an empty index),
sometimes I get an error and sometimes not.

I think this is the cause of the inconsistency between primary and replica.

Cheers
Romain

--


(Jörg Prante) #16

Hi Filirom1,

if you see mapping exceptions, something in your JSON data style is
inconsistent, see my comment

https://github.com/elasticsearch/elasticsearch/issues/2354#issuecomment-10453428

Jörg

--


(Filirom1) #17

Yes, I know, but I can't change what the users inject into ElasticSearch.

The point is that sometimes inconsistent JSON is accepted by ES.

2012/11/16 Jörg Prante joergprante@gmail.com

Hi Filirom1,

if you see mapping exceptions, something in your JSON data style is
inconsistent, see my comment

https://github.com/elasticsearch/elasticsearch/issues/2354#issuecomment-10453428

Jörg

--

--


(Tal Shemesh) #18

Hi,

We are facing the same issue with 0.90.11.
We have a shard whose primary is 1.9gb and whose replica is 1gb.
Did you manage to solve the problem?
If so, how can we fix it?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/86314b57-e292-491e-94ac-92a5a35b8344%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Adrien Grand) #19

Hi,

A difference in disk size doesn't mean that they don't have the same
content since one of the replicas might just have run a large merge that
saved disk space. Nevertheless, you can force shards to be re-replicated by
using the update settings API [1] to temporarily set the number of replicas
to 0 (this will deallocate replicas) and then back to the original value
(which will cause replicas to be bulk-copied from the primaries).

[1]
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-update-settings.html#indices-update-settings
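
The two steps can be sketched in Ruby as follows. The host, index name and function names are placeholders for illustration; the request is a PUT to the index's _settings endpoint with index.number_of_replicas in the body. Keep in mind that while the replica count is 0 the index has no redundancy.

```ruby
require 'net/http'
require 'uri'
require 'json'

# JSON body for the update settings API that sets the replica count.
def replicas_settings_body(count)
  JSON.generate({ index: { number_of_replicas: count } })
end

# PUT the new replica count to /<index>/_settings (needs a reachable cluster).
def set_replicas(base, index, count)
  uri = URI.join(base, "/#{index}/_settings")
  request = Net::HTTP::Put.new(uri, 'Content-Type' => 'application/json')
  request.body = replicas_settings_body(count)
  Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
end

# Usage (not run here): drop the replicas, then restore the original count
# so that they are copied again from the primaries.
#   set_replicas('http://localhost:9200', 'myindex', 0)
#   set_replicas('http://localhost:9200', 'myindex', 1)
puts replicas_settings_body(0)
```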

--
Adrien Grand



(system) #20