Excess shards

Bug or bad config?

  1. Cluster configured on 2 nodes with:
    index.number_of_shards: 5
    index.number_of_replicas: 2

  2. OK, Logstash created an index and worked fine (the index was purged later)
    logstash-2015.06.16 0 r STARTED 0 144b 10.1.1.9 elk
    logstash-2015.06.16 0 p STARTED 0 108b 10.1.1.8 graylog2-server
    logstash-2015.06.16 1 r STARTED 0 144b 10.1.1.9 elk
    logstash-2015.06.16 1 p STARTED 0 108b 10.1.1.8 graylog2-server
    .kibana 0 r STARTED 4 17.1kb 10.1.1.9 elk
    .kibana 0 p STARTED 4 17.1kb 10.1.1.8 graylog2-server

  3. The next day I created node 3, tested it and stopped it. No new Logstash indices were created.

  4. A few days later the cluster was restarted on the original 2 nodes and had status: green

  5. Logstash created an index and the cluster went to "status":"yellow"
    5.1 http://192.168.120.84:9200/_cluster/health
    {"cluster_name":"pagent2","status":"yellow","timed_out":false,"number_of_nodes":3,"number_of_data_nodes":2,"active_primary_shards":9,"active_shards":16,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":4,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0}

5.2 http://192.168.120.84:9200/_cat/shards

logstash-2015.06.16 0 r STARTED     0   144b 10.1.1.9 elk             
logstash-2015.06.16 0 p STARTED     0   108b 10.1.1.8 graylog2-server 
logstash-2015.06.16 1 r STARTED     0   144b 10.1.1.9 elk             
logstash-2015.06.16 1 p STARTED     0   108b 10.1.1.8 graylog2-server 
logstash-2015.06.22 0 r STARTED     9 36.7kb 10.1.1.9 elk             
logstash-2015.06.22 0 p STARTED     9 29.1kb 10.1.1.8 graylog2-server 
logstash-2015.06.22 0 r UNASSIGNED                                    
logstash-2015.06.22 1 r STARTED    12 38.8kb 10.1.1.9 elk             
logstash-2015.06.22 1 p STARTED    12 38.9kb 10.1.1.8 graylog2-server 
logstash-2015.06.22 1 r UNASSIGNED                                    
.kibana             0 r STARTED     4 17.1kb 10.1.1.9 elk             
.kibana             0 p STARTED     4 17.1kb 10.1.1.8 graylog2-server 
... skipped
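
For anyone wanting to pick the unassigned rows out of a long _cat/shards listing, here is a minimal Python sketch. It only parses text that has already been fetched; the sample rows are copied from the output above, and the function name `unassigned_shards` is made up for illustration.

```python
# Rows copied from the _cat/shards output in step 5.2.
CAT_SHARDS = """\
logstash-2015.06.22 0 r STARTED     9 36.7kb 10.1.1.9 elk
logstash-2015.06.22 0 p STARTED     9 29.1kb 10.1.1.8 graylog2-server
logstash-2015.06.22 0 r UNASSIGNED
logstash-2015.06.22 1 r STARTED    12 38.8kb 10.1.1.9 elk
logstash-2015.06.22 1 p STARTED    12 38.9kb 10.1.1.8 graylog2-server
logstash-2015.06.22 1 r UNASSIGNED
"""


def unassigned_shards(cat_output):
    """Return (index, shard number, p/r flag) for every UNASSIGNED row."""
    found = []
    for line in cat_output.splitlines():
        fields = line.split()
        # Columns: index, shard, p/r, state; docs/size/ip/node are absent
        # for unassigned rows, so only the first four are relied on.
        if len(fields) >= 4 and fields[3] == "UNASSIGNED":
            found.append((fields[0], int(fields[1]), fields[2]))
    return found


print(unassigned_shards(CAT_SHARDS))
```

For the rows above it reports one unassigned replica for shard 0 and one for shard 1 of logstash-2015.06.22.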

6. As we can see, Elasticsearch lost a node but still created excess UNASSIGNED shards

  • We can't fix it with "reroute" without restoring node 3.
  • We can't delete an UNASSIGNED shard.
  • The only thing that works is something like curl -XPUT 'localhost:9200/logstash-2015.06.22/_settings' -d '{"index" : {"number_of_replicas" : 1 } }'
    {"acknowledged":true}

and got shards that are not empty:
logstash-2015.06.22 0 r STARTED 9 36.7kb 10.1.1.9 elk
logstash-2015.06.22 0 p STARTED 9 29.1kb 10.1.1.8 graylog2-server
logstash-2015.06.22 1 r STARTED 12 38.8kb 10.1.1.9 elk
logstash-2015.06.22 1 p STARTED 12 38.9kb 10.1.1.8 graylog2-server

Cluster - green

BUT now Kibana does not find any data in ANY index, e.g. logstash-2015.06.23.

In the logs:

node 1

[2015-06-23 04:06:00,566][INFO ][cluster.metadata         ] [graylog2-server] updating number_of_replicas to [1] for indices [logstash-2015.06.22]

node 2

    [2015-06-23 03:45:56,494][DEBUG][action.search.type       ] [elk] [logstash-2015.06.22][0], node[Fv-QuHW2TsKMJSXvtRxNuQ], [R], s[STARTED]: Failed to execute [org.elasticse
java.lang.ClassCastException: org.elasticsearch.index.fielddata.plain.PagedBytesIndexFieldData cannot be cast to org.elasticsearch.index.fielddata.IndexNumericFieldData
        at org.elasticsearch.search.aggregations.support.AggregationContext.numericField(AggregationContext.java:160)
        at org.elasticsearch.search.aggregations.support.AggregationContext.valuesSource(AggregationContext.java:137)
        at org.elasticsearch.search.aggregations.support.ValuesSourceAggregatorFactory.create(ValuesSourceAggregatorFactory.java:53)
        at org.elasticsearch.search.aggregations.AggregatorFactories.createAndRegisterContextAware(AggregatorFactories.java:53)
        at org.elasticsearch.search.aggregations.AggregatorFactories.createTopLevelAggregators(AggregatorFactories.java:157)
        at org.elasticsearch.search.aggregations.AggregationPhase.preProcess(AggregationPhase.java:79)
        at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:100)
        at org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:289)
        at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:300)
        at org.elasticsearch.search.action.SearchServiceTransportAction$5.call(SearchServiceTransportAction.java:231)
        at org.elasticsearch.search.action.SearchServiceTransportAction$5.call(SearchServiceTransportAction.java:228)
        at org.elasticsearch.search.action.SearchServiceTransportAction$23.run(SearchServiceTransportAction.java:559)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

7. OK. curl -XDELETE 'http://localhost:9200/logstash-2015.06.23/'
{"acknowledged":true}

8. Restarted everything - shards OK
logstash-2015.06.22 0 r STARTED 9 29.1kb 10.1.1.8 graylog2-server
logstash-2015.06.22 0 p STARTED 9 36.7kb 10.1.1.9 elk
logstash-2015.06.22 1 r STARTED 12 38.9kb 10.1.1.8 graylog2-server
logstash-2015.06.22 1 p STARTED 12 38.8kb 10.1.1.9 elk

Kibana still does not find any data -\

9. Refreshed the site

  • logstash-2015.06.23 was recreated with 10 OK and 5(!) UNASSIGNED shards (from number_of_shards = 5 in the config?)
  • Kibana: Courier Fetch: 9 of 12 shards failed.
  • Set "number_of_replicas" : 1 again. Cluster - green

10. Now Kibana shows strange results in Discover:

I expect new UNASSIGNED shards in the future, and new "Failed to execute [org.elasticse" and
"org.elasticsearch.transport.RemoteTransportException" entries in the logs.

How can I fix it?

P.S. A small bug: we need to run "unset http_proxy", because otherwise curl ignores any "host_name:9200".

This isn't a bug :smile:

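Elasticsearch will not put more than one copy of the same shard on a node: two replicas of one shard cannot share a node, just as a primary and its replica cannot. With 2 data nodes and 2 replicas, each shard has 3 copies but only 2 places to put them, which is why those shards are unassigned. However, you can remove the unassigned shards by setting replicas to 1, as you have done.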
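
To make the arithmetic concrete, here is a rough Python sketch of that rule. It is a simplification that assumes default allocation settings and nothing else blocking allocation (disk watermarks, allocation filters and so on); `expected_allocation` is a made-up helper name, not an Elasticsearch API.

```python
def expected_allocation(number_of_shards, number_of_replicas, data_nodes):
    """Estimate (total, assigned, unassigned) shard copies for one index.

    Assumes the only constraint is "at most one copy of a given shard
    per data node".
    """
    copies_per_shard = 1 + number_of_replicas        # primary + replicas
    placeable = min(copies_per_shard, data_nodes)    # one copy per node at most
    total = number_of_shards * copies_per_shard
    assigned = number_of_shards * placeable
    return total, assigned, total - assigned


# Settings from the question: 5 shards, 2 replicas, 2 data nodes.
print(expected_allocation(5, 2, 2))  # (15, 10, 5)
# Same index after dropping to 1 replica.
print(expected_allocation(5, 1, 2))  # (10, 10, 0)
```

The first result matches the "10 ok and 5 UNASSIGNED" you saw for logstash-2015.06.23, and the second is why the cluster goes green once replicas are set to 1.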

How are you checking this?

Also, why are you running with 2 replicas? It seems a bit excessive.

But Elasticsearch automatically created 3 copies of each shard for the new index and assigned only 2 of them. The question was: why did it do that?
Additionally, health shows "number_of_nodes":3,"number_of_data_nodes":2, but I have only 2 working nodes. It seems 1 node is counted as master and data at the same time.

I tried searching the saved indices or * and got nothing.

To test the scenario "1 node crashed - data still readable". It failed, because the state changed to red.

ES will create as many shards and replicas as you tell it to.
So you'd need to check the creation curl, the mapping and/or template, as well as the ES config.

To check what indices you have, use _cat/indices.
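
If the replica count turns out to come from the config or a template, one option is an index template that pins it for new logstash-* indices. The Python sketch below only builds the request and does not send it. It assumes the legacy _template format with a "template" pattern key (newer Elasticsearch versions use different keys), and the template name `logstash_replicas`, the `order` value and the base URL are placeholders, so check them against your version's documentation before use.

```python
import json
import urllib.request


def replica_template_request(base_url, template_name, pattern, replicas):
    """Build (but do not send) a PUT request for an index template."""
    body = {
        "template": pattern,   # index name pattern (legacy template format)
        "order": 1,            # assumption: higher order wins over a default template
        "settings": {"number_of_replicas": replicas},
    }
    return urllib.request.Request(
        url=f"{base_url}/_template/{template_name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = replica_template_request(
    "http://localhost:9200", "logstash_replicas", "logstash-*", 1
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# urllib.request.urlopen(req) would actually send it; left out on purpose.
```

A template only affects indices created after it is installed; existing indices still need the _settings call shown in the question.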

That is true, but my point is that ES should take the lack of nodes into account and not create UNASSIGNED shards.