N of N shards failed on Kibana after upgrading from 6.3.2 to 6.4.2


(Li Cui) #1

Hello there,

We recently upgraded our ELK stack from 6.3.2 to 6.4.2, and many of the dashboards started showing errors like "N of M shards failed".
I also saw the following in the Elasticsearch log:

==============
[2018-11-03T12:05:15,657][DEBUG][o.e.a.s.TransportSearchAction] [Elastinode03] [metricbeat-6.3.2-2018.11.03][2], node[9BK8ywvQRwyKT90OXGddww], [P], s[STARTED], a[id=l5wMagQhQnSD7-rrosofPQ]
: Failed to execute [SearchRequest{searchType=QUERY_THEN_FETCH, indices=[metricbeat-6.4.1-2018.10.01, metricbeat-6.4.1-2018.10.02, metricbeat-6.4.1-2018.10.03, metricbeat-6.3.2-2018.09.20,
metricbeat-6.3.2-2018.10.09, metricbeat-6.4.1-2018.09.19, metricbeat-6.3.2-2018.10.08, metricbeat-6.4.2-2018.10.29, metricbeat-6.3.2-2018.10.07, metricbeat-6.3.2-2018.10.06,
metricbeat-6.3.2-2018.10.05... (almost all metricbeat 6.3.2 indices)...
indicesOptions=IndicesOptions[ignore_unavailable=true, allow_no_indices=true, expand_wildcards_open=true,
expand_wildcards_closed=false, allow_aliases_to_multiple_indices=true, forbid_closed_indices=true, ignore_aliases=false], types=, routing='null', preference='1541264530052', requestCache=null,
scroll=null, maxConcurrentShardRequests=15, batchedReduceSize=512, preFilterShardSize=64, allowPartialSearchResults=true, source={ many fields...)...
"aggregations":{"1":{"avg":{"field":"system.cpu.user.pct"}}}}}}}}}] lastShard [true]
org.elasticsearch.transport.RemoteTransportException: [hlsoelse1a-02][10.100.35.177:9300][indices:data/read/search[phase/query]]
Caused by: java.lang.IllegalArgumentException: Fielddata is disabled on text fields by default. Set fielddata=true on [beat.name] in order to load fielddata in memory by uninverting the
inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.

Is there anything we should do to fix this?

Thanks a lot in advance

Li


(Li Cui) #2

This is the cluster stats output from the same Elasticsearch node:

curl -X GET "https://elasticnode3:9200/_cluster/stats?human&pretty" -k
{
"_nodes" : {
"total" : 3,
"successful" : 3,
"failed" : 0
},
"cluster_name" : "hlsmtc",
"timestamp" : 1541266150537,
"status" : "green",
"indices" : {
"count" : 707,
"shards" : {
"total" : 4311,
"primaries" : 2155,
"replication" : 1.0004640371229698,
"index" : {
"shards" : {
"min" : 2,
"max" : 10,
"avg" : 6.097595473833097
},
"primaries" : {
"min" : 1,
"max" : 5,
"avg" : 3.048090523338048
},
"replication" : {
"min" : 1.0,
"max" : 2.0,
"avg" : 1.0014144271570014
}
}
},
"docs" : {
"count" : 1706884793,
"deleted" : 1118462
},
"store" : {
"size" : "636.9gb",
"size_in_bytes" : 683960705355
},
"fielddata" : {
"memory_size" : "6.3mb",
"memory_size_in_bytes" : 6641160,
"evictions" : 0
},
"query_cache" : {
"memory_size" : "328.7mb",
"memory_size_in_bytes" : 344737278,
"total_count" : 12051049,
"hit_count" : 189111,
"miss_count" : 11861938,
"cache_size" : 3855,
"cache_count" : 6781,
"evictions" : 2926
},
"completion" : {
"size" : "0b",
"size_in_bytes" : 0
},
"segments" : {
"count" : 30826,
"memory" : "2.2gb",
"memory_in_bytes" : 2422690768,
"terms_memory" : "1.6gb",
"terms_memory_in_bytes" : 1801506514,
"stored_fields_memory" : "352.5mb",
"stored_fields_memory_in_bytes" : 369670176,
"term_vectors_memory" : "0b",
"term_vectors_memory_in_bytes" : 0,
"norms_memory" : "29.4mb",
"norms_memory_in_bytes" : 30908480,
"points_memory" : "103.5mb",
"points_memory_in_bytes" : 108629958,
"doc_values_memory" : "106.7mb",
"doc_values_memory_in_bytes" : 111975640,
"index_writer_memory" : "95.9mb",
"index_writer_memory_in_bytes" : 100587218,
"version_map_memory" : "21.1mb",
"version_map_memory_in_bytes" : 22133189,
"fixed_bit_set" : "1mb",
"fixed_bit_set_memory_in_bytes" : 1085256,
"max_unsafe_auto_id_timestamp" : 1541231846997,
"file_sizes" : { }
}
},
"nodes" : {
"count" : {
"total" : 3,
"data" : 3,
"coordinating_only" : 0,
"master" : 3,
"ingest" : 3
},
"versions" : [
"6.4.2"
],
"os" : {
"available_processors" : 12,
"allocated_processors" : 12,
"names" : [
{
"name" : "Linux",
"count" : 3
}
],
"mem" : {
"total" : "46.5gb",
"total_in_bytes" : 49968709632,
"free" : "508.5mb",
"free_in_bytes" : 533209088,
"used" : "46gb",
"used_in_bytes" : 49435500544,
"free_percent" : 1,
"used_percent" : 99
}
},
"process" : {
"cpu" : {
"percent" : 48
},
"open_file_descriptors" : {
"min" : 9784,
"max" : 25861,
"avg" : 19880
}
},
"jvm" : {
"max_uptime" : "2.7d",
"max_uptime_in_millis" : 237214523,
"versions" : [
{
"version" : "1.8.0_172",
"vm_name" : "Java HotSpot(TM) 64-Bit Server VM",
"vm_version" : "25.172-b11",
"vm_vendor" : "Oracle Corporation",
"count" : 1
},
{
"version" : "1.8.0_191",
"vm_name" : "OpenJDK 64-Bit Server VM",
"vm_version" : "25.191-b12",
"vm_vendor" : "Oracle Corporation",
"count" : 2
}
],
"mem" : {
"heap_used" : "12.8gb",
"heap_used_in_bytes" : 13752094352,
"heap_max" : "23.9gb",
"heap_max_in_bytes" : 25665208320
},
"threads" : 463
},
"fs" : {
"total" : "1.4tb",
"total_in_bytes" : 1610455449600,
"free" : "843.4gb",
"free_in_bytes" : 905666670592,
"available" : "843.4gb",
"available_in_bytes" : 905666670592
},
"plugins" : [
{
"name" : "ingest-geoip",
"version" : "6.4.2",
"elasticsearch_version" : "6.4.2",
"java_version" : "1.8",
"description" : "Ingest processor that uses looksup geo data based on ip adresses using the Maxmind geo database",
"classname" : "org.elasticsearch.ingest.geoip.IngestGeoIpPlugin",
"extended_plugins" : [ ],
"has_native_controller" : false
},
{
"name" : "discovery-ec2",
"version" : "6.4.2",
"elasticsearch_version" : "6.4.2",
"java_version" : "1.8",
"description" : "The EC2 discovery plugin allows to use AWS API for the unicast discovery mechanism.",
"classname" : "org.elasticsearch.discovery.ec2.Ec2DiscoveryPlugin",
"extended_plugins" : [ ],
"has_native_controller" : false
},
{
"name" : "repository-s3",
"version" : "6.4.2",
"elasticsearch_version" : "6.4.2",
"java_version" : "1.8",
"description" : "The S3 repository plugin adds S3 repositories",
"classname" : "org.elasticsearch.repositories.s3.S3RepositoryPlugin",
"extended_plugins" : [ ],
"has_native_controller" : false
}
],
"network_types" : {
"transport_types" : {
"security4" : 3
},
"http_types" : {
"security4" : 3
}
}
}
}

=============
I ran the following query and got an empty result, { }. What does this mean?
curl -XGET 'https://elasticnode3:9200/_template/metricbeat?pretty=true' -k
{ }

This is very important. After the upgrade, almost all of our dashboards stopped working. They are all default dashboards from 6.3.2.

Please help. Thanks

Li


(Li Cui) #3

curl -X PUT "https://elasticnode03:9200/metricbeat-*/_mapping/_doc" -H 'Content-Type: application/json' -d'
{
"properties": {
"beat.name": {
"type": "text",
"fielddata": true
}
}
}
'

I ran the command above on all Elasticsearch nodes and got:

{"error":{"root_cause":[{"type":"remote_transport_exception","reason":"[elasticnode01][xx.xxx.xx.xxx:9300][indices:admin/mapping/put]"}],"type":"illegal_argument_exception","reason":"Rejecting mapping update to [metricbeat-6.4.2-2018.10.31] as the final mapping would have more than 1 type: [_doc, doc]"},"status":400}

We still have the same errors.

Similar errors occur on all index patterns (filebeat-*, packetbeat-*, ...).
All we did was upgrade from 6.3.2 to 6.4.2. Nothing else was changed, but after the upgrade almost all dashboards started showing various errors.

Please help and thanks

Li


(Li Cui) #4

Any updates on this please?


(Abdon Pijpelink) #5

I don't think enabling fielddata is the solution here. The bigger issue is that the beat.name field has been mapped as a text field rather than a keyword field in your indexes. This suggests that the Metricbeat index template has not been loaded.
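
One way to confirm how beat.name is currently mapped is the get field mapping API. This is only a sketch: the ES_URL variable and the function name are my own, the node URL is taken from the earlier posts, and -k skips TLS verification as in those posts. Add credentials (for example with -u) if your cluster requires them.

```shell
# Assumption: ES_URL points at any node of the cluster (URL taken from the posts above).
ES_URL="${ES_URL:-https://elasticnode3:9200}"

# Print the mapping of beat.name for every metricbeat index.
# A "type" of "text" means the index was created without the Metricbeat template;
# with the template loaded it should be "keyword".
show_beat_name_mapping() {
  curl -sk -X GET "$ES_URL/metricbeat-*/_mapping/field/beat.name?pretty"
}

# Usage: show_beat_name_mapping
```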

Does your Metricbeat have setup.template.enabled set to false? If so, you need to manually load the index template:

metricbeat setup --template
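
To check whether the template is present afterwards, a wildcard lookup may be more reliable than asking for a template named exactly "metricbeat": by default Metricbeat 6.x appends its version to the template name (for example metricbeat-6.4.2), so an exact lookup for "metricbeat" can return { } even when a versioned template exists. A minimal sketch, where ES_URL and the function name are my own assumptions:

```shell
# Assumption: ES_URL points at any node of the cluster (URL taken from the posts above).
ES_URL="${ES_URL:-https://elasticnode3:9200}"

# List every template whose name starts with "metricbeat-" and show only
# the index patterns it applies to (filter_path trims the rest of the response).
list_metricbeat_templates() {
  curl -sk -X GET "$ES_URL/_template/metricbeat-*?pretty&filter_path=*.index_patterns"
}

# Usage: list_metricbeat_templates
```

An empty response here would suggest that no Metricbeat template has been loaded at all.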

Note that this will only fix future indexes. Any existing metricbeat indexes would have to be reindexed.
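
For the existing indexes, the reindex API is one option. The sketch below copies a single index into a new one; the function name, ES_URL, and the destination index name are hypothetical. It assumes the destination name matches the index pattern of the template you loaded (for a 6.4.2 Metricbeat that is normally metricbeat-6.4.2-*), so that the new index picks up the keyword mapping for beat.name.

```shell
# Assumption: ES_URL points at any node of the cluster (URL taken from the posts above).
ES_URL="${ES_URL:-https://elasticnode3:9200}"

# reindex_index SRC DEST: copy all documents from index SRC into index DEST.
# DEST is created on the fly and gets its mappings from whichever template matches its name.
reindex_index() {
  curl -sk -X POST "$ES_URL/_reindex?pretty" \
    -H 'Content-Type: application/json' \
    -d "{\"source\":{\"index\":\"$1\"},\"dest\":{\"index\":\"$2\"}}"
}

# Usage (destination name is hypothetical):
# reindex_index metricbeat-6.3.2-2018.11.03 metricbeat-6.4.2-2018.11.03-reindexed
```

Until the old index is deleted, a metricbeat-* index pattern in Kibana matches both copies, so verify the new index first and then remove the old one.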

