Elasticsearch index mapping parameter ignore_malformed setting

We are using Elasticsearch 2.1.1 with a 3-node cluster. Recently we have seen Elasticsearch constantly initializing many shards across multiple indexes and getting stuck in that state. The cluster health stays red, with no obvious way to recover other than deleting all the indexes and restarting the Elasticsearch cluster.

From the logs and from searching online, we found that this is probably caused by the data types of certain fields changing dynamically, which leads to a mapping discrepancy between primary and replica shards. Here are some of the exceptions reported in the ES logs:

MergeMappingException[Merge failed with failures {[mapper [cpuPercent] of different type, current_type [long], merged_type [double]]}]

IllegalArgumentException, occurring in the 'mona' mapping:
java.lang.IllegalArgumentException: Mapper for [response] conflicts with existing mapping in other types [Can't merge a non object mapping [response.headers] with an object mapping [response.headers]]
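To illustrate how two otherwise harmless documents can produce both kinds of conflict, here is a minimal sketch. It only approximates dynamic mapping (it ignores arrays, dates, nulls and numeric coercion), and the two sample documents are made up for illustration; only the field names come from the log messages above.

```python
import json


def dynamic_type(value):
    """Rough guess at the type dynamic mapping would assign to a JSON value."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, dict):
        return "object"
    return "string"


def conflicts(first, second, prefix=""):
    """List (field, type_in_first, type_in_second) for fields whose types differ."""
    found = []
    for key in sorted(first):
        if key not in second:
            continue
        path = prefix + key
        a, b = dynamic_type(first[key]), dynamic_type(second[key])
        if a == "object" and b == "object":
            # Both sides are objects: compare their sub-fields instead.
            found.extend(conflicts(first[key], second[key], path + "."))
        elif a != b:
            found.append((path, a, b))
    return found


# Hypothetical documents: the first one indexed fixes the mapping,
# the second one then disagrees with it.
doc_a = json.loads('{"cpuPercent": 12, "response": {"headers": "none"}}')
doc_b = json.loads('{"cpuPercent": 12.5, "response": {"headers": {"host": "elk"}}}')

for field, old, new in conflicts(doc_a, doc_b):
    print("{}: mapped as {}, later seen as {}".format(field, old, new))
```

The first conflict corresponds to the long/double message and the second to the object/non-object message.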

Subsequently we found that ES allows ignore_malformed to be set to true at the index level for all mappings. With that setting, ES ignores the 'bad' values and still accepts the other 'normal' fields of a document, so the system should not fall into the red state where it is unable to allocate shards.

However, we are only able to set this at the index level, after the daily index has been created (via Logstash). This is the REST call we use to set it:

curl -XPUT 'http://elk:9200/logstash-2016.06.02/_settings' -d '{
  "index" : {
    "mapping.ignore_malformed" : true
  }
}'

Since a new Logstash index is created every day, this approach is not practical: the setting would have to be reapplied to each new index.

We tried modifying elasticsearch.yml as follows:

......
#################################### Index ####################################

# You can set a number of options (such as shard/replica options, mapping
# or analyzer definitions, translog settings, ...) for indices globally,
# in this file.
#
# Note, that it makes more sense to configure index settings specifically for
# a certain index, either when creating it or by using the index templates API.
#
# See http://elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules.html and
# http://elasticsearch.org/guide/en/elasticsearch/reference/current/indices-create-index.html
# for more information.

index.mapping.ignore_malformed: true

# Set the number of shards (splits) of an index (5 by default):

index.number_of_shards: 5

# Set the number of replicas (additional copies) of an index (1 by default):

index.number_of_replicas: 1

......

But this does not seem to take effect. Querying the settings of a newly created index over REST shows only the following:

{
  "logstash-2016.06.02": {
    "settings": {
      "index": {
        "creation_date": "1464874876688",
        "refresh_interval": "5s",
        "number_of_shards": "5",
        "number_of_replicas": "1",
        "uuid": "7nXLQdWQQc6Bue-XhODcMA",
        "version": {
          "created": "2010199"
        }
      }
    }
  }
}

So the question is: is there a global way to set ignore_malformed? And if not, how can we solve this situation?

Check out index templates: Index Templates | Elasticsearch Guide [2.3] | Elastic

Thanks for the suggestion. We used the ES REST API to add a template for 'logstash' by copying the existing 'logstash' template and adding the following to its 'settings':

"index": {
  "mapping": {
    "ignore_malformed": "true"
  }
}

All logstash-* indexes generated since then have been created with the new template.
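For reference, the copy-and-modify step can be sketched roughly as below. This is only an illustration under a few assumptions: the helper names are made up, the existing template is assumed to be the inner object returned by GET /_template/logstash, and its settings are assumed to use the nested "index" form shown above.

```python
import copy
import json
import urllib.request


def with_ignore_malformed(template):
    """Return a copy of a template body with index.mapping.ignore_malformed enabled."""
    updated = copy.deepcopy(template)  # leave the caller's dict untouched
    settings = updated.setdefault("settings", {})
    index = settings.setdefault("index", {})
    mapping = index.setdefault("mapping", {})
    mapping["ignore_malformed"] = True
    return updated


def put_template(base_url, name, template):
    """PUT the template body to <base_url>/_template/<name> and return the parsed reply."""
    request = urllib.request.Request(
        "{}/_template/{}".format(base_url, name),
        data=json.dumps(template).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical usage (host taken from the curl call earlier in this thread):
#   existing = {"template": "logstash-*",
#               "settings": {"index": {"refresh_interval": "5s"}},
#               "mappings": {}}
#   put_template("http://elk:9200", "logstash", with_ignore_malformed(existing))
```

Working on a deep copy keeps the fetched template available for comparison if the PUT has to be rolled back.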

However, I think we still face two issues:

  1. I have to call the ES REST API to add the new template. There is always a possibility that some 'bad' data sneaks in before the new template is in place, leaving the system in a bad state.
  2. I'm not sure exactly how Logstash establishes its index template. Will it overwrite the template I put in when it sends data to ES? (My guess is that it will not, since all subsequent index settings contain the new changes, but I'd like to confirm and understand this better.)

In summary, is there a 'static' way to load the template, so that I don't have to worry about issues 1 and 2?
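In the meantime, one way to at least detect issue 2 is to fetch the template periodically and check that the setting is still there. Below is a minimal sketch; the function names are hypothetical, and it assumes the response of GET /_template/logstash is keyed by the template name, with settings in either nested or dotted form.

```python
def flatten(settings, prefix=""):
    """Flatten nested settings into dotted keys, e.g. index.mapping.ignore_malformed."""
    flat = {}
    for key, value in settings.items():
        path = prefix + key
        if isinstance(value, dict):
            flat.update(flatten(value, path + "."))
        else:
            flat[path] = value
    return flat


def template_ignores_malformed(template_response, name="logstash"):
    """True if the named template still enables index.mapping.ignore_malformed."""
    settings = template_response.get(name, {}).get("settings", {})
    value = flatten(settings).get("index.mapping.ignore_malformed")
    # The value may come back as the string "true" or as a boolean.
    return str(value).lower() == "true"


# Hypothetical response body, shaped like the template settings shown above.
sample = {
    "logstash": {
        "template": "logstash-*",
        "settings": {"index": {"mapping": {"ignore_malformed": "true"}}},
    }
}
print(template_ignores_malformed(sample))
```

If this check ever returns False after a Logstash restart, that would suggest the template was replaced.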