Sporadic index corruption... Fielddata is disabled on text fields by default

jasona · February 27, 2017, 6:42pm

After the AWS hardware failure, we've been encountering an error that we had never seen before on our elastic.co cloud cluster.

Reference: https://discuss.elastic.co/t/problem-cluster-is-under-maintenance-recovery-for-the-last-hour/76090

I can blow away our mapping and template and rebuild my indexes from scratch, but then I appear to be encountering sporadic index corruption, even with the dual-data-center capability now turned on.

The issue manifests itself as follows. This is on the same cluster (cluster ID "a02c49") as in the forum topic referenced above.

The query:

curl -XPOST 'https://user:pw@host.us-west-1.aws.found.io:port/silver_jobs-*/_search' -d '{"query":{"bool":{"should":[{"term":{"consolidated_status":"complete"}},{"term":{"consolidated_status":"passed"}},{"term":{"consolidated_status":"failed"}},{"term":{"consolidated_status":"errored"}}],"filter":[{"range":{"at0_creation_time":{"gte":1487592000000,"lte":1488218400000,"format":"epoch_millis"}}}],"minimum_number_should_match":1}},"aggs":{"byDate":{"date_histogram":{"field":"at0_creation_time","interval":"6h"},"aggs":{"byStatus":{"terms":{"field":"consolidated_status"}}}}},"size":0,"sort":{"at0_creation_time":{"order":"desc"}}}'

The output:

{
"took": 1208,
"timed_out": false,
"_shards": {
"total": 23,
"successful": 13,
"failed": 10,
"failures": [
{
"shard": 0,
"index": "silver_jobs-2017-02-26",
"node": "jkQeLJ8VQ2WDFfSBdZpkew",
"reason": {
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [consolidated_status] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
}
}
]
},
"hits": {
"total": 8049211,
"max_score": 0,
"hits": []
},
"aggregations": { ...
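
Since only 10 of the 23 shards failed, it can help to see which indices the failures come from. Below is a minimal Python sketch, assuming the search response has been parsed into a dict with the `_shards.failures` layout shown above. The function name and the sample data are hypothetical, and the sample lists only the one failure visible in the output.

```python
from collections import defaultdict


def failures_by_index(response):
    """Group shard failures from a search response by index name.

    Returns {index: {shard: failure_type}} for every failed shard listed.
    """
    grouped = defaultdict(dict)
    for failure in response.get("_shards", {}).get("failures", []):
        reason = failure.get("reason", {})
        grouped[failure["index"]][failure["shard"]] = reason.get("type", "unknown")
    return dict(grouped)


# Hypothetical sample shaped like the output above.
sample = {
    "_shards": {
        "total": 23,
        "successful": 13,
        "failed": 10,
        "failures": [
            {
                "shard": 0,
                "index": "silver_jobs-2017-02-26",
                "node": "jkQeLJ8VQ2WDFfSBdZpkew",
                "reason": {"type": "illegal_argument_exception"},
            }
        ],
    }
}

for index, shards in sorted(failures_by_index(sample).items()):
    print(index, shards)
```

If the failures cluster on particular daily indices, that points at the mapping of those indices rather than at the data itself.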

The reason this looks like sporadic corruption to me is that the field that is being complained about, consolidated_status, is a keyword field as defined in the mapping:

           "consolidated_status": {
               "type": "keyword",
               "copy_to": "ft"
           },

The ft field is analyzed, but not consolidated_status itself.
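
One way to test that assumption is to compare the field's type in every index the wildcard matches, since each daily index carries its own copy of the mapping. The Python sketch below assumes the 5.x `GET silver_jobs-*/_mapping` response shape (index, then `mappings`, then a document type, then `properties`). The type name `job` and the sample mappings are made up for illustration.

```python
def field_types(mapping_response, field):
    """Return {index: sorted list of types declared for `field`}.

    Assumes the 5.x layout:
    {index: {"mappings": {doc_type: {"properties": {...}}}}}.
    An index that does not map the field at all gets an empty list.
    """
    result = {}
    for index, body in mapping_response.items():
        types = set()
        for doc_type in body.get("mappings", {}).values():
            spec = doc_type.get("properties", {}).get(field)
            if spec is not None:
                types.add(spec.get("type", "object"))
        result[index] = sorted(types)
    return result


# Hypothetical response: one index follows the template, one does not.
sample = {
    "silver_jobs-2017-02-25": {"mappings": {"job": {"properties": {
        "consolidated_status": {"type": "keyword", "copy_to": "ft"},
    }}}},
    "silver_jobs-2017-02-26": {"mappings": {"job": {"properties": {
        "consolidated_status": {
            "type": "text",
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
        },
    }}}},
}

types = field_types(sample, "consolidated_status")
mismatched = {index: t for index, t in types.items() if t != ["keyword"]}
print(mismatched)
```

An index where the field shows up as `text` with a `keyword` sub-field looks like the 5.x dynamic mapping for strings, which would suggest the template was not applied when that index was created.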

I had thought that the sporadic index corruption was left behind in the 2.4.1 series, but we appear to be seeing it with Elasticsearch 5.1.1 as well.

jasona · February 27, 2017, 6:44pm

@joegallo could you take a look for me?

jasona · February 27, 2017, 9:50pm

If someone has an email address for off-forum contact, I can provide the fully detailed query (password, exact URL, etc.) that reproduces the error at will.

jasona · March 1, 2017, 3:00am

Ok, I tracked this down. My bad: one template shared a name with the next, so I was blowing away a template by accident.

jasona · March 1, 2017, 3:12am

The fact that Elasticsearch will let you overwrite one template with another, without having to delete the first one, makes it rather error-prone. Having to take two steps (delete, then write) to change an established template would have caught this issue much, much sooner.
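
A client-side guard can approximate that two-step discipline. The Python sketch below checks a batch of template definitions for reused names before anything is uploaded. The `(name, body)` pair structure and the template names are hypothetical. The put-template API also documents a `create` query parameter intended to reject overwrites, which is worth confirming against the docs for your version.

```python
from collections import Counter


def duplicate_template_names(templates):
    """Return the sorted template names that appear more than once.

    `templates` is an iterable of (name, body) pairs about to be uploaded.
    """
    counts = Counter(name for name, _body in templates)
    return sorted(name for name, n in counts.items() if n > 1)


# Hypothetical batch: the second entry was meant to have a different name.
pending = [
    ("silver_jobs", {"template": "silver_jobs-*"}),
    ("silver_jobs", {"template": "gold_jobs-*"}),
    ("bronze_jobs", {"template": "bronze_jobs-*"}),
]

clashes = duplicate_template_names(pending)
if clashes:
    print("refusing to upload, duplicate template names:", clashes)
```

Running a check like this before the upload step would flag a name collision at write time, rather than leaving it to surface later as a mapping error on newly created indices.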

system · March 29, 2017, 3:13am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.