Howdy, we're having some trouble with Elasticsearch bulk indexing, which
appears to slow to a virtual standstill about once a week or more. We're
using a cached data structure that is backed by Elasticsearch, so I don't
know exactly how often we're hitting it. The cached data structure is
updated thousands of times per second, and whenever there is a cache miss,
Elasticsearch is queried and updated.
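To make the access pattern concrete, here is a minimal sketch of one possible reading of it, not our actual code. The class and method names are hypothetical, and a plain dict stands in for Elasticsearch. The two counters show how backend traffic could be measured, since that number is currently unknown.

```python
class ReadThroughCounter:
    """Hypothetical read-through cache of counters.

    `backend` is a plain dict standing in for Elasticsearch (an assumption
    made for illustration; no real client is used here).
    """

    def __init__(self, backend):
        self.backend = backend
        self.cache = {}
        self.backend_reads = 0   # how many times the backend was queried
        self.backend_writes = 0  # how many times the backend was updated

    def increment(self, key, amount=1):
        if key not in self.cache:
            # Cache miss: query the backend for the current value...
            self.backend_reads += 1
            self.cache[key] = self.backend.get(key, 0) + amount
            # ...and write the updated value back to the backend.
            self.backend_writes += 1
            self.backend[key] = self.cache[key]
        else:
            # Cache hit: only the in-memory value changes.
            self.cache[key] += amount
        return self.cache[key]


store = {"page:1": 10}
counter = ReadThroughCounter(store)
counter.increment("page:1")  # miss: backend queried and updated
counter.increment("page:1")  # hit: cache only
counter.increment("page:2")  # miss: backend queried and updated
print(counter.backend_reads, counter.backend_writes)  # prints "2 2"
```

Logging those two counters periodically would give a backend hit rate to compare against the times when indexing stalls.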
We're running a four-node cluster using Elasticsearch 0.19.8.
Right before things slow to a stand-still, we see these warnings in the
logs:
[19:43:57,851][WARN ][index.merge.scheduler ] [Droom, Doctor Anthony] [analytics][11] failed to merge
java.io.FileNotFoundException: _guwj_1.del
    at org.elasticsearch.index.store.Store$StoreDirectory.fileLength(Store.java:448)
    at org.apache.lucene.index.SegmentInfo.sizeInBytes(SegmentInfo.java:303)
    at org.apache.lucene.index.MergePolicy$OneMerge.totalBytesSize(MergePolicy.java:174)
    at org.apache.lucene.index.TrackingConcurrentMergeScheduler.doMerge(TrackingConcurrentMergeScheduler.java:81)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:456)
The cluster then slows, although it continues to report "green" status for
all indices and respond to searches.
It's always a ".del" file, and it's always a different one. I have no idea
what that means, but I should note that we don't delete data on an ongoing
basis on this cluster. The shard number, 11 in this case, also changes
every time.
Without exception, nodes that have these messages at the end of their logs
do not respond to SIGTERM and must be forcibly killed, while other nodes
are just fine. After a restart, the cluster is always happy and responsive
again.
Here's the result of curling localhost:9200/_settings. As you can see, we
have two replicas on the analytics index.
{
  "ad_activity_test" : {
    "settings" : {
      "index.number_of_shards" : "12",
      "index.number_of_replicas" : "1",
      "index.version.created" : "190899"
    }
  },
  "ad_activity" : {
    "settings" : {
      "index.number_of_shards" : "12",
      "index.number_of_replicas" : "1",
      "index.version.created" : "190899"
    }
  },
  "analytics_20130207" : {
    "settings" : {
      "index.number_of_shards" : "12",
      "index.number_of_replicas" : "2",
      "index.version.created" : "190899",
      "index.routing.allocation.total_shards_per_node" : "12"
    }
  },
  "analytics" : {
    "settings" : {
      "index.merge.policy.segments_per_tier" : "5",
      "index.number_of_replicas" : "2",
      "index.version.created" : "190899",
      "index.number_of_shards" : "12",
      "index.routing.allocation.total_shards_per_node" : "12"
    }
  },
  "activity" : {
    "settings" : {
      "index.number_of_replicas" : "2",
      "index.version.created" : "190899",
      "index.number_of_shards" : "12",
      "index.routing.allocation.total_shards_per_node" : "12"
    }
  }
}
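For a sense of scale, the settings above can be turned into a count of shard copies (primaries plus replicas) per index. The snippet below is a rough sketch: it uses an abbreviated copy of the settings with only the two fields it needs, and the per-node figure assumes every copy is allocated and spread evenly across the four nodes, which may not match the real allocation.

```python
import json

# Abbreviated copy of the _settings output (only the fields used here).
SETTINGS = json.loads("""
{
  "ad_activity_test":   {"settings": {"index.number_of_shards": "12", "index.number_of_replicas": "1"}},
  "ad_activity":        {"settings": {"index.number_of_shards": "12", "index.number_of_replicas": "1"}},
  "analytics_20130207": {"settings": {"index.number_of_shards": "12", "index.number_of_replicas": "2"}},
  "analytics":          {"settings": {"index.number_of_shards": "12", "index.number_of_replicas": "2"}},
  "activity":           {"settings": {"index.number_of_shards": "12", "index.number_of_replicas": "2"}}
}
""")

NODES = 4  # four-node cluster


def shard_copies(settings):
    """Return {index name: primaries * (1 + replicas)}."""
    result = {}
    for name, body in settings.items():
        s = body["settings"]
        primaries = int(s["index.number_of_shards"])
        replicas = int(s["index.number_of_replicas"])
        result[name] = primaries * (1 + replicas)
    return result


copies = shard_copies(SETTINGS)
total = sum(copies.values())

print(copies["analytics"])  # 36 copies: 12 primaries * (1 + 2 replicas)
print(total)                # 156 shard copies across all five indices
print(total / NODES)        # 39.0 copies per node if spread evenly
```

Under the same even-spread assumption, the analytics index alone puts 9 shard copies on each node (36 / 4), which is below its total_shards_per_node limit of 12.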
Does anyone have any ideas? Is this a better question for the Lucene
mailing list? Thanks for any help.
Best,
Josh