Hello,
After troubleshooting this problem for days, I decided to ask the Elasticsearch community for help. We are running a 5-node Elasticsearch cluster on Oracle Linux 7.4. All machines have 64 GB of RAM, and each one has 10 disks with about 500 GB. The data is sent to Elasticsearch through a 3-node Graylog cluster.
The setup was working just fine when we had less load, around 1,000 msg per second, mostly DNS-type messages. But when we added additional traffic of around 15,000 msg per second (from firewalls), the trouble started.
I created a new index for these messages with this configuration: 4 shards per index, 1 replica. It worked, but Elasticsearch could not process messages quickly enough, so a long queue was building up in Graylog.
I increased the configuration to 8 shards per node, 1 replica. This improved the speed, and Elasticsearch could process messages in time. But after this had been running for something like 5-10 hours, individual Elasticsearch processes either hung (the process was still there but not responding, and only kill -9 stopped it) or just stopped out of the blue. Yesterday I also set replicas to 0. It seemed to work a bit longer, around 18 hours, but then the Elasticsearch process stopped again on one of the nodes.
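For reference, the three configurations tried so far can be written down as index settings, together with the number of shard copies each one produces. This is only a rough Python sketch: it assumes the shard counts are per index, and the helper names are made up for illustration. `number_of_shards` and `number_of_replicas` are the standard Elasticsearch index settings.

```python
import json


def index_settings(shards, replicas):
    """Build an index settings body with the given shard and replica counts."""
    return {
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
            }
        }
    }


def total_shard_copies(shards, replicas):
    """Primaries plus all replica copies for one index."""
    return shards * (1 + replicas)


# (shards, replicas) for the three attempts described above;
# treating the shard counts as per-index values is an assumption.
attempts = [(4, 1), (8, 1), (8, 0)]
for shards, replicas in attempts:
    copies = total_shard_copies(shards, replicas)
    print(json.dumps(index_settings(shards, replicas)), "->", copies, "shard copies")
```

With 1 replica every document is indexed twice, so setting replicas to 0 roughly halves the indexing work for that index, at the cost of losing redundancy.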
I am sorry for the very long post, but I wanted to give as much detail as possible. Thanks for any ideas.
All the time, even when processing is OK, there are messages like these in the Elasticsearch logs:
- GC messages: mostly INFO level, rarely WARN.
[2018-02-06T09:53:00,560][INFO ][o.e.m.j.JvmGcMonitorService] [RhfJyb1] [gc][18] overhead, spent [297ms] collecting in the last [1s]
[2018-02-06T10:01:16,006][INFO ][o.e.m.j.JvmGcMonitorService] [RhfJyb1] [gc][503] overhead, spent [289ms] collecting in the last [1.1s]
Each Elasticsearch process is using 30 GB of memory, so that cannot be increased.
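To put the GC lines in perspective, the share of time spent collecting can be computed from each message. Below is a minimal Python sketch that parses the "spent [...] collecting in the last [...]" part; the function name and the regular expression are my own, and it only handles the `ms`, `s` and `m` units.

```python
import re

# Seconds per unit for the time values that appear in the log message.
UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0}

GC_PATTERN = re.compile(
    r"spent \[(?P<gc>[\d.]+)(?P<gc_unit>ms|s|m)\] "
    r"collecting in the last \[(?P<win>[\d.]+)(?P<win_unit>ms|s|m)\]"
)


def gc_overhead_fraction(line):
    """Return GC time divided by the observed window, or None if no match."""
    match = GC_PATTERN.search(line)
    if match is None:
        return None
    gc_seconds = float(match["gc"]) * UNIT_SECONDS[match["gc_unit"]]
    window_seconds = float(match["win"]) * UNIT_SECONDS[match["win_unit"]]
    return gc_seconds / window_seconds


lines = [
    "[2018-02-06T09:53:00,560][INFO ][o.e.m.j.JvmGcMonitorService] [RhfJyb1] "
    "[gc][18] overhead, spent [297ms] collecting in the last [1s]",
    "[2018-02-06T10:01:16,006][INFO ][o.e.m.j.JvmGcMonitorService] [RhfJyb1] "
    "[gc][503] overhead, spent [289ms] collecting in the last [1.1s]",
]
for line in lines:
    print(f"{gc_overhead_fraction(line):.1%}")
```

For the two lines above this works out to roughly 30% and 26% of the window spent in garbage collection.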
- All shards failed errors:
[2018-02-06T00:01:12,260][DEBUG][o.e.a.s.TransportSearchAction] [RhfJyb1] All shards failed for phase: [query]
org.elasticsearch.ElasticsearchException$1: value source config is invalid; must have either a field context or a script or marked as unwrapped
at org.elasticsearch.ElasticsearchException.guessRootCauses(ElasticsearchException.java:618) ~[elasticsearch-5.6.6.jar:5.6.6]
at ..
Caused by: java.lang.IllegalStateException: value source config is invalid; must have either a field context or a script or marked as unwrapped
at org.elasticsearch.search.aggregations.support.ValuesSourceConfig.toValuesSource(ValuesSourceConfig.java:227) ~[elasticsearch-5.6.6.jar:5.6.6]
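The error text says the aggregation needs "either a field context or a script". In the search request logged further down in this post, the `terms` aggregation `gl2_terms` and the `missing` aggregation both appear without a `field` or `script` key, which may be what triggers this message. A rough Python sketch that scans an aggregation tree for such cases (the function name and the list of aggregation types are assumptions for illustration, not part of Elasticsearch or Graylog):

```python
# Aggregation types assumed, for this sketch, to need a "field" or "script".
VALUES_SOURCE_TYPES = {"terms", "missing", "min", "max", "avg", "sum", "cardinality"}
SUB_AGG_KEYS = ("aggregations", "aggs")


def find_aggs_without_source(aggs, prefix=""):
    """Return paths of aggregations that define neither "field" nor "script"."""
    problems = []
    for name, body in aggs.items():
        path = f"{prefix}>{name}" if prefix else name
        for key, value in body.items():
            if key in SUB_AGG_KEYS:
                # Recurse into nested aggregations.
                problems.extend(find_aggs_without_source(value, path))
            elif key in VALUES_SOURCE_TYPES and isinstance(value, dict):
                if "field" not in value and "script" not in value:
                    problems.append(path)
    return problems


# Aggregation part of the logged request, trimmed to the relevant keys.
logged_aggs = {
    "gl2_filter": {
        "filter": {"match_all": {"boost": 1.0}},
        "aggregations": {
            "gl2_terms": {"terms": {"size": 100, "min_doc_count": 1}},
        },
    },
    "missing": {"missing": {}},
}

print(find_aggs_without_source(logged_aggs))
```

Run against the trimmed request, it reports both `gl2_filter>gl2_terms` and `missing`.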
- DEBUG messages: TransportSearchAction and RemoteTransportException
[2018-02-06T00:01:12,235][DEBUG][o.e.a.s.TransportSearchAction] [RhfJyb1] [srx_49][6], node[aJ5VQPYbQZmCuVMkbSPQLQ], [P], s[STARTED], a[id=ur1bjvESQOixmyGuCeWtMg]: Failed to execute [SearchRequest{searchType=QUERY_THEN_FETCH, indices=[srx_49], indicesOptions=IndicesOptions[id=39, ignore_unavailable=true, allow_no_indices=true, expand_wildcards_open=true, expand_wildcards_closed=false, allow_alisases_to_multiple_indices=true, forbid_closed_indices=true], types=[message], routing='null', preference='null', requestCache=null, scroll=null, maxConcurrentShardRequests=25, batchedReduceSize=512, preFilterShardSize=64, source={
"from" : 0,
"query" : {
"bool" : {
"must" : [
{
"match_all" : {
"boost" : 1.0
}
}
],
"filter" : [
{
"bool" : {
"must" : [
{
"range" : {
"timestamp" : {
"from" : "2018-02-05 23:00:12.227",
"to" : "2018-02-05 23:01:12.227",
"include_lower" : true,
"include_upper" : true,
"boost" : 1.0
}
}
},
{
"query_string" : {
"query" : "streams:59dce05bac68b10ce707c2c1",
"fields" : [ ],
"use_dis_max" : true,
"tie_breaker" : 0.0,
"default_operator" : "or",
"auto_generate_phrase_queries" : false,
"max_determinized_states" : 10000,
"enable_position_increments" : true,
"fuzziness" : "AUTO",
"fuzzy_prefix_length" : 0,
"fuzzy_max_expansions" : 50,
"phrase_slop" : 0,
"escape" : false,
"split_on_whitespace" : true,
"boost" : 1.0
}
}
],
"disable_coord" : false,
"adjust_pure_negative" : true,
"boost" : 1.0
}
}
],
"disable_coord" : false,
"adjust_pure_negative" : true,
"boost" : 1.0
}
},
"aggregations" : {
"gl2_filter" : {
"filter" : {
"match_all" : {
"boost" : 1.0
}
},
"aggregations" : {
"gl2_terms" : {
"terms" : {
"size" : 100,
"min_doc_count" : 1,
"shard_min_doc_count" : 0,
"show_term_doc_count_error" : false,
"order" : [
{
"_count" : "desc"
},
{
"_term" : "asc"
}
]
}
}
}
},
"missing" : {
"missing" : { }
}
}
}}] lastShard [true]
org.elasticsearch.transport.RemoteTransportException: [aJ5VQPY][10.16.10.208:9300][indices:data/read/search[phase/query]]
Caused by: java.lang.IllegalStateException: value source config is invalid; must have either a field context or a script or marked as unwrapped
at org.elasticsearch.search.aggregations.support.ValuesSourceConfig.toValuesSource(ValuesSourceConfig.java:227) ~[elasticsearch-5.6.6.jar:5.6.6]
at org.elasticsearch.search.aggregations.support.ValuesSourceAggregatorFactory.createInternal(ValuesSourceAggregatorFactory.java:51) ~[elasticsearch-5.6.6.jar:5.6.6]
- These warnings in the file ._deprecation.log:
[2018-02-06T10:00:07,585][WARN ][o.e.d.i.m.TypeParsers ] Expected a boolean [true/false] for property [index] but got [not_analyzed]
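The deprecation warning refers to the old string values of the `index` mapping property. As far as I understand, Elasticsearch 5.x expects a boolean there, and the legacy values `analyzed` and `not_analyzed` both mean the field is indexed, while `no` means it is not. A small Python sketch of that translation (the function name and the sample mapping are hypothetical, not taken from Graylog or Elasticsearch):

```python
# Legacy (pre-5.x) string values of "index" mapped to the boolean form.
LEGACY_INDEX_VALUES = {"analyzed": True, "not_analyzed": True, "no": False}


def modernize_index_flag(field_mapping):
    """Return a copy of a field mapping with a boolean "index" property."""
    updated = dict(field_mapping)
    value = updated.get("index")
    if isinstance(value, str) and value in LEGACY_INDEX_VALUES:
        updated["index"] = LEGACY_INDEX_VALUES[value]
    return updated


# Hypothetical field definition of the kind that would trigger the warning.
old_mapping = {"type": "keyword", "index": "not_analyzed"}
print(modernize_index_flag(old_mapping))
```

The original mapping is left untouched; only the returned copy carries the boolean value.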
- I installed the ElasticHQ tool, which monitors cluster status. I will try to attach a screenshot from it here.
- I also did a heap dump of one of the Elasticsearch processes, but have yet to analyze it. I also have