Hi all,
my Elasticsearch cluster keeps crashing whenever I do the following:
go to the "Discover" section of Kibana (or any visualization or dashboard) and apply a filter on a field of type keyword (through the "Add filter" button) with an input consisting of a string containing a very large number of words (like the ones you can generate through https://onlinerandomtools.com/generate-random-string, for example).
These are the only error messages that I can find in the Elasticsearch logs:
[2020-06-08T16:04:07,795][ERROR][o.e.ExceptionsHelper ] [my-es-node] fatal error
at org.elasticsearch.ExceptionsHelper.lambda$maybeDieOnAnotherThread$4(ExceptionsHelper.java:300)
at java.base/java.util.Optional.ifPresent(Optional.java:176)
at org.elasticsearch.ExceptionsHelper.maybeDieOnAnotherThread(ExceptionsHelper.java:290)
at org.elasticsearch.http.netty4.Netty4HttpRequestHandler.exceptionCaught(Netty4HttpRequestHandler.java:75)
at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:297)
at io.netty.channel.AbstractChannelHandlerContext.notifyHandlerException(AbstractChannelHandlerContext.java:831)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:376)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)
at org.elasticsearch.http.netty4.Netty4HttpPipeliningHandler.channelRead(Netty4HttpPipeliningHandler.java:58)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
at io.netty.handler.codec.MessageToMessageCodec.channelRead(MessageToMessageCodec.java:111)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:328)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:302)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)
at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1421)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:930)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:697)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:597)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:551)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:511)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at java.base/java.lang.Thread.run(Thread.java:830)
[2020-06-08T16:04:07,798][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [my-es-node] fatal error in thread [Thread-31], exiting
java.lang.StackOverflowError: null
at org.apache.lucene.util.automaton.RegExp.next(RegExp.java:1018) ~[lucene-core-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:05:56]
at org.apache.lucene.util.automaton.RegExp.parseCharExp(RegExp.java:1165) ~[lucene-core-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:05:56]
at org.apache.lucene.util.automaton.RegExp.parseSimpleExp(RegExp.java:1160) ~[lucene-core-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:05:56]
at org.apache.lucene.util.automaton.RegExp.parseCharClassExp(RegExp.java:1092) ~[lucene-core-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:05:56]
at org.apache.lucene.util.automaton.RegExp.parseComplExp(RegExp.java:1080) ~[lucene-core-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:05:56]
at org.apache.lucene.util.automaton.RegExp.parseRepeatExp(RegExp.java:1049) ~[lucene-core-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:05:56]
at org.apache.lucene.util.automaton.RegExp.parseConcatExp(RegExp.java:1042) ~[lucene-core-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:05:56]
...
Here is some information about my infrastructure.
host:
- CPU: 4 x Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz
- available memory for ES: 8 GB
Elasticsearch:
- 1 node, version 7.4, JVM heap 4 GB
Kibana:
- version 7.4
Elasticsearch crashes whenever the number of words in the "Add filter" input is around 200.
Applying filters on keyword fields with input text of fewer than 200 words does not result in a crash.
I tried adding this property to the index settings of the indices involved in this error:
{
  "settings": {
    "index": {
      "max_regex_length": 10
    }
  }
}
in order to limit the length of the regular expression used in regexp queries, since the error seems to be related to a regex parse failure (as indicated by the stack trace).
However, nothing changed.
I could still reproduce this exact fatal error by injecting a very large string into the "Add filter" input in Kibana.
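For reference, I applied the setting with a request along these lines, via the update index settings API (using the same my-index placeholder as in the query further below):
PUT my-index/_settings
{
  "index": {
    "max_regex_length": 10
  }
}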
I also tried to disable so-called expensive queries by setting the property
search.allow_expensive_queries: false
in the Elasticsearch YAML file, as indicated in https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html#query-dsl-allow-expensive-queries.
But strangely, ES does not recognize this property.
I then tried to set it in the cluster settings through the REST API, but again it does not seem to be recognized by my version of Elasticsearch, so I suspect this setting may only exist in releases newer than 7.4.
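For completeness, the cluster settings attempt looked roughly like this (shown here with the transient scope):
PUT _cluster/settings
{
  "transient": {
    "search.allow_expensive_queries": false
  }
}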
The strange thing is that this fatal error does not occur if I perform the same match query (with the same very long string as input) by calling the ES API directly, either with curl or through the Kibana Console tool.
For example, I inspected the query that is produced when I apply a filter in Kibana Discover and ran the same query in the Kibana Console tool:
GET my-index/_search
{
  "version": true,
  "size": 500,
  "sort": [
    {
      "@timestamp": {
        "order": "desc",
        "unmapped_type": "boolean"
      }
    }
  ],
  "_source": {
    "excludes": []
  },
  "aggs": {
    "2": {
      "date_histogram": {
        "field": "@timestamp",
        "fixed_interval": "12h",
        "min_doc_count": 1
      }
    }
  },
  "stored_fields": [
    "*"
  ],
  "docvalue_fields": [
    {
      "field": "@timestamp",
      "format": "date_time"
    }
  ],
  "query": {
    "bool": {
      "must": [],
      "filter": [
        {
          "match_all": {}
        },
        {
          "match_phrase": {
            "my_keyword_field": {
              "query": "<text with ~ 200 words>"
            }
          }
        },
        {
          "range": {
            "@timestamp": {
              "format": "strict_date_optional_time",
              "gte": "2020-06-09T09:01:31.321Z",
              "lte": "2020-06-09T09:16:31.321Z"
            }
          }
        }
      ],
      "should": [],
      "must_not": []
    }
  }
}
The above query does not cause the ES fatal error.
So it seems to be a problem related to the way Kibana sends the query requests to ES when you apply a filter in a saved search or a visualization.
This fatal error is very annoying, since any Kibana user can potentially crash the Elasticsearch cluster simply by applying a filter in Discover with a very large text as input.
Is there a way to tell Kibana to reject filter inputs that are too large?
NB: this happens only when I apply the filter to a keyword field.