StackOverflowError in Lucene

Hi!

I have a cluster of 4 nodes:

  • ubuntu-es5 (rack-1),
  • ubuntu-es6 (rack-1),
  • ubuntu-es7 (rack-2),
  • ubuntu-es8 (rack-2).

A StackOverflowError occurred on ubuntu-es8, and it broke the whole cluster:

[2018-05-29T12:46:47,707][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [ubuntu-es8] fatal error in thread [elasticsearch[ubuntu-es8][search][T#8]], exiting
java.lang.StackOverflowError: null
at org.apache.lucene.util.graph.GraphTokenStreamFiniteStrings.articulationPointsRecurse(GraphTokenStreamFiniteStrings.java:278) ~[lucene-core-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:43]
at org.apache.lucene.util.graph.GraphTokenStreamFiniteStrings.articulationPointsRecurse(GraphTokenStreamFiniteStrings.java:278) ~[lucene-core-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:43]
at org.apache.lucene.util.graph.GraphTokenStreamFiniteStrings.articulationPointsRecurse(GraphTokenStreamFiniteStrings.java:278) ~[lucene-core-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:43]

What went wrong, and how can I avoid this behavior? As far as I remember, this error is known, and a fix was promised for Elasticsearch 6.0.

Elasticsearch version:

{
  "name" : "ubuntu-es5",
  "cluster_name" : "company-search-cluster-2",
  "cluster_uuid" : "SAwP5j5jSS6bzZGxl6GLaQ",
  "version" : {
    "number" : "6.2.3",
    "build_hash" : "c59ff00",
    "build_date" : "2018-03-13T10:06:29.741383Z",
    "build_snapshot" : false,
    "lucene_version" : "7.2.1",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}

Thanks!

Do you know which search query triggered this error?

Unfortunately, I don't know. We have an index with 765 lines of mapping and 10K requests per day. We use the .NET library, and the message in our application's error log is not meaningful: "System.NullReferenceException: Object reference not set to an instance of an object.". We don't use regex queries, but our analyzers use regex. For example:

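// Char filter: removes every run of non-digit characters, keeping only digits.
public class DigitCharFilter : PatternReplaceCharFilter
{
	public DigitCharFilter()
	{
		Pattern = "\\D+";
		Replacement = "";
	}
}

// Char filter: removes every character that is not a digit, a Latin letter, or a Russian letter.
public class WordDigitCharFilter : PatternReplaceCharFilter
{
	public WordDigitCharFilter()
	{
		Pattern = "[^0-9a-zA-Zа-яА-ЯёЁ]";
		Replacement = "";
	}
}

// Tokenizer: uses the same character class as the separator pattern.
public class WordDigitTokenizer : PatternTokenizer
{
	public WordDigitTokenizer()
	{
		Pattern = "[^0-9a-zA-Zа-яА-ЯёЁ]";
	}
}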

Maybe the regexes in my analyzers are wrong?

Thanks!

These patterns do not seem harmful. Do you have the complete stack trace? I would like to take a look at how it starts.
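To illustrate why the patterns look harmless, here is a minimal sketch that applies the same three regexes in Python. It is only an approximation of what the analyzers do: Python's `re` module is assumed to behave like Java's regex engine for these inputs, the function names are made up for the example, and empty tokens are dropped purely for readability. Each pattern is a single character class with at most one quantifier, so there is nothing in them that nests or recurses.

```python
import re

# Same patterns as in the post; Python's re is used here as a stand-in
# for the Java regex engine (an assumption, not the real analysis chain).
DIGIT_FILTER = re.compile(r"\D+")
WORD_DIGIT = re.compile(r"[^0-9a-zA-Zа-яА-ЯёЁ]")


def digit_char_filter(text):
    """Approximates DigitCharFilter: delete every run of non-digits."""
    return DIGIT_FILTER.sub("", text)


def word_digit_char_filter(text):
    """Approximates WordDigitCharFilter: delete every non-word, non-digit char."""
    return WORD_DIGIT.sub("", text)


def word_digit_tokenize(text):
    """Approximates WordDigitTokenizer: split on separators, drop empty pieces."""
    return [token for token in WORD_DIGIT.split(text) if token]


print(digit_char_filter("tel: +7 (495) 123-45-67"))  # 74951234567
print(word_digit_char_filter("Привет, world-42!"))   # Приветworld42
print(word_digit_tokenize("Привет, world-42!"))      # ['Привет', 'world', '42']
```

Note also that the frames in the posted trace are all in `GraphTokenStreamFiniteStrings.articulationPointsRecurse`, not in any regex code.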

The full log of the failed node is available at the following link: https://yadi.sk/d/AdR9wV4_3WkxHm

Unfortunately, this log does not help much; we would need to see the actual query to understand more. We suspect that the source of the problem is a really big query with thousands of words.
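If very large queries do turn out to be the trigger, one possible stopgap is to cap the query size in the application before it is sent to Elasticsearch. The sketch below is only an illustration: `limit_query_terms` is a hypothetical helper, the default of 256 terms is an arbitrary assumption, and counting whitespace-separated words only approximates the number of tokens the analyzer will produce.

```python
def limit_query_terms(query, max_terms=256):
    """Return the query unchanged if it is short enough, else keep the first max_terms words.

    max_terms is an arbitrary illustrative threshold, not a known safe value.
    """
    terms = query.split()
    if len(terms) <= max_terms:
        return query
    return " ".join(terms[:max_terms])


# Example: a pasted document used as a search string gets cut down.
huge_query = " ".join("word%d" % i for i in range(5000))
safe_query = limit_query_terms(huge_query)
print(len(safe_query.split()))  # 256
```

Whether to truncate such a query or reject it outright is an application decision. Logging the original length on the client side would also help confirm or rule out the suspicion.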

One thing you can do to help track down such queries is to enable slowlogs. This would not log the actual query, since the node would be killed before it got a chance to log it, but it would at least log the other queries, which could be similar.
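As a rough sketch of the slowlog suggestion, the snippet below builds a settings body that could be sent with `PUT /<index>/_settings`. The `index.search.slowlog.*` keys are the standard search slowlog settings; the helper name and the threshold values are assumptions chosen for illustration and would need tuning against real traffic.

```python
import json


def build_slowlog_settings(warn="2s", info="500ms", level="info"):
    """Build an index settings body enabling the search slowlog.

    The thresholds are illustrative defaults, not recommendations.
    """
    return {
        "index.search.slowlog.threshold.query.warn": warn,
        "index.search.slowlog.threshold.query.info": info,
        "index.search.slowlog.threshold.fetch.warn": warn,
        "index.search.slowlog.level": level,
    }


# Print the JSON body to send as: PUT /<index>/_settings
print(json.dumps(build_slowlog_settings(), indent=2))
```

Lower thresholds capture more queries at the cost of larger logs, so it may be worth starting low for a short period and raising them once the suspicious queries are identified.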

Thanks for the reply!
