Elasticsearch 7.7 crashes on Term Query when the term text is 200-300 characters

Hi,
I recently upgraded Elasticsearch from 5.5 to 7.7.
I have a single index with 30 fields and about 6K documents. The dataset is very simple. [1g of heap configured in the JVM options]
The dataset has a description field that holds text of around 250-400 characters.

My search query combines bool, must, query_string, and term.
When I search for an exact indexed description of about 250 characters, the query takes a long time to respond [15 sec] and Elasticsearch crashes.
With a short search term of 20-25 characters, ES works well.
Every word in the search term is a fuzzy term: we append ~ to the end of each word.
In ES 5.5 this scenario worked well with no issues, so it looks like something broke in ES 7.7.

  1. Could you please suggest how I should proceed with this issue?
  2. Is there any limit on the length of the input search term?
  3. How much memory should I set for development and production?

Could you share a bit more information about the crash (e.g. a stack trace)?

It's running out of memory. I can see a lot of log entries like
[gc][2119] overhead, spent [3.7s] collecting in the last [6s]
Do you think fuzzy matching is consuming a lot of memory?
The same query takes 500 ms without fuzzy. With fuzzy it takes 11 sec, logs the GC warnings above, and Elasticsearch crashes.

I think you are hitting this Lucene issue:

https://issues.apache.org/jira/browse/LUCENE-9286

But it is difficult to tell without a heap dump. Maybe you can try to get the hot threads when running the query?
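For anyone following along, here is a minimal Python sketch of one way to capture the hot threads report while the slow query is running. It uses the nodes hot threads endpoint (`GET /_nodes/hot_threads`) with its `threads` and `interval` parameters. The base URL `http://localhost:9200` and the helper names `hot_threads_url` and `fetch_hot_threads` are assumptions for illustration, and the sketch assumes a cluster without authentication.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def hot_threads_url(base_url="http://localhost:9200", threads=5, interval="500ms"):
    """Build the URL for the nodes hot threads API (base_url is an assumption)."""
    params = urlencode({"threads": threads, "interval": interval})
    return f"{base_url.rstrip('/')}/_nodes/hot_threads?{params}"


def fetch_hot_threads(base_url="http://localhost:9200", timeout=30):
    """Return the plain-text hot threads report; needs a reachable cluster."""
    with urlopen(hot_threads_url(base_url), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Usage, run from a second terminal while the fuzzy query is in flight:
#   print(fetch_hot_threads())
```

Running it a few times during the query should show which search threads are busy and where they are spending their time.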

@Ignacio_Vera
[2020-07-16T01:44:40,518][WARN ][o.e.m.j.JvmGcMonitorService] [gc][2118] overhead, spent [1.1s] collecting in the last [1.1s]
[2020-07-16T01:44:46,589][WARN ][o.e.m.j.JvmGcMonitorService] [gc][2119] overhead, spent [3.7s] collecting in the last [6s]
[2020-07-16T01:44:46,170][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [HJL013760] fatal error in thread [elasticsearchsearch][T#11]], exiting
java.lang.OutOfMemoryError: Java heap space
at org.apache.lucene.util.ArrayUtil.growExact(ArrayUtil.java:302) ~[lucene-core-8.5.1.jar:8.5.1 edb9fc409398f2c3446883f9f80595c884d245d0 - ivera - 2020-04-08 08:55:42]
at org.apache.lucene.util.ArrayUtil.grow(ArrayUtil.java:311) ~[lucene-core-8.5.1.jar:8.5.1 edb9fc409398f2c3446883f9f80595c884d245d0 - ivera - 2020-04-08 08:55:42]
at org.apache.lucene.util.automaton.Automaton$Builder.addTransition(Automaton.java:770) ~[lucene-core-8.5.1.jar:8.5.1 edb9fc409398f2c3446883f9f80595c884d245d0 - ivera - 2020-04-08 08:55:42]
at org.apache.lucene.util.automaton.UTF32ToUTF8.all(UTF32ToUTF8.java:251) ~[lucene-core-8.5.1.jar:8.5.1 edb9fc409398f2c3446883f9f80595c884d245d0 - ivera - 2020-04-08 08:55:42]
at org.apache.lucene.util.automaton.UTF32ToUTF8.end(UTF32ToUTF8.java:231) ~[lucene-core-8.5.1.jar:8.5.1 edb9fc409398f2c3446883f9f80595c884d245d0 - ivera - 2020-04-08 08:55:42]
at org.apache.lucene.util.automaton.UTF32ToUTF8.build(UTF32ToUTF8.java:194) ~[lucene-core-8.5.1.jar:8.5.1 edb9fc409398f2c3446883f9f80595c884d245d0 - ivera - 2020-04-08 08:55:42]
at org.apache.lucene.util.automaton.UTF32ToUTF8.convertOneEdge(UTF32ToUTF8.java:137) ~[lucene-core-8.5.1.jar:8.5.1 edb9fc409398f2c3446883f9f80595c884d245d0 - ivera - 2020-04-08 08:55:42]
at org.apache.lucene.util.automaton.UTF32ToUTF8.convert(UTF32ToUTF8.java:307) ~[lucene-core-8.5.1.jar:8.5.1 edb9fc409398f2c3446883f9f80595c884d245d0 - ivera - 2020-04-08 08:55:42]
at org.apache.lucene.util.automaton.CompiledAutomaton.<init>(CompiledAutomaton.java:237) ~[lucene-core-8.5.1.jar:8.5.1 edb9fc409398f2c3446883f9f80595c884d245d0 - ivera - 2020-04-08 08:55:42]
at org.apache.lucene.util.automaton.CompiledAutomaton.<init>(CompiledAutomaton.java:140) ~[lucene-core-8.5.1.jar:8.5.1 edb9fc409398f2c3446883f9f80595c884d245d0 - ivera - 2020-04-08 08:55:42]
at org.apache.lucene.search.FuzzyTermsEnum.buildAutomata(FuzzyTermsEnum.java:154) ~[lucene-core-8.5.1.jar:8.5.1 edb9fc409398f2c3446883f9f80595c884d245d0 - ivera - 2020-04-08 08:55:42]
at org.apache.lucene.search.FuzzyQuery.<init>(FuzzyQuery.java:111) ~[lucene-core-8.5.1.jar:8.5.1 edb9fc409398f2c3446883f9f80595c884d245d0 - ivera - 2020-04-08 08:55:42]
at org.elasticsearch.index.mapper.StringFieldType.fuzzyQuery(StringFieldType.java:78) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.index.search.QueryStringQueryParser.getFuzzyQuerySingle(QueryStringQueryParser.java:466) ~[elasticsearch-7.7.0.jar:7.7.0]

Is this field tokenized? Can you share the mapping?
Generally, looking for similar texts would be done using tokenised fields and using the more like this query.

Searching long untokenized fields with fuzzy will be expensive, and it only allows a maximum of 2 character edits between the search string and the matched values.
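To make the 2-edit limit concrete, here is a small Python sketch of plain Levenshtein edit distance. It is a simplification for illustration only, not how Lucene evaluates fuzzy queries (Lucene builds automata, as the stack trace above shows). The function names `edit_distance` and `within_fuzzy_limit` are made up for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance: insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def within_fuzzy_limit(a: str, b: str, max_edits: int = 2) -> bool:
    """True if b is reachable from a in at most max_edits edits."""
    return edit_distance(a, b) <= max_edits
```

A single misspelled word such as "sychology" is one edit away from "psychology", so it stays within the limit. A long untokenized value with one typo in each of three words is already three edits away and would not match as a single fuzzy term.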


We have not modified or customized the mapping. Elasticsearch creates the fields and data types by default.
As per its default behaviour, it mapped the description field as text with a keyword subfield.

Which of these 2 fields are you searching?
Seeing the mapping and query would help.

Snippet of my query:
"query": {
  "bool": {
    "must": [
      {
        "query_string": {
          "query": "Carl~ Rogers~ founder~ humanistic~ psychology~ movement~ revolutionized~ psychotherapy~ influence~ has~ become~ mainstream~ psychology~ and so on.",
          "default_operator": "AND"
        }
      },
      {
        "term": {
          "containerName.keyword": "Book Catalog"
        }
.....
}

The mapping for the description field looks like this:
{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}}

If I remove the fuzzy ~ character from the search terms, the query works fine. With the ~ character I hit the performance issue and ES crashes.

OK. That's searching the tokenized field, but using fuzzy on every term, which is expensive.

If you do a lot of this type of fuzzy matching it's probably more efficient to use ngrams
e.g.

PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer",
          "filter": [
            "apostrophe",
            "lowercase"
          ]
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 3
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "description": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          },
          "ngram": {
            "type": "text",
            "analyzer": "my_analyzer"
          }
        }
      }
    }
  }
}

POST my_index/_doc/1
{
  "description":"Carl Rogers founder humanistic psychology movement revolutionized psychotherapy influence has become mainstream psychology "
}

POST my_index/_search
{
  "query": {
    "query_string": {
      "query": "Carl Rogers founder humanistic sychology movement revolutionised psychotherapy influence has become mainstream",
      "default_operator": "OR",
      "default_field": "description.ngram"
    }
  }
}
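To illustrate why the ngram field tolerates the misspellings in the query above ("sychology", "revolutionised"), here is a rough Python sketch that compares character trigrams. It only mimics the idea of 3-character grams with lowercasing. It is not the Elasticsearch analyzer and says nothing about relevance scoring, and the helper names `trigrams` and `overlap` are made up for this example.

```python
def trigrams(text: str, n: int = 3) -> set:
    """Lowercase the text and return its set of n-character grams."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def overlap(query: str, indexed: str) -> float:
    """Fraction of the query's trigrams that also occur in the indexed text."""
    query_grams = trigrams(query)
    if not query_grams:
        return 0.0
    return len(query_grams & trigrams(indexed)) / len(query_grams)


# Misspelled query words still share most of their grams with the indexed words.
print(overlap("sychology", "psychology"))
print(overlap("revolutionised", "revolutionized"))
```

Because most grams of a misspelled word still appear in the index, an OR query over the ngram field can find the document without any fuzzy operator.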