Hi,
I recently upgraded Elasticsearch from 5.5 to 7.7.
I have only one index with 30 fields and about 6K documents. The dataset is very simple in nature. [1 GB heap configured in jvm.options]
In the dataset I have a field, description, containing text of around 250-400 characters.
I am using a search query combining bool, must, query_string, and term.
When I search for an exact indexed description of 250 characters, the query takes a long time to respond [15 sec] and Elasticsearch crashes.
If I use a small search term of 20-25 characters, ES works well.
All the words in the search term are fuzzy terms; we have appended ~ to the end of each word.
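For reference, a minimal sketch of the kind of query we are running (field names other than description, and the term values, are illustrative, not the real query):

```json
{
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "default_field": "description",
            "query": "quick~ brown~ fox~ jumped~ over~"
          }
        },
        { "term": { "status": "active" } }
      ]
    }
  }
}
```

With a 250-character description, this produces 40+ fuzzy clauses in one query_string.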
In ES 5.5 this exact scenario worked well with no issues, so it looks like something has changed in ES 7.7.
Could you please suggest how I should proceed?
Is there any limit on the length of the input search term?
How much memory should I set in development and in production?
It's out of memory. I can see a lot of logs like [gc][2119] overhead, spent [3.7s] collecting in the last [6s].
Do you think fuzzy matching is consuming a lot of memory?
The above query without fuzzy takes 500 ms; with fuzzy it takes 11 sec with GC logging, and then Elasticsearch crashes.
Is this field tokenized? Can you share the mapping?
Generally, looking for similar texts would be done using tokenised fields and the more_like_this query.
Searching long untokenized fields with fuzzy will be expensive, and fuzzy only allows a maximum of 2 characters' difference between the search string and matched values.
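A minimal sketch of that approach against your description field (the like text and the frequency thresholds are illustrative values, tune them for your data):

```json
{
  "query": {
    "more_like_this": {
      "fields": ["description"],
      "like": "the full 250-400 character description text you want to match",
      "min_term_freq": 1,
      "min_doc_freq": 1
    }
  }
}
```

This scores documents by shared significant terms instead of building a fuzzy automaton per word, so it scales much better with long input text.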
We have not modified or customized the _mapping; ES creates the fields and datatypes by default.
As per Elasticsearch's default behaviour, it has created the description field as both text and keyword.
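So, assuming nothing else overrides dynamic mapping, the description field should look like the standard default multi-field mapping, along these lines:

```json
{
  "properties": {
    "description": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    }
  }
}
```

In that case the tokenized text sub-field is what the fuzzy query_string terms run against, with one fuzzy expansion per word.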