NGRAM Tokenizer Problem - segment writing can't keep up

Hello,

I enabled an ngram tokenizer on a field in one of my indexes (I'm using Elasticsearch 5.2.0), and after the indexing completed my Elasticsearch stopped responding.

When checking the logs I kept seeing this error:

segment writing can't keep up... (followed by my index name). Nothing would work until I manually removed the index's data folder.

I tested this on my dev server before rolling it out to prod and it worked fine (the dev server has an SSD vs a spinning disk on prod).

I have never used an ngram tokenizer before, so I am not sure whether it just didn't work due to the performance of the prod VM compared to my dev environment, or whether the error message means something else.

I did set my ngram min and max a bit extreme:
.MinGram(1)
.MaxGram(500)
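
For context, here is roughly how the tokenizer and analyzer are wired up in NEST (a sketch rather than my exact code; the index name "docs", the "path_ngram" and "path_analyzer" names, and the Doc class are placeholders):

using System;
using Nest;

public class Doc
{
    // the field that holds the full path string shown below
    public string Path { get; set; }
}

var client = new ElasticClient(new ConnectionSettings(new Uri("http://localhost:9200")));

var createResponse = client.CreateIndex("docs", c => c
    .Settings(s => s
        .Analysis(a => a
            .Tokenizers(t => t
                // the "extreme" min/max mentioned above
                .NGram("path_ngram", ng => ng
                    .MinGram(1)
                    .MaxGram(500)))
            .Analyzers(an => an
                .Custom("path_analyzer", ca => ca
                    .Tokenizer("path_ngram")
                    .Filters("lowercase")))))
    .Mappings(m => m
        .Map<Doc>(mm => mm
            .Properties(p => p
                .Text(txt => txt
                    .Name(d => d.Path)
                    .Analyzer("path_analyzer"))))));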

I set it like this because I wanted to be able to match the entire string and also small pieces like version numbers that are captured in the string, e.g.:

\server\folder\folder2\test-document-product-v.1.2.5.x

I wanted users to be able to search for "prod 1." or "doc 1.2" and get results.
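
Roughly the kind of query a user would run against that field (reusing the client and Doc placeholders from the sketch above):

// e.g. searching for "prod 1." against the ngram-analyzed path field
var searchResponse = client.Search<Doc>(s => s
    .Index("docs")
    .Query(q => q
        .Match(m => m
            .Field(f => f.Path)
            .Query("prod 1."))));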

Am I going about this the wrong way?

Thanks,

I changed my bulk loading batch size from 1000 to 100 documents and also set MaxThreadCount = 1 for the merge scheduler. After these changes I was able to import successfully without crashing the server.
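
Roughly what those two changes look like (again a sketch with the same placeholder names as above; I am showing the raw index.merge.scheduler.max_thread_count setting here, which is what the MaxThreadCount = 1 change maps to):

using System.Collections.Generic;
using System.Linq;

// 1. One merge thread, which is what Elasticsearch recommends for spinning disks.
client.UpdateIndexSettings("docs", u => u
    .IndexSettings(s => s
        .Setting("index.merge.scheduler.max_thread_count", 1)));

// 2. Bulk indexing in batches of 100 documents instead of 1000.
var allDocs = new List<Doc>(); // in reality, loaded from the file share
foreach (var batch in allDocs
    .Select((doc, i) => new { doc, i })
    .GroupBy(x => x.i / 100, x => x.doc))
{
    var bulkResponse = client.Bulk(b => b
        .Index("docs")
        .IndexMany(batch));

    if (bulkResponse.Errors)
    {
        // inspect bulkResponse.ItemsWithErrors here
    }
}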

I am not sure which one resolved my issue (or if it was both).

Thanks,
