NGRAM Tokenizer Problem - segment writing can't keep up

Hello,

I enabled an ngram tokenizer on a field in one of my indexes (I'm using Elasticsearch 5.2.0), and after the indexing completed my Elasticsearch stopped responding.

When checking the logs I kept seeing this error:

segment writing can't keep up... (followed by my index name). Nothing would work until I manually removed the index's data folder.

I tested this on my dev server before rolling it out to prod and it worked fine (the dev server has an SSD vs a spinning disk on prod).

I have never used an ngram tokenizer before, so I am not sure whether it just didn't work due to the performance of the prod VM compared to my dev environment, or whether the error message means something else.

I did set my ngram min and max a bit extreme:
.MinGram(1)
.MaxGram(500)
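
For context, here is roughly how the tokenizer and analyzer are wired up in NEST (a sketch rather than my exact code; the index name "docs", the "path_ngram" and "path_analyzer" names, and the Doc class are placeholders):

using System;
using Nest;

public class Doc
{
    // the field that holds the full path string shown below
    public string Path { get; set; }
}

var client = new ElasticClient(new ConnectionSettings(new Uri("http://localhost:9200")));

var createResponse = client.CreateIndex("docs", c => c
    .Settings(s => s
        .Analysis(a => a
            .Tokenizers(t => t
                // the "extreme" min/max mentioned above
                .NGram("path_ngram", ng => ng
                    .MinGram(1)
                    .MaxGram(500)))
            .Analyzers(an => an
                .Custom("path_analyzer", ca => ca
                    .Tokenizer("path_ngram")
                    .Filters("lowercase")))))
    .Mappings(m => m
        .Map<Doc>(mm => mm
            .Properties(p => p
                .Text(txt => txt
                    .Name(d => d.Path)
                    .Analyzer("path_analyzer"))))));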

I set it like this because I wanted to be able to match the entire string and also small pieces like version numbers that are captured in the string, e.g.:

\server\folder\folder2\test-document-product-v.1.2.5.x

I wanted users to be able to search for "prod 1." or "doc 1.2" and get results.
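
Roughly the kind of query a user would run against that field (reusing the client and Doc placeholders from the sketch above):

// e.g. searching for "prod 1." against the ngram-analyzed path field
var searchResponse = client.Search<Doc>(s => s
    .Index("docs")
    .Query(q => q
        .Match(m => m
            .Field(f => f.Path)
            .Query("prod 1."))));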

Am I going about this the wrong way?

Thanks,

I changed my bulk loading batch size from 1000 to 100 documents and also set MaxThreadCount = 1 for the merge scheduler. After these changes I was able to import successfully without crashing the server.
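
Roughly what those two changes look like (again a sketch with the same placeholder names as above; I am showing the raw index.merge.scheduler.max_thread_count setting here, which is what the MaxThreadCount = 1 change maps to):

using System.Collections.Generic;
using System.Linq;

// 1. One merge thread, which is what Elasticsearch recommends for spinning disks.
client.UpdateIndexSettings("docs", u => u
    .IndexSettings(s => s
        .Setting("index.merge.scheduler.max_thread_count", 1)));

// 2. Bulk indexing in batches of 100 documents instead of 1000.
var allDocs = new List<Doc>(); // in reality, loaded from the file share
foreach (var batch in allDocs
    .Select((doc, i) => new { doc, i })
    .GroupBy(x => x.i / 100, x => x.doc))
{
    var bulkResponse = client.Bulk(b => b
        .Index("docs")
        .IndexMany(batch));

    if (bulkResponse.Errors)
    {
        // inspect bulkResponse.ItemsWithErrors here
    }
}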

I am not sure which one resolved my issue (or if it was both).

Thanks,
