NGram, timeout and errors

It is important to be able to search for parts of words, so I'm setting up nGram settings. It works nicely when I have both min and max set to 4, but when I tried raising max to 6 I ran into a problem.
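For context, the nGram setup I'm describing looks roughly like this (the index, tokenizer and analyzer names here are illustrative, not my actual mapping):

```json
PUT /mails
{
  "settings": {
    "max_ngram_diff": 2,
    "analysis": {
      "tokenizer": {
        "mail_ngram": {
          "type": "ngram",
          "min_gram": 4,
          "max_gram": 6,
          "token_chars": ["letter", "digit"]
        }
      },
      "analyzer": {
        "mail_ngram_analyzer": {
          "type": "custom",
          "tokenizer": "mail_ngram",
          "filter": ["lowercase"]
        }
      }
    }
  }
}
```

If I remember correctly, recent Elasticsearch versions also require `index.max_ngram_diff` to be at least `max_gram - min_gram`, hence the extra setting above. Note that raising `max_gram` also multiplies the number of tokens generated per document, which makes each `_bulk` request considerably heavier to process.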

The problem occurs when I run the initial indexing of all existing content, which consists of emails of various sizes. I take them in batches of 100 and send them for indexing, and after 500 documents (5 batches, that is) I get:
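For illustration, the batching logic amounts to something like this (a minimal Python sketch, not my actual indexing program):

```python
def batches(items, size):
    """Yield successive fixed-size batches from a list of items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Example: 500 emails in batches of 100 -> 5 bulk requests.
emails = [f"mail-{i}" for i in range(500)]
chunks = list(batches(emails, 100))
assert len(chunks) == 5
assert all(len(chunk) == 100 for chunk in chunks)
```

Each batch is then sent as one `POST /_bulk` request, so the request size depends directly on how large the 100 emails in that batch happen to be.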

Maximum timeout reached while retrying request. Call: Status code unknown from: POST /_bulk

The same error occurs on the next batch. Then at 700 I get:

The remote server returned an error: (429) Too Many Requests.. Call: Status code 429 from: POST /_bulk. ServerError: Type: circuit_breaking_exception Reason: "[parent] Data too large, data for [<http_request>] would be [1018201304/971mb], which is larger than the limit of [986932838/941.2mb], real usage: [896683280/855.1mb], new bytes reserved: [121518024/115.8mb]"

It feels like something is running out of memory because I send in too much at a time.
I have changed the batch size to 50, but I get the same error at the same place.

Any ideas?

Hi @DesireeNordlund,

I would think it would help if you increased your heap size; 1 GB is quite small. You could watch heap usage while indexing, and the nodes stats API may also have useful information on both heap and circuit breaker usage.
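For reference, the heap is configured in `config/jvm.options`; something like the following (4 GB is only an example, pick a value that fits the machine, and keep min and max equal):

```
-Xms4g
-Xmx4g
```

While indexing, you can watch heap and breaker state with `GET _nodes/stats/breaker` (for example `curl -s localhost:9200/_nodes/stats/breaker?pretty`), which shows each circuit breaker's limit and current estimated usage.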

If you are using G1 GC, using the JVM settings from this PR might help: https://github.com/elastic/elasticsearch/pull/46169.

Hi
Thank you kindly for your time and reply. I will test this.
I rewrote the indexing program so that it makes sure not to send too much at a time, and it works nicely now, but it takes a huge amount of time. It would be sweet if I could send bigger batches.
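Roughly what I ended up doing: instead of a fixed document count per batch, cap each batch by an approximate payload size (a Python sketch with made-up numbers; my real program is written differently):

```python
def batches_by_bytes(docs, max_bytes):
    """Group documents into batches whose combined size stays under max_bytes.

    A single oversized document still gets its own batch.
    """
    batch, batch_size = [], 0
    for doc in docs:
        size = len(doc.encode("utf-8"))
        if batch and batch_size + size > max_bytes:
            yield batch
            batch, batch_size = [], 0
        batch.append(doc)
        batch_size += size
    if batch:
        yield batch

# Example: five small "emails" and one large one, with a 1 KB budget.
docs = ["x" * 300] * 5 + ["y" * 2000]
result = list(batches_by_bytes(docs, 1024))
assert [len(b) for b in result] == [3, 2, 1]
```

Since the circuit breaker in the 429 error trips on request size in bytes, capping bytes rather than document count maps more directly onto the limit than shrinking the batch from 100 to 50 documents did.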