Bulk indexing rejected threads

Hello, I am pretty new to Elasticsearch. I built my project and everything works really well except bulk indexing: I am getting a lot of rejected write threads. I looked at the write thread pool and got this:
```
node_name write 0 0 121044
```
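The columns are node_name, name, active, queue, and rejected, so that is 121,044 rejected write requests. Here is roughly how I pull those numbers (a minimal sketch, assuming NEST 7.x; the URL is a placeholder):

```csharp
using System;
using System.Linq;
using Nest;

var client = new ElasticClient(new ConnectionSettings(new Uri("http://localhost:9200")));

// Fetch the cat thread pool stats and print the write pool counters per node.
var pools = client.Cat.ThreadPool();
foreach (var r in pools.Records.Where(rec => rec.Name == "write"))
    Console.WriteLine($"{r.NodeName} {r.Name} active={r.Active} queue={r.Queue} rejected={r.Rejected}");
```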

My current configuration of index is 100 primary shards and 0 replicas. I am running this on single node(32gb heap size, database is stored on NVMe storage,8core/16thread AMD cpu).
My index mapping:
```json
{
  "mapping": {
    "_doc": {
      "properties": {
        "hash": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "uri": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "path": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}
```
hash - a 600-700 character string (the field I search using query_string)
uri/path - 100 characters max.
Each document is around 1-1.2 KB, and I will have ~800 million documents, ~915 GB of data in total.
I am using NEST to do the bulk index with these settings (see the sketch below):
BackOffRetries: 10
BackOffTime: 5 seconds
MaxDegreeOfParallelism: 5
Size: 5000
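For reference, wired into NEST's BulkAll helper those settings look roughly like this (a sketch; client, documents, the index name, and the timeout are placeholders rather than my actual code):

```csharp
using System;
using Nest;

// documents is an IEnumerable of whatever POCO maps to the index.
var observable = client.BulkAll(documents, b => b
    .Index("my-index")                     // placeholder index name
    .BackOffRetries(10)                    // retry a failed bulk request up to 10 times
    .BackOffTime(TimeSpan.FromSeconds(5))  // wait 5 s between retries
    .MaxDegreeOfParallelism(5)             // at most 5 bulk requests in flight at once
    .Size(5000));                          // 5000 documents per bulk request

// Block until the whole load completes (or the placeholder timeout elapses).
observable.Wait(TimeSpan.FromHours(2),
    r => Console.WriteLine($"indexed page {r.Page}"));
```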

I couldn't resolve this on my own, so I would appreciate it if someone could point me in the right direction :smiley:

That's way too many shards. Why did you pick that number?

I read an article that said I needed 1 shard per 5 million documents.

EDIT: How many shards should I use for my use case?

Not sure what article that is, but I'd suggest it's not accurate; you're wasting resources with such a large shard count.

Aim for 30-50 GB per shard instead. At ~915 GB of data, that works out to roughly 18-30 primary shards.

Alright, understood, I will reduce the shard count to 35. One small question though: do I need replicas?

You only have a single-node cluster, so it's not really relevant whether you have them or not :slight_smile:


Reducing the shard count actually resolved my issue. I am no longer getting rejected writes, and I've indexed ~500 thousand documents so far. I will update this thread after a day or two of testing. Thank you.

It still happens, but far less often. Where can I see logs so I can find out what's happening?

Check Monitoring to see what is happening. Are you hitting disk or CPU limits?

I barely saturate the CPU or the storage (RAID 0 NVMe). I can't set up Heartbeat; I get this error:
[invalid_index_name_exception] Invalid index name [_security], must not start with '_', '-', or '+', with { index_uuid="_na_" & index="_security" }: Check the Elasticsearch Monitoring cluster network connection or the load level of the nodes.

Are you indexing new documents only or are you also performing updates?

Before I bulk index, I set the refresh interval to -1, then bulk index with the parameters mentioned above, and after it's done I set the interval back to 1s. That's it. Indexing MAY happen in parallel.
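In code the refresh toggling looks roughly like this (a sketch assuming NEST 7.x, where the call is client.Indices.UpdateSettings; on 6.x it is client.UpdateIndexSettings. "my-index" is a placeholder):

```csharp
using Nest;

// Disable refresh for the duration of the bulk load.
client.Indices.UpdateSettings("my-index", u => u
    .IndexSettings(s => s.RefreshInterval(Time.MinusOne)));

// ... run the BulkAll job here ...

// Restore the 1 s refresh interval once indexing is done.
client.Indices.UpdateSettings("my-index", u => u
    .IndexSettings(s => s.RefreshInterval("1s")));
```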

That does not answer my question.

What do you mean by updates? If you mean updating existing documents, no.