Rejected write threads during bulk indexing

Hello, I am pretty new to Elasticsearch. I built my project and everything works really well except bulk indexing. The problem is that I am getting a lot of rejected write threads. I looked at the write thread pool and got this:
node_name write 0 0 121044
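
That came from the cat thread pool API, something like the request below; the columns are node_name, name, active, queue and rejected, so that's 121044 rejected writes:

GET _cat/thread_pool/write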

My current index configuration is 100 primary shards and 0 replicas. I am running this on a single node (32 GB heap, data stored on NVMe storage, 8-core/16-thread AMD CPU).
My index mapping:
{
  "mapping": {
    "_doc": {
      "properties": {
        "hash": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "uri": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "path": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}
hash - a 600-700 character string (the field I search using query_string; see the example below)
uri/path - 100 characters max.
Each document is around 1-1.2 KB, and I will have about 800 million documents, ~915 GB of data.
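
A typical search against the hash field looks roughly like this (my-index and the hash value are placeholders):

GET /my-index/_search
{
  "query": {
    "query_string": {
      "query": "<hash value>",
      "default_field": "hash"
    }
  }
}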
I am using NEST to do the bulk indexing, with these settings:
BackOffRetries: 10
BackOffTime: 5 seconds
MaxDegreeOfParallelism: 5
Size: 5000
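
For context, each NEST batch goes over the wire as a plain _bulk request of newline-delimited JSON, roughly like this (my-index and the field values are placeholders):

POST /my-index/_bulk
{ "index": {} }
{ "hash": "<hash>", "uri": "<uri>", "path": "<path>" }
{ "index": {} }
{ "hash": "<hash>", "uri": "<uri>", "path": "<path>" }

So with Size 5000 and MaxDegreeOfParallelism 5, up to five requests of 5000 documents each are in flight at once, and each bulk request queues one task on the write thread pool for every shard it touches.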

I couldn't resolve this on my own, so I would appreciate it if someone could point me in the right direction :smiley:

That's way too many shards. Why did you pick that number?

I read an article that said I needed 1 shard per 5 million documents.

EDIT: how many shards should I use for my use case?

Not sure what article that is, but I'd suggest it's not accurate, and you're wasting resources with such a large shard count.

Aim for 30-50GB per shard instead.
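
For your ~915 GB of data that works out to somewhere between 915 / 50 ≈ 19 shards at the upper end and 915 / 30 ≈ 31 shards at the lower end.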

Alright, understood. I will reduce the shard count to 35 (plan sketched below). One small question though: do I need replicas?
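
Since number_of_shards is fixed at index creation, my plan is to create a new index and reindex into it, roughly like this (index names are placeholders):

PUT /my-index-v2
{
  "settings": {
    "number_of_shards": 35,
    "number_of_replicas": 0
  }
}

POST /_reindex
{
  "source": { "index": "my-index" },
  "dest": { "index": "my-index-v2" }
}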

You only have a single node cluster, so it's not really relevant if you have them or not :slight_smile:


Reducing the shard count actually resolved my issue. I am no longer getting rejected writes, and I have indexed ~500 thousand documents so far. I will update this thread after a day or two of testing. Thank you.

It still happens, but way less often. Where can I see the logs so I can find out what's happening?

Check Monitoring to see what is happening. Are you hitting disk or CPU limits?
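
For example, GET _nodes/stats/os,fs,thread_pool will show CPU load, disk usage and the per-pool rejection counters.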

I barely saturate the CPU or the storage (RAID 0 NVMe). I can't set up Heartbeat; I get this error:
[invalid_index_name_exception] Invalid index name [_security], must not start with '_', '-', or '+', with { index_uuid="_na_" & index="_security" }: Check the Elasticsearch Monitoring cluster network connection or the load level of the nodes.

Are you indexing new documents only or are you also performing updates?

Before I bulk index, I set the refresh interval to -1, then bulk index with the parameters mentioned above, and after it's done I set the interval back to 1s. That's it. Indexing MAY happen in parallel. The equivalent settings calls are below.
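
PUT /my-index/_settings
{ "index": { "refresh_interval": "-1" } }

# ... bulk index here ...

PUT /my-index/_settings
{ "index": { "refresh_interval": "1s" } }

(my-index is a placeholder for the real index name.)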

That does not answer my question.

What do you mean by updates? If you mean updating existing documents, no.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.