Bulk indexing rejected threads

Hello, I am pretty new to Elasticsearch. I built my project and everything works really well except bulk indexing: I am getting a lot of rejected write threads. I looked at the write thread pool and got this:
```
node_name write 0 0 121044
```
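The columns are node_name, name, active, queue, and rejected, so that is 121,044 rejected write requests. Here is roughly how I pull those numbers (a minimal sketch, assuming NEST 7.x; the URL is a placeholder):

```csharp
using System;
using System.Linq;
using Nest;

var client = new ElasticClient(new ConnectionSettings(new Uri("http://localhost:9200")));

// Fetch the cat thread pool stats and print the write pool counters per node.
var pools = client.Cat.ThreadPool();
foreach (var r in pools.Records.Where(rec => rec.Name == "write"))
    Console.WriteLine($"{r.NodeName} {r.Name} active={r.Active} queue={r.Queue} rejected={r.Rejected}");
```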

My current configuration of index is 100 primary shards and 0 replicas. I am running this on single node(32gb heap size, database is stored on NVMe storage,8core/16thread AMD cpu).
My index mapping:
```json
{
  "mapping": {
    "_doc": {
      "properties": {
        "hash": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "uri": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "path": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}
```
hash - a 600-700 character string (the field I search using query_string)
uri/path - 100 characters max.
Each document is around 1-1.2 KB, and I will have ~800 million documents, ~915 GB of data in total.
I am using NEST to do the bulk index with these settings (see the sketch below):
BackOffRetries: 10
BackOffTime: 5 seconds
MaxDegreeOfParallelism: 5
Size: 5000
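For reference, wired into NEST's BulkAll helper those settings look roughly like this (a sketch; client, documents, the index name, and the timeout are placeholders rather than my actual code):

```csharp
using System;
using Nest;

// documents is an IEnumerable of whatever POCO maps to the index.
var observable = client.BulkAll(documents, b => b
    .Index("my-index")                     // placeholder index name
    .BackOffRetries(10)                    // retry a failed bulk request up to 10 times
    .BackOffTime(TimeSpan.FromSeconds(5))  // wait 5 s between retries
    .MaxDegreeOfParallelism(5)             // at most 5 bulk requests in flight at once
    .Size(5000));                          // 5000 documents per bulk request

// Block until the whole load completes (or the placeholder timeout elapses).
observable.Wait(TimeSpan.FromHours(2),
    r => Console.WriteLine($"indexed page {r.Page}"));
```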

I couldn't resolve this on my own, so I would appreciate it if someone could point me in the right direction :smiley:

That's way too many shards. Why did you pick that number?

I read an article that said I needed 1 shard per 5 million documents.

EDIT: How many shards should I use for my use case?

Not sure what article that is, but I'd suggest it's not accurate; you're wasting resources with such a large shard count.

Aim for 30-50 GB per shard instead. At ~915 GB of data, that works out to roughly 18-30 primary shards.

Alright, understood, I will reduce the shard count to 35. One small question though: do I need replicas?

You only have a single-node cluster, so it's not really relevant whether you have them or not :slight_smile:


Reducing the shard count actually resolved my issue. I am no longer getting rejected writes, and I've indexed ~500 thousand documents so far. I will update this thread after a day or two of testing. Thank you.

It still happens, but far less often. Where can I see logs so I can find out what's happening?

Check Monitoring to see what is happening. Are you hitting disk or CPU limits?

I barely saturate the CPU or the storage (RAID 0 NVMe). I can't set up Heartbeat; I get this error:
[invalid_index_name_exception] Invalid index name [_security], must not start with '_', '-', or '+', with { index_uuid="_na_" & index="_security" }: Check the Elasticsearch Monitoring cluster network connection or the load level of the nodes.

Are you indexing new documents only or are you also performing updates?

Before I bulk index, I set the refresh interval to -1, then bulk index with the parameters mentioned above, and after it's done I set the interval back to 1s. That's it. Indexing MAY happen in parallel.
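In code the refresh toggling looks roughly like this (a sketch assuming NEST 7.x, where the call is client.Indices.UpdateSettings; on 6.x it is client.UpdateIndexSettings. "my-index" is a placeholder):

```csharp
using Nest;

// Disable refresh for the duration of the bulk load.
client.Indices.UpdateSettings("my-index", u => u
    .IndexSettings(s => s.RefreshInterval(Time.MinusOne)));

// ... run the BulkAll job here ...

// Restore the 1 s refresh interval once indexing is done.
client.Indices.UpdateSettings("my-index", u => u
    .IndexSettings(s => s.RefreshInterval("1s")));
```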

That does not answer my question.

What do you mean by updates? If you mean updating existing documents, no.