Hello, I am pretty new to Elasticsearch. I built my project and everything works really well except bulk indexing. The problem is that I am getting a lot of rejected requests on the write thread pool. I looked up the thread pool stats and got this:
node_name write 0 0 121044
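For anyone reading the line above: assuming it came from `GET _cat/thread_pool/write` with the default columns (node_name, name, active, queue, rejected), the last number is the count of rejected tasks, which as far as I know accumulates from node startup. A small Python sketch that labels the fields; `parse_thread_pool_row` is a made-up helper name, not part of any client library:

```python
# Assumed default _cat/thread_pool columns; adjust if you passed h=... yourself.
COLUMNS = ("node_name", "name", "active", "queue", "rejected")

def parse_thread_pool_row(row: str) -> dict:
    """Label one whitespace-separated row of _cat/thread_pool output."""
    values = row.split()
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(values)}")
    parsed = dict(zip(COLUMNS, values))
    # The three counters are integers; the first two columns are names.
    for key in ("active", "queue", "rejected"):
        parsed[key] = int(parsed[key])
    return parsed

stats = parse_thread_pool_row("node_name write 0 0 121044")
print(stats)
```

Read this way, nothing was active or queued at the moment of the call, but 121044 write tasks had been rejected up to that point.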
My current index configuration is 100 primary shards and 0 replicas. I am running this on a single node (32 GB heap, data stored on NVMe storage, 8-core/16-thread AMD CPU).
My index mapping:
{
  "mapping": {
    "_doc": {
      "properties": {
        "hash": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "uri": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "path": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}
hash: a 600-700 character string (this is the field I search with query_string).
uri/path: 100 characters max.
Each document is around 1-1.2 KB. I will have 800 million documents, about 915 GB of data.
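As a sanity check on those numbers, here is a quick Python calculation. It assumes "KB" means 1024 bytes and "GB" means 2^30 bytes, and it only covers raw document size, not the on-disk index size, which I have not measured:

```python
DOC_COUNT = 800_000_000
DOC_SIZE_KIB = (1.0, 1.2)   # per-document size range from the post
PRIMARY_SHARDS = 100

def total_gib(doc_count: int, doc_size_kib: float) -> float:
    """Raw data volume in GiB for doc_count documents of doc_size_kib each."""
    return doc_count * doc_size_kib * 1024 / 2**30

low, high = (total_gib(DOC_COUNT, size) for size in DOC_SIZE_KIB)
per_shard_high = high / PRIMARY_SHARDS

print(f"total: {low:.0f} to {high:.0f} GiB")
print(f"per shard at {PRIMARY_SHARDS} shards: up to {per_shard_high:.1f} GiB")
```

The upper end lines up with the ~915 GB figure, and at 100 primary shards each shard would hold under 10 GiB of raw data.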
I am using NEST to do the bulk indexing with these settings:
BackOffRetries 10
BackOffTime 5 seconds
MaxDegreeOfParallelism 5
Size 5000
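For readers who do not use NEST, this is roughly what those four settings mean as I understand them, written as a Python sketch. It is not NEST's implementation: `send_bulk` is a hypothetical callable standing in for the real bulk call, and it is assumed to return the documents that were rejected so that only those are retried.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

SIZE = 5000              # documents per bulk request
MAX_PARALLELISM = 5      # bulk requests in flight at once
BACKOFF_RETRIES = 10     # retries per batch
BACKOFF_SECONDS = 5.0    # pause between retries

def batches(docs, size):
    """Yield consecutive lists of at most `size` documents."""
    it = iter(docs)
    while chunk := list(islice(it, size)):
        yield chunk

def send_with_retry(send_bulk, batch, retries=BACKOFF_RETRIES,
                    backoff=BACKOFF_SECONDS):
    """Send a batch, retrying only the rejected documents after a pause."""
    pending = batch
    for attempt in range(retries + 1):
        pending = send_bulk(pending)      # hypothetical: returns rejected docs
        if not pending:
            return []
        if attempt < retries:
            time.sleep(backoff)
    return pending                        # still rejected after all retries

def bulk_all(send_bulk, docs, backoff=BACKOFF_SECONDS):
    """Run batches with bounded parallelism; return docs that never got in."""
    with ThreadPoolExecutor(max_workers=MAX_PARALLELISM) as pool:
        # Note: pool.map submits every batch up front, so this sketch holds
        # all batches in memory. Fine for illustration, not for 800M docs.
        results = pool.map(
            lambda b: send_with_retry(send_bulk, b, backoff=backoff),
            batches(docs, SIZE),
        )
        return [doc for failed in results for doc in failed]
```

With these values there can be up to 5 bulk requests of 5000 documents each in flight at any moment, which is what matters for the write queue on the node.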
With my level of experience I couldn't resolve it. I would appreciate it if someone could point me in the right direction.
Actually, reducing the number of shards resolved my issue. I am no longer getting rejected writes and I've indexed ~500 thousand documents so far. I will update this thread in a day or two, after more testing. Thank you.
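A rough way to see why fewer shards helps, under assumptions that are not from the post: each bulk request is split into one shard-level task per shard it touches, the write thread pool has one thread per allocated processor (16 on this machine), and its queue holds 200 tasks (the default in the releases I am familiar with; newer ones use a much larger queue, so check your version). With documents spread evenly, a 5000-document bulk touches nearly all shards. The shard counts 20 and 5 below are just illustrative values.

```python
def in_flight_shard_tasks(parallel_bulks: int, shards_touched: int) -> int:
    """Worst case: every in-flight bulk has a task pending on every shard."""
    return parallel_bulks * shards_touched

def write_pool_capacity(threads: int, queue_size: int) -> int:
    """Tasks the write pool can hold: running plus queued."""
    return threads + queue_size

PARALLEL_BULKS = 5    # MaxDegreeOfParallelism from the post
WRITE_THREADS = 16    # assumption: one per hardware thread
QUEUE_SIZE = 200      # assumption: default queue size, version dependent

capacity = write_pool_capacity(WRITE_THREADS, QUEUE_SIZE)
for shards in (100, 20, 5):
    tasks = in_flight_shard_tasks(PARALLEL_BULKS, shards)
    status = "may be rejected" if tasks > capacity else "fits"
    print(f"{shards:>3} shards: {tasks:>3} tasks vs capacity {capacity}: {status}")
```

Under these assumptions, 100 shards can produce more pending tasks than the pool can hold, while a much smaller shard count stays well inside it, even though CPU and disk are far from saturated.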
I barely saturate the CPU or the storage (RAID 0 NVMe). I can't set up Heartbeat; I get this error:
[invalid_index_name_exception] Invalid index name [_security], must not start with '_', '-', or '+', with { index_uuid="_na_" & index="_security" }: Check the Elasticsearch Monitoring cluster network connection or the load level of the nodes.
Before I bulk index I set the refresh interval to -1, then I bulk index with the parameters mentioned above, and after it's done I set the interval back to 1s. That's it. Indexing may happen in parallel.
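In request terms, that sequence looks something like the sketch below. It only builds and prints the requests, it does not talk to a cluster, and `my-index` is a placeholder index name, not the real one:

```python
import json

def refresh_interval_settings(interval: str) -> str:
    """JSON body for PUT /<index>/_settings that sets index.refresh_interval."""
    return json.dumps({"index": {"refresh_interval": interval}})

INDEX = "my-index"  # hypothetical index name

steps = [
    # 1. Disable periodic refresh before the bulk load.
    ("PUT", f"/{INDEX}/_settings", refresh_interval_settings("-1")),
    # 2. Run the bulk indexing (body omitted here).
    ("POST", "/_bulk", "<bulk requests>"),
    # 3. Restore the 1s refresh interval afterwards.
    ("PUT", f"/{INDEX}/_settings", refresh_interval_settings("1s")),
]

for method, path, body in steps:
    print(method, path, body)
```

If several loaders run in parallel, the last step should only happen after all of them finish; otherwise one loader restores the 1s interval while the others are still indexing.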