I have a single server running ES 9.2.5.
It has 40 cores, 256 GB RAM and 12 TB of SSD.
There are 2 aliases; ILM rolls over the indices at 200 GB (10 shards per index).
Everything works fine in terms of indexing and search speed, but I cannot reindex the larger indices (a serious mapping change is required).
I stop all other activity on ES while reindexing, but the allowed downtime is 2-3 days at most, so parallel reindexing is crucial.
The “smaller” alias (document sizes between 1 and 200 KB, 1 TB in total) and its underlying indices were successfully reindexed with 6 parallel jobs.
ES slicing was not really useful; manually starting 6 scripts saturated the hardware much better.
But the “larger” alias (document sizes of 1-400 MB, and yes, some docs really are that large; 10 TB in total) fails at a concurrency of 2, 3 or 4 jobs.
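For scale, here is the average throughput the downtime window implies. This is a rough estimate assuming decimal units (10 TB = 10^13 bytes) and that the whole 10 TB has to pass through _reindex within the window:

```latex
\frac{10\ \text{TB}}{3\ \text{days}}
  = \frac{10^{13}\ \text{B}}{259\,200\ \text{s}} \approx 38.6\ \text{MB/s},
\qquad
\frac{10\ \text{TB}}{2\ \text{days}}
  = \frac{10^{13}\ \text{B}}{172\,800\ \text{s}} \approx 57.9\ \text{MB/s}
```

This is the rate all parallel jobs together would have to sustain on average, before accounting for refreshes, merges or restarted jobs.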
I split the reindexing of every index into three jobs by document size:
1. under 500 KB: batch size 200
2. 500-5000 KB: batch size 8
3. 5000 KB and above: batch size 1
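The three buckets differ only in batch size and range bounds, so one request body generator could replace the three near-identical heredocs. A minimal sketch; reindex_body is a hypothetical helper, not part of the original script, and it assumes the documents carry the numeric "size" field that the script's range queries use:

```shell
#!/bin/bash
# Hypothetical helper (not from the original script): print the _reindex
# request body for one size bucket.
# Args: source index, destination index, batch size,
#       lower bound in bytes (inclusive, "" = none),
#       upper bound in bytes (exclusive, "" = none).
# Assumes documents carry a numeric "size" field, as in the script.
reindex_body() {
  local src="$1" dst="$2" batch="$3" gte="$4" lt="$5"
  local bounds=""
  if [ -n "$gte" ]; then
    bounds="\"gte\": ${gte}"
  fi
  if [ -n "$lt" ]; then
    # Separate the two bounds with a comma only when both are present
    if [ -n "$bounds" ]; then
      bounds="${bounds}, "
    fi
    bounds="${bounds}\"lt\": ${lt}"
  fi
  printf '{"source": {"index": "%s", "size": %s, "query": {"range": {"size": {%s}}}}, "dest": {"index": "%s"}}\n' \
    "$src" "$batch" "$bounds" "$dst"
}
```

For example, reindex_body "$SRC_INDEX" "$DST_INDEX" 8 "$KB_500" "$MB_5" produces the job 2 body, which can be piped into the same curl command with -d @-.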
Technically, none of these reindex jobs should consume more than 1-3 GB of RAM.
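That expectation can be made explicit: the raw payload B_j of one batch in job j is at most the batch size b_j times the largest document size s_j^max in that bucket. This assumes the "size" field matches the document's source size and takes 400 MB as the largest document. It counts every byte once; how many copies of a batch Elasticsearch holds in heap at the same time is not stated here, so the real footprint is some unknown multiple of these figures:

```latex
\begin{aligned}
B_j &\le b_j \cdot s_j^{\max} \\
B_1 &\le 200 \times 512\,000\ \text{B} = 102.4\ \text{MB}
  \quad (204.8\ \text{MB with the script's size of } 400) \\
B_2 &\le 8 \times 5\,242\,880\ \text{B} \approx 41.9\ \text{MB} \\
B_3 &\le 1 \times 400\ \text{MB} = 400\ \text{MB} \\
B_1 + B_2 + B_3 &\le 544.3\ \text{MB} \quad (646.7\ \text{MB})
\end{aligned}
```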
But the jobs fail fairly quickly with circuit breaker errors.
When I run 3 reindex jobs (one index at a time, all 3 categories), monitoring breaker.parent shows a very high estimated_size.
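To correlate the breaker with the job mix over time, the parent breaker can be polled from the nodes stats API while the jobs run. A minimal sketch, assuming jq is installed; ES_URL, ES_AUTH, breaker_url and watch_parent_breaker are hypothetical names, with defaults taken from the host and placeholder credentials in the script:

```shell
#!/bin/bash
# Sketch: poll the parent circuit breaker on every node while jobs run.
# Assumptions: jq is installed; ES_URL and ES_AUTH are hypothetical variables
# defaulting to the host and placeholder credentials used in the script.
ES_URL="${ES_URL:-localhost:9200}"
ES_AUTH="${ES_AUTH:-adm:HAHA}"

# Nodes stats URL, trimmed to the parent breaker with filter_path
breaker_url() {
  printf '%s/_nodes/stats/breaker?filter_path=nodes.*.breakers.parent\n' "$1"
}

# Print a timestamp, then estimated size, limit and trip count per node,
# every $1 seconds (default 10) until interrupted
watch_parent_breaker() {
  local interval="${1:-10}"
  while true; do
    date '+%F %T'
    curl --noproxy '*' -s -u "$ES_AUTH" "$(breaker_url "$ES_URL")" |
      jq -r '.nodes[].breakers.parent
        | "estimated=\(.estimated_size) limit=\(.limit_size) tripped=\(.tripped)"'
    sleep "$interval"
  done
}
```

For example, watch_parent_breaker 5 prints one line per node every 5 seconds; tripped is a cumulative counter, so a jump in it marks the moment a job was rejected.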
I tried setting the ES heap to 31 GB, 165 GB and even 240 GB.
limit_size grows accordingly, but I still couldn't reliably run even 4x3 reindex jobs.
Any hints?
This option didn’t visibly help:
indexing_pressure.memory.limit: 20%
My next option is moving the data to a cluster, but I don't see any reliable evidence that this will solve the problem.
My script (run once per index, with the index number as its only argument):
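#!/bin/bash
if [ -z "$1" ]; then
  echo "Usage: $0 <index_suffix_number>" >&2
  exit 1
fi
# 2-digit zero padding; 10# forces base 10 so an argument like 08 is not read as octal
SUFFIX=$(printf "%02d" "$((10#$1))")
SRC_INDEX="newattach-0000${SUFFIX}"
DST_INDEX="attach-0000${SUFFIX}"
# Thresholds in bytes, compared against the "size" field in the range queries below
KB_500=512000 # 500 KB
MB_5=5242880 # 5 MB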
############################################
# Job 1: size < 500 KB
############################################
curl --noproxy '*' -u adm:HAHA -X POST \
"localhost:9200/_reindex?refresh&scroll=8h&wait_for_completion=false&pretty" \
-H "Content-Type: application/json" -d @- <<EOF
{
  "source": {
    "index": "${SRC_INDEX}",
    "size": 400,
    "query": {
      "range": {
        "size": {
          "lt": ${KB_500}
        }
      }
    }
  },
  "dest": {
    "index": "${DST_INDEX}"
  }
}
EOF
############################################
# Job 2: 500 KB <= size < 5 MB
############################################
curl --noproxy '*' -u adm:HAHA -X POST \
"localhost:9200/_reindex?refresh&scroll=8h&wait_for_completion=false&pretty" \
-H "Content-Type: application/json" -d @- <<EOF
{
  "source": {
    "index": "${SRC_INDEX}",
    "size": 8,
    "query": {
      "range": {
        "size": {
          "gte": ${KB_500},
          "lt": ${MB_5}
        }
      }
    }
  },
  "dest": {
    "index": "${DST_INDEX}"
  }
}
EOF
############################################
# Job 3: size >= 5 MB
############################################
curl --noproxy '*' -u adm:HAHA -X POST \
"localhost:9200/_reindex?refresh&scroll=8h&wait_for_completion=false&pretty" \
-H "Content-Type: application/json" -d @- <<EOF
{
  "source": {
    "index": "${SRC_INDEX}",
    "size": 1,
    "query": {
      "range": {
        "size": {
          "gte": ${MB_5}
        }
      }
    }
  },
  "dest": {
    "index": "${DST_INDEX}"
  }
}
EOF
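Because every curl call above uses wait_for_completion=false, the script returns as soon as the three tasks are created, so how many scripts were started is the only thing bounding how many reindex tasks run at once. One way to cap concurrency explicitly is to make each job block until its task finishes and let an outer launcher such as xargs -P limit parallelism. A sketch under those assumptions; all function and variable names are hypothetical, and it polls GET _tasks/<task_id> instead of holding one HTTP request open for hours:

```shell
#!/bin/bash
# Sketch: start one reindex task and block until it finishes, so an outer
# launcher (e.g. xargs -P) can cap how many tasks run at the same time.
# Assumptions: ES_URL/ES_AUTH are hypothetical variables defaulting to the
# script's host and placeholder credentials; the JSON body arrives on stdin.
ES_URL="${ES_URL:-localhost:9200}"
ES_AUTH="${ES_AUTH:-adm:HAHA}"

# Pull the task id out of a wait_for_completion=false response
extract_task_id() {
  sed -n 's/.*"task"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Succeed if a GET _tasks/<id> response reports "completed": true
is_completed() {
  grep -q '"completed"[[:space:]]*:[[:space:]]*true'
}

# Succeed if a finished task's result has an "error" object or a
# non-empty "failures" array (failed tasks also report "completed": true)
task_failed() {
  grep -Eq '"error"|"failures"[[:space:]]*:[[:space:]]*\[[[:space:]]*\{'
}

reindex_and_wait() {
  local task
  task=$(curl --noproxy '*' -s -u "$ES_AUTH" -X POST \
    "$ES_URL/_reindex?refresh&scroll=8h&wait_for_completion=false" \
    -H "Content-Type: application/json" -d @- | extract_task_id)
  if [ -z "$task" ]; then
    echo "could not start reindex" >&2
    return 1
  fi
  # Poll once a minute until the task manager reports completion
  until curl --noproxy '*' -s -u "$ES_AUTH" "$ES_URL/_tasks/$task" | is_completed; do
    sleep 60
  done
  if curl --noproxy '*' -s -u "$ES_AUTH" "$ES_URL/_tasks/$task" | task_failed; then
    echo "task $task finished with errors" >&2
    return 1
  fi
  echo "task $task completed"
}
```

Exported with export -f (together with ES_URL and ES_AUTH), reindex_and_wait can be run from xargs -P N with one request body per invocation, so that at most N reindex tasks are in flight no matter how many jobs are queued.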