Indexing documents into Elasticsearch monthly?

i5513 · August 24, 2016, 9:35am

Hello,

We have a 3-node ELK cluster with 83,000,000 documents and 53 GB of data.

Our cluster is populated by Logstash, which creates one index per day.

We have used this scheme since January 2014.

Elasticsearch has now become slow. It has 484 indices, 1 replica, and 4832 shards.
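A quick back-of-the-envelope look at these figures, as a sketch: it assumes the 53 GB total includes replica copies and that shards are spread evenly over the 3 nodes, neither of which the post states. About 10 shards per index matches the default of 5 primary shards plus 1 replica that Elasticsearch used before 7.0, which works out to roughly 1,600 shards per node holding only about 11 MB each (the function name `shard_stats` is hypothetical):

```shell
#!/bin/bash
# Rough shard statistics from the cluster totals quoted above.
# Assumptions: SIZE_GB includes replica copies; shards are evenly spread.
shard_stats() {
	local indices=$1 shards=$2 size_gb=$3 nodes=$4
	awk -v i="$indices" -v s="$shards" -v g="$size_gb" -v n="$nodes" 'BEGIN {
		printf "shards per index: %.1f\n", s / i
		printf "shards per node: %.0f\n", s / n
		printf "average shard size: %.1f MB\n", g * 1024 / s
	}'
}

shard_stats 484 4832 53 3
# shards per index: 10.0
# shards per node: 1611
# average shard size: 11.2 MB
```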

Calling cluster.stats() from the elasticsearch Python client takes more than 10 seconds, and Kibana occasionally shows timeouts of 3 seconds and 30 seconds.

What do you recommend to fix these issues?

I'm thinking of switching from daily to monthly indices by changing the Logstash configuration.

I imagine I should also migrate all the old indices to the new scheme by reindexing them, but I'm not sure.
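On the Logstash side, the daily name comes from the `index` option of the elasticsearch output, whose default is `logstash-%{+YYYY.MM.dd}`; using `logstash-%{+YYYY.MM}` instead would produce one index per month. For migrating the existing indices, each daily name maps onto a monthly one. A minimal bash sketch of that mapping, assuming names of the form PREFIX-YYYY.MM.DD (the function name `monthly_index` is hypothetical):

```shell
#!/bin/bash
# Map a daily index name (PREFIX-YYYY.MM.DD) to the monthly index it would
# be merged into (PREFIX-YYYY.MM). Names without a trailing date are echoed unchanged.
monthly_index() {
	local name=$1
	# Keeping the regex in a variable avoids bash quoting pitfalls with =~
	local re='^(.*-[0-9]{4}\.[0-9]{2})\.[0-9]{2}$'
	if [[ $name =~ $re ]]; then
		echo "${BASH_REMATCH[1]}"
	else
		echo "$name"
	fi
}

monthly_index logstash-2016.08.24   # prints logstash-2016.08
```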

Thank you very much
PS: For reference, before writing this post I looked into:

magnusbaeck · August 24, 2016, 11:00am

Elasticsearch has now become slow. It has 484 indices, 1 replica, and 4832 shards.

Whoa. That's way too many. You must reduce the number of shards per index. Until your daily indexes reach a few tens of GB, you shouldn't go beyond one shard per index.
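One way to give every new daily index a single shard is an index template. A minimal sketch, assuming an Elasticsearch 2.x/5.x cluster (those versions use the `template` key; 6.0 and later use `"index_patterns": [...]` and require a `Content-Type` header) and a hypothetical template name `one_shard`. The `order` of 1 is chosen so that it wins over a template installed by Logstash at the default order of 0:

```shell
#!/bin/bash
# Print an index-template body that gives every index matching PATTERN a
# single primary shard plus one replica (pre-6.0 "template" key).
one_shard_template() {
	local pattern=$1
	cat <<EOF
{
  "template": "$pattern",
  "order": 1,
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}
EOF
}

# Usage (commented out; it needs a running cluster):
# curl -XPUT -H 'Content-Type: application/json' \
#   'http://localhost:9200/_template/one_shard' \
#   -d "$(one_shard_template 'logstash-*')"
```

A template only applies to indices created after it is installed; existing indices keep their shard count until they are reindexed.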

i5513 · August 25, 2016, 6:47am

Hello @magnusbaeck,

I have summarized my indices; they store logs from Logstash.

The first index currently collects less than 1 GB per month, so I guess I can reshard it from daily to monthly indices. I suppose I can also switch index creation from daily to monthly.

For the second index we currently keep only 1 month of logs (73,000,000 docs / 41 GB), so I'm going to change it to 1 shard per index.
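Checking this against the rule of thumb above (one shard until an index reaches a few tens of GB), as a sketch: the 30 GB-per-shard ceiling is a hypothetical value chosen only for illustration, not a documented limit, and the 41 GB is assumed to be spread evenly over 30 days (the function name `shards_for` is also hypothetical):

```shell
#!/bin/bash
# Suggest a primary shard count for an index of a given size, using a
# hypothetical per-shard ceiling (illustrative value only).
MAX_SHARD_GB=30

shards_for() {
	local size_gb=$1
	# ceil(size_gb / MAX_SHARD_GB), never less than 1
	awk -v s="$size_gb" -v m="$MAX_SHARD_GB" 'BEGIN {
		n = int(s / m); if (n * m < s) n++; if (n < 1) n = 1; print n
	}'
}

echo "daily index (~41 GB / 30 days): $(shards_for 1.37) shard(s)"
echo "monthly index (41 GB): $(shards_for 41) shard(s)"
```

Under these assumptions a daily index needs only one shard, while a monthly index of this size would already be at the edge of the rule of thumb.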

My plan is:

1. Take a snapshot of the indices.
2. Create one index with a single shard for each month in which we collected data.
3. Fill each monthly index with the data from its daily indices, using the elasticdump utility.
4. Remove the old daily indices.
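Steps 2 and 3 come down to working out which daily indices belong to each monthly index. A sketch of that grouping, assuming the index names arrive one per line and sorted, for example from `curl -s 'http://localhost:9200/_cat/indices/logstash-*?h=index' | sort` (the function name `group_by_month` is hypothetical):

```shell
#!/bin/bash
# Group daily index names (PREFIX-YYYY.MM.DD, one per line on stdin) by the
# monthly index they would be merged into. Prints one line per month:
#   PREFIX-YYYY.MM daily1 daily2 ...
# Names that do not end in a YYYY.MM.DD date are ignored.
group_by_month() {
	awk '
		$1 ~ /-[0-9][0-9][0-9][0-9]\.[0-9][0-9]\.[0-9][0-9]$/ {
			month = substr($1, 1, length($1) - 3)   # drop ".DD"
			if (!(month in days)) order[++n] = month
			days[month] = days[month] " " $1
		}
		END { for (i = 1; i <= n; i++) print order[i] days[order[i]] }
	'
}

printf '%s\n' logstash-2016.07.30 logstash-2016.07.31 logstash-2016.08.01 |
	group_by_month
# logstash-2016.07 logstash-2016.07.30 logstash-2016.07.31
# logstash-2016.08 logstash-2016.08.01
```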

Do you think this is a good plan? Is there a better alternative? Would you recommend skipping monthly indices and staying with daily ones?

Thank you very much

magnusbaeck · August 25, 2016, 7:20am

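That looks like a reasonable plan. I generally prefer daily indexes because a) correcting mapping mistakes is much faster (a fixed mapping takes effect when the next daily index is created, rather than at the start of the next month) and b) you can clean up older indexes at a finer granularity, one day at a time.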
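With daily indices, retention can be enforced a day at a time by selecting every index older than a cutoff date. A sketch, assuming names of the form PREFIX-YYYY.MM.DD (the function name `older_than` is hypothetical; Elasticsearch Curator is the commonly used tool for this job):

```shell
#!/bin/bash
# Print the daily indices (names on stdin, PREFIX-YYYY.MM.DD) whose date is
# strictly before CUTOFF (YYYY.MM.DD). Zero-padded dates sort correctly as
# plain strings, so no date arithmetic is needed.
older_than() {
	local cutoff=$1
	awk -v c="$cutoff" '
		$1 ~ /-[0-9][0-9][0-9][0-9]\.[0-9][0-9]\.[0-9][0-9]$/ {
			if (substr($1, length($1) - 9) < c) print $1
		}
	'
}

# Usage (needs a running cluster and GNU date):
# curl -s 'http://localhost:9200/_cat/indices/logstash-*?h=index' |
#   older_than "$(date -d '90 days ago' +%Y.%m.%d)"
```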

i5513 · August 26, 2016, 12:35pm

Thank you @magnusbaeck,

I scripted it. For the httpd index I prefer to wait, because I have already changed the config so that new indices are created with only one shard (plus a replica).
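To confirm that the config change took effect, the `pri` column of `_cat/indices` can be checked on the newly created indices. A sketch, assuming input from `curl -s 'http://localhost:9200/_cat/indices/httpd-*?h=index,pri,rep'` (the `httpd-*` pattern and the function name `check_one_shard` are hypothetical):

```shell
#!/bin/bash
# Read "index pri rep" lines on stdin and report every index that does not
# have exactly one primary shard. Exit status is 1 if any index was reported.
check_one_shard() {
	awk '$2 != 1 { print $1 " has " $2 " primary shards"; bad = 1 }
	     END { exit bad }'
}
```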

Here is the script (identifiers in Spanish):

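#!/bin/bash
# Merge daily indices (INDICE-YYYY.MM.DD) into monthly ones (INDICE-YYYY.MM).
# Each daily index is deleted once elasticdump has copied it successfully.
ano_actual=$(date +%Y)	# current year
mes_actual=$(date +%m)	# current month; skipped because it is still being written
log_dir=/var/log/elasticsearch/reshard

if [ -z "$1" ]
then
	echo "Usage: $0 INDEX [MONTHS]"
	echo "Example:"
	echo "	$0 logstash"
	exit 1
fi
indice="$1"
mkdir -p "$log_dir"

# h=index lists only the index names, so no column positions are assumed
curl -o /tmp/indices.txt -s -x '' "http://localhost:9200/_cat/indices/${indice}*?h=index"
if [ -n "$2" ]
then
	meses="$2"
else
	# Every YYYY.MM that appears in a daily index name, except the current month
	meses=$(grep -F "$indice-" /tmp/indices.txt |
		sed -n 's,.*-\(....\)\.\(..\)\....*,\1.\2,p' |
		grep -v "$ano_actual\.$mes_actual" | sort -u)
fi

for m in $meses
do
	dias=$(grep -F "$indice-$m." /tmp/indices.txt)
	if [ -n "$dias" ]
	then
		echo "Migrating $m ..."
		# Create the monthly index with 1 primary shard and 1 replica
		# (the Content-Type header is required from Elasticsearch 6.0 on)
		curl -X PUT -x '' -H 'Content-Type: application/json' \
			"http://localhost:9200/$indice-$m" \
			-d '{"settings": {"number_of_shards": 1, "number_of_replicas": 1}}' \
			&> "$log_dir/creacion-$m.log"
	fi
	for d in $dias
	do
		echo "Copying $d ..."
		if elasticdump --input "http://localhost:9200/$d" \
			--output "http://localhost:9200/$indice-$m" \
			--limit 10000 &> "$log_dir/reunificacion-$d.log"
		then
			# Delete the daily index only if elasticdump exited with status 0
			curl -X DELETE -x '' "http://localhost:9200/$d" &> "$log_dir/eliminacion-$d.log"
		fi
	done
done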
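One caveat: the script deletes each daily index as soon as elasticdump exits with status 0, without checking that every document arrived. A more cautious variant, sketched here under that assumption, runs the copy first with the DELETE commented out, refreshes (`curl -s -XPOST 'http://localhost:9200/_refresh'`), and compares document counts per month before deleting anything. Input would come from `curl -s 'http://localhost:9200/_cat/indices/logstash-2016.07*?h=index,docs.count'` (the function name `month_counts_match` is hypothetical):

```shell
#!/bin/bash
# Given "index docs.count" lines on stdin, check that the monthly index MONTH
# (e.g. logstash-2016.07) holds as many documents as all of its daily indices
# (MONTH.DD) together. Exit status 0 means the counts match.
month_counts_match() {
	local month=$1
	awk -v m="$month" '
		$1 == m               { monthly = $2 }
		index($1, m ".") == 1 { daily += $2 }
		END {
			printf "%s: daily total %d, monthly %d\n", m, daily, monthly
			if (daily == monthly && monthly > 0) exit 0
			exit 1
		}
	'
}
```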
