Indexing documents into Elasticsearch monthly?


We have a 3-node ELK cluster with 83,000,000 documents and 53 GB of data.

Our cluster is populated by Logstash, which creates one index per day.

We have used this scheme since January 2014.

Now Elasticsearch is getting slow. It has 484 indices, 1 replica per shard and 4832 shards.

Calling cluster.stats() from the Elasticsearch Python client takes more than 10 seconds.
Kibana shows timeouts from time to time, some at 3 seconds and some at 30 seconds.
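
To put those figures in perspective, here is a rough back-of-the-envelope calculation. It assumes the 53 GB is the total on-disk size across all shards, which the post does not state, so treat the result as an order of magnitude only.

```shell
#!/bin/bash
# Figures quoted in the post.
indices=484
shards=4832
size_mb=$((53 * 1024))   # 53 GB; assumed to be the total across all shards

# Integer division rounded to the nearest whole number.
shards_per_index=$(( (shards + indices / 2) / indices ))
avg_shard_mb=$(( (size_mb + shards / 2) / shards ))

echo "shards per index: ~$shards_per_index"
echo "average shard size: ~$avg_shard_mb MB"
```

About 10 shards per index is what 5 primary shards plus one replica of each would give, and an average shard of roughly 11 MB is very small, so most of the cluster's effort goes into managing shards rather than holding data.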

What do you recommend to fix these issues?

I'm thinking of switching from daily indices to monthly indices by changing the Logstash configuration.

I imagine I should also migrate the old indices to the new scheme by reindexing them, but I'm not sure.

Thank you very much
PS: For reference, before writing this post I researched:

Now Elasticsearch is getting slow. It has 484 indices, 1 replica per shard and 4832 shards.

Woah. That's way too many. You must reduce the number of shards per index. Until your daily indexes reach a few tens of GB you shouldn't go beyond one shard per index.
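
One way to get one shard per index for newly created indices is an index template. The sketch below is an assumption, not something from the thread: the template name `logstash_one_shard` is hypothetical, the `logstash-*` pattern assumes Logstash's default index naming, and the `template` key is the legacy (1.x/2.x) form, which later versions replace with `index_patterns`.

```shell
#!/bin/bash
# Hypothetical template name, chosen for this example only.
template_name="logstash_one_shard"

# Print the template body. "order": 1 lets these settings win over a
# lower-order template that matches the same indices.
template_body() {
	cat <<'EOF'
{
  "template": "logstash-*",
  "order": 1,
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}
EOF
}

# Send the template to a local node. Defined but not called here.
put_template() {
	template_body |
		curl -s -X PUT -H 'Content-Type: application/json' \
			"http://localhost:9200/_template/$template_name" -d @-
}
```

Running `put_template` would register the template. Templates only apply to indices created afterwards, so existing daily indices keep their current shard count until they are reindexed.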

Hello @magnusbaeck,

I have summarized my indices, which store logs from Logstash.

Currently the first index collects less than 1 GB per month, so I guess I can reshard it from daily to monthly indices. I suppose I can also switch index creation from daily to monthly.

In the second index we currently store only 1 month of logs (73,000,000 docs / 41 GB). I'm going to change that index to 1 shard.

My plan for this operation is:

  1. Take a snapshot of the indices
  2. Create one index with one shard for each month in which we collected data
  3. Fill each monthly index with the data from its daily indices, using the elasticdump utility
  4. Remove the old daily indices
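
Steps 2 and 3 depend on mapping each daily index name to its monthly target. A minimal dry-run sketch, assuming Logstash-style names of the form NAME-YYYY.MM.DD (the function names are illustrative only); it prints what would be copied without touching the cluster:

```shell
#!/bin/bash
# Map a daily index name (NAME-YYYY.MM.DD) to its monthly index (NAME-YYYY.MM)
# by stripping the trailing ".DD".
monthly_index() {
	echo "${1%.*}"
}

# Read daily index names from stdin and print the planned copy for each.
plan_migration() {
	local daily
	while read -r daily
	do
		echo "copy $daily -> $(monthly_index "$daily")"
	done
}

printf '%s\n' logstash-2014.01.30 logstash-2014.01.31 logstash-2014.02.01 |
	plan_migration
```

Reviewing such a plan before running the real copy makes it easier to spot a daily index that would land in the wrong month.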

Do you think this is a good plan? Would you suggest a better alternative? Would you recommend skipping monthly indices and sticking with daily ones?

Thank you very much

That looks like a reasonable plan. I generally prefer daily indexes because a) correcting mapping mistakes is much faster and b) you can clean up older indexes with a higher resolution.
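
To illustrate point b), here is a small sketch of day-level cleanup. It assumes daily indices named NAME-YYYY.MM.DD; the function name and the sample dates are made up for the example.

```shell
#!/bin/bash
# Print the daily indices read from stdin that are older than the cutoff.
# Zero-padded YYYY.MM.DD strings sort chronologically, so a plain string
# comparison is enough.
expired_indices() {
	local cutoff=$1 name day
	while read -r name
	do
		day=${name##*-}   # keep only the date after the last "-"
		if [[ "$day" < "$cutoff" ]]
		then
			echo "$name"
		fi
	done
}

printf '%s\n' logstash-2014.01.31 logstash-2014.02.01 logstash-2014.02.02 |
	expired_indices 2014.02.01
```

Each name printed could then be passed to a `curl -X DELETE` call. With monthly indices the smallest unit that can be dropped this way is a whole month.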

Thank you @magnusbaeck,

I scripted it. For the httpd index I prefer to wait, because I changed the config so that new indices are created with only one shard (+ replica).

Here is the script (variable names are in Spanish):

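#!/bin/bash
# Merge daily indices into one index per month, skipping the current month.
ano_actual=$(date +%Y)   # current year
mes_actual=$(date +%m)   # current month

if [ -z "$1" ]
then
	echo "Usage: $0 [index]"
	echo "Examples:
		$0 logstash"
	exit 1
fi
indice=$1

# List the matching indices; -x '' bypasses any configured proxy.
curl -o /tmp/indices.txt -s -x '' 'http://localhost:9200/_cat/indices/'"$indice"'*'
if [ -n "$2" ]
then
	# Assumption: this branch was lost from the post; an optional second
	# argument is taken here as the month(s) to migrate, e.g. 2014.01.
	meses=$2
else
	meses=$(grep "$indice-" /tmp/indices.txt |
		sed -n 's,.*-\(....\)\.\(..\)\....*,\1.\2,p' |
		grep -v "$ano_actual\.$mes_actual" | sort -u)
fi

for m in $meses
do
	# The script takes column 3 of the _cat/indices output as the index name.
	dias=$(grep -F "$indice-$m." /tmp/indices.txt | awk '{print $3}')
	if [ -n "$dias" ]
	then
		echo "Migrating $m ..."
		curl -X PUT -x '' "http://localhost:9200/$indice-$m" &> "/var/log/elasticsearch/reshard/creacion-$m.log"
	fi
	for d in $dias
	do
		echo "Copying $d ..."
		# Delete the daily index only if elasticdump succeeded.
		if elasticdump --input "http://localhost:9200/$d" --output "http://localhost:9200/$indice-$m" --limit 10000 &> "/var/log/elasticsearch/reshard/reunificacion-$d.log"
		then
			curl -X DELETE -x '' "http://localhost:9200/$d" &> "/var/log/elasticsearch/reshard/eliminacion-$d.log"
		fi
	done
done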