ES snapshots

Hello there!
I have a cluster with ES 1.3.8, Kibana 3, and Logstash 1.5.0 on board.
I also have remote storage for snapshots; it is mounted via sshfs on the whole cluster.
But when I start a snapshot, my cluster stops working: logs are no longer collected.
Only after the snapshot finishes and I restart the services does everything go back to normal.
So while a snapshot is being taken (~2 hours) I have no logs for that time interval, and that is sad :frowning:
Any advice is welcome. Thanks!
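
For reference, the repository on that sshfs mount is a shared-filesystem ("fs") repository, registered roughly like this (the mount path below is just a placeholder, not the real one):

curl -XPUT 'localhost:9200/_snapshot/4tbtest' -d '{
  "type": "fs",
  "settings": {
    "location": "/mnt/es-snapshots"
  }
}'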

Are any of the APIs responding: _cat/nodes, hot_threads, etc.?
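
For example (assuming the default host and port):

curl 'localhost:9200/_cat/nodes?v'
curl 'localhost:9200/_nodes/hot_threads'
curl 'localhost:9200/_cluster/health?pretty'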

When I take the next snapshot (soon), I will check. Thanks!

I started a snapshot and everything broke.
But _cat says everything is OK:
curl localhost:9200/_cat/nodes?h=n,ip,port,v,d,hp,m
data2 192.168.98.8 9300 1.3.2 398.6gb 71 -
data1 192.168.98.7 9300 1.3.2 385gb 42 -
logstash 127.0.0.1 9300 1.3.8 17.7gb 4 *
logstash-localhost.localdomain-2447-7946 127.0.0.1 9301 1.5.1 -1b 33 -

What state does your cluster go into?
Are you monitoring load on the nodes?

I monitor my cluster with the kopf plugin.
When the snapshot begins, nothing changes: everything is green, and there is no load on my nodes.

What if you look at the snapshot status? Snapshotting doesn't necessarily place a heavy load on the nodes, and it definitely shouldn't affect the cluster's status (if that's what you were referring to with "green").

curl -XGET "localhost:9200/_snapshot/4tbtest/15.03.20-22/_status"

{"snapshots":[{"snapshot":"15.03.20-22","repository":"4tbtest","state":"STARTED","shards_stats":{"initializing":4,"started":8,"finalizing":0,"done":0,"failed":0,"total":12},"stats":{"number_of_files":2326,"processed_files":185,"total_size_in_bytes":77395626630,"processed_size_in_bytes":394038158,"start_time_in_millis":1432734868342,"time_in_millis":0},"indices":{"logstash-2015.03.20":{"shards_stats":{"initializing":1,"started":3,"finalizing":0,"done":0,"failed":0,"total":4},"stats":{"number_of_files":881,"processed_files":76,"total_size_in_bytes":30777452788,"processed_size_in_bytes":92625050,"start_time_in_millis":1432734868088,"time_in_millis":257},"shards":{"0":{"stage":"INIT","stats":{"number_of_files":0,"processed_files":0,"total_size_in_bytes":0,"processed_size_in_bytes":0,"start_time_in_millis":0,"time_in_millis":0},"node":"v7d4Ya0jTcWzFObkh9UDJA"},"1":{"stage":"STARTED","stats":{"number_of_files":211,"processed_files":76,"total_size_in_bytes":10237498758,"processed_size_in_bytes":92625050,"start_time_in_millis":1432734868088,"time_in_millis":0},"node":"XNs4qFu2SNK126PyfAOZdg"},"2":{"stage":"STARTED","stats":{"number_of_files":329,"processed_files":0,"total_size_in_bytes":10268019904,"processed_size_in_bytes":0,"start_time_in_millis":1432734868345,"time_in_millis":0},"node":"v7d4Ya0jTcWzFObkh9UDJA"},"3":{"stage":"STARTED","stats":{"number_of_files":341,"processed_files":0,"total_size_in_bytes":10271934126,"processed_size_in_bytes":0,"start_time_in_millis":1432734868091,"time_in_millis":0},"node":"XNs4qFu2SNK126PyfAOZdg"}}},"logstash-2015.03.22":{"shards_stats":{"initializing":1,"started":3,"finalizing":0,"done":0,"failed":0,"total":4},"stats":{"number_of_files":775,"processed_files":109,"total_size_in_bytes":26597219323,"processed_size_in_bytes":301413108,"start_time_in_millis":1432734868342,"time_in_millis":0},"shards":{"0":{"stage":"STARTED","stats":{"number_of_files":328,"processed_files":0,"total_size_in_bytes":8877532129,"processed_size_in_bytes":0,"start_time_in_millis":1432734868094,"time_in_millis":0},"node":"XNs4qFu2SNK126PyfAOZdg"},"1":{"stage":"STARTED","stats":{"number_of_files":230,"processed_files":0,"total_size_in_bytes":8861607395,"processed_size_in_bytes":0,"start_time_in_millis":1432734868347,"time_in_millis":0},"node":"v7d4Ya0jTcWzFObkh9UDJA"},"2":{"stage":"INIT","stats":{"number_of_files":0,"processed_files":0,"total_size_in_bytes":0,"processed_size_in_bytes":0,"start_time_in_millis":0,"time_in_millis":0},"node":"XNs4qFu2SNK126PyfAOZdg"},"3":{"stage":"STARTED","stats":{"number_of_files":217,"processed_files":109,"total_size_in_bytes":8858079799,"processed_size_in_bytes":301413108,"start_time_in_millis":1432734868342,"time_in_millis":0},"node":"v7d4Ya0jTcWzFObkh9UDJA"}}},"logstash-2015.03.21":{"shards_stats":{"initializing":2,"started":2,"finalizing":0,"done":0,"failed":0,"total":4},"stats":{"number_of_files":670,"processed_files":0,"total_size_in_bytes":20020954519,"processed_size_in_bytes":0,"start_time_in_millis":1432734868098,"time_in_millis":246},"shards":{"0":{"stage":"INIT","stats":{"number_of_files":0,"processed_files":0,"total_size_in_bytes":0,"processed_size_in_bytes":0,"start_time_in_millis":0,"time_in_millis":0},"node":"XNs4qFu2SNK126PyfAOZdg"},"1":{"stage":"INIT","stats":{"number_of_files":0,"processed_files":0,"total_size_in_bytes":0,"processed_size_in_bytes":0,"start_time_in_millis":0,"time_in_millis":0},"node":"v7d4Ya0jTcWzFObkh9UDJA"},"2":{"stage":"STARTED","stats":{"number_of_files":332,"processed_files":0,"total_size_in_bytes":10000314469,"proces
sed_size_in_bytes":0,"start_time_in_millis":1432734868098,"time_in_millis":0},"node":"XNs4qFu2SNK126PyfAOZdg"},"3":{"stage":"STARTED","stats":{"number_of_files":338,"processed_files":0,"total_size_in_bytes":10020640050,"processed_size_in_bytes":0,"start_time_in_millis":1432734868344,"time_in_millis":0},"node":"v7d4Ya0jTcWzFObkh9UDJA"}}}}}]}

and now my cluster looks like: http://prntscr.com/79zoa1

That snapshot status looks perfectly normal.

And you're saying that the drop in messages shown in your screenshot correlates with the start of the snapshot activity...?

Yep, you're right. I can't associate it with anything else.
Maybe it's something in my config?

cluster.name: elasticsearch2
node.name: "logstash"

script.disable_dynamic: true

# This forces the JVM to allocate all of ES_MIN_MEM immediately.
bootstrap.mlockall: true

node.master: true
node.data: false

index.number_of_shards: 4
index.number_of_replicas: 0
index.refresh_interval: 30s

# Search pool
threadpool.search.type: fixed
threadpool.search.size: 20
threadpool.search.queue_size: 100

# Bulk pool
threadpool.bulk.type: fixed
threadpool.bulk.size: 60
threadpool.bulk.queue_size: 300

# Index pool
threadpool.index.type: fixed
threadpool.index.size: 20
threadpool.index.queue_size: 100

# Indices settings
indices.memory.index_buffer_size: 30%
indices.memory.min_shard_index_buffer_size: 12mb
indices.memory.min_index_buffer_size: 96mb

# Cache Sizes
indices.fielddata.cache.size: 15%
indices.fielddata.cache.expire: 6h
indices.cache.filter.size: 15%
indices.cache.filter.expire: 6h

# Indexing Settings for Writes
index.refresh_interval: 30s
index.translog.flush_threshold_ops: 50000

index.search.slowlog.threshold.query.warn: 10s
index.search.slowlog.threshold.query.info: 5s
index.search.slowlog.threshold.query.debug: 2s
index.search.slowlog.threshold.query.trace: 500ms

index.search.slowlog.threshold.fetch.warn: 1s
index.search.slowlog.threshold.fetch.info: 800ms
index.search.slowlog.threshold.fetch.debug: 500ms
index.search.slowlog.threshold.fetch.trace: 200ms

index.indexing.slowlog.threshold.index.warn: 10s
index.indexing.slowlog.threshold.index.info: 5s
index.indexing.slowlog.threshold.index.debug: 2s
index.indexing.slowlog.threshold.index.trace: 500ms

http.cors.allow-origin: "*"
http.cors.enabled: true

I also want to describe the whole process.
I'm snapshotting my old indices, which are closed. So I open the 3-4 oldest indices and then start snapshotting them.
When the snapshot finishes, I delete these indices and then restart Elasticsearch and Logstash.
I also noticed that if I don't delete these indices, the services may not start properly. I mean, they start, but messages are not collected. :frowning:
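
Roughly, the commands look like this (the repository, snapshot, and index names are the same as above; wait_for_completion is optional and just makes the call block until the snapshot finishes):

curl -XPOST 'localhost:9200/logstash-2015.03.20,logstash-2015.03.21,logstash-2015.03.22/_open'

curl -XPUT 'localhost:9200/_snapshot/4tbtest/15.03.20-22?wait_for_completion=true' -d '{
  "indices": "logstash-2015.03.20,logstash-2015.03.21,logstash-2015.03.22"
}'

curl -XDELETE 'localhost:9200/logstash-2015.03.20,logstash-2015.03.21,logstash-2015.03.22'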