Data missing after ES restart


#1

After restarting Elasticsearch I can no longer see the old data in Kibana, only the data indexed after the restart. The saved dashboards, visualizations and queries are also gone. I have already started Elasticsearch with debug-level logging, but there is no error.

curl localhost:9200/_cat/indices?v also shows just the data from after the restart,
but the indices are still there in the data directory:

http://paste.ofcode.org/ynQ9fWbdA9zrLTmcediGGv

If you need any data, logs or config, please let me know.

Debug-level log from the restart:
http://pastebin.com/Fkpynu6k

Please help me!


(Ramy) #2

Did you try restarting Elasticsearch without any plugins? There may be a problem with one of them...
Are you sure that only one cluster exists in your network? Make sure to use a unique cluster name!
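
One rough way to check which cluster name the node on localhost reports is to read it from the Cluster Health API. This is only a minimal sketch, assuming curl and sed are available; cluster_name_from_health is a made-up helper name, not part of Elasticsearch:

```shell
#!/bin/bash
# Hypothetical helper: read a _cluster/health JSON response on stdin
# and print the value of its "cluster_name" field.
cluster_name_from_health() {
    sed -n 's/.*"cluster_name"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Usage (assumes Elasticsearch is listening on localhost:9200):
#   curl -s 'localhost:9200/_cluster/health' | cluster_name_from_health
```

If the name printed is the default, elasticsearch, it may be worth setting cluster.name explicitly in elasticsearch.yml.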


#3

Hi Ramy,

Thank you for your answer. The system (Linux) is a standalone localhost installation, with the whole ELK stack installed on the same machine. ELK is the default installation downloaded from elastic.co and prepared for Linux (all dll and exe files removed). No extra plugins are installed, only Curator, which I start with cron.

Best regards, and thanks for the help.


(Mark Walkom) #4

Check the data directory and make sure the data actually exists under the same node number?
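
A quick way to see this, as a rough sketch: the function below prints each node directory found under a nodes directory, together with how many index directories it holds. list_node_dirs is a made-up name, and the path in the usage line is a placeholder to replace with your own data path:

```shell
#!/bin/bash
# Hypothetical helper: for every node directory under a "nodes" directory,
# print "<node number> <count of index directories>".
list_node_dirs() {
    local nodes_dir="$1"
    local node count
    for node in "$nodes_dir"/*/; do
        # Skip the unexpanded glob when there are no subdirectories.
        [ -d "$node" ] || continue
        count=0
        if [ -d "${node}indices" ]; then
            count=$(find "${node}indices" -mindepth 1 -maxdepth 1 -type d | wc -l)
        fi
        # $((count)) strips any padding that wc may add.
        echo "$(basename "$node") $((count))"
    done
}

# Usage: pass the nodes directory below <path.data>/<cluster name>, e.g.
#   list_node_dirs /path/to/data/elasticsearch/nodes
```

More than one line of output means more than one node has used that data path at some point.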


#5

/opt/elasticsearch/elasticsearch/nodes/0/indices
-bash-4.1$ ls -altrh
total 124K
drwxr-xr-x. 5 elkuser elkgroup 4.0K Jul 31 18:48 logstash-2015.07.31
drwxr-xr-x. 6 elkuser elkgroup 4.0K Aug 1 02:00 logstash-2015.08.01
drwxr-xr-x. 7 elkuser elkgroup 4.0K Aug 2 02:00 logstash-2015.08.02
drwxr-xr-x. 7 elkuser elkgroup 4.0K Aug 3 02:00 logstash-2015.08.03
drwxr-xr-x. 4 elkuser elkgroup 4.0K Aug 3 08:39 .kibana
drwxr-xr-x. 4 elkuser elkgroup 4.0K Aug 3 10:27 logstash-2015.07.28
drwxr-xr-x. 5 elkuser elkgroup 4.0K Aug 3 10:28 logstash-2015.07.29
drwxr-xr-x. 5 elkuser elkgroup 4.0K Aug 3 10:28 logstash-2015.07.30
drwxr-xr-x. 8 elkuser elkgroup 4.0K Aug 4 02:03 logstash-2015.08.04
drwxr-xr-x. 8 elkuser elkgroup 4.0K Aug 5 02:00 logstash-2015.08.05
drwxr-xr-x. 8 elkuser elkgroup 4.0K Aug 6 02:00 logstash-2015.08.06
drwxr-xr-x. 8 elkuser elkgroup 4.0K Aug 7 02:00 logstash-2015.08.07
drwxr-xr-x. 8 elkuser elkgroup 4.0K Aug 12 10:40 logstash-2015.08.12
drwxr-xr-x. 8 elkuser elkgroup 4.0K Aug 13 02:00 logstash-2015.08.13
drwxr-xr-x. 8 elkuser elkgroup 4.0K Aug 14 02:00 logstash-2015.08.14
drwxr-xr-x. 8 elkuser elkgroup 4.0K Aug 17 13:45 logstash-2015.08.17
drwxr-xr-x. 8 elkuser elkgroup 4.0K Aug 18 02:00 logstash-2015.08.18
drwxr-xr-x. 8 elkuser elkgroup 4.0K Aug 19 02:00 logstash-2015.08.19
drwxr-xr-x. 8 elkuser elkgroup 4.0K Aug 20 02:00 logstash-2015.08.20
drwxr-xr-x. 8 elkuser elkgroup 4.0K Aug 21 02:00 logstash-2015.08.21
drwxr-xr-x. 8 elkuser elkgroup 4.0K Aug 22 04:19 logstash-2015.08.22
drwxr-xr-x. 4 elkuser elkgroup 4.0K Aug 24 09:21 ..
drwxr-xr-x. 8 elkuser elkgroup 4.0K Aug 24 09:21 logstash-2015.08.24
drwxr-xr-x. 7 elkuser elkgroup 4.0K Aug 25 05:28 logstash-2015.08.25
drwxr-xr-x. 7 elkuser elkgroup 4.0K Aug 26 05:08 logstash-2015.08.26
drwxr-xr-x. 7 elkuser elkgroup 4.0K Aug 27 02:00 logstash-2015.08.27
drwxr-xr-x. 7 elkuser elkgroup 4.0K Aug 28 02:00 logstash-2015.08.28
drwxr-xr-x. 7 elkuser elkgroup 4.0K Aug 29 02:00 logstash-2015.08.29
drwxr-xr-x. 7 elkuser elkgroup 4.0K Aug 31 10:15 logstash-2015.08.31
drwxr-xr-x. 31 elkuser elkgroup 4.0K Sep 1 02:00 .
drwxr-xr-x. 7 elkuser elkgroup 4.0K Sep 1 02:00 logstash-2015.09.01

-bash-4.1$ du -sh *
68K logstash-2015.07.28
136K logstash-2015.07.29
124K logstash-2015.07.30
16G logstash-2015.07.31
13G logstash-2015.08.01
15G logstash-2015.08.02
1.9G logstash-2015.08.03
14G logstash-2015.08.04
20G logstash-2015.08.05
1.7G logstash-2015.08.06
128K logstash-2015.08.07
128K logstash-2015.08.12
128K logstash-2015.08.13
280K logstash-2015.08.14
120K logstash-2015.08.17
124K logstash-2015.08.18
128K logstash-2015.08.19
316K logstash-2015.08.20
128K logstash-2015.08.21
6.0M logstash-2015.08.22
128K logstash-2015.08.24
23G logstash-2015.08.25
18G logstash-2015.08.26
30G logstash-2015.08.27
8.5G logstash-2015.08.28
8.4M logstash-2015.08.29
64K logstash-2015.08.31
267M logstash-2015.09.01


(Mark Walkom) #6

But is there anything else in there ^


#7

/opt/elasticsearch/elasticsearch/nodes
-bash-4.1$ ls -altrh
total 16K
drwxr-xr-x. 3 elkuser elkgroup 4.0K Jul 30 15:04 ..
drwxr-xr-x. 4 elkuser elkgroup 4.0K Aug 14 12:37 .
drwxr-x---. 2 elkuser elkgroup 4.0K Aug 14 12:50 1
drwxr-xr-x. 4 elkuser elkgroup 4.0K Aug 24 09:21 0
-bash-4.1$ cd 1/
-bash-4.1$ ls -altr
total 8
-rw-r-----. 1 elkuser elkgroup 0 Aug 14 12:37 node.lock
drwxr-xr-x. 4 elkuser elkgroup 4096 Aug 14 12:37 ..
drwxr-x---. 2 elkuser elkgroup 4096 Aug 14 12:50 .


(Mark Walkom) #8

Right, so you have two node directories there, which means you had two nodes running on the same host at some point.

I'd start two nodes on that host and then see if your data "returns".


#9

This is the ES config I start with:
-bash-4.1$ cat /opt/elkapp/elasticsearch/config/elasticsearch.yml

###################################  Cluster ###################################

#  Cluster name identifies your cluster for auto-discovery. If you're running
#  multiple clusters on the same network, make sure you're using unique names.
# 
#  cluster.name: elasticsearch


####################################  Node #####################################

#  Node names are generated dynamically on startup, so you're relieved
#  from configuring them manually. You can tie this node to a specific name:
# 
node.name: "ELK Prod"

#  Every node can be configured to allow or deny being eligible as the master,
#  and to allow or deny to store the data.
# 
#  Allow this node to be eligible as a master node (enabled by default):
# 
node.master: true
# 
#  Allow this node to store data (enabled by default):
# 
node.data: true

#  You can exploit these settings to design advanced cluster topologies.
# 
#  1. You want this node to never become a master node, only to hold data.
#     This will be the "workhorse" of your cluster.
# 
#  node.master: false
#  node.data: true
# 
#  2. You want this node to only serve as a master: to not store any data and
#     to have free resources. This will be the "coordinator" of your cluster.
# 
#  node.master: true
#  node.data: false
#  
#  3. You want this node to be neither master nor data node, but
#     to act as a "search load balancer" (fetching data from nodes,
#     aggregating results, etc.)
#  
#  node.master: false
#  node.data: false

#  Use the Cluster Health API [http://localhost:9200/_cluster/health], the
#  Node Info API [http://localhost:9200/_nodes] or GUI tools
#  such as <http://www.elasticsearch.org/overview/marvel/>,
#  <http://github.com/karmi/elasticsearch-paramedic>,
#  <http://github.com/lukas-vlcek/bigdesk> and
#  <http://mobz.github.com/elasticsearch-head> to inspect the cluster state.

#  A node can have generic attributes associated with it, which can later be used
#  for customized shard allocation filtering, or allocation awareness. An attribute
#  is a simple key value pair, similar to node.key: value, here is an example:
#  
#  node.rack: rack314

#  By default, multiple nodes are allowed to start from the same installation location
#  to disable it, set the following:
#  node.max_local_storage_nodes: 1


####################################  Index ####################################

index.store.type: niofs
index.store.fs.memory.enabled: true
index.gateway.type: none
gateway.type: none

####################################  Paths ####################################

#  Path to directory containing configuration (this file and logging.yml):
#  
#  path.conf: /path/to/conf

#  Path to directory where to store index data allocated for this node.

path.data: /opt/elasticsearch
path.work: /opt/elasticsearch/tmp
path.logs: /opt/elasticsearch/logs

##############################  Network And HTTP ###############################

transport.tcp.port: 9300
transport.tcp.compress: true
http.port: 9200

##################################  Slow Log ##################################

#  Shard level query and fetch threshold logging.

#  index.search.slowlog.threshold.query.warn: 10s
#  index.search.slowlog.threshold.query.info: 5s
#  index.search.slowlog.threshold.query.debug: 2s
#  index.search.slowlog.threshold.query.trace: 500ms

#  index.search.slowlog.threshold.fetch.warn: 1s
#  index.search.slowlog.threshold.fetch.info: 800ms
#  index.search.slowlog.threshold.fetch.debug: 500ms
#  index.search.slowlog.threshold.fetch.trace: 200ms

#  index.indexing.slowlog.threshold.index.warn: 10s
#  index.indexing.slowlog.threshold.index.info: 5s
#  index.indexing.slowlog.threshold.index.debug: 2s
#  index.indexing.slowlog.threshold.index.trace: 500ms

##################################  GC Logging ################################

#  monitor.jvm.gc.young.warn: 1000ms
#  monitor.jvm.gc.young.info: 700ms
#  monitor.jvm.gc.young.debug: 400ms

#  monitor.jvm.gc.old.warn: 10s
#  monitor.jvm.gc.old.info: 5s
#  monitor.jvm.gc.old.debug: 2s

#  
#  http.jsonp.enable: true
http.cors.allow-origin: "/.*/"
http.cors.enabled: true

(Mark Walkom) #10

It's better if you can quote that in code tags to make it easier to read. Also, why aren't you using a deb/rpm install?

But, I'd still do what I suggested and see what happens.


#11

I can't use deb/rpm because of policy, and I always need to deliver the patterns, config and certs together with the package.


(Mark Walkom) #12

Let us know how it goes with starting up that other node then.


#13

There is no other node. I start only one ES instance, with this script:

#! /bin/bash
#
### BEGIN INIT INFO
# Provides:          elasticsearch
# Required-Start:    $remote_fs $syslog
# Required-Stop:     $remote_fs $syslog
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: elasticsearch Server 
# Description:       Enable elasticsearch Server  provided by daemon.
### END INIT INFO

# Source function library.
. /etc/rc.d/init.d/functions


PATH=/sbin:/usr/sbin:/bin:/usr/bin
DESC="elasticsearch Server "
NAME=elasticsearch
ES_HOME=/opt/app/$NAME
PIDFILE=/var/run/$NAME.pid
elasticsearch_home="/opt/app/$NAME"
elasticsearch_bin="${elasticsearch_home}/bin/$NAME"
elasticsearch_log="/opt/$NAME/$NAME.log"
DATA_DIR=/opt/$NAME
WORK_DIR=/$DATA_DIR/tmp
LOG_DIR=$DATA_DIR/logs
CONFIG_FILE=$ES_HOME/config/elasticsearch.yml
HOST=$(hostname)
case "$HOST" in
  testhost01)
        export ES_HEAP_SIZE="16g"
        export NODENAME="elastic_test"
        ;;
  prodhost01)
        export ES_HEAP_SIZE="30g"
        export NODENAME="elastic_prod"
        ;;
  *)
        echo "Please define HEAP parameter for $HOST"
        ;;
esac
DAEMON=${elasticsearch_home}/bin/$NAME
DAEMON_ARGS="-d -Des.index.store.type=memory --node.name=$NODENAME -Des.config=$CONFIG_FILE -Des.path.home=$ES_HOME -Des.path.logs=$LOG_DIR -Des.logger.level=DEBUG -Des.path.data=$DATA_DIR -Des.path.work=$WORK_DIR"
#DAEMON_ARGS="-d -Des.index.store.type=memory --node.name=$NODENAME -Des.config=$CONFIG_FILE -Des.path.home=$ES_HOME -Des.path.logs=$LOG_DIR -Des.path.data=$DATA_DIR -Des.path.work=$WORK_DIR"
DAEMON_USER=elkuser


# Exit if the package is not installed
[ -x "$DAEMON" ] || exit 0



start() {
        echo -n $"Starting $NAME: "
        daemon --user="$DAEMON_USER" --pidfile="$PIDFILE" "$DAEMON $DAEMON_ARGS" >/dev/null 2>&1
        RETVAL=$?
        pid=$(ps -ef | grep $NAME | grep $NAME.yml | grep -v "grep"| awk '{print $2}')
        if [ -n "$pid" ]; then
                echo $pid > "$PIDFILE"
        fi
        echo
        return $RETVAL
}
stop() {
     echo -n $"Stopping $NAME: "
     if [ -f $PIDFILE ];then
        if [ $(ps -ef | grep $(echo $PIDFILE) | wc -l) -gt 0 ];then
        pid=$(ps -ef | grep $NAME | grep $NAME.yml | grep -v "grep"| awk '{print $2}')
            if [ -n "$pid" ]; then
                echo $pid > "$PIDFILE"
            fi
        else
           echo "No process found!"
           exit 0;
        fi
     else
       echo "Pid not found!"
           exit 0;
     fi
        killproc -p "$PIDFILE" -d 10 "$DAEMON"
        RETVAL="$?"
        echo
        [ $RETVAL = 0 ] && rm -f "$PIDFILE"
        return "$RETVAL"
}

case "$1" in
  start)
        start
        ;;
  stop)
        stop
        ;;
  restart)
        stop
        start
        ;;
  status)
        status -p $PIDFILE "$NAME"
                ;;
  *)
        echo "Usage: $NAME {start|stop|restart}" >&2
        exit 1
        ;;
esac

exit $RETVAL

(Mark Walkom) #14

There was. See how there is a 0 and a 1 there? That means there were two nodes running on that host at some point.


#15

1 is empty, there is no data in it. Can I remove the directory?

/opt/elasticsearch/elasticsearch/nodes/1
-rw-r-----. 1 elkuser elkgroup 0 Aug 14 12:37 node.lock
drwxr-xr-x. 4 elkuser elkgroup 4096 Aug 14 12:37 ..
drwxr-x---. 2 elkuser elkgroup 4096 Aug 14 12:50 .


(Mark Walkom) #16

If it's empty, yes.


#17

OK, then how can it be that all of the data is there in /opt/elasticsearch/elasticsearch/nodes/0, but when I query localhost:9200, ES doesn't see anything except the data that the currently running process saved?

If I restart ES, the default index pattern I created, [logstash-]YYYY.MM.DD, is gone, and if I create it again after the restart I can't see the old data.


#18

Hi, I have now changed elasticsearch.yml a bit, and the result of the curl query has changed:

curl 'localhost:9200/_cat/indices?v'
health status index               pri rep docs.count docs.deleted store.size pri.store.size
red    open   logstash-2015.08.02   5   1          0            0       576b           576b
red    open   logstash-2015.07.29   5   1          0            0       288b           288b
yellow open   logstash-2015.08.21   5   1          0            0       720b           720b
yellow open   logstash-2015.08.06   5   1          0            0       720b           720b
yellow open   logstash-2015.08.04   5   1          0            0       720b           720b
yellow open   logstash-2015.08.14   5   1          0            0       720b           720b
yellow open   logstash-2015.08.22   5   1          0            0       720b           720b
red    open   logstash-2015.08.03   5   1          0            0       576b           576b
yellow open   logstash-2015.08.18   5   1          0            0       720b           720b
green  open   logstash-2015.09.02   5   0     318131            0    232.9mb        232.9mb
yellow open   logstash-2015.08.20   5   1          0            0       720b           720b
yellow open   logstash-2015.08.07   5   1          0            0       720b           720b
red    open   logstash-2015.07.30   5   1          0            0       288b           288b
yellow open   logstash-2015.08.05   5   1          0            0       720b           720b
red    open   logstash-2015.08.01   5   1          0            0       432b           432b
yellow open   logstash-2015.08.13   5   1          0            0       720b           720b
yellow open   logstash-2015.08.12   5   1          0            0       720b           720b
red    open   logstash-2015.07.28   5   1          0            0       144b           144b
yellow open   logstash-2015.08.17   5   1          0            0       720b           720b
yellow open   .kibana               1   1          2            0     13.3kb         13.3kb
yellow open   logstash-2015.08.19   5   1          0            0       720b           720b
yellow open   logstash-2015.08.24   5   1          0            0       720b           720b
red    open   logstash-2015.07.31   5   1          0            0       288b           288b
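
To pull just the red indices out of a listing like this one, something along these lines might help. It is only a sketch, assuming awk is available and the column order shown above; red_indices is a made-up helper name:

```shell
#!/bin/bash
# Hypothetical helper: read `_cat/indices?v` output on stdin and print the
# names of the indices whose health column (first column) is "red".
red_indices() {
    awk '$1 == "red" { print $3 }'
}

# Usage (assumes Elasticsearch is listening on localhost:9200):
#   curl -s 'localhost:9200/_cat/indices?v' | red_indices
```

Red means at least one primary shard of the index is not allocated, so those are the indices to look at first.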

(Mark Walkom) #19

What did you change?


#20

index.number_of_replicas: 0
and I uncommented the slow log and GC logging settings:

# Shard level query and fetch threshold logging.
#
index.search.slowlog.threshold.query.warn: 10s
index.search.slowlog.threshold.query.info: 5s
index.search.slowlog.threshold.query.debug: 2s
index.search.slowlog.threshold.query.trace: 500ms
#
index.search.slowlog.threshold.fetch.warn: 1s
index.search.slowlog.threshold.fetch.info: 800ms
index.search.slowlog.threshold.fetch.debug: 500ms
index.search.slowlog.threshold.fetch.trace: 200ms
#
index.indexing.slowlog.threshold.index.warn: 10s
index.indexing.slowlog.threshold.index.info: 5s
index.indexing.slowlog.threshold.index.debug: 2s
index.indexing.slowlog.threshold.index.trace: 500ms
#
################## GC Logging ################
#
monitor.jvm.gc.young.warn: 1000ms
monitor.jvm.gc.young.info: 700ms
monitor.jvm.gc.young.debug: 400ms
#
monitor.jvm.gc.old.warn: 10s
monitor.jvm.gc.old.info: 5s
monitor.jvm.gc.old.debug: 2s