Elasticsearch Missing Data


(Eric Luellen) #1

Hello,

I've had my elasticsearch instance running for about a week with no issues,
but last night it stopped working. When I went to look in Kibana, logging
stopped around 20:45 on 1/7/14. I then restarted the service on both
elasticsearch servers and it started logging again and pulled back some
logs from 07:10 that morning, even though I restarted the service around
10:00. So my questions are:

  1. Why did it stop working? I don't see any obvious errors.
  2. When I restarted it, why didn't it go back and pull all of the data
    instead of just some of it? I see that there are no unassigned shards.

curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
{
"cluster_name" : "my-elasticsearch",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 3,
"number_of_data_nodes" : 2,
"active_primary_shards" : 40,
"active_shards" : 80,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0
}

Are there any additional queries or logs I can look at to see what is going
on?

On a slight side note, when I restarted my 2nd elasticsearch server it
isn't reading from the /etc/elasticsearch.yml file like it should. It isn't
creating the node name correctly or putting the data files in the spot I
have configured. I'm using CentOS and doing everything via
/etc/init.d/elasticsearch on both servers and the elasticsearch1 server
reads everything correctly but elasticsearch2 does not.

Thanks for your help.
Eric

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/fc191ee4-b312-4c52-89d9-de04c4309b65%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Alexander Reelsen) #2

Hey,

a couple of things:

  1. Did you check the log files? Most likely in /var/log/elasticsearch if
    you use the packages. Is there anything suspicious at the time of your
    outage? Please check your master node as well, if you have one (not sure if
    it is a master or client node from the cluster health).
  2. Why should elasticsearch pull your data? Any special configuration you
    didn't mention? Or what exactly do you mean here?
  3. Happy to debug your issue with the init script. The elasticsearch.yml
    file should be in /etc/elasticsearch/ and not in /etc - anything manually
    moved around? Can you still reproduce it?
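
For the first point, here is a minimal sketch of narrowing a log file down to the entries around the outage. It assumes the usual Elasticsearch log line prefix of the form [2014-01-07 19:00:02,947]; the function name lines_near and the two-hour default window are hypothetical choices for illustration, not anything Elasticsearch provides.

```python
import re
from datetime import datetime, timedelta

# Timestamp prefix assumed at the start of each log entry, e.g.
# [2014-01-07 19:00:02,947][DEBUG][indices.recovery ] ...
STAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\]")


def lines_near(lines, outage, window_minutes=120):
    """Yield log lines whose entry falls within the window around `outage`.

    Continuation lines (stack traces, recovery phases) carry no timestamp
    of their own, so they follow the decision made for their entry.
    """
    window = timedelta(minutes=window_minutes)
    keep = False
    for line in lines:
        match = STAMP.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            keep = abs(stamp - outage) <= window
        if keep:
            yield line.rstrip("\n")
```

To use it, open the log under /var/log/elasticsearch (the exact file name depends on your cluster name) and pass the file object together with datetime(2014, 1, 7, 20, 45). Running it against the logs of every node makes it easier to line up what each one saw at the same moment.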

--Alex



(Eric Luellen) #3

Alexander,

  1. The only odd log entry was at 19:00 on 1/7/14, which was about 1 hr.
    before logs stopped. These logs are on the master and She-Hulk is the only
    other node.

[2014-01-07 19:00:02,947][DEBUG][indices.recovery ] [ElasticSearch Server1] [logstash-2014.01.08][0] recovery completed from [She-Hulk][_MtrVsSmQIaM-BErhEtg9w][inet[/10.1.11.111:9300]], took[333ms]
   phase1: recovered_files [1] with total_size of [71b], took [68ms], throttling_wait [0s]
         : reusing_files [0] with total_size of [0b]
   phase2: start took [13ms]
         : recovered [17] transaction log operations, took [12ms]
   phase3: recovered [0] transaction log operations, took [164ms]
[2014-01-07 19:00:03,375][DEBUG][indices.recovery ] [ElasticSearch Server1] [logstash-2014.01.08][2] recovery completed from [She-Hulk][_MtrVsSmQIaM-BErhEtg9w][inet[/10.1.11.111:9300]], took[502ms]
   phase1: recovered_files [1] with total_size of [71b], took [30ms], throttling_wait [0s]
         : reusing_files [0] with total_size of [0b]
   phase2: start took [6ms]
         : recovered [6] transaction log operations, took [38ms]
   phase3: recovered [13] transaction log operations, took [20ms]
[2014-01-07 19:00:06,898][INFO ][cluster.metadata ] [ElasticSearch Server1] [logstash-2014.01.08] update_mapping [logs] (dynamic)

Also, on She-Hulk I got a master_left error at 20:52 stating that the
master wasn't pingable, but I'm not sure why.

  2. I am not sure. I was thinking that the shard should still be there but
    just unassigned, and once it came back up, it'd start processing it.
  3. On both my master and my secondary, the config is in
    /etc/elasticsearch/elasticsearch.yml and it is run by
    /etc/init.d/elasticsearch. On the master, it works fine and sets the
    correct node name, cluster name, data directory, etc. It is an identical
    setup on the secondary, but it only picks up the cluster name. Everything
    else defaults to some other location. On the secondary, the only data
    location is /var/lib/elasticsearch/node-name. In the config I tell it to
    go to /etc/elasticsearch/data. On the master it is in the correct
    location of /etc/elasticsearch/data.

So overall, I guess the first issue was that something weird happened to my
server, and there's not much I can do about that. I'm more interested in the
3rd question now, since I still don't know why it's not reading the full
config file. It is obviously reading part of it, since the node is part of
my cluster.



(Alexander Reelsen) #4

Hey,

regarding the config file... I'm wondering if your naming or your indentation
is wrong somewhere? Can you copy the config files and make sure their
structure is the same?
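
One way to do that comparison, as a rough sketch only: read both elasticsearch.yml files into strings, diff the active (non-comment) lines, and list settings that begin with whitespace. The function names indented_keys and config_diff and the default labels are hypothetical. Leading whitespace is legitimate in nested YAML, so flagged lines are only candidates to inspect, for example a space left behind after uncommenting a line.

```python
import difflib


def indented_keys(text):
    """Return (line_number, line) pairs for settings that start with whitespace."""
    flagged = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        # Skip blank lines and comments; they cannot change a setting.
        if not stripped or stripped.startswith("#"):
            continue
        if line[0] in " \t" and ":" in stripped:
            flagged.append((number, line))
    return flagged


def config_diff(text_a, text_b, name_a="elasticsearch1", name_b="elasticsearch2"):
    """Unified diff of two configs, ignoring comments and blank lines."""

    def active(text):
        return [
            line.rstrip()
            for line in text.splitlines()
            if line.strip() and not line.strip().startswith("#")
        ]

    return list(
        difflib.unified_diff(
            active(text_a), active(text_b), name_a, name_b, lineterm=""
        )
    )
```

Printing the result of config_diff for the two files shows where they differ beyond the node name, and indented_keys on the second server's file points at lines worth a closer look.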

--Alex


